sed: A Powerful Stream Editor in xNix

sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed’s ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

Preface

I started using sed more than 10 years after I started working. Maybe I can use that as the title of this article. :)

To be honest, when I had just started working, sed seemed too advanced and magical a command for me. After more than 10 years, I had become quite comfortable with xNix and most of its commands. Then I started using sed and felt its power. There are quite a few famous books about it, even though in my eyes it is just one command.

sed is available by default on virtually every xNix system, since it is a standard POSIX utility; most Linux distributions ship GNU sed, which is the version used in this article.

Flexibility often comes with complexity, which is one of the reasons I hesitated to learn sed: I felt a bit of fear. Maybe most people feel the same way.

OK, let's take an example to start our sed journey. The log file /var/log/dmesg on my Linux server has 1440 lines in total, and I want to filter all lines containing cpu. There are many ways to do it. The first one that comes to my mind is grep, which I use heavily in daily work: the command grep cpu /var/log/dmesg handles the task very well.

Now, what if I want to delete all lines containing cpu (I have no real reason to do that)? grep cannot do that, because it only reads its input and never changes the file. For read operations, grep and sed overlap a lot, but for write operations sed really shows its power. To delete lines, use sed's d (delete) command; yes, sed has lots of sub-commands. Back to my requirement: the command is sed -i '/cpu/d' /var/log/dmesg .
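The preview-then-delete routine can be tried safely on a throwaway file before touching a real log. This is a minimal sketch assuming GNU sed; the file comes from mktemp and the three sample lines are made up for illustration:

```shell
# Hypothetical sample file standing in for /var/log/dmesg.
tmp=$(mktemp)
printf '%s\n' \
  'Initializing cgroup subsys cpu' \
  'Linux version 3.10.0' \
  'cpuidle: using governor menu' > "$tmp"

# 1. Preview the lines that would be deleted (read-only).
grep cpu "$tmp"

# 2. Delete every line matching the address /cpu/, editing the file in place.
sed -i '/cpu/d' "$tmp"

# 3. Check what is left, then clean up.
remaining=$(cat "$tmp")
echo "$remaining"    # Linux version 3.10.0
rm -f "$tmp"
```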

Before a delete operation, it is a good habit to first show what will be deleted; that way, if you are about to delete something you shouldn't (say, the whole root directory), you can drop the plan after checking. So, let's see which lines contain the word cpu:

> grep cpu /var/log/dmesg
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 38 pages/cpu @ffff8ad67fc00000 s118784 r8192 d28672 u262144
[ 0.000000] pcpu-alloc: s118784 r8192 d28672 u262144 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 3 4 5 6 7
[ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=8.
[ 0.256325] core: CPUID marked event: 'cpu cycles' unavailable
[ 0.257679] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.257680] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 0.940972] cpuidle: using governor menu
[ 2.873729] cryptd: max_cpu_qlen set to 1000

Now, what if I want to delete only the line that contains both Initializing and cpu, where cpu is a word of its own rather than part of another word such as cpuset or cpuacct?

What I mean is to delete only the 2nd line of the output above. But if there were more lines matching my target, I might not know in advance which lines they are. So we need a regular expression. I would say the regular expression is one of the most important inventions in the coding world: it is used everywhere and has saved people lots of time. Maybe I will write another article about regular expressions.

OK, the command we need could look like this:

sed -i '/Initializing.*cpu *$/ d' /var/log/dmesg

Or

sed -i '/Initializing.*cpu\s*$/ d' /var/log/dmesg

To be more precise, use the following one, which makes sure there is at least one whitespace character before cpu:

sed -i '/Initializing.*\s\+cpu\s*$/ d' /var/log/dmesg

All three commands above delete the 2nd line of the grep output, the one ending with cgroup subsys cpu, and all of them use basic regular expressions:

  • Dot (.) matches any single character.
  • Star (*) matches zero or more occurrences of the preceding regular expression or ordinary character.
  • \+ matches one or more occurrences of the preceding regular expression or ordinary character. It works much like the star but requires at least one occurrence (in basic regular expressions, \+ is a GNU extension).
  • \s matches a whitespace character, such as a space or a tab (also a GNU extension).
  • $ matches the end of the line, and ^ plays the same role for the beginning of the line.
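To see how the three expressions differ, they can be run (without -i, so nothing is modified) against a few sample lines. This sketch assumes GNU sed for \s and \+; the last sample line, Initializing percpu, is hypothetical and added only to show why the third expression is the more accurate one:

```shell
sample='Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Initializing cgroup subsys cpuacct
Initializing percpu'

r1=$(printf '%s\n' "$sample" | sed '/Initializing.*cpu *$/d')
r2=$(printf '%s\n' "$sample" | sed '/Initializing.*cpu\s*$/d')
r3=$(printf '%s\n' "$sample" | sed '/Initializing.*\s\+cpu\s*$/d')

# r1 and r2 delete both "... cpu" and "... percpu" (cpu at the end of a word also matches).
# r3 requires whitespace right before cpu, so only "... cpu" is deleted.
printf '%s\n---\n' "$r1" "$r2" "$r3"
```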

To use extended regular expressions, add the option -E (or -r). The only difference between basic and extended regular expressions is the behavior of a few characters: ‘?’, ‘+’, parentheses, braces (‘{}’), and ‘|’. Basic regular expressions require these to be escaped if you want them to behave as special characters, whereas extended regular expressions require them to be escaped if you want them to match a literal character. ‘|’ is special here because ‘\|’ is a GNU extension; standard basic regular expressions do not provide its functionality. So, with extended regular expressions, the command above becomes:

sed -i -E '/Initializing.*\s+cpu\s*$/ d' /var/log/dmesg

Yes, the only change is \+ to +.
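A quick way to convince yourself that the two spellings match the same lines is to run both with -n and p on one sample line. The a+b lines at the end are made-up inputs showing that + is an ordinary character in basic mode but a repetition operator with -E (GNU sed assumed):

```shell
line='[ 0.000000] Initializing cgroup subsys cpu'

# Basic (BRE) and extended (ERE) spellings of the same address.
bre=$(printf '%s\n' "$line" | sed -n '/Initializing.*\s\+cpu\s*$/p')
ere=$(printf '%s\n' "$line" | sed -E -n '/Initializing.*\s+cpu\s*$/p')

# The same pattern a+b means different things in the two modes.
plus_bre=$(printf '%s\n' 'a+b' 'aab' | sed -n '/a+b/p')     # literal "a+b"
plus_ere=$(printf '%s\n' 'a+b' 'aab' | sed -E -n '/a+b/p')  # one or more a, then b

echo "$bre"; echo "$ere"; echo "$plus_bre"; echo "$plus_ere"
```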

Whether to use basic or extended regular expressions is, I think, a matter of personal habit. I prefer the extended ones because I have used them for a long time, and they work the same way as the regular expressions I use in other places and commands.

Anyway, regular expressions play an important role in sed's addressing. Addressing matters because it is the prerequisite for sed to perform its reading and writing actions (insert, append, replace and delete): the address selects the lines that a command acts on.

Many Different Ways to Address Lines

  • Line number and last line ($): to print line 10 of /var/log/dmesg, use sed -n '10p' /var/log/dmesg . Note: -n is needed here to disable sed's default behavior of printing the pattern space at the end of each cycle. The command in this example is p, which prints the lines that satisfy the address, here the 10th line. The pattern space is the buffer that holds each line sed reads from the input stream: sed reads one line into the pattern space, runs every command whose address matches that line, and then, unless -n is given, prints the pattern space to the output stream. Then it reads the next line and repeats this cycle until the end of the input. In short, without -n, sed prints every line of the input, and lines matching the address of a p command are printed twice. To print the last line, use sed -n '$p' /var/log/dmesg . The $ address is worth knowing and has quite a few uses; for example, to delete everything from the 10th line to the end of the file, use sed -i '10,$d' filepath .
  • Range (3,6) and negated range (!): to print lines 3 to 6, use sed -n '3,6p' /var/log/dmesg ; to print all other lines, use sed -n '3,6!p' /var/log/dmesg . Note: in csh or tcsh, you need to escape ! as \! , because ! triggers history substitution there even inside single quotes: !p would expand to the last command in history starting with p.
  • Start and step (1~3): print the first line and then every 3rd line after it (lines 1, 4, 7, ...), e.g. sed -n '1~3p' /var/log/dmesg .
  • Start and offset (2,+5): from the start line to the N lines after it, where N is the number after + (5 here), i.e. lines 2 to 7.
  • Start and end at a multiple of N (2,~4): from the start line to the next line whose line number is a multiple of N, where N is the number after ~ (4 here). For example, to print lines 5 to 8, use sed -n '5,~4p' /var/log/dmesg .
  • Start and a regular expression for the end: from the start line to the next line matching the regular expression, e.g. 3,/[5-9]/ or 3,/[5-9]\+/ (written 3,/[5-9]+/ with -E). This is more flexible and powerful, and also a bit harder to master.
  • Regular expression (/regex/, \%regex%, /regex/I, \%regex%I, /regex/M, \%regex%M): the \%regex% form lets you use another delimiter instead of / (any character can take the place of %), which is handy when the regex itself contains /. The I flag makes the match case-insensitive, and the M flag enables multi-line mode, in which ^ and $ also match at newlines embedded in the pattern space.
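The addressing forms above are easiest to compare on numbered input, where each line's content equals its line number. This sketch uses seq 10 as hypothetical input (rather than /var/log/dmesg) and assumes GNU sed, since first~step, addr,+N, addr,~N and the I flag are GNU extensions; the helper name pick is made up for this example:

```shell
# pick SCRIPT: run sed -n SCRIPT on the numbers 1..10 and join the result with spaces.
pick() { seq 10 | sed -n "$1" | paste -s -d ' ' -; }

last=$(pick '$p')             # last line
range=$(pick '3,6p')          # lines 3 to 6
negated=$(pick '3,6!p')       # every line except 3 to 6
step=$(pick '1~3p')           # first line, then every 3rd line
offset=$(pick '2,+5p')        # line 2 and the 5 lines after it
multiple=$(pick '5,~4p')      # line 5 up to the next multiple of 4
to_regex=$(pick '3,/[5-9]/p') # line 3 up to the next line matching [5-9]

# Case-insensitive match, and a custom delimiter (handy when the regex contains /).
nocase=$(printf '%s\n' CPU cpu disk | sed -n '/cpu/Ip' | paste -s -d ' ' -)
custom=$(printf '%s\n' /usr/bin /etc | sed -n '\%^/usr/%p')

printf '%s\n' "$last" "$range" "$negated" "$step" "$offset" \
  "$multiple" "$to_regex" "$nocase" "$custom"
```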

Once we have mastered addressing lines, the next thing is to learn which commands sed can perform.

Sed Commands List

If you think of sed itself as a command, the term "sed commands" becomes quite confusing. So please treat sed as a program that can execute multiple commands of its own. The syntax of a sed command is [address expression]C[options], where C is a single-letter command name.

We have covered the [address] part; now let's get to know the C part:

  • p: print the pattern space.
  • P: (upper-case p) print the pattern space up to the first newline.
  • d: delete the pattern space and start the next cycle, so the line is not printed.
  • D: if the pattern space contains no newline, behave exactly like d. Otherwise, delete the text up to and including the first newline, and restart the cycle with the remaining pattern space without reading a new line of input. A bit complex.
  • s: substitute text. For example, to rename a variable in multiple script files, you can run find . -name '*.sh' -exec sed -i 's/old_var/new_var/g' '{}' \; . I use this command quite often; trust me, once you start using it, it will become your favorite. s is one of the most frequently used sed commands, so take time to become familiar with it.
  • q: print the current pattern space (unless -n is given) and exit. An exit code can follow it; for example, q2 exits with exit code 2.
  • a: append text after the current line. Usage: a\text to append
  • i: insert text before the current line. Usage: i\text to insert
  • c: replace (change) the addressed lines. Usage: c\text to be used as replacement
  • e: execute the command found in the pattern space and replace the pattern space with its output. For example, echo 'date' | sed '1e' does the same thing as the command date . This is useful when commands are saved in a file; the command can be any valid shell command.
  • e [command]: execute the specified command and print its output before the pattern space. Again, the command can be any valid shell command.
  • l: print the pattern space in an unambiguous form (non-printable characters are shown as escape sequences, and the end of the line is marked with $).
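Several of these commands can be tried on two or three input lines. This is a small sketch assuming GNU sed (the one-line a\text, i\text and c\text forms, q with an exit code, and e are GNU features). N, which appends the next input line to the pattern space, is not in the list above and is used here only to put a newline into the pattern space so that P differs from p:

```shell
# i, a and c in their one-line GNU form.
ins=$(printf '%s\n' one two | sed '1i\header')     # "header" before line 1
app=$(printf '%s\n' one two | sed '1a\after one')  # "after one" after line 1
chg=$(printf '%s\n' one two | sed '2c\TWO')        # line 2 replaced by "TWO"

# N appends the next line, so the pattern space becomes "a<newline>b".
p_all=$(printf '%s\n' a b | sed -n 'N;p')          # prints both lines
p_first=$(printf '%s\n' a b | sed -n 'N;P')        # prints only "a"

# q prints the current line, then exits with status 3.
status=0
quit=$(printf '%s\n' one two | sed '1q3') || status=$?

# e without an argument runs the pattern space as a shell command;
# "e command" runs the given command and prints its output first.
run_ps=$(echo 'echo hello' | sed 'e')
run_cmd=$(echo world | sed '1e echo hello')

printf '%s\n' "$ins" "$app" "$chg" "$p_all" "$p_first" \
  "$quit" "$status" "$run_ps" "$run_cmd"
```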

These are the commands I use in my daily work. There are a few more, which you can look up in the sed documentation.

Once we have mastered both addresses and sed commands, we will feel comfortable using sed. The final thing is to learn more about sed's options. Then again, maybe options should be the first thing to learn.

Sed Options

The most frequently used options are -i, to edit files in place, and -n, to disable the automatic printing of the pattern space.

For example, if you want to delete the 2nd line of a file, the command sed '2d' filename does not really delete the line from the file; sed just prints the content of the file without the 2nd line. To actually delete the 2nd line from the file, you must add the option -i . In the background, sed creates a temporary file to hold the changed content and finally renames it to the input file's name.

The -i option can take a suffix, in which case sed keeps a backup file named old_filename[suffix] . For example, sed -i.bak '2d' /var/log/dmesg creates a new file /var/log/dmesg.bak containing the old content, while /var/log/dmesg gets the new content with the 2nd line deleted. With GNU sed, the suffix must follow -i directly, without a space.
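The difference between printing and editing in place, plus the backup suffix, can be checked on a scratch copy. A minimal sketch, assuming GNU sed; the directory comes from mktemp and demo.txt is a made-up file name:

```shell
dir=$(mktemp -d)
printf '%s\n' first second third > "$dir/demo.txt"

# Without -i: the result goes to stdout and demo.txt is untouched.
preview=$(sed '2d' "$dir/demo.txt")
untouched=$(cat "$dir/demo.txt")

# With -i.bak: demo.txt is edited in place and the original is kept as demo.txt.bak.
sed -i.bak '2d' "$dir/demo.txt"
edited=$(cat "$dir/demo.txt")
backup=$(cat "$dir/demo.txt.bak")

rm -rf "$dir"
printf '%s\n---\n' "$preview" "$untouched" "$edited" "$backup"
```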

Other options:

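-e: add the script (one or more sed commands) given on the command line. Put the script string right after it, and don't put any other option between -e and its script (I made this mistake many times). For example, cat /var/log/dmesg | sed -e -n '2p' fails with the error message “sed: -e expression #1, char 1: unknown command: `-'”, because sed takes -n as the script. With a single script, -e is optional: sed -n '2p' works the same. It becomes necessary when you combine several expressions, as in sed -e 's/a/b/' -e '/^$/d' . You will often see it after a pipe, but it works on files in exactly the same way.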
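A short sketch of both points, assuming GNU sed and made-up input: two -e expressions applied in order, -n placed before -e, and the failing form from the text, detected through sed's non-zero exit status:

```shell
# Two expressions, applied in order to every line: substitute, then delete empty lines.
combined=$(printf '%s\n' foo '' baz | sed -e 's/foo/bar/' -e '/^$/d')

# Other options go before -e, never between -e and its script.
second=$(printf '%s\n' one two three | sed -n -e '2p')

# Here -e swallows "-n" as its script, so sed reports "unknown command" and exits non-zero.
if printf '%s\n' one two | sed -e -n '2p' 2>/dev/null; then
  misplaced=ok
else
  misplaced=failed
fi

printf '%s\n' "$combined" "$second" "$misplaced"
```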

-E: use extended regular expressions. We already used it above, in the regular expression part.

-f: read the sed script from a file. I have never used it so far.
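For longer scripts, -f keeps one command per line in a file, where lines starting with # are comments. A minimal sketch, assuming GNU sed; cleanup.sed and the input lines are made up:

```shell
dir=$(mktemp -d)

# Hypothetical script file: one sed command per line.
cat > "$dir/cleanup.sed" <<'EOF'
# delete comment lines (optionally indented)
/^ *#/d
# squeeze runs of spaces into a single space
s/  */ /g
EOF

cleaned=$(printf '%s\n' '# note' 'a   b' 'c' | sed -f "$dir/cleanup.sed")
rm -rf "$dir"
echo "$cleaned"
```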

-l N: set the line-wrap length used by the l command (70 by default; 0 means never wrap). For example, to print the first 5 lines of /var/log/dmesg wrapped at 30 characters:

> sed -l 30 -n '1,+4l' /var/log/dmesg
[ 0.000000] Initializing c\
group subsys cpuset$
[ 0.000000] Initializing c\
group subsys cpu$
[ 0.000000] Initializing c\
group subsys cpuacct$
[ 0.000000] Linux version \
3.10.0-1062.4.1.el7.x86_64 (m\
ockbuild@x86-vm-27.build.eng.\
bos.redhat.com) (gcc version \
4.8.5 20150623 (Red Hat 4.8.5\
-39) (GCC) ) #1 SMP Wed Sep 2\
5 09:42:57 EDT 2019$
[ 0.000000] Command line: \
BOOT_IMAGE=/vmlinuz-3.10.0-10\
62.4.1.el7.x86_64 root=/dev/m\
apper/rootvg-root ro crashker\
nel=auto spectre_v2=retpoline\
rd.lvm.lv=rootvg/root biosde\
vname=0 net.ifnames=0 rhgb qu\
iet rd.driver.pre=ata_piix,mp\
tspi$

In fact, I haven't found a real scenario for the l command yet. Please share with me if you have come across good use cases.
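One scenario where l can help is spotting invisible characters, such as tabs, trailing spaces, or the carriage return left by DOS line endings, which plain cat does not show. A small sketch with a made-up input line, assuming GNU sed:

```shell
# A line with a tab, a trailing space and a carriage return before the newline.
shown=$(printf 'a\tb \r\n' | sed -n 'l')
echo "$shown"    # a\tb \r$
```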

I have not used the few remaining options so far; feel free to explore them if you are interested.

Examples in Reality

Here are some real-world examples; perhaps you can get some inspiration from them.

  • Remove non-printable characters in a file: sed -i -E 's/[^[:print:]]//g' "$console_file"
  • Insert a header before the first line of a CSV file: sed -i "1i\\$header" "$csv_file_path" (inside double quotes the shell turns \$ into a literal $, so the backslash must be doubled for $header to be expanded)
  • Replace the 2nd space character with ‘=’: sed -i 's/ /=/2;P;D' .cshrc.alias This example is a bit complex; I used it in my script to convert .cshrc.alias to .bash.alias . You probably know the difference between csh and bash alias syntax: csh alias: alias ga "git add ." and bash alias: alias ga="git add ." . In the sed script 's/ /=/2;P;D' , 2 tells s to replace the 2nd space character in the current pattern space, P prints the pattern space up to the first newline, and D deletes the pattern space up to the first newline. In this case, the pattern space contains no newline at all, so P is the same as p and D is the same as d . In fact, P and D are not needed here at all: by default, sed runs its script on the current pattern space, prints it, clears it, and then reads the next line into the pattern space. Try it yourself to verify. :)
  • Delete all content in a file: sed -i d "$file_path" Of course, you don't really need sed for this. :)
  • Delete all empty lines in a file: sed -i '/^$/d' "$file_path"
  • Delete all lines starting with #, optionally preceded by spaces: sed -i '/^ *#/d' "$file_path" (without the ^ anchor, every line containing a # anywhere would be deleted)
  • Replace every occurrence of substring A with substring C, but only in lines that contain substring B: sed -i '/B/s/A/C/g' "$file_path" I believe this is one of the hardest sed scripts to read at first glance. /B/ is the address part, which selects every line matching the regex B (that is, containing B), and s/A/C/g is the s command, which replaces A with C; the g flag replaces all occurrences in the line, not just the first.
  • Capitalize the first lowercase letter of a message: echo "$msg" | sed 's/\([a-z]\)\(.*\)/\u\1\2/g' (\u, which uppercases the next character of the replacement, is a GNU extension)
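Several of these one-liners can be checked on small made-up inputs without -i, so nothing is modified. The sketch assumes GNU sed (for the one-line i\text form and \u); header and the sample lines are hypothetical:

```shell
# Strip non-printable characters (here a made-up \001 control character).
printable=$(printf 'ok\001\n' | sed -E 's/[^[:print:]]//g')

# Insert a header line: inside double quotes, \\ reaches sed as a single backslash.
header='id,name'
csv=$(printf '%s\n' '1,foo' | sed "1i\\$header")

# csh alias -> bash alias by replacing the 2nd space with '='.
alias_line=$(echo 'alias ga "git add ."' | sed 's/ /=/2')

# Delete comment lines; the ^ anchor keeps lines that only end with a # comment.
no_comments=$(printf '%s\n' '# c1' '   # c2' 'x=1  # keep me' | sed '/^ *#/d')

# Replace A with C only on lines containing B.
abc=$(printf '%s\n' 'A B A' 'A only' | sed '/B/s/A/C/g')

# Capitalize the first lowercase letter.
cap=$(echo 'hello world' | sed 's/\([a-z]\)\(.*\)/\u\1\2/g')

printf '%s\n' "$printable" "$csv" "$alias_line" "$no_comments" "$abc" "$cap"
```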

You can find more examples in the sed reference manual.

In practice, sed is often used together with grep or find to make it even more powerful.

Here are some examples:

  • Remove the comment line TODO Auto-generated method stub from all .java files (Java developers will know this one):
find "$PWD" -type f -name '*.java' -exec sed -i '/TODO Auto-generated method stub/d' '{}' \;
  • Print all lines that contain delete, with their leading spaces removed:
grep delete "$log_file" | sed -e 's/^ \+//g'

Don't forget ^ , which marks the start of the line. If it is missing, every run of spaces in the line will be removed, which is probably not what you want. Note: the above command can be replaced by a single sed command. Try to figure it out. 😉
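If you want to check your answer to that exercise, here is one possible single-sed version, compared against the grep pipeline on made-up log lines (GNU sed assumed for \+): -n turns off automatic printing, and the { } block strips leading spaces and prints only on lines matching /delete/.

```shell
log='   delete user 42
create user 7
  deleted tmp file'

piped=$(printf '%s\n' "$log" | grep delete | sed -e 's/^ \+//g')
single=$(printf '%s\n' "$log" | sed -n '/delete/{s/^ \+//;p;}')

printf '%s\n---\n%s\n' "$piped" "$single"
```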

That's all about sed for now. I hope this article helps you dare to start using sed. Once you understand the overall structure of a sed command and get used to each part of it, you will become comfortable with sed and be able to use it well.

To get more familiar with sed, keep practicing and keep reading the sed reference documentation.

Thanks for reading and happy coding. Take care!

If you like my articles, don’t hesitate to follow me and subscribe.
