Note

This post is about a parsing technique in Bash. If you have no idea what Bash is, this is not the post for you.

If the following code is all you have done with Bash's read builtin, then you have missed a lot of fun!

read -p "Please enter your username: " username

Ever tried -a, -n, or -t? At the very least, you have used -s (silent mode) for password input, right? I used to use read only for getting user input, until one day I found out it can do much more. Anyway, this post is about parsing, so those options are not today's superstars.

1   Example

Say you want the ID of the process that currently uses the most CPU, along with its CPU utilization and memory usage percentages.

% ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line
  509 10.4  1.2

This is the starting point. Going by the output, you might write the following code:

foo_pid=$(ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line | cut -d ' ' -f 3)
foo_cpu=$(ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line | cut -d ' ' -f 4)
foo_mem=$(ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line | cut -d ' ' -f 6)

If you have some experience, you will know this is a very bad idea, because

  1. The three fields, pid, cpu, and mem, are padded with leading spaces, and the amount of padding varies with the width of each value, so the field numbers passed to cut will not always be right, and
  2. ps runs three times, so the data is very likely to differ between the three cuts. Later cuts may see updated information.
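To make the first point concrete, here is a minimal sketch. The two sample lines are made up to stand in for ps output, and the variable names are only for illustration. The only difference between the lines is the width of the PID.

```bash
#!/usr/bin/env bash
# Two hypothetical ps-style lines; only the width of the PID differs.
short_pid_line='  509 10.4  1.2'
long_pid_line='12509 10.4  1.2'

# With a single space as the delimiter, every padding space starts a new
# (empty) field, so the PID is field 3 in one line and field 1 in the other.
pid_short=$(printf '%s\n' "$short_pid_line" | cut -d ' ' -f 3)
pid_long=$(printf '%s\n' "$long_pid_line" | cut -d ' ' -f 3)

echo "short: '$pid_short'"   # prints: short: '509'
echo "long:  '$pid_long'"    # prints: long:  ''
```

The same -f 3 returns the PID for one line and an empty string for the other, which is exactly the kind of bug that only shows up once in a while.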

Some people store the ps output in a variable first, then echo the variable and pipe it to cut. That takes care of the second problem, but not the first.

Now take a look at how you would do it with read:

read foo_pid foo_cpu foo_mem <<<"$(ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line)"

Simple. Or you can put the result into an array:

read -a foo <<<"$(ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line)"
echo ${foo[0]} ${foo[1]} ${foo[2]}
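If you want to try this without depending on what ps prints at the moment, here is a small sketch using made-up sample lines in place of the ps output. It also shows that read does not care how much padding there is.

```bash
#!/usr/bin/env bash
# Hypothetical sample lines standing in for the ps output.
for sample in '  509 10.4  1.2' '12509 10.4  1.2'; do
  # With the default IFS, read skips leading whitespace and splits on runs of it.
  read -r foo_pid foo_cpu foo_mem <<<"$sample"
  echo "pid=$foo_pid cpu=$foo_cpu mem=$foo_mem"
done

# The array form behaves the same way.
read -r -a foo <<<'  509 10.4  1.2'
echo "fields=${#foo[@]} pid=${foo[0]}"
```

Both sample lines end up in the right variables, and the array gets exactly three elements.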

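The only drawback is that you need to feed it with a Here String or a Here Document. With a plain pipe, read runs in a subshell and its variables are gone once the pipeline finishes. The only way I know of to use a pipe is with while, doing the work inside the loop: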

ps -u $UID -o pid= -o %cpu=,%mem= --sort -%cpu | line |
while read foo_pid foo_cpu foo_mem; do echo $foo_pid $foo_cpu $foo_mem; done

It may look strange, but it is not. The form command | while read var1 var2; do ...; done is common for parsing and processing line-oriented data.
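Why not simply pipe into read? A minimal sketch, assuming Bash with its default settings (the lastpipe option off) and using a made-up sample line instead of ps. It also tries process substitution, another Bash feature that keeps read in the current shell.

```bash
#!/usr/bin/env bash
sample='  509 10.4  1.2'   # hypothetical stand-in for the ps output

# Piped: read runs in a subshell, so the assignment is lost afterwards.
piped_pid=''
printf '%s\n' "$sample" | read -r piped_pid _
echo "after pipe: '$piped_pid'"

# Process substitution: read runs in the current shell, variables survive.
read -r subst_pid subst_cpu subst_mem < <(printf '%s\n' "$sample")
echo "after < <(...): '$subst_pid'"
```

The first echo shows an empty value, the second shows the PID. That is why the while form does all of its work inside the loop body.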

By now, you should have an idea of how read can be used.

2   Counterexample

Is read always great? Nope. Consider the following code:

echo "$first_name $middle_name $last_name" >> names.txt

Suppose you have a script that writes names to names.txt from whatever the source is, and assume that no name contains spaces. You might process the file with another script like this:

while read first_name middle_name last_name; do
  do_something
done <names.txt

If one line doesn’t have a middle name:

Mark  Twain

(two spaces), then $middle_name will actually hold the last name and $last_name will be empty, because read treats a run of spaces as a single separator. With cut, you don’t have this problem.
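Here is a small sketch of the difference. The last part is only an assumption on my side: if the writing script could use a non-whitespace separator such as | (a hypothetical change, not something the scripts above do), read would keep the empty field.

```bash
#!/usr/bin/env bash
entry='Mark  Twain'   # two spaces: the middle name is empty

# read with the default IFS treats the run of spaces as one separator.
read -r first_name middle_name last_name <<<"$entry"
echo "read: first='$first_name' middle='$middle_name' last='$last_name'"

# cut keeps the empty field between the two spaces.
cut_middle=$(printf '%s\n' "$entry" | cut -d ' ' -f 2)
cut_last=$(printf '%s\n' "$entry" | cut -d ' ' -f 3)
echo "cut:  middle='$cut_middle' last='$cut_last'"

# Assumption: the writer script uses '|' instead of a space.
# A non-whitespace IFS character is not collapsed, so the empty field survives.
IFS='|' read -r p_first p_middle p_last <<<'Mark||Twain'
echo "pipe: first='$p_first' middle='$p_middle' last='$p_last'"
```

With read and spaces, Twain lands in middle_name. With cut, or with read and a | separator, the middle name stays empty and the last name is where it belongs.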

3   Conclusion

read is a builtin command; cut is an external command. Even if you don’t know the difference, you should be able to guess which is faster. A quick test:

% line="abc def"
% time for i in {1..100}; do read foo _ <<<"$line"; done

real    0m0.006s
user    0m0.003s
sys     0m0.003s

% time for i in {1..100}; do foo=$(echo "$line" | cut -d ' ' -f 1); done

real    0m0.353s
user    0m0.046s
sys     0m0.213s

read is much faster, and that’s expected: the cut version has to create a subshell and start an external process on every iteration.

For fixed formats, cut is a good choice, but its main drawbacks are that you need to count field numbers and that you can only assign its output to one variable at a time, even though cut supports extracting multiple fields, for example with -f 1-3,7.

However, there is a Bash trick using an array:

foo=($(echo "a  c" | cut -d ' ' -f 1,3))

But this is subject to the same problem as explained in the counterexample above: an empty field disappears when the output is split into words.
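A minimal sketch of how the array trick loses a field. It uses -f 1-3 rather than -f 1,3 so that the empty second field is part of the selection.

```bash
#!/usr/bin/env bash
# Fields 1-3 of "a  c" are "a", "" and "c"; cut prints them as "a  c".
# The unquoted $(...) is then split on whitespace, which drops the empty field.
foo=($(echo "a  c" | cut -d ' ' -f 1-3))
echo "count=${#foo[@]} second='${foo[1]}'"   # prints: count=2 second='c'
```

Three fields were requested, but the array only has two elements, so ${foo[1]} is c and not the empty second field.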

Judging from the scripts I have read online, life could be easier with read. And many times grep and sed together will be awesome, or just sed:

xwininfo -root | sed '/\(Width\|Height\)/ {s/[^0-9]//gi;p} ; d'

sed is an awesome tool, but that’s another story.