The context
It is well known that `cat` can be uselessly used and that other forms are preferable. For example:

`cat file.txt | grep word`

which can be better written like this:

`grep word file.txt`
This is not without its myths, though. Iván Zenteno (thanks, Iván!) argued in a recent Linuxeros Zapopan conversation that the reason to avoid it is memory consumption. Quoting:
(Note: all quotations are rough English translations from their original messages in Spanish)
`cat | grep`? NOOOO. Simply use `grep regex filename`. Don't overload memory with `cat`. If your file is 100 MB long, you are doing `cat` to RAM, and by passing it via standard input to `grep` there's another 100 MB. Then you are using 200 MB when you could be using 100 MB only.
I challenged that assertion:
[It] should be verified. It seems that a `cat` implementation that reads all the content before throwing it out would be a waste of resources; therefore I don't think it works like that, and so the argument would be false. I honestly think it is not the case. I will try to do some tests.
By challenging it, it may have seemed like I was challenging the whole Useless Use of Cat (UUOC) case. Just to make it clear, I was not: I was just challenging the RAM aspect as an argument against UUOC.
I should have even gone further: not even `grep` will use that amount of memory (for a simple `grep`), because it's just not needed.
Alex Callejas added a personal experience (thanks, Alex!):
If the search is for a string, wouldn't it be better to use `grep`? In [non-current government entity] we had a script that first did `cat` on the whole file to look for CURP* strings. The problem was when the file grew to more than 1 TB in size. It took ages to run to find newly added CURPs. We implemented the search using `grep`, kept monthly historical files, and lowered resource consumption.
* For context, CURP is a unique identifier string for each Mexican. It follows a specific format.
It made sense. It seems like everyone agrees that cat can frequently be uselessly used. Me included.
Later, in a clarification I said:
I think pure grep is faster. I think it is because of its buffering and avoiding additional pipe processing, but the RAM consumption is a myth.
I offered to do some testing so we all could learn together. After all, this is one of those simple beliefs that can quickly be demystified and provide a great challenge to our personal understanding of underlying technology. Little did I know that it would throw back some interesting results!
The tests
The tests were run on a computer with 8 GiB of RAM (more than 4 GiB free for use by applications) and a first-generation i3 2-core CPU with Hyper-Threading enabled (a total of 4 virtual cores). The disk drive is an SSD interfaced through a SATA II port. It runs Debian Sid with Linux kernel version 5.10.
I created three text files with sizes of 166.2 MiB, 3.57 GiB and 33.92 GiB. Why those sizes? The first one is an original file with long rows that include path names and sizes; I cannot publish it because it contains personal data. The other two were created by concatenating the first one multiple times, up to sizes that seemed reasonable to produce sufficiently distinct cases.
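In case you want to build something similar, the larger files can be produced roughly like this (a sketch; the file names and repetition counts here are just illustrative):

```bash
# Build bigger test files by repeatedly concatenating the small one.
# File names and repetition counts are illustrative only.
for i in $(seq 22); do cat data-small.txt; done > data-medium.txt
for i in $(seq 209); do cat data-small.txt; done > data-large.txt
```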
I tested for performance, using Bash's `time` built-in, the following tools:

1. `grep x $FILE | wc -l`. I filtered the results through `wc -l` to keep the terminal output out of the measurement, since printing the matches would have contaminated the result. `wc -l` seemed like a good choice: it is a process much simpler and faster than `grep`, so it contaminates the results as little as possible. Later I got the suggestion to use `grep -c` to avoid the extra pipe, and that became test #3.
2. `grep x $FILE > /dev/zero`. Why `/dev/zero`? Because when using `/dev/null`, `grep` skipped all processing. My take is that `grep` detects that output redirected to `/dev/null` would just be a waste of resources and decides to avoid doing any work. (There is a quick way to check this right after this list.)
3. `grep -c x $FILE`. The same as test #1 but avoiding the extra pipeline. This is the purest possible form of the test. Thanks, Iván Zenteno!
4. `wc -l $FILE`. What if `grep` had some buffering that other utilities did not? This is so we try to replicate the results in tools other than `grep`.
5. `gawk '{l++}END{print l}' $FILE`. Same question as above, with yet another tool. This `gawk` script counts the lines by itself and prints a single number, just like `grep -c` does (though it counts all lines, not only matching ones).
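The `/dev/null` behaviour is easy to check on your own machine, assuming GNU grep and Bash (the file name here is illustrative):

```bash
# Same search, output thrown away in two different ways.
# With /dev/null, GNU grep finished almost instantly in my tests;
# with /dev/zero it performs the full scan.
time grep x data-medium.txt > /dev/null
time grep x data-medium.txt > /dev/zero
```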
For each tool I tested three input techniques:
- Direct file specification (`tool $FILE`)
- Input redirection (`tool < $FILE`)
- Standard input from `cat` (`cat $FILE | tool`).
I tested each tool + input combination for each of the three file sizes. For each tool + input + size combination, the test was run three times and the lowest value taken. This avoids disk I/O variability by making sure that as much of the file as possible is already in the disk cache. Why not drop the caches instead? Because disk I/O is much slower, so the results would mostly reflect disk read time instead of pure `cat` time. Important: only the first two file sizes fully fit in RAM, but not the third one! This makes the third file a great way to see whether disk I/O performs differently with `cat` than with other tools.
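The measurement loop looked roughly like this (a sketch rather than the exact script; file names and the search pattern are illustrative):

```bash
#!/bin/bash
# Rough sketch of the measurement loop: each form is run three times,
# and the lowest wall-clock time of the three is the reported value.
for FILE in data-small.txt data-medium.txt data-large.txt; do
    for run in 1 2 3; do
        echo "== $FILE, run $run =="
        time grep -c x "$FILE"          # direct file specification
        time grep -c x < "$FILE"        # input redirection
        time cat "$FILE" | grep -c x    # standard input from cat
    done
done
```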
After each test that involved `cat`, I repeated the test using `/usr/bin/time -v` (not Bash's `time` built-in) on `cat` alone, to get its maximum resident set size.
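For example, for the `grep -c` case this looks roughly as follows; GNU time reports the peak memory in its "Maximum resident set size" line, which goes to standard error and therefore does not disturb the pipe:

```bash
# Measure cat's peak memory while the rest of the pipeline runs as usual.
# The statistics (including "Maximum resident set size (kbytes)") are
# printed on stderr by GNU time.
/usr/bin/time -v cat data-medium.txt | grep -c x
```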
This is a total of 63 reported results taken from 189 measured data points. With all these tests we get a real performance comparison, but also what I cared about the most: whether `cat`'s RAM consumption was actually an issue.
The results
| Measurement | Units | Small | Medium | Large |
|---|---|---|---|---|
| File size | MiB | 166.2 | 3,656.9 | 34,740.9 |
| `grep x $FILE \| wc -l` | s | 0.2 | 6.1 | 143.5 |
| `grep x < $FILE \| wc -l` | s | 0.3 | 8.2 | 153.5 |
| `cat $FILE \| grep x \| wc -l` | s | 0.4 | 9.7 | 137.9 |
| Maximum resident set size for `cat` | kB | 1,624 | 1,672 | 1,784 |
| `grep x $FILE > /dev/null` | s | 0.0 | 0.0 | 0.0 |
| `grep x $FILE > /dev/zero` | s | 0.3 | 6.2 | 144.6 |
| `grep x < $FILE > /dev/zero` | s | 0.3 | 6.1 | 143.8 |
| `cat $FILE \| grep x > /dev/zero` | s | 0.3 | 7.3 | 134.0 |
| Maximum resident set size for `cat` | kB | 1,636 | 1,628 | 1,628 |
| `grep -c x $FILE` | s | 0.1 | 2.3 | 140.1 |
| `grep -c x < $FILE` | s | 0.1 | 2.3 | 140.5 |
| `cat $FILE \| grep -c x` | s | 0.1 | 3.1 | 135.7 |
| Maximum resident set size for `cat` | kB | 1,788 | 1,784 | 1,672 |
| `wc -l $FILE` | s | 0.1 | 1.9 | 133.5 |
| `wc -l < $FILE` | s | 0.1 | 1.5 | 133.6 |
| `cat $FILE \| wc -l` | s | 0.1 | 3.0 | 133.4 |
| Maximum resident set size for `cat` | kB | 1,784 | 1,660 | 1,624 |
| `gawk '{l++}END{print l}' $FILE` | s | 0.4 | 8.0 | 139.1 |
| `gawk '{l++}END{print l}' < $FILE` | s | 0.4 | 7.7 | 138.6 |
| `cat $FILE \| gawk '{l++}END{print l}'` | s | 0.6 | 13.1 | 148.0 |
| Maximum resident set size for `cat` | kB | 1,624 | 1,624 | 1,788 |
In the table I highlighted the best and worst results. If two results are within 1% of each other I highlighted both, as I consider them the same result. If all three results are within 1% of each other I did not highlight any of them (all three are the best and the worst at the same time).
The conclusions
- I was quite surprised by `grep > /dev/null`. I certainly didn't know about that.
- The idea that `cat` will double RAM usage is a myth.
- Surprisingly, `cat | grep` performed better than pure `grep` in all of the tests for the large file! I am bewildered by this result!
- `cat | wc -l` did not make any difference for the large file, but it did for the medium file. This is another head-scratcher.
- `cat | gawk` was consistent with the expectation.
- In some cases, even input redirection worked better than directly specifying the file!
With all this, my general conclusions are:
- The real waste for simple cases is just an extra process and a pipe.
- For large input or output files, disk I/O will be your bottleneck. Cutting out `cat` will do next to nothing to improve performance.
- For medium-sized input files, redirection sometimes worked up to 2× better! Always test if performance is critical to you; it all depends on how well the tool receiving the input is optimized for each input scenario.
- The extra process and pipe overhead will become apparent when repeating the process hundreds of times for small files. Keep it in mind! (There is a quick way to feel this overhead right after this list.)
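A crude way to feel that overhead is to run a trivial pipeline many times with and without `cat`; the work per iteration is tiny, so the extra process and pipe dominate (file name and iteration count are illustrative):

```bash
# Per-invocation overhead: the same trivial job, repeated many times,
# with and without an extra cat process and pipe in the middle.
time for i in $(seq 1000); do wc -l small.txt > /dev/null; done
time for i in $(seq 1000); do cat small.txt | wc -l > /dev/null; done
```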
There is no need to be religious about it or to bully anyone over it. Really: don't worry at all unless there is a case for it. By the way, that last sentence is pretty much the definition of a micro-optimization.
Fixing UUOC will be, in most cases, just a micro-optimization. Always keep it in mind, though, and if you find a case for it, fix it! Some extreme cases may be severely impacted by UUOC. But otherwise, if the case is simple and you feel your code is more legible by uselessly using `cat`, use it and don't feel bad about it at all.
This does not excuse you from learning about this resource waste and its implications. It is your responsibility to know what you are doing and to always keep UUOC in mind while coding to avoid performance impacts!
The cliffhanger ending
These are the results for my scenario. Who knows about other operating systems, other kernel versions, other computer configurations, other row lengths and text contents… Can you replicate my results?
Also, this is only true when the file can be processed sequentially; that is, when the whole file does not need to be read before processing can start. There are tools that might need to load the whole file into RAM, like `sort` and `sponge`. I would expect that if `sort` works on a file backed by disk (not fed through standard input, either with a pipe or a redirection) it might not need as much RAM, but the algorithm would need to be a bit more complicated. If it's fed from standard input, though, there is no way to save RAM. Or is there?
So I made a quick test for the small and medium files only:
| Measurement | Units | Small | Medium |
|---|---|---|---|
| File size | MiB | 166.2 | 3,656.9 |
| Time for `sort $FILE \| wc -l` | s | 10.8 | 578.3 |
| Time for `sort < $FILE \| wc -l` | s | 10.6 | 617.9 |
| Time for `cat $FILE \| sort \| wc -l` | s | 21.5 | 1,015.9 |
| Max RSS for `sort $FILE \| wc -l` | kB | 235,896 | 4,939,544 |
| Max RSS for `sort < $FILE \| wc -l` | kB | 235,700 | 5,139,724 |
| Max RSS for `cat $FILE \| sort \| wc -l` | kB | 14,400 | 15,172 |
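If you want to poke at those numbers yourself, the same `/usr/bin/time -v` trick works; something along these lines, with an illustrative file name (look at the "Maximum resident set size" line of each run):

```bash
# Peak memory of sort for each input style.
/usr/bin/time -v sort data-medium.txt | wc -l
/usr/bin/time -v sort < data-medium.txt | wc -l
cat data-medium.txt | /usr/bin/time -v sort | wc -l
```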
Can you explain the difference in the results? Can you estimate what would happen with files larger than the available RAM?
This is a discussion for another day.