Command-line Tools can be 235x Faster than your Hadoop Cluster
"This find | xargs mawk | mawk pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation."
#complexity #ShellTools #RightToolForTheRightJob #Hadoop #computing