Why Every Data Scientist Should Know Command Line Tools
The UNIX command line is well suited to basic data processing tasks because its tools start quickly and process input as a stream. If you have a file with millions of rows, the usual approach in a higher-level language is to read the entire file into memory before operating on it, which can take an unacceptably long time or exhaust the available memory. Many command line tools, such as grep, sed, awk, and wc, instead read one line at a time, so you can work through an entire file without ever holding all of it in memory and without worrying about a simple task taking hours.
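As a rough illustration of this streaming style, the sketch below builds a tiny sample file and then counts its data rows and sums one column using tail, wc, and awk. The file contents, column names, and variable names are made up for the example; it is a minimal sketch under those assumptions, not a benchmark.

```shell
# Create a small sample file (hypothetical data) so the example is self-contained.
data_file="$(mktemp)"
printf 'id,amount\n1,10\n2,20\n3,30\n' > "$data_file"

# Count data rows, skipping the header line.
# tail and wc pass the data through as a stream; tr strips padding some wc versions add.
row_count=$(tail -n +2 "$data_file" | wc -l | tr -d ' ')

# Sum the second column. awk keeps only the current line and a running total.
total=$(awk -F, 'NR > 1 { sum += $2 } END { print sum }' "$data_file")

echo "rows=$row_count total=$total"   # prints: rows=3 total=60

# Clean up the temporary file.
rm -f "$data_file"
```

On a file with millions of rows the same commands would still make a single pass over the input, so memory use stays roughly constant no matter how large the file is.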