Command Line uses for Data Science

The command line for data science is useful for:

  • Quickly checking large CSV files
  • Checking data on a remote server, such as one in Google Cloud
  • Merging CSV files quickly
  • Replacing tabs with commas or similar formatting

Display CSV file in terminal

less -S file.csv
column -t -s',' file.csv | less -S
head -n 100 file.csv | column -t -s',' | less -S  # For large files: format only the first 100 lines

Count lines of file

wc -l file
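
Because `wc -l` also counts the header line, a data-row count needs the first line skipped. A minimal sketch, assuming a hypothetical `sample.csv` with exactly one header line and no line breaks inside quoted fields:

```shell
# Create a small hypothetical CSV: 1 header line + 3 data rows
printf 'id,animal\n1,dog\n2,cat\n3,dog\n' > sample.csv

# Total lines, header included (reading from stdin keeps the filename out of the output)
total_lines=$(wc -l < sample.csv)

# Data rows only: skip the first line before counting
data_rows=$(tail -n +2 sample.csv | wc -l)

echo "total: $total_lines, data rows: $data_rows"
```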

Search in CSV file

less -S file.csv  #  Type "/" and search for word

Display second column of CSV file

cut -d',' -f2 file.csv
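
A common follow-up is counting how often each value appears in that column. The sketch below assumes a hypothetical `animals.csv` whose fields contain no quoted commas, since `cut` splits on every comma it sees:

```shell
# Hypothetical sample data
printf 'id,animal\n1,dog\n2,cat\n3,dog\n' > animals.csv

# Second column, header dropped, then count occurrences of each value
counts=$(cut -d',' -f2 animals.csv | tail -n +2 | sort | uniq -c | sort -rn)
echo "$counts"

# Most frequent value in the column
top_value=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "most frequent: $top_value"
```

`uniq -c` only collapses adjacent duplicates, which is why the values are sorted first.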

Filter lines containing the word 'dog'

grep 'dog' file.csv | less -S
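
Filtering with `grep` alone drops the header unless it happens to match. One possible way to keep it, sketched with hypothetical file names `pets.csv` and `dogs.csv`:

```shell
# Hypothetical sample data
printf 'id,animal\n1,dog\n2,cat\n3,dog\n' > pets.csv

# Header first, then only the data rows that contain 'dog'
{ head -n 1 pets.csv; tail -n +2 pets.csv | grep 'dog'; } > dogs.csv

cat dogs.csv
```

The result is still a valid CSV with the same columns, so it can be fed to the other commands in this list.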

Merge 2 CSV files

cat file_1.csv > merged.csv
tail -n+2 file_2.csv >> merged.csv  # Append without header
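
Appending only makes sense when both files share the same columns. A hedged sketch that compares the header lines first; the file names `part_a.csv`, `part_b.csv` and `combined.csv` are made up for illustration:

```shell
# Hypothetical sample data with identical headers
printf 'id,animal\n1,dog\n' > part_a.csv
printf 'id,animal\n2,cat\n' > part_b.csv

# Merge only if the first lines are identical
if [ "$(head -n 1 part_a.csv)" = "$(head -n 1 part_b.csv)" ]; then
    cat part_a.csv > combined.csv
    tail -n +2 part_b.csv >> combined.csv  # Append without header
else
    echo "Headers differ; not merging" >&2
fi

cat combined.csv
```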

Merge many CSV files

for i in {1..3}; do cp csv/GBvideos.csv "file_$i.csv"; done  # Create 3 sample files
head -n 1 file_1.csv > merged.csv  # Header
find . -name 'file_*.csv' | xargs -n 1 tail -n+2 >> merged.csv  # One tail per file, so no "==> file <==" banners

Merge many CSV files - Alternative way

head -n 1 file_1.csv > merged.csv  # Header
for file in file_*.csv; do
    tail -n+2 "$file" >> merged.csv  # Rows without header
done
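
The same merge can also be sketched as a single `awk` pass. This is an alternative technique, not the approach above; the names `chunk_*.csv` and `all_chunks.csv` are hypothetical, and the output name is chosen so that it does not match the input glob:

```shell
# Hypothetical sample files sharing one header
printf 'id,animal\n1,dog\n' > chunk_1.csv
printf 'id,animal\n2,cat\n' > chunk_2.csv
printf 'id,animal\n3,emu\n' > chunk_3.csv

# FNR is the line number within the current file, NR the line number overall.
# Skip the first line of every file except the very first one.
awk 'FNR == 1 && NR != 1 { next } { print }' chunk_*.csv > all_chunks.csv

cat all_chunks.csv
```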

Substitute tabs with commas

sed -E 's/\t/,/g' file  # \t is recognized by GNU sed
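
For a plain one-character swap, `tr` is a possible alternative that does not depend on how a given `sed` treats `\t`. A minimal sketch with a hypothetical `sample.tsv`, assuming no field already contains a comma:

```shell
# Hypothetical tab-separated sample
printf 'id\tanimal\n1\tdog\n' > sample.tsv

# Translate every tab into a comma
tr '\t' ',' < sample.tsv > sample_from_tsv.csv

cat sample_from_tsv.csv
```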

Substitute commas with pipes outside quoted text

sed -Ee :1 -e 's/^(([^",]|"[^"]*")*),/\1|/;t1' file.csv
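
The command works as a loop: each pass replaces the first comma that sits outside double quotes, and `t1` jumps back to the label `1` until no substitution succeeds. A small worked example, using a hypothetical `quoted.csv` and assuming quotes are balanced within each line:

```shell
# Hypothetical row with a comma inside a quoted field
printf '1,"Smith, John",London\n' > quoted.csv

# Same command as above, applied to the sample row
converted=$(sed -Ee :1 -e 's/^(([^",]|"[^"]*")*),/\1|/;t1' quoted.csv)

echo "$converted"
```

The comma inside `"Smith, John"` is left alone because the quoted string is consumed as a single unit by the `"[^"]*"` branch.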

Remove comma in formatted price

echo '£1,245.20' | sed -E 's/(£[0-9]+),([0-9]{3}\.[0-9]{2})/\1\2/'
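
Prices with more than one thousands separator need the substitution repeated. One way to sketch that is a `sed` loop with a label; the sample price and the label name `a` are assumptions for illustration:

```shell
# Hypothetical price with two thousands separators
price='£1,245,678.20'

# Remove one separator per pass; "ta" loops back to label "a" while a substitution succeeds
clean=$(echo "$price" | sed -E -e :a -e 's/(£[0-9]+),([0-9]{3})/\1\2/' -e ta)

echo "$clean"
```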