Skip to content

Instantly share code, notes, and snippets.

@pax
Last active March 7, 2023 16:50
Show Gist options
  • Save pax/322e642662f56334317767ee68ba611f to your computer and use it in GitHub Desktop.
Save pax/322e642662f56334317767ee68ba611f to your computer and use it in GitHub Desktop.
Tools & cli snippets for handling large csv files

tags: cheatsheet, csv, json, dev

  • Desktop apps (largeish files)
  • CLI Tools
  • CLI Snippets

Desktop apps - large files

Tad viewer [Win/Linux/OSX]

  • View & analyze data, no exporting.
  • Worked (slowly) with 900M / 790k rows file
  • doesn’t do custom delimiters
  • Seems faster than Tad
  • A simple CSV editor for OS X
  • works on smaller files with comma (,) as delimiter - tested on a 50M file - doesn’t work on a 900M file
  • A desktop CSV editor for data publishers [Win/Linux/OSX]
  • crashed on a 50M file

Open Refine [Win/Linux/OSX]

  • limit of 30K rows

CSV EASY [Win]

DB Browser for SQLite [Win/Linux/OSX]


CLI Tools

Tools for generating CSV and other flat versions of the structured data

  • python CLI tools
  • A suite of utilities for converting to and working with CSV, the king of tabular file formats

csv2xlsx cli

  1. DerLinkshaender/csv2xlsx Finally: a simple, single file executable, no runtime libs command line tool to convert a CSV file to XLSX

  2. mentax/csv2xlsx Convert CSV data to xlsx - especially the big ones.

batch convert xlsx to csv

Convert xlsx to csv in Linux with command line

for i in *.xlsx; do /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to csv "$i" ; done

json

  • jq – lightweight and flexible command-line JSON processor.
  • zed - Zed offers a new approach to data that makes it easier to manipulate and manage your data. With Zed's new super-structured data model, messy JSON data can easily be given the fully-typed precision of relational tables without giving up JSON's uncanny ability to represent eclectic data.
  • jello - CLI tool to filter JSON and JSON Lines data with Python syntax. (Similar to jq)

Misc / Unfilled

  • sqlite-utils - CLI tool and Python utility functions for manipulating SQLite databases
  • columnq-cli - Simple CLI to help you query tabular data with support for a rich set of growing formats and data sources.
  • htmlq - Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
  • hq - jq, but for HTML
  • textql - Execute SQL against structured text like CSV or TSV
  • q - Run SQL directly on CSV or TSV files
  • simplql - Query csv, xls and json with SQL / Simplql is a private and fast in-browser tool that lets you query your data files with your favourite query language - SQL, without using a database.
  • Miller Like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON - johnkerl/miller
  • HN collection

CLI snippets

gets first line from file1.txt then concatenatest all the files in all.txt

cat *.csv | cut -d, -f1,2 --complement | uniq -u > output.txt

Concatenate all files in folder

cat $(ls -t) > outputfile
ls -tQ | xargs cat

Remove header from all files who's names begin with "file”:

awk 'FNR > 1' file*.csv > newfile.csv

In-file replacements

Replace quotes with ‘qq’ from file.csv, output to newfile.csv

sed -n 's/\"/qq/gpw newfile’ file.csv

Replace all double quotes with ‘qq’ in file.txt

sed -i '' 's/\"/qq/g' file.txt

Replace all double quotes with single quotes tr is only used for one character replace

tr '"' "'"

Replace header (first line) with 'tralala'

sed -i.bak "1 s/^.*$/tralala/" file.csv

Escape double quotes in all files in directory

for file in *
 do
  sed -i '' 's/\"/\\"/g' "$file"
done

Cut parts of a document

awk 'NR >= 57890000 && NR <= 57890010' /path/to/file

Split big csv file in smaller ones

Split ‘file.txt’ in files with 750k rows

xfile=‘file.csv’; tail -n +2 $xfile | split -l 750000 - split_; for file in split_*; do head -n 1 $xfile > tmp_file; cat $file >> tmp_file; mv -f tmp_file $file; done;

Misc Reference

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment