Skip to content

Instantly share code, notes, and snippets.

@naphat-k
Last active April 29, 2022 15:30
Show Gist options
  • Save naphat-k/851e334205dcf2b3e3ac265488c21400 to your computer and use it in GitHub Desktop.
Save naphat-k/851e334205dcf2b3e3ac265488c21400 to your computer and use it in GitHub Desktop.
Advanced CSV file handing using Sublime Text

Advanced CSV file handing using Sublime Text

Intro

  • Business users occasionally needs to manipulate large data set that usually comes in the from of text files like CSV or TSV, and hit the limit of the spreadsheet tools like Excel.
  • Probably two of the most common issues are (1) slightly broken CSV file, and (2) a really large file that can freeze Excel upon opening.
  • Most people think they needs programming languages like Python, but usually the result can be achieve using simpler too like text editor or command line interface (like Sublime Text that I'll be introducing here). In addition, for people who aren't really used to programming languages, memorizing syntax and executing them without errors can be quite challenging.
  • Sublime Text is the tool that I'll use to introduce this concept. It's free to evaluate and available cross-platform

Basics

Note that I'll be using Mac keyboard shortcuts, but if you're using Windows simply replace Cmd with Ctrl, and the rest should be the same.

Sublime Text has a lot of built-in features but let's check out two essential ones for now

  1. Multiple cursor

    • Highlight the text, in this case I'm highlighting the entire page with Cmd + A, and then press Cmd + Shift + L
    • You should see that there's cursor at the end of every line with text remaining highlighted, press left or right arrow key will bring your cursor to the end or the beginning if each line where you can do tasks like adding a - to make bullet points, or adding comma at the end of each line.
    • You can arbitrarily add cursors at any point in the text by holding a Cmd key while clicking too. img1
  2. Command Palette

    • The command palette is basically a handy tool for typing in any command in Sublime Text instead of going through menus or memorizing the short cut. To bring it up use the short cut Cmd + Shift + P
    • Try installing a package by first bringing up the package install by typing Package Control: Install Package (just some part of the text will do).
    • Try installing rainbow_csv package

Now let's proceed to the main parts:

Search the text with Regular Expression

  • Let's try open a CSV file. In this demo I'll use New York City Covid Data)
  • When you opening the file, the bottom right hand corner indicates the file type, in our example it reads as plain text. Let's try changing the syntax to CSV (Rainbow) by bringing up the command Set Syntax: Rainbow (CSV) from the command palette (again using Cmd + Shift + P ). img3
  • The CSV just becomes colorful! you can type Align CSV columns with spaces into the command palette to make it easier to read. img4
  • Let's undo it back for now because it fills our original text file with spaces which we don't really want for our next demo.
  • Let's try searching part of the text using regular expression, or 'regex' for short. It's basically a script that allows you to find the pattern in the text which can help us perform beyond exact match text search.
  • Press Cmd + F to find text and make sure that the .* button at the bottom left is clicked
  • Say we want to highlight all the column with borough name, maybe to delete them. Try typing ,\".*\", in the search field.
  • The ,\" and \", at the beginning and the end are the pattern that's unique to the borough column .* refers to any character of any length.
  • Notice that with this search, it's incorrectly highlight more part of the text than we want in the first line. We need to adjust our regex to address this case. A guide to regex is kinda a separate guide in itself, so I'll ask you to Google for resources to learn them (or start with this). Learning basic regex is quite easy and rewarding. It also pays off in terms of amount of time saved from manually selecting the text.
  • For the search string that works for this particular example, try ,\"[A-z\s]+\",?. img5

Filtering the data with queries

  • Another nifty feature of Rainbow CSV is an ability to write queries to our text file. It uses RBQL, which is kinda like half SQL half Python. it's not that hard to learn though.
  • Open the command palette again look for Rainbow CSV: Run RBQL query
  • The column will be referred by "a1, a2, a3, ..." according to the popup that Sublime Text shows
  • In this example, I'll try selecting the first three columns, starting from the second row onward, where the string of the second column is Manhattan, and the value of the third column is greater than 3. I typed select a1, a2, a3 where NR>1 and a2=="Manhattan" and float(a3) > 3, hit enter and got the result in a separate tab.
  • The query can seems intimidating at first but it's definitely faster than manipulate using Pandas, and if you're already familiar with SQL you already got a head start. Here's the link to Rainbow CSV documentation which has a list and explanation of the RBQL syntax. img6 img7

What's next

  • In my experience, just getting used to Sublime Text basic features and two use cases: regex and RBQL query, can probably address a lot of business user's needs. Since they tend to be seasoned Excel or Tableau users, it's actually faster for them to take it back there and manipulate with those tools.
  • Sublime Text has a lot more cool features and packages that, to be honest it's more useful if you code. There's also additional packages worth mentioning
    • DataConverter: convert CSV into different table format. One of my favorite is to convert into text representation of table like this (using our CSV file we just made the query) img8
    • Text Pastry: some additional tools like, if I place the cursor at the front of every row, I can add a running number in every row img9
    • FileDiffs: compare and highlight (in red and green) the differences between two text files.
    • Pretty JSON: If you need to work with JSON you'll likely encounter JSON that is slightly broken. This tool helps at checking if it's a valid JSON, and arrange it in a way that is easier to read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment