@anxiousmodernman
Last active February 15, 2018 01:20
Notes on a dataframe environment

What's a dataframe?

A dataframe is a two-dimensional data structure, conceptually a "list of lists", in which each column can hold a different type of value. Dataframes are similar to a spreadsheet or a SQL table. The word "dataframe" was popularized by pandas, the widely used Python library, and R, a programming language devoted to data analysis.

# A dataframe where column 0 is a string and column 1 is a float
[["hello", 10.2],
 ["world", 0.1]]

What makes dataframes special is that they provide an API for programmers to do things with the contents.

Dataframes let you

  • query for contents
  • slice or subset rows
  • perform common statistics tasks
  • easily serialize to CSV
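
The four operations above can be sketched with nothing but the Python standard library, treating a dataframe as the plain "list of lists" described earlier. This is only an illustration: the helper names `query`, `column`, and `to_csv`, the column labels, and the third row of data are assumptions made up for this sketch, not the API of pandas or R.

```python
import csv
import io
import statistics

# A dataframe as a plain list of rows; column 0 is a string, column 1 is a float.
# The third row is invented here so the examples have something to filter.
frame = [
    ["hello", 10.2],
    ["world", 0.1],
    ["again", 4.7],
]

def query(rows, predicate):
    """Return the rows for which predicate(row) is true (hypothetical helper)."""
    return [row for row in rows if predicate(row)]

def column(rows, index):
    """Return one column as a list of values (hypothetical helper)."""
    return [row[index] for row in rows]

def to_csv(rows, header):
    """Serialize the rows to a CSV string with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

large = query(frame, lambda row: row[1] > 1.0)  # query for contents
first_two = frame[:2]                           # slice or subset rows
mean_value = statistics.mean(column(frame, 1))  # a common statistics task
csv_text = to_csv(frame, ["word", "value"])     # serialize to CSV
```

A real dataframe library does the same kinds of things with named columns and much better performance; the point here is only what each bullet means in practice.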

Interpreted languages

Dataframes really shine in interpreted languages with first-class support for interactive programming. Unlike other kinds of programming, data analysis involves a lot of trial and error. Why? Because

  • data sets are messy, so usually one needs to poke around first
  • many analyses are dead ends; a lot of stuff doesn't pan out, so there is less demand for a production-level development workflow

Having a live interpreter also lets one

  • print and visualize data quickly
  • dynamically add and store dataframes (and other objects) in the environment
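
As a rough sketch of what "storing dataframes in the environment" could look like, the snippet below models the environment as a plain dictionary from names to dataframes. The names `environment` and `show` are hypothetical, chosen for this illustration; in a real interpreter session the environment is simply the set of variables you have defined so far.

```python
# A hypothetical "environment": a dict mapping names to dataframes (lists of rows).
environment = {}

# Add a dataframe under a name, as you would by assigning a variable at a prompt.
environment["raw"] = [["hello", 10.2], ["world", 0.1]]

# Derive a new dataframe from an existing one and store it alongside the original.
environment["small"] = [row for row in environment["raw"] if row[1] < 1.0]

def show(rows):
    """Return the rows as tab-separated lines, for a quick look at the data."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)

print(show(environment["small"]))
```

Because nothing is thrown away between steps, a dead-end analysis costs little: the earlier dataframes are still sitting in the environment to try something else with.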

Dataframe GUIs
