## Beginner's Guide to `pd.read_clipboard`
[`read_clipboard`](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#id45) is truly a saving grace for anyone starting out to answer questions in the [tag:pandas] tag. Unfortunately, pandas veterans also know that the data provided in questions isn't always easy to load into a terminal because of complications such as MultiIndexes, spaces in header names, datetimes, and Python objects.
Thankfully, `read_clipboard` has arguments that make handling most of these cases possible (and easy). The purpose of this answer is to document some of those cases in finer detail.
---
### Spaces in column headers
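The default separator is `\s+`, so a header such as `first name` gets split into two columns. One common workaround is to split on two or more whitespace characters instead. The sketch below assumes the copied data separates columns by at least two spaces. The sample text and column names are made up, and `pd.read_csv` on a `StringIO` stands in for `pd.read_clipboard` so that it runs without a clipboard.

```python
import io

import pandas as pd

# Hypothetical sample standing in for text copied from a question.
# Columns are separated by 2+ spaces; names and values contain single spaces.
text = """\
first name  last name  age
John        Smith      31
Mary Ann    Jones      27
"""

# A regex separator requires the python engine.
# With real clipboard contents the equivalent call would be:
#   pd.read_clipboard(sep=r"\s{2,}", engine="python")
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")

print(df.columns.tolist())
```

This breaks down if any two columns are separated by a single space only; in that case the data has to be edited by hand first.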
---
### Read a Series instead of a DataFrame
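A copied Series usually has no header row, just an index and a column of values. A minimal sketch, assuming that layout: read it with `header=None` and `index_col=0`, then squeeze the one-column DataFrame into a Series. The sample data is made up, and `pd.read_csv` on a `StringIO` stands in for `pd.read_clipboard`, which accepts the same keyword arguments.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a copied Series (index, then values).
text = """\
a    1
b    2
c    3
"""

# With real clipboard contents the equivalent call would be:
#   pd.read_clipboard(header=None, index_col=0).squeeze("columns")
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, index_col=0)

# Collapse the single remaining column into a Series.
s = df.squeeze("columns")

print(type(s).__name__)
```

`DataFrame.squeeze("columns")` is used here rather than the `squeeze` keyword because that keyword is no longer available in pandas 2.0 and later.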
---
### Python objects
- Numeric data: simpler
- String data: may need `yaml`
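For the numeric case, one possible approach is to load the column as strings and then convert each cell with `ast.literal_eval`. The sketch below assumes the list column is separated from the others by at least two spaces, so the spaces inside each list do not split it. The sample data is made up, and `pd.read_csv` on a `StringIO` stands in for `pd.read_clipboard`.

```python
import ast
import io

import pandas as pd

# Hypothetical sample: a column of lists, as often pasted into questions.
text = """\
id  values
1   [1, 2, 3]
2   [4, 5]
"""

# Split on 2+ spaces so "[1, 2, 3]" stays in one field.
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")

# At this point "values" holds plain strings; turn them into real lists.
df["values"] = df["values"].apply(ast.literal_eval)

print(df["values"].tolist())
```

Lists of unquoted strings such as `[a, b]` are not valid Python literals, so `ast.literal_eval` fails on them. That is where a YAML parser (for example `yaml.safe_load` from PyYAML) can help.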
---
### Other considerations
`read_clipboard` uses `read_csv` under the hood, so many of the principles for loading data from CSV apply here, such as:
- parsing datetimes (use `parse_dates`)
- no headers (use `header=None`)
- custom names (use `names=[...]`)
- set a column as the index (use `index_col=[...]`)
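- read a Series instead of a DataFrame (use `squeeze=True`; this argument was removed in pandas 2.0, where you call `.squeeze("columns")` on the result instead)
- specify a custom separator (use `sep='...'`; for a multi-character or regex separator, also pass `engine='python'`)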
And so on. See [here](https://stackoverflow.com/a/56231664/4909087) for a more comprehensive list.
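Several of the options listed above can be combined in one call. The sketch below uses made-up, headerless, comma-separated data and made-up column names, and `pd.read_csv` on a `StringIO` stands in for `pd.read_clipboard`, which forwards the same keyword arguments.

```python
import io

import pandas as pd

# Hypothetical sample: no header row, comma-separated, first field is a date.
text = """\
2021-01-01,10,x
2021-01-02,20,y
"""

# With real clipboard contents, pass the same keyword arguments
# to pd.read_clipboard instead.
df = pd.read_csv(
    io.StringIO(text),
    sep=",",                           # custom separator
    header=None,                       # the data has no header row
    names=["date", "value", "label"],  # supply column names ourselves
    parse_dates=["date"],              # parse this column as datetimes
    index_col=["date"],                # and use it as the index
)

print(df.index.dtype)
```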
---
### Limitations of `read_clipboard`
- Cannot parse prettytable/tabulate output (in other words, borders make it harder). Check out some homemade attempts at tackling this.
- Cannot ignore ellipses in data (you'll need to remove them manually)
- Cannot load data from images (if you're up to the task, you can make a tesseract extension that does)
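For the ellipsis case, if the copied text is available as a string, the truncation rows can be filtered out before parsing rather than deleted by hand. This is only a sketch: it assumes a truncation row consists of nothing but dots and whitespace, and the sample frame is made up.

```python
import io

import pandas as pd

# Hypothetical sample: a truncated DataFrame printout with a ".." row.
text = """\
   a  b
0  1  2
1  3  4
.. .. ..
8  5  6
9  7  8
"""

# Keep only lines that contain something other than dots and whitespace.
cleaned = "\n".join(
    line for line in text.splitlines() if line.strip(". \t")
)

# The header has one field fewer than the rows, so the first
# column is picked up as the index.
df = pd.read_csv(io.StringIO(cleaned), sep=r"\s+")

print(df.index.tolist())
```

Truncated columns (a `...` in the header row) are not handled by this filter and still need manual cleanup.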
---
### Other useful `pd.read_clipboard` questions for unconventionally formatted data