## Beginner's Guide to `pd.read_clipboard`
`read_clipboard` is truly a saving grace for anyone starting out answering questions in the [tag:pandas] tag. Unfortunately, pandas veterans also know that the data provided in questions isn't always easy to load into a terminal. Common complications include MultiIndexes, spaces in header names, datetimes, and Python objects.
Thankfully, `read_clipboard` has arguments that make handling most of these cases possible (and easy). The purpose of this answer is to document some of those cases in finer detail.
### Spaces in column headers
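`read_clipboard` passes extra keyword arguments through to `read_csv`. For headers like `first name`, the usual fix is to split only on runs of two or more spaces. The sketch below feeds a made-up table to `read_csv` through `io.StringIO`, so it runs without a clipboard. On copied text, the equivalent call would be `pd.read_clipboard(sep=r"\s{2,}", engine="python")`. It assumes columns are separated by at least two spaces and that no single value contains two consecutive spaces.

```python
import io

import pandas as pd

# Made-up sample, laid out the way a DataFrame with spaces in its headers might be pasted.
text = """first name  last name  age
John        Smith      30
Jane        Doe        25"""

# The default sep (r"\s+") would split "first name" into two columns.
# Splitting on two or more whitespace characters keeps single spaces inside names;
# a regex separator like this needs engine="python".
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")
print(df.columns.tolist())  # ['first name', 'last name', 'age']
```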
### Read a Series instead of a DataFrame
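The old `squeeze=True` argument was deprecated in pandas 1.4 and removed in 2.0. Calling `.squeeze("columns")` on the resulting one-column DataFrame gives the same Series. A printed Series usually ends with a `Name: ..., dtype: ...` line, which `skipfooter=1` can drop (it is only supported by `engine="python"`). The sample text below is made up, and `io.StringIO` stands in for the clipboard. With real copied text, the equivalent would be `pd.read_clipboard(header=None, index_col=0, skipfooter=1, engine="python").squeeze("columns")`.

```python
import io

import pandas as pd

# Made-up text as a Series might be pasted, including its trailing metadata line.
text = """a    1
b    2
c    3
Name: x, dtype: int64"""

# header=None: the first line is data, not column names.
# index_col=0: the first field becomes the index.
# skipfooter=1 drops the "Name: ..." line; only the python engine supports it.
df = pd.read_csv(
    io.StringIO(text), sep=r"\s+", header=None, index_col=0,
    skipfooter=1, engine="python",
)

# Collapse the single remaining column into a Series.
s = df.squeeze("columns")
print(s.tolist())  # [1, 2, 3]
```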
### Python objects
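- Numeric data is simpler: a cell printed as `[1, 2, 3]` is a valid Python literal, so `ast.literal_eval` can turn it back into a list.
- String data may need `yaml`: pandas prints lists of strings without quotes (e.g. `[a, b]`). `ast.literal_eval` rejects that form, but `yaml.safe_load` parses it as a list of strings.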
### Other considerations
`read_clipboard` uses `read_csv` under the hood and passes extra keyword arguments through to it. Many of the principles for loading data from CSV therefore apply here, such as:
- parsing datetimes (use `parse_dates`)
- no headers (use `header=None`)
- custom names (use `names=[...]`)
- set a column as the index (use `index_col=[...]`)
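- read a Series instead of a DataFrame (older versions accepted `squeeze=True`, but it was deprecated in pandas 1.4 and removed in 2.0; call `.squeeze("columns")` on the result instead)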
- specify a custom separator (use `sep='...'`. If multicharacter or regex, use `engine='python'`)
And so on. See the `pd.read_csv` documentation for a more comprehensive list.
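The sketch below combines several of these options on made-up headerless data. `header=None` with `names=[...]` supplies column names, `parse_dates` converts the date column, and `index_col` moves `key` into the index. With real copied text, the same keywords go straight to `pd.read_clipboard`.

```python
import io

import pandas as pd

# Made-up headerless data: a date, a key, and a value per line.
text = """2021-01-01  a  1
2021-01-02  b  2"""

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",                      # read_clipboard's default separator
    header=None,                     # first line is data, not headers
    names=["date", "key", "value"],  # supply our own column names
    parse_dates=["date"],            # parse ISO date strings into datetimes
    index_col="key",                 # use the "key" column as the index
)
print(df.dtypes)
```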
### Limitations of `read_clipboard`
- Cannot parse prettytable/tabulate output (IOW, borders make it harder). Check out some homemade attempts at tackling this.
- Cannot ignore ellipses in data (you'll need to manually remove them)
- Cannot load data from images (if you're up to the task, you could write a Tesseract-based extension that does)
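Here is one homemade workaround for bordered output. It pre-cleans the copied text before parsing: it drops rule lines and the outer pipes, then splits on `|`. `read_bordered` is a hypothetical helper, not a pandas function. It takes the copied text as a string and assumes no real row consists only of `+`, `-`, `=`, `|`, or `:` characters, since such rows would be skipped as borders.

```python
import io

import pandas as pd


def read_bordered(text, **kwargs):
    """Hypothetical helper: strip prettytable/tabulate-style borders, then parse."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip blank lines and rules such as "+---+---+", "+===+", or "|---|:--|".
        if set(line) <= set("+-=|: "):
            continue
        # Drop the outer pipes so each row starts and ends with a value.
        rows.append(line.strip("|").strip())
    # Cells are now separated by "|" with optional padding around it.
    return pd.read_csv(
        io.StringIO("\n".join(rows)), sep=r"\s*\|\s*", engine="python", **kwargs
    )


# Made-up prettytable-style sample.
sample = """+---+---+
| a | b |
+---+---+
| 1 | x |
| 2 | y |
+---+---+"""

df = read_bordered(sample)
print(df)
```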
### Other useful `pd.read_clipboard` questions for unconventionally formatted data