# How to load (almost) anything into a DataFrame - examples
This notebook is a companion to our blog post on getting data into pandas' DataFrame:
**NOTE: when running this notebook in Google Colab, you will need to restart the runtime after the new version of `pandas` is installed in the first cell.**
## Setup
### Dependencies
First, we install the dependencies. Apart from Pandas itself, we install several optional dependencies, because we will use features that Pandas does not ship with by default, such as reading from S3.
If these dependencies are not installed and you use an operation that requires them, Pandas fails with an error message that names the missing package. For example:
Missing optional dependency 'fsspec'. Use pip or conda to install fsspec.
More information on optional dependencies that Pandas is using can be found in their [documentation](
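To see up front which optional packages are importable, a check along these lines may help. It is a minimal sketch: the package list is an assumption about what the later examples need, not something taken from the install cell, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Hypothetical list of optional packages; adjust it to match your own install cell.
OPTIONAL_PACKAGES = ["fsspec", "s3fs", "openpyxl", "pyarrow", "lxml"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing optional packages:", missing_packages(OPTIONAL_PACKAGES))
```

An empty list means every package in the list can be imported.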
### Logging
In this section, you can adjust the logging level for `botocore` or `fsspec` to see what's going on behind the scenes.
I found it useful when debugging some issues with getting access to S3 objects.
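One way to do this with the standard `logging` module is sketched below. The log format string is an arbitrary choice, and `DEBUG` is assumed to be the level you want while troubleshooting.

```python
import logging

# Send log records to the notebook output; force=True replaces any handlers
# that were configured earlier in the session.
logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# DEBUG is very verbose; use logging.INFO or logging.WARNING to quiet things down.
for name in ("botocore", "fsspec"):
    logging.getLogger(name).setLevel(logging.DEBUG)
```

Setting the level on the parent logger also applies to its children, such as the individual `botocore` submodules.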
### Temporary files
Here I'm setting up a directory for temporary files used throughout this notebook. One CSV file with sample data is downloaded into that directory.
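A minimal sketch of such a setup follows. Instead of downloading the real file, it writes a few made-up rows; the names `TMP_DIR` and `SAMPLE_CSV`, the column names, and the values are all hypothetical.

```python
import tempfile
from pathlib import Path

# Create a scratch directory that stays around until it is removed explicitly.
TMP_DIR = Path(tempfile.mkdtemp(prefix="pandas_examples_"))

# Made-up sample data standing in for the downloaded CSV file.
SAMPLE_CSV = TMP_DIR / "sample.csv"
SAMPLE_CSV.write_text("country,gold,silver,bronze\nA,3,2,1\nB,1,0,4\n")

print(SAMPLE_CSV)
```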
## Examples
Ok, now that we have everything set up, let's run through some examples!
For simplicity, all of the examples use the same dataset, stored in different formats.
### Example 1. Loading CSV data from the filesystem
As simple as it gets - I have a CSV file on my disk and want to load it.
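Here is a self-contained sketch of this case. It first writes a tiny CSV file with made-up rows to a temporary directory, so the file name and contents are assumptions rather than the notebook's real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny, made-up CSV file so the example does not depend on other cells.
csv_path = Path(tempfile.mkdtemp()) / "sample.csv"
csv_path.write_text("country,gold,silver,bronze\nA,3,2,1\nB,1,0,4\n")

# read_csv accepts a string path or a pathlib.Path.
df = pd.read_csv(csv_path)
print(df)
```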
### Example 2. Loading CSV from a remote location
Let's explore a few ways you can load data from a remote location like S3. Other types of remote storage work in a very similar way.
**Note**: all of the S3 objects we use in this example are public, so we don't need AWS credentials to access them. When you access your own data in S3, you will likely need to configure AWS credentials first.
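Reading a public object could look roughly like the sketch below. It assumes pandas 1.2 or newer (for `storage_options`) and an installed `s3fs` package. The bucket and key in `S3_URL` are placeholders and `read_public_s3_csv` is a hypothetical helper, so the actual call is left commented out.

```python
import pandas as pd

# Placeholder location; replace it with a real bucket and key.
S3_URL = "s3://example-bucket/path/to/sample.csv"


def read_public_s3_csv(url):
    """Read a CSV object from a public bucket without AWS credentials."""
    # storage_options is forwarded to the s3fs filesystem;
    # anon=True tells it to send unsigned (anonymous) requests.
    return pd.read_csv(url, storage_options={"anon": True})


# df = read_public_s3_csv(S3_URL)  # needs network access and the s3fs package
```

For private objects, dropping `anon=True` should let `s3fs` pick up whatever AWS credentials are configured in the environment.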
### Example 3. Loading data using various readers
This section contains examples of loading data using some of the most popular readers that Pandas supports.
#### JSON
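A minimal sketch with `read_json`, assuming the data is a JSON array of records. The inline document and its values are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up JSON document: a list of records, one per row.
json_text = '[{"country": "A", "gold": 3}, {"country": "B", "gold": 1}]'

# Wrapping the text in StringIO makes it a file-like object;
# a path or URL can be passed in the same position.
df = pd.read_json(StringIO(json_text), orient="records")
print(df)
```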
#### Excel
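A short sketch with `read_excel`. Reading `.xlsx` files relies on the optional `openpyxl` package, so the call is wrapped in a hypothetical helper (`load_excel`) and the usage line, with its placeholder file name, is commented out.

```python
import pandas as pd


def load_excel(path, sheet_name=0):
    """Read one worksheet into a DataFrame (the first sheet by default)."""
    # sheet_name accepts a zero-based position or the sheet's name.
    return pd.read_excel(path, sheet_name=sheet_name)


# df = load_excel("sample.xlsx")  # placeholder file name; needs openpyxl
```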
#### Parquet
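A short sketch with `read_parquet`, which needs either `pyarrow` or `fastparquet` installed. The helper name `load_parquet` and the file name in the commented usage line are placeholders.

```python
import pandas as pd


def load_parquet(path, columns=None):
    """Read a Parquet file, optionally restricted to a subset of columns."""
    # Parquet is a columnar format, so asking for specific columns
    # lets the engine skip the rest of the file.
    return pd.read_parquet(path, columns=columns)


# df = load_parquet("sample.parquet", columns=["country", "gold"])  # placeholder names
```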
#### SQL
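A self-contained sketch with `read_sql`, using an in-memory SQLite database from the standard library as a stand-in for a real database. The table name, columns, and rows are made up.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER)")
conn.executemany("INSERT INTO medals VALUES (?, ?)", [("A", 3), ("B", 1)])
conn.commit()

# read_sql runs the query and returns the result set as a DataFrame.
df = pd.read_sql("SELECT country, gold FROM medals ORDER BY gold DESC", conn)
conn.close()
print(df)
```

For databases other than SQLite, pandas expects an SQLAlchemy connection or engine in place of `conn`.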
#### Compressed files
You can point Pandas to a compressed file as well! By default, it infers the compression algorithm from the file extension (for example `.gz` or `.zip`) and decompresses the file on the fly.
It works with the text-based `read_...` methods, such as `read_csv` and `read_json`.
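A minimal sketch of reading a gzip-compressed CSV. The file is created on the spot with made-up rows so that the example stands on its own.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Write a small gzip-compressed CSV file with made-up rows.
gz_path = Path(tempfile.mkdtemp()) / "sample.csv.gz"
with gzip.open(gz_path, "wt") as handle:
    handle.write("country,gold\nA,3\nB,1\n")

# compression="infer" is the default and is spelled out only for clarity;
# the ".gz" extension tells pandas to use gzip.
df = pd.read_csv(gz_path, compression="infer")
print(df)
```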
### Example 4. Loading data from a website
This is one of my favorites. I came across that table with medal counts on the [Wikipedia about 2016 Olympics]( I wanted to see how all these countries compare to each other visually.
Previously, I would copy that data into a text editor, turn it into a proper CSV, and then load it into a `DataFrame`. It turns out that Pandas can read tables directly from the HTML!
That's very handy for ad-hoc, quick analysis. Bear in mind, it only works for small tables (who would have a huge table on their website anyway?!) and parsing is relatively slow.
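The idea can be sketched with `read_html`, which returns one `DataFrame` per `<table>` element it finds. The markup below is made up, `tables_from_html` is a hypothetical helper, and the call is commented out because `read_html` needs an HTML parser such as `lxml` (or `beautifulsoup4` with `html5lib`) to be installed.

```python
from io import StringIO

import pandas as pd

# Made-up page fragment containing a single table.
HTML = """
<table>
  <tr><th>country</th><th>gold</th></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>1</td></tr>
</table>
"""


def tables_from_html(html_text):
    """Return a list with one DataFrame per <table> element in the markup."""
    return pd.read_html(StringIO(html_text))


# tables = tables_from_html(HTML)  # needs lxml, or beautifulsoup4 with html5lib
# df = tables[0]
```

`read_html` also accepts a URL in place of the markup, in which case it fetches the page first.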
#### Exercise
Calculate the number of medals per person for each country.
#### How many different formats does Pandas support?
Here's a snippet that gets this info directly from Pandas documentation!
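As a rough cross-check that does not touch the documentation, the sketch below takes a different approach: it lists the top-level `read_*` functions exposed by the installed pandas version. The count depends on that version, and a few readers cover several formats, so treat it as an approximation.

```python
import pandas as pd

# Collect every callable in the top-level pandas namespace whose name starts with "read_".
readers = sorted(
    name
    for name in dir(pd)
    if name.startswith("read_") and callable(getattr(pd, name))
)

print(len(readers), "readers:", ", ".join(readers))
```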
