@gmsharpe
Created April 2, 2022 19:20
parquet_with_java_tablesaw_and_google_colab.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "parquet_with_java_tablesaw_and_google_colab.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "java",
"display_name": "java"
},
"language_info": {
"name": "java"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/gmsharpe/4a2bba5088c0263b254cf954d82c61c9/parquet_with_java_tablesaw_and_google_colab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"source": [
"%%bash\n",
"#!/usr/bin/env bash\n",
"\n",
"echo \"Update environment...\"\n",
"apt update -q &> /dev/null \n",
"\n",
"#echo \"Install Java...\" \n",
"#apt-get install -q openjdk-11-jdk-headless &> /dev/null\n",
"\n",
"echo \"Install Jupyter java kernel...\"\n",
"curl -L https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip \\\n",
" -o ijava-kernel.zip &> /dev/null\n",
"\n",
"unzip -q ijava-kernel.zip -d ijava-kernel \\\n",
" && cd ijava-kernel \\\n",
" && python3 install.py --sys-prefix &> /dev/null"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Mjyn8PAsFQrW",
"outputId": "606e4de1-6ae0-47b1-ef4f-03d548d122a1"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Update environment...\n",
"Install Jupyter java kernel...\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Connect to a hosted Runtime (Java)"
],
"metadata": {
"id": "DwxjPm5U5zO1"
}
},
{
"cell_type": "code",
"source": [
"%%loadFromPOM\n",
"<dependency>\n",
" <groupId>commons-io</groupId>\n",
" <artifactId>commons-io</artifactId>\n",
" <version>2.11.0</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>tech.tablesaw</groupId>\n",
" <artifactId>tablesaw-core</artifactId>\n",
" <version>0.42.0</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.parquet</groupId>\n",
" <artifactId>parquet-common</artifactId>\n",
" <version>1.12.2</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.parquet</groupId>\n",
" <artifactId>parquet-encoding</artifactId>\n",
" <version>1.12.2</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.parquet</groupId>\n",
" <artifactId>parquet-column</artifactId>\n",
" <version>1.12.2</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.parquet</groupId>\n",
" <artifactId>parquet-hadoop</artifactId>\n",
" <version>1.12.2</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.hadoop</groupId>\n",
" <artifactId>hadoop-common</artifactId>\n",
" <version>3.3.1</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.hadoop</groupId>\n",
" <artifactId>hadoop-mapreduce-client-core</artifactId>\n",
" <version>3.3.1</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.hadoop</groupId>\n",
" <artifactId>hadoop-hdfs</artifactId>\n",
" <version>3.3.1</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.hadoop</groupId>\n",
" <artifactId>hadoop-main</artifactId>\n",
" <version>3.3.1</version>\n",
"</dependency>\n",
"<dependency>\n",
" <groupId>org.apache.parquet</groupId>\n",
" <artifactId>parquet-cli</artifactId>\n",
" <version>1.12.2</version>\n",
"</dependency>"
],
"metadata": {
"id": "FNjZHd0fFXHe"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Download the Data"
],
"metadata": {
"id": "9MBtFaU47lUC"
}
},
{
"cell_type": "code",
"source": [
"import org.apache.commons.io.FileUtils;\n",
"\n",
"String source = \"https://s3.amazonaws.com/nyc-tlc/trip+data/green_tripdata_2021-01.csv\";\n",
"String file_path = \"/content/green_2021-01\";\n",
"FileUtils.copyURLToFile(new URL(source), new File(file_path + \".csv\"));"
],
"metadata": {
"id": "Ax9cDEbC6Agg"
},
"execution_count": 17,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Preview the data in [Tablesaw](https://github.com/jtablesaw/tablesaw)\n",
"\n",
"```\n",
"NOTE: If you get an exception saying the file cannot be found, the 'file_path' variable may not be set in the current session.\n",
"I have not been able to figure out why this sometimes happens after first running the code cells above; re-running the download cell should define it again.\n",
"```"
],
"metadata": {
"id": "cp_g2bG2fcye"
}
},
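{
"cell_type": "markdown",
"source": [
"A minimal sketch for diagnosing the file-not-found case described above: it checks that the CSV exists on disk before Tablesaw reads it. The path is hard-coded as an assumption (it mirrors the `file_path` value from the download cell), so the check still works when that variable is missing from the session. `csvPath` is a name introduced here for illustration only."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import java.nio.file.Files;\n",
"import java.nio.file.Path;\n",
"import java.nio.file.Paths;\n",
"\n",
"// Assumption: same location the download cell writes to.\n",
"Path csvPath = Paths.get(\"/content/green_2021-01.csv\");\n",
"\n",
"if (Files.exists(csvPath)) {\n",
"    // File.length() reports the size in bytes without throwing a checked exception.\n",
"    System.out.println(\"Found \" + csvPath + \" (\" + csvPath.toFile().length() + \" bytes)\");\n",
"} else {\n",
"    System.out.println(\"Missing \" + csvPath + \"; re-run the download cell first.\");\n",
"}"
],
"metadata": {},
"execution_count": null,
"outputs": []
},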
{
"cell_type": "code",
"source": [
"import tech.tablesaw.api.*;\n",
"import tech.tablesaw.columns.*;\n",
"\n",
"Table table = Table.read().csv(file_path + \".csv\");\n",
"table.print(20);"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0fOOC3hNfZ9T",
"outputId": "38dffa36-1021-4824-e3ed-171ac949f139"
},
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" green_2021-01.csv \n",
" VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |\n",
"-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
" 2 | 2021-01-01T00:15:56.000 | 2021-01-01T00:19:52.000 | false | 1 | 43 | 151 | 1 | 1.01 | 5.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 6.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:25:59.000 | 2021-01-01T00:34:44.000 | false | 1 | 166 | 239 | 1 | 2.53 | 10 | 0.5 | 0.5 | 2.81 | 0 | | 0.3 | 16.86 | 1 | 1 | 2.75 |\n",
" 2 | 2021-01-01T00:45:57.000 | 2021-01-01T00:51:55.000 | false | 1 | 41 | 42 | 1 | 1.12 | 6 | 0.5 | 0.5 | 1 | 0 | | 0.3 | 8.3 | 1 | 1 | 0 |\n",
" 2 | 2020-12-31T23:57:51.000 | 2021-01-01T00:04:56.000 | false | 1 | 168 | 75 | 1 | 1.99 | 8 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 9.3 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | -52 | 0 | -0.5 | 0 | 0 | | -0.3 | -52.8 | 3 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | 52 | 0 | 0.5 | 0 | 0 | | 0.3 | 52.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:19:14.000 | 2021-01-01T00:19:21.000 | false | 5 | 265 | 265 | 1 | 0 | 180 | 0 | 0 | 36.06 | 0 | | 0.3 | 216.36 | 1 | 2 | 0 |\n",
" 2 | 2021-01-01T00:26:31.000 | 2021-01-01T00:28:50.000 | false | 1 | 75 | 75 | 6 | 0.45 | 3.5 | 0.5 | 0.5 | 0.96 | 0 | | 0.3 | 5.76 | 1 | 1 | 0 |\n",
" 2 | 2021-01-01T00:57:46.000 | 2021-01-01T00:57:57.000 | false | 1 | 225 | 225 | 1 | 0 | 2.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 3.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:58:32.000 | 2021-01-01T01:32:34.000 | false | 1 | 225 | 265 | 1 | 12.19 | 38 | 0.5 | 0.5 | 2.75 | 0 | | 0.3 | 42.05 | 1 | 1 | 0 |\n",
" ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
" | 2021-01-15T10:03:00.000 | 2021-01-15T10:28:00.000 | | | 26 | 55 | | 7.48 | 25.43 | 0 | 0 | 2.75 | 0 | | 0.3 | 28.48 | | | |\n",
" | 2021-01-15T10:06:00.000 | 2021-01-15T10:17:00.000 | | | 42 | 41 | | 0.89 | 17.69 | 0 | 0 | 2.75 | 0 | | 0.3 | 20.74 | | | |\n",
" | 2021-01-15T10:53:00.000 | 2021-01-15T11:07:00.000 | | | 41 | 74 | | 1.64 | 15.45 | 0 | 0 | 2.75 | 0 | | 0.3 | 18.5 | | | |\n",
" | 2021-01-15T10:47:00.000 | 2021-01-15T10:58:00.000 | | | 177 | 225 | | 1.71 | 16.73 | 0 | 0 | 2.75 | 0 | | 0.3 | 19.78 | | | |\n",
" | 2021-01-15T10:43:00.000 | 2021-01-15T11:20:00.000 | | | 11 | 246 | | 12.55 | 52.46 | 0 | 0 | 2.75 | 0 | | 0.3 | 55.51 | | | |\n",
" | 2021-01-15T10:35:00.000 | 2021-01-15T10:51:00.000 | | | 3 | 147 | | 5.97 | 17.01 | 0 | 0 | 0 | 0 | | 0.3 | 17.31 | | | |\n",
" | 2021-01-15T10:25:00.000 | 2021-01-15T10:34:00.000 | | | 242 | 213 | | 3.83 | 27.27 | 0 | 0 | 2.75 | 0 | | 0.3 | 30.32 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:20:00.000 | | | 181 | 181 | | 0.45 | 12.89 | 0 | 0 | 2.75 | 0 | | 0.3 | 15.94 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:58:00.000 | | | 244 | 72 | | 22.21 | 50.67 | 0 | 0 | 2.75 | 6.12 | | 0.3 | 59.84 | | | |\n",
" | 2021-01-15T10:24:00.000 | 2021-01-15T10:46:00.000 | | | 227 | 33 | | 4.2 | 21.37 | 0 | 0 | 2.75 | 0 | | 0.3 | 24.42 | | | |"
]
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Load the tablesaw-parquet Jar\n",
"\n",
"After downloading the tablesaw-parquet jar, we use the `%jars` line magic from IJava to add it to the classpath. The jar provides the classes needed to read and write Parquet files with Tablesaw."
],
"metadata": {
"id": "uANVbU4b7JkH"
}
},
{
"cell_type": "code",
"source": [
"String projectGitAcct = \"https://github.com/gmsharpe\";\n",
"String path = \"/tablesaw-parquet/releases/download/0.42.0-1.12.2-3.3.1-0.11.0/\";\n",
"String tablesawParquetJar = \"tablesaw_0.42.0-parquet-0.11.0-SNAPSHOT.jar\";\n",
"\n",
"FileUtils.copyURLToFile(new URL(projectGitAcct + path + tablesawParquetJar), \n",
" new File(\"/content/\" + tablesawParquetJar));"
],
"metadata": {
"id": "P_tXLxEw7Hns"
},
"execution_count": 43,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%jars /content/tablesaw_0.42.0-parquet-0.11.0-SNAPSHOT.jar"
],
"metadata": {
"id": "_eG0pIVClRyQ"
},
"execution_count": 45,
"outputs": []
},
{
"cell_type": "code",
"source": [
"table.print(20);"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "166xXyrUWMqB",
"outputId": "a6beba72-9547-4c33-c8a7-0ac2fa9d2b5d"
},
"execution_count": 49,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" green_2021-01.csv \n",
" VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |\n",
"-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
" 2 | 2021-01-01T00:15:56.000 | 2021-01-01T00:19:52.000 | false | 1 | 43 | 151 | 1 | 1.01 | 5.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 6.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:25:59.000 | 2021-01-01T00:34:44.000 | false | 1 | 166 | 239 | 1 | 2.53 | 10 | 0.5 | 0.5 | 2.81 | 0 | | 0.3 | 16.86 | 1 | 1 | 2.75 |\n",
" 2 | 2021-01-01T00:45:57.000 | 2021-01-01T00:51:55.000 | false | 1 | 41 | 42 | 1 | 1.12 | 6 | 0.5 | 0.5 | 1 | 0 | | 0.3 | 8.3 | 1 | 1 | 0 |\n",
" 2 | 2020-12-31T23:57:51.000 | 2021-01-01T00:04:56.000 | false | 1 | 168 | 75 | 1 | 1.99 | 8 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 9.3 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | -52 | 0 | -0.5 | 0 | 0 | | -0.3 | -52.8 | 3 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | 52 | 0 | 0.5 | 0 | 0 | | 0.3 | 52.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:19:14.000 | 2021-01-01T00:19:21.000 | false | 5 | 265 | 265 | 1 | 0 | 180 | 0 | 0 | 36.06 | 0 | | 0.3 | 216.36 | 1 | 2 | 0 |\n",
" 2 | 2021-01-01T00:26:31.000 | 2021-01-01T00:28:50.000 | false | 1 | 75 | 75 | 6 | 0.45 | 3.5 | 0.5 | 0.5 | 0.96 | 0 | | 0.3 | 5.76 | 1 | 1 | 0 |\n",
" 2 | 2021-01-01T00:57:46.000 | 2021-01-01T00:57:57.000 | false | 1 | 225 | 225 | 1 | 0 | 2.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 3.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:58:32.000 | 2021-01-01T01:32:34.000 | false | 1 | 225 | 265 | 1 | 12.19 | 38 | 0.5 | 0.5 | 2.75 | 0 | | 0.3 | 42.05 | 1 | 1 | 0 |\n",
" ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
" | 2021-01-15T10:03:00.000 | 2021-01-15T10:28:00.000 | | | 26 | 55 | | 7.48 | 25.43 | 0 | 0 | 2.75 | 0 | | 0.3 | 28.48 | | | |\n",
" | 2021-01-15T10:06:00.000 | 2021-01-15T10:17:00.000 | | | 42 | 41 | | 0.89 | 17.69 | 0 | 0 | 2.75 | 0 | | 0.3 | 20.74 | | | |\n",
" | 2021-01-15T10:53:00.000 | 2021-01-15T11:07:00.000 | | | 41 | 74 | | 1.64 | 15.45 | 0 | 0 | 2.75 | 0 | | 0.3 | 18.5 | | | |\n",
" | 2021-01-15T10:47:00.000 | 2021-01-15T10:58:00.000 | | | 177 | 225 | | 1.71 | 16.73 | 0 | 0 | 2.75 | 0 | | 0.3 | 19.78 | | | |\n",
" | 2021-01-15T10:43:00.000 | 2021-01-15T11:20:00.000 | | | 11 | 246 | | 12.55 | 52.46 | 0 | 0 | 2.75 | 0 | | 0.3 | 55.51 | | | |\n",
" | 2021-01-15T10:35:00.000 | 2021-01-15T10:51:00.000 | | | 3 | 147 | | 5.97 | 17.01 | 0 | 0 | 0 | 0 | | 0.3 | 17.31 | | | |\n",
" | 2021-01-15T10:25:00.000 | 2021-01-15T10:34:00.000 | | | 242 | 213 | | 3.83 | 27.27 | 0 | 0 | 2.75 | 0 | | 0.3 | 30.32 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:20:00.000 | | | 181 | 181 | | 0.45 | 12.89 | 0 | 0 | 2.75 | 0 | | 0.3 | 15.94 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:58:00.000 | | | 244 | 72 | | 22.21 | 50.67 | 0 | 0 | 2.75 | 6.12 | | 0.3 | 59.84 | | | |\n",
" | 2021-01-15T10:24:00.000 | 2021-01-15T10:46:00.000 | | | 227 | 33 | | 4.2 | 21.37 | 0 | 0 | 2.75 | 0 | | 0.3 | 24.42 | | | |"
]
},
"metadata": {},
"execution_count": 49
}
]
},
{
"cell_type": "markdown",
"source": [
"#### Write the Table to Disk as Parquet"
],
"metadata": {
"id": "vZnW_G9U8SZL"
}
},
{
"cell_type": "code",
"source": [
"import net.tlabs.tablesaw.parquet.*;\n",
"\n",
"// Register the Parquet reader and writer with Tablesaw.\n",
"TablesawParquet.register();\n",
"\n",
"// Write the table loaded from the CSV to a Parquet file.\n",
"TablesawParquetWriter writer = new TablesawParquetWriter();\n",
"writer.write(table,\n",
" TablesawParquetWriteOptions.builder(\"/content/green_2021-01.parquet\")\n",
" .build());"
],
"metadata": {
"id": "0KVeQcEj6WeI",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a567899a-e192-48d7-8eca-c618e14b7e3f"
},
"execution_count": 53,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).\n",
"log4j:WARN Please initialize the log4j system properly.\n",
"log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"Table tableFromParquet = new TablesawParquetReader()\n",
" .read(TablesawParquetReadOptions.builder(\"/content/green_2021-01.parquet\")\n",
" .build());\n",
"\n",
"tableFromParquet.print();"
],
"metadata": {
"id": "TnFXb72FKWqb",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "5a8e298a-9eec-499b-ee0e-58bec6633b74"
},
"execution_count": 60,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" /content/green_2021-01.parquet \n",
" VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |\n",
"-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
" 2 | 2021-01-01T00:15:56.000 | 2021-01-01T00:19:52.000 | false | 1 | 43 | 151 | 1 | 1.01 | 5.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 6.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:25:59.000 | 2021-01-01T00:34:44.000 | false | 1 | 166 | 239 | 1 | 2.53 | 10 | 0.5 | 0.5 | 2.81 | 0 | | 0.3 | 16.86 | 1 | 1 | 2.75 |\n",
" 2 | 2021-01-01T00:45:57.000 | 2021-01-01T00:51:55.000 | false | 1 | 41 | 42 | 1 | 1.12 | 6 | 0.5 | 0.5 | 1 | 0 | | 0.3 | 8.3 | 1 | 1 | 0 |\n",
" 2 | 2020-12-31T23:57:51.000 | 2021-01-01T00:04:56.000 | false | 1 | 168 | 75 | 1 | 1.99 | 8 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 9.3 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | -52 | 0 | -0.5 | 0 | 0 | | -0.3 | -52.8 | 3 | 1 | 0 |\n",
" 2 | 2021-01-01T00:16:36.000 | 2021-01-01T00:16:40.000 | false | 2 | 265 | 265 | 3 | 0 | 52 | 0 | 0.5 | 0 | 0 | | 0.3 | 52.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:19:14.000 | 2021-01-01T00:19:21.000 | false | 5 | 265 | 265 | 1 | 0 | 180 | 0 | 0 | 36.06 | 0 | | 0.3 | 216.36 | 1 | 2 | 0 |\n",
" 2 | 2021-01-01T00:26:31.000 | 2021-01-01T00:28:50.000 | false | 1 | 75 | 75 | 6 | 0.45 | 3.5 | 0.5 | 0.5 | 0.96 | 0 | | 0.3 | 5.76 | 1 | 1 | 0 |\n",
" 2 | 2021-01-01T00:57:46.000 | 2021-01-01T00:57:57.000 | false | 1 | 225 | 225 | 1 | 0 | 2.5 | 0.5 | 0.5 | 0 | 0 | | 0.3 | 3.8 | 2 | 1 | 0 |\n",
" 2 | 2021-01-01T00:58:32.000 | 2021-01-01T01:32:34.000 | false | 1 | 225 | 265 | 1 | 12.19 | 38 | 0.5 | 0.5 | 2.75 | 0 | | 0.3 | 42.05 | 1 | 1 | 0 |\n",
" ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
" | 2021-01-15T10:03:00.000 | 2021-01-15T10:28:00.000 | | | 26 | 55 | | 7.48 | 25.43 | 0 | 0 | 2.75 | 0 | | 0.3 | 28.48 | | | |\n",
" | 2021-01-15T10:06:00.000 | 2021-01-15T10:17:00.000 | | | 42 | 41 | | 0.89 | 17.69 | 0 | 0 | 2.75 | 0 | | 0.3 | 20.74 | | | |\n",
" | 2021-01-15T10:53:00.000 | 2021-01-15T11:07:00.000 | | | 41 | 74 | | 1.64 | 15.45 | 0 | 0 | 2.75 | 0 | | 0.3 | 18.5 | | | |\n",
" | 2021-01-15T10:47:00.000 | 2021-01-15T10:58:00.000 | | | 177 | 225 | | 1.71 | 16.73 | 0 | 0 | 2.75 | 0 | | 0.3 | 19.78 | | | |\n",
" | 2021-01-15T10:43:00.000 | 2021-01-15T11:20:00.000 | | | 11 | 246 | | 12.55 | 52.46 | 0 | 0 | 2.75 | 0 | | 0.3 | 55.51 | | | |\n",
" | 2021-01-15T10:35:00.000 | 2021-01-15T10:51:00.000 | | | 3 | 147 | | 5.97 | 17.01 | 0 | 0 | 0 | 0 | | 0.3 | 17.31 | | | |\n",
" | 2021-01-15T10:25:00.000 | 2021-01-15T10:34:00.000 | | | 242 | 213 | | 3.83 | 27.27 | 0 | 0 | 2.75 | 0 | | 0.3 | 30.32 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:20:00.000 | | | 181 | 181 | | 0.45 | 12.89 | 0 | 0 | 2.75 | 0 | | 0.3 | 15.94 | | | |\n",
" | 2021-01-15T10:16:00.000 | 2021-01-15T10:58:00.000 | | | 244 | 72 | | 22.21 | 50.67 | 0 | 0 | 2.75 | 6.12 | | 0.3 | 59.84 | | | |\n",
" | 2021-01-15T10:24:00.000 | 2021-01-15T10:46:00.000 | | | 227 | 33 | | 4.2 | 21.37 | 0 | 0 | 2.75 | 0 | | 0.3 | 24.42 | | | |"
]
},
"metadata": {},
"execution_count": 60
}
]
},
{
"cell_type": "code",
"source": [
"tableFromParquet.structure()"
],
"metadata": {
"id": "HJOFYZQLZiY_",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "fec36261-a684-4419-abc0-4cdec5f9f7ac"
},
"execution_count": 68,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Structure of /content/green_2021-01.parquet \n",
" Index | Column Name | Column Type |\n",
"-------------------------------------------------------\n",
" 0 | VendorID | INTEGER |\n",
" 1 | lpep_pickup_datetime | LOCAL_DATE_TIME |\n",
" 2 | lpep_dropoff_datetime | LOCAL_DATE_TIME |\n",
" 3 | store_and_fwd_flag | BOOLEAN |\n",
" 4 | RatecodeID | INTEGER |\n",
" 5 | PULocationID | INTEGER |\n",
" 6 | DOLocationID | INTEGER |\n",
" 7 | passenger_count | INTEGER |\n",
" 8 | trip_distance | DOUBLE |\n",
" 9 | fare_amount | DOUBLE |\n",
" 10 | extra | DOUBLE |\n",
" 11 | mta_tax | DOUBLE |\n",
" 12 | tip_amount | DOUBLE |\n",
" 13 | tolls_amount | DOUBLE |\n",
" 14 | ehail_fee | STRING |\n",
" 15 | improvement_surcharge | DOUBLE |\n",
" 16 | total_amount | DOUBLE |\n",
" 17 | payment_type | INTEGER |\n",
" 18 | trip_type | INTEGER |\n",
" 19 | congestion_surcharge | DOUBLE |"
]
},
"metadata": {},
"execution_count": 68
}
]
},
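{
"cell_type": "markdown",
"source": [
"As a quick sanity check on the round trip, the sketch below compares the table read from the CSV with the one read back from Parquet. It assumes the `table` and `tableFromParquet` variables from the cells above are still defined in the session, and it only compares shape and column names, not every value."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"// Row and column counts should match if nothing was lost in the round trip.\n",
"System.out.println(\"rows:    \" + table.rowCount() + \" vs \" + tableFromParquet.rowCount());\n",
"System.out.println(\"columns: \" + table.columnCount() + \" vs \" + tableFromParquet.columnCount());\n",
"\n",
"// Column names are compared in order.\n",
"System.out.println(\"same column names: \" + table.columnNames().equals(tableFromParquet.columnNames()));"
],
"metadata": {},
"execution_count": null,
"outputs": []
},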
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "F9nmR5pnmriA"
},
"execution_count": null,
"outputs": []
}
]
}