darthbear/instacart-schema.png Secret

## instacart-schema.png

      
    Raw
  

              instacart-schema.png
            
          
## simulmedia-roaringbitmap-post.md

      
    Raw
  

              simulmedia-roaringbitmap-post.md
            
          
    Data preparation

In 2017, Instacart open sourced a dataset comprised of over 3 million orders from more than 200,000 users on about 50,000 products.
It can be downloaded from https://www.instacart.com/datasets/grocery-shopping-2017.
After uncompressing it, one can see it contains 6 CSV files

orders.csv: description of the 3.4 million orders
products.csv: product description of around 50,000 products
order_products__prior.csv and order_products__train.csv: respectively products placed in previous orders and most recent orders of customers
departments.csv: department the products belong to
aisles.csv: aisles where the product can be found


We are only interested in this example by the 4 files orders.csv, orders_products__prior.csv, order_products__train.csv and products.csv.
Let’s create a user orders and a database orders:
createuser orders -s
createdb orders -O orders
Let's create the table orders and order_products in Postgres:
$ psql -U orders orders

drop table if exists orders;
create table orders
(
    order_id integer not null primary key,
    user_id integer,
    eval_set text,
    order_number integer,
    order_dow integer,
    order_hour_of_day integer,
    days_since_prior_order float
);

drop table if exists order_products;
create table order_products
(
    order_id integer not null,
    product_id integer not null,
    add_to_cart_order integer,
    reordered boolean,
    constraint order_products_pkey
        primary key (order_id, product_id)
);

create table products
(
    product_id integer not null primary key,
    product_name text,
    aisle_id integer,
    department_id integer
);

Then let's import the data into Postgres:
\copy orders FROM 'orders.csv' WITH CSV HEADER;
\copy order_products FROM 'order_products__prior.csv' WITH CSV HEADER;
\copy order_products FROM 'order_products__train.csv' WITH CSV HEADER;
\copy products FROM 'products.csv' WITH CSV HEADER;