@jeremystan
Created May 2, 2017 20:23
The Instacart Online Grocery Shopping Dataset 2017 Data Descriptions

orders (3.4m rows, 206k users):

  • order_id: order identifier
  • user_id: customer identifier
  • eval_set: which evaluation set this order belongs in (see SET described below)
  • order_number: the order sequence number for this user (1 = first, n = nth)
  • order_dow: the day of the week the order was placed on
  • order_hour_of_day: the hour of the day the order was placed on
  • days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)

products (50k rows):

  • product_id: product identifier
  • product_name: name of the product
  • aisle_id: foreign key
  • department_id: foreign key

aisles (134 rows):

  • aisle_id: aisle identifier
  • aisle: the name of the aisle

departments (21 rows):

  • department_id: department identifier
  • department: the name of the department

order_products__SET (30m+ rows):

  • order_id: foreign key
  • product_id: foreign key
  • add_to_cart_order: order in which each product was added to cart
  • reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the following three evaluation sets (eval_set in orders):

  • "prior": orders prior to that user's most recent order (~3.2m orders)
  • "train": training data supplied to participants (~131k orders)
  • "test": test data reserved for machine learning competitions (~75k orders)
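The foreign keys described above can be followed with ordinary joins: order_products__SET links orders to products. Below is a minimal sketch using Python's built-in sqlite3 module. The rows are invented for illustration and are not taken from the real dataset, the table is named order_products__prior on the assumption that SET = "prior", and products_for_user is a hypothetical helper name.

```python
import sqlite3

# Toy rows invented for illustration; they are NOT taken from the real dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                         eval_set TEXT, order_number INTEGER);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           aisle_id INTEGER, department_id INTEGER);
    CREATE TABLE order_products__prior (order_id INTEGER, product_id INTEGER,
                                        add_to_cart_order INTEGER, reordered INTEGER);
    INSERT INTO orders VALUES (1, 10, 'prior', 1), (2, 10, 'prior', 2);
    INSERT INTO products VALUES (100, 'Bananas', 24, 4), (200, 'Whole Milk', 84, 16);
    INSERT INTO order_products__prior VALUES
        (1, 100, 1, 0), (1, 200, 2, 0), (2, 100, 1, 1);
""")


def products_for_user(conn, user_id):
    """Return (order_number, add_to_cart_order, product_name, reordered) rows for one user."""
    query = """
        SELECT o.order_number, op.add_to_cart_order, p.product_name, op.reordered
        FROM orders AS o
        JOIN order_products__prior AS op ON op.order_id = o.order_id
        JOIN products AS p ON p.product_id = op.product_id
        WHERE o.user_id = ?
        ORDER BY o.order_number, op.add_to_cart_order
    """
    return conn.execute(query, (user_id,)).fetchall()


rows = products_for_user(conn, 10)
for row in rows:
    print(row)
```

With the real files, the same two joins would apply once the CSVs are loaded into tables with these column names.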
@magaton
magaton commented Oct 18, 2017

Hello, there is no quantity field in order_products__SET.
This may be a naive question, but how can you ask for a product purchase forecast without taking the quantities in previous orders into account?

Thanks

@mzhKU
mzhKU commented Nov 28, 2017

@magaton In principle, the quantity is implied in orders. You have the quantity of a product by how many times it's added to a specific order.

@ronyarmon

Hello, did anyone see the order_products__SET table mentioned above? I want to query purchase patterns for products using SQL, but I need a table that connects products to orders.

Thanks

@Ayush4816

For the test data, "product_id, add_to_cart_order, reordered" are not available.

@loring-wu

@magaton In principle, the quantity is implied in orders. You have the quantity of a product by how many times it's added to a specific order.

@mzhKU I have checked the data. The quantity is not implied in 'orders', because no product_id is duplicated within the same order_id.
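
The check described here can be reproduced along the following lines. This is a minimal sketch, assuming the order_products__SET rows have been read as (order_id, product_id) pairs; duplicated_pairs is a hypothetical helper name and the sample rows are invented, not taken from the dataset.

```python
from collections import Counter


def duplicated_pairs(rows):
    """Return the (order_id, product_id) pairs that occur more than once."""
    counts = Counter((order_id, product_id) for order_id, product_id in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Invented sample: no product appears twice within the same order.
sample = [(1, 100), (1, 200), (2, 100)]
print(duplicated_pairs(sample))
```

An empty result means each product appears at most once per order, so a per-order quantity cannot be recovered by counting rows.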

@manishvisa

There are no pricing details available for the products.
I would like to make predictions for the products that customers have bought. It would be great to have each customer's spending amount or salary.

@hg568
hg568 commented Jul 2, 2019

Where is the file of order_product_test? I couldn't find it in the dataset.

@sonikasood

I cannot see the test data in the files that I downloaded from https://www.instacart.com/datasets/grocery-shopping-2017
