Created May 2, 2017 20:23
The Instacart Online Grocery Shopping Dataset 2017 Data Descriptions

orders (3.4m rows, 206k users):

  • order_id: order identifier
  • user_id: customer identifier
  • eval_set: which evaluation set this order belongs in (see SET described below)
  • order_number: the order sequence number for this user (1 = first, n = nth)
  • order_dow: the day of the week the order was placed on
  • order_hour_of_day: the hour of the day the order was placed on
  • days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)

products (50k rows):

  • product_id: product identifier
  • product_name: name of the product
  • aisle_id: foreign key
  • department_id: foreign key

aisles (134 rows):

  • aisle_id: aisle identifier
  • aisle: the name of the aisle

departments (21 rows):

  • department_id: department identifier
  • department: the name of the department

order_products__SET (30m+ rows):

  • order_id: foreign key
  • product_id: foreign key
  • add_to_cart_order: order in which each product was added to cart
  • reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the three following evaluation sets (eval_set in orders):

  • "prior": orders prior to that user's most recent order (~3.2m orders)
  • "train": training data supplied to participants (~131k orders)
  • "test": test data reserved for machine learning competitions (~75k orders)
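
As a minimal sketch of how the eval_set labels partition one user's history, the snippet below separates "prior" orders from the final labeled order. It assumes, as the descriptions above suggest, that each user's most recent order is the one labeled "train" or "test". The rows and IDs are invented for illustration and the helper name split_by_eval_set is hypothetical.

```python
from collections import defaultdict

# Invented sample rows: (order_id, user_id, eval_set, order_number).
orders = [
    (101, 1, "prior", 1),
    (102, 1, "prior", 2),
    (103, 1, "train", 3),
    (201, 2, "prior", 1),
    (202, 2, "test", 2),
]


def split_by_eval_set(rows):
    """Group order_ids per user into prior history and the final labeled order."""
    history = defaultdict(list)  # user_id -> prior order_ids, in order_number sequence
    final = {}                   # user_id -> (eval_set, order_id) of the non-prior order
    for order_id, user_id, eval_set, _ in sorted(rows, key=lambda r: (r[1], r[3])):
        if eval_set == "prior":
            history[user_id].append(order_id)
        else:
            final[user_id] = (eval_set, order_id)
    return dict(history), final


history, final = split_by_eval_set(orders)
print(history)
print(final)
```

Under that assumption, the product rows for the prior orders would come from order_products__prior and those for a "train" order from order_products__train.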

MichaelChirico commented May 23, 2017

Three questions:

  1. order_dow = 0 corresponds to Sunday?
  2. By "days_since_prior is capped at 30", do you mean it's censored or truncated? Censored means all values >= 30 are coerced to 30; truncated means all values above 30 were removed. It appears to be the former.
  3. Is there truncation going on with respect to the number of orders included for some users? There's a big mass point of users with exactly 99 orders.
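
One hedged way to tell the two cases apart: under censoring, the cap value absorbs every longer gap, so the count at exactly 30 should spike well above the count at 29; under truncation the distribution would simply stop at the cap with no spike. The sketch below computes those counts on invented values (not real data); cap_summary is a hypothetical helper, and None stands in for the NAs on a user's first order.

```python
from collections import Counter


def cap_summary(values, cap=30):
    """Summarize how values behave at a suspected cap, skipping None (NA) entries."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    return {
        "max": max(observed),
        "at_cap": counts[cap],
        "just_below_cap": counts[cap - 1],
        "share_at_cap": counts[cap] / len(observed),
    }


# Invented sample of days_since_prior values, for illustration only.
sample = [None, 7, 14, 30, 30, 30, 29, 3, 30, None, 21]
summary = cap_summary(sample)
print(summary)
```

The same helper could be pointed at per-user order counts with cap=99 to look at the mass point raised in question 3.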


croach commented May 31, 2017

I believe the line describing the types of SET should be "where SET is one of the three following evaluation sets (eval_set in orders):" instead of "four".


Any updates on the question: order_dow = 0 corresponds to Sunday?
To me, 0 seems to represent Monday.


ghost commented Aug 1, 2017

Can somebody explain what prior, train, and test flags mean exactly? Test means a test data set, but prior and train data sets are kind of confusing.


magaton commented Oct 18, 2017

Hello, there is no quantity field in order_products__SET.
Maybe a stupid question, but how can you ask for a product purchase forecast without taking the quantity in previous orders into account?



mzhKU commented Nov 28, 2017

@magaton In principle, the quantity is implied in orders. You have the quantity of a product by how many times it's added to a specific order.


Hello, did anyone see the order_products__SET table mentioned above? I want to query purchase patterns for products using SQL, but I need a table that connects products to orders.
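
Going by the descriptions above, order_products__SET is the linking table: its order_id and product_id columns are foreign keys into orders and products. A minimal sketch of such a query, run through Python's sqlite3 module on an in-memory database, might look like the following. The table name order_products, the sample rows, and the product names are invented for illustration; loading the real CSV files is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, eval_set TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_products (order_id INTEGER, product_id INTEGER,
                             add_to_cart_order INTEGER, reordered INTEGER);
-- Invented rows, for illustration only.
INSERT INTO orders VALUES (1, 10, 'prior'), (2, 10, 'prior'), (3, 11, 'prior');
INSERT INTO products VALUES (100, 'Bananas'), (200, 'Whole Milk');
INSERT INTO order_products VALUES (1, 100, 1, 0), (1, 200, 2, 0),
                                  (2, 100, 1, 1), (3, 200, 1, 0);
""")

# Join the linking table to both sides and count purchases per product.
query = """
SELECT p.product_name,
       COUNT(*)          AS times_ordered,
       SUM(op.reordered) AS times_reordered
FROM order_products AS op
JOIN orders   AS o ON o.order_id = op.order_id
JOIN products AS p ON p.product_id = op.product_id
WHERE o.eval_set = 'prior'
GROUP BY p.product_name
ORDER BY times_ordered DESC, p.product_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```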



For the test data, "product_id, add_to_cart_order, reordered" is not available.


@magaton In principle, the quantity is implied in orders. You have the quantity of a product by how many times it's added to a specific order.

@mzhKU I have checked the data: the quantity is not implied in 'orders', because there is no duplicate product_id within the same order_id.
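
A check along these lines could be used to confirm that observation. This is only a sketch on invented rows, and duplicated_pairs is a hypothetical helper; on the real files it would be fed the (order_id, product_id) columns of order_products__prior and order_products__train.

```python
from collections import Counter


def duplicated_pairs(order_product_rows):
    """Return the (order_id, product_id) pairs that occur more than once."""
    counts = Counter((order_id, product_id) for order_id, product_id in order_product_rows)
    return {pair: n for pair, n in counts.items() if n > 1}


# Invented rows, for illustration only.
clean = [(1, 100), (1, 200), (2, 100)]
with_repeat = clean + [(1, 100)]

print(duplicated_pairs(clean))
print(duplicated_pairs(with_repeat))
```

An empty result means each product appears at most once per order, in which case quantity cannot be recovered from row counts.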


There are no pricing details available for the products.
I would like to make predictions for the products that have been bought by customers. It would be great if we could have the customer's spending amount or salary.


hg568 commented Jul 2, 2019

Where is the file of order_product_test? I couldn't find it in the dataset.


I cannot see the test data in the files that I downloaded from
