Created May 2, 2017 20:23
The Instacart Online Grocery Shopping Dataset 2017 Data Descriptions

orders (3.4m rows, 206k users):

  • order_id: order identifier
  • user_id: customer identifier
  • eval_set: which evaluation set this order belongs to (see SET, described below)
  • order_number: the order sequence number for this user (1 = first, n = nth)
  • order_dow: the day of the week on which the order was placed
  • order_hour_of_day: the hour of the day at which the order was placed
  • days_since_prior_order: days since the user's previous order, capped at 30 (NA for order_number = 1)
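A minimal sketch of reading rows in this layout with Python's standard `csv` module. It assumes the column header is spelled `days_since_prior_order` (as in the released orders.csv) and that NA is stored as an empty field; the sample rows are made up for illustration. Because the gap is capped at 30, the sketch also flags a value of 30 as "30 or more days".

```python
import csv
import io
import math

# Hypothetical rows in the orders layout described above (not real data).
# Assumes NA is stored as an empty field and the column is named days_since_prior_order.
SAMPLE_ORDERS = """order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
10,1,prior,1,2,8,
11,1,prior,2,3,7,15.0
12,1,train,3,3,12,30.0
"""


def load_orders(text):
    """Parse orders rows; a missing days_since_prior_order becomes NaN."""
    orders = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["days_since_prior_order"]
        days = float(raw) if raw else math.nan
        orders.append({
            "order_id": int(row["order_id"]),
            "user_id": int(row["user_id"]),
            "eval_set": row["eval_set"],
            "order_number": int(row["order_number"]),
            "order_dow": int(row["order_dow"]),
            "order_hour_of_day": int(row["order_hour_of_day"]),
            "days_since_prior_order": days,
            # Values are capped at 30, so a 30 means "30 or more days".
            "at_least_30_days": days == 30.0,
        })
    return orders


orders = load_orders(SAMPLE_ORDERS)
for o in orders:
    # Only a user's first order (order_number == 1) should lack a gap.
    assert (o["order_number"] == 1) == math.isnan(o["days_since_prior_order"])
```

With pandas, `read_csv` would turn the empty field into NaN in the same way; the explicit loop just makes the NA and cap rules visible.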

products (50k rows):

  • product_id: product identifier
  • product_name: name of the product
  • aisle_id: foreign key
  • department_id: foreign key

aisles (134 rows):

  • aisle_id: aisle identifier
  • aisle: the name of the aisle

departments (21 rows):

  • department_id: department identifier
  • department: the name of the department
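The `aisle_id` and `department_id` foreign keys in products point at these two lookup tables. A minimal sketch of resolving them with plain dictionaries, using hypothetical IDs and names rather than real dataset values:

```python
# Hypothetical rows shaped like products, aisles and departments (not real data).
aisles = {1: "fruit aisle", 2: "yogurt aisle"}          # aisle_id -> aisle
departments = {1: "produce", 2: "dairy"}                 # department_id -> department
products = [
    {"product_id": 100, "product_name": "Banana", "aisle_id": 1, "department_id": 1},
    {"product_id": 200, "product_name": "Greek Yogurt", "aisle_id": 2, "department_id": 2},
]


def label_products(products, aisles, departments):
    """Resolve each product's aisle_id and department_id foreign keys to names."""
    return {
        p["product_id"]: (
            p["product_name"],
            aisles[p["aisle_id"]],
            departments[p["department_id"]],
        )
        for p in products
    }


labels = label_products(products, aisles, departments)
print(labels[100])  # ('Banana', 'fruit aisle', 'produce')
```

A dictionary lookup raises `KeyError` on a dangling foreign key, which doubles as a cheap integrity check; a DataFrame merge would be the usual choice at the full 50k-row scale.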

order_products__SET (30m+ rows):

  • order_id: foreign key
  • product_id: foreign key
  • add_to_cart_order: order in which each product was added to cart
  • reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the following three evaluation sets (eval_set in orders):

  • "prior": orders prior to that user's most recent order (~3.2m orders)
  • "train": training data supplied to participants (~131k orders)
  • "test": test data reserved for machine learning competitions (~75k orders)
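A minimal sketch of how order_products__SET combines with eval_set: it restores a cart's sequence from `add_to_cart_order` and computes a per-product reorder rate from prior orders only. Restricting features to "prior" rows is an assumed modelling choice (train rows are then left available as labels), not something these descriptions prescribe; all rows below are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data (not real Instacart rows).
# orders: order_id -> eval_set
order_eval_set = {10: "prior", 11: "prior", 12: "train", 21: "test"}
# order_products rows: (order_id, product_id, add_to_cart_order, reordered)
order_products = [
    (10, 100, 1, 0),
    (10, 200, 2, 0),
    (11, 300, 2, 0),
    (11, 100, 1, 1),
    (12, 100, 1, 1),  # a train row: excluded from the rates below
]


def cart_sequence(order_products, order_id):
    """Product ids of one order, in the order they were added to the cart."""
    lines = [(pos, pid) for oid, pid, pos, _ in order_products if oid == order_id]
    return [pid for _pos, pid in sorted(lines)]


def prior_reorder_rates(order_products, order_eval_set):
    """Per-product share of prior-order lines with reordered == 1."""
    counts = defaultdict(lambda: [0, 0])  # product_id -> [reordered lines, all lines]
    for order_id, product_id, _pos, reordered in order_products:
        if order_eval_set.get(order_id) != "prior":
            continue  # assumption: only prior orders feed the features
        counts[product_id][0] += reordered
        counts[product_id][1] += 1
    return {pid: r / n for pid, (r, n) in counts.items()}


rates = prior_reorder_rates(order_products, order_eval_set)
print(rates)  # {100: 0.5, 200: 0.0, 300: 0.0}
print(cart_sequence(order_products, 11))  # [300, 100] sorted by cart position -> [100, 300]
```

Note that the "test" order 21 has no product rows here, matching the description of test data as reserved.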

@magaton In principle, quantity is implied by the orders: a product's quantity is the number of times it was added to a specific order.

@mzhKU I have checked the data, and quantity is not implied by 'orders': no product_id appears more than once within the same order_id.
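The check described in the reply can be reproduced by counting `(order_id, product_id)` pairs in an order_products file: if no pair repeats, each row records that a product was in an order, not how many units were bought. A minimal sketch using `collections.Counter`, with hypothetical pairs standing in for the real rows:

```python
from collections import Counter

# Hypothetical order_products rows reduced to (order_id, product_id) pairs.
pairs = [(10, 100), (10, 200), (11, 100), (11, 300)]


def duplicate_lines(pairs):
    """Return each (order_id, product_id) pair that occurs more than once, with its count."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


print(duplicate_lines(pairs))  # {}
```

An empty result means there is nothing to count per order, so quantity cannot be recovered from repeated rows.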


There are no pricing details available for the products.
I would like to make predictions for the products that have been bought by customers. It would be great if we could have each customer's spending amount or salary.


hg568 commented Jul 2, 2019

Where is the order_product_test file? I couldn't find it in the dataset.


I cannot see the test data in my files that I downloaded from
