`orders` (3.4m rows, 206k users):
* `order_id`: order identifier
* `user_id`: customer identifier
* `eval_set`: which evaluation set this order belongs in (see `SET` described below)
* `order_number`: the order sequence number for this user (1 = first, n = nth)
* `order_dow`: the day of the week the order was placed on
* `order_hour_of_day`: the hour of the day the order was placed on
* `days_since_prior`: days since the last order, capped at 30 (with NAs for `order_number` = 1)

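Because `days_since_prior` is a per-order gap, a common derived feature is each order's position on a per-user timeline. The sketch below sums the gaps in `order_number` order. It assumes the rows are already loaded as `(user_id, order_number, days_since_prior)` tuples with `None` for the NA; the function name `days_since_first_order` is hypothetical. The 30-day cap means any offset that includes a capped gap is only a lower bound.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of three `orders` columns:
# (user_id, order_number, days_since_prior); None stands in for the NA
# that `days_since_prior` holds when order_number == 1.
Row = Tuple[int, int, Optional[float]]


def days_since_first_order(rows: List[Row]) -> Dict[Tuple[int, int], float]:
    """Approximate day offset of each order from the user's first order.

    Gaps are capped at 30, so an offset containing a capped gap is only a
    lower bound on the true elapsed time.
    """
    by_user: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    offsets: Dict[Tuple[int, int], float] = {}
    for user_id, user_rows in by_user.items():
        total = 0.0
        # Walk the user's orders in sequence, accumulating the gaps.
        for _, order_number, gap in sorted(user_rows, key=lambda r: r[1]):
            # The NA gap only occurs on the first order, which starts at 0.
            total = 0.0 if gap is None else total + gap
            offsets[(user_id, order_number)] = total
    return offsets


rows = [(1, 1, None), (1, 3, 30.0), (1, 2, 7.0), (2, 1, None), (2, 2, 3.0)]
print(days_since_first_order(rows))
# {(1, 1): 0.0, (1, 2): 7.0, (1, 3): 37.0, (2, 1): 0.0, (2, 2): 3.0}
```

Here user 1's third order lands at day 37, but since its gap hit the cap of 30, the real offset could be larger.
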
`products` (50k rows):
* `product_id`: product identifier
* `product_name`: name of the product
* `aisle_id`: foreign key to `aisles`
* `department_id`: foreign key to `departments`

`aisles` (134 rows):
* `aisle_id`: aisle identifier
* `aisle`: the name of the aisle

`departments` (21 rows):
* `department_id`: department identifier
* `department`: the name of the department

`order_products__SET` (30m+ rows):
* `order_id`: foreign key to `orders`
* `product_id`: foreign key to `products`
* `add_to_cart_order`: order in which each product was added to the cart
* `reordered`: 1 if this product has been ordered by this user in the past, 0 otherwise

where `SET` is one of the three following evaluation sets (`eval_set` in `orders`):
* `"prior"`: orders prior to that user's most recent order (~3.2m orders)
* `"train"`: training data supplied to participants (~131k orders)
* `"test"`: test data reserved for machine learning competitions (~75k orders)

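The `reordered` definition can be recomputed from a user's order history: walk that user's orders by `order_number` and flag a product as 1 if it appeared in any earlier basket. The sketch below follows the definition as written; it is not necessarily how the published column was produced. It assumes all of a user's orders and their baskets are available together (for example, the "prior" and "train" sets combined), and the names `derive_reordered` and `baskets` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def derive_reordered(
    orders: List[Tuple[int, int, int]],    # (order_id, user_id, order_number)
    baskets: Dict[int, List[int]],         # order_id -> product_ids, in add_to_cart_order
) -> Dict[Tuple[int, int], int]:
    """Recompute `reordered` as {(order_id, product_id): 0 or 1}.

    A product is flagged 1 when the same user bought it in an earlier order.
    """
    seen: Dict[int, Set[int]] = defaultdict(set)  # user_id -> products bought so far
    flags: Dict[Tuple[int, int], int] = {}
    for order_id, user_id, _ in sorted(orders, key=lambda o: (o[1], o[2])):
        basket = baskets.get(order_id, [])
        for product_id in basket:
            flags[(order_id, product_id)] = int(product_id in seen[user_id])
        # Update history only after the whole basket, so items in the
        # current order never count as "past" for each other.
        seen[user_id].update(basket)
    return flags


# Toy data: user 1 buys product 200 twice; user 2 buys it once.
orders = [(10, 1, 1), (11, 1, 2), (20, 2, 1)]
baskets = {10: [100, 200], 11: [200, 300], 20: [200]}
print(derive_reordered(orders, baskets))
# {(10, 100): 0, (10, 200): 0, (11, 200): 1, (11, 300): 0, (20, 200): 0}
```

Note that the history is kept per user: product 200 is a reorder for user 1's second order but not for user 2, who has never bought it before.
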
@mzhKU I have checked the data: quantity is not implied in `orders`, because no `product_id` appears more than once within the same `order_id`.
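The check described above amounts to counting `(order_id, product_id)` pairs in an `order_products__SET` table and looking for any count above 1. A minimal sketch, assuming the table is available as CSV text with a header row containing `order_id` and `product_id`; the helper names and the file name in the comment are assumptions. An empty result is consistent with the observation that repeated rows do not encode quantity.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def pairs_from_csv(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (order_id, product_id) from CSV text with a header row.

    Assumes the table is exported as CSV with those column names.
    """
    for row in csv.DictReader(lines):
        yield int(row["order_id"]), int(row["product_id"])


def duplicate_pairs(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return every (order_id, product_id) pair that occurs more than once."""
    counts = Counter(pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Usage against a real file (the file name is an assumption):
# with open("order_products__prior.csv", newline="") as f:
#     print(duplicate_pairs(pairs_from_csv(f)))
sample = [
    "order_id,product_id,add_to_cart_order,reordered",
    "1,10,1,0",
    "1,11,2,0",
    "2,10,1,1",
]
print(duplicate_pairs(pairs_from_csv(sample)))  # []
```

Product 10 appears in two different orders in the sample, which is allowed; only a repeat within the same `order_id` would be reported.
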