@philipphager
Created April 11, 2023 15:08
mslr-web10k
# Dependencies to install. Pandas for dataframes, pyarrow to support the .parquet file format.
# pip install pandas
# pip install pyarrow
# Load a downloaded dataset from file:
import pandas as pd

train_df = pd.read_parquet("mslr_train.parquet")
train_df.head()
# The dataset contains 136 features per query-document pair (columns starting with 'feature_').
# One way to select all columns starting with 'feature_':
train_df.filter(regex="^feature_", axis=1).head()
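# A minimal sketch of turning the feature columns into a NumPy matrix, e.g. as model input.
# toy_df is a hypothetical stand-in for train_df with only 3 feature columns, and the
# exact column names ('feature_1', ...) are an assumption; only the 'feature_' prefix is given above.

```python
import pandas as pd

# Hypothetical miniature of the dataset: 3 query-document pairs, 3 features.
toy_df = pd.DataFrame(
    {
        "query_id": [1, 1, 2],
        "relevance": [0, 2, 4],
        "feature_1": [0.1, 0.2, 0.3],
        "feature_2": [1.0, 2.0, 3.0],
        "feature_3": [5.0, 6.0, 7.0],
    }
)

# Collect every column whose name starts with 'feature_'.
feature_columns = [c for c in toy_df.columns if c.startswith("feature_")]

# One row per query-document pair, one column per feature.
X = toy_df[feature_columns].to_numpy()
print(X.shape)
```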
# The field query_id signals which query-document vectors belong to the same search query.
train_df["query_id"].head()
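# A minimal sketch of using query_id to collect the documents of each query.
# toy_df is a hypothetical stand-in for train_df; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the dataset: query 1 has two documents, query 2 has one.
toy_df = pd.DataFrame(
    {
        "query_id": [1, 1, 2],
        "relevance": [0, 2, 4],
        "feature_1": [0.1, 0.2, 0.3],
    }
)

# Number of documents judged for each query.
docs_per_query = toy_df.groupby("query_id").size()

# Iterate over queries; each group holds all rows of one search query.
for query_id, query_df in toy_df.groupby("query_id"):
    print(query_id, len(query_df))
```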
# The relevance column contains human expert judgments of how relevant each document is for the current query (on a scale from 0 to 4).
train_df["relevance"].head()
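# A minimal sketch of inspecting the relevance labels: how often each grade occurs and
# the mean grade per query. toy_df is a hypothetical stand-in for train_df with made-up values.

```python
import pandas as pd

# Hypothetical miniature of the dataset.
toy_df = pd.DataFrame(
    {
        "query_id": [1, 1, 2],
        "relevance": [0, 2, 4],
    }
)

# How many query-document pairs received each relevance grade, ordered by grade.
grade_counts = toy_df["relevance"].value_counts().sort_index()

# Average relevance grade of the documents of each query.
mean_relevance = toy_df.groupby("query_id")["relevance"].mean()

print(grade_counts)
print(mean_relevance)
```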