@philipphager
Created April 19, 2023 06:44
# Dependencies to install: Pandas for dataframes, pyarrow to support the feather file format.
# pip install pandas
# pip install pyarrow
import pandas as pd

# Load a downloaded dataset from file:
train_df = pd.read_feather("train.feather")
train_df.head()
# The dataset contains 220 features per query-document pair (columns starting with 'feature_').
# Here is one of many ways to select all columns starting with 'feature_'
train_df.filter(regex="^feature_", axis=1).head()
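# A minimal sketch of turning the feature columns into a NumPy matrix, e.g. as model input.
# It runs on a tiny made-up dataframe (toy_features_df and its values are hypothetical
# stand-ins, not part of the dataset); with the real data, substitute train_df.

```python
import pandas as pd

# Hypothetical toy data with the same column layout as train_df (only 2 of the 220 features).
toy_features_df = pd.DataFrame({
    "query_id": [1, 1, 2],
    "feature_0": [0.1, 0.2, 0.3],
    "feature_1": [1.0, 2.0, 3.0],
    "relevance": [0, 3, 4],
})

# Select feature columns by prefix, an alternative to filter(regex=...).
feature_columns = [c for c in toy_features_df.columns if c.startswith("feature_")]

# Feature matrix of shape (n_rows, n_features) and the matching label vector.
X = toy_features_df[feature_columns].to_numpy()
y = toy_features_df["relevance"].to_numpy()
print(X.shape, y.shape)
```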
# The field query_id signals which query-document vectors belong to the same search query.
train_df["query_id"].head()
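# A small sketch of using query_id to group rows by search query, here to count the
# documents per query. The dataframe toy_queries_df and its values are made up for
# illustration; with the real data, substitute train_df.

```python
import pandas as pd

# Hypothetical toy data: query 7 has three documents, query 9 has two.
toy_queries_df = pd.DataFrame({
    "query_id": [7, 7, 7, 9, 9],
    "relevance": [0, 2, 4, 1, 1],
})

# Number of query-document rows that share each query_id.
docs_per_query = toy_queries_df.groupby("query_id").size()
print(docs_per_query)
```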
# The relevance column contains human expert judgments of how relevant each document is for its query (scale 0 - 4)
train_df["relevance"].head()
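# One possible way to inspect the relevance labels: count how often each grade occurs
# and average the grades per query. The dataframe toy_labels_df and its values are
# hypothetical; with the real data, substitute train_df.

```python
import pandas as pd

# Hypothetical toy data with relevance grades on the 0 - 4 scale.
toy_labels_df = pd.DataFrame({
    "query_id": [7, 7, 7, 9, 9],
    "relevance": [0, 2, 4, 1, 1],
})

# How many rows carry each relevance grade, ordered by grade.
label_counts = toy_labels_df["relevance"].value_counts().sort_index()

# Mean relevance grade of the documents judged for each query.
mean_relevance = toy_labels_df.groupby("query_id")["relevance"].mean()
print(label_counts)
print(mean_relevance)
```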