@y-lan
Last active October 17, 2023 03:51
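from datasets import Dataset, DatasetDict
import pandas as pd

# `df` is assumed to be an existing pandas DataFrame holding the full dataset.
train_ratio = 0.9

# Randomly sample 90% of the rows for training; the remaining rows form the validation set.
tdf = df.sample(frac=train_ratio, random_state=14)
vdf = df.drop(tdf.index)

# Convert each split to a Hugging Face Dataset without carrying over the pandas index.
tds = Dataset.from_pandas(tdf, preserve_index=False)
vds = Dataset.from_pandas(vdf, preserve_index=False)

ds = DatasetDict({"train": tds, "validation": vds})

# "account/dataset_name" is a placeholder repo id; pushing requires being logged in to the Hub.
ds.push_to_hub("account/dataset_name", private=True)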
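
The split above relies on `df.drop(tdf.index)`, which removes rows by index label, so it only yields a clean partition when the index labels are unique. A minimal sketch of that step in isolation, using pandas only: the helper name `split_frame` and the toy data are hypothetical and not part of the original snippet.

```python
import pandas as pd


def split_frame(df, train_ratio=0.9, seed=14):
    """Return (train, validation) frames that together cover every row of df once."""
    # drop() removes rows by label, so duplicate labels would drop extra rows.
    # Resetting the index gives every row a unique label first.
    df = df.reset_index(drop=True)
    train = df.sample(frac=train_ratio, random_state=seed)
    validation = df.drop(train.index)
    return train, validation


# Hypothetical toy data, for illustration only.
toy = pd.DataFrame(
    {
        "text": [f"row {i}" for i in range(20)],
        "label": [i % 2 for i in range(20)],
    }
)

train, validation = split_frame(toy)
print(len(train), len(validation))
```

The two frames can then be passed to `Dataset.from_pandas(..., preserve_index=False)` as in the snippet above.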