@OhadRubin
Last active April 26, 2022 05:42
Infer Hugging Face datasets features from a single example
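import datasets
from datasets import Features


def dict_generator(indict):
    """Recursively map a Python value to the matching datasets feature type."""
    if isinstance(indict, str):
        return datasets.Value("string")
    elif isinstance(indict, bool):
        # bool is a subclass of int, so it must be checked before int.
        return datasets.Value("bool")
    elif isinstance(indict, int):
        # int32 cannot hold values beyond roughly 2.1e9; switch to "int64" if needed.
        return datasets.Value("int32")
    elif isinstance(indict, float):
        return datasets.Value("float32")
    elif isinstance(indict, list):
        if not indict:
            raise ValueError("cannot infer a feature type from an empty list")
        # Assumes a homogeneous list: only the first element is inspected.
        return datasets.Sequence(dict_generator(indict[0]))
    elif isinstance(indict, dict):
        out_dict = {}
        for key, value in indict.items():
            # These two keys are deliberately left out of the schema.
            if key not in ["entities", "answer_url"]:
                out_dict[key] = dict_generator(value)
        return out_dict
    raise TypeError(f"unsupported type: {type(indict).__name__}")


def infer_features(example):
    """Build a datasets.Features schema from a single example dict."""
    return Features(dict_generator(example))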
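
To preview what the recursion above would infer without installing datasets, here is a dependency-free sketch that mirrors the same traversal but returns dtype names as plain strings. The helper name `describe_schema` and the sample record are hypothetical, made up for illustration. Unlike the gist, this sketch does not skip any keys. It checks `bool` before `int` because `bool` is a subclass of `int` in Python.

```python
def describe_schema(value):
    """Return nested dtype names, mirroring the recursive inference (hypothetical helper)."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):
        # Checked before int because isinstance(True, int) is True.
        return "bool"
    if isinstance(value, int):
        return "int32"
    if isinstance(value, float):
        return "float32"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer a type from an empty list")
        # Only the first element is inspected, so the list is assumed homogeneous.
        return [describe_schema(value[0])]
    if isinstance(value, dict):
        return {key: describe_schema(item) for key, item in value.items()}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Made-up record, used only to exercise each branch.
example = {
    "question": "Who wrote Hamlet?",
    "answers": ["Shakespeare"],
    "score": 0.9,
    "meta": {"id": 7, "verified": True},
}
schema = describe_schema(example)
print(schema)
```

A one-element list stands in for a sequence of that element type, and a nested dict stands in for a nested feature group.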