@ResidentMario
Created January 2, 2019 22:00
import pandas as pd

# Human-readable Open Images class names to select
categories = ["Sandwich", "Hamburger", "Hot dog"]
# Download the class names, boxed image, and image id metadata
class_names = pd.read_csv(
    "https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv",
    header=None, names=['LabelID', 'LabelName'])
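# ImageID is assumed to be the first column of both CSVs. It is kept as a
# regular column (no index_col) because the steps below call set_index('ImageID').
train_boxed = pd.read_csv(
    "https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv")
image_ids = pd.read_csv(
    "https://storage.googleapis.com/openimages/2018_04/train/train-images-boxable-with-rotation.csv")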
# Get category IDs for the given categories and sub-select train_boxed with them.
# label_map maps LabelID (e.g. "/m/...") to the human-readable name.
label_map = dict(class_names.set_index('LabelName').loc[categories, 'LabelID']
                 .to_frame().reset_index().set_index('LabelID')['LabelName'])
label_values = set(label_map.keys())
# Note: the LabelName column of train_boxed holds label IDs, not readable names.
relevant_training_images = train_boxed[train_boxed.LabelName.isin(label_values)]
# Select relevant flickr image URLs and their metadata
relevant_flickr_urls = (relevant_training_images.set_index('ImageID')
                        .join(image_ids.set_index('ImageID'))
                        .loc[:, 'OriginalURL'])
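relevant_flickr_img_metadata = (
    relevant_training_images.set_index('ImageID')
    # An image with several boxes repeats in the index; selecting on the
    # unique IDs returns each annotation row once instead of multiplying rows.
    .loc[relevant_flickr_urls.index.unique()]
    # Add the human-readable class name for each label ID.
    .assign(LabelValue=lambda df: df.LabelName.map(label_map)))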