@ashutoshbsathe
Last active July 5, 2020 14:48
Generating ground truth labels for ImageNet LSVRC 2012 Validation Set downloaded from AcademicTorrents

Ground Truth Labels for ImageNet LSVRC2012 Validation Set

These labels were originally available on the ImageNet website, but that page now returns an invalid response. Downloading the original data from the ImageNet website is also painfully slow, whereas downloading the validation set from AcademicTorrents is fast enough for most needs.

The only catch is that the AcademicTorrents download does not include the original labels from the ImageNet website.

Fortunately, the labels can be regenerated with the following piece of Python code:

import yaml
from tqdm import tqdm

# https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/imagenet_2012_validation_synset_labels.txt
VALIDATION_SYNSET_LABELS = '/path/to/validation/labels/in/synset' # (above link)
# https://gist.githubusercontent.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372/raw/d25516d26be4a8d3e0aeebe9275631754b8e2c73/imagenet_label_to_wordnet_synset.txt
LABEL_TO_SYNSET_MAP_FILE = '/path/to/synset/to/validation/labels/mapping' # (above link)
OUTPUT_LABEL_TXT = './ground_truth_ilsvrc2012_val.txt' # output ground truth txt

def main():
    with open(LABEL_TO_SYNSET_MAP_FILE, 'r') as f:
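        labels_synset_json = f.read().replace('\n', ' ')
    # safe_load is enough for this plain dict-style text; yaml.load() without
    # a Loader argument raises a TypeError on PyYAML 6 and later.
    labels_synset = yaml.safe_load(labels_synset_json)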
    synset_to_label_dict = {}
    for k, v in labels_synset.items():
        # Move the trailing '-n' of the mapping id to a leading 'n' so the key
        # matches the 'n02012849' form used in the validation labels file.
        synset_to_label_dict['n' + v['id'].split('-')[0]] = k
    with open(VALIDATION_SYNSET_LABELS, 'r') as f:
        synsets = [line.strip() for line in f if line.strip()]
    # Open the output once; line i holds the label of validation image i.
    with open(OUTPUT_LABEL_TXT, 'w') as f:
        for synset in tqdm(synsets):
            try:
                f.write('{}\n'.format(synset_to_label_dict[synset]))
            except KeyError:
                if synset != 'n02012849':
                    # Silently skipping a line would shift every label after it.
                    raise
                # It's a crane class, so the label is either 134 or 517.
                # This synset apparently appears only 50 times, so 134 is
                # written for all of those samples.
                f.write('134\n')

if __name__ == '__main__':
    main()
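
Once the script has written the label file, the labels still have to be matched to images. The following is a minimal sketch of one way to do that, not part of the original script. It assumes that line i of the output corresponds to the i-th validation image and that the images keep their standard `ILSVRC2012_val_XXXXXXXX.JPEG` names. The helper names `load_val_labels`, `val_filename` and `pair_files_with_labels` are hypothetical.

```python
def load_val_labels(label_file):
    """Read one integer class index per line, skipping blank lines."""
    with open(label_file, 'r') as f:
        return [int(line) for line in f if line.strip()]


def val_filename(index):
    """File name of the index-th validation image (1-based).

    Assumes the standard naming scheme, e.g. ILSVRC2012_val_00000001.JPEG.
    """
    return 'ILSVRC2012_val_{:08d}.JPEG'.format(index)


def pair_files_with_labels(labels):
    """Map each validation file name to the label on the matching line."""
    return {val_filename(i): label for i, label in enumerate(labels, start=1)}
```

Calling `pair_files_with_labels(load_val_labels('./ground_truth_ilsvrc2012_val.txt'))` should yield one entry per validation image. The full validation set has 50,000 images, so a different count suggests that lines were dropped while generating the file.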