ashutoshbsathe/imagenet-val-set.md

## imagenet-val-set.md

      
### Ground Truth Labels for ImageNet ILSVRC2012 Validation Set

These labels were originally available from the ImageNet website, but that page no longer loads.
Moreover, downloading the original data from the ImageNet website is painfully slow, whereas downloading the validation set from AcademicTorrents is fast enough for most needs.
The only catch is that you don't get the original labels from the ImageNet website this way.
Fortunately, the labels can be rebuilt with the following piece of Python code:
```python
import yaml
from tqdm import tqdm

# https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/imagenet_2012_validation_synset_labels.txt
VALIDATION_SYNSET_LABELS = '/path/to/validation/labels/in/synset'  # (above link)
# https://gist.githubusercontent.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372/raw/d25516d26be4a8d3e0aeebe9275631754b8e2c73/imagenet_label_to_wordnet_synset.txt
LABEL_TO_SYNSET_MAP_FILE = '/path/to/synset/to/validation/labels/mapping'  # (above link)
OUTPUT_LABEL_TXT = './ground_truth_ilsvrc2012_val.txt'  # output ground truth txt


def main():
    # The mapping file is a dict literal; joined onto one line it parses as YAML.
    # safe_load avoids the Loader argument that newer PyYAML requires for yaml.load.
    with open(LABEL_TO_SYNSET_MAP_FILE, 'r') as f:
        labels_synset = yaml.safe_load(f.read().replace('\n', ' '))

    # Invert the mapping: the part of each 'id' before '-' becomes 'n<digits>'.
    synset_to_label_dict = {}
    for k, v in labels_synset.items():
        synset_to_label_dict['n' + v['id'].split('-')[0]] = k

    with open(VALIDATION_SYNSET_LABELS, 'r') as f:
        synsets = [line.strip() for line in f if line.strip()]

    # Open the output once and write one label per validation image, in order.
    with open(OUTPUT_LABEL_TXT, 'w') as f:
        for synset in tqdm(synsets):
            try:
                label = synset_to_label_dict[synset]
            except KeyError:
                if synset != 'n02012849':
                    # Skipping a line silently would misalign labels and images.
                    raise
                # it's a crane class, either replace with class 134 or 517
                # the above synset appears only 50 times apparently, so an
                # approximate label for these samples is acceptable IMO
                label = 134
            f.write('{}\n'.format(label))


if __name__ == '__main__':
    main()
```
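
Once the script has run, it may be worth sanity-checking the output before using it. The sketch below is not part of the original script: `parse_labels`, `load_labels`, `summarize_labels` and `pair_images_with_labels` are hypothetical helper names. It assumes the output file holds one integer class index per line, and that the labels follow the validation images in sorted file-name order.

```python
from collections import Counter


def parse_labels(lines):
    """Turn an iterable of text lines into a list of integer class indices."""
    return [int(line) for line in lines if line.strip()]


def load_labels(label_path):
    """Read the generated ground-truth file (one class index per line)."""
    with open(label_path, 'r') as f:
        return parse_labels(f)


def summarize_labels(labels):
    """Return (number of labels, number of distinct classes, per-class counts)."""
    counts = Counter(labels)
    return len(labels), len(counts), counts


def pair_images_with_labels(image_names, labels):
    """Pair image file names with labels.

    Assumption: the i-th label belongs to the i-th image in sorted
    file-name order.
    """
    names = sorted(image_names)
    if len(names) != len(labels):
        raise ValueError('{} images but {} labels'.format(len(names), len(labels)))
    return list(zip(names, labels))
```

For the full validation set, `summarize_labels(load_labels('./ground_truth_ilsvrc2012_val.txt'))` would be expected to report 50000 labels, with roughly 50 per class. A different total suggests that some synsets were not written. The file names passed to `pair_images_with_labels` could come from `os.listdir` on the extracted validation directory.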