I started by going through this documentation about challenge creation. Creating a challenge involves writing a YAML configuration file for it.
Before we dive into the configuration, we need to establish a few things about the platform. Each challenge is split into phases decided by the host (us in this scenario). The host can have different dataset splits for different challenge phases. The host can set the visibility of the challenge phases, and of the leaderboards, to HOST, OWNER & HOST, or PUBLIC. The number and types of columns in the leaderboard are also configurable.
Let me use the configuration I created for the challenge and go through it line by line:
title: MNIST Challenge 2018
short_description: The MNIST database is a good digit image database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
description: description.html
evaluation_details: evaluation_details.html
terms_and_conditions: terms_and_conditions.html
image: mnist.png
submission_guidelines: submission_guidelines.html
evaluation_script: evaluation_script.zip
start_date: 2018-03-15 20:00:00
end_date: 2018-12-30 20:00:00
leaderboard:
  - id: 1
    schema: {"labels": ["test_score"], "default_order_by": "test_score"}
challenge_phases:
  - id: 1
    name: test-mnist2018
    description: challenge_description.html
    leaderboard_public: True
    is_public: True
    start_date: 2018-03-15 20:00:00
    end_date: 2018-12-30 20:00:00
    test_annotation_file: test_annotation.txt
    codename: test-mnist2018
    max_submissions_per_day: 10
    max_submissions: 999
dataset_splits:
  - id: 1
    name: split1
    codename: split1
  - id: 2
    name: split2
    codename: split2
challenge_phase_splits:
  - challenge_phase_id: 1
    leaderboard_id: 1
    dataset_split_id: 1
    visibility: 3
  - challenge_phase_id: 1
    leaderboard_id: 1
    dataset_split_id: 2
    visibility: 3
- Many of the keys in this configuration are self-explanatory, like title, short_description, start_date and end_date.
- The keys description, evaluation_details, terms_and_conditions and submission_guidelines expect HTML file names as their values. These HTML files contain relevant information about the challenge, such as detailed instructions, links to the datasets, the evaluation criteria, a sample submission format etc. The HTML files used for this config can be found here.
- The image key expects an image that will be used as the logo for the challenge on the platform. The evaluation_script key expects a zip file that contains the evaluation script; this is discussed in detail in the upcoming section.
- The leaderboard key expects the schema of the leaderboard. It should contain the column names that will be present in the leaderboard, and it also expects the host to specify which key the results in the leaderboard will be sorted on. For our example, I just created one column, test_score, in the leaderboard. One can create multiple kinds of leaderboards (for different challenge phases), but I just created one.
- The challenge_phases key expects multiple keys like name, codename, description, start_date, end_date, is_public, leaderboard_public and test_annotation_file. All the fields are self-explanatory except test_annotation_file: this file is used for ranking the submissions made by participants. An annotation file can be shared by more than one challenge phase. I just created one phase for our challenge and marked everything as public.
- The dataset_splits key expects an id, name and codename for each of the dataset splits that challenge submissions will be evaluated on. I created two splits for the dataset, split1 and split2, corresponding to the train and test data.
- Finally, the challenge_phase_splits key holds the relation between challenge phases and dataset splits for a challenge (a many-to-many relation). I created two entries in this key, making both dataset splits available to the only challenge phase and leaderboard that we have.
I used the MNIST dataset from this challenge on Kaggle.
Here are the download links for the dataset.
I used just the train.csv from the link above and split it up using sklearn's train_test_split function. I prepared the final training.csv (dataset for training, with labels), testing.csv (dataset for evaluating models, without labels), answers.csv (labels of the testing dataset) and submission.csv (sample submission file) from this dataset. The step-by-step preparation of these csv files can be found here. I saved all of these csv files in the Data/ folder.
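The split described above can be sketched roughly as follows. This is a self-contained illustration: the tiny fabricated DataFrame stands in for Kaggle's train.csv (which has a label column plus 784 pixel columns), and the random sample submission is an assumption about how the real one was built:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for Kaggle's train.csv: a label column plus a few pixel columns
rng = np.random.RandomState(0)
train = pd.DataFrame(rng.randint(0, 256, size=(100, 4)),
                     columns=[f"pixel{i}" for i in range(4)])
train.insert(0, "label", rng.randint(0, 10, size=100))

# Hold out a portion whose labels become the hidden answer file
train_part, test_part = train_test_split(train, test_size=0.2, random_state=42)

train_part.to_csv("training.csv", index=False)                      # with labels
test_part.drop(columns="label").to_csv("testing.csv", index=False)  # without labels
test_part[["label"]].to_csv("answers.csv", index=False)             # hidden labels

# Sample submission: random labels in the same format as answers.csv
sample = test_part[["label"]].copy()
sample["label"] = rng.randint(0, 10, size=len(sample))
sample.to_csv("submission.csv", index=False)
```

The key point is that testing.csv and answers.csv come from the same rows, so a submission's i-th prediction lines up with the i-th hidden label.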
The next part was writing the evaluation script. The documentation for this can be found here. I used the following code block to see if the sample submission file works as expected:
import numpy as np
import pandas as pd

userfilename = "Data/submission.csv"
answerfilename = "Data/answers.csv"
user = pd.read_csv(userfilename)
answers = pd.read_csv(answerfilename)
matches = 0
for i in range(0, len(user)):
    if user.iloc[i]['label'] == answers.iloc[i]['label']:
        matches = matches + 1
print("Score:", (matches / len(user)) * 100)
And I got a score of about 10%, which is very close to what picking a random label for each test image would give. Since the sample submission file contains randomly assigned labels, the submission file seems to be working as expected if we use this block of code in our evaluation function.
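As a side note, the row-by-row loop above can be replaced by a single vectorized comparison in pandas, which is much faster on large submissions. This is a sketch using tiny in-memory stand-ins for the two csv files, assuming both have a label column in the same row order:

```python
import pandas as pd

# Tiny in-memory stand-ins for submission.csv and answers.csv
user = pd.DataFrame({"label": [3, 1, 4, 1, 5]})
answers = pd.DataFrame({"label": [3, 1, 4, 2, 6]})

# Element-wise equality; the mean of the boolean array is the accuracy
score = (user["label"].values == answers["label"].values).mean() * 100
print("Score:", score)  # 3 of 5 labels match, so 60.0
```

The same one-liner drops into the evaluation function in place of the loop without changing the result.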
Using the code-block mentioned above, I finally ended up with the following evaluation function:
import numpy as np
import pandas as pd

def evaluate(annFile, resFile, phase_codename):
    # We will not use the annFile at all for evaluation.
    # Since we have two dataset splits and only one phase, let's just
    # hard-code the dataset split for now. split2 is the dataset split
    # we want the results to be published for.
    split_codename = "split2"
    userfilename = resFile
    answerfilename = "Data/answers.csv"
    user = pd.read_csv(userfilename)
    answers = pd.read_csv(answerfilename)
    submission_result = ""
    result = {}
    result['result'] = []
    result["submission_result"] = submission_result
    if len(user) != len(answers):
        submission_result = ("Number of rows in the answer file ("
                             + str(len(answers)) + ") and the submission file ("
                             + str(len(user)) + ") don't match.")
        result["submission_result"] = submission_result
        return result
    temp = {}
    temp[split_codename] = {}
    matches = 0
    for i in range(0, len(user)):
        if user.iloc[i]['label'] == answers.iloc[i]['label']:
            matches = matches + 1
    score = (matches / len(user)) * 100
    print("Score:", score)
    temp[split_codename]['score'] = score
    result['result'].append(temp)
    submission_result = ("Evaluated scores for the phase "
                         + str(phase_codename) + ". Score=" + str(score))
    result["submission_result"] = submission_result
    return result
The result returned by this function will look something like this:
{
    'result': [
        {
            <split-codename>: {
                'score': 80
            }
        }
    ],
    'submission_result': 'Evaluated scores for the phase <phase-codename>. Score=80'
}
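Before uploading, it is worth exercising the function locally against a tiny fabricated answer/submission pair and checking the shape of the returned dict. The sketch below inlines a condensed copy of the evaluation logic so it is self-contained; the fabricated csv contents are my own stand-ins:

```python
import os
import pandas as pd

def evaluate(annFile, resFile, phase_codename):
    # Condensed copy of the evaluation function described above
    split_codename = "split2"
    user = pd.read_csv(resFile)
    answers = pd.read_csv("Data/answers.csv")
    result = {"result": [], "submission_result": ""}
    if len(user) != len(answers):
        result["submission_result"] = "Row counts don't match."
        return result
    matches = (user["label"].values == answers["label"].values).sum()
    score = (matches / len(user)) * 100
    result["result"].append({split_codename: {"score": score}})
    result["submission_result"] = ("Evaluated scores for the phase "
                                   + str(phase_codename) + ". Score=" + str(score))
    return result

# Fabricate a tiny answers/submission pair: 3 of 5 labels match
os.makedirs("Data", exist_ok=True)
pd.DataFrame({"label": [1, 2, 3, 4, 5]}).to_csv("Data/answers.csv", index=False)
pd.DataFrame({"label": [1, 2, 3, 9, 9]}).to_csv("Data/submission.csv", index=False)

result = evaluate(None, "Data/submission.csv", "test-mnist2018")
print(result)
```

The printed dict should have the same structure as the example above, with the score nested under the split codename.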
Finally, I zipped the contents of the evaluation_script folder into evaluation_script.zip, zipped the contents of the parent folder into mnist_challenge.zip, and submitted it to the examples/ folder in the EvalAI repository. Here is the link to my pull request.
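The zipping step can also be scripted with the standard library. One detail worth noting is that the contents of the folder (not the folder itself) should sit at the root of the archive, which shutil.make_archive's root_dir argument handles. A sketch, where the main.py file name is just a placeholder for the real script:

```python
import os
import shutil
import zipfile

# Hypothetical layout: an evaluation_script/ folder holding the script
os.makedirs("evaluation_script", exist_ok=True)
with open("evaluation_script/main.py", "w") as f:  # main.py is a stand-in name
    f.write("# evaluation code goes here\n")

# root_dir makes the *contents* of the folder sit at the top of the archive
shutil.make_archive("evaluation_script", "zip", root_dir="evaluation_script")

with zipfile.ZipFile("evaluation_script.zip") as z:
    print(z.namelist())  # entries at the archive root, not nested in a folder
```

The same call with a different base name and root_dir produces mnist_challenge.zip from the parent folder.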