I started by going through this documentation about challenge creation. Creating a challenge involves writing a YAML configuration file for it.
Before we dive into the configuration, we need to establish a few things about the platform. Each challenge is split into phases decided by the host (us in this scenario). The host can have different dataset splits for different challenge phases. The host can set the visibility of the challenge phases, and of the leaderboards, to HOST, OWNER & HOST, or PUBLIC. The number and types of columns in the leaderboard are also configurable.
Let me use the configuration I created for the challenge and go through it line by line:
title: MNIST Challenge 2018
short_description: The MNIST database is a good digit image database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
description: description.html
evaluation_details: evaluation_details.html
terms_and_conditions: terms_and_conditions.html
image: mnist.png
submission_guidelines: submission_guidelines.html
evaluation_script: evaluation_script.zip
start_date: 2018-03-15 20:00:00
end_date: 2018-12-30 20:00:00
leaderboard:
  - id: 1
    schema: {"labels": ["test_score"], "default_order_by": "test_score"}
challenge_phases:
  - id: 1
    name: test-mnist2018
    description: challenge_description.html
    leaderboard_public: True
    is_public: True
    start_date: 2018-03-15 20:00:00
    end_date: 2018-12-30 20:00:00
    test_annotation_file: test_annotation.txt
    codename: test-mnist2018
    max_submissions_per_day: 10
    max_submissions: 999
dataset_splits:
  - id: 1
    name: split1
    codename: split1
  - id: 2
    name: split2
    codename: split2
challenge_phase_splits:
  - challenge_phase_id: 1
    leaderboard_id: 1
    dataset_split_id: 1
    visibility: 3
  - challenge_phase_id: 1
    leaderboard_id: 1
    dataset_split_id: 2
    visibility: 3
- Many of the keys in this configuration are self-explanatory, like title, short_description, start_date and end_date.
- The keys description, evaluation_details, terms_and_conditions and submission_guidelines expect HTML file names as their values. These HTML files contain relevant information about the challenge, such as detailed instructions, links to the datasets, the evaluation criteria, a sample submission format etc. The HTML files used for this config can be found here.
- The image key expects an image that will be used as the logo for the challenge on the platform. The evaluation_script key expects a zip file that contains the evaluation script; this is discussed in detail in the upcoming section.
- The leaderboard key expects the schema of the leaderboard. It should contain the column names that will be present in the leaderboard, and it also expects the host to specify which key the results in the leaderboard will be sorted on. For our example, I just created one column, test_score, in the leaderboard. One can create multiple kinds of leaderboards (for different challenge phases), but I just created one.
- The challenge_phases key expects multiple keys like name, codename, description, start_date, end_date, is_public, leaderboard_public and test_annotation_file. All the fields are self-explanatory except test_annotation_file: this file is used for ranking the submissions made by participants. An annotation file can be shared by more than one challenge phase. I just created one phase for our challenge and marked everything as public.
- The dataset_splits key expects an id, name and codename for each of the dataset splits that challenge submissions will be evaluated on. I created two splits for the dataset, split1 and split2, corresponding to the train and test data.
- Finally, the challenge_phase_splits key holds the relation between challenge phases and dataset splits for a challenge (a many-to-many relation). I created two entries in this key, making both dataset splits available to the only challenge phase and leaderboard that we have.
I used the MNIST dataset from this challenge on Kaggle.
Here are the download links for the dataset.
I used just the train.csv from the link above and split it up using sklearn's train_test_split function. I prepared the final training.csv (dataset for training, with labels), testing.csv (dataset for evaluating models, without labels), answers.csv (labels of the testing dataset) and submission.csv (sample submission file) from this dataset. The step-by-step preparation of these csv files can be found here. I saved all of these csv files in the Data/ folder.
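The split described above can be sketched roughly as follows. This is a self-contained illustration: the tiny fabricated DataFrame stands in for Kaggle's train.csv (which has a label column plus 784 pixel columns), and the random sample submission is an assumption about how the real one was built:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for Kaggle's train.csv: a label column plus a few pixel columns
rng = np.random.RandomState(0)
train = pd.DataFrame(rng.randint(0, 256, size=(100, 4)),
                     columns=[f"pixel{i}" for i in range(4)])
train.insert(0, "label", rng.randint(0, 10, size=100))

# Hold out a portion whose labels become the hidden answer file
train_part, test_part = train_test_split(train, test_size=0.2, random_state=42)

train_part.to_csv("training.csv", index=False)                      # with labels
test_part.drop(columns="label").to_csv("testing.csv", index=False)  # without labels
test_part[["label"]].to_csv("answers.csv", index=False)             # hidden labels

# Sample submission: random labels in the same format as answers.csv
sample = test_part[["label"]].copy()
sample["label"] = rng.randint(0, 10, size=len(sample))
sample.to_csv("submission.csv", index=False)
```

The key point is that testing.csv and answers.csv come from the same rows, so a submission's i-th prediction lines up with the i-th hidden label.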
The next part was writing the evaluation script. The documentation for this can be found here. I used the following code block to see if the sample submission file works as expected:
import numpy as np
import pandas as pd

userfilename = "Data/submission.csv"
answerfilename = "Data/answers.csv"
user = pd.read_csv(userfilename)
answers = pd.read_csv(answerfilename)
matches = 0
for i in range(0, len(user)):
    if user.iloc[i]['label'] == answers.iloc[i]['label']:
        matches = matches + 1
print("Score:", (matches / len(user)) * 100)
And I got a score of about 10%, which is very close to what picking a random label for each test image would give. Since the sample submission file contains randomly assigned labels, the submission file seems to be working as expected if we use this block of code in our evaluation function.
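As a side note, the row-by-row loop above can be replaced by a single vectorized comparison in pandas, which is much faster on large submissions. This is a sketch using tiny in-memory stand-ins for the two csv files, assuming both have a label column in the same row order:

```python
import pandas as pd

# Tiny in-memory stand-ins for submission.csv and answers.csv
user = pd.DataFrame({"label": [3, 1, 4, 1, 5]})
answers = pd.DataFrame({"label": [3, 1, 4, 2, 6]})

# Element-wise equality; the mean of the boolean array is the accuracy
score = (user["label"].values == answers["label"].values).mean() * 100
print("Score:", score)  # 3 of 5 labels match, so 60.0
```

The same one-liner drops into the evaluation function in place of the loop without changing the result.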
Using the code-block mentioned above, I finally ended up with the following evaluation function:
import numpy as np
import pandas as pd

def evaluate(annFile, resFile, phase_codename):
    # We will not use the annFile at all for evaluation.
    # Since we have two dataset splits and only one phase, let's just
    # hard-code the dataset split for now. split2 is the dataset split
    # we want the results to be published for.
    split_codename = "split2"
    userfilename = resFile
    answerfilename = "Data/answers.csv"
    user = pd.read_csv(userfilename)
    answers = pd.read_csv(answerfilename)
    submission_result = ""
    result = {}
    result['result'] = []
    result["submission_result"] = submission_result
    if len(user) != len(answers):
        submission_result = ("Number of rows in the answer file ("
                             + str(len(answers)) + ") and the submission file ("
                             + str(len(user)) + ") don't match.")
        result["submission_result"] = submission_result
        return result
    temp = {}
    temp[split_codename] = {}
    matches = 0
    for i in range(0, len(user)):
        if user.iloc[i]['label'] == answers.iloc[i]['label']:
            matches = matches + 1
    score = (matches / len(user)) * 100
    print("Score:", score)
    temp[split_codename]['score'] = score
    result['result'].append(temp)
    submission_result = ("Evaluated scores for the phase "
                         + str(phase_codename) + ". Score=" + str(score))
    result["submission_result"] = submission_result
    return result
The result returned by this function will look something like this:
{
    'result': [
        {
            <split-codename>: {
                'score': 80
            }
        }
    ],
    'submission_result': 'Evaluated scores for the phase <phase-codename>. Score=80'
}
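Before uploading, it is worth exercising the function locally against a tiny fabricated answer/submission pair and checking the shape of the returned dict. The sketch below inlines a condensed copy of the evaluation logic so it is self-contained; the fabricated csv contents are my own stand-ins:

```python
import os
import pandas as pd

def evaluate(annFile, resFile, phase_codename):
    # Condensed copy of the evaluation function described above
    split_codename = "split2"
    user = pd.read_csv(resFile)
    answers = pd.read_csv("Data/answers.csv")
    result = {"result": [], "submission_result": ""}
    if len(user) != len(answers):
        result["submission_result"] = "Row counts don't match."
        return result
    matches = (user["label"].values == answers["label"].values).sum()
    score = (matches / len(user)) * 100
    result["result"].append({split_codename: {"score": score}})
    result["submission_result"] = ("Evaluated scores for the phase "
                                   + str(phase_codename) + ". Score=" + str(score))
    return result

# Fabricate a tiny answers/submission pair: 3 of 5 labels match
os.makedirs("Data", exist_ok=True)
pd.DataFrame({"label": [1, 2, 3, 4, 5]}).to_csv("Data/answers.csv", index=False)
pd.DataFrame({"label": [1, 2, 3, 9, 9]}).to_csv("Data/submission.csv", index=False)

result = evaluate(None, "Data/submission.csv", "test-mnist2018")
print(result)
```

The printed dict should have the same structure as the example above, with the score nested under the split codename.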
Finally, I zipped the contents of the evaluation_script folder into evaluation_script.zip, zipped the contents of the parent folder into mnist_challenge.zip, and submitted it to the examples/ folder in the EvalAI repository. Here is the link to my pull request.
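The zipping step can also be scripted with the standard library. One detail worth noting is that the contents of the folder (not the folder itself) should sit at the root of the archive, which shutil.make_archive's root_dir argument handles. A sketch, where the main.py file name is just a placeholder for the real script:

```python
import os
import shutil
import zipfile

# Hypothetical layout: an evaluation_script/ folder holding the script
os.makedirs("evaluation_script", exist_ok=True)
with open("evaluation_script/main.py", "w") as f:  # main.py is a stand-in name
    f.write("# evaluation code goes here\n")

# root_dir makes the *contents* of the folder sit at the top of the archive
shutil.make_archive("evaluation_script", "zip", root_dir="evaluation_script")

with zipfile.ZipFile("evaluation_script.zip") as z:
    print(z.namelist())  # entries at the archive root, not nested in a folder
```

The same call with a different base name and root_dir produces mnist_challenge.zip from the parent folder.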