@AshStuff
Last active July 20, 2016 00:57
# State Farm Distracted Driver Detection

## Data Set Information

## 1. How many images in total?

Training images = 22424, test images = 79726.

## 2. How many images for each class?

Approximately 2200 images per class; 10 classes in total.

## 3. How many drivers are there?

10 drivers.

## 4. Is it possible that the driver ID is related to the classification accuracy?

No, I don't think so. The driver ID is only used to group images by driver, not to assign the behavior class.

## 5. What is the original size of the images?

640 x 480.

## Preprocessing images

1. Images are cropped to `image[70:, 100:550]`. Reason: this crop keeps mostly the driver in most of the images, which can reduce variance.
2. Images are resized to 224 x 224 so they fit the VGG and GoogLeNet input sizes.
3. Images are randomly rotated with an angle of 10 degrees. Not sure about the exact reason (likely data augmentation).
4. Images are shuffled so that learning is not biased by the ordering.
5. 5-fold cross-validation is used, with a random shuffle in each fold.
6. 10% of the training data is held out for validation and 90% is used for training.

## Experiments: Settings and Results

- Model 1: VGG-16 + dropout (pre-trained), epochs = 30, batch size = 32, lr = 0.0001
- Model 2: GoogLeNet with batch normalization (pre-trained), epochs = 30, batch size = 32, lr = 0.0001

VGG-16 validation accuracy, averaged over the 5 folds: 98%. GoogLeNet + BN, averaged over the 5 folds: 99%.

## Evaluation

Evaluation method: multi-class log loss.
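The preprocessing steps (crop, resize, rotate) listed above can be sketched as follows. This is a minimal illustration, assuming images arrive as 480 x 640 x 3 NumPy arrays and using `scipy.ndimage` for resizing and rotation; the actual pipeline may have used a different image library.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def preprocess(img, angle_range=10):
    """Sketch of the preprocessing pipeline described above.

    img: array of shape (480, 640, 3), e.g. uint8 pixels.
    """
    # 1. crop to image[70:, 100:550] -> keeps mostly the driver in frame
    img = img[70:, 100:550]

    # 2. resize to 224 x 224 (per-axis zoom factors; channel axis unchanged)
    fy = 224 / img.shape[0]
    fx = 224 / img.shape[1]
    img = zoom(img, (fy, fx, 1), order=1)

    # 3. random rotation within +/- angle_range degrees (augmentation);
    # whether the original used a fixed or random angle is an assumption
    angle = np.random.uniform(-angle_range, angle_range)
    img = rotate(img, angle, reshape=False, mode='nearest')
    return img
```

Cropping before resizing keeps the aspect-ratio distortion small, since the 450 x 410 crop is closer to square than the full 640 x 480 frame.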
The leaderboard is calculated on approximately 31% of the test data. The final results will be based on the other 69%, so the final standings may be different.

Evaluation formula (multi-class log loss, averaged over samples):

```python
import numpy as np

eps = 1e-15  # typical clipping constant, keeps log() finite at 0 and 1

# y_true = true labels (integer class indices)
# y_pred = predicted class probabilities, shape (n_samples, n_classes)
predictions = np.clip(y_pred, eps, 1 - eps)

# normalize row sums to 1
predictions /= predictions.sum(axis=1)[:, np.newaxis]

# one-hot encode the true labels
actual = np.zeros(y_pred.shape)
rows = actual.shape[0]
actual[np.arange(rows), y_true.astype(int)] = 1

vsota = np.sum(actual * np.log(predictions))
log_loss = -1.0 / rows * vsota
```
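The same computation can be wrapped in a self-contained function with a toy sanity check; the `eps` default of `1e-15` is an assumed (but common) clipping value.

```python
import numpy as np

def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Multi-class log loss: mean negative log-probability of the true class."""
    predictions = np.clip(y_pred, eps, 1 - eps)
    predictions /= predictions.sum(axis=1)[:, np.newaxis]
    rows = y_pred.shape[0]
    actual = np.zeros(y_pred.shape)
    actual[np.arange(rows), y_true.astype(int)] = 1
    return -np.sum(actual * np.log(predictions)) / rows

# toy check: confident, mostly-correct predictions give a small loss
y_true = np.array([0, 1])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.8, 0.1]])
print(multiclass_log_loss(y_true, y_pred))
```

A perfect, fully confident prediction would score close to 0; a uniform guess over 10 classes scores -log(0.1) ≈ 2.30, which puts the 1.69 score below in context.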

Final evaluation score = 1.69 (with the VGG model). An ensemble of VGG and Inception + BN is running and yet to be evaluated.
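One simple way to build the VGG + Inception ensemble mentioned above is to average the two models' predicted probability matrices. The arrays below are placeholder values, not real model outputs; whether the actual ensemble averaged probabilities or used another combination rule is an assumption.

```python
import numpy as np

# placeholder per-class probabilities from two hypothetical models,
# one row per test image, one column per class
vgg_probs = np.array([[0.70, 0.20, 0.10],
                      [0.20, 0.50, 0.30]])
googlenet_probs = np.array([[0.60, 0.30, 0.10],
                            [0.10, 0.70, 0.20]])

# average the probabilities; rows still sum to 1, so the result is a
# valid probability distribution and can be submitted directly
ensemble_probs = (vgg_probs + googlenet_probs) / 2.0
print(ensemble_probs)
```

Averaging probabilities tends to help log loss in particular, because it pulls overconfident wrong predictions (which log loss punishes heavily) toward safer values.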
