@mdvsh
Created January 12, 2020 05:05
Creating a model to "judge a book by its cover". | GCI Julia
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Judge books by their cover using Flux.jl \n",
"## @author : PseudoCodeNerd\n",
"\n",
"> (Don't) judge a book by its cover.\n",
"\n",
"### Task Description\n",
"Create a machine learning model to predict the category of a book from its cover image.\n",
"This task is inspired by this [paper](https://arxiv.org/pdf/1610.09204.pdf). Your task is to use the Flux machine learning library to predict the category of books in this dataset based on their cover images.\n",
"\n",
"You can find the Flux documentation [here](https://fluxml.ai/Flux.jl/stable/) and sample models for image categorization in the model zoo. We recommend starting with a simple model like [this](https://github.com/FluxML/model-zoo/blob/master/vision/mnist/mlp.jl) one and then optionally using a more complex one if you are interested.\n",
"\n",
"### Aim : \n",
"In this notebook, I'll attempt to judge a book by its cover (sorry, Mom!). Pretty simple, right? I think not...\n",
"Shoutout to Akshat Mehortra and Mudit Somani for their helpful message in the GCI Slack.\n",
"\n",
"## 1. Importing required libraries.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"using Flux\n",
"using CSV, Images, FileIO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Getting the data\n",
"Data is sourced from [The Book Dataset](https://github.com/uchidalab/book-dataset). We'll use `CSV` to read the training listing into a variable; `FileIO` will load the cover images later.\n",
"It would have been better if the researchers had also provided a Julia script to download the full images. I'll try writing one myself when I get some free time.\n",
"\n",
"Data Courtesy : \n",
"> B. K. Iwana, S. T. Raza Rizvi, S. Ahmed, A. Dengel, and S. Uchida, \"Judging a Book by its Cover,\" arXiv preprint arXiv:1610.09204 (2016)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"data_train_csv = CSV.File(\"book30-listing-train.csv\");"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7-element CSV.Row{false}:\n",
" \"520271181\" \n",
" \"0520271181.jpg\" \n",
" \"http://ecx.images-amazon.com/images/I/51s8awrmTRL.jpg\" \n",
" \"Becoming Dr. Q: My Journey from Migrant Farm Worker to Brain Surgeon\"\n",
" \"Alfredo Quinones-Hinojosa\" \n",
" 1 \n",
" \"Biographies & Memoirs\" "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_train_csv[42]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So we can see that every item (or row here) is of the form,\n",
"\n",
"**ID | FileName | Image URL | Title | Author | CategoryNum | Category**\n",
"\n",
"From the dataset README on GitHub, we learn that there are 30 categories of books, each with 1,710 training and 190 test images.\n",
"\n",
"**Total Number of images : 51,300 (Train) | 5,700 (Test)** "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Data pre-processing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our model will accept each image as a flat vector of floating-point values. I'll also convert it to greyscale, a common simplification in image classification workflows that reduces the input from three colour channels to one."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"grey_arr (generic function with 1 method)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function grey_arr(img)\n",
" return vec(Float64.(Gray.(img)))\n",
"end"
]
},
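{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of what `grey_arr` does to the pixel grid: Base Julia's `vec` flattens a matrix column by column, so a 224×224 greyscale image becomes a vector of 50176 values. Below is a minimal sketch on a toy 2×2 matrix (the matrix `m` is made up for illustration and is not part of the dataset).\n",
"\n",
"```julia\n",
"m = [0.1 0.2; 0.3 0.4]   # toy stand-in for a 2×2 greyscale image\n",
"flat = vec(m)            # [0.1, 0.3, 0.2, 0.4]: columns are laid end to end\n",
"n_inputs = length(vec(zeros(224, 224)))  # 50176, the input size of the first Dense layer\n",
"```\n",
"\n",
"Because the flattening is column-major, neighbouring pixels in a row end up 224 positions apart in the vector, which is one reason a dense network sees little spatial structure."
]
},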
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating batches of training images with `Flux.batch` and encoding the book categories into a second array with `Flux.onehot`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"batcher (generic function with 1 method)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
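"function batcher(size)\n",
"    # Load each cover and convert it to a flat greyscale vector\n",
"    images = [grey_arr(load(\"./data/$(data_train_csv[i][2])\")) for i in 1:size]\n",
"    # CategoryNum is 0-based, so add 1 to match Julia's 1-based indexing\n",
"    labels = [Flux.onehot(data_train_csv[i][6] + 1, 1:30) for i in 1:size]\n",
"    # Stack the samples so that each column holds one image / one label\n",
"    return (Flux.batch(images), Flux.batch(labels))\n",
"end"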
]
},
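{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the label encoding concrete, here is a minimal sketch in plain Julia (no Flux needed) of the `+1` label shift and the column-wise stacking that `batcher` relies on. The helpers `onehot_manual` and `batch_manual` are hypothetical stand-ins written for this illustration, not Flux's own code, and the sketch uses 5 classes instead of 30 to keep things small.\n",
"\n",
"```julia\n",
"# Hypothetical stand-in for Flux.onehot: a Bool vector with a single true entry.\n",
"# `category` is the dataset's 0-based CategoryNum, hence the +1.\n",
"onehot_manual(category, nclasses) = [i == category + 1 for i in 1:nclasses]\n",
"\n",
"# Hypothetical stand-in for Flux.batch on vectors: one column per sample.\n",
"batch_manual(vectors) = reduce(hcat, vectors)\n",
"\n",
"labels = batch_manual([onehot_manual(c, 5) for c in (0, 3, 4)])\n",
"size(labels)             # (5, 3): classes along rows, samples along columns\n",
"findfirst(labels[:, 2])  # 4, i.e. category 3 shifted by one\n",
"```\n",
"\n",
"The same layout applies to the images: every column of the image matrix is one flattened cover, and the column with the same index in the label matrix is its one-hot genre."
]
},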
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Making batches of 2000 and 1000 book images using our newly created function."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"trainbatch = batcher(2000);\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"([0.7568627450980392 0.6 … 0.4117647058823529 0.12549019607843137; 0.6862745098039216 0.47058823529411764 … 0.4117647058823529 0.12549019607843137; … ; 0.8 0.22352941176470587 … 0.12941176470588234 0.2980392156862745; 0.9607843137254902 0.2784313725490196 … 0.12941176470588234 0.3529411764705882], Bool[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 1 0; 0 0 … 0 1])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"trainbatch_2 = batcher(1000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Setting up our model, defining hyperparameters, adding loss, accuracy and optimiser functions. \n",
"Each cover image has dimensions `224x224x3`; after the greyscale conversion it is a `224x224` array, which `grey_arr` flattens into a vector of 50176 values that we feed to our vanilla neural network. The expected output is one of the 30 book-genre labels.\n",
"\n",
"Therefore,\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"const alpha = 0.000075;\n",
"const epoch = 20;\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'm using a neural network with 3 dense layers, as my fellow peers at GCI said that they were unable to get a convolutional network to work.\n",
"\n",
"`relu` (applied here to the first layer) is my go-to activation function for image classification tasks. Its gradient does not saturate for positive inputs, which greatly accelerates the convergence of stochastic gradient descent compared to the sigmoid / tanh functions.\n",
"\n",
"`softmax` returns a 30-element array with the predicted probability of each label."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chain(Dense(50176, 512, relu), Dense(512, 64), Dense(64, 30), softmax)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = Chain(Dense(224*224, 512, relu),\n",
"Dense(512, 64),\n",
"Dense(64, 30), softmax,\n",
")"
]
},
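{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the model size, the arithmetic below counts the trainable parameters: a `Dense` layer with `nin` inputs and `nout` outputs holds an `nout × nin` weight matrix plus `nout` biases. The helper `dense_params` is a hypothetical name used only for this sketch.\n",
"\n",
"```julia\n",
"# Parameters of one Dense(nin, nout) layer: weights plus biases.\n",
"dense_params(nin, nout) = nin * nout + nout\n",
"\n",
"layers = [(224 * 224, 512), (512, 64), (64, 30)]\n",
"total = sum(dense_params(nin, nout) for (nin, nout) in layers)\n",
"# total == 25_725_406, almost all of it in the first layer\n",
"```\n",
"\n",
"With roughly 25.7 million parameters and only a few thousand training images per batch, the first layer dominates both memory use and the risk of overfitting."
]
},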
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"using Flux: onehotbatch, crossentropy, throttle\n",
"using Statistics"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"mod_cb (generic function with 1 method)"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"optim = ADAM(alpha);\n",
"loss(x,y) = Flux.crossentropy(model(x), y);\n",
"acc(a,b) = mean(Flux.onecold(model(a)).== Flux.onecold(b));\n",
"function mod_cb()\n",
" c_acc = acc(trainbatch_2...)\n",
" c_loss = loss(trainbatch_2...)\n",
" print(\"Current Accuracy: \", string(c_acc), \" | Current Loss : \", string(c_loss), \" ;\\n\")\n",
"end"
]
},
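{
"cell_type": "markdown",
"metadata": {},
"source": [
"To help read the loss values printed during training, here is a small plain-Julia sketch of softmax and cross-entropy for a single sample. The functions `softmax_manual` and `crossentropy_manual` are hypothetical re-implementations for illustration, not Flux's own code. A model that assigns equal probability to all 30 genres has a loss of `log(30) ≈ 3.40`, so a loss below that value suggests the network has learned something beyond uniform guessing.\n",
"\n",
"```julia\n",
"# Hypothetical re-implementation of softmax for one vector of scores.\n",
"function softmax_manual(z)\n",
"    e = exp.(z .- maximum(z))  # subtract the max for numerical stability\n",
"    return e ./ sum(e)\n",
"end\n",
"\n",
"# Cross-entropy between predicted probabilities p and a one-hot target y.\n",
"crossentropy_manual(p, y) = -sum(y .* log.(p))\n",
"\n",
"p = softmax_manual(zeros(30))   # equal scores give a uniform prediction\n",
"y = [i == 7 for i in 1:30]      # one-hot target for an arbitrary class\n",
"chance_loss = crossentropy_manual(p, y)\n",
"chance_loss ≈ log(30)           # true\n",
"```"
]
},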
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Training process"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Current Accuracy: 0.075 | Current Loss : 7.827263 ;Current Accuracy: 0.057 | Current Loss : 7.081502 ;Current Accuracy: 0.04 | Current Loss : 5.475811 ;Current Accuracy: 0.049 | Current Loss : 4.304302 ;"
]
}
],
"source": [
"Flux.train!(loss, params(model), Iterators.repeated(trainbatch_2, 10), optim, cb = Flux.throttle(mod_cb, 10))"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Current Accuracy: 0.083 | Current Loss : 4.1588893 ;Current Accuracy: 0.082 | Current Loss : 3.8012412 ;Current Accuracy: 0.07 | Current Loss : 3.4713938 ;Current Accuracy: 0.102 | Current Loss : 3.3677185 ;"
]
}
],
"source": [
"Flux.train!(loss, params(model), Iterators.repeated(trainbatch, 10), optim, cb = Flux.throttle(mod_cb, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the accuracy roughly doubled. Let's train it further and also increase the number of iterations."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Current Accuracy: 0.101 | Current Loss : 3.3449063 ;Current Accuracy: 0.103 | Current Loss : 3.2926059 ;Current Accuracy: 0.123 | Current Loss : 3.228091 ;Current Accuracy: 0.141 | Current Loss : 3.1849866 ;Current Accuracy: 0.142 | Current Loss : 3.1404302 ;Current Accuracy: 0.137 | Current Loss : 3.1053653 ;Current Accuracy: 0.156 | Current Loss : 3.0754461 ;Current Accuracy: 0.154 | Current Loss : 3.0544689 ;Current Accuracy: 0.166 | Current Loss : 3.0326622 ;Current Accuracy: 0.181 | Current Loss : 3.0075598 ;Current Accuracy: 0.193 | Current Loss : 2.9763196 ;Current Accuracy: 0.192 | Current Loss : 2.9434323 ;Current Accuracy: 0.216 | Current Loss : 2.920823 ;Current Accuracy: 0.227 | Current Loss : 2.893316 ;Current Accuracy: 0.232 | Current Loss : 2.8663476 ;Current Accuracy: 0.253 | Current Loss : 2.8385205 ;Current Accuracy: 0.255 | Current Loss : 2.8103878 ;"
]
}
],
"source": [
"trainbatch_3 = batcher(3000)\n",
"Flux.train!(loss, params(model), Iterators.repeated(trainbatch, 50), optim, cb = Flux.throttle(mod_cb, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
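"We get a training accuracy of **25.5%** (measured on `trainbatch_2` by the callback), which is swell for a 30-class problem where uniform guessing would give about 3.3%."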
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Testing Time\n"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3.0033288f0"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss(trainbatch_3...)\n"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.19166666666666668"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"acc(trainbatch_3...)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loading a new image from the test listing and predicting its label."
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"224×224 Array{RGB{N0f8},2} with eltype RGB{Normed{UInt8,8}}:\n",
" RGB{N0f8}(0.145,0.192,0.294) … RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) … RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) … RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" ⋮ ⋱ \n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) … RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) … RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)\n",
" RGB{N0f8}(0.145,0.192,0.294) RGB{N0f8}(0.231,0.439,0.337)"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
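"# Assumed file name, mirroring the training listing; data_test_csv was not defined earlier\n",
"data_test_csv = CSV.File(\"book30-listing-test.csv\");\n",
"load(\"./data/$(data_test_csv[7][2])\")"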
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7-element CSV.Row{false}:\n",
" \"521456924\" \n",
" \"0521456924.jpg\" \n",
" \"http://ecx.images-amazon.com/images/I/41n7iZq-0jL.jpg\" \n",
" \"Diagrammatica: The Path to Feynman Diagrams (Cambridge Lecture Notes in Physics)\"\n",
" \"Martinus Veltman\" \n",
" 23 \n",
" \"Science & Math\" "
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_test_csv[7]"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30-element Array{Float32,1}:\n",
" 0.027294824 \n",
" 0.008443545 \n",
" 0.032920413 \n",
" 0.008069489 \n",
" 0.016592907 \n",
" 0.010181716 \n",
" 0.13866615 \n",
" 0.03892814 \n",
" 0.02634485 \n",
" 0.03132174 \n",
" 0.0062278663\n",
" 0.04601992 \n",
" 0.008348866 \n",
" ⋮ \n",
" 0.025657153 \n",
" 0.010952779 \n",
" 0.0171675 \n",
" 0.06719829 \n",
" 0.010065774 \n",
" 0.0694461 \n",
" 0.02233742 \n",
" 0.034847874 \n",
" 0.024896467 \n",
" 0.01961776 \n",
" 0.01895972 \n",
" 0.042962853 "
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_arr = model(grey_arr(load(\"./data/$(data_test_csv[7][2])\")))"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.13866615f0"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"maxval = maximum(output_arr)"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Int64,1}:\n",
" 7"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findall(x -> x==maxval, output_arr)\n"
]
},
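{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two steps above (`maximum` followed by `findall`) can also be collapsed into one call to Base Julia's `argmax`, which returns the index of the largest element. A minimal sketch on a made-up probability vector (the values are illustrative, not model output):\n",
"\n",
"```julia\n",
"probs = [0.1, 0.05, 0.6, 0.25]            # made-up probabilities for 4 classes\n",
"predicted_index = argmax(probs)           # 3, the 1-based position of the maximum\n",
"predicted_category = predicted_index - 1  # back to the dataset's 0-based CategoryNum\n",
"```\n",
"\n",
"Subtracting 1 undoes the shift applied when the labels were one-hot encoded, so the result can be compared directly with the CategoryNum column of the listing."
]
},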
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Index 7 of the 1-based output array corresponds to category number 6, which according to the dataset's labels and categories is **Computers & Technology**; however, the true label is Science & Math. Pretty close, I must say.\n",
"\n",
"\n",
"## Thank You!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.3.0",
"language": "julia",
"name": "julia-1.3"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.3.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}