@Terryhung
Created August 12, 2015 16:05
\section{Experiment}
Our fine-tuning starts from a pretrained model, the BVLC CaffeNet model. CaffeNet is a modified version of AlexNet, and the released model is the result of training CaffeNet on ImageNet. We use the fine-tuned network as our baseline.
We use the Clickture-FilteredDog dataset from Microsoft, a subset of the Clickture-Full dataset that contains only dog-breed-related items. From this subset we select the 107 classes that contain more than 100 images each, for a total of 89,910 images. We split the dataset with a 5-fold scheme: 71,932 images for training and 17,978 for testing.
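As a quick consistency check on these counts, the training and test parts sum to the full set, and the test part is about one fifth of it, as expected for a single fold of a 5-fold split:
\[
71{,}932 + 17{,}978 = 89{,}910, \qquad \frac{17{,}978}{89{,}910} \approx 0.200 .
\]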
Our results on Clickture-FilteredDog are reported in Table 1 and Table 2. Our network achieves an accuracy of \textbf{50.5\%}, whereas the best performance with fine-tuning alone is 46.2\%.
In Table 1, our first approach, the average vector, does not exceed the fine-tuning baseline. The MMD loss stops decreasing after 7000? iterations, so we believe that averaging discards some of the information in the text, which keeps the performance from improving over fine-tuning. In the t-SNE visualization, we find that some different dog breeds are clustered together, so we conclude that the average-vector feature loses useful textual information because it averages the word2vec vectors.
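For reference, the two quantities discussed above can be sketched as follows. The notation ($w_i$, $n$, $\phi$, $X$, $T$, $\lambda$, $\mathcal{L}_{\mathrm{cls}}$) is introduced here for illustration only, and the combined objective is the common form for MMD-regularized fine-tuning, assumed here rather than taken from the actual implementation. For a text whose words have word2vec vectors $w_1, \dots, w_n$, the average-vector feature is
\[
\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i ,
\]
so any two texts whose word vectors have the same mean receive the same feature, which is one way information can be lost. Assuming the MMD term compares a batch of image representations $X$ with the corresponding text features $T$ through a feature map $\phi$, the standard empirical estimate and a weighted objective would read
\[
\mathrm{MMD}(X, T) = \left\| \frac{1}{|X|} \sum_{x \in X} \phi(x) - \frac{1}{|T|} \sum_{t \in T} \phi(t) \right\| ,
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda \, \mathrm{MMD}^{2}(X, T) ,
\]
where $\mathcal{L}_{\mathrm{cls}}$ is the classification loss and $\lambda$ plays the role of the MMD loss weight listed in the table headers.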
In Table 2, most of our results surpass the baseline. Networks using the VLAD feature improve on the baseline by about 4\% in absolute accuracy (50.5\% versus 46.2\%). The VLAD feature gives our network a clear improvement over the average-vector method: it turns 12\% of the average-vector method's erroneous predictions into correct ones and 8\% of its correct predictions into errors. %The loss of MMD can be lower than average vector method, it reflect that those images with dissimilar text will be pull away.
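As a reminder of why VLAD can retain more information than a plain average, the standard VLAD encoding can be sketched as follows. The symbols $c_k$, $K$, $w_i$ and $v_k$ are notation introduced here, and applying the encoding to the word2vec vectors of a text is an assumption about the setup rather than a description of the exact implementation. Given a codebook of $K$ centers $c_1, \dots, c_K$ (for example from $k$-means) and the vectors $w_1, \dots, w_n$ of one text, each vector is assigned to its nearest center and the residuals are accumulated per center:
\[
v_k = \sum_{i \,:\, \mathrm{NN}(w_i) = c_k} \left( w_i - c_k \right), \qquad k = 1, \dots, K ,
\qquad
v = \left[ v_1^{\top}, \dots, v_K^{\top} \right]^{\top} .
\]
Unlike the single mean $\frac{1}{n}\sum_{i} w_i$, the concatenated vector $v$ keeps a separate residual for each region of the embedding space, so texts with the same overall mean can still receive different features.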
\begin{table}[]
\centering
\resizebox{!}{!}{
\begin{tabular}{|l|l|l|l|l|}
\hline
Text Feature & 0.25* & 0.1* & 0.25 & 0.1 \\ \hline
baseline & 46.2\% & 46.2\% & 46.2\% & 46.2\% \\ \hline
avg-vec & & & & \\ \hline
\end{tabular} }
\caption{Column headers give the weight of the MMD loss; weights marked with an asterisk (*) denote the network without fc\_adapt.}
\label{my-label}
\end{table}
\begin{table}[]
\centering
\resizebox{88mm}{!}{
\begin{tabular}{|l|l|l|l|l|l|l|l|}
\hline
Text Feature & 0.25 & 0.1 & 0.05 & 0.01 & 0.005 & 0.001 & 0.0005 \\ \hline
baseline & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% \\ \hline
vlad-1024 & 45.8\% & 48.0\% & 48.4\% & 49.6\% & 49.7\% & 49.9\% & \\ \hline
vlad-2048 & 43.0\% & 47.4\% & 48.8\% & 50.0\% & 50.0\% & 50.2\% & \\ \hline
vlad-4000 & & & 48.2\% & 49.9\% & 50.1\% & 50.1\% & 50.2\% \\ \hline
vlad-2048* & 43.1\% & 47.6\% & 48.3\% & 50.0\% & 50.0\% & 50.2\% & \\ \hline
vlad-4000* & & & 48.4\% & 50.0\% & 50.0\% & 50.3\% & \textbf{50.5\%} \\ \hline
\end{tabular} }
\caption{Column headers give the weight of the MMD loss. Text features marked with an asterisk (*) use ``local vlad''.}
\end{table}