
\begin{table}[t!]
\centering
\caption{Examples of correct and failure cases.}
\label{table:dog_compare}
\begin{adjustbox}{max width=90mm}
\begin{tabular}{|l|>{\centering\arraybackslash}m{2in}|}
\hline
& \quad \quad Border Collie \newline \raisebox{-\totalheight}{\includegraphics[scale=0.2]{correction_border_collie}} \vfill \\ \hline
baseline & \quad \quad Australian Shepherd \newline \raisebox{-\totalheight}{\includegraphics[scale=0.2]{correction_australian_shepherd_1}} \vfill \\ \hline
\end{tabular}
\end{adjustbox}
\end{table}
\section{Experiment}
Our fine-tuning starts from an already trained model, the BVLC CaffeNet model. CaffeNet is a slightly modified version of AlexNet, obtained by training on ImageNet. We take the fine-tuned result as our baseline.
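The fine-tuning setup above can be sketched as follows. This is a hypothetical, framework-agnostic illustration (the paper uses Caffe with the BVLC CaffeNet model): the pretrained 1000-way ImageNet classifier (\texttt{fc8} in CaffeNet) is replaced by a freshly initialized head for the 107 dog-breed classes, while the earlier layers start from the pretrained weights and are updated with a smaller learning-rate multiplier. Layer names, the toy 4-dimensional feature size, and the multiplier values are assumptions for illustration only.

```python
import random

def build_finetune_model(pretrained, num_classes=107, seed=0):
    """Sketch of fine-tuning from a pretrained network.

    `pretrained` maps layer names to weight lists. The old classifier
    head ('fc8') is dropped and replaced by a new, randomly initialized
    head sized for `num_classes`; kept layers get a small lr multiplier
    so they change slowly, while the new head learns at full rate.
    """
    rng = random.Random(seed)
    model = {}
    for name, weights in pretrained.items():
        if name == "fc8":                 # drop the old 1000-way ImageNet head
            continue
        model[name] = {"weights": list(weights), "lr_mult": 0.1}
    # New head: one weight row per target class (toy 4-dim features here).
    model["fc8_dog"] = {
        "weights": [[rng.gauss(0.0, 0.01) for _ in range(4)]
                    for _ in range(num_classes)],
        "lr_mult": 1.0,                   # learn the new head faster
    }
    return model
```

In a real Caffe run the same effect is achieved by renaming the last layer in the prototxt (so its weights are not copied from the snapshot) and giving it a higher \texttt{lr\_mult} than the copied layers.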
We use a dataset from Microsoft: Clickture-FilteredDog, a subset of the Clickture-Full dataset containing only dog-breed-related items. We pick out the 107 classes of this subset that each contain more than 100 images, totaling 89,910 images. We use a 5-fold split of this dataset: 71,932 images for training and 17,978 for testing.
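The 5-fold protocol above can be sketched as follows: one fold (about 20\% of the data) is held out for testing and the remaining four are used for training. This is a minimal sketch; the class-stratified splitting a real experiment would likely use is omitted, and the exact train/test counts reported above depend on how the remainder images are assigned to folds.

```python
import random

def five_fold_split(items, fold, seed=0):
    """Return (train, test) lists for one of 5 folds.

    Items are shuffled once, partitioned into 5 near-equal folds,
    and the requested fold is held out as the test set.
    """
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    test = folds[fold]
    train = [x for i in range(5) if i != fold for x in folds[i]]
    return train, test
```

With 89,910 images each test fold holds 17,982 images and training holds 71,928, close to the 71,932 / 17,978 split reported above.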
Our results on Clickture-FilteredDog are shown in Table 1 and Table 2. Our network achieves an accuracy of \textbf{50.5\%}, while the best result with fine-tuning is 46.2\%.
In Table 1, the result of our first approach, the average vector, does not exceed the fine-tuning baseline. Our MMD loss does not decrease after 7000? iterations, so we suspect that averaging discards some information in the text, such that the performance is no better than fine-tuning. From the t-SNE algorith