
Distillation based Knowledge Transfer & OpenCL-FPGA implementation of FaceNet CNN

Motivation

In the wake of a burgeoning amount of data and the demand for IoT-based edge computing devices at the advent of 5G, both the research community and industry are taking an increased interest in deploying AI on the edge. AI-based systems on edge devices offer the best of both worlds: the state-of-the-art accuracy of deep learning models and the portability and scalability of embedded systems. A major bottleneck, however, is the prohibitively large size of deep learning models combined with the limited memory and compute capacity of embedded systems that operate on a low power budget. Against this background, our work sits at the intersection of computer vision, deep learning, and embedded systems. By addressing the problem of face recognition with deep convolutional neural networks (CNNs), we explore optimizations at both the model level and the hardware level, with the aim of easing embedded implementations.

Crux

Our work extends distillation-based knowledge transfer for model compression to regression problems. We present experiments in which the knowledge of an Inception CNN (teacher network) with ~3.7M parameters is transferred to MobileNet CNN (student network) architectures with ~0.8M and ~0.5M parameters. We demonstrate that the smaller student networks not only achieve comparable results but even exceed the face verification accuracy of the Inception teacher CNN on the Labeled Faces in the Wild (LFW) test set. For instance, transferring the knowledge of an Inception CNN with 81.07% LFW accuracy into a MobileNet model yields an accuracy of 83.28% with a 76.75% reduction in parameter count. The student network is trained on a so-called transfer dataset of ~1M images from VGG2, regressing the teacher network's embeddings in a mean-squared-error sense (see the sketch below). In addition, we demonstrate that pruning the local response normalization layers, along with layers that apply only a constant product or power to their input, has a negligible effect on model accuracy. Further, omitting the affine-transform-based face alignment step reduces accuracy by only a modest amount. Finally, a range of knowledge transfer experiments with hyper-parameter tuning are reported and discussed to inform future work.
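
A minimal sketch of this embedding-level distillation, assuming PyTorch (the original may use a different framework); the `teacher`, `student`, and `transfer_loader` names are hypothetical placeholders, with the loader assumed to yield batches of face images (labels are not needed, since the teacher's embeddings are the regression targets):

```python
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, transfer_loader, optimizer, device="cuda"):
    """One epoch of embedding-level distillation: the student learns to
    reproduce the frozen teacher's face embeddings in an MSE sense."""
    teacher.eval()
    student.train()
    for images in transfer_loader:      # hypothetical loader over ~1M VGG2 images
        images = images.to(device)
        with torch.no_grad():           # teacher embeddings are fixed targets
            target = teacher(images)    # e.g. a 128-D embedding per image
        pred = student(images)
        loss = F.mse_loss(pred, target) # regression-style knowledge transfer
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

No ground-truth identity labels enter the loss; the teacher's embedding space itself is what the student is compressed into, which is what makes this a regression rather than a classification form of distillation.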

FaceNet CNN - The baseline model

FaceNet is a deep convolutional neural network that generates a unified embedding from cropped facial images. Its merit, in comparison to competing models, lies in the way it is trained. While most competitive techniques, such as DeepFace, train their architectures by backpropagating the cross-entropy of soft predictions over known identities, FaceNet minimizes a so-called triplet loss that optimizes the embeddings themselves rather than a bottleneck layer over a limited set of identities. As a direct consequence, the embeddings are not only robust from a classification point of view but also yield noteworthy results on tasks such as face clustering and verification.
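
For concreteness, a hedged sketch of the triplet loss FaceNet minimizes, again assuming PyTorch; the inputs are batches of anchor, positive (same identity), and negative (different identity) embeddings, and `alpha` is the margin hyper-parameter from the original FaceNet paper:

```python
import torch

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """FaceNet-style triplet loss: pull the anchor toward the positive and
    push it away from the negative by at least the margin alpha, measured
    in squared Euclidean distance between embeddings."""
    pos_dist = (anchor - positive).pow(2).sum(dim=1)  # ||f(a) - f(p)||^2
    neg_dist = (anchor - negative).pow(2).sum(dim=1)  # ||f(a) - f(n)||^2
    return torch.clamp(pos_dist - neg_dist + alpha, min=0).mean()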

WORK IN PROGRESS

Authors

Acknowledgements

Copyright

This work is owned by the College of Engineering, Pune (COEP). All rights are reserved by the Center of Excellence in Signal & Image Processing (CoE-SIP) at COEP. The rights to publish this work in any form are held by the authors.
