Anton Marini vade

@vade
vade / Image Label InceptionV3 Inference Optimized.txt
Created November 21, 2016 20:31
Output of Trace Label Image for Straight Inception V3 graph Inference Optimized
Andromeda:tensorflow vade$ time bazel-bin/tensorflow/examples/label_image/label_image
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 514 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=9316873 count=11 runs, avg 9317 ms, 514 nodes defined 504 nodes observed
128366.4 avg KB per run.
============ By run order (ms) =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.086 0.086 0.001% 0.001% _SOURCE
0.125 0.027 0.027 0.000% 0.001% Const mixed/join/concat_dim
0.158 0.007 0.007 0.000% 0.001% Const pool_3/_reshape/shape
vade / Image Label InceptionV3 No Optimization
Created November 21, 2016 20:29
Output of Trace Label Image for Straight Inception V3 graph
time bazel-bin/tensorflow/examples/label_image/label_image
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 1004 nodes
W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=10922857 count=11 runs, avg 1.092e+04 ms, 1004 nodes defined 901 nodes observed
128366.4 avg KB per run.
============ By run order (ms) =================
[start] [first] [avg] [%] [cdf%] [Op] [Name]
0.000 0.204 0.204 0.002% 0.002% _SOURCE
0.273 0.031 0.031 0.000% 0.002% Const mixed_9/tower/conv/batchnorm/gamma
vade / main.cc
Created November 21, 2016 20:26
Tensorflow Image Label example with tracing enabled.
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
vade / Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Optimized - Non Quantized Graph.txt
Created November 21, 2016 17:50
Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Optimized - Non Quantized Graph
Tensorflow compiled with : bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so
Graph : Inception V3 post running Inference Optimizer
Output of custom app running TF, 222 frames took 28.129690 seconds
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=41604678 count=11 runs, avg 4.16e+04 ms, 514 nodes defined 514 nodes observed
28625707.2 avg KB per run.
vade / Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Unoptimized Graph.txt
Last active November 21, 2016 17:48
Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Unoptimized Graph
Tensorflow compiled with : bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so
Graph : Inception V3 no alterations.
Output of custom app running TF, 222 frames took 32.598143 seconds
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=48605744 count=11 runs, avg 4.861e+04 ms, 1004 nodes defined 994 nodes observed
28625707.2 avg KB per run.
============ By run order (ms) =================
vade / Optimized_Quantized_Eightbit.txt
Created November 21, 2016 17:44
Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Inference Optimized + Quantized eight bit
Tensorflow compiled with : bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so
Graph : Inception V3 post running inference Optimizer + Quantizer with mode eightbit
Output of custom app running TF, 222 frames took 63.174700 seconds
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 1282 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=108301617 count=11 runs, avg 1.083e+05 ms, 1282 nodes defined 1282 nodes observed
23220669.5 avg KB per run.
vade / Optimized_Quantized_Rounded.txt
Created November 21, 2016 17:33
Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Inference Optimized + Quantized Rounded
Tensorflow compiled with : bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so
Graph : Inception V3 post running inference Optimizer + Quantizer with mode weights_rounded
Output of custom app running TF, 222 frames took 25.201791 seconds
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 514 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=36817771 count=11 runs, avg 3.682e+04 ms, 514 nodes defined 514 nodes observed
28625707.2 avg KB per run.
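The "222 frames took ... seconds" lines above are easier to compare as frames per second. The following is a minimal C++ sketch of that arithmetic, not part of the original benchmark app: the struct and function names (BenchmarkRun, reportedRuns, framesPerSecond, speedupOver) are hypothetical helpers introduced here, and the only data used are the four wall-clock timings quoted in the gist descriptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: one benchmark entry from the gists above.
struct BenchmarkRun {
    std::string graph;   // graph variant, as described in the gist title
    double seconds;      // wall-clock time reported for the whole run
};

// Every run above processed the same 222 frames.
constexpr int kFrameCount = 222;

// Timings copied verbatim from the "222 frames took ... seconds" lines.
inline std::vector<BenchmarkRun> reportedRuns() {
    return {
        {"unoptimized", 32.598143},
        {"inference optimized", 28.129690},
        {"optimized + quantized eightbit", 63.174700},
        {"optimized + quantized weights_rounded", 25.201791},
    };
}

// Throughput in frames per second for a run of `frames` frames.
inline double framesPerSecond(int frames, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(frames) / seconds;
}

// How many times faster a run is than a baseline (values below 1 mean slower).
inline double speedupOver(double baselineSeconds, double seconds) {
    assert(seconds > 0.0);
    return baselineSeconds / seconds;
}
```

Calling framesPerSecond(kFrameCount, run.seconds) for each entry gives roughly 6.8 (unoptimized), 7.9 (inference optimized), 3.5 (eightbit), and 8.8 (weights_rounded) frames per second. On these numbers the weights_rounded graph is about 1.29x faster than the unoptimized graph, while the eightbit graph runs at roughly half its speed.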
Retraining (i.e. https://www.tensorflow.org/versions/r0.11/how_tos/image_retraining/index.html ) doesn't really go into
the nuances of what types of labels you should choose based on your model.
Since InceptionV3 was trained for an object recognition task, and the penultimate layer (pool_3) outputs 2048-element vectors that
somehow encode various 'objectness' traits, it's far better to, say:
train for labels that tend toward objectness (lamp, lampshade, chandelier, standing lamp, desk lamp)
than train for labels that tend toward abstract image features like composition (chaotic, patterned, symmetric, asymmetric, mirrored, circular, diagonal, natural (photographic), synthetic).
If I were interested in the latter labeling (i.e., meta-features), is it more sensible to:
unique_ptr<Frame> Decoder::convertVideoFrame( const Frame &frame ) const
{
CI_ASSERT( frame.getMediaType() == AVMEDIA_TYPE_VIDEO );
unique_ptr<Frame> result( new FrameVideo( frame.getTimeBase() ) );
result->getAvFrame()->format = AV_PIX_FMT_RGB24;
result->getAvFrame()->width = frame.getAvFrame()->width;
result->getAvFrame()->height = frame.getAvFrame()->height;
// allocate backing
void Encoder::configureVideoStreams()
{
// Preset options:
// ultrafast superfast veryfast faster fast medium slow slower veryslow placebo
// see : http://dev.beandog.org/x264_preset_reference.html
AVDictionary *optionsDict = NULL;
// careful with ultrafast - it seems to force constrained baseline?; this call also allocates
av_dict_set( &optionsDict, "preset","superfast", 0 );