Anton Marini vade

## Image Label InceptionV3 Inference Optimized.txt
Andromeda:tensorflow vade$ time bazel-bin/tensorflow/examples/label_image/label_image
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 514 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=9316873 count=11 runs, avg 9317 ms, 514 nodes defined 504 nodes observed
128366.4 avg KB per run.

============ By run order (ms) =================
  [start]  [first]    [avg]	     [%] 	  [cdf%] 	      [Op]	[Name]
    0.000    0.086    0.086	  0.001%	  0.001%	          	_SOURCE
    0.125    0.027    0.027	  0.000%	  0.001%	     Const	mixed/join/concat_dim
    0.158    0.007    0.007	  0.000%	  0.001%	     Const	pool_3/_reshape/shape

## Image Label InceptionV3 No Optimization
 time bazel-bin/tensorflow/examples/label_image/label_image
I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 1004 nodes
W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=10922857 count=11 runs, avg 1.092e+04 ms, 1004 nodes defined 901 nodes observed
128366.4 avg KB per run.

============ By run order (ms) =================
  [start]  [first]    [avg]	     [%] 	  [cdf%] 	      [Op]	[Name]
    0.000    0.204    0.204	  0.002%	  0.002%	          	_SOURCE
    0.273    0.031    0.031	  0.000%	  0.002%	     Const	mixed_9/tower/conv/batchnorm/gamma

## main.cc
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,

## Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Optimized - Non Quantized Graph.txt

Tensorflow compiled with :  bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so

Graph : Inception V3 post running Inference Optimizer

Output of custom app running TF, 222 frames took 28.129690 seconds

I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=41604678 count=11 runs, avg 4.16e+04 ms, 514 nodes defined 514 nodes observed
28625707.2 avg KB per run.

## Inception V3 Benchmark - Macbook Pro 2.8 GHz Intel Core i7 - Unoptimized Graph.txt
Tensorflow compiled with :  bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so

Graph : Inception V3 no alterations.

Output of custom app running TF, 222 frames took 32.598143 seconds

I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=48605744 count=11 runs, avg 4.861e+04 ms, 1004 nodes defined 994 nodes observed
28625707.2 avg KB per run.

============ By run order (ms) =================

## Optimized_Quantized_Eightbit.txt

Tensorflow compiled with :  bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so

Graph : Inception V3 post running inference Optimizer + Quantizer with mode eightbit

Output of custom app running TF, 222 frames took 63.174700 seconds

I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 1282 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=108301617 count=11 runs, avg 1.083e+05 ms, 1282 nodes defined 1282 nodes observed
23220669.5 avg KB per run.

## Optimized_Quantized_Rounded.txt

Tensorflow compiled with :  bazel build -c opt --copt=-mavx --cxxopt=-fno-exceptions --cxxopt=--std=c++11 --cxxopt=-DNDEBUG --cxxopt=-DNOTFDBG --cxxopt=-O2 --cxxopt=-DUSE_GEMM_FOR_CONV //tensorflow:libtensorflow_cc.so

Graph : Inception V3 post running inference Optimizer + Quantizer with mode weights_rounded

Output of custom app running TF, 222 frames took 25.201791 seconds

I tensorflow/core/util/stat_summarizer.cc:33] StatSummarizer found 514 nodes
I tensorflow/core/util/stat_summarizer.cc:353] Total time (us): curr=36817771 count=11 runs, avg 3.682e+04 ms, 514 nodes defined 514 nodes observed
28625707.2 avg KB per run.
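
To make the four "222 frames took N seconds" figures easier to compare, here is a minimal sketch that converts them to frames per second and to a speedup relative to the unoptimized graph. The `Run` struct and the function names are hypothetical helpers made up for this illustration; only the frame counts and timings come from the notes above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One benchmark run as reported by the custom app in the notes above.
struct Run {
    std::string graph;  // which graph variant was loaded
    int frames;         // number of frames processed
    double seconds;     // wall-clock time reported for those frames
};

// Average throughput in frames per second.
double framesPerSecond(const Run &run) {
    assert(run.seconds > 0.0);
    return run.frames / run.seconds;
}

// Speedup of `run` relative to `baseline` (> 1 means faster than baseline).
double speedupOver(const Run &baseline, const Run &run) {
    return framesPerSecond(run) / framesPerSecond(baseline);
}

// The four timings copied from the benchmark notes; index 0 is the baseline.
std::vector<Run> benchmarkRuns() {
    return {
        {"unoptimized", 222, 32.598143},
        {"optimized", 222, 28.129690},
        {"optimized + eightbit", 222, 63.174700},
        {"optimized + weights_rounded", 222, 25.201791},
    };
}

// Print throughput and speedup over the unoptimized graph for every run.
void printReport() {
    const std::vector<Run> runs = benchmarkRuns();
    for (const Run &run : runs) {
        std::printf("%-30s %6.2f fps  %5.2fx\n", run.graph.c_str(),
                    framesPerSecond(run), speedupOver(runs[0], run));
    }
}
```

Calling `printReport()` from a `main` lists one line per graph variant. On these timings the eightbit graph comes out slower than the unoptimized baseline, while the weights_rounded graph is the fastest of the four.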

## Retraining InceptionV3 Question
Retraining (i.e. https://www.tensorflow.org/versions/r0.11/how_tos/image_retraining/index.html ) doesn't really go into
the nuances of which types of labels you should choose based on your model.

Since InceptionV3 was trained for an object recognition task, and the penultimate layer (pool_3) produces a 2048-element descriptor vector that
somehow encodes various 'objectness' traits, it's far better to:

train for labels that tend toward objectness (lamp, lampshade, chandelier, standing lamp, desk lamp)
than to train for labels that tend toward abstract image features such as composition (chaotic, patterned, symmetric, asymmetric, mirrored, circular, diagonal, natural (photographic), synthetic)

If I were interested in the latter labeling (i.e., meta-features), is it more sensible to:

## example.cpp
unique_ptr<Frame> Decoder::convertVideoFrame( const Frame &frame ) const
{
    CI_ASSERT( frame.getMediaType() == AVMEDIA_TYPE_VIDEO );
    unique_ptr<Frame> result( new FrameVideo( frame.getTimeBase() ) );

    result->getAvFrame()->format = AV_PIX_FMT_RGB24;
    result->getAvFrame()->width = frame.getAvFrame()->width;
    result->getAvFrame()->height = frame.getAvFrame()->height;

    // allocate backing

## Videostream.cpp
void Encoder::configureVideoStreams()
{
    // Preset options:
    // ultrafast	superfast	veryfast	faster	fast	medium	slow	slower	veryslow	placebo
    // see : http://dev.beandog.org/x264_preset_reference.html
    AVDictionary *optionsDict = NULL;

    // Careful with ultrafast: it seems to force the constrained baseline profile.
    // av_dict_set() also allocates optionsDict, so release it later with av_dict_free().
    av_dict_set( &optionsDict, "preset", "superfast", 0 );