asmehra95/DL4JImageRecogniser Motivation

## DL4JImageRecogniser Further Improvements
The recogniser currently supports importing only VGG16; support for other models is planned.
Loading VGG16NOTOP currently fails because of a bug in the DL4J helper functions.
The bug has been filed as the following issue:

https://github.com/deeplearning4j/deeplearning4j/issues/3099

## DL4JImageRecogniser Motivation
When ObjectRecognitionParser was built to do image recognition, there were no
well-supported Java frameworks for the task. All the popular neural network
toolkits were written in C++ or Python. Since nothing ran inside the JVM, we tried
several ways to glue them to Tika (CLI, JNI, gRPC, REST).
This is now slowly changing. Deeplearning4j, the best-known
neural network library for the JVM, now supports importing models that were
pre-trained in Python/C++ based toolkits [5].

*Improvement:*
It would be nice to have an implementation of ObjectRecogniser that
doesn't require any external setup (such as installing native libraries or
starting REST services). Such an implementation is easier to distribute and
also cuts the I/O time.

## DL4JImageRecogniser Usage
This recogniser is used in much the same way as TensorFlowRESTRecogniser, but it doesn't require any external setup such as running a REST service.
You can read more about TensorFlowRESTRecogniser at https://wiki.apache.org/tika/TikaAndVision

To use DL4JImageRecogniser, set the following parameters:

- class to org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser
- modelType to VGG16

A sample configuration is given below for reference.

    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
        <parsers>
            <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
                <mime>image/jpeg</mime>
                <params>
                    <param name="topN" type="int">5</param>
                    <param name="minConfidence" type="double">0.015</param>
                    <param name="class" type="string">org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser</param>
                    <param name="modelType" type="string">VGG16</param>
                </params>
            </parser>
        </parsers>
    </properties>

Save the configuration as tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml
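
The topN and minConfidence parameters in the sample configuration presumably control how many labels are reported and how confident the model must be about each one. The sketch below shows one plausible reading of those two settings in plain Java. The class LabelFilterSketch, its Label type, and the inclusive threshold are assumptions made for illustration; they are not taken from the Tika or DL4J sources.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; these types are not part of Tika or DL4J.
final class LabelFilterSketch {

    /** A recognised label paired with the confidence the model assigned to it. */
    static final class Label {
        final String name;
        final double confidence;

        Label(String name, double confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /**
     * Drops labels whose confidence is below minConfidence, sorts the rest by
     * descending confidence, and keeps at most topN of them.
     * Treating the threshold as inclusive is an assumption.
     */
    static List<Label> filter(List<Label> labels, int topN, double minConfidence) {
        List<Label> kept = new ArrayList<>();
        for (Label label : labels) {
            if (label.confidence >= minConfidence) {
                kept.add(label);
            }
        }
        kept.sort(Comparator.comparingDouble((Label l) -> l.confidence).reversed());
        // Clamp the limit so a zero or negative topN yields an empty list.
        int limit = Math.max(0, Math.min(topN, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }
}
```

With topN set to 5 and minConfidence set to 0.015, as in the sample configuration, a label scored at 0.01 would be dropped under this reading, and no more than five labels would be returned.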

To run it, build the project, change to the project's root directory, and run the command:

    java -Xmx3G -jar tika-app/target/tika-app-1.14.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml <path to your image file>

The -Xmx3G flag is required because the VGG16 model needs a lot of memory to run. If the model still fails to run on your system, try increasing the heap size further.

The first time the recogniser runs, it automatically downloads the model file, using the DL4J helper functions, to .dl4j/trainedModels
To speed up later runs, once the model has been loaded from the original files it is serialized and saved to disk at .dl4j/trainedModels/tikaPreprocessed, which significantly reduces
resource usage (especially memory consumption) on future loads.
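
The caching behaviour described above follows a common load-or-cache pattern: use the serialized copy if it exists, otherwise do the expensive load once and save the result. The sketch below illustrates that pattern with plain Java serialization. The class ModelCacheSketch and the method loadOrCache are hypothetical names, and the recogniser itself may well use a different serialization mechanism.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical illustration of a load-or-cache pattern; not the recogniser's code.
final class ModelCacheSketch {

    /**
     * Returns the object stored at cacheFile if that file exists. Otherwise
     * calls the (expensive) loader, writes its result to cacheFile, and
     * returns it, so later calls can skip the loader.
     */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T loadOrCache(Path cacheFile, Supplier<T> loader)
            throws IOException, ClassNotFoundException {
        if (Files.exists(cacheFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
                return (T) in.readObject();
            }
        }
        T model = loader.get();
        Path parent = cacheFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
            out.writeObject(model);
        }
        return model;
    }
}
```

In this sketch the loader stands in for the slow import of the original model files, and it runs only when no cached copy is present.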