
@nhojpatrick
Created September 4, 2020 18:40
tika IBM500
$ cat file1
a d
$ java -jar tika-app-1.24.jar file1
Sep 04, 2020 7:37:28 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Sep 04, 2020 7:37:28 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.csv.TextAndCSVParser"/>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="resourceName" content="file"/>
<meta name="Content-Length" content="3"/>
<meta name="Content-Type" content="text/plain; charset=ISO-8859-1"/>
<title/>
</head>
<body><p>a d</p>
</body></html>
$ cat file2
"a d"
$ java -jar tika-app-1.24.jar file2
Sep 04, 2020 7:38:58 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Sep 04, 2020 7:38:58 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.csv.TextAndCSVParser"/>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="resourceName" content="file"/>
<meta name="Content-Length" content="5"/>
<meta name="Content-Type" content="text/plain; charset=ISO-8859-1"/>
<title/>
</head>
<body><p>"a d"</p>
</body></html>
$
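Tika reported `Content-Length` values of 3 and 5 for these files, which suggests that neither file ends with a trailing newline. The transcript does not show how the files were created. Assuming a POSIX shell with `printf` and `wc`, a minimal sketch for recreating equivalent inputs might look like this:

```shell
#!/bin/sh
# Recreate the two test inputs without a trailing newline, so that their
# sizes match the Content-Length values Tika reported (3 and 5 bytes).
# This setup step is an assumption; the original transcript only shows `cat`.
printf 'a d' > file1      # a, space, d: 3 bytes
printf '"a d"' > file2    # the same text wrapped in double quotes: 5 bytes

# Print each file's size in bytes. Some wc implementations pad with spaces.
wc -c < file1
wc -c < file2
```

You can then pass each file to `tika-app-1.24.jar` exactly as the transcript shows.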