"""
Aman Mathur
Last Modified: 10/07/2017
Purpose: Get the file type detected by Tika, the file path, and the byte frequencies for files in the TREC-DD Polar dataset
Acknowledgement: Ming-Chang Chiu
"""
armathur / gist:10ece414e60f56cd99bb26dfe5835271
Created October 7, 2017 01:11
File Type detection using Apache Tika
import numpy as np
import h5py
import os
from os import path
import preprocessor
import tika
from tika import detector
import sys
import pandas as pd
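
The rest of the script is not shown here, so the following is only a minimal sketch of the three things the docstring says it collects: the file path, the Tika-detected type, and the byte frequencies. The helper names `byte_frequencies` and `describe_file` are hypothetical, and normalizing the counts to fractions is an assumption; the original may well store raw counts instead. `detector.from_file` comes from the tika-python package and needs a reachable Tika server (and therefore Java), so it is imported lazily.

```python
import numpy as np


def byte_frequencies(data: bytes) -> np.ndarray:
    """Return a length-256 array with the relative frequency of each byte value.

    Hypothetical helper: normalizing to fractions is an assumption.
    """
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    total = counts.sum()
    # An empty file has no bytes, so return all zeros rather than dividing by 0.
    return counts / total if total else counts.astype(float)


def describe_file(file_path: str) -> dict:
    """Collect path, Tika-detected type, and byte frequencies for one file.

    Hypothetical helper, not the gist author's implementation.
    """
    # Lazy import: tika-python talks to a Tika server, which requires Java.
    from tika import detector

    with open(file_path, "rb") as fh:
        data = fh.read()
    return {
        "path": file_path,
        "type": detector.from_file(file_path),
        "byte_freq": byte_frequencies(data),
    }
```

Calling `byte_frequencies(b"aab")` yields an array whose entry at index `ord("a")` is 2/3 and at `ord("b")` is 1/3; rows built by `describe_file` could then be gathered into a pandas DataFrame or written to HDF5, as the imports above suggest.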