@eellpp
Last active August 30, 2016 00:15
http://www.grcdi.nl/gsb/summary_%20company%20legal%20forms.html
https://en.wikipedia.org/wiki/Types_of_business_entity
https://gate.ac.uk/sale/tao/splitap6.html
https://github.com/hpcc-systems/TextAnalytics/blob/master/hpcc/GATE_Annie/plugins/ANNIE/resources/NE/name.jape
## Java references
https://www.oreilly.com/learning/java-8-functional-interfaces
http://refactoring.info/tools/LambdaFicator/
https://github.com/JnRouvignac/AutoRefactor/wiki/Useful-links
https://mailparser.io/pricing
https://parser.zapier.com/
http://unitedobjectives.com/products/invoice_processor/
InvoiceP2 is built in Python and C (for performance-critical modules). It uses proprietary Natural Language and Text Processing technology created by United Objectives. Parsing algorithms are backed up by semantic, frequency, collocation, proper-name and morphological dictionaries that enable the system to parse documents without any predefined structure.
CloudFactory is a scalable way to outsource tedious and repetitive data work. We help break your project down into small tasks that are processed by our global 24x7 workforce. Our software platform manages this workforce to ensure quick turnaround and accurate results for your business.
http://www.raremile.com/mobile-receipt-scanning-and-data-extraction.html
Papers
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.403.1594&rep=rep1&type=pdf
http://vision.gyte.edu.tr/publications/2016/VISIGRAPP_2016Camera.pdf
http://www.ict.griffith.edu.au/das2012/attachments/FullPaperProceedings/4661a409.pdf
http://www.dsi.unifi.it/~simone/Papers/ICDAR97b.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.137&rep=rep1&type=pdf
http://ocean.kisti.re.kr/downfile/volume/kmtd/MTMDCW/2010/v13n12/MTMDCW_2010_v13n12_1786.pdf
https://www.researchgate.net/profile/Enrico_Francesconi/publication/3710626_Rectangle_labelling_for_an_invoice_understanding_system/links/02e7e51818f35768f6000000.pdf
http://seco.cs.aalto.fi/u/jwtuomin/svn/secoweb/public_html/publications/2011/nyberg-masters-thesis.pdf
https://jamia.oxfordjournals.org/content/21/5/850
Paper: Converting and Annotating Quantitative Data Tables
https://files.ifi.uzh.ch/ddis/iswc_archive/iswc/ab/2010/iswc2010.semanticweb.org/pdf/167.pdf
Rule Based Systems for Big Data
http://www.springer.com/us/book/9783319236957
Google search queries:
annotation of quantitative research data stored in tables
table header disambiguation
http://iswc2013.semanticweb.org/sites/default/files/iswc_poster_7.pdf
Towards Disambiguating Web Tables
Create a file with a simple table
GUI
create an application
- annotate it with tokens
- create the JAPE file for it and test its output
- save the application
Library
- load the application from file
- execute the PR (processing resource)
- print the output
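The notes do not show the table or the JAPE grammar, so the following is only a plain-Python imitation of the "annotate with tokens, then write a JAPE rule" steps, not GATE code. A toy tokenizer produces `Token` annotations with a `kind` feature (loosely modelled on the ANNIE tokenizer), and one pattern over adjacent tokens plays the role of a JAPE rule whose left-hand side is roughly `{Token.kind == word} {Token.kind == number}`. The sample table, the `LabelValue` annotation type and every function name here are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    start: int  # character offset where the annotation starts
    end: int    # character offset just past the annotation
    features: dict = field(default_factory=dict)


# Toy tokenizer: numbers, words, or single punctuation characters; whitespace is skipped.
TOKEN_RE = re.compile(r"(?P<number>\d+(?:\.\d+)?)|(?P<word>[A-Za-z]+)|(?P<punctuation>[^\sA-Za-z\d])")


def tokenize(text):
    """Return one Token annotation per match, with 'kind' and 'string' features."""
    return [
        Annotation("Token", m.start(), m.end(), {"kind": m.lastgroup, "string": m.group()})
        for m in TOKEN_RE.finditer(text)
    ]


def label_value_rule(tokens):
    """Hypothetical JAPE-like rule: a word token immediately followed by a
    number token becomes one LabelValue annotation spanning both."""
    matches = []
    for left, right in zip(tokens, tokens[1:]):
        if left.features["kind"] == "word" and right.features["kind"] == "number":
            matches.append(Annotation(
                "LabelValue", left.start, right.end,
                {"label": left.features["string"], "value": right.features["string"]},
            ))
    return matches


table = "Item Qty\nApples 3\nPears 12\n"
results = label_value_rule(tokenize(table))
pairs = [(a.features["label"], a.features["value"]) for a in results]
print(pairs)  # [('Apples', '3'), ('Pears', '12')]
```

The header row `Item Qty` produces no match because both tokens are words; in GATE the equivalent check is to inspect the output annotation set after running the saved application.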
- mail parsing is slow because it uses rule-based parsing on labels: each text is matched against a list of patterns
- instead, we should use model-based parsing, which provides probabilistic output
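To make the cost of label-based rule parsing concrete, here is a minimal sketch with invented field names and regular expressions (`RULES`, `rule_parse` and the sample mail are hypothetical). Every line is tried against every pattern, so the work grows with the number of lines times the number of patterns, and a line either matches or it does not, with no score in between.

```python
import re

# Hypothetical field patterns; a real mail parser would have many more.
RULES = [
    ("invoice_number", re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*(\w+)", re.I)),
    ("date", re.compile(r"date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.I)),
    ("total", re.compile(r"total\s*(?:due|amount)?\s*[:\-]?\s*\$?([\d,]+\.\d{2})", re.I)),
]


def rule_parse(lines):
    """Try every pattern on every line and keep the first hit per field.

    Returns the extracted fields and the number of pattern attempts,
    which is always len(lines) * len(RULES)."""
    fields = {}
    attempts = 0
    for line in lines:
        for label, pattern in RULES:
            attempts += 1
            match = pattern.search(line)
            if match and label not in fields:
                fields[label] = match.group(1)
    return fields, attempts


mail = ["Invoice No: A1023", "Date: 2016-08-29", "Total due: $1,250.00"]
fields, attempts = rule_parse(mail)
print(fields)    # {'invoice_number': 'A1023', 'date': '2016-08-29', 'total': '1,250.00'}
print(attempts)  # 9
```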
We can also use a combination:
- use the rule-based parser to prepare the training set for the probabilistic parser
- use the rule-based parser together with the ML model
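A minimal sketch of the first combination, under stated assumptions: the keyword rules, the sample lines and the choice of classifier are all hypothetical, and multinomial naive Bayes is used only because it is the simplest probabilistic text model to write from scratch. Lines the rules match become weakly labelled training data, unmatched lines are labelled `other`, and the trained model then returns a probability per label, including for phrasings no rule covers.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules standing in for the existing rule-based parser.
RULES = [
    ("total", re.compile(r"\btotal\b", re.I)),
    ("date", re.compile(r"\bdate\b", re.I)),
]


def rule_label(line):
    """Return the first rule label that matches the line, or None."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return None


def tokens(line):
    """Lowercased words plus digit runs, with every digit mapped to 0 so that
    amounts and dates of the same shape produce the same tokens."""
    return re.findall(r"[a-z]+|[0-9]+", re.sub(r"\d", "0", line.lower()))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)
            for w in doc:
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log((self.word_counts[label][w] + 1) / (n_words + len(self.vocab)))
            log_scores[label] = score
        # Turn log scores into probabilities without overflow (log-sum-exp).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(weights.values())
        return {k: w / z for k, w in weights.items()}


# Step 1: the rule parser labels raw lines; unmatched lines become "other".
train_lines = [
    "Total: $1,250.00",
    "Total due: $980.50",
    "Invoice date: 2016-08-29",
    "Date of issue: 2016-07-01",
    "Thank you for your business",
    "Please contact support with questions",
]
labels = [rule_label(line) or "other" for line in train_lines]

# Step 2: train the probabilistic parser on the rule-generated labels.
model = NaiveBayes().fit([tokens(line) for line in train_lines], labels)

# Step 3: a phrasing the rules miss still gets a ranked, probabilistic answer.
new_line = "Amount payable: $310.75"
proba = model.predict_proba(tokens(new_line))
print(rule_label(new_line))       # None
print(max(proba, key=proba.get))  # total
```

Labelling every unmatched line as `other` is a simplification: those lines may still contain fields the rules missed, which biases the model toward `other`.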
Currently I am doing the following steps:
https://gate.ac.uk/wiki/TrainingCourseJune2015/
JAPE training
https://gate.ac.uk/sale/talks/gate-course-jun13/track-1/module-3-jape/module-3-jape.pdf
JAPE repository wiki
https://gate.ac.uk/wiki/jape-repository/
GATE Training docs
https://gate.ac.uk/wiki/TrainingCourseJune2015/
"","SportsInterest","MoviesInterest","Technology.Interest","Finance.Interest","Politics.Interest","Travel.Interest","BizInterest","Intnl Interest","Age","Gender.Female","Gender.Male","Relationship.Status.Divorced","Relationship.Status.Married","Relationship.Status.Single", "Family.Size","Job.Level.Director","Job.Level.Entry.Level","Job.Level.intern","Job.Level.Manager","Job.Level.Sr..Manager","Income.25.to. 35","Income.35.to.50","Income.50.to.75","Income.75.to.100","Income.100.plus","Income.25.minus","No.of.Vehicles.Owned","Sales"
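The header above looks like an R `write.csv` export (the empty first column holds row names) with several dummy-coded groups and `Sales` as the likely target. The sketch below assumes exactly that: the group prefixes in `GROUP_PREFIXES` and the choice of `Sales` as target are read off the column names, not documented anywhere in these notes, and `describe_columns` is a hypothetical helper. It uses Python's `csv` module with `skipinitialspace=True` so the stray space before `"Family.Size"` does not leave literal quote characters in that field.

```python
import csv
import io

# Header line copied verbatim from the notes.
HEADER = ('"","SportsInterest","MoviesInterest","Technology.Interest","Finance.Interest",'
          '"Politics.Interest","Travel.Interest","BizInterest","Intnl Interest","Age",'
          '"Gender.Female","Gender.Male","Relationship.Status.Divorced",'
          '"Relationship.Status.Married","Relationship.Status.Single", "Family.Size",'
          '"Job.Level.Director","Job.Level.Entry.Level","Job.Level.intern","Job.Level.Manager",'
          '"Job.Level.Sr..Manager","Income.25.to. 35","Income.35.to.50","Income.50.to.75",'
          '"Income.75.to.100","Income.100.plus","Income.25.minus","No.of.Vehicles.Owned","Sales"')

# Assumed dummy-coded groups, inferred from the column names only.
GROUP_PREFIXES = ["Relationship.Status.", "Job.Level.", "Gender.", "Income."]


def describe_columns(header_line, target="Sales"):
    """Split a header into dummy-coded groups and remaining features,
    dropping the unnamed row-name column and the (assumed) target."""
    # skipinitialspace ignores the space after the comma before "Family.Size".
    columns = next(csv.reader(io.StringIO(header_line), skipinitialspace=True))
    groups = {p.rstrip("."): [] for p in GROUP_PREFIXES}
    other = []
    for col in columns:
        if col in ("", target):
            continue  # row names and the target are not features
        prefix = next((p for p in GROUP_PREFIXES if col.startswith(p)), None)
        if prefix is not None:
            groups[prefix.rstrip(".")].append(col)
        else:
            other.append(col)
    return groups, other


groups, other = describe_columns(HEADER)
for name, cols in groups.items():
    print(name, len(cols))
print("other features:", len(other))
```

With this header it reports 3 relationship-status columns, 5 job-level columns, 2 gender columns, 6 income bands and 11 remaining features.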