@karimkhanp
Last active January 6, 2019 05:13
[![Build Status](https://travis-ci.org/joemccann/dillinger.svg?branch=master)](https://travis-ci.org/joemccann/dillinger)
The objective of this software is to classify the consumer complaint data available at https://www.consumerfinance.gov/data-research/consumer-complaints/. There are two main classifiers:
- Product classifier
- Issue classifier

Product is the type of product the consumer identified in the complaint.
Issue is the type of issue the consumer identified in the complaint.
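Both classifiers map a complaint's free text to a label, so they can share one model type and differ only in the labels they are trained on. The README does not state which algorithm is used; the sketch below swaps in a plain multinomial Naive Bayes written with the standard library only, so it runs without scikit-learn or NLTK. The class name `NaiveBayesTextClassifier`, the regex tokenizer and the toy training sentences are all hypothetical illustrations, not the project's code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a stand-in for NLTK)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label (priors)
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: a toy product classifier; an issue classifier would be a second
# instance fitted on the Issue labels instead of the Product labels.
product_clf = NaiveBayesTextClassifier().fit(
    ["my mortgage payment was misapplied", "escrow on my mortgage is wrong",
     "debt collector keeps calling me", "collector called about a debt I do not owe"],
    ["Mortgage", "Mortgage", "Debt collection", "Debt collection"],
)
print(product_clf.predict("the collector calls every day about this debt"))  # Debt collection
print(product_clf.predict("escrow mortgage"))                                # Mortgage
```

A two-stage design, where the predicted product restricts which issues are considered, is another option, but the README does not say whether the two classifiers are chained.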
# Data
Consumer complaints are added to this public database after the company has responded to the complaint, confirming a commercial relationship with the consumer, or after the company has had the complaint for 15 calendar days, whichever comes first. The Consumer Financial Protection Bureau (CFPB), which maintains the database, does not verify all the facts alleged in complaints, but it does give companies the opportunity to respond publicly by selecting a response from a pre-populated list.
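The "whichever comes first" rule is a simple minimum over two dates. The sketch below computes the earliest date a complaint becomes eligible for publication; it assumes `company_response` is the date the company responded confirming the commercial relationship, and it ignores any processing lag before the record actually appears. The function name is hypothetical.

```python
from datetime import date, timedelta


def earliest_publication_date(received, company_response=None, window_days=15):
    """Earliest date a complaint is eligible for the public database.

    Eligible once the company responds, or once it has had the complaint
    for `window_days` calendar days, whichever comes first.
    """
    deadline = received + timedelta(days=window_days)
    if company_response is None:
        return deadline  # no response yet: the 15-day window decides
    return min(company_response, deadline)


print(earliest_publication_date(date(2019, 1, 1), date(2019, 1, 5)))   # 2019-01-05
print(earliest_publication_date(date(2019, 1, 1), date(2019, 1, 20)))  # 2019-01-16
print(earliest_publication_date(date(2019, 1, 1)))                     # 2019-01-16
```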
The database contains 1,192,904 complaints in total, covering 18+ products and 61+ issues. For this prototype, we considered 6 products, each with 500 issues, for a total of 36 issues.
The data can be downloaded from https://www.consumerfinance.gov/data-research/consumer-complaints/search/?from=0&searchField=all&searchText=&size=25&sort=created_date_desc
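Once the data is downloaded as CSV, building a prototype subset like the one described above means keeping complaints that have a narrative and capping the number per product. The sketch below assumes the export has columns named `Product`, `Issue` and `Consumer complaint narrative`; check these against the header of your download. `load_complaints`, `sample_per_product` and the `n_products` / `per_product` parameters are hypothetical, and the values in the usage example are illustrative only.

```python
import csv
import io
import random
from collections import defaultdict

# Assumed column names in the CFPB CSV export; adjust to match your file.
TEXT_COL = "Consumer complaint narrative"
PRODUCT_COL = "Product"
ISSUE_COL = "Issue"


def load_complaints(fileobj):
    """Yield (text, product, issue) for rows with a non-empty narrative."""
    for row in csv.DictReader(fileobj):
        text = (row.get(TEXT_COL) or "").strip()
        if text:
            yield text, row[PRODUCT_COL], row[ISSUE_COL]


def sample_per_product(records, n_products, per_product, seed=0):
    """Keep the n most frequent products and at most per_product rows of each."""
    by_product = defaultdict(list)
    for rec in records:
        by_product[rec[1]].append(rec)
    # Most frequent products first; ties broken by name for reproducibility.
    top = sorted(by_product, key=lambda p: (-len(by_product[p]), p))[:n_products]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = []
    for product in top:
        rows = by_product[product]
        sample.extend(rng.sample(rows, min(per_product, len(rows))))
    return sample


# Usage with an in-memory CSV standing in for the downloaded file.
demo = io.StringIO(
    "Product,Issue,Consumer complaint narrative\n"
    "Mortgage,Escrow,Escrow account was wrong\n"
    "Mortgage,Payments,Payment misapplied\n"
    "Debt collection,Harassment,They call daily\n"
    "Credit card,Fees,\n"
)
records = list(load_complaints(demo))
subset = sample_per_product(records, n_products=6, per_product=500)
print(len(records), len(subset))  # 3 3 (the row without a narrative is skipped)
```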
### Tech
This software uses a few open-source libraries:
* [NLTK] - The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.
* [Scikit Learn] - It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
* [Numpy] - NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
* [Pickle] - The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
* [Python] - Python is an interpreted, high-level, general-purpose programming language.
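Pickle is what lets a trained classifier be saved once and reloaded later without retraining. The sketch below shows a save/load round trip; the helper names, the file name `product_classifier.pkl` and the dict used as a stand-in model are hypothetical. It targets Python 3 (`tempfile.TemporaryDirectory` does not exist in Python 2.7), and if a Python 2.7 process must read the file, pass `protocol=2` to `pickle.dump`.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize ("pickle") a trained model object to a binary file."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Deserialize ("unpickle") a model. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Usage: round-trip a stand-in "model" (here just a dict of label priors).
model = {"Mortgage": 0.5, "Debt collection": 0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product_classifier.pkl")
    save_model(model, path)
    restored = load_model(path)
print(restored == model)  # True
```

Unpickling can execute arbitrary code, so model files should come only from a trusted source.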
### Installation
This software requires Python 2.7 and the following libraries: NLTK, scikit-learn, NumPy and SciPy. Pickle is part of the Python standard library and needs no separate installation.
Install the dependencies:
1. NLTK
```sh
$ sudo pip install -U nltk
```
2. scikit-learn
```sh
$ pip install -U scikit-learn
```
3. Numpy
```sh
$ sudo pip install -U numpy
```
4. Scipy
```sh
$ pip install scipy
```
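After installing, a quick way to confirm that every dependency is importable is to look each one up without importing it. The sketch below uses `importlib.util.find_spec`, which requires Python 3.4+ (it is not available in Python 2.7); the module list and the function name `missing_modules` are assumptions based on the requirements above. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names of the dependencies listed above (scikit-learn imports as "sklearn").
REQUIRED = ["nltk", "sklearn", "numpy", "scipy", "pickle"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```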
### Plugins
Dillinger is currently extended with the following plugins. Instructions on how to use them in your own application are linked below.
| Plugin | README |
| ------ | ------ |
| Dropbox | [plugins/dropbox/README.md][PlDb] |
| Github | [plugins/github/README.md][PlGh] |
| Google Drive | [plugins/googledrive/README.md][PlGd] |
| OneDrive | [plugins/onedrive/README.md][PlOd] |
| Medium | [plugins/medium/README.md][PlMe] |
| Google Analytics | [plugins/googleanalytics/README.md][PlGa] |
### Development
Want to contribute? Great!
Dillinger uses Gulp + Webpack for fast development.
Make a change in your file and instantaneously see your updates!
Open your favorite Terminal and run these commands.
First Tab:
```sh
$ node app
```
Second Tab:
```sh
$ gulp watch
```
(optional) Third:
```sh
$ karma test
```
#### Building for source
For production release:
```sh
$ gulp build --prod
```
Generating pre-built zip archives for distribution:
```sh
$ gulp build dist --prod
```
### Docker
Dillinger is very easy to install and deploy in a Docker container.
By default, the Docker container exposes port 8080, so change this within the Dockerfile if necessary. When ready, simply use the Dockerfile to build the image.
```sh
cd dillinger
docker build -t joemccann/dillinger:${package.json.version} .
```
This will create the dillinger image and pull in the necessary dependencies. Be sure to swap out `${package.json.version}` with the actual version of Dillinger.
Once done, run the Docker image and map the port to whatever you wish on your host. In this example, we simply map port 8000 of the host to port 8080 of the container (or whatever port was exposed in the Dockerfile):
```sh
docker run -d -p 8000:8080 --restart="always" <youruser>/dillinger:${package.json.version}
```
Verify the deployment by navigating to your server address in your preferred browser.
```sh
127.0.0.1:8000
```
#### Kubernetes + Google Cloud
See [KUBERNETES.md](https://github.com/joemccann/dillinger/blob/master/KUBERNETES.md)
### Todos
- Write MORE Tests
- Add Night Mode
License
----
MIT
**Free Software, Hell Yeah!**
[//]: # (These are reference links used in the body of this note and get stripped out when the markdown processor does its job. There is no need to format nicely because it shouldn't be seen. Thanks SO - http://stackoverflow.com/questions/4823468/store-comments-in-markdown-syntax)
[dill]: <https://github.com/joemccann/dillinger>
[git-repo-url]: <https://github.com/joemccann/dillinger.git>
[john gruber]: <http://daringfireball.net>
[df1]: <http://daringfireball.net/projects/markdown/>
[markdown-it]: <https://github.com/markdown-it/markdown-it>
[Ace Editor]: <http://ace.ajax.org>
[node.js]: <http://nodejs.org>
[Twitter Bootstrap]: <http://twitter.github.com/bootstrap/>
[jQuery]: <http://jquery.com>
[@tjholowaychuk]: <http://twitter.com/tjholowaychuk>
[express]: <http://expressjs.com>
[AngularJS]: <http://angularjs.org>
[Gulp]: <http://gulpjs.com>
[PlDb]: <https://github.com/joemccann/dillinger/tree/master/plugins/dropbox/README.md>
[PlGh]: <https://github.com/joemccann/dillinger/tree/master/plugins/github/README.md>
[PlGd]: <https://github.com/joemccann/dillinger/tree/master/plugins/googledrive/README.md>
[PlOd]: <https://github.com/joemccann/dillinger/tree/master/plugins/onedrive/README.md>
[PlMe]: <https://github.com/joemccann/dillinger/tree/master/plugins/medium/README.md>
[PlGa]: <https://github.com/RahulHP/dillinger/blob/master/plugins/googleanalytics/README.md>