ivanliu/data_scraping.prd

= System Requirements =
+ Financial Data Store
- Store both raw web pages (unstructured data) and extracted data (structured data)
- Use MySQL for now
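A minimal sketch of how the store could keep raw pages and extracted fields side by side, linked by a page id. It uses Python's built-in sqlite3 purely as a runnable stand-in for MySQL, and the table names (raw_page, extracted_field) and the key/value layout for extracted fields are assumptions, not a decided schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: raw_page holds unstructured HTML, extracted_field holds
# structured name/value pairs pulled from that page. sqlite3 stands in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_page (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL,
    fetched_at TEXT NOT NULL,
    html       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS extracted_field (
    page_id INTEGER NOT NULL REFERENCES raw_page(id),
    name    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (page_id, name)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_page(conn, url, html, fields):
    """Store the raw HTML and its extracted fields in one transaction."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO raw_page (url, fetched_at, html) VALUES (?, ?, ?)",
            (url, datetime.now(timezone.utc).isoformat(), html),
        )
        page_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO extracted_field (page_id, name, value) VALUES (?, ?, ?)",
            [(page_id, name, str(value)) for name, value in fields.items()],
        )
    return page_id
```

Keeping the raw HTML means fields can be re-extracted later without re-crawling. On MySQL the DDL would differ (for example AUTO_INCREMENT ids and a LONGTEXT column for the HTML), and drivers such as PyMySQL use `%s` placeholders instead of `?`.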

+ Crawler
- Crawl selected websites and store the raw web pages
- Extract fields of interest from each page
- Use Python with Scrapy
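The extraction step can be sketched as turning the rows of an HTML table into dictionaries keyed by field name. To stay self-contained this sketch uses the standard library's html.parser instead of Scrapy selectors; the function name extract_rows and the assumption that the fields of interest live in `<td>` cells are illustrative only.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells are skipped
                self.rows.append(self._row)
            self._row = None


def extract_rows(html, field_names):
    """Hypothetical helper: map each table row's cells onto field_names."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return [dict(zip(field_names, row)) for row in parser.rows]
```

In a Scrapy spider the same step would normally sit in the parse callback, using `response.css(...)` or `response.xpath(...)` for the fields and `response.text` for the raw page to store.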

+ Data Processing
- Process extracted data and generate derived information
- Implement as a simple Python ETL job
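As one example of "derived information", a small transform step could compute each holding's weight within an investor's portfolio from extracted text fields. The input field names (investor, ticker, shares, price) and the number formats are assumptions for illustration; the load step (writing results back to the store) is left out.

```python
from collections import defaultdict


def parse_number(text):
    """Turn extracted strings like '1,000' or '$12.50' into floats."""
    return float(text.replace(",", "").replace("$", ""))


def derive_weights(rows):
    """Transform step: value each holding and compute its share of the investor's total."""
    holdings = []
    totals = defaultdict(float)
    for row in rows:
        value = parse_number(row["shares"]) * parse_number(row["price"])
        holdings.append({"investor": row["investor"], "ticker": row["ticker"], "value": value})
        totals[row["investor"]] += value
    for holding in holdings:
        total = totals[holding["investor"]]
        holding["weight"] = holding["value"] / total if total else 0.0
    return holdings
```

Keeping parsing (text to numbers) and derivation (weights) in one pass is fine at this scale; a larger job might split them so each can be rerun independently.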

= Data Sources =
+ dataroma.com
- Crawl the investing activity of super investors.
https://github.com/ivanliu/springforward/tree/master/graph/datafetch
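Investing activity usually arrives as short text such as a buy/sell action plus a percentage change in the position. The sketch below assumes activity cells read like "Buy", "Add 12.50%", "Reduce 5%" or "Sell"; that format is an assumption about the source page, not something verified, and the Activity type is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    action: str                  # one of "Buy", "Add", "Reduce", "Sell"
    change_pct: Optional[float]  # percentage change in the position, if shown


# Assumed cell format: an action word, optionally followed by a percentage.
_ACTIVITY_RE = re.compile(
    r"^\s*(buy|add|reduce|sell)\s*(?:(\d+(?:\.\d+)?)\s*%)?\s*$", re.IGNORECASE
)


def parse_activity(text):
    """Parse one activity cell; return None if it does not match the assumed format."""
    match = _ACTIVITY_RE.match(text)
    if match is None:
        return None
    action = match.group(1).capitalize()
    pct = float(match.group(2)) if match.group(2) else None
    return Activity(action, pct)
```

Returning None for unrecognized cells lets the crawler log and skip them instead of failing the whole page.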

+ Financial statements from Morningstar and Yahoo
- Income statement, balance sheet and cash flow statement
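With all three statements stored for a period, a few standard ratios can be derived from them. The field names below are hypothetical (not Morningstar or Yahoo column names), and the chosen metrics (net margin, current ratio, free cash flow as operating cash flow minus capital expenditures) are only examples.

```python
from dataclasses import dataclass


@dataclass
class Statements:
    """One fiscal period; field names are hypothetical."""
    revenue: float               # income statement
    net_income: float            # income statement
    current_assets: float        # balance sheet
    current_liabilities: float   # balance sheet
    operating_cash_flow: float   # cash flow statement
    capital_expenditures: float  # cash flow statement, recorded as a positive outflow


def derived_metrics(s: Statements) -> dict:
    """Combine fields across the three statements into simple ratios."""
    return {
        "net_margin": s.net_income / s.revenue,
        "current_ratio": s.current_assets / s.current_liabilities,
        "free_cash_flow": s.operating_cash_flow - s.capital_expenditures,
    }
```

Some sources report capital expenditures as a negative number; in that case the extractor should normalize the sign before this step, or free cash flow will be overstated.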