@ivanliu
Last active December 18, 2016 08:13
Data Scraping
= System Requirements =
+ Financial Data Store
- Be able to store raw web pages (unstructured data) as well as extracted data (structured data)
- Use MySQL for now
+ Crawler
- Crawl selected websites and store the raw web pages
- Extract fields of interest from each web page
- Use Python Scrapy
+ Data Processing
- Process extracted data and generate derived information
- Simple Python ETL job
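
A minimal sketch of the store and extraction requirements above, not the project's actual implementation. To stay runnable without extra setup it uses the standard-library `sqlite3` and `html.parser` modules as stand-ins for MySQL and Scrapy. The table names, column names, URL, and sample HTML are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def create_store(conn):
    """Create one table for raw pages and one for extracted fields."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS raw_page (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL,
            fetched_at TEXT NOT NULL,
            html TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS holding (
            page_id INTEGER NOT NULL REFERENCES raw_page(id),
            investor TEXT NOT NULL,
            ticker TEXT NOT NULL,
            shares INTEGER NOT NULL
        );
    """)


def store_page(conn, url, fetched_at, html):
    """Store the raw page, then the rows extracted from it."""
    cur = conn.execute(
        "INSERT INTO raw_page (url, fetched_at, html) VALUES (?, ?, ?)",
        (url, fetched_at, html),
    )
    page_id = cur.lastrowid
    parser = TableCellParser()
    parser.feed(html)
    # Assumes every data row has exactly three cells: investor, ticker, shares.
    conn.executemany(
        "INSERT INTO holding (page_id, investor, ticker, shares) VALUES (?, ?, ?, ?)",
        [(page_id, inv, tic, int(sh.replace(",", ""))) for inv, tic, sh in parser.rows],
    )
    conn.commit()
    return page_id


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Investor</th><th>Ticker</th><th>Shares</th></tr>
  <tr><td>Example Fund A</td><td>AAA</td><td>1,200</td></tr>
  <tr><td>Example Fund B</td><td>BBB</td><td>350</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
create_store(conn)
page_id = store_page(conn, "https://example.com/holdings", "2016-01-01T00:00:00", SAMPLE_HTML)
holdings = conn.execute(
    "SELECT investor, ticker, shares FROM holding ORDER BY ticker"
).fetchall()
```

Keeping the raw HTML next to the extracted rows means the extraction step can be rerun later if the fields of interest change, without crawling the site again.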
= Data Sources =
+ dataroma.com
- Crawl the investing activity data of super investors.
https://github.com/ivanliu/springforward/tree/master/graph/datafetch
+ Financial statements from Morningstar and Yahoo.
- Income statement, balance sheet, and cash flow statement
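
As one illustration of how the three statements could feed the data processing step, the sketch below derives a few standard ratios from one company's figures. The field names, the choice of metrics, and the sample numbers are assumptions made for this example; the notes above do not specify which derived information is wanted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statements:
    """Hypothetical subset of fields taken from the three statements."""
    ticker: str
    revenue: float               # income statement
    net_income: float            # income statement
    total_liabilities: float     # balance sheet
    shareholders_equity: float   # balance sheet
    operating_cash_flow: float   # cash flow statement
    capital_expenditures: float  # cash flow statement


def derive_metrics(s):
    """Transform step: compute derived values, guarding against division by zero."""
    return {
        "ticker": s.ticker,
        "net_margin": s.net_income / s.revenue if s.revenue else None,
        "debt_to_equity": (
            s.total_liabilities / s.shareholders_equity
            if s.shareholders_equity else None
        ),
        "free_cash_flow": s.operating_cash_flow - s.capital_expenditures,
    }


# Made-up figures, used only for illustration.
sample = Statements(
    ticker="AAA",
    revenue=1000.0,
    net_income=150.0,
    total_liabilities=400.0,
    shareholders_equity=800.0,
    operating_cash_flow=220.0,
    capital_expenditures=70.0,
)
metrics = derive_metrics(sample)
```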