
@simplymathematics (last active August 28, 2019 00:45)
# Librerouter Testing and Coverage Mapping
- Pros: Very useful; great portfolio piece; front-end, database, and REST practice; light C++/shell development; distributed development.
- Cons: Hardware dependency; it is very hard to accurately test a large coverage map without deploying code to a target system.
- Data: From libremap.net, collected in real time on target hardware, or simulated with QEMU.
- Goal: A front-end application that displays real-time service data fed from target hardware.
# Real-time Chat Bot
- Pros: Fun hack; great portfolio piece; TensorFlow RT practice.
- Cons: Expensive; no immediate value to the market or to an open-source project.
- Data:
  - Potentially: https://www.kaggle.com/rtatman/ubuntu-dialogue-corpus (the Ubuntu Dialogue Corpus: 1 million Ubuntu support chats between two people).
- Goal: Ask a box in my room Ubuntu questions and have it respond; develop tooling for training its responses at the user level.
# Facial Recognition and Combatting Sampling Bias
- Pros: Very interesting topic; datasets readily available; potentially informative and novel research.
- Cons: Doesn't fit the current portfolio or long-term career goals (embedded data science). Not much tech practice (unless the model uses TensorFlow).
- Data:
  - Potentially: https://www.nist.gov/srd/nist-special-database-18 (NIST Special Database 18, the National Institute of Standards and Technology mugshot dataset).
  - Potentially: https://lionbridge.ai/datasets/5-million-faces-top-15-free-image-datasets-for-facial-recognition/
  - Potentially: http://robotics.csie.ncku.edu.tw/Databases/FaceDetect_PoseEstimate.htm
  - Other: Faces in the Wild.
# Investigate causal relationships between technology and wealth
- Pros: Finance, projection, and analytics practice.
- Cons: The dataset will be unreliable; the result could be uninteresting; not much tech practice.
- Data:
  - Probably: census data at the tract or county level.
  - Country data: did not allow for rigorous financial analysis because of currency conversions and purchasing power parity questions.
  - US Census state data: not high enough resolution for good control-based studies (MANOVA, for example).
- Goal: Investigate, with high confidence, the covariance relationship between wealth and internet access rates.
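
The stated goal comes down to a covariance (and, once scaled, a correlation) between two per-region series. Below is a minimal Python sketch, assuming one wealth value and one internet-access rate per tract or county. The function names and the numbers in the usage block are hypothetical placeholders, not census figures. Covariance alone does not establish a causal relationship; the control-based studies mentioned above would still be needed for that.

```python
# Minimal sketch (assumption: one wealth value and one internet-access rate
# per tract or county). Values in __main__ are made-up placeholders.
from math import sqrt
from statistics import mean


def sample_covariance(x: list[float], y: list[float]) -> float:
    """Unbiased sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance divided by both sample standard deviations."""
    sx = sqrt(sample_covariance(x, x))
    sy = sqrt(sample_covariance(y, y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return sample_covariance(x, y) / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-county values: median household income (USD) and
    # share of households with internet access.
    income = [42000.0, 55000.0, 61000.0, 38000.0, 72000.0]
    access = [0.71, 0.80, 0.84, 0.66, 0.90]
    print(f"cov = {sample_covariance(income, access):.1f}")
    print(f"r   = {pearson_r(income, access):.3f}")
```

The correlation is unitless, so it is easier to compare across regions or data sources than the raw covariance, whose scale depends on the income units.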