I had detailed talks with two guys from my Startup School group. They both gave me some Excel sheets they are interested in processing. I was able to write a script to do something potentially useful in both cases. I'm still in discussions with both of them to figure out where to go from here. I will probably reach out to a broader audience this week to see if I can identify a cluster of use cases here.
I also spent some time looking into neural networks. A lot of the research on automatic code generation appears to be focused there. Here are some articles I've been wading through:
- Neural Turing Machines -- https://arxiv.org/abs/1410.5401
- Neural Programmer Interpreter -- https://medium.com/near-ai/review-of-neural-programmer-interpreters-854a14a494fb
- RNNs for code generation -- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
There seems to be an enormous amount of knowledge required to understand the above studies to the point where I would be able to do useful work using similar approaches. It would be a big time investment to learn all that stuff, and I'm not sure whether it would be worth it. Even if I did learn everything, the cutting edge doesn't quite seem sufficient to produce a marketable product. I will probably spend some time digging into it further though, despite my hesitations. I noticed there are some machine learning tutors reachable through wyzant.com -- I'm thinking about hiring one of them to make the learning process more efficient.
I also did some random side work last week:
I added better search to my note taking tool, Electric Toothbrush. The motivation was that I had some notes about how to parse Excel spreadsheets in Python, but I couldn't find them -- because the note title was "read excel python" while my query was "parse excel python". I fixed this by switching from ANDing together the terms in my search query to ORing them, so a note now matches if it contains any of the query terms rather than all of them. This leads to better recall but lower precision. To offset the precision loss and keep search working well, I incorporated inverse document frequency into the ranking, so that matches on rare terms count for more than matches on common ones. So far the results seem to be a significant improvement.
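Roughly, the idea looks like the sketch below. This is not the actual Electric Toothbrush code; the `search` and `tokenize` names, the whitespace tokenizer, and the particular IDF formula (`log(1 + N / df)`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def search(query, notes):
    """Rank notes by the summed IDF of the query terms they contain.

    `notes` maps a note title to its body (an assumed data layout).
    A note matches if it contains ANY query term (OR semantics).
    """
    # Set of distinct terms per note, built from title plus body.
    docs = {title: set(tokenize(title + " " + body))
            for title, body in notes.items()}
    n_docs = len(docs)

    # Document frequency: how many notes contain each term.
    doc_freq = Counter(term for terms in docs.values() for term in terms)

    query_terms = set(tokenize(query))
    scores = {}
    for title, terms in docs.items():
        # Rare terms get a larger weight than common ones.
        score = sum(math.log(1 + n_docs / doc_freq[term])
                    for term in query_terms if term in terms)
        if score > 0:
            scores[title] = score

    # Highest score first.
    return sorted(scores, key=scores.get, reverse=True)
```

With this scoring, the query "parse excel python" still finds a note titled "read excel python" even though "parse" matches nothing, and that note outranks one that only shares the common term "python".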
I have a tool that takes a screenshot of my desktop every five seconds (https://github.com/JesseAldridge/screen_recorder) and uploads each one to S3. I can combine this with quantified-self tools like Task Ranger and Metal Detector to look more deeply into time periods of interest. I cleaned up the workflow for screenshot downloading and it's now working pretty well. In particular, querying for the screenshots from a remote EC2 machine and then downloading them to my local computer is much faster than trying to download them directly. This is because all of the millions of screenshots I have are in a single bucket, and when you query for a particular subset, S3 apparently does a linear scan through the entire bucket, which somehow requires many separate HTTP requests (something to do with pagination, I assume).
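To illustrate where the many requests come from, here is a minimal sketch of a paginated listing, not the code from my actual workflow. S3's `list_objects_v2` call returns at most 1,000 keys per response, so listing a large bucket means one HTTP round trip per page. The `list_keys` helper and its parameters are hypothetical; `s3_client` is assumed to be something like `boto3.client("s3")`.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Yield every key in `bucket` that starts with `prefix`.

    Each call to list_objects_v2 is one HTTP request returning at
    most 1,000 keys, so a bucket with millions of objects needs
    thousands of sequential requests.
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no matching keys.
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one ended.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Because the pages have to be fetched one after another, the per-request latency adds up, which would be consistent with the listing going faster from an EC2 machine close to the bucket. If the keys happened to start with something like a date, passing that as `prefix` would presumably cut down the number of pages, but that depends on how the keys are named.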