@icco
Created March 12, 2011 22:11
A course outline for CSC 400

CSC 400 - bloomFilter

Here is where I will put my notes as I develop.

3.20.11 && random dates before

I did some vertical prototyping with Rails. I haven't used Rails before, so I'm slowly trying to get the hang of the framework.

3.30.11 && 3.31.11

I spent about four hours last night getting user authentication working. Users can now submit items. I also got the basics of voting done.

This morning I looked into building an item hierarchy so users can comment on submissions.

I also am slowly getting the design down, although it is still pretty ugly.

I've also been thinking about how to abstract out the list view to do bandit testing on all of the different sort options.

Between last night and this morning I did about eight hours of work.

4.1.11

Spent about an hour refactoring the template files and making sure that pages had the correct data. I still need to make sure only the right people can delete and edit their posts.

4.4.11

Researched validations in Rails and made some more UI tweaks.

4.5.11

Met and talked with Clements. We discussed the need to be able to generate data over time, a time model for our site if you will. If we had 1000 users, submitting 5 links a day, who have 10 interests and voted on a selection (10) of the links in their interests, we would be in good shape.
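A minimal sketch of one simulated day under the numbers above. Everything here is an assumption made for illustration: the constant names, the pool of 50 topics, and the rule that a link carries one topic drawn from its submitter's interests. It is not the project's actual rake task.

```ruby
# Hypothetical simulation of one day of site activity (all names are illustrative).
USERS          = 1000
LINKS_PER_DAY  = 5
INTERESTS_EACH = 10
VOTES_PER_DAY  = 10
TOPICS         = (1..50).to_a # assumed size of the topic pool

def simulate_day(rng)
  # Each user gets a random set of interests.
  interests = Array.new(USERS) { TOPICS.sample(INTERESTS_EACH, random: rng) }

  # Each user submits links tagged with one of their own interests.
  links = []
  interests.each_with_index do |topics, user_id|
    LINKS_PER_DAY.times do
      links << { id: links.size, user_id: user_id, topic: topics.sample(random: rng) }
    end
  end

  # Each user votes on a sample of the links that match their interests.
  by_topic = links.group_by { |link| link[:topic] }
  votes = []
  interests.each_with_index do |topics, user_id|
    candidates = topics.flat_map { |topic| by_topic.fetch(topic, []) }
    candidates.sample(VOTES_PER_DAY, random: rng).each do |link|
      votes << { user_id: user_id, link_id: link[:id] }
    end
  end

  { interests: interests, links: links, votes: votes }
end

day = simulate_day(Random.new(42))
puts "#{day[:links].size} links, #{day[:votes].size} votes"
```

Passing a seeded `Random` keeps runs repeatable, which helps when comparing sort options against the same data. Note that this sketch lets users vote on their own links.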

Things to research:

To have done by next meeting:

  • A system generating the data above mentioned and possibly some look into the algorithms.

I also spent some time cleaning up the UI some more. Still needs a lot of work.

4.16.11

Wow, I really should have worked on this last week... Oh well. I didn't get much done today besides reading about how testing works in Rails and learning about the Timecop gem.

4.19.11

Met with Clements. This week's goals: get the users voting (basically by having the model "cheat") and get Vowpal Wabbit inside a Ruby gem.

  • rake data:model now generates users in a valid way.

4.22.11

4.25.11

More work on the data generation.

4.26.11

More work on the data generation.

4.28.11

Got data generation finished.

4.29.11

Spent a few hours contemplating design changes.

  • Maybe I should make comments their own class?
    • I feel like I am overcomplicating things.
    • What's the best way to do trees in Ruby?
      • and represent that in the db?
  • Look at Reddit and HackerNews, how should I style the page?
  • Need some color
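One common answer to the tree question above is an adjacency list: every row stores the id of its parent, and the tree is rebuilt in memory. The sketch below is plain Ruby rather than ActiveRecord, and the `Item` struct, its `parent_id` column, and the helper names are hypothetical.

```ruby
# Hypothetical flat rows as they might come back from an items table.
Item = Struct.new(:id, :parent_id, :body)

# Groups rows by parent_id once, then walks down from the roots (parent_id nil).
def build_tree(rows)
  children = rows.group_by(&:parent_id)
  walk = lambda do |parent_id|
    children.fetch(parent_id, []).map do |row|
      { item: row, replies: walk.call(row.id) }
    end
  end
  walk.call(nil)
end

# Flattens the tree into indented lines, two spaces per level.
def render(nodes, depth = 0)
  nodes.flat_map do |node|
    ["#{'  ' * depth}#{node[:item].body}"] + render(node[:replies], depth + 1)
  end
end

rows = [
  Item.new(1, nil, "submitted link"),
  Item.new(2, 1,   "first comment"),
  Item.new(3, 2,   "reply to first comment"),
  Item.new(4, 1,   "second comment")
]
puts render(build_tree(rows))
```

With this layout a submission and a comment can share one table, since a submission is simply an item with no parent. That would avoid a separate comment class, at the cost of loading a whole thread to render it.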

5.2.11

Problems with moving comments:

  • Votes: I need to make votes "smarter", aka more complicated.

I got Vowpal Wabbit compiled, but I can't figure out the input data it wants.
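For reference, Vowpal Wabbit's plain-text input is one example per line: a label, then one or more `|namespace` groups of space-separated features, each optionally followed by `:value`. The sketch below formats a vote that way. The namespaces (`user`, `item`), the feature names, and both helper functions are assumptions for illustration, not something the project settled on.

```ruby
# Feature names may not contain spaces, colons, or pipes, so replace them.
def sanitize(feature)
  feature.to_s.gsub(/[\s:|]/, "_")
end

# Formats one example as a Vowpal Wabbit input line.
# label: 1 for an upvote, -1 for a downvote (the labels used with logistic loss).
# namespaces: { "name" => { "feature" => value_or_nil } }
def vw_line(label, namespaces)
  parts = namespaces.map do |name, features|
    tokens = features.map do |feature, value|
      value.nil? ? sanitize(feature) : "#{sanitize(feature)}:#{value}"
    end
    "|#{name} #{tokens.join(' ')}"
  end
  "#{label} #{parts.join(' ')}"
end

line = vw_line(1, "user" => { "user_42" => nil },
                  "item" => { "item_7" => nil, "age_hours" => 3.5 })
puts line
```

A feature without a value is treated as having value 1, which suits categorical ids such as a user or an item.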

5.7.11

Met with Clements yesterday. Basically, we need a clustering algorithm and a distance metric. Here are some things I am reading today at SHDH.

5.17.11

I really need to stop putting such huge breaks between my work.

5.23.11

Alright, time to implement. We are implementing clustering. From more of my reading it seems that collaborative filtering is just a type of clustering. The main thing is that we need to make sure we are implementing an unsupervised learning algorithm.

  • The distance is currently the number of voters we share.
  • We need to make it so we can select items sorted by date and distance
  • so add a join table that stores two item ids, their distance, and date last computed. Cache for 15 minutes.
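The three bullets above can be sketched in plain Ruby, with a hash standing in for the join table. The class name, its methods, and the in-memory storage are all hypothetical; in the app this would presumably be a database table keyed on the two item ids.

```ruby
require "set"

# Hypothetical in-memory stand-in for the join table: one row per item pair.
class DistanceCache
  TTL_SECONDS = 15 * 60

  def initialize(voters_by_item)
    @voters_by_item = voters_by_item # item_id => Set of user_ids
    @rows = {}                       # [low_id, high_id] => { distance:, computed_at: }
  end

  # Returns the cached value, recomputing it once it is older than 15 minutes.
  def distance(a, b, now = Time.now)
    key = [a, b].minmax
    row = @rows[key]
    if row.nil? || now - row[:computed_at] > TTL_SECONDS
      row = { distance: shared_voters(a, b), computed_at: now }
      @rows[key] = row
    end
    row[:distance]
  end

  private

  # The "distance" from the notes: how many voters the two items share.
  def shared_voters(a, b)
    (@voters_by_item.fetch(a, Set.new) & @voters_by_item.fetch(b, Set.new)).size
  end
end

voters = { 1 => Set[10, 11, 12], 2 => Set[11, 12, 13] }
cache = DistanceCache.new(voters)
puts cache.distance(1, 2)
```

One thing the sketch makes visible: a count of shared voters grows as items become more alike, so it behaves like a similarity score rather than a distance, and sorting by it needs descending order.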

5.31.11

6.2.11

I talked to Clements today. I'm an idiot.

Each point in our graph isn't a one-dimensional point, but rather n-dimensional. I then take the Euclidean distance (the length of the difference between the two column vectors) and use that when computing the means of the data.
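A small sketch of that idea, assuming each item becomes a vector with one component per user (1.0 if that user voted for it, 0.0 otherwise). The notes do not say what the n dimensions are, so that encoding and both function names are assumptions.

```ruby
# Builds one vector per item: component i is 1.0 if user i voted for it, else 0.0.
def vote_vector(voter_ids, all_user_ids)
  all_user_ids.map { |id| voter_ids.include?(id) ? 1.0 : 0.0 }
end

# Euclidean distance: square root of the summed squared component differences.
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

users  = [1, 2, 3, 4]
item_a = vote_vector([1, 2], users) # voted on by users 1 and 2
item_b = vote_vector([2, 3], users) # voted on by users 2 and 3
puts euclidean_distance(item_a, item_b)
```

Here the two items differ in two components (users 1 and 3), so the distance is the square root of 2.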

6.6.11 - 6.8.11

Got basics of k-means clustering working. Only issue I'm still having is figuring out how to pick the correct item closest to the centroid.
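A minimal k-means sketch, including one way to pick the real item nearest a centroid. It is not the project's code: the function names, the fixed iteration count, and seeding the centroids with the first k points are all arbitrary choices made for illustration.

```ruby
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Component-wise mean of a list of equal-length vectors.
def mean(vectors)
  vectors.transpose.map { |column| column.sum / column.size.to_f }
end

# Plain k-means: assign each point to its nearest centroid, then move each
# centroid to the mean of its points. Initial centroids are the first k points.
def k_means(points, k, iterations = 10)
  centroids = points.first(k)
  clusters = {}
  iterations.times do
    clusters = points.group_by do |point|
      centroids.each_index.min_by { |i| distance(point, centroids[i]) }
    end
    # A centroid with no assigned points keeps its previous position.
    centroids = centroids.each_index.map { |i| clusters[i] ? mean(clusters[i]) : centroids[i] }
  end
  [centroids, clusters]
end

# The real item nearest to a centroid; the centroid itself is usually not an item.
def closest_item(items, centroid)
  items.min_by { |_id, vector| distance(vector, centroid) }
end

items = {
  "a" => [0.0, 0.0], "b" => [10.0, 10.0], "c" => [0.0, 1.0],
  "d" => [10.0, 11.0], "e" => [1.0, 0.0]
}
centroids, _clusters = k_means(items.values, 2)
id, _vector = closest_item(items, centroids[0])
puts id
```

When two items are equally close to a centroid, `min_by` returns whichever comes first, so a tie-breaking rule (newest item, most votes) may be worth adding.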

6.8.11

Met with Clements. He suggested storing the vector in the db, instead of associating it with a point. I would do this by storing rows which had user_id, cluster_id, and the mean for that cluster. I would then compute based off of that.
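One possible reading of that suggestion, sketched in plain Ruby: each row holds a single component of a cluster's mean vector, and the vectors are rebuilt from the rows before computing distances. The `CentroidRow` struct and both helpers are hypothetical, and treating a missing user as 0.0 is an assumption.

```ruby
# Hypothetical row shape: one row per (cluster, user) pair.
CentroidRow = Struct.new(:user_id, :cluster_id, :mean)

# Rebuilds each centroid as a hash of user_id => mean, keyed by cluster_id.
def centroids_from_rows(rows)
  rows.group_by(&:cluster_id).transform_values do |cluster_rows|
    cluster_rows.to_h { |row| [row.user_id, row.mean] }
  end
end

# Euclidean distance from an item's votes (user_id => 1.0) to a stored centroid.
# Users missing from either side count as 0.0.
def distance_to_centroid(votes, centroid)
  user_ids = votes.keys | centroid.keys
  Math.sqrt(user_ids.sum { |id| (votes.fetch(id, 0.0) - centroid.fetch(id, 0.0))**2 })
end

rows = [
  CentroidRow.new(1, 0, 1.0), CentroidRow.new(2, 0, 0.5),
  CentroidRow.new(1, 1, 0.0), CentroidRow.new(3, 1, 1.0)
]
centroids = centroids_from_rows(rows)
votes = { 1 => 1.0, 2 => 1.0 }
nearest = centroids.keys.min_by { |id| distance_to_centroid(votes, centroids[id]) }
puts nearest
```

Storing components as rows keeps the table narrow even with many users, and zero components do not need rows at all.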

What I am currently doing is apparently called k-medoids (or something like that).

To finish the project, I need to create a page that shows the project is giving me reasonable recommendations. I am going to mail it to him and then, if there is anything he does not understand, we will talk.

6.9.11

Finishing up code based on yesterday's notes. Need to test and write proof.

CSC 400 Rubric

  • Plain website 35 pts

    • Users can register and authenticate 5 pts
    • Users can submit URLs 5 pts
    • Users can comment on submitted URLs 5 pts
    • Users have a front page of URLs of interest 5 pts
    • Users can vote on their favorite posts 5 pts
    • Users can edit their submissions 5 pts
    • Users can be turned into moderators, who can delete posts 5 pts
  • Recommendation Algorithms 60 pts

    • Lab Notebook with biweekly check-in. Tuesdays at 10am.
  • Final - Write a short paper explaining findings, decisions made, etc. 5 pts

Total: 100 pts
