wilkie/projects.txt

## projects.txt
Alright, here are my two possible contributions in the area of implementing distributed systems.

1. Distributed Tag-Based File System using What We Already Have

Margo Seltzer told us that hierarchical file systems are dead. The prevalence of the web, which is very much a distributed file system without a true hierarchy, seems to support this. Let's explore building a file system that uses the web (hypermedia) to discover relationships between files and explore those relationships through a file system abstraction.

Tools: Linux + FUSE, build a rudimentary file system to start from (it will pretty much be the solution to the CS1550 FS assignment)
Related Work: LiFS, Quasar, Xanadu (Yes, that Xanadu)

At minimum, we can build a nice proof of concept. Look at a website: links to pages are directories, and links to resources (whether in <link>, <a href="">, or Link: HTTP headers) are files. "stat"ing a file is an HTTP HEAD or HTTP OPTIONS request. Support for some particular hypermedia API should be fairly easy to achieve, but exploring whether or not we can do something more general would be very interesting.
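
As a rough illustration of that mapping, here is a minimal Python sketch. It assumes a deliberately naive heuristic (a link with no extension or an .html/.htm extension is a "page" and becomes a directory; anything else is a file). The names LinkCollector, classify, list_directory, stat_from_headers, and PAGE_EXTENSIONS are all made up for this example. A real version would sit behind FUSE callbacks and issue the actual HEAD request; that part is left out so the sketch runs without a network.

```python
import posixpath
import stat
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption for this sketch: these extensions mark a navigable page.
PAGE_EXTENSIONS = {"", ".html", ".htm"}


class LinkCollector(HTMLParser):
    """Collects absolute href targets from <a> and <link> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            if href:  # skips missing and empty hrefs
                self.links.append(urljoin(self.base_url, href))


def classify(url):
    """Return "dir" for page-like URLs and "file" for everything else."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return "dir" if ext in PAGE_EXTENSIONS else "file"


def list_directory(base_url, html):
    """Map the links found in one page to {entry name: (kind, url)}."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    entries = {}
    for url in collector.links:
        parsed = urlparse(url)
        # Use the last path segment as the name, or the host for a bare domain.
        name = posixpath.basename(parsed.path.rstrip("/")) or parsed.netloc
        entries[name] = (classify(url), url)
    return entries


def stat_from_headers(kind, headers):
    """Build a stat-like dict from the headers of a HEAD response.

    `headers` is a plain dict here; real HTTP header names are
    case-insensitive, which this sketch ignores.
    """
    if kind == "dir":
        mode = stat.S_IFDIR | 0o555
    else:
        mode = stat.S_IFREG | 0o444  # read-only, since writing is unsolved
    mtime = 0
    if "Last-Modified" in headers:
        mtime = int(parsedate_to_datetime(headers["Last-Modified"]).timestamp())
    return {
        "st_mode": mode,
        "st_size": int(headers.get("Content-Length", 0)),
        "st_mtime": mtime,
    }
```

Calling list_directory with a page's URL and its HTML gives the entries a readdir would return, and stat_from_headers turns a HEAD response into what getattr would need. Name collisions (two links with the same last path segment) simply overwrite each other here, which is one of the things a more general design would have to address.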

Points to consider:

* Files can also link to other files... how do we manage that?
* Specialization can be nice.
* Is there a mechanism/protocol for discovering APIs on domains?
* How to write things? haha.

2. Distributed Code Sharing

Too many systems distrust their users. Sometimes, users can be very capable! In this open source world, it should be easy for users to edit the code on their system and improve it. I mean, that's the point. How do you share good work? Do you have to upload it somewhere and then market it? Well, that works for "celebrities," but not for everybody. Let's be smarter. Maybe we can let other machines pull our source when our machine reports it as "better" through some metric. Those machines will benchmark it themselves (to make sure differences are controlled) and may merge it into their own programs. The idea is that over time, all machines in our system will be optimal based on their collective progress.

Tools: A few machines, or just one very capable one with a Xen VMM, some initial framework to work from (some code that's kinda crappy with some benchmarks).

So, there needs to be some standard way to test and build code. We don't need an entire system, just a proof of concept program that can be improved on all machines automatically by editing the code on one... and if the code is made worse, the code is not propagated. We need to test for correctness so that regressions are not propagated. We need to provide a benchmark to measure the performance of the code. So, figure out what we care about, then look at improving the code and measuring the difference. Then, figure out which method is best to network the machines together and have them automatically discover new progress.
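
The accept-or-reject step on a receiving machine could look something like the following Python sketch. It assumes the "program" is a single function, that correctness means matching a fixed list of expected outputs, and that "better" means faster by some margin. The names passes_tests, best_time, and should_adopt, and the 10% min_speedup threshold, are invented for illustration and are not a settled design.

```python
import timeit


def passes_tests(func, cases):
    """True if func returns the expected value for every (args, expected) pair."""
    try:
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # A candidate that crashes counts as a regression.
        return False


def best_time(func, args, repeat=3, number=50):
    """Best-of-`repeat` wall time for `number` calls, measured locally."""
    return min(timeit.repeat(lambda: func(*args), repeat=repeat, number=number))


def should_adopt(current, candidate, cases, bench_args, min_speedup=1.10):
    """Decide whether to merge a candidate pulled from another machine.

    Correctness is checked first so that a fast but wrong candidate is
    never benchmarked, let alone propagated. Both versions are timed on
    this machine so that the comparison is controlled.
    """
    if not passes_tests(candidate, cases):
        return False
    speedup = best_time(current, bench_args) / best_time(candidate, bench_args)
    return speedup >= min_speedup
```

For example, with a loop-based sum as current and a closed-form sum as candidate, should_adopt returns True only if the candidate matches every test case and is measurably faster on the local machine. The threshold is there to keep timing noise from causing machines to swap equivalent versions back and forth.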

Points to consider:

* What benchmarks are important that we can measure?
* Code verification. Code should be *correct* as well as "better." Obviously we don't trust everybody... unless we do.
* Trust.