DonnchaC/gist:03ad5cd0b8ead0ae9e30

## gistfile1.md

      
What project would you like to work on? Use our ideas lists as a starting point or make up your own idea. Your proposal should include high-level descriptions of what you're going to do, with more details about the parts you expect to be tricky. Your proposal should also try to break down the project into tasks of a fairly fine granularity, and convince us you have a plan for finishing it. A timeline for what you will be doing throughout the summer is highly recommended.

I'm particularly interested in Tor onion services. I hope that they can facilitate widely deployed self-authenticating encrypted communication channels at scale.
Anecdotal reports suggest that the current onion services infrastructure does not scale well [1, #8902]. In particular, onion service introduction points are susceptible to hammering by clients and malicious attackers. Denial-of-service attacks in which a single Tor process is overloaded by an attacker have also been experienced recently [#15463].
In contrast to modern distributed web service architecture, there is little scope at present for onion services to load balance across multiple physical servers (similar to DNS round-robin). Application level solutions such as routing web requests using tools such as HAProxy are less than ideal.
In my project I would like to develop a robust tool for load balancing Tor onion service requests across multiple back-end Tor instances, thereby increasing availability and reliability.
This project's current design involves collating the set of introduction points created by one or more independent Tor onion service instances into a single 'master' descriptor. It is hoped that much of the development in this project can happen independently of little-t-tor allowing for more rapid testing and optimization of potential load balancing strategies.
### Use Cases


A popular onion service with an overloaded web server or Tor process:
A service such as Facebook which gets a large number of users would like to distribute client requests across multiple servers as the load is too much for a single Tor instance to handle. They would also like to balance between instances when the 'encrypted services' proposal is implemented [2555].


Redundancy and automatic failover:
A political activist would like to keep their web service accessible and secure in the event that the secret police seize some of their servers. Clients should ideally fail over automatically to another online instance with minimal service disruption.


'Shared Hosting' scenarios:
A hosting provider wishes to allow their customers to access their shared hosting control panel over an encrypted onion service. Rather than creating an individual onion service (with corresponding overhead) for each of thousands of customers, the host could instead run one onion service. Multiple service descriptors could then be published under unique customer onion addresses, each of which would be routed to that user's control panel. This could also enable a low-resourced OnionFlare implementation [2].


Secure Onion Service Key storage:
An onion service operator would like to compartmentalize their permanent onion key in a secure location separate to their Tor process and other services. With this proposal permanent keys could be stored on an independent, isolated system.


### Proposal

The current proposed solution involves an operator running a standard onion service for each of their instances. Each onion service instance would run with a unique onion service key. These services should use "stealth" authorization to obscure the list of introduction points in their service descriptor.
A standalone management service would then periodically fetch service descriptors from the HSDir system for each of its configured onion service instances. The descriptors would be parsed by the management service to extract the current set of introduction points for all online service instances.
The management service would select a set of the valid introduction points and combine them within a new 'master' service descriptor. This master descriptor would be signed by the actual onion service permanent key and published to the HSDir system as normal.
Clients who wish to access the onion service would then retrieve the 'master' service descriptor and begin connecting to introduction points chosen at random from the introduction point list. After a successful introduction the client will have created an onion service circuit to one of the available onion service instances and can then communicate as normal along that circuit.
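The combining step described above can be sketched roughly as follows. This is a minimal illustration rather than the OnionBalance implementation: the `IntroPoint` and `Descriptor` classes and the `build_master_descriptor` function are hypothetical names, the default cap of 10 introduction points is a placeholder, and fetching, signing and publishing through the HSDir system are deliberately left out.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IntroPoint:
    """One introduction point parsed from an instance descriptor (hypothetical)."""
    relay_id: str      # identity of the relay acting as the introduction point
    service_key: str   # service key advertised for this introduction point


@dataclass
class Descriptor:
    """Simplified stand-in for a parsed onion service descriptor (hypothetical)."""
    onion_address: str
    intro_points: list = field(default_factory=list)


def build_master_descriptor(master_address, instance_descriptors,
                            max_intro_points=10, rng=random):
    """Combine the instances' introduction points into one unsigned descriptor.

    An instance whose descriptor could not be fetched is passed as None
    and is skipped, so offline instances drop out of the master descriptor.
    """
    available = []
    for descriptor in instance_descriptors:
        if descriptor is None:
            continue
        available.extend(descriptor.intro_points)

    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(available))

    # Pick a random subset if there are more points than the cap allows.
    chosen = rng.sample(unique, min(max_intro_points, len(unique)))
    return Descriptor(master_address, chosen)
```

In a real management service the returned descriptor would still need to be signed with the permanent key and uploaded to the responsible HSDirs.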
This design has a number of advantages:

- It can be deployed with minimal modifications to the little-t-tor code base.
- The onion service key can be protected, as it only needs to be stored in one location.
- Instances can be added or removed in a single location.
- The management service would be resistant to the currently known denial-of-service or discovery attacks, as it would not necessarily need to run any listening onion services itself.
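Because clients pick introduction points at random from the master descriptor, the share of introductions an instance receives follows from how many of its introduction points are listed. The sketch below makes that arithmetic explicit. It assumes clients choose uniformly at random, and `expected_load_share` and the instance names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def expected_load_share(intro_point_owners):
    """Map each instance to the fraction of introductions it should receive.

    `intro_point_owners` lists, for every introduction point in the master
    descriptor, the name of the instance that created it. The result is
    only an expectation under uniform random choice by clients.
    """
    counts = Counter(intro_point_owners)
    total = sum(counts.values())
    return {name: Fraction(n, total) for name, n in counts.items()}


# Three points from instance "a" and one from "b" in the master descriptor.
shares = expected_load_share(["a", "a", "a", "b"])
```

Under these assumptions `shares` gives instance "a" three quarters of the introductions, which suggests the management service could weight instances by varying how many of their introduction points it includes.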

### Core Project Goals


Engage with the operators of popular or resource intensive onion services to understand their use cases and the scaling problems that they currently experience in production.


Write a Python-based application which manages the creation, signing and publication of the combined onion service descriptors.


Develop test cases for the code base to ensure that it is stable and reliable.


Package the application for the Python Package Index.


Produce clear documentation for configuring, maintaining and securing an OnionBalance-backed onion service.


### Other Potential Features


Improved onion service key storage
At present onion service keys must be stored unencrypted on disk and be accessible to the Tor process. I propose implementing optional onion service descriptor signing via the PKCS#11 interface, allowing permanent keys to be stored securely on a smart card or a hardware security module.


Utilities for onion service operators
Create utilities for service operators to perform actions such as generating self-signed SSL certificates which can be verified using their permanent key.


Allow for the submission of an onion service to indexers such as ahmia.fi


Write general documentation for scaling onion services
The documentation should include strategies for maximizing the performance of Tor onion services (such as guard selection parameters). These approaches would likely need to be tailored to the anonymity criteria of the service.


### Timeline


1/6/15: Start Date


1/6/15-5/6/15: Engage with onion service operators to understand their use cases and constraints


8/6/15-3/7/15: Implement a management service in Python with accompanying unit tests.


6/7/15-18/7/15: Gather feedback from the community and tweak the protocol to improve performance and reliability based on responses and investigation of the outlined open questions. Create little-t-tor improvement patches if necessary.


20/7/15-31/7/15: Write documentation outlining how to configure the load balancer and general advice for increasing Tor onion service performance.


3/8/15-12/8/15: Implement extra features as time allows such as secure key storage, index submission (ahmia.fi), and onion service operator utilities. Further time to spend examining open research questions.


18/8/15-31/8/15: Buffer time to finish project work including packaging, testing and documentation.


1/9/15: End Date


### Open Questions

I feel that the following questions should be investigated and considered during this project in order to develop a secure and practical solution. Some of these questions are likely to be of general interest to the Tor community:


Investigate the optimal selection of introduction points for an onion service descriptor. It is unclear at this stage whether it is better to provide a changing subset of introduction points in each descriptor, to preferentially select the longest-lived (i.e. most stable) introduction points, or to select the introduction points chosen most recently by the instances.


Is a bidirectional channel needed between the management service and the onion service instances to obtain timely, valid introduction point data? The current polling design may need to be extended if race conditions which affect performance are encountered when fetching and publishing sets of introduction points. It may be possible to mitigate these problems by verifying the introduction points before including them in a descriptor.


How do the selection of introduction points and the number of instances influence the anonymity or security properties of onion services? Is it possible to reliably obscure the number of onion service instances and their availability? Perhaps descriptors should always contain a fixed number of introduction points (generating false entries if necessary), with a privacy/performance trade-off.


What issues arise when the management server retrieves an old descriptor as a result of churn in the DHT? Perhaps the management server should sequentially poll the HSDir set in order to maximize the chances of retrieving a fresh descriptor.


Is it possible to avoid the management server becoming a single point of failure? There may be a way of doing automatic fail-over from one management server to another.


What advantages or limitations would the "Next Generation Hidden Services" proposal confer on the proposed design? Scaling solutions should be developed with migration to this proposal in mind.


Investigate the choice of (potentially non-consensus) relays as stable introduction points for large 'encrypted services'-type onion services.
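Two of the questions above, introduction point selection and stale descriptors, can be made more concrete with a small sketch. It is only meant to frame the experiments, not to answer them: every class and function name here is hypothetical, `first_seen` is assumed to be tracked by the management service itself, and the four-hour `max_age` is an arbitrary placeholder.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IntroPointRecord:
    relay_id: str
    first_seen: datetime   # when the management service first observed it


def select_random_subset(records, count, rng=random):
    """Strategy 1: a changing random subset in each descriptor."""
    records = list(records)
    return rng.sample(records, min(count, len(records)))


def select_longest_lived(records, count):
    """Strategy 2: prefer the oldest (most stable) introduction points."""
    return sorted(records, key=lambda r: r.first_seen)[:count]


def select_most_recent(records, count):
    """Strategy 3: prefer the points chosen most recently by the instances."""
    return sorted(records, key=lambda r: r.first_seen, reverse=True)[:count]


@dataclass(frozen=True)
class FetchedDescriptor:
    hsdir: str            # which HSDir returned this copy
    published: datetime   # publication time stated in the descriptor


def freshest_descriptor(responses, now, max_age=timedelta(hours=4)):
    """Return the newest copy among several HSDir responses.

    Failed fetches are passed as None. Copies older than `max_age` are
    treated as stale, and None is returned if nothing usable remains.
    """
    usable = [d for d in responses
              if d is not None and now - d.published <= max_age]
    return max(usable, key=lambda d: d.published, default=None)
```

Keeping the strategies as interchangeable functions would make it straightforward to compare them against the same recorded descriptor data.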


Point us to a code sample: something good and clean to demonstrate that you know what you're doing--ideally from an existing project.

I've recently been working on a proof-of-concept for my Tor SoP project proposal. The code base is available on GitHub at https://github.com/DonnchaC/onion-balance. While the current design is rough, I think it is representative of my current coding style.
I currently have a pair of demo onion service instances managed by OnionBalance deployed on http://fqyw6ojo2voercr7.onion/. If you select 'New Identity' after loading the service you should eventually be able to access both back-end instances.
Why do you want to work with The Tor Project in particular? Tell us about your experiences in free software development environments. We especially want to hear examples of how you have collaborated with others rather than just working on a project by yourself.

I enjoy the challenge of working on practical projects within the constraints of a real and adversarial threat model. The Tor Project's work combines my interests in digital security, cryptography and distributed systems.
I have contributed to a number of free software projects in the area of internet privacy. I've previously collaborated on the Python-based SecureDrop project where I've submitted patches to resolve issues.
Last summer I developed OnionTip.com (https://github.com/DonnchaC/oniontip) which allows donations to Tor relay operators relative to how much bandwidth they contribute to the network. The project has received a good response and has helped to facilitate approximately $4400 USD in Bitcoin donations at the current market price. ~27% of Tor relays (by consensus weight) are now listed on OnionTip.
Recently I have submitted a patch (#3523) to little-t-tor to enable greater interaction between the Tor control port and the HSDir system. I've contributed code for parsing onion service descriptors which has been merged into stem. I have also developed tools for detecting malicious HSDirs on the Tor network. I therefore feel that I have a strong understanding of the onion service subsystem in Tor and am particularly well placed to complete this project successfully.
Will you be working full-time on the project for the summer, or will you have other commitments too (a second job, classes, etc)? If you won't be available full-time, please explain, and list timing if you know it for other major deadlines (e.g. exams). Having other activities isn't a deal-breaker, but we don't want to be surprised.

I will be finishing my final university exams during the second week of May. From that point forward I will be able to focus completely on my Tor Summer of Privacy project.
I hope to work full time on the project (35-40 hrs / week) during the course of this summer. I don't foresee any other major work commitments outside of this project.
Will your project need more work and/or maintenance after the summer ends? What are the chances you will stick around and help out with that and other related projects?

If this project is selected I hope to develop a tool which is suitable for onion service operators to deploy in production. I plan to continue supporting the project and contributing to the Tor ecosystem after the summer ends.
What is your ideal approach to keeping everybody informed of your progress, problems, and questions over the course of the project? Said another way, how much of a "manager" will you need your mentor to be?

I am accustomed to working quite independently on projects. During this project I plan to keep the community and mentors updated with blog posts describing progress with the project. I would also aim to submit biweekly status reports with my progress.
What school are you attending? What year are you, and what's your major/degree/focus? If you're part of a research group, which one?

I'm currently in the final (4th) year of my Medicinal Chemistry undergraduate degree in Trinity College, Dublin, Ireland.
How can we contact you to ask you further questions? You can send emails to tor-assistants@…. In addition, what's your IRC nickname? Interacting with us on IRC will help us get to know you, and help you get to know our community.

I'm generally available to communicate via XMPP during the time I spend online. I idle as DonnchaC on Freenode and OFTC and have been following and engaging in discussion on #tor-dev for the past couple of months.

Email: donncha@donncha.is (0x3B0D706A7FBFED86)
XMPP: donncha@donncha.is (OTR fingerprints: http://donncha.is/otr.txt)

I'm responsive to email and generally reply to communications within hours.
Is there anything else that we should know that will make us like your project more?

I'm conscious that there is some overlap between some aspects of my proposal and tasks which are covered under Sponsor R's task list. With this project I hope to maximize my utility to the Tor community by focusing on approaches and work which is not otherwise funded.
I'm happy to modify or tune my proposal in any way that helps maximize the return to the community from Tor's limited funding.
### References

[1] https://blog.torproject.org/blog/hidden-services-need-some-love
[2] tor2web/Tor2web#228
https://lists.torproject.org/pipermail/tor-dev/2013-October/005606.html
http://archives.seul.org/or/talk/Mar-2015/msg00184.html
http://cbaines.net/projects/tor/disths/report.pdf

Thank you to asn, arma and s7r for their enlightening feedback on this proposal.