solso/potential_improvement_bluetooth_contact_tracing.md

## potential_improvement_bluetooth_contact_tracing.md

      
    Raw
  

              potential_improvement_bluetooth_contact_tracing.md
            
          
    Potential Improvements of Bluetooth Contact Tracing

Motivation

Contact tracing is a key factor for containment on epidemics. Multiple apps are attempting to do this via storing bluetooth proximity contacts on the device. For instance,

TraceTogether
Covid Watch
Private Kit
Probably many others...

Great effort! This post contains potential improvements both of privacy and utility, feel free to use or abuse.
If you know someone developing this kind of apps, please, send them the post.
Description

Assume that every device stored close-contacts via Bluetooth with time-stamps for a long period of time. S = [ <bid, t>_i  … <bid, t>_j]. Each device has a bid (unique-identifier, BL_ADDR). Such solution already exists in the aforementioned apps.
Any person will see all close-contacts he has had over a long period of time. Trivially, there could a central repository in which infected people (once positive-tested or clearly symptomatic), could report themselves. The central service could broadcast the list of contagious C_s = [bid_i, … bid_j]. This poses some privacy concerns as people reporting are revealing their bid to the world (will address this issue later).
So that devices could validate if they have become on close-contact with an infected person.
As a boolean,
intersection(bid in S, C_s)  != empty
Probabilistically,
p_s = 1 – exp(-B*n)
where B is the contagiousness rate per exposure,
n is the total number of exposures, multiple exposures with the same person would give and n>1, single exposure over a prolonged period of time would give n>1.
if one assumes to have a good estimation per B on a minute basis, each bluetooth contact should be normalized by its duration.
So far, this is what I understand TraceTogether and PrivacyKit are doing. [TraceTogether is doing a bit more than that as the contact list S of the infected person is also revealed to the authorities, so that they can proactively contact the contacts rather
than the contacts be informed by the app.]
Issues and Potential Solutions

There are at least two issues that – if not already addressed – they can be,
1) Privacy of Infected people is compromised

Broadcasting plain bids (the C_s) to the world is easily avoidable. bid could be hashed (e.g. SHA256) to prevent identifiability. The devices can then run the same hash function H locally,
intersection(H(bid) in S, C_s)  != empty
This would leak the real bid only to those who have been in direct contact, as opposed to the whole population.
2) Contacts should be Transitive (contacts are not lists, but networks)

The bluetooth contacts (S) stored on each person’s device only account for first degree contacts. However, it would be useful to also use information of the contacts of the contacts (2nd degree) and so on. As it would provide a more accurate risk metric.
Having had a contact with someone (who is still healthy) but have had contacts with people (who are contagious) is valuable information. To calculate than locally it would be required to know the contacts of all contacts, which is unfeasible.
To achieve that, indirectly, we could use an iterative approach. Each person calculates the risk of 1st degree contacts as,
p_s = 1 – exp(-Bn)*
(as described before)
Then, each device will send to a central server the tuple,
<H(bid), p_s> iif p_s > threshold.
Instead of sending the bid only when someone is contagious, the server will receive probabilities for all the people on a population with a risk larger than a threshold. These probabilities can then be broadcasted to recalculate the risk based on 2nd degree contacts and so on,
p_s = 1 – exp(-B*m)
where m is no longer the total number of exposures, but it has to be weighted by the p_s of the bid_i contacted.
If this process is done iteratively, even asynchronously, it will converge after few iterations. That will give a better risk estimation because not only your contacts will be considered, but also the contacts of contacts and so on.
To help convergence and also avoid all population having to report their bids, only risks above a certain threshold should be reported, assuming zero if not present.
Devices would have to download much larger set S,  as they do not contain bids  (or H(bits) known to be contagious all bids whose p_s > threshold. The full list, luckly, can be trivially sharded by taking the first k bits of the hash space.
Sharding aside, the server would only see see reads and writes of <H(bid), p_s>. Thus, a simple key-value store would suffice. Where the hash of the bid is the key and p_s is a double in the threshold..1.0 range.
Self-reporting is of course open to malicious attacks, where fake p_s could be injected.
Additional Considerations

To prevent an attacker to scrape the server to fetch the risks based on H(bid) of known devices.
To prevent this attack, or rather to limit the damage of the attack, the sender could write them encrypted with a rotating key (e.g. daily) to the server. The key would be transfered only via direct bluetooth contact (on the BLADDR?, unclear if enough space) so that only people who has had a real contact should know the current keys for each other, and therefore, decrypt the latest p_s stored on the server. The server would not hold <H(bid), p_s>, but <H(bid), [<E(p_s), k_1> ... <E(p_s), k_n>]>, where n is the number of days for which users would share access to p_s value with their direct contacts.
Conclusions

If this is something useful for you and need help to implement it count me in. Further clarifications, discussions, ideas are always welcome on the comments.