rajesh-s/privacy_challenges.md

## Understanding privacy challenges in an age where data is everything

### The Problem


Much of our most sensitive data is plain text: CVVs, phone numbers, full names, email addresses, SSNs and so on.
Unlike an image, a soundtrack, or an email, plain text carries no metadata indicating where the data came from. [ELI5: metadata is a set of attributes, such as the last-modified time or the device a file was created on, that stays associated with a file (such as an image) even when someone copies it to a new device.]
This matters because a decade ago we shared important credentials with only a handful of parties. Today there are countless apps and services, each with a long list of convoluted clauses that make it extremely difficult to understand the implications of sharing data.
For example, there have been cases where services like Truecaller have sold user datasets to advertisers and others.
Once an adversary has gained access to private data such as an email address, card number, or phone number, there is no way to trace it back to the originating source. With an image or a document it is possible to do so, provided the metadata has not been tampered with.

### What did I try to do about it?


Of course, we all wish there were something like a `traceroute` command to find out who leaked my data, but that isn't feasible.
To find out how random people were getting hold of my data, I tried something: on each profile, I changed my first and last name to the name of the service, e.g. "Truecaller Truecaller" on Truecaller. That way, whenever I received spam addressed to one of those names, I would know the source of the leak.
Within a few days, I received a message like this:
> Hi Truecaller Truecalle, PRICE DROP ALERT! FLAT 85% OFF on ROLEX, RADO Watches. Clearance Sale, 2 Years Warranty. Visit: https://bit.ly/2LkZz0Q

This is perfect: now I know which apps are bad.
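The alias trick boils down to a lookup from the fake name registered with each service back to that service. Below is a minimal Python sketch, assuming each alias is a "First Last" pair. The `ALIASES` table, the second service, and the 50% truncation threshold are hypothetical. Because the message above arrived with the name cut short ("Truecalle"), the matcher accepts the full first name plus at least half of the last name instead of requiring an exact match.

```python
# Sketch of the per-service alias trick: register a distinct fake name with
# each service, then check incoming messages for those names.

# Hypothetical mapping from the alias registered on a service -> that service.
ALIASES = {
    "Truecaller Truecaller": "Truecaller",
    "Shopapp Shopapp": "Shopapp",  # hypothetical second service
}


def matches_alias(message: str, alias: str, min_fraction: float = 0.5) -> bool:
    """True if `message` contains the alias's full first name plus at least
    `min_fraction` of its last name (rounded down, minimum one character),
    since senders may truncate long names."""
    first, _, last = alias.lower().partition(" ")
    needed = len(first) + 1 + max(1, int(len(last) * min_fraction))
    # Any longer prefix of the alias starts with this one, so one check suffices.
    return alias.lower()[:needed] in message.lower()


def find_leak_sources(message: str) -> list[str]:
    """Return every service whose alias appears (possibly truncated) in message."""
    return [svc for alias, svc in ALIASES.items() if matches_alias(message, alias)]


if __name__ == "__main__":
    sms = "Hi Truecaller Truecalle, PRICE DROP ALERT! FLAT 85% OFF on ROLEX"
    print(find_leak_sources(sms))  # ['Truecaller']
```

Requiring part of the last name, not just the first, keeps an ordinary mention of the brand (e.g. "Download Truecaller") from being flagged as a leak.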

### One feasible solution


The trick above may not work for other services, such as digital wallets or social networks, since they force you to provide a valid name.
For this reason, I believe we need a way to attach unmodifiable metadata to plain-text data as well, wherever privacy is a concern.
Enforcing such a scheme would ensure that users can identify the source of a private data leak.
If every place where I've entered critical private data were associated with a unique ID stored as metadata, this problem would have a solution.
Something similar already works elsewhere: for example, one can often determine the IP address, device, or location from which an email was sent by inspecting its headers.
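A per-service unique ID can already be approximated today for one plain-text field: the email address. The Python sketch below is one possible illustration, not a proposal from the original post. It assumes the email provider supports plus addressing (mail to `alice+tag@example.com` is delivered to `alice@example.com`, as Gmail does). The secret, function names, and 8-character tag length are hypothetical. Each tag is a truncated HMAC-SHA256 of the service name, so no service can compute another service's tag.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret known only to the user; keep a real one out of source code.
SECRET = b"replace-with-a-long-random-secret"


def service_tag(service: str, length: int = 8) -> str:
    """Short per-service ID: truncated HMAC-SHA256 of the service name.

    Without SECRET, a service cannot compute the tag given to another service."""
    digest = hmac.new(SECRET, service.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def tagged_email(local: str, domain: str, service: str) -> str:
    """Plus-addressed email unique to one service, e.g. alice+<tag>@example.com."""
    return f"{local}+{service_tag(service)}@{domain}"


def identify_source(leaked_email: str, services: list[str]) -> Optional[str]:
    """Recompute each candidate service's tag and return the one found in leaked_email."""
    local_part = leaked_email.split("@", 1)[0]
    if "+" not in local_part:
        return None  # the tag was stripped, or the address was never tagged
    tag = local_part.split("+", 1)[1]
    for service in services:
        if tag == service_tag(service):
            return service
    return None


if __name__ == "__main__":
    signup_address = tagged_email("alice", "example.com", "Truecaller")
    print(signup_address)  # alice+<8 hex chars>@example.com
    print(identify_source(signup_address, ["ShopApp", "Truecaller"]))  # Truecaller
```

Deriving the tag from a keyed hash instead of a stored random ID means the mapping can be recomputed from the secret and the list of services. The sketch also shows the limits described above: a service can simply strip the `+tag`, and phone or card numbers have nowhere to carry one, which is why unmodifiable metadata would be needed.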

### Why do we need something like this?


A huge number of devices constantly transmit all kinds of data, including plain text: IoT devices, CCTV cameras, software services, or even that online course you signed up for or the internet feedback form you filled in. This makes the problem increasingly difficult, because every one of them likes to claim that our data is private.
Taking legal action requires evidence, so we need a way to determine the source of leaked data.

Let me know your thoughts on this!