@rajesh-s
Created January 10, 2020 05:30
Food for thought on finding a way to address privacy challenges

Understanding privacy challenges in the age where data is everything

The Problem

  • Most critical personal data is plain text (CVV, phone number, full name, email address, SSN, etc.).
  • Plain text, unlike an image, audio file, or email, carries no metadata indicating where the data came from. [ELI5: metadata is a set of attributes, such as the last-modified time or the device a file was created on, that usually stays attached to a file (such as an image) even when someone copies it to a new device.]
  • This matters because a decade ago we shared important credentials with only a few parties. Now there are a ton of apps and services, each with a long list of convoluted clauses that make it extremely difficult to know the implications of sharing data.
  • For example, there have been cases where services like Truecaller sold user datasets to advertisers and others.
  • The issue is that once an adversary has gained access to private data such as an email address, card number, or phone number, there is no way to find the source it leaked from. With an image or a document this is possible (provided the metadata has not been tampered with).

What did I try to do about it?

  • Of course, we all wish there were something like a traceroute command to find out who leaked our data, but that's not feasible.
  • To find out how random people were getting access to my data, I changed my first and last name on each profile to the name of the service, so on Truecaller I became "Truecaller Truecaller", and so on. Now, every time my data leaks, the name tells me the source.
  • Within a few days I received a message like this:
    Hi Truecaller Truecalle, PRICE DROP ALERT! FLAT 85% OFF on ROLEX, RADO Watches. Clearance Sale, 2 Years Warranty. Visit: https://bit.ly/2LkZz0Q
  • This works: now I know which apps are leaking my data.
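
The name trick above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service list is hypothetical (only Truecaller comes from the note), and the matching is a simple whole-word search, which still works when the second copy of the name arrives truncated as in the message above.

```python
from __future__ import annotations

import re

# Hypothetical list of services where a canary name was registered.
# Only "Truecaller" appears in the note; the others are placeholders.
SERVICES = ["Truecaller", "ShopApp", "NewsSite"]


def canary_name(service: str) -> str:
    """Return the fake first and last name to register with a service."""
    return f"{service} {service}"


def find_leak_sources(message: str, services: list[str] = SERVICES) -> list[str]:
    """Return the services whose canary name shows up in a received message."""
    hits = []
    for service in services:
        # One whole-word, case-insensitive match is enough, because the
        # sender may truncate or mangle the second half of the name.
        if re.search(rf"\b{re.escape(service)}\b", message, re.IGNORECASE):
            hits.append(service)
    return hits


spam = "Hi Truecaller Truecalle, PRICE DROP ALERT! FLAT 85% OFF on ROLEX, RADO Watches."
print(canary_name("Truecaller"))
print(find_leak_sources(spam))
```

A match only shows that the name registered with that service reached the sender; it does not show how it got there.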

One feasible solution

  • The trick above does not work for services like a money wallet or a social network, since you are forced to provide a valid name.
  • For this reason, I believe we need a way to attach unmodifiable metadata to plain-text data as well, wherever privacy is a concern.
  • Enforcing an implementation like that would let the user identify the source of a private data leak.
  • If every place where I've entered my critical private data were associated with a unique ID as metadata, this problem would have a solution.
  • Something like this already works in other contexts. For example, it is often possible to determine the IP address, device, or location from which an email was sent.
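
One way to picture the unique-ID idea is a keyed tag derived per service and per field. The sketch below is an assumption, not an existing standard or the mechanism proposed above: the function names, the `service|field` message layout, and the 16-character truncation are all made up for illustration. It also does not solve the hard part, which is forcing the tag to travel with the data; it only shows how a user could map a recovered tag back to the service it was issued to.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def disclosure_tag(secret_key: bytes, service: str, field: str) -> str:
    """Derive a per-service, per-field tag (hypothetical scheme, HMAC-SHA256)."""
    message = f"{service}|{field}".encode("utf-8")
    # Truncated to 16 hex characters purely to keep the example readable.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


def trace_tag(secret_key: bytes, tag: str, services: Iterable[str], field: str) -> Optional[str]:
    """Return the service a tag was issued to, or None if no service matches."""
    for service in services:
        expected = disclosure_tag(secret_key, service, field)
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(expected, tag):
            return service
    return None


key = b"user-held secret"  # known only to the data owner
services = ["WalletApp", "Truecaller"]  # "WalletApp" is a placeholder name
tag = disclosure_tag(key, "Truecaller", "phone")
print(trace_tag(key, tag, services, "phone"))
```

Because the tag is keyed with a secret only the user holds, a service cannot forge a tag that points at someone else.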

Why do we need something like this?

  • A ton of different devices and services constantly transmit all types of data, including plain text: IoT devices, CCTV, software services, even that online course you signed up for or an online feedback form you filled in. Everyone likes to claim that our data is private, which makes leaks increasingly difficult to pin down.
  • Enforcing legal action requires evidence, so we need a way to determine the source of leaked data.

Let me know your thoughts on this!

@claui

claui commented Jan 10, 2020

I use a dedicated domain for my email addresses. When I sign up, I use a unique username before the @ sign. A catch-all rule forwards it all to my inbox but I can still see the original recipient address in the headers. That way, I always know where the account leaked from originally.
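
The approach in this comment could be sketched roughly as follows, using Python's standard `email` package. The domain `example.org`, the helper names, and the choice of headers to inspect are assumptions for illustration; which headers a catch-all forwarder actually preserves depends on the mail provider.

```python
from __future__ import annotations

from email import message_from_string
from email.utils import getaddresses

DOMAIN = "example.org"  # hypothetical dedicated domain with a catch-all rule


def alias_for(service: str) -> str:
    """Return the unique address to hand out to a given service."""
    return f"{service.lower()}@{DOMAIN}"


def leaked_aliases(raw_message: str) -> list[str]:
    """Return the per-service usernames a raw message was addressed to."""
    msg = message_from_string(raw_message)
    # Assumption: the original recipient survives in at least one of these headers.
    headers = (
        msg.get_all("To", [])
        + msg.get_all("Delivered-To", [])
        + msg.get_all("X-Original-To", [])
    )
    addresses = [addr.lower() for _, addr in getaddresses(headers)]
    suffix = "@" + DOMAIN
    return sorted({a[: -len(suffix)] for a in addresses if a.endswith(suffix)})


raw = (
    "From: deals@ads.example\n"
    "To: truecaller@example.org\n"
    "Subject: PRICE DROP ALERT\n"
    "\n"
    "Clearance sale."
)
print(alias_for("Truecaller"))
print(leaked_aliases(raw))
```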

@rajesh-s
Author

@claui That's a good method for email.

But what I tried to highlight here is that we're not treating privacy-critical plain text from other sources (such as credit card numbers and phone numbers) with the importance it deserves. Today we don't have a mechanism to hold the receiving end responsible for misuse of such information. Metadata is the one way I can think of to implement this, because there are a lot of people unfamiliar with technology whose data is at stake here.
