- Data sanitizing - suppressing identifiers
- k-Anonymity (Sweeney & Samarati, 1998) - each individual contained in the dataset is indistinguishable from at least k-1 others
In practice, it works by a combination of suppressing identifiers and bucketing (generalizing) values: https://en.wikipedia.org/wiki/K-anonymity
The k-Optimize algorithm by Bayardo and Agrawal (2005) approximates k-anonymity. It aims to perform the "lowest cost" anonymization - meaning it suppresses and aggregates as little data as possible in order to achieve the required "k". Not great for high-dimensional datasets.
Finding the optimal k-anonymization is a powerset search problem.
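The suppress-and-bucket idea can be sketched as a simple check. This is a minimal illustration with hypothetical records and helper names (`bucket_age`, `is_k_anonymous` are not from any library), not an implementation of k-Optimize:

```python
from collections import Counter

def bucket_age(age):
    # Generalize an exact age into a decade bucket, e.g. 31 -> "30-39"
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

def is_k_anonymous(records, quasi_identifiers, k):
    # A table is k-anonymous if every combination of quasi-identifier
    # values is shared by at least k records.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical sanitized records: zip codes partially suppressed,
# ages bucketed. All four share the same quasi-identifier tuple.
records = [
    {"zip": "130**", "age": bucket_age(31), "condition": "cancer"},
    {"zip": "130**", "age": bucket_age(35), "condition": "cancer"},
    {"zip": "130**", "age": bucket_age(38), "condition": "flu"},
    {"zip": "130**", "age": bucket_age(33), "condition": "cancer"},
]

print(is_k_anonymous(records, ["zip", "age"], k=4))  # True: one group of 4
```

A real anonymizer searches over which columns to suppress and how coarsely to bucket, which is why finding the optimum is a powerset search problem.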
- l-Diversity - remedies the "homogeneity" attack, which works when sensitive values are not "well represented" within a group. Even if a dataset is k-anonymized, sensitive information can be discerned if one knows that a particular individual falls in a group whose members all share the same sensitive value.
https://www.utdallas.edu/~muratk/courses/privacy08f_files/ldiversity.pdf
Homogeneity Attack (quoted from the paper): Alice and Bob are antagonistic neighbors. One day Bob falls ill and is taken by ambulance to the hospital. Having seen the ambulance, Alice sets out to discover what disease Bob is suffering from. Alice discovers the 4-anonymous table of current inpatient records published by the hospital (Figure 2), and so she knows that one of the records in this table contains Bob's data. Since Alice is Bob's neighbor, she knows that Bob is a 31-year-old American male who lives in the zip code 13053. Therefore, Alice knows that Bob's record number is 9, 10, 11, or 12. Now, all of those patients have the same medical condition (cancer), and so Alice concludes that Bob has cancer.
So, l-diversity will ensure that, within each group of k individuals, sensitive values are "well represented", so that Alice can't conclude that Bob has cancer, because several other conditions also appear in his group. Note - this doesn't mean fudging the sensitive values themselves; it means generalizing or suppressing the quasi-identifiers more aggressively (re-grouping records) until every group contains at least l distinct sensitive values.
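The simplest variant (distinct l-diversity) can be sketched as a check on the groups. This is a minimal illustration with hypothetical data modeled on the homogeneity-attack example above; the function name is invented:

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    # Distinct l-diversity: every quasi-identifier group must contain
    # at least l distinct values of the sensitive attribute.
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

# A group like Bob's: 4-anonymous (all four records share the same
# quasi-identifiers) but every member has the same condition, so it
# fails even 2-diversity - Alice learns the condition for certain.
homogeneous = [
    {"zip": "130**", "age": "30-39", "condition": "cancer"}
    for _ in range(4)
]
print(is_l_diverse(homogeneous, ["zip", "age"], "condition", 2))  # False
```

k-anonymity alone would accept this table with k=4; the l-diversity check rejects it, which is exactly the gap the paper points out.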
- Differential Privacy