@adulau
Last active March 26, 2025 05:21
Facebook 533m leak - analysis

Warning: Analysis is based on the data leaked and subject to interpretation

Format

The original leak is a zip archive containing one zip file per "country"; the country labels contain typographic and geographic errors. Some of the files are rar or 7z archives instead.

CSV headers

There are multiple inconsistencies in field position and field count across the various country files (merged from different sources?).

The most common structure is the following.

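| Name | Position | Notes |
| --- | --- | --- |
| Phone number | 1 | Including international code |
| Facebook ID | 2 | |
| First Name | 3 | |
| Last Name | 4 | |
| Sex | 5 | male, female |
| City? | 6 | |
| Province | 7 | |
| Marital Status | 8 | |
| Workplace | 9 | |
| Creation date | 10 | |
| Email | 11 | |
| DoB | 12 | |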
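
Because field positions and counts vary between files, a parser should check each row against the expected layout and not trust it blindly. Below is a minimal sketch in Python, assuming the most common 12-field structure listed above. The field names in `FIELDS`, the function `parse_rows`, and the comma delimiter are illustrative assumptions, not something taken from the leak itself; the sample row is synthetic.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical field names for the most common 12-column layout.
FIELDS = [
    "phone", "facebook_id", "first_name", "last_name", "sex", "city",
    "province", "marital_status", "workplace", "creation_date", "email", "dob",
]


def parse_rows(text: str, delimiter: str = ",") -> Tuple[List[Dict[str, str]], int]:
    """Map each well-formed row to a dict; count rows with a different layout.

    The delimiter is an assumption and may differ between files.
    """
    records: List[Dict[str, str]] = []
    rejected = 0
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(FIELDS):
            rejected += 1  # layout differs from the most common structure
            continue
        records.append(dict(zip(FIELDS, (value.strip() for value in row))))
    return records, rejected


# Synthetic example: one well-formed row and one row with the wrong field count.
sample = (
    "15550100,1000001,Jane,Doe,female,Springfield,,Single,"
    "Example Corp,,jane@example.invalid,\n"
    "bad,row\n"
)
records, rejected = parse_rows(sample)
```

Counting rejected rows per file gives a quick measure of how far each country file deviates from the most common structure.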

Origin

The origin seems to be a brute-force enumeration of all phone numbers per operator via the Facebook account recovery process (vulnerability disclosed in July 2017). A large share of the mobile numbers form contiguous ranges within the pools of some known mobile operators. Nevertheless, some entries (less than 10%) are unrelated phone numbers that come with less data. It is possible that the leak is composed of mixed sources obtained by different means.
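
One way to check the contiguity observation is to measure how many distinct numbers have an immediate neighbour (n - 1 or n + 1) in the same set. The sketch below is only an illustration of that idea, not the method used for this analysis; the function name `contiguous_fraction` is hypothetical and the numbers are synthetic.

```python
from typing import Iterable


def contiguous_fraction(numbers: Iterable[int]) -> float:
    """Return the share of distinct numbers that have an adjacent number in the set."""
    unique = set(numbers)
    if not unique:
        return 0.0
    adjacent = sum(
        1 for n in unique if (n - 1) in unique or (n + 1) in unique
    )
    return adjacent / len(unique)


# Synthetic example: three consecutive numbers and one isolated number.
fraction = contiguous_fraction([100, 101, 102, 500])
```

A value close to 1.0 for a given operator prefix would be consistent with sequential enumeration of that pool, while isolated numbers would point to a different source.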
