Rodrigo Cunha eng-rodrigocunha

eng-rodrigocunha / mergePdf.gs
Last active June 6, 2023 19:07
Merges PDFs from a list of Google Drive PDF links in a spreadsheet
// https://tanaikech.github.io/2023/01/10/merging-multiple-pdf-files-as-a-single-pdf-file-using-google-apps-script/
async function mergePDF() {
// Enter the ID of the spreadsheet that holds the links
var planilhaId = "1Iq6DNGk8XkfOX3kp2HDmDb1Ac7vK76brkJc6T6oydNc";
// Enter the name of the sheet that contains the links
var nomePlanilha = "LINKS";
// Get the spreadsheet
var planilha = SpreadsheetApp.openById(planilhaId);
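Before the PDFs can be fetched, each link in the LINKS sheet has to be reduced to a Drive file ID. The sketch below shows that step in Python rather than Apps Script. `extract_drive_id` is a hypothetical helper, and it assumes the links follow the two common Drive share formats (`/file/d/<ID>/...` and `?id=<ID>`).

```python
import re
from typing import Optional

# Drive share links usually carry the file ID either in the path
# (.../file/d/<ID>/view) or in an "id" query parameter (...open?id=<ID>).
_PATH_ID = re.compile(r"/d/([a-zA-Z0-9_-]+)")
_QUERY_ID = re.compile(r"[?&]id=([a-zA-Z0-9_-]+)")


def extract_drive_id(link: str) -> Optional[str]:
    """Return the Drive file ID found in a share link, or None if there is none."""
    for pattern in (_PATH_ID, _QUERY_ID):
        match = pattern.search(link)
        if match:
            return match.group(1)
    return None


links = [
    "https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing",
    "https://drive.google.com/open?id=1XyZ-987",
    "not a drive link",
]
for link in links:
    print(extract_drive_id(link))
```

Returning `None` instead of raising lets the caller skip blank or malformed cells in the sheet.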
eng-rodrigocunha / download_gcs.py
Created March 17, 2023 02:33
Downloads data from a GCS bucket and finds which files meet a given condition
import basedosdados as bd
import pandas as pd
import glob
bd.config.project_config_path = "D:\\basedosdados\\staging"
# Download the staging partitions for 2023-03-08, hours 14 through 23
for hour in range(14, 24):
    print(hour)
    st = bd.Storage(dataset_id="br_rj_riodejaneiro_onibus_gps", table_id="registros")
    st.download(savepath=".", partitions=f"data=2023-03-08/hora={hour}", mode="staging")
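Once the partitions are on disk, the script imports `glob` to look for files that satisfy a condition. A minimal sketch of that search, assuming the downloaded partitions are CSV files with a header row stored anywhere under the save path; `find_matching_files` and its `column`/`value` test are hypothetical, since the gist's actual condition is not shown.

```python
import csv
import glob
import os
from typing import List


def find_matching_files(root: str, column: str, value: str) -> List[str]:
    """Return paths of CSV files under `root` that contain a row where `column` == `value`.

    Hypothetical helper: assumes CSV partitions with a header row, possibly
    nested in Hive-style folders such as data=.../hora=.../.
    """
    matches = []
    pattern = os.path.join(root, "**", "*.csv")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row.get(column) == value:
                    matches.append(path)
                    break  # one matching row is enough to flag this file
    return matches
```

For example, `find_matching_files(".", "some_column", "some_value")` would list every downloaded partition with at least one matching row; the column name here is a placeholder.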
eng-rodrigocunha / get_vaccination_status.gs
Last active March 12, 2023 02:29
Scrapes the Carteira Nacional de Vacinação Digital (digital national vaccination card) or the Certificado Nacional de Vacinação Covid-19 (national COVID-19 vaccination certificate) issued through ConecteSUS to determine how many COVID-19 doses were administered
/*
* Convert PDF file to text
* @param {string} fileId - The Google Drive ID of the PDF
* @param {string} language - The language of the PDF text to use for OCR
 * @return {string} - The extracted text of the PDF file
* https://www.labnol.org/extract-text-from-pdf-220422
* IMPORTANT! https://www.labnol.org/shared-drives-google-script-220128
*/
const convertPDFToText = (fileId, language) => {
eng-rodrigocunha / get_vaccination_status.py
Last active March 12, 2023 01:58
Scrapes the Carteira Nacional de Vacinação Digital (digital national vaccination card) or the Certificado Nacional de Vacinação Covid-19 (national COVID-19 vaccination certificate) issued through ConecteSUS to determine how many COVID-19 doses were administered
#!pip install pdfminer.six
import io
from pdfminer.high_level import extract_text
doses = ["Reforço", "Dose Adicional", "2/2", "1/2"]
# open the PDF file
with open(r'E:\DOCUMENTOS PESSOAIS\Carteira Nacional de Vacinação Digital_4_DOSE.pdf', 'rb') as f:
    # extract the text from the PDF
    text = extract_text(f)
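With the text extracted, the dose labels in `doses` can be counted. The sketch below is one possible way to do it, not the gist's own logic: `count_doses` is hypothetical, and it assumes each administered dose is printed once with one of the labels. It matches labels only as standalone tokens, because a plain substring search would also find "2/2" inside a date such as "12/2021".

```python
import re
from typing import Dict, Iterable, List

# Labels copied from the `doses` list in the gist.
DOSE_LABELS: List[str] = ["Reforço", "Dose Adicional", "2/2", "1/2"]


def count_doses(text: str, labels: Iterable[str] = DOSE_LABELS) -> Dict[str, int]:
    """Count standalone occurrences of each dose label in the extracted text.

    Assumption (hypothetical): each administered dose appears once with one
    of these labels; the real document layout may differ.
    """
    counts: Dict[str, int] = {}
    for label in labels:
        # (?<!\w) and (?!\w) keep "2/2" from matching inside dates like "12/2021".
        pattern = re.compile(r"(?<!\w)" + re.escape(label) + r"(?!\w)")
        counts[label] = len(pattern.findall(text))
    return counts


sample = "Dose 1/2 em 05/12/2021; Dose 2/2; Reforço; Reforço"
counts = count_doses(sample)
print(counts, "total:", sum(counts.values()))
```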
eng-rodrigocunha / mail_web_scrapping.py
Last active March 12, 2023 01:59
Performs web scraping to collect every e-mail address from a given set of web pages
#!pip install requests
#!pip install beautifulsoup4
# https://stackoverflow.com/questions/63533115/extract-valid-email-address-using-regular-expression-and-beautifulsoup
import requests
import re
from bs4 import BeautifulSoup
# Match e-mail addresses (local@domain.tld); each match is one address
email = re.compile(r'([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)')
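The matching step can be tried without any network access. This minimal sketch uses the same character classes as the gist's pattern, without a `{0,}` quantifier (which would let the pattern match empty strings), and de-duplicates the results; `extract_emails` is a hypothetical name. In the scraper itself, the input would presumably be page text such as `BeautifulSoup(requests.get(url).text, "html.parser").get_text()`.

```python
import re
from typing import List

# Same character classes as the gist's pattern, with no {0,} quantifier,
# so it never produces empty matches.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+")


def extract_emails(text: str) -> List[str]:
    """Return unique e-mail addresses in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


page_text = "Contato: ana@example.com, bob.silva@mail.example.org; ana@example.com"
print(extract_emails(page_text))
```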
eng-rodrigocunha / dbt_to_dbdiagram.rb
Created March 1, 2023 22:50
Ruby code to convert dbt yml to dbdiagram.io format
#!/usr/bin/env ruby
# Generate a dbdiagram for dbdiagram.io from a dbt project.
#
# Usage:
# 1. Write your model schema.yml (another script in this gist can generate it automatically)
# 2. Run `dbt docs generate` first.
# 3. Run `dbt_to_dbdiagram.rb`
# 4. Paste the output in https://dbdiagram.io/
require 'yaml'
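The Ruby script turns dbt model definitions into dbdiagram.io's DBML `Table` syntax. Below is a minimal Python sketch of that conversion, assuming the schema.yml has already been parsed into dictionaries (for example with PyYAML's `yaml.safe_load`) and that columns carry a `data_type` key. `models_to_dbml` is hypothetical and emits tables only, with no `Ref` relationships.

```python
from typing import Dict, List


def models_to_dbml(models: List[Dict]) -> str:
    """Render dbt-style model definitions as dbdiagram.io (DBML) tables.

    `models` mirrors the parsed `models:` list of a schema.yml: each item has a
    "name" and an optional "columns" list of {"name", "data_type"} dicts.
    Columns without "data_type" fall back to "unknown" (an assumption).
    """
    blocks = []
    for model in models:
        lines = [f"Table {model['name']} {{"]  # "{{" renders a literal "{"
        for column in model.get("columns", []):
            data_type = column.get("data_type", "unknown")
            lines.append(f"  {column['name']} {data_type}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


sample_models = [
    {"name": "veiculo", "columns": [{"name": "id_veiculo", "data_type": "string"}, {"name": "data"}]},
]
print(models_to_dbml(sample_models))
```

The printed text can be pasted into dbdiagram.io the same way as the Ruby script's output.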
eng-rodrigocunha / bigquery_schema_generator.sql
Last active March 22, 2024 21:39
Query that generates a dbt schema.yml from the BigQuery INFORMATION_SCHEMA of the generated tables
WITH
columns AS (
SELECT
" " || "- name: " || column_name || "\n" ||
" " || ' description: "' || column_name || '"' AS column_statement,
table_name
FROM
`rj-smtr.veiculo`.INFORMATION_SCHEMA.COLUMNS ),
tables AS (
SELECT
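The schema generator query turns each `INFORMATION_SCHEMA.COLUMNS` row into a `- name` / `description` YAML fragment whose description defaults to the column name. A Python sketch of the same transformation follows. It assumes the rows come from something like `SELECT table_name, column_name FROM ... INFORMATION_SCHEMA.COLUMNS ORDER BY table_name, ordinal_position`; the table-level layout (`version: 2`, `models:`, table description) is an assumption, because the rest of the query is not shown, and `build_schema_yml` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_schema_yml(rows: Iterable[Tuple[str, str]]) -> str:
    """Build a dbt schema.yml skeleton from (table_name, column_name) rows.

    Every description defaults to the object's own name, as in the query.
    """
    columns_by_table: Dict[str, List[str]] = defaultdict(list)
    for table_name, column_name in rows:
        columns_by_table[table_name].append(column_name)

    lines = ["version: 2", "", "models:"]
    for table_name, columns in columns_by_table.items():
        lines.append(f"  - name: {table_name}")
        lines.append(f'    description: "{table_name}"')
        lines.append("    columns:")
        for column_name in columns:
            lines.append(f"      - name: {column_name}")
            lines.append(f'        description: "{column_name}"')
    return "\n".join(lines) + "\n"


print(build_schema_yml([("t", "a"), ("t", "b")]))
```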
# Summary by fortnight and consortium
WITH
sumario AS (
SELECT
EXTRACT(YEAR FROM DATA) AS ano,
EXTRACT(MONTH FROM DATA) AS mes,
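The summary query groups by fortnight (quinzena) and consortium, and its visible part extracts the year (`ano`) and month (`mes`) from `DATA`. The Python sketch below shows one way to build that grouping key and total values per group. Splitting the month at day 15 is an assumption, since the query's fortnight rule is not shown; `fortnight_of`, `period_key`, `summarize` and the `(date, consortium, value)` record shape are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def fortnight_of(day: date) -> int:
    """Return 1 for days 1-15 of the month and 2 from day 16 onward (assumed rule)."""
    return 1 if day.day <= 15 else 2


def period_key(day: date) -> Tuple[int, int, int]:
    """(year, month, fortnight): the ano/mes columns plus a fortnight number."""
    return (day.year, day.month, fortnight_of(day))


def summarize(records: Iterable[Tuple[date, str, float]]) -> Dict[Tuple[int, int, int, str], float]:
    """Sum `value` per (year, month, fortnight, consortium)."""
    totals: Dict[Tuple[int, int, int, str], float] = defaultdict(float)
    for day, consortium, value in records:
        totals[period_key(day) + (consortium,)] += value
    return dict(totals)


records = [
    (date(2023, 3, 1), "A", 10),
    (date(2023, 3, 15), "A", 5),
    (date(2023, 3, 16), "A", 1),
]
print(summarize(records))
```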
eng-rodrigocunha / pdf_reduct.py
Created February 18, 2023 17:57
Redacts sensitive content from PDFs
#!pip install pdf-redactor
import re
from datetime import datetime
import pdf_redactor
## Set options.
options = pdf_redactor.RedactorOptions()
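The options object is where the redaction rules go. The pdf-redactor README describes `options.content_filters` as a list of `(compiled regex, function)` pairs, where the function receives a match and returns the replacement text. The sketch below defines two hypothetical filters in that shape (Brazilian CPF numbers and e-mail addresses) plus a small `apply_filters` helper, also hypothetical, that runs them over a plain string so the patterns can be checked before they touch a PDF.

```python
import re

# Hypothetical filters: each is (compiled regex, function taking a match and
# returning its replacement), the shape pdf-redactor uses for content_filters.
CONTENT_FILTERS = [
    # Brazilian CPF numbers written as 123.456.789-09.
    (re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"), lambda m: "XXX.XXX.XXX-XX"),
    # E-mail addresses.
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), lambda m: "[e-mail removed]"),
]


def apply_filters(text, filters=CONTENT_FILTERS):
    """Run every filter over `text` in order (a dry run on plain text)."""
    for pattern, replace in filters:
        text = pattern.sub(replace, text)
    return text


print(apply_filters("CPF 123.456.789-09, contato: ana@example.com"))
```

If the filters behave as expected on sample text, they would then be assigned with `options.content_filters = CONTENT_FILTERS`.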