Salvador Gascon (salvadorgascon)

salvadorgascon / bigquery_truncate_table.py
Created February 23, 2024 13:03
Truncate table in Google Cloud BigQuery
```python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
from google.api_core.exceptions import Conflict
from shared.loaders.bigquery_constants import BIGQUERY_PROJECT_NAME

# Excerpt: bigquery_client, bigquery_dataset_name and bigquery_table_name
# are defined in the part of the gist not shown in this preview.
table_path = f'{BIGQUERY_PROJECT_NAME}.{bigquery_dataset_name}.{bigquery_table_name}'
print(f'Counting records in {table_path}')
# Quote the table path with backticks, the safe form when the project ID contains hyphens
query_count = f'SELECT COUNT(*) AS record_count FROM `{table_path}`'
query_count_job = bigquery_client.query(query_count)
```
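The preview stops right after the count query is submitted, so the truncation step itself is not shown. A minimal sketch of a count-then-truncate flow follows, assuming the table is emptied with the GoogleSQL `TRUNCATE TABLE` DML statement (the gist may do it differently). `truncate_table` and its `table_path` parameter are hypothetical names, and `client` is expected to be a `google.cloud.bigquery.Client`.

```python
def truncate_table(client, table_path):
    """Count the rows in table_path, empty the table, and return the previous row count.

    client: a google.cloud.bigquery.Client (any object with a compatible query() works).
    table_path: hypothetical "project.dataset.table" string.
    """
    count_job = client.query(f"SELECT COUNT(*) AS record_count FROM `{table_path}`")
    # result() waits for the job; the single result row exposes the alias as an attribute
    record_count = next(iter(count_job.result())).record_count
    # TRUNCATE TABLE removes every row but keeps the schema and table metadata
    client.query(f"TRUNCATE TABLE `{table_path}`").result()
    return record_count
```

Calling `.result()` on both jobs makes the function block until BigQuery has finished, so the returned count reflects the table just before it was emptied.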
salvadorgascon / bigquery_constants.py
Last active May 25, 2024 18:45
Load rows into a Google Cloud BigQuery table object using incremental batches
```python
BIGQUERY_PROJECT_NAME = "project-XXXXXX"  # placeholder: your Google Cloud project ID
```
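Only the shared constants file appears in the preview, so the loader itself is not shown. One way to load rows incrementally is to split them into fixed-size batches and stream each batch with `Client.insert_rows_json`, which returns a list of per-row errors (empty on success). The sketch below assumes that streaming-insert approach (the gist may use load jobs instead); `load_rows_in_batches` and its default `batch_size` are hypothetical.

```python
def load_rows_in_batches(client, table_id, rows, batch_size=500):
    """Stream rows into table_id in fixed-size batches and return how many were sent.

    client: a google.cloud.bigquery.Client; insert_rows_json() returns a list of
    per-row errors, which is empty when the whole batch was accepted.
    rows: list of dicts keyed by column name.
    batch_size: hypothetical default; tune it to your row size.
    """
    loaded = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        errors = client.insert_rows_json(table_id, batch)
        if errors:
            # Stop at the first failing batch; earlier batches are already loaded
            # and the message tells the caller where to resume
            raise RuntimeError(f"batch starting at row {start} failed: {errors}")
        loaded += len(batch)
    return loaded
```

For example, `load_rows_in_batches(bigquery_client, f"{BIGQUERY_PROJECT_NAME}.my_dataset.my_table", rows)`, where `my_dataset.my_table` is a hypothetical target.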
salvadorgascon / bigquery_constants.py
Created February 23, 2024 12:53
Check whether a table object exists in Google Cloud BigQuery
```python
BIGQUERY_PROJECT_NAME = "project-XXXXXX"  # placeholder: your Google Cloud project ID
```
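The table check itself is not in the preview. Instead of catching `NotFound` from a single lookup, a sketch could list the dataset's tables with `Client.list_tables` and look for the name; this avoids exception handling at the cost of one listing call, which gets slower for datasets with many tables. `table_exists` and its parameters are hypothetical.

```python
def table_exists(client, dataset_id, table_name):
    """Return True if table_name is listed in dataset_id.

    client: a google.cloud.bigquery.Client. list_tables() yields TableListItem
    objects whose table_id attribute is the bare table name. If the dataset
    itself is missing, list_tables() raises NotFound, which propagates to the caller.
    """
    return any(item.table_id == table_name for item in client.list_tables(dataset_id))
```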
salvadorgascon / bigquery_dataset_exists.py
Created February 23, 2024 12:52
Check whether a dataset object exists in Google Cloud BigQuery
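```python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

def BigQueryDatasetExists(bigquery_client, bigquery_dataset_name):
    exists_dataset = False
    print("Checking dataset " + bigquery_dataset_name, '...', end="")
    try:
        # get_dataset() raises NotFound when the dataset does not exist
        bigquery_client.get_dataset(bigquery_dataset_name)
        exists_dataset = True
    except NotFound:
        exists_dataset = False
    return exists_dataset
```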
salvadorgascon / bigquery_constants.py
Last active February 23, 2024 12:54
Create a table object in Google Cloud BigQuery
```python
BIGQUERY_PROJECT_NAME = "project-XXXXX"  # placeholder: your Google Cloud project ID
```
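The preview only shows the constants file. A minimal sketch of table creation, assuming a GoogleSQL `CREATE TABLE IF NOT EXISTS` DDL statement sent through `Client.query` (the Python API alternative is `Client.create_table` with a `bigquery.Table` and its schema). `create_table_ddl` and the `columns` mapping are hypothetical. Names and types are interpolated directly into SQL, so they must come from trusted code, not user input.

```python
def create_table_ddl(client, table_path, columns):
    """Create table_path with the given columns unless it already exists.

    client: a google.cloud.bigquery.Client.
    columns: hypothetical mapping of column name -> GoogleSQL type, e.g. {"id": "INT64"}.
    Returns the DDL statement that was run.
    """
    # Backtick-quote column names so reserved words are still valid identifiers
    column_list = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    ddl = f"CREATE TABLE IF NOT EXISTS `{table_path}` ({column_list})"
    # result() blocks until the DDL job finishes and raises if it failed
    client.query(ddl).result()
    return ddl
```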
salvadorgascon / bigquery_constants.py
Last active February 23, 2024 12:55
Create a dataset object in Google Cloud BigQuery
```python
BIGQUERY_PROJECT_NAME = "project-XXXXX"  # placeholder: your Google Cloud project ID
```
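The dataset creation code is not in the preview. `Client.create_dataset` accepts a plain `"project.dataset"` string, and `exists_ok=True` makes it return the existing dataset instead of raising `Conflict` when it is already there. A minimal sketch under that assumption; `ensure_dataset` is a hypothetical name, and the dataset is created in the client's default location.

```python
def ensure_dataset(client, dataset_id):
    """Create dataset_id ("project.dataset") if it does not already exist.

    client: a google.cloud.bigquery.Client. exists_ok=True suppresses the
    "already exists" (Conflict) error and returns the existing dataset.
    """
    return client.create_dataset(dataset_id, exists_ok=True)
```

For example, `ensure_dataset(bigquery_client, f"{BIGQUERY_PROJECT_NAME}.my_dataset")`, where `my_dataset` is hypothetical. To pick a location, build a `bigquery.Dataset` and set its `location` before passing it in.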
salvadorgascon / pdf_string_transformer.py
Created February 23, 2024 12:48
Python transformer to convert a byte array containing PDF data into a string
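```python
import datetime
import os
from PyPDF2 import PdfReader

def PdfTextTransformer(pdf_binary, tmp_path):
    print("Reading PDF")
    # Timestamp-based name for the temporary file written under tmp_path
    filename = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    # (remainder of the function is not shown in the gist preview)
```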
salvadorgascon / excel_dataframe_transformer.py
Created February 23, 2024 12:46
Python transformer to convert a byte array containing EXCEL data into a pandas DataFrame
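```python
import datetime
import os
from pandas import read_excel

def ExcelDataframeTransformer(excel_binary, tmp_path):
    print("Reading EXCEL")
    # Timestamp-based name for the temporary copy of the workbook
    filename = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    print("Saving EXCEL into", os.path.join(tmp_path, filename + '.xls'))
    # (remainder of the function is not shown in the gist preview)
```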
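`read_excel` also accepts a file-like object, so a sketch that parses the bytes in memory, rather than saving them to `tmp_path` first, could look like this. It assumes the first sheet is the one wanted; `excel_bytes_to_dataframe` is a hypothetical name. Recent pandas versions detect the workbook format from the bytes (openpyxl for `.xlsx`, xlrd for legacy `.xls`), and the matching engine must be installed.

```python
import io

from pandas import read_excel

def excel_bytes_to_dataframe(excel_binary, sheet_name=0):
    """Parse Excel bytes into a DataFrame without a temporary file.

    sheet_name=0 reads the first sheet; pass a sheet name or index to choose another.
    """
    return read_excel(io.BytesIO(excel_binary), sheet_name=sheet_name)
```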
salvadorgascon / csv_dataframe_transformer.py
Last active February 23, 2024 12:46
Python transformer to convert a string containing CSV data into a pandas DataFrame
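```python
import datetime
import os
from pandas import read_csv

def CsvDataframeTransformer(csv_string, tmp_path):
    print("Reading CSV")
    # Timestamp-based name for the temporary CSV file
    filename = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    print("Saving CSV into", os.path.join(tmp_path, filename + '.csv'))
    # (remainder of the function is not shown in the gist preview)
```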
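Because the input is already a string, `read_csv` can parse it directly from an `io.StringIO` buffer, with no temporary file. A minimal sketch; `csv_string_to_dataframe` is a hypothetical name, and any extra keyword arguments are passed straight to `read_csv`.

```python
import io

from pandas import read_csv

def csv_string_to_dataframe(csv_string, **read_csv_kwargs):
    """Parse CSV text into a DataFrame without writing a temporary file.

    Extra keyword arguments (for example sep=";" or dtype=str) go to read_csv.
    """
    return read_csv(io.StringIO(csv_string), **read_csv_kwargs)
```

For example, `csv_string_to_dataframe("id;name\n1;Ana\n", sep=";")` handles semicolon-separated exports.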
salvadorgascon / bigquery_connection.py
Created February 23, 2024 12:41
Python connection to Google Cloud BigQuery
```python
import os
from pathlib import Path
from google.cloud import bigquery

# Use Google Cloud IAM to generate valid service-account keys and set permissions
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join(Path(__file__).parent, "google-keys.json")
print("Connecting to Google Cloud BigQuery ...", end="")
bigquery_client = bigquery.Client()
print("OK")
```
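Assigning `GOOGLE_APPLICATION_CREDENTIALS` unconditionally overrides any credentials already configured for the process, for example in CI or on a machine with its own key. A sketch that falls back to the bundled key file only when nothing else is set, using `os.environ.setdefault`; `configure_credentials` and its parameters are hypothetical. `bigquery.Client()` then picks up the key through Application Default Credentials as before.

```python
import os
from pathlib import Path

def configure_credentials(key_file="google-keys.json", base_dir=None):
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_file unless it is already set.

    base_dir defaults to this module's directory, matching the gist's layout.
    Returns the credentials path now stored in the environment.
    """
    base = Path(base_dir) if base_dir is not None else Path(__file__).parent
    return os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", str(base / key_file))
```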