Skip to content

Instantly share code, notes, and snippets.

View RaulMedeiros's full-sized avatar

Raul Medeiros RaulMedeiros

  • Fortaleza
View GitHub Profile
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"datetime": pd.to_datetime(np.random.randint(1582800000000000000, 1583500000000000000, 100, dtype=np.int64))})
fig, ax = plt.subplots()
df["datetime"].astype(np.int64).plot.hist(ax=ax)
labels = ax.get_xticks().tolist()
labels = pd.to_datetime(labels)
ax.set_xticklabels(labels, rotation=90)
import pandas as pd
x = pd.DataFrame()
isinstance(x, pd.DataFrame) and not x.empty
>>>False
x = None
isinstance(x, pd.DataFrame) and not x.empty
>>>False
x = pd.DataFrame([1,2,3])
@RaulMedeiros
RaulMedeiros / Data Quality
Created August 10, 2020 11:48
Data Quality
Data Quality
Validity: How closely the data meets defined business rules or constraints. Some common constraints include:
>Mandatory constraints: Certain columns cannot be empty
>Data-type constraints: Values in a column must be of a certain data type
>Range constraints: Minimum and maximum values for numbers or dates
>Foreign-key constraints: A set of values in a column are defined in the column of another table containing unique values
>Unique constraints: A field or fields must be unique in a dataset
Regular expression patterns: Text fields will have to be validated this way.
>Cross-field validation: Certain conditions that utilize multiple fields must hold
CREATE OR REPLACE FUNCTION query_builder(tab_name text) RETURNS table ( aaa int ) LANGUAGE plpgsql AS $$
DECLARE
built_query varchar;
BEGIN
SELECT 'SELECT (
cast(f."' || STRING_AGG (column_name, '" IS NULL as int) + cast(f."') || '" IS NULL as int)) as sum_nulls from public."'|| tab_name ||'" as f; ' into built_query
from information_schema.columns
where table_name = tab_name
group by table_name;
RETURN query EXECUTE built_query;
CREATE OR REPLACE FUNCTION qtd(nomeTabela text) RETURNS integer LANGUAGE plpgsql AS $$
DECLARE
resultado integer;
BEGIN
EXECUTE 'SELECT count(1) FROM ' || nomeTabela INTO resultado;
RETURN resultado;
END; $$;
SELECT column_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table_name';
docker exec -i container_id psql -U postgres -c "
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;
"
Example 1: List of lists
A list of multiple arguments can be passed to a function via pool.map
(function needs to accept a list as single argument)
Example: calculate the product of each data pair
import multiprocessing
import numpy as np
@RaulMedeiros
RaulMedeiros / fletten_dict
Created April 8, 2020 14:43
fletten_dict
from pandas.io.json.normalize import nested_to_record # _normalize for new versions
def fletten_dict(dict_):
return {tuple(k.split('.')):v for k, v in nested_to_record(dict_).items()}
procss_data_dict_flatten = fletten_dict(procss_data_dict)
procss_data_dict_flatten.keys()
def safe_run(func):
def func_wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except Exception as e:
print(e)