@sungchun12
Created September 29, 2022 18:56
Use this dbt Python model to generate fake data in your dbt pipelines as an alternative to dbt seeds with CSV files: https://www.loom.com/share/90084f27396746619d4f53f44143faab
from faker import Faker
import pandas as pd

fake = Faker()


def create_rows_faker(num=1):
    """Return a list of `num` dicts, one fake record per row."""
    output = [
        {
            "name": fake.name(),
            "address": fake.address(),
            "email": fake.email(),
            # "bs": fake.bs(),
            "city": fake.city(),
            "state": fake.state(),
            "date_time": fake.date_time(),
            # "paragraph": fake.paragraph(),
            # "Conrad": fake.catch_phrase(),
            "randomdata": 100,  # constant placeholder value
        }
        for _ in range(num)
    ]
    return output


def model(dbt, session):  # session is unused here
    dbt.config(
        materialized="table",
        packages=["Faker"],  # how to import python libraries in dbt's context
    )
    df = pd.DataFrame(create_rows_faker(num=100))
    return df
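
To check what `model` configures and returns without a full dbt run, one option is to call it with a stand-in for the `dbt` argument. The following is a minimal sketch and not part of dbt: `StubDbt` and `run_model_locally` are hypothetical names, and the stub only mimics the keyword-style `dbt.config(...)` call used above.

```python
class StubDbt:
    """Hypothetical stand-in for the `dbt` object; it only records config() kwargs."""

    def __init__(self):
        self.config_calls = []

    def config(self, **kwargs):
        # Remember what the model asked for so it can be inspected afterwards.
        self.config_calls.append(kwargs)


def run_model_locally(model_fn):
    """Call a dbt-style model(dbt, session) with the stub and no session."""
    stub = StubDbt()
    result = model_fn(stub, None)
    return result, stub.config_calls
```

Assuming Faker and pandas are installed locally, `df, calls = run_model_locally(model)` should return the 100-row DataFrame along with the recorded `materialized` and `packages` settings.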