BigQuery Storage Management: Internally, BigQuery stores data in a proprietary columnar format called Capacitor, which has a number of benefits for data warehouse workloads. BigQuery uses a proprietary format because it can evolve in tandem with the query engine, which takes advantage of deep knowledge of the data layout to optimize query execution. BigQuery uses query access patterns to determine the optimal number of physical shards and how they are encoded.

pcoll1 = ..........
pcoll2 = ..........
left_joined = (
{'left': pcoll1, 'right': pcoll2}
| 'LeftJoiner: Combine' >> beam.CoGroupByKey()
| 'LeftJoiner: ExtractValues' >> beam.Values()
| 'LeftJoiner: JoinValues' >> beam.ParDo(LeftJoinerFn())
)
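`LeftJoinerFn` is not defined in the snippet above. Below is a minimal sketch of the join logic it might apply, assuming each element arriving from `beam.Values()` is a dict of the form `{'left': [...], 'right': [...]}` whose rows are themselves dicts. The helper name `left_join_rows` is hypothetical, and it is written as plain Python so it runs without Beam installed.

```python
def left_join_rows(grouped):
    """Yield left-join results for one CoGroupByKey group.

    `grouped` is assumed to look like {'left': [row, ...], 'right': [row, ...]},
    where every row is a dict. (Hypothetical helper, not from the original gist.)
    """
    right_rows = list(grouped['right'])
    for left_row in grouped['left']:
        if not right_rows:
            # No match on the right: keep the left row unchanged.
            yield dict(left_row)
            continue
        for right_row in right_rows:
            merged = dict(right_row)
            merged.update(left_row)  # left-side values win on name clashes
            yield merged


group = {'left': [{'id': 1, 'name': 'a'}], 'right': [{'id': 1, 'city': 'x'}]}
joined = list(left_join_rows(group))
unmatched = list(left_join_rows({'left': [{'id': 2}], 'right': []}))
```

Inside a `beam.DoFn`, the `process` method could simply return `left_join_rows(element)`, since Beam accepts any iterable of outputs from `process`.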
@prabeesh
prabeesh / bigquery_multiple_delete.py
Created June 21, 2017 06:56
Deletes multiple BigQuery tables. Useful for deleting date-sharded tables.
import sys
from datetime import datetime
from dateutil import rrule
from google.cloud import bigquery

if __name__ == '__main__':
    if len(sys.argv) == 3:
        start_date = datetime.strptime(sys.argv[1], "%Y%m%d")
        end_date = datetime.strptime(sys.argv[2], "%Y%m%d")
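The preview stops after parsing the two dates. Below is a minimal sketch of how the remaining work could look, assuming tables are named `<prefix>YYYYMMDD` and that `client` is a `google.cloud.bigquery.Client`. The function names, the `dataset` and `prefix` parameters, and the use of `timedelta` in place of `rrule` are illustrative assumptions, not the gist's actual code.

```python
from datetime import datetime, timedelta


def shard_suffixes(start_date, end_date):
    """Yield one YYYYMMDD suffix per day from start_date to end_date, inclusive."""
    current = start_date
    while current <= end_date:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def delete_sharded_tables(client, dataset, prefix, start_date, end_date):
    """Request deletion of <dataset>.<prefix>YYYYMMDD for every day in the range.

    `client` is assumed to be a google.cloud.bigquery.Client; `dataset` and
    `prefix` are hypothetical parameters not present in the original snippet.
    """
    requested = []
    for suffix in shard_suffixes(start_date, end_date):
        table_id = "{}.{}{}".format(dataset, prefix, suffix)
        # not_found_ok=True skips days for which no shard exists.
        client.delete_table(table_id, not_found_ok=True)
        requested.append(table_id)
    return requested


suffixes = list(shard_suffixes(datetime(2017, 6, 29), datetime(2017, 7, 1)))
```

Taking the client as a parameter keeps the date logic testable without credentials; in a real run it would be created with `bigquery.Client()`.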
from facebookads import FacebookAdsApi
import facebookads.objects as objects

# access_token is not defined in this preview; supply your own token before running.
FacebookAdsApi.init(access_token=access_token)
user = objects.AdUser(fbid='me')
accounts = user.get_ad_accounts()
prabeesh / beam_example0.py
Last active May 30, 2017 11:48
Beam examples for quick reference
output = (lines
          | 'split' >> beam.Map(
              lambda x: (x[:10], x[10:99]))
              .with_output_types(beam.typehints.KV[str, str])
          | 'group' >> beam.GroupByKey()
          # Python 3 lambdas cannot unpack a tuple parameter, so index into kv.
          | 'format' >> beam.FlatMap(
              lambda kv: ['%s%s' % (kv[0], val) for val in kv[1]]))
prabeesh / New line delimited JSON
Created May 29, 2016 07:11
Create newline-delimited JSON from a list of dicts in Python 3.
import json
from io import StringIO

# Create newline-delimited JSON: one JSON object per line.
fd = StringIO()
for row in rows:
    json.dump(row, fd)
    fd.write('\n')
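For reference, here is a self-contained sketch of the same loop with sample data, plus reading the result back line by line. The contents of `rows` are made up for illustration.

```python
import json
from io import StringIO

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]  # sample data (assumed)

# Write one JSON object per line.
fd = StringIO()
for row in rows:
    json.dump(row, fd)
    fd.write('\n')

ndjson_text = fd.getvalue()

# Read it back: each non-empty line is an independent JSON document.
parsed = [json.loads(line) for line in ndjson_text.splitlines() if line]
```

Because every line parses on its own, the output can be streamed or split by line without loading the whole payload as a single JSON array.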
prabeesh / pr.md
Last active August 29, 2015 14:27 — forked from piscisaureus/pr.md
Check out GitHub pull requests locally

Locate the section for your GitHub remote in the .git/config file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git@github.com:joyent/node.git

Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section. Change the GitHub URL to match your project's URL. It ends up looking like this: