Maulik Soneji mauliksoneji
@mauliksoneji
mauliksoneji / gcs_spark_client.py
Created October 28, 2019 06:37
Client to read GCS data into a Spark DataFrame
class GCSClient(object):
    def __init__(self, spark, projectId):
        self.spark = spark
        self.spark._jsc.hadoopConfiguration().set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        self.spark._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.gs.impl",
                                                  "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
        self.spark._jsc.hadoopConfiguration().set("fs.gs.project.id", projectId)
        #spark._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.email", "/hadoop/bq/key.json")
        self.spark._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.enable", "true")
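The constructor above repeats the same `hadoopConfiguration().set(...)` call for each key. One possible way to make that easier to test is to collect the settings in a dict and apply them in a loop. This is only a sketch: the configuration keys are taken from the preview above, while `gcs_hadoop_settings`, `apply_settings`, `RecordingConf`, and the project name `"my-project"` are hypothetical names introduced for illustration and are not part of the gist.

```python
# Keys and values copied from the gist preview; the helper names are hypothetical.
GCS_FILESYSTEM_SETTINGS = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "fs.gs.auth.service.account.enable": "true",
}


def gcs_hadoop_settings(project_id):
    """Return the GCS connector settings for one project as a plain dict."""
    settings = dict(GCS_FILESYSTEM_SETTINGS)
    settings["fs.gs.project.id"] = project_id
    return settings


def apply_settings(hadoop_conf, settings):
    """Call hadoop_conf.set(key, value) for every entry in settings."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)


class RecordingConf:
    """Stand-in for a Hadoop Configuration so the helpers can be tried without Spark."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value


# "my-project" is a placeholder project id (assumption).
conf = RecordingConf()
apply_settings(conf, gcs_hadoop_settings("my-project"))
```

With a real session, the same helper could presumably be called as `apply_settings(spark._jsc.hadoopConfiguration(), gcs_hadoop_settings(projectId))`, after which a `gs://` path could be read with, for example, `spark.read.parquet(...)`, assuming the GCS connector jar is on the classpath.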
mauliksoneji / bq_spark_reader.py
Created October 28, 2019 06:33
BigQuery client to read data from BigQuery into a Spark DataFrame
class BigQueryClient(object):
    def __init__(self, project_id):
        self.project_id = project_id

    def _get_conf(self, bucket, dataset_id, table_id):
        return {
            "fs.gs.project.id": self.project_id,
            "mapred.bq.project.id": self.project_id,  # default project
            "mapred.bq.gcs.bucket": bucket,  # GCS bucket holding the temporary path
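The preview above is cut off before the dict is complete, so the rest of the author's configuration is not shown. As a rough sketch of how such a dict is commonly used with the Hadoop BigQuery connector, the example below builds a full input configuration and passes it to `SparkContext.newAPIHadoopRDD`. The three `mapred.bq.input.*` keys, the choice of the default project as the input project, and the names `bigquery_input_conf` and `read_table_as_json_rdd` are assumptions for illustration, not the gist's remaining code.

```python
def bigquery_input_conf(project_id, bucket, dataset_id, table_id):
    """Build a Hadoop conf dict for reading one BigQuery table (sketch)."""
    return {
        "fs.gs.project.id": project_id,
        "mapred.bq.project.id": project_id,  # default project
        "mapred.bq.gcs.bucket": bucket,  # GCS bucket holding the temporary path
        # Assumption: the table lives in the same project as the default project.
        "mapred.bq.input.project.id": project_id,
        "mapred.bq.input.dataset.id": dataset_id,
        "mapred.bq.input.table.id": table_id,
    }


def read_table_as_json_rdd(sc, conf):
    """Return an RDD of (row id, JSON string) pairs via the BigQuery connector.

    Requires a live SparkContext with the BigQuery connector jar available.
    """
    return sc.newAPIHadoopRDD(
        "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf=conf,
    )


# Placeholder identifiers (assumptions), used only to show the resulting dict.
conf = bigquery_input_conf("my-project", "my-bucket", "my_dataset", "my_table")
```

Given a live session, the JSON strings could then likely be turned into a DataFrame with `spark.read.json(rdd.map(lambda pair: pair[1]))`.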
mauliksoneji / cloudSettings
Last active May 6, 2019 04:58
Visual Studio Code Settings Sync Gist
{"lastUpload":"2019-05-06T04:58:27.314Z","extensionVersion":"v3.2.9"}
mauliksoneji / .txt
Created August 28, 2017 09:45
Maulik_EthereumPublicKey
0x44Ab6709Fc0723C9ac86b192DC057B59B56DbAC9