Skip to content

Instantly share code, notes, and snippets.

@MattFaus
Created March 22, 2014 00:28
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save MattFaus/9699292 to your computer and use it in GitHub Desktop.
Save MattFaus/9699292 to your computer and use it in GitHub Desktop.
The mapreduce job we use to transform datastore backups into JSON files that we then load into BigQuery.
mr_pipeline = mapreduce_pipeline.MapperPipeline(
'bq_property_transform',
'extbackup.bq_property_transform.property_transform_mapper',
'third_party.mapreduce.input_readers.RecordsReader',
'third_party.mapreduce.output_writers.FileOutputWriter',
params={
'input_reader': {
'files': list_backup_files(kind, backup_date),
},
'output_writer': {
'filesystem': 'gs',
'mime_type': 'text/plain',
'output_sharding': 'input',
'gs_bucket_name': get_output_path(kind, backup_date),
},
'backup_date': backup_date.encode('UTF-8'),
'kind': kind.encode('UTF-8'),
},
shards=N_SHARDS,
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment