@garystafford
Created January 2, 2020 19:50
GlueJobRatesToParquet:
  Type: AWS::Glue::Job
  Properties:
    GlueVersion: 1.0
    Command:
      Name: glueetl
      PythonVersion: 3
      ScriptLocation: !Sub "s3://${ScriptBucketName}/glue_scripts/rates_xml_to_parquet.py"
    DefaultArguments: {
      "--s3_output_path": !Sub "s3://${DataBucketName}/electricity_rates_parquet",
      "--source_glue_database": !Ref GlueDatabase,
      "--source_glue_table": "electricity_rates_xml",
      "--job-bookmark-option": "job-bookmark-enable",
      "--enable-spark-ui": "true",
      "--spark-event-logs-path": !Sub "s3://${LogBucketName}/glue-etl-jobs/"
    }
    Description: "Convert electrical rates XML data to Parquet"
    ExecutionProperty:
      MaxConcurrentRuns: 2
    MaxRetries: 0
    Name: rates-xml-to-parquet
    Role: !GetAtt "CrawlerRole.Arn"
  DependsOn:
    - CrawlerRole
    - GlueDatabase
    - DataBucket
    - ScriptBucket
    - LogBucket
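
# A minimal sketch of the Parameters that the !Sub expressions above appear to
# rely on. The three names are taken from the ${...} references in this
# resource; the types and descriptions are assumptions, since the rest of the
# template is not shown here. It is left commented out because this fragment
# presumably sits under Resources, while Parameters is a top-level section.
#
# Parameters:
#   ScriptBucketName:
#     Type: String
#     Description: Bucket holding the Glue ETL scripts (assumed)
#   DataBucketName:
#     Type: String
#     Description: Bucket receiving the Parquet output (assumed)
#   LogBucketName:
#     Type: String
#     Description: Bucket for Spark UI event logs (assumed)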