@g-a-d
Created August 24, 2017 10:56
Adding parameters to a pyspark job in AWS Glue
import sys

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.types import *
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME, CUSTOM1, CUSTOM2, CUSTOM3]
# Resolve the job parameters passed to this run (each must be supplied as --KEY value).
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'CUSTOM1', 'CUSTOM2', 'CUSTOM3'])

# use args as usual:
print("Custom 1 is {}".format(args['CUSTOM1']))
g-a-d commented Aug 24, 2017

NOTE that for this to work, the key-value pairs you pass when triggering the job must be of the form:
--CUSTOM1 value1
--CUSTOM2 value2

Note the double-dash option specifier; it is required by the 'getResolvedOptions' function.
This was non-obvious from the docs.
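
For example, when starting the run from code rather than the console, the same double-dash keys can be supplied through boto3's start_job_run. A minimal sketch, assuming a hypothetical job name "my-glue-job" and placeholder values:

import boto3

glue = boto3.client("glue")

# Arguments keys keep the leading "--"; getResolvedOptions strips it inside the job.
response = glue.start_job_run(
    JobName="my-glue-job",
    Arguments={
        "--CUSTOM1": "value1",
        "--CUSTOM2": "value2",
        "--CUSTOM3": "value3",
    },
)
print(response["JobRunId"])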
