Glue Job script for reading data from Salesforce via the DataDirect JDBC driver and writing it to S3

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

## Read data from Salesforce using the DataDirect JDBC driver into a Spark DataFrame
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", "SFORCE.OPPORTUNITY") \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()

## Convert the DataFrame to an AWS Glue DynamicFrame
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")

## Write the DynamicFrame to S3 in CSV format. You can also write it to RDS/Redshift
## by using a connection you have defined previously in Glue.
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dynamic_dframe, connection_type="s3",
    connection_options={"path": "s3://glueuserdata"},
    format="csv", transformation_ctx="datasink4")
job.commit()
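
One note on running this: the DataDirect driver jar has to be on the job's classpath for com.ddtek.jdbc.sforce.SForceDriver to load. A minimal sketch of creating the job with boto3 and Glue's --extra-jars parameter, assuming hypothetical role, bucket, script, and jar names:

import boto3

# Hypothetical names throughout: substitute your own role, bucket, script, and jar.
glue = boto3.client("glue")
glue.create_job(
    Name="salesforce-to-s3",
    Role="MyGlueServiceRole",
    Command={"Name": "glueetl",
             "ScriptLocation": "s3://my-bucket/scripts/salesforce_to_s3.py"},
    DefaultArguments={
        # Puts the DataDirect driver jar on the job's classpath
        "--extra-jars": "s3://my-bucket/jars/sforce.jar"
    })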
@rudresh-ajgaonkar commented Feb 6, 2018

Hey, did you try reading the database via Scala in AWS Glue? If yes, it would be great if you could share a code sample.

@saiteja09 (owner) commented Mar 26, 2018

Rudresh,
Unfortunately, I haven't tried it with Scala.

@undertruck commented Jun 30, 2018

Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download the last month's opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?

@SachinThanas commented Oct 16, 2018

I need to keep the log of the above job in another file and a database. How can I keep the log in a CSV file? Can anyone please help me with this?

@saiteja09 (owner) commented Oct 24, 2018

> Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download the last month's opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?

From what I have read, you can put your query in the 'dbtable' option:

source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", "your query") \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()
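
For the last-one-month case specifically, a sketch of what that could look like, assuming Spark's parenthesized-subquery-with-alias form for dbtable (the exact date syntax the driver accepts is an assumption, so check the DataDirect documentation):

# Hypothetical filter: wrap the query in parentheses and give it an alias,
# since Spark's JDBC reader treats dbtable as a table expression.
query = "(SELECT * FROM SFORCE.OPPORTUNITY WHERE CREATEDDATE >= '2018-09-24') AS opp"

source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", query) \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()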

@Sid-19 commented Oct 3, 2019

Hi,
How can I copy all the tables in Salesforce to S3 through this script?
Thanks in advance!

@saiteja09 (owner) commented Oct 3, 2019

Technically, you can. You would have to iterate through all the tables and load them up. You might have to change the script a bit, though.
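
As a minimal sketch of that change, assuming a hardcoded list of table names (hypothetical; each table is written under its own S3 prefix):

tables = ["SFORCE.OPPORTUNITY", "SFORCE.ACCOUNT", "SFORCE.CONTACT"]

for table in tables:
    # Load one table at a time through the JDBC reader
    df = spark.read.format("jdbc") \
        .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
        .option("dbtable", table) \
        .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
        .option("user", "user@mail.com") \
        .option("password", "pass123") \
        .load()
    dyf = DynamicFrame.fromDF(df, glueContext, table)
    # Write each table under its own prefix, e.g. s3://glueuserdata/SFORCE.OPPORTUNITY/
    glueContext.write_dynamic_frame.from_options(
        frame=dyf, connection_type="s3",
        connection_options={"path": "s3://glueuserdata/" + table},
        format="csv")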

@Sid-19 commented Oct 4, 2019

Can you please help me with that, since I am very new to AWS Glue?

I tried:

query = "show tables"

for i in query:
    source_df = spark.read.format("jdbc").option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=xxxl").option("StmtCallLimit", "0").option("dbtable", i)......load()
