@saiteja09
Created November 6, 2017 16:17
Glue job script that reads data from Salesforce via the DataDirect JDBC driver and writes it to S3
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job

##Standard Glue job boilerplate: resolve arguments and initialize the job before doing any work
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

##Read data from Salesforce using the DataDirect JDBC driver into a DataFrame
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", "SFORCE.OPPORTUNITY") \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()

##Convert the DataFrame to an AWS Glue DynamicFrame
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")

##Write the DynamicFrame to S3 in CSV format. You can also write it to any RDS/Redshift target by using a connection you have defined previously in Glue
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame = dynamic_dframe,
    connection_type = "s3",
    connection_options = {"path": "s3://glueuserdata"},
    format = "csv",
    transformation_ctx = "datasink4")

job.commit()
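
Note that the DataDirect driver JAR is not bundled with Glue: it has to be supplied to the job, for example by uploading it to S3 and pointing the job's "Dependent jars path" (the --extra-jars parameter) at it, so that com.ddtek.jdbc.sforce.SForceDriver is on the classpath at run time.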
@rudresh-ajgaonkar

Hey, did you try reading the database via Scala in AWS Glue? If yes, it would be great if you could share a code sample.

@saiteja09
Author

Rudresh,
Unfortunately, I haven't tried it with Scala.

@undertruck

Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download only the last month's opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?

@SachinThanas

SachinThanas commented Oct 16, 2018

I need to keep a log of the above job in another file and a database. How can I keep the log in a CSV file? Can anyone please help me with this?

@saiteja09
Author

> Hi, I'm new to Glue and Spark. Where is the actual query in the code? If I want to download only the last month's opportunities, how can I change this code? Is it possible to sync Salesforce data using this approach?

From what I have read, you can put your query in the 'dbtable' option (Spark's JDBC source expects it wrapped in parentheses with an alias):

source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=") \
    .option("dbtable", "(your query) alias") \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()
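
For example, to pull only opportunities closed in the last month, the subquery could look like the sketch below. The WHERE clause is an assumption: the exact date arithmetic depends on the SQL dialect the DataDirect driver exposes, so adjust it to the driver's documented date functions.

# Sketch: filter at the source via a subquery in 'dbtable'.
# CURRENT_DATE - 30 is assumed syntax, not verified against the driver.
query = "(SELECT * FROM SFORCE.OPPORTUNITY WHERE CLOSEDATE >= CURRENT_DATE - 30) opp"
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", query) \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()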

@Sid-19

Sid-19 commented Oct 3, 2019

Hi,
How can I copy all the tables in Salesforce to S3 with this script?
Thanks in advance!

@saiteja09
Author

Technically, you can. You would have to iterate through all the tables and load them up. You might have to change the script a bit, though.

@Sid-19

Sid-19 commented Oct 4, 2019

Can you please help me with that, since I am very new to AWS Glue.

I tried:

query = "show tables"

for i in query:
    source_df = spark.read.format("jdbc").option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=xxxl").option("StmtCallLimit", "0").option("dbtable", i)......load()
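
As written, `for i in query:` iterates over the characters of the string "show tables", not over table names. A minimal sketch of the loop described in the reply above, assuming a hand-maintained list of table names (discovering them dynamically would need the driver's JDBC metadata instead):

# Sketch: export several Salesforce tables to S3 in one job.
# The table list is hypothetical; extend it to the objects you need.
tables = ["OPPORTUNITY", "ACCOUNT", "CONTACT"]
for tbl in tables:
    df = spark.read.format("jdbc") \
        .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
        .option("dbtable", "SFORCE." + tbl) \
        .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
        .option("user", "user@mail.com") \
        .option("password", "pass123") \
        .load()
    dyf = DynamicFrame.fromDF(df, glueContext, tbl)
    # One S3 prefix per table keeps the exports separate.
    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": "s3://glueuserdata/" + tbl},
        format="csv")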

@prateekpuresoftware

Can you please help me read the data from a CSV and write the DataFrame to a Salesforce table?

@prateekpuresoftware

I tried this code:

val df = sparkSession.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("your bucket location")
df.printSchema()

df.write.format("com.springml.spark.salesforce")
  .option("login", "https://test.salesforce.com/")
  .option("username", "username")
  .option("password", "password+token")
  .option("datasetName", "tableName")
  .save()

I got an invalidSfObject fault error.
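
Assuming this is the springml spark-salesforce connector, the datasetName option is meant for Einstein Analytics (Wave) datasets; writing to a regular Salesforce object uses the sfObject option instead (e.g. .option("sfObject", "Contact")), which would be consistent with the invalidSfObject fault. Check that against the connector's documentation rather than taking it as a confirmed fix.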

@Zee9Team

Hello,
have you ever tried this with a custom OpenEdge DB?
What part of the script would need to change in this case?
Progress provides no information.
Thanks
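
In principle only the connection options should change, since the rest of the script is source-agnostic. A minimal sketch, assuming DataDirect's OpenEdge JDBC driver class com.ddtek.jdbc.openedge.OpenEdgeDriver; the host, port, databaseName, and table below are hypothetical placeholders:

# Sketch: same Glue read, pointed at Progress OpenEdge instead of Salesforce.
# URL format and driver class follow DataDirect's OpenEdge driver docs;
# verify the port and schema (OpenEdge SQL tables usually live under PUB).
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:openedge://myhost:2003;databaseName=mydb") \
    .option("dbtable", "PUB.CUSTOMER") \
    .option("driver", "com.ddtek.jdbc.openedge.OpenEdgeDriver") \
    .option("user", "user") \
    .option("password", "pass") \
    .load()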
