
@saiteja09
Created November 6, 2017 16:17
Glue job script for reading data from Salesforce using the DataDirect JDBC driver and writing it to S3
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

## Read data from Salesforce using the DataDirect JDBC driver into a DataFrame
source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
    .option("dbtable", "SFORCE.OPPORTUNITY") \
    .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
    .option("user", "user@mail.com") \
    .option("password", "pass123") \
    .load()

## Convert the DataFrame to an AWS Glue DynamicFrame
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")

## Write the DynamicFrame to S3 in CSV format. You can also write it to RDS/Redshift by using a connection you have defined previously in Glue.
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dynamic_dframe,
    connection_type="s3",
    connection_options={"path": "s3://glueuserdata"},
    format="csv",
    transformation_ctx="datasink4"
)
job.commit()
@Sid-19

Sid-19 commented Oct 3, 2019

Hi,
How can I copy all the tables in Salesforce to S3 with this script?
Thanks in advance!

@saiteja09
Author

Technically, you can. You would have to iterate through all the tables and load each one. You might have to change the script a bit, though.

@Sid-19

Sid-19 commented Oct 4, 2019

Can you please help me with that, since I am very new to AWS Glue?

I tried:

query = "show tables"

for i in query:
    source_df = spark.read.format("jdbc").option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=xxxl").option("StmtCallLimit", "0").option("dbtable", i)......load()
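
A minimal sketch of the loop saiteja09 describes, reusing the spark and glueContext objects from the script above. Note that iterating over the string "show tables" loops over its characters, not over table names, and Spark's JDBC reader has no "show tables" mode. The object names, bucket path, and credentials below are placeholders; in practice you could pull the list from the driver's system catalog instead of hard-coding it:

# Hypothetical list of Salesforce objects to copy -- replace with your own,
# or fetch the names from the DataDirect driver's system catalog.
tables = ["SFORCE.OPPORTUNITY", "SFORCE.ACCOUNT", "SFORCE.CONTACT"]

for table in tables:
    df = spark.read.format("jdbc") \
        .option("url", "jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=<token>") \
        .option("driver", "com.ddtek.jdbc.sforce.SForceDriver") \
        .option("dbtable", table) \
        .option("user", "user@mail.com") \
        .option("password", "pass123") \
        .load()

    dyf = DynamicFrame.fromDF(df, glueContext, table)

    # Write each object under its own S3 prefix so the outputs don't collide.
    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": "s3://glueuserdata/" + table},
        format="csv"
    )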

@prateekpuresoftware

Can you please help me read data from a CSV file and write the DataFrame to a Salesforce table?

@prateekpuresoftware

I tried this code:

val df = sparkSession.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("your bucket location")
df.printSchema()

df.write.format("com.springml.spark.salesforce")
  .option("login", "https://test.salesforce.com/")
  .option("username", "username")
  .option("password", "password+token")
  .option("datasetName", "tableName")
  .save()

I got an invalidSfObjectFault error.
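
A hedged guess at the fix: in the springml spark-salesforce connector, "datasetName" targets an Einstein Analytics (Wave) dataset, while writes to a regular Salesforce object go through the "sfObject" option. A PySpark sketch under that assumption -- the bucket path, object name, and credentials are placeholders:

# Read the CSV from S3 (Spark 2.x+ has a built-in csv reader).
df = spark.read.option("header", "true").csv("s3://your-bucket/your-file.csv")
df.printSchema()

# "sfObject" (not "datasetName") targets a standard object such as Contact.
df.write.format("com.springml.spark.salesforce") \
    .option("login", "https://test.salesforce.com") \
    .option("username", "username") \
    .option("password", "password+token") \
    .option("sfObject", "Contact") \
    .save()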

@Zee9Team

Hello,
have you ever tried this with a custom OpenEdge DB?
What part of the script would need to change in that case?
Progress provides no information.
Thanks
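
Not covered in this gist, but as a hedged pointer: only the JDBC URL and driver class should need to change, assuming Progress's DataDirect OpenEdge JDBC driver is attached to the Glue job. The class name and URL form below follow DataDirect's usual naming convention and should be verified against the driver's documentation; the host, port, database name, and table are placeholders:

source_df = spark.read.format("jdbc") \
    .option("url", "jdbc:datadirect:openedge://yourhost:5555;databaseName=yourdb") \
    .option("driver", "com.ddtek.jdbc.openedge.OpenEdgeDriver") \
    .option("dbtable", "PUB.Customer") \
    .option("user", "user") \
    .option("password", "pass") \
    .load()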
