@mtrcn
Created January 28, 2019 10:40
def compareFields(jdbcDF, domainDf):
    """Print column and type differences between a JDBC source DataFrame and a domain DataFrame."""
    # Source column names are normalized (lowercased, spaces removed) before comparison.
    sourceCols = [x.lower().replace(" ", "") for x in jdbcDF.columns]
    print("Missing Columns:")
    print([col_name for col_name in sourceCols if col_name not in domainDf.columns])
    print("Redundant Columns:")
    print([col_name for col_name in domainDf.columns if col_name not in sourceCols])
    # Maps each mismatched source type to the normalized names of the columns that have it.
    mismatched_types = dict()
    # Only columns present on both sides are type-checked.
    for col in (col_name for col_name in jdbcDF.columns if col_name.lower().replace(" ", "") in domainDf.columns):
        colLowerName = col.lower().replace(" ", "")
        destinationColType = str(domainDf.schema.fields[domainDf.columns.index(colLowerName)].dataType)
        sourceColType = str(jdbcDF.schema.fields[jdbcDF.columns.index(col)].dataType)
        if sourceColType != destinationColType:
            if sourceColType in mismatched_types:
                mismatched_types[sourceColType].append(colLowerName)
            else:
                mismatched_types[sourceColType] = [colLowerName]
    print("Source Column Count: " + str(len(jdbcDF.columns)) + " Destination Column Count: " + str(len(domainDf.columns)))
    print("Mismatched Types:")
    print(mismatched_types)
mtrcn commented Jan 28, 2019

Connect to SQL Server:

jdbcDF = spark.read.format("jdbc").option("url", "jdbc:sqlserver://NLC1PRODCI01:1433;databasename=CustomerIntelligence;integratedSecurity=true").option("dbtable", "domain.Requests_ConsultAanvraag").load()
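
The comparison logic can also be tried without a Spark session or a database connection. The following is a minimal sketch, not part of the gist itself: `compare_schemas` is a hypothetical helper that mirrors what `compareFields` does, but takes each schema as a plain list of (column name, type name) pairs and returns the results instead of printing them. The column names and type strings below are made up for illustration.

```python
def compare_schemas(source, destination):
    """Compare two schemas given as lists of (column_name, type_name) pairs.

    Hypothetical helper: mirrors the logic of compareFields on plain Python data.
    """
    def normalize(name):
        # Same normalization as compareFields: lowercase, spaces removed.
        return name.lower().replace(" ", "")

    source_types = {normalize(name): type_name for name, type_name in source}
    destination_types = dict(destination)

    # Source columns with no counterpart in the destination.
    missing = [name for name in source_types if name not in destination_types]
    # Destination columns with no counterpart in the source.
    redundant = [name for name in destination_types if name not in source_types]

    # Group shared columns whose types differ, keyed by the source type.
    mismatched = {}
    for name, source_type in source_types.items():
        if name in destination_types and source_type != destination_types[name]:
            mismatched.setdefault(source_type, []).append(name)
    return missing, redundant, mismatched


# Made-up schemas for illustration only.
source = [("Request Id", "IntegerType"), ("Created On", "TimestampType"), ("Notes", "StringType")]
destination = [("requestid", "LongType"), ("createdon", "TimestampType"), ("loadedat", "TimestampType")]

missing, redundant, mismatched = compare_schemas(source, destination)
print("Missing Columns:", missing)
print("Redundant Columns:", redundant)
print("Mismatched Types:", mismatched)
```

With these sample schemas, `notes` is reported as missing, `loadedat` as redundant, and `requestid` as a type mismatch under `IntegerType`.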
