Example configuration of custom prompts for the SMA Assistant
prompts:
- name: "DataFrameReader.load"
  prompt: |
    Explain this tag:
    ------ TAG
    @@firstline
    and how it affects the following line of code:
    ------
    **CODE**
    @@secondline
    ------
    Consider these notes:
    ------
    **NOTES**
    The DataFrameReader API in Spark follows a pattern of
    df.read.option(...).option(...).format(format_name).load(path, options)
    and the Snowpark API uses a pattern of
    df.read.option(...).option(...).[csv|parquet|json|orc|...](path, options)
    When migrating these statements you will need to adjust the order of the calls
    and replace the load method with the appropriate format method. Also, some
    option names might be different.
    For example:
    * in Spark the option name for skipping a header is "header", false
    * while in Snowpark it is "SKIP_HEADER", 1
    * in Spark you use inferSchema; in Snowpark you use INFER_SCHEMA true and PARSE_HEADER true
    * in Spark you use nullValue; in Snowpark you use NULL_IF
    * in Spark you use sep; in Snowpark you use FIELD_DELIMITER
    * the PARSE_HEADER and SKIP_HEADER options are not used together
    * the default for FIELD_DELIMITER and sep is ','; if it is not specified in the original code you don't need to pass it
    * in Snowpark you can specify a FILE_FORMAT, but when you translate from .load to .csv|.json|.parquet and others you don't need to specify it
    ------
    And based on that information, provide some rewrite suggestions.
  matcher: ".*pyspark.sql.readwriter.DataFrameReader.load.*"
- name: "RDD.map"
prompt: |
Explain this tag:
------ TAG
@@firstline
and how it affects the following line of code:
------
**CODE**
@@secondline
------
Consider these notes:
------
**NOTES**
Spark has a concept of RDD which is not present in
In Spark you can use the map function to transform an RDD,
In snowpark you need to use dataframe operations.
The following examples show how these operations can be translated from Spark to Snowpark.
`rdd=df.rdd.map(lambda x: (x[0]+","+x[1],x[2],x[3]*2))`
can be rewritten as:
`df.select(df[0]+","+df[1],df[2],df[3]*2)`
operations like:
`rdd2=rdd.reduceByKey(lambda a,b: a+b)`
can be rewritten as:
`df.groupBy(df[0]).agg(sum(df[1]))`
operations like:
`rdd3=rdd2.filter(lambda x: x[1]>10)`
can be rewritten as:
`df.filter(df[1]>10)`
operations like:
`rdd.flatMap(lambda x: range(1, x))`
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
A similar approach in snowpark will
`df=df.select(sequence(lit(1), df[0]).alias("flatmap"))`
`df=df.select(explode("flatmap"))`
------
And based on that information provided some rewrite suggestions.
matcher: ".*pyspark.rdd.RDD..*map.*"