Casey Juanxi Li caseyliqb

## PySpark Dataframes from Scratch.md

      
        
          
            
              
Last active February 27, 2020 12:13

Why is this so hard to remember?

from pyspark.sql.types import StructType, StructField, StringType

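# Assumes `sc` is an existing SparkContext, e.g. the one provided by the pyspark shell or a notebook.
rdd = sc.parallelize([("moo this has stopwords b", "bat this one does not"),
                      ("apple orange banana", "cookie jar bla la")])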

schema = StructType([StructField('entity', StringType(), True),
                     StructField('cleaned_entity', StringType(), True),