def splitDataFrameList(df, target_column, separator):
    '''
    df = dataframe to split
    target_column = the column containing the values to split
    separator = the symbol used to perform the split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    def splitListToRows(row, row_accumulator, target_column, separator):
        split_row = row[target_column].split(separator)
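The preview above is cut off after the first line of the helper, so the rest of the original function is not shown here. As a rough illustration of the behaviour the docstring describes, the following is a minimal, library-free sketch that uses a list of dicts as a stand-in for a dataframe. The name `split_rows` and the sample data are hypothetical and are not taken from the gist.

```python
def split_rows(rows, target_column, separator):
    """Return new row dicts with one row per element of the split target column."""
    new_rows = []
    for row in rows:
        # Split the target value, then copy the row once per piece so the
        # values in the other columns are duplicated across the new rows.
        for piece in row[target_column].split(separator):
            new_row = dict(row)
            new_row[target_column] = piece
            new_rows.append(new_row)
    return new_rows


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "tags": "a,b"},
    {"id": 2, "tags": "c"},
]
result = split_rows(rows, "tags", ",")
```

With a recent pandas version, a similar effect can usually be obtained by splitting the column with `str.split` and then calling `DataFrame.explode` on it.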
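import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{callUDF, struct}

// Returns the comma-separated indices of the null fields in a row, or "-1" if no field is null.
def findNull(row: Row): String = {
  if (row.anyNull) {
    val indices = (0 until row.length).filter(i => row.isNullAt(i))
    indices.mkString(",")
  }
  else "-1"
}
sqlContext.udf.register("findNull", findNull _)
// Assumes df is declared as a var. All columns are packed into a struct so the UDF receives the whole row.
df = df.withColumn("MissingGroups", callUDF("findNull", struct(df.columns.map(df(_)): _*)))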
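To make the output format of `findNull` concrete, here is a small Python sketch of the same per-row logic outside Spark. It assumes each row is a plain tuple and that `None` plays the role of a null field. The name `find_null` and the sample rows are hypothetical.

```python
def find_null(row):
    """Return comma-separated indices of None entries, or "-1" if there are none."""
    indices = [str(i) for i, value in enumerate(row) if value is None]
    return ",".join(indices) if indices else "-1"


# Hypothetical sample rows, for illustration only.
rows = [("a", None, 3, None), ("b", 2, 3, 4)]
missing = [find_null(r) for r in rows]
```

Here `missing` holds one string per row, analogous to the values the Scala snippet writes to the `MissingGroups` column.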
[{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-01-26T17:26:00.000Z","Disposal Date":"2017-01-26T22:23:35.035Z","Disposal Site":"Toxfree Tom Price","Quantity":70.580847884,"Route":"http:\/\/rawgit.com\/jlln\/a66029dd454096408ff3b95c3ef36a73\/raw\/c8251d6f5f99271dab57398835d47da07c141a1e\/marandoo_route.json","Waste":"Scrap Metal"},{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-04-19T07:44:00.000Z","Disposal Date":"2017-04-19T10:16:49.867Z","Disposal Site":"Toxfree Tom Price","Quantity":1146.433313241,"Route":"http:\/\/rawgit.com\/jlln\/a66029dd454096408ff3b95c3ef36a73\/raw\/c8251d6f5f99271dab57398835d47da07c141a1e\/marandoo_route.json","Waste":"Paper"},{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-01-23T14:04:00.000Z","Disposal Date":"2017-01-23T21:02:21.944Z","Disposal Site":"Toxfree Tom Price","Quantity":430.0891917931,"Route":"http:\/\/rawgit.com\/jlln\/a660 |
import scala.collection.JavaConverters._
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
def identityMatrix(n: Int): Array[Array[String]] = Array.tabulate(n, n)((x, y) => if (x == y) "1" else "0")
def encodeStringOneHot(table: org.apache.spark.sql.DataFrame, column: String) = {
  // Accepts the dataframe and the target column name. Returns a new dataframe in which the target column has been replaced with a one-hot/dummy encoding.
  table.registerTempTable("temp")
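The preview ends before the body of `encodeStringOneHot`, so the original implementation is not visible here. The sketch below only illustrates the idea stated in the comment, in plain Python: each distinct value of the target column is mapped to one row of an identity matrix, and the target column is replaced by the resulting indicator columns. The list-of-dicts representation, the `column_category` naming scheme, and the sample data are assumptions, not details from the gist.

```python
def identity_matrix(n):
    """Build an n x n identity matrix of "1"/"0" strings, like identityMatrix above."""
    return [["1" if x == y else "0" for y in range(n)] for x in range(n)]


def encode_one_hot(rows, column):
    """Replace `column` in each row dict with one indicator column per category."""
    categories = sorted({row[column] for row in rows})
    # Each category is assigned the matching row of the identity matrix.
    codes = dict(zip(categories, identity_matrix(len(categories))))
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category, bit in zip(categories, codes[row[column]]):
            # Hypothetical naming scheme: "<column>_<category>".
            new_row[f"{column}_{category}"] = bit
        encoded.append(new_row)
    return encoded


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "waste": "Paper"},
    {"id": 2, "waste": "Scrap Metal"},
    {"id": 3, "waste": "Paper"},
]
encoded = encode_one_hot(rows, "waste")
```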
'{"The Cinema-Jill\'s House": 2630, "The Cinema-The Bird": 2442.4, "The Cinema-Mount Street": 4963.4, "The Cinema-Petition": 2686.9, "The Cinema-Print Hall": 3280.4, "Jill\'s House-The Cinema": 2865, "Jill\'s House-The Bird": 3145.4, "Jill\'s House-Mount Street": 5128.4, "Jill\'s House-Petition": 4125.1, "Jill\'s House-Print Hall": 4266.1, "The Bird-The Cinema": 2658.6, "The Bird-Jill\'s House": 3304.5, "The Bird-Mount Street": 2604.6, "The Bird-Petition": 795.8, "The Bird-Print Hall": 921.7, "Mount Street-The Cinema": 4524.9, "Mount Street-Jill\'s House": 4703.7, "Mount Street-The Bird": 2635.7, "Mount Street-Petition": 1843.1, "Mount Street-Print Hall": 1553.6, "Petition-The Cinema": 2685.4, "Petition-Jill\'s House": 4137.1, "Petition-The Bird": 796.2, "Petition-Mount Street": 2442.6, "Petition-Print Hall": 759.6, "Print Hall-The Cinema": 3268.5, "Print Hall-Jill\'s House": 4505.1, "Print Hall-The Bird": 1379.3, "Print Hall-Mount Street": 1964.7, "Print Hall-Petition": 586.7}' |
'{"The Cinema-Jill\'s House": [[115.873041, -31.933399], [115.873084, -31.933423], [115.873001, -31.933523], [115.872214, -31.934468], [115.872172, -31.934532], [115.872109, -31.934476], [115.87209, -31.934461], [115.871733, -31.934147], [115.871664, -31.934041], [115.871475, -31.933874], [115.871207, -31.93365], [115.87037, -31.932928], [115.869605, -31.932268], [115.86886, -31.931624], [115.86796, -31.930862], [115.867442, -31.930412], [115.86704, -31.930764], [115.866678, -31.931114], [115.866585, -31.93122], [115.866509, -31.931307], [115.866386, -31.931307], [115.866128, -31.931306], [115.862526, -31.931299], [115.859753, -31.931285], [115.859337, -31.931281], [115.85919, -31.93128], [115.859187, -31.930709], [115.859106, -31.930709], [115.858906, -31.930649], [115.858646, -31.930524], [115.858145, -31.930267], [115.857675, -31.930026], [115.857168, -31.929767], [115.856875, -31.929619], [115.856715, -31.929524], [115.854249, -31.92827], [115.853432, -31.927853], [115.853182, -31.92773], [115.852981, -31 |
[{"latitude":-22.57949,"longitude":118.01649},{"latitude":-22.57956,"longitude":118.01641},{"latitude":-22.57964,"longitude":118.01629},{"latitude":-22.5797,"longitude":118.01618},{"latitude":-22.57985,"longitude":118.01598},{"latitude":-22.57998,"longitude":118.01579},{"latitude":-22.58009,"longitude":118.01566},{"latitude":-22.5802,"longitude":118.01553},{"latitude":-22.58031,"longitude":118.01549},{"latitude":-22.58039,"longitude":118.01552},{"latitude":-22.58047,"longitude":118.01562},{"latitude":-22.58057,"longitude":118.01572},{"latitude":-22.58078,"longitude":118.01593},{"latitude":-22.58088,"longitude":118.01604},{"latitude":-22.58118,"longitude":118.01623},{"latitude":-22.58143,"longitude":118.01636},{"latitude":-22.58206,"longitude":118.01661},{"latitude":-22.58329,"longitude":118.01708},{"latitude":-22.58426,"longitude":118.01744},{"latitude":-22.58516,"longitude":118.01774},{"latitude":-22.5858,"longitude":118.01794},{"latitude":-22.587,"longitude":118.01832},{"latitude":-22.58797,"longitude":118. |
[{"latitude":-22.67651,"longitude":117.5986},{"latitude":-22.67541,"longitude":117.60056},{"latitude":-22.67518,"longitude":117.60106},{"latitude":-22.67496,"longitude":117.60164},{"latitude":-22.67457,"longitude":117.60283},{"latitude":-22.67399,"longitude":117.60479},{"latitude":-22.67292,"longitude":117.60904},{"latitude":-22.67266,"longitude":117.60999},{"latitude":-22.6724,"longitude":117.61063},{"latitude":-22.67207,"longitude":117.61123},{"latitude":-22.67167,"longitude":117.61178},{"latitude":-22.67119,"longitude":117.61226},{"latitude":-22.67068,"longitude":117.61267},{"latitude":-22.67033,"longitude":117.61289},{"latitude":-22.66996,"longitude":117.61308},{"latitude":-22.6696,"longitude":117.6132},{"latitude":-22.66938,"longitude":117.61328},{"latitude":-22.66897,"longitude":117.61337},{"latitude":-22.66855,"longitude":117.61342},{"latitude":-22.66792,"longitude":117.61341},{"latitude":-22.6675,"longitude":117.61336},{"latitude":-22.66668,"longitude":117.61314},{"latitude":-22.66594,"longitude":117. |
[{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-03-18T18:01:00.000Z","Disposal Date":"2017-03-19T03:49:47.776Z","Disposal Site":"Toxfree Tom Price","Quantity":161.8588019834,"Route":"marandoo_to_tom_price.csv","Waste":"Scrap Metal"},{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-04-15T23:56:00.000Z","Disposal Date":"2017-04-16T01:13:04.611Z","Disposal Site":"Toxfree Tom Price","Quantity":532.6795696387,"Route":"marandoo_to_tom_price.csv","Waste":"Paper"},{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-01-17T08:05:00.000Z","Disposal Date":"2017-01-17T16:46:13.680Z","Disposal Site":"Toxfree Tom Price","Quantity":244.271272112,"Route":"marandoo_to_tom_price.csv","Waste":"Paper"},{"latitude":-22.57949,"longitude":118.01649,"Collection Site":"Marandoo","Collection Date":"2017-10-01T10:45:00.000Z","Disposal Date":"2017-10-01T23:49:44.563Z","Disposal Site":"Toxfree Tom |
[{"Unnamed: 0":0,"DisposalProvider":"Shire of Roebourne Landfill","Outcome":"Landfill","PurchaseOrder":1590,"Region":"Coastal","Site":"2 Mile","Stream":"Solid Waste","Waste":"GENERAL ","Date":"2015-01-01","SiteCoordinates":null,"TotalCost":225.8,"TotalQuantity":4.0,"SiteCoordinates.1":null,"CollectionType":"5.05 3m3 Frontlift - General Waste","UnitCost":56.45,"Year":2015,"Month":1,"YearAndMonth":"2015-1"},{"Unnamed: 0":1,"DisposalProvider":"Shire of Roebourne Landfill","Outcome":"Landfill","PurchaseOrder":2043,"Region":"Coastal","Site":"2 Mile","Stream":"Solid Waste","Waste":"GENERAL ","Date":"2016-01-01","SiteCoordinates":null,"TotalCost":1114.6,"TotalQuantity":7.0,"SiteCoordinates.1":null,"CollectionType":"5.05 3m3 Frontlift - General Waste","UnitCost":159.2285714286,"Year":2016,"Month":1,"YearAndMonth":"2016-1"},{"Unnamed: 0":2,"DisposalProvider":"Shire of Roebourne Landfill","Outcome":"Landfill","PurchaseOrder":2508,"Region":"Coastal","Site":"2 Mile","Stream":"Solid Waste","Waste":"GENERAL ","Date":"2017- |