@dgadiraju
Created January 12, 2018 15:14
// Get the top N priced products of a category using the Scala collections API.
// Products tied at one of the top N distinct prices are all returned.
val products = sc.textFile("/public/retail_db/products")

// Drop records with an empty price (field 4), then key each record by category id (field 1)
val productsMap = products.
  filter(product => product.split(",")(4) != "").
  map(product => (product.split(",")(1).toInt, product))
val productsGroupByCategory = productsMap.groupByKey

def getTopNPricedProducts(productsIterable: Iterable[String], topN: Int): Iterable[String] = {
  // Distinct prices, highest first, limited to topN
  val productPrices = productsIterable.map(p => p.split(",")(4).toFloat).toSet
  val topNPrices = productPrices.toList.sortBy(p => -p).take(topN)
  // All products sorted by price, highest first
  val productsSorted = productsIterable.toList.sortBy(product => -product.split(",")(4).toFloat)
  if (topNPrices.isEmpty) List.empty[String] // empty input or topN <= 0: min would throw
  else {
    // The lowest of the top N prices is the cutoff
    val minOfTopNPrices = topNPrices.min
    productsSorted.takeWhile(product => product.split(",")(4).toFloat >= minOfTopNPrices)
  }
}

// Try it on the products of the first category
val productsIterable = productsGroupByCategory.first._2
getTopNPricedProducts(productsIterable, 3).foreach(println)
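
The same ranking logic can be tried without a Spark cluster. The block below is a minimal sketch under stated assumptions: the object name `TopNSketch`, the `price` helper, and the sample records are hypothetical and only mimic the comma-separated layout used above (category id in field 1, price in field 4). It swaps `sortBy` and `min` for `sortWith` and `last`, and uses a plain `groupBy` in place of `groupByKey`.

```scala
object TopNSketch {
  // Hypothetical helper: price is field 4 of the comma-separated record
  def price(product: String): Float = product.split(",")(4).toFloat

  // Returns every product whose price is among the topN distinct prices
  def getTopNPricedProducts(products: Iterable[String], topN: Int): List[String] = {
    // Distinct prices, highest first, limited to topN
    val topNPrices = products.map(price).toList.distinct.sortWith(_ > _).take(topN)
    if (topNPrices.isEmpty) Nil // empty input or topN <= 0
    else {
      val cutoff = topNPrices.last // lowest of the top N prices
      products.toList
        .sortWith((a, b) => price(a) > price(b)) // stable sort, highest price first
        .takeWhile(p => price(p) >= cutoff)
    }
  }

  // Made-up sample data: id,categoryId,name,description,price,image
  val sample: List[String] = List(
    "1,2,Glove A,,59.99,img",
    "2,2,Glove B,,129.99,img",
    "3,2,Glove C,,89.99,img",
    "4,2,Glove D,,89.99,img",
    "5,2,Glove E,,29.97,img",
    "6,2,Glove F,,129.99,img",
    "7,3,Ball A,,19.99,img",
    "8,3,Ball B,,24.99,img"
  )

  // In-memory stand-in for groupByKey: category id -> records
  val byCategory: Map[Int, List[String]] = sample.groupBy(_.split(",")(1).toInt)

  def main(args: Array[String]): Unit = {
    // Top 2 distinct prices per category; ties are kept
    byCategory.foreach { case (category, records) =>
      println(s"category $category")
      getTopNPricedProducts(records, 2).foreach(println)
    }
  }
}
```

With this sample, category 2 has two products at each of its two highest prices, so asking for the top 2 prices returns four records. On the grouped RDD above, the same function could be applied to every category with something like `productsGroupByCategory.flatMap(rec => getTopNPricedProducts(rec._2, 3))`.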