@rzykov
Last active October 6, 2021 12:41
DataAnalysisIntro2.scala
//CODE:
// Find the most popular categories.
dataAov.map(_.categoryId) // select the categoryId field
  .countByValue() // count how often each categoryId appears (result is returned to the driver)
  .toSeq
  .sortBy(-_._2) // sort by frequency in descending order
  .take(10) // take the top 10 records
//OUT:
//format: (categoryId, count)
ArrayBuffer(
(314,3068),
(132,2229),
(128,1770), // use this categoryId (128)
(270,1483),
(139,1379),
(107,1366),
(177,1311),
(226,1268),
(103,1259),
(127,1204))
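//SKETCH:
// The same count-sort-take pattern, shown on plain Scala collections so it can
// be run without a Spark context. This is only an illustration: the Order case
// class and the sample values are hypothetical (the real element type of
// dataAov is not shown here), and groupBy stands in for RDD.countByValue().

```scala
object TopCategoriesSketch {
  // Hypothetical record type; only the categoryId field matters here.
  case class Order(categoryId: Int)

  // Made-up sample data, for illustration only.
  val sample: Seq[Order] = Seq(314, 132, 314, 128, 314, 132).map(Order(_))

  // Local stand-in for countByValue(): group equal ids, then count each group.
  val counts: Map[Int, Long] =
    sample.groupBy(_.categoryId).map { case (id, rows) => id -> rows.size.toLong }

  // Sort by count in descending order and keep the top 2 entries.
  val top: Seq[(Int, Long)] = counts.toSeq.sortBy { case (_, n) => -n }.take(2)
  // top holds (314,3) followed by (132,2)
}
```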