A Scala function that parses a URL (URI) into sections. Useful for processing log files to extract a core domain for aggregations and analytics.
import scala.util.matching.Regex
/**
 * Parse a URI / URL into its core domain or its trailing path.
 * e.g.
 * the core domain of
 * is
 * Returns an Option, so you might need to use getOrElse(something)
 * e.g. urlParse(url, 1).getOrElse(somedefault)
 */
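def urlParse(url: String, urlSection: Int): Option[String] = {
  require(Set(0, 1, 2) contains urlSection,
    s"urlSection out of bounds. Given $urlSection but must be one of 0 (full url), 1 (core domain) or 2 (path)")
  val urlPattern = new Regex("""^(?:https?:\/\/)?(?:www\.)?([^:\/\n\?\=@]+)(\/.*)?""")
  try {
    // Reconstructed body (assumption): regex groups 0, 1 and 2 map to the
    // full match, the core domain and the trailing path. An unmatched
    // optional group is null, which Option(...) turns into None.
    urlPattern.findFirstMatchIn(url).flatMap(m => Option(m.group(urlSection)))
  } catch {
    case e: Exception => None // any unexpected failure is reported as "no result"
  }
}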
val test = """"""
val zero = urlParse(test,0) // should return the full url. If not then the regex pattern isn't matching everything
val one = urlParse(test,1) // should return core domain
val two = urlParse(test,2) // should return the trailing path after the core domain
val three = urlParse(test,3) // should throw require error
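
The description mentions extracting a core domain for aggregations over log files. A minimal sketch of that use is shown below. It assumes a compact stand-in for `urlParse` that reads regex groups 0, 1 and 2 directly; the `UrlParseSketch` object, the sample URLs and the `"unknown"` fallback are hypothetical and only serve the illustration.

```scala
import scala.util.matching.Regex

object UrlParseSketch {
  // Same pattern as in the snippet above.
  private val urlPattern =
    new Regex("""^(?:https?:\/\/)?(?:www\.)?([^:\/\n\?\=@]+)(\/.*)?""")

  // Compact stand-in for urlParse (assumption: sections map to regex groups).
  def urlParse(url: String, urlSection: Int): Option[String] = {
    require(Set(0, 1, 2) contains urlSection, s"urlSection out of bounds: $urlSection")
    urlPattern.findFirstMatchIn(url).flatMap(m => Option(m.group(urlSection)))
  }

  // Hypothetical URLs, as they might be pulled out of log lines.
  val urls: Seq[String] = Seq(
    "https://www.example.com/products/42",
    "http://example.com/about",
    "https://blog.example.org/posts?id=7",
    "" // an empty field does not match and falls back to "unknown"
  )

  // Count hits per core domain, using getOrElse for unparseable entries.
  val hitsPerDomain: Map[String, Int] =
    urls
      .map(u => urlParse(u, 1).getOrElse("unknown"))
      .groupBy(identity)
      .map { case (domain, hits) => domain -> hits.size }
}
```

Because the `www.` prefix is consumed by a non-capturing group, `www.example.com` and `example.com` fall into the same bucket, while subdomains such as `blog.example.org` are kept as they are.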