A Scala function that parses a URL (or URI) into sections. Useful when processing log files, where the core domain has to be extracted for aggregations and analytics.
import scala.util.matching.Regex

/**
 * Parse a URI / URL into its core domain or its trailing path.
 *
 * For example, the core domain of
 * https://www.epicwebsite.com.au/path/to/asset/cute_cat_picture.png
 * is epicwebsite.com.au and the path is /path/to/asset/cute_cat_picture.png.
 *
 * Returns an Option, so callers will usually want getOrElse,
 * e.g. urlParse(url, 1).getOrElse(someDefault)
 *
 * @param url        the URL to parse; the scheme and a leading "www." are optional
 * @param urlSection 0 (full match), 1 (core domain) or 2 (path)
 * @return the requested section, or None if the URL does not match or has no such section
 */
def urlParse(url: String, urlSection: Int): Option[String] = {
  require(Set(0, 1, 2) contains urlSection,
    s"urlSection out of bounds. Given $urlSection but must be one of 0 (full url), 1 (core domain) or 2 (path)")
  val urlPattern = new Regex("""^(?:https?:\/\/)?(?:www\.)?([^:\/\n\?\=@]+)(\/.*)?""")
  // group(2) is null when the URL has no path, so wrap the result in Option instead of Some
  urlPattern.findFirstMatchIn(url).flatMap(m => Option(m.group(urlSection)))
}
val test = """https://www.epicwebsite.com.au/path/to/asset/cute_cat_picture.png"""
val zero = urlParse(test, 0) // Some(full URL); if not, the regex pattern isn't matching everything
val one = urlParse(test, 1) // Some("epicwebsite.com.au"), the core domain
val two = urlParse(test, 2) // Some("/path/to/asset/cute_cat_picture.png"), the trailing path after the core domain
val three = urlParse(test, 3) // throws IllegalArgumentException from require
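
The log-file use case mentioned in the description could look roughly like the sketch below. It is an illustration only: `loggedUrls` is made-up sample data standing in for URLs already pulled out of log lines, and `domainPattern`, `coreDomain` and `hitsPerDomain` are hypothetical names that are not part of the gist. The regex is repeated so that the block runs on its own.

```scala
import scala.util.matching.Regex

// Same pattern as in urlParse, repeated here so this sketch is self-contained.
val domainPattern: Regex =
  new Regex("""^(?:https?:\/\/)?(?:www\.)?([^:\/\n\?\=@]+)(\/.*)?""")

// Hypothetical helper: the core domain of a URL, or None if nothing matches.
def coreDomain(url: String): Option[String] =
  domainPattern.findFirstMatchIn(url).map(_.group(1))

// Made-up sample data standing in for URLs extracted from log lines.
val loggedUrls = Seq(
  "https://www.epicwebsite.com.au/path/to/asset/cute_cat_picture.png",
  "http://epicwebsite.com.au/index.html",
  "https://www.example.org/about",
  "" // an empty entry does not match the pattern and is dropped below
)

// Count hits per core domain; entries without a match are silently skipped.
val hitsPerDomain: Map[String, Int] =
  loggedUrls
    .flatMap(coreDomain)
    .groupBy(identity)
    .map { case (domain, hits) => domain -> hits.size }
```

Using `flatMap` over the `Option` result keeps unparseable entries out of the aggregation without needing a default value; `getOrElse("unknown")` would be the alternative if those entries should be counted too.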