@manuzhang
Last active July 29, 2019 23:16
Analyzing my Overcast data
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit
import $ivy.`com.lihaoyi::requests:0.1.8`
import $ivy.`org.scala-lang.modules::scala-xml:1.2.0`
import $ivy.`org.seleniumhq.selenium:selenium-chrome-driver:3.0.1`
import org.openqa.selenium.JavascriptExecutor
import org.openqa.selenium.chrome.{ChromeDriver, ChromeOptions}
import requests.TimeoutException
import scala.util.{Failure, Success, Try}
import scala.xml.XML

// Overcast OPML export: podcasts are outlines of type "rss" whose children are episodes.
val doc = XML.loadFile("overcast.opml")

// Chrome is driven through a local proxy; adjust the driver path and proxy for your machine.
System.setProperty("webdriver.chrome.driver", "/Users/doriadong/bin/chromedriver")
val options = new ChromeOptions()
options.addArguments("--proxy-server=http://127.0.0.1:1087")
val driver = new ChromeDriver(options)

(doc \\ "outline").foreach {
  outline =>
    if ((outline \@ "type") == "rss") {
      val podcast = outline \@ "text"
      // Sum the durations of all played episodes of this podcast.
      val total = outline.nonEmptyChildren.map { node =>
        if (node \@ "played" == "1") {
          val title = node \@ "title"
          val url = node \@ "overcastUrl"
          val source = node \@ "enclosureUrl"
          val listenStr = node \@ "userUpdatedDate"
          val pattern = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX")
          val listenTime = ZonedDateTime.from(pattern.parse(listenStr))
          val today = ZonedDateTime.now()
          // Midnight on the Monday of the current week.
          val from = today.minusDays(today.getDayOfWeek.getValue - 1).truncatedTo(ChronoUnit.DAYS)
          // Probe the audio source: a timeout counts as reachable (200), any other failure as missing (400).
          val status = Try {
            requests.get(source)
          } match {
            case Success(resp) => resp.statusCode
            case Failure(e) =>
              if (e.isInstanceOf[TimeoutException]) {
                200
              } else {
                e.printStackTrace()
                400
              }
          }
          // if (listenTime.isAfter(from) && !listenTime.isAfter(today)) {
          if (status != 400) {
            driver.get(url)
            var duration = 0.0
            var n = 0
            var skip = false
            val executor = driver.asInstanceOf[JavascriptExecutor]
            // Poll the page's audio element until it reports a duration,
            // waiting 100 ms longer on each attempt; give up on the first error.
            while (duration == 0.0 && !skip) {
              Thread.sleep(100 * n)
              Try {
                duration = executor.executeScript(
                  "return document.getElementById('audioplayer').duration").asInstanceOf[Double]
              } match {
                case Success(_) =>
                case Failure(e) =>
                  println(s"Failed to get duration of $url from $podcast because of ${e.getMessage}")
                  skip = true
              }
              n += 1
            }
            duration
          // }
          } else {
            println(s"$source of $url from $podcast not found")
            0.0
          }
        } else {
          0.0
        }
      }.sum
      // One CSV row per podcast: name, total duration.
      os.write.append(os.pwd / "overcast_report.csv", s"$podcast,$total\n")
    }
}

driver.quit()
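
The script computes `from` (the start of the current week) and `listenTime`, but the check that uses them is commented out. A minimal sketch of that weekly filter as standalone functions follows; the object and function names (`WeekWindow`, `weekStart`, `listenedThisWeek`) are illustrative and not part of the original script.

```scala
import java.time.{ZoneId, ZonedDateTime}
import java.time.temporal.ChronoUnit

object WeekWindow {
  // Midnight on the Monday of the week containing `now` (ISO: Monday = 1).
  def weekStart(now: ZonedDateTime): ZonedDateTime =
    now.minusDays(now.getDayOfWeek.getValue - 1).truncatedTo(ChronoUnit.DAYS)

  // Same condition as the commented-out check: strictly after the week start
  // and not after `now`.
  def listenedThisWeek(listenTime: ZonedDateTime, now: ZonedDateTime): Boolean =
    listenTime.isAfter(weekStart(now)) && !listenTime.isAfter(now)
}

// Example with a fixed "now" (a Wednesday) so the result is reproducible.
val sampleNow = ZonedDateTime.of(2019, 7, 24, 15, 30, 0, 0, ZoneId.of("America/New_York"))
println(WeekWindow.weekStart(sampleNow).toLocalDate)                         // 2019-07-22
println(WeekWindow.listenedThisWeek(sampleNow.minusDays(1), sampleNow))      // true
println(WeekWindow.listenedThisWeek(sampleNow.minusDays(7), sampleNow))      // false
```

Passing `now` in as a parameter, rather than calling `ZonedDateTime.now()` inside, keeps the window fixed for a whole run and makes the filter easy to check against known dates.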

manuzhang commented Jul 29, 2019

Output up to 2019-07-21T20:00:00-04:00 (podcast name, total duration of played episodes in seconds):

人间指南,3912.803438
UX Coffee 设计咖,2310.0605
博物志,35208.277884
声东击西,22266.437738
Today, Explained,5860.51907
交差点,7225.522134
Casticle,1732.135438
Software Engineering Daily,61499.555286999996
Exponent,2796.486531
Acquired,64049.255988000004
迟早更新,63871.895404
The Changelog,19002.148572000002
Kubernetes Podcast from Google,16027.376264
36氪·硅谷早知道,34151.097058
Reply All,3923.278178
不可理论,9427.822544999999
Fork It,17688.531515
提前怀旧,6405.19875
Streaming Audio: a Confluent podcast about Apache Kafka,3059.939313
Freakonomics Radio,9228.015816
ChinaEconTalk,4626.729796
99% Invisible,2310.321633
文化土豆 Culture Potato,123420.36582800004
The Daily,174840.69431899997
硬影像,3338.16
一天世界,15297.880814999999
Techmeme Ride Home,115156.18945699993
不太重要,3672.171438
Data Engineering Podcast,57924.67591800001
所建所闻,2924.674756
疯投圈,27143.681375
忽左忽右,34029.455035
Byte.Coffee,7958.126128
Recode Decode,23730.964217999997
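
Some podcast names contain commas (for example "Today, Explained"), so splitting a row on the first comma would cut the name short. Below is a rough sketch of reading rows like the ones above by splitting on the last comma and printing totals as hours and minutes. It assumes the totals are seconds, which is what an HTML audio element's `duration` reports; `ReportTotals`, `parseLine`, and `formatHours` are hypothetical helper names.

```scala
import scala.util.Try

object ReportTotals {
  // Split "name,seconds" on the LAST comma, since names may contain commas.
  def parseLine(line: String): Option[(String, Double)] = {
    val i = line.lastIndexOf(',')
    if (i < 0) None
    else Try(line.substring(i + 1).trim.toDouble).toOption.map(s => (line.substring(0, i), s))
  }

  // Round to the nearest minute and format as hours and minutes.
  def formatHours(seconds: Double): String = {
    val totalMinutes = math.round(seconds / 60)
    f"${totalMinutes / 60}%dh ${totalMinutes % 60}%02dm"
  }
}

// Two rows copied from the output above.
val rows = Seq("Today, Explained,5860.51907", "The Daily,174840.69431899997")
rows.flatMap(ReportTotals.parseLine).foreach { case (name, seconds) =>
  println(s"$name: ${ReportTotals.formatHours(seconds)}")
}
// Today, Explained: 1h 38m
// The Daily: 48h 34m
```

An alternative would be to quote the podcast name when writing the CSV, but that changes the file format produced by the script.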
