@ole-boss
ole-boss / Big Query Google Analytics sessions export including site speed metrics.sql
Last active June 13, 2020 14:41
This script helps you create sessions from hit-level Google Analytics raw data in Google BigQuery. The export also includes site speed metrics based on Google's Latency Tracking KPIs.
SELECT
  first_sessions.sid AS sessionId,
  visitorId,
  first_transactions.transactionId AS transactionId,
  timestamp,
  deviceCategory,
  landingPage,
  pageviews,
  timeOnSite,
  channel,
@joshlk
joshlk / faster_toPandas.py
Last active May 15, 2023 13:48
PySpark faster toPandas using mapPartitions
import pandas as pd


def _map_to_pandas(rdds):
    """ Needs to be here due to pickling issues """
    return [pd.DataFrame(list(rdds))]


def toPandas(df, n_partitions=None):
    """
    Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
    repartitioned if `n_partitions` is passed.
import org.apache.spark.sql.types.SQLUserDefinedType

@SQLUserDefinedType(udt = classOf[ElementWithCountUDT])
case class ElementWithCount(element:String, count:Int) extends Serializable {
  override def toString: String = {
    Seq(
      element,
      count
@ezhulenev
ezhulenev / spark-thred-safe.scala
Created August 11, 2015 22:16
Thread-safe Spark Sql Context
object ServerSparkContext {
  private[this] lazy val _sqlContext = {
    val conf = new SparkConf()
      .setAppName("....")
    val sc = new SparkContext(conf)
    // TODO: Bug in Spark: http://stackoverflow.com/questions/30323212
    val ctx = new HiveContext(sc)
    ctx.setConf("spark.sql.hive.convertMetastoreParquet", "false")
@wenzhixin
wenzhixin / ubuntu14.04-command-line-install-android-sdk
Last active January 16, 2024 21:15
Ubuntu 14.04 command line install android sdk
# install openjdk
sudo apt-get install openjdk-7-jdk
# download android sdk
wget http://dl.google.com/android/android-sdk_r24.2-linux.tgz
tar -xvf android-sdk_r24.2-linux.tgz
cd android-sdk-linux/tools
# install all sdk packages
@rxin
rxin / df.py
Last active January 26, 2017 00:44
DataFrame simple aggregation performance benchmark
data = sqlContext.load("/home/rxin/ints.parquet")
data.groupBy("a").agg(col("a"), avg("num")).collect()
@jackgolding
jackgolding / python
Created January 27, 2015 09:04
[APSCHEDULER] cron jobs administrating a scheduled job
from apscheduler.jobstores.base import JobLookupError
from apscheduler.schedulers.background import BackgroundScheduler
import time


def hello():
    print(time.localtime().tm_sec)


def kill_hello(scheduler):
@mbedward
mbedward / gist:6e3dbb232bafec0792ba
Last active September 26, 2021 14:08
Scala macro to convert between a case class instance and a Map of constructor parameters. Developed by Jonathan Chow (see http://blog.echo.sh/post/65955606729/exploring-scala-macros-map-to-case-class-conversion for description and usage). This version simply updates Jonathan's code to Scala 2.11.2
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}

object Mappable {
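
As a rough illustration of the round trip that the `Mappable` trait describes (instance to map of constructor parameters, and back), here is a minimal Python sketch. It is not the macro from the gist: it uses runtime inspection of dataclass fields instead of compile-time code generation, and the helper names `to_map` and `from_map` and the `Person` type are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, fields, is_dataclass


def to_map(instance):
    """Map each field name of a dataclass instance to its value (shallow)."""
    if not is_dataclass(instance) or isinstance(instance, type):
        raise TypeError("to_map() expects a dataclass instance")
    return {f.name: getattr(instance, f.name) for f in fields(instance)}


def from_map(cls, mapping):
    """Build an instance of dataclass `cls` from a dict keyed by field name."""
    # Only fields that are constructor parameters are read from the mapping.
    names = [f.name for f in fields(cls) if f.init]
    return cls(**{name: mapping[name] for name in names})


@dataclass
class Person:  # hypothetical example type, not part of the gist
    name: str
    age: int
```

Calling `to_map(Person("Ada", 36))` gives a plain dict, and passing that dict to `from_map(Person, ...)` rebuilds an equal instance. Unlike the Scala macro, a missing key is only detected at run time, as a `KeyError`.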
@telendt
telendt / gist:1972797
Created March 4, 2012 12:31
multiprocess zgrep
$ time find /opt/local/share/emacs/23.4/lisp/ -type f -name \*.gz -exec zgrep --color=yes -H -n -e "conf-mode" {} \;
real 0m12.239s
user 0m7.327s
sys 0m6.804s
$ time find /opt/local/share/emacs/23.4/lisp/ -type f -name \*.gz -exec zgrep --color=yes -H -n -e "conf-mode" {} +
real 0m8.574s
user 0m4.950s
sys 0m5.995s
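
The timings above compare two single-process `find -exec` invocations. As a hedged sketch of what a multi-process search over the same kind of `.gz` files could look like in Python, the block below uses `multiprocessing.Pool` to search one file per task. It is an assumption-laden illustration rather than the gist's own shell approach; the function names `grep_gz` and `parallel_zgrep` are hypothetical.

```python
import gzip
import re
from multiprocessing import Pool


def grep_gz(path, pattern):
    """Return (path, line_number, line) for each line of a gzip file matching pattern."""
    regex = re.compile(pattern)
    matches = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if regex.search(line):
                matches.append((path, line_number, line.rstrip("\n")))
    return matches


def parallel_zgrep(paths, pattern, processes=None):
    """Search several gzip files in worker processes, one file per task."""
    with Pool(processes=processes) as pool:
        per_file = pool.starmap(grep_gz, [(path, pattern) for path in paths])
    # Flatten the per-file result lists into one list of matches.
    return [match for matches in per_file for match in matches]
```

When run as a script, the call to `parallel_zgrep` should sit under an `if __name__ == "__main__":` guard so that platforms which spawn worker processes do not re-execute it on import. Whether this beats `find ... -exec ... {} +` depends on file sizes and core count; no timing is claimed here.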
@endolith
endolith / gcd_and_lcm.py
Last active June 22, 2022 23:33
GCD and LCM functions in Python for several numbers
# Greatest common divisor of 1 or more numbers.
from functools import reduce


def gcd(*numbers):
    """
    Return the greatest common divisor of 1 or more integers

    Examples
    --------
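
The preview above ends before the function body. As a minimal sketch of how GCD and LCM over several integers can be built by folding a two-argument operation with `reduce`, and not necessarily how the gist does it, the block below relies on the standard library's `math.gcd`. The names `gcd_many` and `lcm_many` are hypothetical.

```python
from functools import reduce
from math import gcd as gcd_pair


def gcd_many(*numbers):
    """Return the greatest common divisor of one or more integers."""
    if not numbers:
        raise ValueError("gcd_many() needs at least one integer")
    # Starting from 0 works because gcd(0, x) == abs(x).
    return reduce(gcd_pair, numbers, 0)


def lcm_many(*numbers):
    """Return the least common multiple of one or more integers."""
    if not numbers:
        raise ValueError("lcm_many() needs at least one integer")

    def lcm_pair(a, b):
        # By convention the LCM is 0 as soon as any argument is 0.
        if a == 0 or b == 0:
            return 0
        return abs(a * b) // gcd_pair(a, b)

    # Starting from 1 works because lcm(1, x) == abs(x).
    return reduce(lcm_pair, numbers, 1)
```

For instance, `gcd_many(12, 18, 24)` evaluates to 6 and `lcm_many(4, 6, 10)` to 60. Both functions return non-negative results even for negative inputs.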