@endolith
endolith / gcd_and_lcm.py
Last active June 22, 2022 23:33
GCD and LCM functions in Python for several numbers
# Greatest common divisor of 1 or more numbers.
from functools import reduce


def gcd(*numbers):
    """
    Return the greatest common divisor of 1 or more integers

    Examples
    --------
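
The preview above is cut off before the function body. As a rough sketch of what variadic GCD and LCM helpers might look like (this is an assumption, not the gist's actual code), the two-argument `math.gcd` from the standard library can be folded over the arguments with `functools.reduce`. The helper name `_lcm2` and the choice to raise `ValueError` on empty input are illustrative only.

```python
from functools import reduce
from math import gcd as _gcd2  # two-argument GCD from the standard library


def gcd(*numbers):
    """Greatest common divisor of one or more integers (illustrative sketch)."""
    if not numbers:
        raise ValueError("gcd() needs at least one integer")
    # Starting from 0 works because gcd(0, x) == |x|.
    return reduce(_gcd2, numbers, 0)


def _lcm2(a, b):
    """Hypothetical helper: least common multiple of two integers."""
    if a == 0 or b == 0:
        return 0  # convention: the LCM is 0 when either argument is 0
    return abs(a * b) // _gcd2(a, b)


def lcm(*numbers):
    """Least common multiple of one or more integers (illustrative sketch)."""
    if not numbers:
        raise ValueError("lcm() needs at least one integer")
    # Starting from 1 works because lcm(1, x) == |x|.
    return reduce(_lcm2, numbers, 1)
```

Under these assumptions, `gcd(12, 18, 24)` folds to `gcd(gcd(12, 18), 24)`, and `lcm(4, 6, 10)` folds the same way through `_lcm2`.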
@telendt
telendt / gist:1972797
Created March 4, 2012 12:31
multiprocess zgrep
$ time find /opt/local/share/emacs/23.4/lisp/ -type f -name \*.gz -exec zgrep --color=yes -H -n -e "conf-mode" {} \;
real 0m12.239s
user 0m7.327s
sys 0m6.804s
$ time find /opt/local/share/emacs/23.4/lisp/ -type f -name \*.gz -exec zgrep --color=yes -H -n -e "conf-mode" {} +
real 0m8.574s
user 0m4.950s
sys 0m5.995s
@mbedward
mbedward / gist:6e3dbb232bafec0792ba
Last active September 26, 2021 14:08
Scala macro to convert between a case class instance and a Map of constructor parameters. Developed by Jonathan Chow (see http://blog.echo.sh/post/65955606729/exploring-scala-macros-map-to-case-class-conversion for description and usage). This version simply updates Jonathan's code to Scala 2.11.2
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}

object Mappable {
@jackgolding
jackgolding / python
Created January 27, 2015 09:04
[APSCHEDULER] cron jobs administrating a scheduled job
from apscheduler.jobstores.base import JobLookupError
from apscheduler.schedulers.background import BackgroundScheduler
import time


def hello():
    print(time.localtime().tm_sec)


def kill_hello(scheduler):
@rxin
rxin / df.py
Last active January 26, 2017 00:44
DataFrame simple aggregation performance benchmark
data = sqlContext.load("/home/rxin/ints.parquet")
data.groupBy("a").agg(col("a"), avg("num")).collect()
@wenzhixin
wenzhixin / ubuntu14.04-command-line-install-android-sdk
Last active July 4, 2024 05:29
Ubuntu 14.04 command line install android sdk
# install openjdk
sudo apt-get install openjdk-7-jdk
# download android sdk
wget http://dl.google.com/android/android-sdk_r24.2-linux.tgz
tar -xvf android-sdk_r24.2-linux.tgz
cd android-sdk-linux/tools
# install all sdk packages
@ezhulenev
ezhulenev / spark-thred-safe.scala
Created August 11, 2015 22:16
Thread-safe Spark Sql Context
object ServerSparkContext {
  private[this] lazy val _sqlContext = {
    val conf = new SparkConf()
      .setAppName("....")
    val sc = new SparkContext(conf)
    // TODO: Bug in Spark: http://stackoverflow.com/questions/30323212
    val ctx = new HiveContext(sc)
    ctx.setConf("spark.sql.hive.convertMetastoreParquet", "false")

import org.apache.spark.sql.types.SQLUserDefinedType

@SQLUserDefinedType(udt = classOf[ElementWithCountUDT])
case class ElementWithCount(element: String, count: Int) extends Serializable {
  override def toString: String = {
    Seq(
      element,
      count
@joshlk
joshlk / faster_toPandas.py
Last active July 22, 2024 14:15
PySpark faster toPandas using mapPartitions
import pandas as pd


def _map_to_pandas(rdds):
    """ Needs to be here due to pickling issues """
    return [pd.DataFrame(list(rdds))]


def toPandas(df, n_partitions=None):
    """
    Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
    repartitioned if `n_partitions` is passed.
@ole-boss
ole-boss / Big Query Google Analytics sessions export including site speed metrics.sql
Last active June 13, 2020 14:41
This script helps you create sessions from hit-level Google Analytics raw data in Google BigQuery. The export also includes some site speed metrics based on Google's Latency Tracking KPIs.
SELECT
  first_sessions.sid AS sessionId,
  visitorId,
  first_transactions.transactionId AS transactionId,
  timestamp,
  deviceCategory,
  landingPage,
  pageviews,
  timeOnSite,
  channel,