Wu Yongfeng kymtwyf

@oshai
oshai / scala-to-kotlin.kts
Last active February 9, 2023 10:45
Stupid and simple script to convert a Scala file (already renamed to a .kt extension) to Kotlin
#!/usr/bin/env kscript
import java.io.File
// usage: one argument, either a .kt file (a Scala file that was only renamed)
// or a directory
try {
main(args)
} catch (e: Exception) {
e.printStackTrace()
@sanchezzzhak
sanchezzzhak / clickhouse-get-tables-size.sql
Created January 18, 2018 13:43
clickhouse get tables size
SELECT table,
formatReadableSize(sum(bytes)) as size,
min(min_date) as min_date,
max(max_date) as max_date
FROM system.parts
WHERE active
GROUP BY table
@rainsunny
rainsunny / spark_withColumns.md
Created December 4, 2017 12:17
Spark/Scala repeated calls to withColumn() using the same function on multiple columns [foldLeft]

Suppose you need to apply the same function to multiple columns in one DataFrame. One straightforward way is this:

val newDF = oldDF.withColumn("colA", func("colA")).withColumn("colB", func("colB")).withColumn("colC", func("colC"))

If you want to save some typing, you can try this:

  1. Use select with varargs including *:
import spark.implicits._
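
The fold idea named in the title can be sketched without Spark. The following is a minimal illustration in plain Python, assuming a table is just a dict mapping column names to lists; `with_column` and `apply_to_columns` are hypothetical helpers written for this sketch and are not part of the Spark API. `functools.reduce` plays the role that `foldLeft` plays in Scala: the table is the accumulator, and each column name produces an updated copy.

```python
from functools import reduce


def with_column(table, name, values):
    """Return a copy of `table` with column `name` set to `values` (hypothetical helper)."""
    updated = dict(table)
    updated[name] = values
    return updated


def apply_to_columns(table, columns, func):
    """Fold over `columns`, applying `func` to every value of each named column."""
    return reduce(
        lambda acc, col: with_column(acc, col, [func(v) for v in acc[col]]),
        columns,
        table,  # initial accumulator, like the seed passed to foldLeft
    )


old_table = {"colA": [1, 2], "colB": [3, 4], "colC": [5, 6], "other": [7, 8]}
new_table = apply_to_columns(old_table, ["colA", "colB", "colC"], lambda v: v * 10)
print(new_table)
```

Columns not listed are carried through unchanged, and the original table is left untouched, which mirrors how each `withColumn` call returns a new DataFrame.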
@rbq
rbq / docker.yaml
Last active October 19, 2023 11:57
Install Docker CE on Ubuntu using Ansible
---
- hosts: all
tasks:
- name: Install prerequisites for Docker repository
apt:
name: ['apt-transport-https', 'ca-certificates', 'curl', 'gnupg2', 'software-properties-common']
update_cache: yes
- name: Add Docker GPG key
apt_key:
@crocker
crocker / spark-duplicates.scala
Last active July 2, 2020 12:15
Find duplicates in a Spark DataFrame
val transactions = spark.read
.option("header", "true")
.option("inferSchema", "true")
.json("s3n://bucket-name/transaction.json")
transactions.groupBy("id", "organization").count.sort($"count".desc).show

How to set up an AWS Lambda function to talk to the internet and a VPC

I'm going to walk you through the steps for setting up an AWS Lambda function to talk to both the internet and a VPC. Let's dive in.

It might be unintuitive at first, but a Lambda function can be in one of three networking states.

  1. No VPC: the function can talk openly to the web, but can't reach the resources that live inside your VPC.
  2. VPC: once attached to a VPC, by default the function can reach the resources inside the VPC but can't talk to the web.
  3. VPC with NAT: the best of both worlds, VPC resources and the web.
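
The three states can be summarized as a small lookup. This is only a sketch of the list above, not an AWS API: the function name `lambda_connectivity` and its two flags are made up for illustration, and "private resources" stands for whatever is reachable only from inside the VPC.

```python
def lambda_connectivity(in_vpc: bool, has_nat: bool) -> dict:
    """Summarize what a Lambda function can reach in each of the three states.

    Hypothetical helper for illustration only; it models the list above.
    """
    if has_nat and not in_vpc:
        # A NAT only matters for a function that is attached to a VPC.
        raise ValueError("has_nat requires in_vpc")
    return {
        # Outside a VPC the function has direct internet access;
        # inside a VPC it needs a NAT to get out.
        "internet": (not in_vpc) or has_nat,
        # Resources inside the VPC are reachable only when attached to it.
        "private_resources": in_vpc,
    }


for state in [(False, False), (True, False), (True, True)]:
    print(state, lambda_connectivity(*state))
```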
@sryze
sryze / proxy
Last active September 3, 2023 10:20
Quickly toggle HTTP(S) proxy on Mac OS X from command line
#!/bin/sh
SERVICE="Ethernet" # or "Wi-Fi"
PROXY_HOST="127.0.0.1"
PROXY_PORT="8888"
while [ "$#" -gt 0 ]
do
case "$1" in
on)
# Reference: https://www.cloudgear.net/blog/2015/5-minutes-kubernetes-setup/
# install homebrew and cask
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# install virtualbox
brew cask install virtualbox
# install dockertoolbox
@ourmaninamsterdam
ourmaninamsterdam / LICENSE
Last active April 24, 2024 18:56
Arrayzing - The JavaScript array cheatsheet
The MIT License (MIT)
Copyright (c) 2015 Justin Perry
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

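Some Thoughts on "[Original] An In-Depth Analysis of Twitter Heron"

I had the pleasure of reading the article "[Original] An In-Depth Analysis of Twitter Heron" (《【原创】深度分析Twitter Heron》, http://www.longda.us/?p=529 ), and I am very touched by the attention the Chinese community is paying to Heron. However, the article contains a number of important points that are open to question. I am writing to point them out here, in the hope of helping everyone understand Heron better. If you repost this, please credit the source: https://gist.github.com/maosongfu/c3aeb1bb5eb7b39fcdc5

I am Maosong Fu (符茂松). I currently work at Twitter and am one of the authors of Heron. This is a deep field and I have only just found my way in, so I hope to exchange ideas with all of you.

Weibo: 符茂松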

Twitter: Louis_Fumaosong