evalparse xiaodaigh

xiaodaigh / gist:46e3edad9c72dd10ae415e08ac2953c3
Created October 20, 2021 12:50
Simple demonstration of writing Parquet to S3
# To run this, first set up the MinIO server.
# On Windows, download https://dl.min.io/server/minio/release/windows-amd64/minio.exe
# e.g. from Julia: download("https://dl.min.io/server/minio/release/windows-amd64/minio.exe")
# Make sure the executable is on the PATH, then start the server
# (the leading ; switches the Julia REPL to shell mode):
# ;minio.exe server /path/to/minio/data
# e.g. minio.exe server c:/minio-data/
using Minio, Parquet, Parquet2, DataFrames, AWSS3
xiaodaigh / 0_get_data.jl
Created June 4, 2021 12:50
Tang Dynasty poetry
using Gumbo, Cascadia, HTTP
using Serialization
urls = ["https://www.shicimingju.com/shicimark/tangshisanbaishou.html"]
urls = vcat(urls, ["https://www.shicimingju.com/shicimark/tangshisanbaishou_$(i)_0__0.html" for i in 2:16])
function get_chars(poem::Vector{<:AbstractString})::Set{Char}
    mapreduce(Set, union, poem)
end
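Here is a quick check of `get_chars` on a short poem, using the function exactly as defined above. The input is Li Bai's 静夜思 (Quiet Night Thoughts). It is chosen here only as sample text and is not taken from the scraped pages.

```julia
# Same definition as in the gist: union of the character sets of all lines.
function get_chars(poem::Vector{<:AbstractString})::Set{Char}
    mapreduce(Set, union, poem)
end

# Sample input (not from the scraped pages): 静夜思 by Li Bai, one line per element.
poem = ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"]
chars = get_chars(poem)
println(length(chars))  # prints 17: 20 characters in total, and 明, 月 and 头 repeat
```

Called without `init`, `mapreduce` throws on an empty vector. To handle a poem with no lines, one option is to pass `init = Set{Char}()`.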
xiaodaigh / nongshimcup.jl
Last active February 21, 2021 12:07
Nongshim Cup simulation
using Revise
includet("utils.jl")
function remove_player!(team)
    if length(team) == 1
        team.players = []
    else
        team.players = team.players[2:end]
    end
    team
end
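`remove_player!` drops the player at the front of a team, as in a win-and-continue format. Below is a minimal sketch of such a match, simplified to two teams. Several parts are assumptions rather than the gist's own code: the `SimPlayer`/`SimTeam` types, the Elo win probability, and `play_win_and_continue!` are all hypothetical (the gist's real types live in `utils.jl`).

```julia
using Random

# Hypothetical types for this sketch; the gist's own Team type lives in utils.jl.
struct SimPlayer
    name::String
    rating::Float64
end

mutable struct SimTeam
    name::String
    players::Vector{SimPlayer}  # players[1] is the player currently at the board
end

# Assumed Elo model (not taken from the gist): chance that `a` beats `b` in one game.
win_prob(a::SimPlayer, b::SimPlayer) = 1 / (1 + 10^((b.rating - a.rating) / 400))

# Two-team win-and-continue: the loser of each game is eliminated and the
# winner stays on; returns the team that still has players at the end.
function play_win_and_continue!(t1::SimTeam, t2::SimTeam, rng::AbstractRNG = Random.default_rng())
    while !isempty(t1.players) && !isempty(t2.players)
        p1, p2 = t1.players[1], t2.players[1]
        loser = rand(rng) < win_prob(p1, p2) ? t2 : t1
        popfirst!(loser.players)  # drops the front player, like remove_player!
    end
    return isempty(t1.players) ? t2 : t1
end

korea = SimTeam("Korea", [SimPlayer("K$i", 2700.0) for i in 1:5])
china = SimTeam("China", [SimPlayer("C$i", 2700.0) for i in 1:5])
winner = play_win_and_continue!(korea, china, MersenneTwister(42))
println(winner.name, " wins with ", length(winner.players), " player(s) left")
```

Running many seeded matches and counting winners would turn this into a Monte Carlo estimate of each team's chances.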
xiaodaigh / gist:0326a03d97a0b7ded0aa88d4aeeef812
Last active January 17, 2021 23:29
2021 Chunlan Cup simulation from the quarterfinals
struct Player
    name::String
    rating::Int
end
struct Match
    best_of::Int
end
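The `Player` and `Match` structs suggest a rating-based simulation. Below is a minimal sketch of the core calculation. It assumes an Elo-style per-game probability and independent games; neither is confirmed by the gist, and `game_win_prob` and `match_win_prob` are hypothetical names. Winning a best-of-n `Match` means reaching k = (n + 1) ÷ 2 wins before the opponent does.

```julia
struct Player
    name::String
    rating::Int
end

struct Match
    best_of::Int  # assumed odd: 1, 3, 5, ...
end

# Assumed Elo model: probability that `a` wins a single game against `b`.
game_win_prob(a::Player, b::Player) = 1 / (1 + 10^((b.rating - a.rating) / 400))

# P(win match) = sum over j = 0..k-1 of binomial(k-1+j, j) * p^k * (1-p)^j,
# where j is the number of games lost before the k-th win.
function match_win_prob(p::Real, m::Match)
    k = (m.best_of + 1) ÷ 2
    return sum(binomial(k - 1 + j, j) * p^k * (1 - p)^j for j in 0:k-1)
end

a = Player("A", 3600)  # hypothetical players and ratings
b = Player("B", 3500)
p = game_win_prob(a, b)               # ≈ 0.64
println(match_win_prob(p, Match(3)))  # ≈ 0.70
```

When p > 0.5, a best-of-three favors the stronger player more than a single game does. That is why the match length matters in a bracket simulation.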
xiaodaigh / 0_code.jl
Last active August 30, 2020 14:29
Fast implementation of `nunique` for a SORTED vector
x = rand(1:1_000_000, 1_000_000_000)
using SortingLab
fsort!(x)
function unroll_loop(x)
    count = 0
    @inbounds count += x[1] != x[2]
    @inbounds count += x[2] != x[3]
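The gist counts positions where neighboring values differ, with the loop unrolled by hand. A plain-loop reference is useful for checking an unrolled version like that. In a sorted vector, each boundary `x[i] != x[i+1]` starts a new run of equal values. The number of distinct values is therefore 1 plus the number of boundaries, or 0 for an empty vector. `nunique_sorted` is a hypothetical name, not from the gist.

```julia
# Distinct values in an already sorted vector: 1 + number of run boundaries.
function nunique_sorted(x::AbstractVector)
    isempty(x) && return 0
    n = 1
    @inbounds for i in firstindex(x):lastindex(x)-1
        n += x[i] != x[i+1]  # Bool adds as 0 or 1
    end
    return n
end

x = sort(rand(1:1_000, 100_000))
println(nunique_sorted(x) == length(unique(x)))  # true
```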
xiaodaigh / code.jl
Created August 30, 2020 05:57
df1[B] = df2[B] where df1 and df2 are DataFrames and B is a Boolean array
using DataFrames
df1 = DataFrame(a = repeat([1], 100), b = "a")
df2 = DataFrame(a = repeat([2], 100), b = "b")
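B = rand(Bool, 100, 2) # random mask; Array{Bool, 2}(undef, 100, 2) would hold arbitrary values
df1[B] # doesn't work: a DataFrame can't be indexed with a single Boolean matrix
# So let's overload getindex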
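A data frame is a set of separately typed columns, here an `Int` column and a `String` column. A Boolean matrix with one column per data frame column can therefore be applied column by column. Below is a minimal sketch using plain vectors in place of a DataFrame so that it needs no packages; `masked_assign!` is a hypothetical helper. With DataFrames.jl, the loop body would presumably become `df1[mask, j] = df2[mask, j]`, but that variant is an assumption here.

```julia
# Hypothetical helper: copy src into dst wherever the mask B is true,
# treating each column separately so each keeps its own element type.
function masked_assign!(dst::AbstractVector, src::AbstractVector, B::AbstractMatrix{Bool})
    size(B, 2) == length(dst) == length(src) ||
        throw(DimensionMismatch("B needs one column per data column"))
    for j in eachindex(dst, src)
        mask = view(B, :, j)
        dst[j][mask] = src[j][mask]  # logical indexing on a single column
    end
    return dst
end

a1, b1 = fill(1, 5), fill("a", 5)
a2, b2 = fill(2, 5), fill("b", 5)
B = Bool[1 0; 0 1; 1 1; 0 0; 1 0]
masked_assign!([a1, b1], [a2, b2], B)
println(a1)  # [2, 1, 2, 1, 2]
println(b1)  # ["a", "b", "b", "a", "a"]
```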
xiaodaigh / Dockerfile
Created December 30, 2019 04:14
Dockerfile for a minimal R and Python Docker image with Arrow
FROM python:3.7-alpine3.10
RUN apk add --no-cache \
    build-base \
    cmake \
    bash \
    boost-dev \
    autoconf \
    zlib-dev \
    flex \
xiaodaigh / install-1.3-rc2.bash
Last active July 9, 2021 12:45
Install Julia on Android
apt-get update && apt-get upgrade -y
apt-get install wget -y
apt-get install proot -y
apt-get install git -y
cd ~
git clone https://github.com/MFDGaming/ubuntu-in-termux.git
cd ubuntu-in-termux
chmod +x ubuntu.sh
./ubuntu.sh
cp ~/ubuntu-in-termux/resolv.conf ~/ubuntu-in-termux/ubuntu-fs/etc/
xiaodaigh / data.table_vs_disk.frame.r
Created September 22, 2019 01:28
Benchmarking data.table vs disk.frame
library(data.table)
library(disk.frame)
setup_disk.frame()
bench_disk.frame_data.table_group_by <- function(data1,n) {
  setDT(data1)
  a.sharded.df = as.disk.frame(data1, shardby = c("year", "month", "day"))
  a.not_sharded.df = as.disk.frame(data1)
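The benchmark compares sharded and unsharded layouts. One reason this matters is that disk.frame's `shardby` puts rows with equal `shardby` values into the same chunk. A group-by on those columns can then be finished chunk by chunk. The Julia sketch below imitates this with hash partitioning over plain vectors. It is a simplified model, not disk.frame's implementation, and every name in it is hypothetical.

```julia
# Sketch with plain vectors; all names are hypothetical, not disk.frame internals.
ks_all = [(2019, 1, d % 3 + 1) for d in 1:30]  # (year, month, day) keys, 3 distinct groups
vals = collect(1.0:30.0)
nshards = 4

# Group-by sum over one chunk of rows.
function group_sum(ks, vs)
    acc = Dict{eltype(ks), Float64}()
    for (k, v) in zip(ks, vs)
        acc[k] = get(acc, k, 0.0) + v
    end
    return acc
end

# Key-based sharding: rows with equal keys always land in the same shard.
shard_of(k) = Int(hash(k) % nshards) + 1
shards = [findall(k -> shard_of(k) == s, ks_all) for s in 1:nshards]
partials = [group_sum(ks_all[idx], vals[idx]) for idx in shards]
# Each key occurs in exactly one partial result, so a plain union is enough.
merged = merge(partials...)
println(merged == group_sum(ks_all, vals))  # true

# Row-order chunks (no shardby): a group can span chunks, so partial
# sums for the same key have to be added together.
chunks = Iterators.partition(eachindex(vals), 8)
chunk_partials = [group_sum(ks_all[c], vals[c]) for c in chunks]
println(mergewith(+, chunk_partials...) == group_sum(ks_all, vals))  # true
```

For the row-order chunks, a plain `merge` would keep only one partial sum per key. The extra combine step is the cost that key-based sharding avoids.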
xiaodaigh / benchmarks.jl
Created September 17, 2019 13:21
Benchmark of R vs Julia on-disk data frame formats
using CSV, Feather
#using JLD2
#using JLD#, JLSO
using JDF, FileIO, Blosc, StatsPlots, RCall
using DataFrames, WeakRefStrings # required for JLD2, JDF
Blosc.set_num_threads(6)
function gen_benchmark(dirpath, largest_file, outpath, data_label; delim = ',', header = true)
    if !isdir(outpath)
        mkpath(outpath)
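Below is a stripped-down sketch of the write/read timings such a benchmark collects. It swaps in the `Serialization` standard library for the CSV, Feather and JDF writers used in the gist, so it runs without extra packages. `time_roundtrip` is a hypothetical name. The first call includes compilation time, so a warm-up run is discarded.

```julia
using Serialization

# Hypothetical helper: time one write and one read of `data`, report file size.
function time_roundtrip(data; path = tempname())
    write_time = @elapsed serialize(path, data)
    restored = Ref{Any}()
    read_time = @elapsed restored[] = deserialize(path)
    size_bytes = filesize(path)
    rm(path)
    return (write_time = write_time, read_time = read_time,
            size_bytes = size_bytes, ok = restored[] == data)
end

cols = (a = rand(10^5), b = rand(1:10, 10^5))
time_roundtrip(cols)      # warm-up run, includes compilation
r = time_roundtrip(cols)
println(r)
```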