robie lee liwh

liwh / avazu_ftrl_concurrent.go
Created June 11, 2020 02:11 — forked from jack281291/avazu_ftrl_concurrent.go
Kaggle Avazu Challenge: FTRL-Proximal with L1 & L2 implemented in Go (Concurrent/Multi-threaded)
// Based on tinrtgu's Python script here:
// https://www.kaggle.com/c/avazu-ctr-prediction/forums/t/10927/beat-the-benchmark-with-less-than-1mb-of-memory
package main

import (
	"encoding/csv"
	"os"
	"strconv"
	"hash/fnv"
	"math"
https://press.one/p/address/v?s=2aa2051c648db6d09b4ce24be0bb00d278e15c6697b5602919f40a060083bb41d01415ab13403565bbc89a06edf25d6c5dd7b5d9ddeca73717ab6e04a3414e501&h=503c45393fbf9ecbefae56f94b5ee6157e1f1fc57cdf4fd6f72be55fe9a834a5&a=702e876f6bb7a019e13e94b2a86f167c9770a0b3&f=P1&v=2
liwh / elasticsearch_best_practices.txt
Created March 21, 2018 03:07 — forked from duydo/elasticsearch_best_practices.txt
Elasticsearch - Index best practices from Shay Banon
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it significantly by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
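The replica tip in the list above can be sketched in Go. This is a minimal illustration, not part of the original notes: it only builds the JSON body that an index settings update would carry, once with 0 replicas before a bulk load and once with the target count afterwards. The helper name replicaSettingsBody is hypothetical, and actually sending the body to the cluster (for example with net/http against the index's _settings endpoint) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replicaSettingsBody is a hypothetical helper (not from the original notes).
// It builds the JSON body for an index settings update that changes only
// the replica count.
func replicaSettingsBody(replicas int) ([]byte, error) {
	if replicas < 0 {
		return nil, fmt.Errorf("replica count must be non-negative, got %d", replicas)
	}
	settings := map[string]interface{}{
		"index": map[string]interface{}{
			"number_of_replicas": replicas,
		},
	}
	return json.Marshal(settings)
}

func main() {
	// Before the bulk load: drop to zero replicas.
	before, err := replicaSettingsBody(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(before))

	// After the bulk load: raise the count to the value you want (1 is an assumption).
	after, err := replicaSettingsBody(1)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(after))
}
```

Keeping the body construction separate from the HTTP call makes it easy to check what will be sent before touching a live cluster.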
[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /Users/p1nox/airflow
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
dags_folder = /Users/p1nox/airflow/dags
# The folder where airflow should store its log files. This location
base_log_folder = /Users/p1nox/airflow/logs
[uwsgi]
socket = /data/app/run/%n.sock
pidfile2 = /data/app/run/%n.pid
logto2 = /data/app/logs/uwsgi.log
logdate = true
log-format = [%(addr)] [%(ctime)] [%(method)] [%(uri)] [%(proto)] [%(status)] [%(msecs)] [%(referer)] [%(uagent)]
memory-report = true
-- certain endpoints are always blocked
if nginx_uri == "/_access_token" or nginx_uri == "/_me" then
    ngx.exit(403)
end
-- import requirements
local cjson = require "cjson"
-- setup some app-level vars
local app_id = "APP_ID"
290a4210ec26f55ddbf5dae952ad0c3c
package
==> Installing dependencies for python: readline, sqlite, gdbm
==> Installing python dependency: readline
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/readline-6.2.4.mavericks.bottle.2.tar.gz
######################################################################## 100.0%
==> Pouring readline-6.2.4.mavericks.bottle.2.tar.gz
==> Caveats
This formula is keg-only, so it was not symlinked into /usr/local.
OS X provides the BSD libedit library, which shadows libreadline.
In order to prevent conflicts when programs look for libreadline we are