Duy Do duydo
@duydo
duydo / ByteTokenizer.java
Last active June 16, 2023 22:21
The byte tokenizer class allows an application to break a byte array into tokens.
/**
* @(#)ByteTokenizer.java Sep 23, 2008
* Copyright (C) 2008 Duy Do. All Rights Reserved.
*/
package com.duydo.util;
import java.util.Enumeration;
import java.util.NoSuchElementException;
/**
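The Java preview above is cut off before the class body, so the following is only a minimal Python sketch of the idea in the description (breaking a byte array into tokens). The class name mirrors the gist, but the method names, the default delimiters, and the choice to skip empty tokens are assumptions modeled on `java.util.StringTokenizer`, not the author's implementation.

```python
class ByteTokenizer:
    """Iterate over the tokens of a bytes object, split on delimiter bytes.

    Hypothetical sketch: delimiters are single bytes and empty tokens are skipped.
    """

    def __init__(self, data: bytes, delimiters: bytes = b" \t\r\n"):
        self._data = data
        self._delims = delimiters
        self._pos = 0

    def _skip_delimiters(self) -> None:
        # Advance past any run of delimiter bytes.
        while self._pos < len(self._data) and self._data[self._pos] in self._delims:
            self._pos += 1

    def has_more_tokens(self) -> bool:
        self._skip_delimiters()
        return self._pos < len(self._data)

    def next_token(self) -> bytes:
        if not self.has_more_tokens():
            raise StopIteration("no more tokens")
        start = self._pos
        # Consume bytes until the next delimiter or the end of the data.
        while self._pos < len(self._data) and self._data[self._pos] not in self._delims:
            self._pos += 1
        return self._data[start:self._pos]

    def __iter__(self):
        return self

    def __next__(self) -> bytes:
        return self.next_token()
```

Usage: `list(ByteTokenizer(b"GET /index.html HTTP/1.1\r\n"))` yields the three whitespace-separated tokens as `bytes` objects.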
duydo / bot.rb
Created April 27, 2023 04:21 — forked from dideler/bot.rb
Sending a notification message to Telegram using its HTTP API via cURL
# Use this script to test that your Telegram bot works.
#
# Install the dependency
#
# $ gem install telegram_bot
#
# Run the bot
#
# $ ruby bot.rb
#
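The description mentions Telegram's HTTP API. As a rough illustration of what such a notification request looks like, here is a Python sketch that builds (but does not send) a `sendMessage` call. The helper name, token, and chat ID are placeholders invented for this example; the Ruby script in the gist uses the `telegram_bot` gem instead.

```python
import json
import urllib.request


def build_send_message_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Bot API sendMessage method (hypothetical helper)."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; replace with a real bot token and chat ID.
request = build_send_message_request("TOKEN", "12345", "Hello from the bot")
```

Passing `request` to `urllib.request.urlopen` would actually send the message; that step is left out here so the sketch runs without network access.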
duydo / elasticsearch_best_practices.txt
Last active December 15, 2021 06:12
Elasticsearch - Index best practices from Shay Banon
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it substantially by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the Elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
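The guidelines above can be collected into a small settings helper. This is a Python sketch under stated assumptions: the setting names are copied from the list and date from early Elasticsearch releases, so they may not exist in current versions; the function names and the example values (30s, 20%, 20000) are hypothetical, and where each setting is applied (node configuration versus index settings) is not covered here.

```python
def bulk_load_settings(refresh_interval="30s", index_buffer_size="20%", flush_threshold=20000):
    """Settings to relax before a bulk load (example values are assumptions)."""
    return {
        # Refresh less often than the 1 second default.
        "index.engine.robin.refresh_interval": refresh_interval,
        # Larger indexing buffer than the 10%-of-heap default.
        "indices.memory.index_buffer_size": index_buffer_size,
        # More operations before an automatic flush (default 5000).
        "index.translog.flush_threshold": flush_threshold,
        # Load with no replicas, then raise the count afterwards.
        "index.number_of_replicas": 0,
    }


def after_load_settings(replicas=1):
    """Settings to apply through the update settings API once loading is done."""
    return {"index.number_of_replicas": replicas}
```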
duydo / proc_net_tcp_decode
Created August 2, 2019 07:15 — forked from jkstill/proc_net_tcp_decode
decode entries in /proc/net/tcp
Decoding the data in /proc/net/tcp:
Linux 5.x /proc/net/tcp
Linux 6.x /proc/PID/net/tcp
Given a socket:
$ ls -l /proc/24784/fd/11
lrwx------ 1 jkstill dba 64 Dec 4 16:22 /proc/24784/fd/11 -> socket:[15907701]
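For the decoding step itself, the address columns in /proc/net/tcp are hexadecimal `ADDRESS:PORT` pairs. A minimal Python sketch of decoding one such field follows; it assumes an IPv4 entry read on a little-endian host (such as x86), and the function name and sample values are made up for illustration.

```python
import socket
import struct


def decode_address(hex_addr: str) -> tuple:
    """Decode an IPv4 'ADDRESS:PORT' field from /proc/net/tcp (little-endian host assumed)."""
    hex_ip, hex_port = hex_addr.split(":")
    # The address is printed as a native-endian 32-bit number, so repack it
    # as little-endian bytes to recover network order.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    # The port is plain hexadecimal.
    return ip, int(hex_port, 16)
```

For example, `decode_address("0100007F:1F90")` returns `("127.0.0.1", 8080)`.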
duydo / elasticsearch.yml
Created September 28, 2018 05:02 — forked from reyjrar/elasticsearch.yml
ElasticSearch config for a write-heavy cluster
##################################################################
# /etc/elasticsearch/elasticsearch.yml
#
# Base configuration for a write heavy cluster
#
# Cluster / Node Basics
cluster.name: logng
# Nodes can have arbitrary attributes we can use for routing
duydo / gist:9270e6e9ac326184dab5b9b11ecde2e3
Created August 29, 2018 14:00 — forked from SegFaultAX/gist:10507478
Dotted path expansion for Python dictionary keys
import operator
from pprint import pprint
def is_dict(d):
    return isinstance(d, dict)

def get(c, k, default=None):
    try:
        return c[k]
    except (IndexError, KeyError, TypeError):
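The preview above stops before the expansion logic, so here is a separate, minimal Python sketch of dotted path expansion as the title describes it. It is not the gist's code: the function name is hypothetical, and it assumes no key is both a leaf and a prefix of another key (for example `"a"` and `"a.b"` together).

```python
def expand_dotted(flat: dict) -> dict:
    """Turn {"a.b": 1} into {"a": {"b": 1}} (hypothetical helper)."""
    result = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = result
        # Walk or create the intermediate dictionaries.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Assign the value at the final path segment.
        node[parts[-1]] = value
    return result
```

For example, `expand_dotted({"a.b.c": 1, "a.d": 2, "e": 3})` returns `{"a": {"b": {"c": 1}, "d": 2}, "e": 3}`.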
duydo / gist:9587121
Last active March 21, 2018 12:15
Elasticsearch multiple language mapping
{
  "analysis": {
    "filter": {
      "ar_stop_filter": {
        "type": "stop",
        "stopwords": ["_arabic_"]
      },
      "bg_stop_filter": {
        "type": "stop",
        "stopwords": ["_bulgarian_"]
duydo / gist:ac4358ec3bddcaba02cf347369923674
Last active July 26, 2017 07:12
ES mapping example to keep special chars
PUT test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "test_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
duydo / twitter_mapping.sh
Created October 17, 2013 09:52
Preserving special characters when tokenizing Twitter messages with Elasticsearch
curl -XPUT 'http://localhost:9200/twitter' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    },
    "analysis" : {
      "filter" : {
        "tweet_filter" : {
          "type" : "word_delimiter",
duydo / .gitignore
Created December 8, 2016 03:12 — forked from karmi/.gitignore
Example Nginx configurations for Elasticsearch
nginx/
!nginx/.gitkeep
!nginx/logs/.gitkeep
src/
tmp/