Skip to content

Instantly share code, notes, and snippets.

@FeepingCreature
FeepingCreature / youre_doing_json_apis_wrong.md
Last active January 7, 2024 19:33
You Are Doing JSON APIs Wrong

You are doing JSON APIs wrong.

When you use JSON to call an API - not a REST API, but something like JSON-RPC - you will usually want to encode one of several possible messages.

Your request body looks like this:

{
 "type": "MessageWithA",
@luistung
luistung / tokenization.cpp
Created October 11, 2019 12:02
c++ version of bert tokenize
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include <utf8proc.h>
//https://unicode.org/reports/tr15/#Norm_Forms
//https://ssl.icu-project.org/apiref/icu4c/uchar_8h.html
@stefan-it
stefan-it / run_ner.py
Last active April 2, 2022 13:28
NER fine-tuning with PyTorch-Transformers (heavily based on https://github.com/kamalkraj/BERT-NER)
from __future__ import absolute_import, division, print_function
import argparse
import glob
import logging
import os
import random
import numpy as np
import torch
@JoeyBurzynski
JoeyBurzynski / 55-bytes-of-css.md
Last active July 20, 2024 05:29
58 bytes of css to look great nearly everywhere

58 bytes of CSS to look great nearly everywhere

When making this website, i wanted a simple, reasonable way to make it look good on most displays. Not counting any minimization techniques, the following 58 bytes worked well for me:

main {
  max-width: 38rem;
  padding: 2rem;
  margin: auto;
}
@fnky
fnky / ANSI.md
Last active July 27, 2024 22:42
ANSI Escape Codes

ANSI Escape Sequences

Standard escape codes are prefixed with Escape:

  • Ctrl-Key: ^[
  • Octal: \033
  • Unicode: \u001b
  • Hexadecimal: \x1B
  • Decimal: 27
@suxue
suxue / 1.cpp
Created October 22, 2018 11:12
multi_index
#include <boost/flyweight.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <string>
#include <cstdint>
#include <vector>
#include <iostream>
#include <tuple>
typedef std::tuple<short, std::uint8_t, std::uint8_t> Date;
@robertdale
robertdale / describe.groovy
Last active September 27, 2019 07:10
JanusGraph Schema Describe Command
// This can be imported via ./bin/gremlin.sh -i describe.groovy
// A variable 'graph' must be defined with a JanusGraph graph
// Run it as a plugin command ':schema'
// :schema describe
//
import org.janusgraph.graphdb.database.management.MgmtLogType
import org.codehaus.groovy.tools.shell.Groovysh
import org.codehaus.groovy.tools.shell.CommandSupport
  • What do Etcd, Consul, and Zookeeper do?
    • Service Registration:
      • Host, port number, and sometimes authentication credentials, protocols, versions numbers, and/or environment details.
    • Service Discovery:
      • Ability for client application to query the central registry to learn of service location.
    • Consistent and durable general-purpose K/V store across distributed system.
      • Some solutions support this better than others.
      • Based on Paxos or some derivative (i.e. Raft) algorithm to quickly converge to a consistent state.
  • Centralized locking can be based on this K/V store.
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active June 28, 2024 12:37
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (i.e. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
@msmfsd
msmfsd / es7-async-await.js
Last active February 4, 2024 17:38
Javascript fetch JSON with ES7 Async Await
// Async/Await requirements: Latest Chrome/FF browser or Babel: https://babeljs.io/docs/plugins/transform-async-to-generator/
// Fetch requirements: Latest Chrome/FF browser or Github fetch polyfill: https://github.com/github/fetch
// async function
async function fetchAsync () {
// await response of fetch call
let response = await fetch('https://api.github.com');
// only proceed once promise is resolved
let data = await response.json();
// only proceed once second promise is resolved