Adam Kocoloski kocolosk

@kocolosk
kocolosk / snapshots-for-indexing.py
Created March 2, 2021 01:20
Indexer can't make progress because of serializability conflicts
#!/usr/bin/env python3
import json
import random
import requests
import threading
URL = 'http://127.0.0.1:15984/snapshots-for-indexing'
USER = 'adm'
PASS = 'pass'
kocolosk / hot-partition-reducer.js
Created January 22, 2021 00:40
Find hot partitions in CouchDB
function (keys, values, rereduce) {
var topTenPlusBoundaryKeys = function(partitions) {
// preserve boundary keys because we may not have the correct count for them yet
// not that it matters, but the array is reversed so these labels are correct
var first = partitions.pop();
var last = partitions.shift();
// sort the remaining entries by value
partitions.sort(function(p1, p2) { return p2.count - p1.count; });
[notice] 2019-08-13T17:51:13.420803Z nonode@nohost <0.4242.0> -------- 127.0.0.1 - - GET /db-7d3b1fc07b063c0fece110b7f013b3ca 200
[notice] 2019-08-13T17:51:13.433480Z nonode@nohost <0.4353.0> -------- 127.0.0.1 - - GET /db-b114040ecd03e61e2b576ce1edbf9310 200
[notice] 2019-08-13T17:51:13.435036Z nonode@nohost <0.4347.0> -------- Received {[{<<"source">>,<<"http://127.0.0.1:40837/db-7d3b1fc07b063c0fece110b7f013b3ca">>},{<<"target">>,<<"http://127.0.0.1:40837/db-b114040ecd03e61e2b576ce1edbf9310">>}]} {user_ctx,null,[<<"_admin">>],undefined}
[notice] 2019-08-13T17:51:13.631469Z nonode@nohost <0.4347.0> -------- Checking authorization
[notice] 2019-08-13T17:51:13.631626Z nonode@nohost <0.4347.0> -------- Starting listener
[notice] 2019-08-13T17:51:13.631844Z nonode@nohost <0.4347.0> -------- Starting replication loop
kocolosk / gist:e2f025d6ee13a0642e21e4e722d6df68
Last active July 31, 2019 14:01 — forked from davisp/gist:480ccab2548a79d602d6cc8e1ee16cff
Build CouchDB in a docker in a fresh VM
apt-get update
apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/debian \
kocolosk / map.js
Last active February 12, 2019 16:42
View to retrieve list of top 10 partitions by document count (always includes an extra 2 partitions as an implementation artifact)
function (doc) {
  // Partitioned document IDs look like "partition:docid"; skip IDs without a partition prefix
  var idx = doc._id.indexOf(':');
  if (idx > 0) {
    emit(doc._id.slice(0, idx), 1);
  }
}
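To make the view's output concrete, here is a small Python sketch (not part of the gist) that mimics the map step on a handful of made-up document IDs and then picks the largest partitions by count. The sample IDs and the `partition_of` and `top_partitions` helpers are hypothetical; the real view does its ranking inside CouchDB with a custom reduce function.

```python
from collections import Counter


def partition_of(doc_id):
    """Return the partition prefix of a "partition:docid" ID, or None if there is none."""
    idx = doc_id.find(":")
    return doc_id[:idx] if idx > 0 else None


def top_partitions(doc_ids, n=10):
    """Count documents per partition and return the n largest as (partition, count) pairs."""
    counts = Counter()
    for doc_id in doc_ids:
        partition = partition_of(doc_id)
        if partition is not None:
            counts[partition] += 1
    return counts.most_common(n)


# Made-up IDs for illustration only
sample_ids = ["a:1", "a:2", "b:1", "c:1", "c:2", "c:3", "_design/stats"]
print(top_partitions(sample_ids, n=2))
```

Unlike the view, this sketch sees every ID at once, so it has no need for the two extra boundary partitions mentioned in the description.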
kocolosk / Provision V100 GPU instance
Last active January 23, 2019 14:31
ibmcloud CLI examples
➜ ~ ibmcloud sl vs help
NAME:
ibmcloud sl vs - Gen1 infrastructure Virtual Servers
USAGE:
ibmcloud sl vs command [arguments...] [command options]
COMMANDS:
cancel Cancel virtual server instance
capture Capture virtual server instance into an image
create Create virtual server instance

Heterogeneous Schema Discovery

Moving documents from Cloudant into dashDB means transforming JSON documents into relational records. To build the correct relational representation, we first have to describe the schema of the JSON document itself. That is a trivial process for a single document but becomes very complex if many JSON documents have to be mapped to many relational records while still fitting a limited set of tables.

The Schema Discovery Process (SDP) described in this playbook is designed to find a set of tables that can hold as many JSON documents in a Cloudant database as possible, ideally all of them. The question is how to compute this set of tables.

The answer is again fairly simple if all JSON documents in a Cloudant database implement the same homogeneous schema (i.e., they have the same attributes, values of the same type, and the same levels of nested objects and arrays). The answer is more complex if the documents have heterogeneous schemata.
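One way to see the difference between the two cases is to compute a structural signature for each document and group documents by it. The Python sketch below is only an illustration under that assumption, not the SDP itself; `schema_of`, `group_by_schema`, and the sample documents are hypothetical names and data.

```python
from collections import defaultdict


def schema_of(value):
    """Return a hashable description of a JSON value's structure (attribute names and types)."""
    if isinstance(value, dict):
        return tuple(sorted((key, schema_of(val)) for key, val in value.items()))
    if isinstance(value, list):
        # Describe an array by the distinct schemas of its elements
        return ("array", tuple(sorted({repr(schema_of(item)) for item in value})))
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError("not a JSON value: %r" % (value,))


def group_by_schema(docs):
    """Map each distinct schema signature to the IDs of the documents that have it."""
    groups = defaultdict(list)
    for doc in docs:
        groups[schema_of(doc)].append(doc["_id"])
    return dict(groups)


# Made-up documents for illustration only
docs = [
    {"_id": "1", "name": "a", "age": 3},
    {"_id": "2", "name": "b", "age": 4},
    {"_id": "3", "name": "c", "age": "four"},  # same attributes, different value type
    {"_id": "4", "name": "d", "tags": ["x"]},  # different attributes
]
groups = group_by_schema(docs)
print("homogeneous" if len(groups) == 1 else "heterogeneous", len(groups))
```

A database whose documents all fall into one group is homogeneous in the sense above; more than one group means the mapping to tables has to reconcile several shapes.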

What is a heterogenous sc

# This file just grabs the list of pod IP addresses for the service
# called "couchdb" and feeds those as `couchdb@PODIP` nodes to mem3.
# It's just a proof of concept; the proper solution is likely a module
# in mem3 itself.
import json
import os
import requests
import time
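The rest of the script is cut off here. As a rough illustration of the step the header comment describes, the sketch below turns a Kubernetes v1 Endpoints object (assumed to be already fetched and parsed into a dict) into `couchdb@PODIP` node names. The `node_names` helper and the sample IP addresses are hypothetical, and how the names are then handed to mem3 is not shown.

```python
def node_names(endpoints, prefix="couchdb"):
    """Build "prefix@ip" node names from a parsed Kubernetes v1 Endpoints object."""
    ips = []
    # "subsets" and "addresses" may be missing or null while no pods are ready
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.append(address["ip"])
    return ["{}@{}".format(prefix, ip) for ip in sorted(ips)]


# Made-up Endpoints payload for illustration only
sample_endpoints = {
    "kind": "Endpoints",
    "metadata": {"name": "couchdb"},
    "subsets": [
        {"addresses": [{"ip": "10.0.0.7"}, {"ip": "10.0.0.5"}]},
    ],
}
print(node_names(sample_endpoints))
```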
kocolosk / config-settings.yaml
Created October 6, 2016 23:09
CouchDB Kubernetes Pet Set
kind: ConfigMap
apiVersion: v1
metadata:
name: couchdb
data:
# Erlang VM settings. The -name flag activates the Erlang distribution; there
# should be no reason to change this setting. The -setcookie flag is used to
# control the Erlang magic cookie. CouchDB cluster nodes can only establish a
# connection with one another if they share the same magic cookie.
erlflags: >
kocolosk / deployment-couchdb.yaml
Last active May 27, 2019 14:33
CouchDB 2.0 in Kubernetes
# Start a 3 node cluster and join it together automatically. Uses
# local ephemeral disk for database storage.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: couchdb
spec:
replicas: 3
template: