Zach Kauffman zak10

package blockchain

import (
	cr "crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"

	"golang.org/x/net/html"
)
# services.yml
catalog.kernel.request_event_listener:
    class: App\CoreBundle\EventListener\KernelBootDatabaseSwitchListener
    arguments: ["@request", "@doctrine.dbal.default_connection", "@logger"]
    scope: request
    tags:
        - { name: kernel.event_listener, event: kernel.request, method: onKernelRequest }
zak10 / Hydra.md
Last active January 11, 2017 13:56

The Problem

The update process in Hive that joins the product catalog data to f_product_performance has been causing out-of-memory failures, among many other kinds of errors that are nearly impossible to diagnose.

Proposal 1

The first proposal sends aggregated performance data to a key/value store, which Hydra then consumes during its selection process.

  • Add an aggregation step in PySpark at the end of the stats pipeline, covering the last 60 days of product performance
  • Send the aggregated data to an in-memory key/value store for O(1) lookups in Hydra's selection process
  • Update Hydra's filter to retrieve performance data from the key/value store
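The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: a plain dict stands in for the in-memory key/value store (e.g. Redis), and all names here (`aggregate_performance`, `publish`, `performance_filter`, the `perf:` key prefix, the clicks/impressions fields, the `min_impressions` threshold) are hypothetical. In the real proposal the aggregation would run in PySpark over the stats-pipeline output.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical stand-in for the in-memory key/value store; a plain dict
# gives the same O(1) lookup semantics for illustration purposes.
kv_store = {}

def aggregate_performance(rows, window_days=60, today=date(2017, 1, 11)):
    """Aggregate per-product totals over the trailing window.

    `rows` are (product_id, day, clicks, impressions) tuples, standing in
    for the stats-pipeline output the PySpark job would read.
    """
    cutoff = today - timedelta(days=window_days)
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for product_id, day, clicks, impressions in rows:
        if day >= cutoff:  # keep only the last `window_days` days
            totals[product_id]["clicks"] += clicks
            totals[product_id]["impressions"] += impressions
    return dict(totals)

def publish(aggregates, store):
    # In the real pipeline this would be a bulk write to the key/value store.
    for product_id, stats in aggregates.items():
        store[f"perf:{product_id}"] = stats

def performance_filter(product_ids, store, min_impressions=100):
    """Hydra-side filter: one O(1) store lookup per candidate product."""
    kept = []
    for pid in product_ids:
        stats = store.get(f"perf:{pid}")
        if stats and stats["impressions"] >= min_impressions:
            kept.append(pid)
    return kept
```

Because the store is populated out-of-band at the end of the stats pipeline, Hydra's filter stays stateless: it only reads keys at selection time and never depends on the Hive join.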

Pros: speeds up the update process, improves reliability in Hive, is stateless, and preserves Hydra's selection speed.