Kevin Sheng (ksheng-)

ksheng- / dbt.txt
Last active September 10, 2020 21:39
{"1":["cross","1","1","1","1","0","00FFFF","000000","FF0000","5","2","0","0","1","persistent"],"2":["circle","0","FFFFFF","000000","FF0000","2","18","2","50","0","1","persistent"],"3":["circle","1","FFFFFF00","000000","FF0000","2","22","1","50","0","0","persistent"],"d":1080}
{"1":["dot",0,"00FFFF","000000","FF0000","3","2","round","persistent","0"],"2":["circle","1","FFFFFF80","000000","FF0000","3","40","1","33","60","0","persistent"],"3":["none"],"d":1080}
6333f91b-22b6-434b-a068-a33f4936224e
CREATE OR REPLACE VIEW view_dealbook AS
SELECT p.trade_date,
       p.source,
       p.group_id,
       p.create_us,
       p.market_type,
       p.hold_time,
       p.symbol,
       p.settlement_ccy,
       p.size,
update_spread_tiers_for_agents naive loop -> ~30s
w/ multiprocessing on update_spread_tiers_1_agent_from_file -> 6s
w/ multiprocessing on push_file_to_prod -> ?s
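Roughly how the multiprocessing fan-out timed above would look; a minimal sketch, assuming update_spread_tiers_1_agent_from_file takes one agent's tier file (the file paths and pool size are placeholders):

from multiprocessing import Pool

def update_spread_tiers_1_agent_from_file(path):
    # Stand-in for the real per-agent update: parse one agent's tier
    # file and push its spread tiers.
    ...

if __name__ == '__main__':
    # Hypothetical per-agent tier files.
    files = ['tiers/agent_%02d.csv' % i for i in range(50)]
    with Pool() as pool:
        # One task per agent file, the parallelism behind the ~30s -> 6s note.
        pool.map(update_spread_tiers_1_agent_from_file, files)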
begin;
delete
from order_event
where trade_date < '2020-06-02'
  and event_type != 'FILL'
  and event_type != 'PARTIAL';
commit;
ksheng- / kafka.md
Last active August 6, 2020 10:35
redeploy old md pipeline kafka

When kafkacat (or any other consumer) reads from a new topic, it automatically creates a 0-partition topic. As a result, the mdcs can't publish and the mdp and mbs can't consume, throwing `.unwrap` errors in the lb logs (`UnknownTopicOrPartition`).

On each broker (`deploy@lucera-ld4-kz-<00|01|02>`), `cd ~/src/kafka_2.10-0.8.2.1/config` and add `delete.topic.enable=true` to `server.properties` to enable topic deletion.

Restart the broker: `stop_k; sleep 5; stop_zk; start_zk; sleep 5; start_k`
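
Before deleting, it's worth confirming which topic got auto-created and how many partitions it ended up with; a sketch using confluent-kafka-python (the broker host:port is an assumption):

from confluent_kafka import Consumer

c = Consumer({'bootstrap.servers': 'lucera-ld4-kz-00:9092',  # assumed host:port
              'group.id': 'metadata-check'})
# Fetch cluster metadata and print the partition count per topic.
md = c.list_topics(timeout=10)
for name, topic in md.topics.items():
    print(name, len(topic.partitions), 'partitions')
c.close()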

https://colab.research.google.com/drive/1YjGz2f74abea8SqOp09o3u7vaHsHFfrb

I've been experimenting with the main umap repo and testing / reproducing results in a Colab high-memory test env.

The project is still under very active development and it looks like the author is very responsive to issues and feature requests. I ran into a few quirks trying to build the latest master (0.4dev), which includes a lot of new optimizations and things like an inverse transform.

If you build the latest version without having pynndescent installed, it falls back to a more naive approximate nearest neighbors algorithm, which is very expensive RAM-wise (I was able to run it on up to ~7000 samples of MNIST, which used about 18GB of RAM). Reverting to the latest stable release (0.3, with numba==0.46.0 and llvmlite==0.30), memory usage went down to 2GB, which tracks with the original paper, which ran on an 8GB instance (https://arxiv.org/pdf/1802.03426.pdf#page=26&zoom=auto,-205,300).
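
For reference, a minimal run against the stable release with the pins mentioned above (0.3.10 assumed as the last 0.3.x; the random matrix is just a stand-in for the MNIST subset):

# pip install umap-learn==0.3.10 numba==0.46.0 llvmlite==0.30.0
import numpy as np
import umap

# Stand-in for ~7000 MNIST samples (784 features each).
X = np.random.rand(7000, 784).astype(np.float32)

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
embedding = reducer.fit_transform(X)  # shape (7000, 2)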

Building the latest master with `

WITH cte AS (
    SELECT trade_date, source
    FROM (
        SELECT generate_series::date AS trade_date
        FROM generate_series(
            ($__timeFrom() AT TIME ZONE 'America/New_York' + interval '6h 58m')::date,
            ($__timeTo() AT TIME ZONE 'America/New_York' + interval '6h 58m')::date,
            '1 day'
        )
    ) c
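
The 6h 58m shift makes the trade date roll over at 17:02 New York time; the same mapping in Python (the cutoff is inferred from the interval above, not stated in the query):

from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

def trade_date(ts: datetime) -> date:
    # Shifting New York local time forward by 6h58m rolls the
    # calendar date at 17:02 local time, as in the SQL above.
    ny = ts.astimezone(ZoneInfo('America/New_York'))
    return (ny + timedelta(hours=6, minutes=58)).date()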
import argparse
from random import randint

from confluent_kafka import Producer

parser = argparse.ArgumentParser()
parser.add_argument("--broker", default="lumefx-mdp-prod00", help="kafka broker")
args = parser.parse_args()

# Producer pointed at the target broker.
p = Producer({'bootstrap.servers': args.broker})
# Publish a random test payload (the topic name here is a placeholder).
p.produce('test', str(randint(0, 1000)).encode())
p.flush()
/ save in-memory tables to the previous trade date's partition of db/fnfx_ny4,
/ apply the parted attribute on sym, and signal the HDB on port 5000 to reload
.Q.hdpf[`::5000;`:db/fnfx_ny4/;(.data.tradedate .z.p) - 1;`sym]