Param Singh (paramsingh)
@paramsingh
paramsingh / ask yc.txt
Created March 3, 2023 16:05
Ask YC demo
(transcribe) param@Params-MacBook-Pro transcribe % python -m transcribe.scripts.generate_ycombinator_index
no embedding for link: https://www.youtube.com/watch?v=qh8sHetf-Nk
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 0 tokens
question: How do I come up with startup ideas?
⠴ thinking...
INFO:root:> [query] Total LLM token usage: 12781 tokens
INFO:root:> [query] Total embedding token usage: 0 tokens
To come up with startup ideas, the best way is to notice them organically.
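For context, a minimal sketch of the kind of llama_index (GPT Index) flow that produces log lines like the ones above; the class names assume an early-2023 release, and the transcripts directory is a placeholder, not the actual transcribe code.

import logging

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# INFO-level logging is what surfaces the token-usage lines above.
logging.basicConfig(level=logging.INFO)

# Load one text file per transcribed YC video (directory name is made up).
documents = SimpleDirectoryReader("transcripts").load_data()

# Embeds each document; logs the "[build_index_from_documents]" token totals.
index = GPTSimpleVectorIndex(documents)

# Retrieves the closest chunks and asks the LLM; logs the "[query]" totals.
response = index.query("How do I come up with startup ideas?")
print(response)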
@paramsingh
paramsingh / playlist.py
Created January 6, 2022 23:11
Copilot is helpful
"""
{
"playlist": {
"creator": "iliekcomputers",
"date": "2021-02-28T11:41:15.994353+00:00",
"extension": {
"https://musicbrainz.org/doc/jspf#playlist": {
"creator": "iliekcomputers",
"last_modified_at": "2021-02-28T11:41:45.453529+00:00",
"public": true
* no order by, no limit -- https://explain.depesz.com/s/8fsx
* new query -- ll.id >= 10**6, with order by (reconstructed as a sketch below) -- https://explain.depesz.com/s/Svet
* only join ll and hl, ll.id >= 10**6, with order by -- https://explain.depesz.com/s/tlqV

SELECT with all joins, without the greater-than filter

acousticbrainz=> explain analyze select ll.gid::text, llj.data::text, ll.id from lowlevel as ll join lowlevel_json as llj on llj.id = ll.id left join highlevel as hl on ll.id = hl.id where hl.mbid is null limit 100;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29.72..52.30 rows=1 width=68) (actual time=0.093..0.093 rows=0 loops=1)
   ->  Nested Loop  (cost=29.72..52.30 rows=1 width=68) (actual time=0.092..0.092 rows=0 loops=1)
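A hedged reconstruction of the "new query" from the notes above (the ll.id >= 10**6 predicate plus an order by), wrapped in a minimal psycopg2 loop; the connection string is a placeholder and the exact SQL is inferred from the notes, not copied from the gist.

import psycopg2

# Hypothetical sketch: same three-way join as above, but with the
# ll.id >= 10**6 predicate and ORDER BY ll.id, per the notes.
QUERY = """
    SELECT ll.gid::text, llj.data::text, ll.id
      FROM lowlevel AS ll
      JOIN lowlevel_json AS llj ON llj.id = ll.id
 LEFT JOIN highlevel AS hl ON ll.id = hl.id
     WHERE hl.mbid IS NULL
       AND ll.id >= %s
  ORDER BY ll.id
     LIMIT 100
"""

with psycopg2.connect("dbname=acousticbrainz") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY, (10**6,))
        rows = cur.fetchall()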

https://stackoverflow.com/questions/31671634/handling-unicode-sequences-in-postgresql
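Related to the link above: PostgreSQL's jsonb type rejects the \u0000 escape sequence, so a common workaround (per that thread) is to strip it from the raw JSON text before casting; a hedged one-liner:

# Hypothetical helper: remove the \u0000 escape, which jsonb cannot store.
def sanitize_for_jsonb(raw_json: str) -> str:
    return raw_json.replace("\\u0000", "")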
==============================================================
batch size: 100, collect used
===============================================================
Query to get list of users processed in 3.80 s
Number of users: 1091
Doing a batch of 100 users
time taken to run all queries for 100 users: 2.75
time taken by collect call: 42.41
Query to calculate artist stats processed in 45.24 s
total time taken by batch: 45.24
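A hedged sketch of the batched flow these timings describe: run the artist-stats query once per batch of 100 users, then collect() the results. The "listens" table and its columns are assumptions, not the actual ListenBrainz schema.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("artist-stats").getOrCreate()

# First query above: fetch the full user list (1091 users in the log).
users = [row.user_name
         for row in spark.sql("SELECT DISTINCT user_name FROM listens").collect()]

BATCH_SIZE = 100
for start in range(0, len(users), BATCH_SIZE):
    batch = users[start:start + BATCH_SIZE]
    in_list = ", ".join("'{}'".format(u.replace("'", "''")) for u in batch)
    stats = spark.sql(
        "SELECT user_name, artist_name, count(*) AS listen_count "
        "FROM listens WHERE user_name IN ({}) "
        "GROUP BY user_name, artist_name".format(in_list)
    )
    rows = stats.collect()  # the collect() call dominates each batch's time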
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 INFO hdfs.StateChange: BLOCK* allocate blk_1073741882_1058{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-0048a7ab-07dc-4ead-9662-953be463b484:NORMAL:10.0.0.56:50010|RBW], ReplicaUC[[DISK]DS-cf419549-7b27-4d64-8f9e-3addcffb4784:NORMAL:10.0.0.98:50010|RBW], ReplicaUC[[DISK]DS-7b8a44e2-b9e4-48fc-9875-3bb881f428ac:NORMAL:10.0.0.106:50010|RBW]]} for /data/fdajklafjds.j3jkiop4/_temporary/0/_temporary/attempt_20181219154503_0015_m_000002_0/part-00002-5235a4ab-6792-4d4f-b6c8-4c58dfe6f2e5-c000.json
18/12/19 15:45:03 WARN
2018-12-18 10:27:46,511 INFO hdfs.StateChange: DIR* completeFile: /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000000_0/part-00000-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json is closed by DFSClient_NONMAPREDUCE_-31797042_86
2018-12-18 10:27:46,584 INFO hdfs.StateChange: BLOCK* allocate blk_1073743002_2178, replicas=10.0.0.37:9866, 10.0.0.24:9866, 10.0.0.23:9866 for /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000007_0/part-00007-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json
2018-12-18 10:27:46,592 INFO hdfs.StateChange: BLOCK* allocate blk_1073743003_2179, replicas=10.0.0.23:9866, 10.0.0.24:9866, 10.0.0.37:9866 for /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000005_0/part-00005-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json
2018-12-18 10:27:46,595 INFO hdfs.StateChange: BLOCK* allocate blk_1073743004_2180, replicas=10.0.0.23:9866, 10.0.0.37:9866, 10.0.0.24:9866 for /data/
>>> spark.createDataFrame(sc.parallelize([1, 2, 3]).map(lambda x: Row(val=x))).write.format('json').save('hdfs://hadoop-master.spark-network:9000/data/listenbrainz/fdajklafjss.json')
[Stage 4:====================================> (5 + 3) / 8]
2018-12-18 10:27:46 WARN DFSClient:557 - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000005_0/part-00005-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2117)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:287)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2691)
at org.apache.had