param paramsingh

## ask yc.txt
(transcribe) param@Params-MacBook-Pro transcribe % python -m transcribe.scripts.generate_ycombinator_index
no embedding for link: https://www.youtube.com/watch?v=qh8sHetf-Nk
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 0 tokens
question: How do I come up with startup ideas?
⠴ thinking...INFO:root:> [query] Total LLM token usage: 12781 tokens
INFO:root:> [query] Total embedding token usage: 0 tokens


To come up with startup ideas, the best way is to notice them organically.

## playlist.py
"""
{
  "playlist": {
    "creator": "iliekcomputers",
    "date": "2021-02-28T11:41:15.994353+00:00",
    "extension": {
      "https://musicbrainz.org/doc/jspf#playlist": {
        "creator": "iliekcomputers",
        "last_modified_at": "2021-02-28T11:41:45.453529+00:00",
        "public": true

## gist:741a06789221f128a93a3c9c2543a6d5
* no order by, no limit -- https://explain.depesz.com/s/8fsx
* new query -- ll.id >= 10**6, with order by -- https://explain.depesz.com/s/Svet
* only join ll and hl, ll.id >= 10**6, with order by -- https://explain.depesz.com/s/tlqV

## query_plans.md

      
Created May 16, 2019 18:34

    SELECT JOIN everything without greater than

acousticbrainz=> explain analyze select ll.gid::text, llj.data::text, ll.id from lowlevel as ll join lowlevel_json as llj on llj.id = ll.id left join highlevel as hl on ll.id = hl.id where hl.mbid is null limit 100;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29.72..52.30 rows=1 width=68) (actual time=0.093..0.093 rows=0 loops=1)
   ->  Nested Loop  (cost=29.72..52.30 rows=1 width=68) (actual time=0.092..0.092 rows=0 loops=1)


## plans.md

      
Last active May 16, 2019 18:31

    SELECT JOIN everything without greater than

acousticbrainz=> explain analyze select ll.gid::text, llj.data::text, ll.id from lowlevel as ll join lowlevel_json as llj on llj.id = ll.id left join highlevel as hl on ll.id = hl.id where hl.mbid is null limit 100;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29.72..52.30 rows=1 width=68) (actual time=0.093..0.093 rows=0 loops=1)
   ->  Nested Loop  (cost=29.72..52.30 rows=1 width=68) (actual time=0.092..0.092 rows=0 loops=1)


## gist:e82f4815811caba7adbc5428efefbdc1
https://stackoverflow.com/questions/31671634/handling-unicode-sequences-in-postgresql


## gist:45e7a5f29d1984070424d63baa5aadcc
==============================================================
batch size: 100, collect used
===============================================================
Query to get list of users proccessed in 3.80 s
Number of users: 1091
Doing a batch of 100 users
time taken to run all queries for 100 users: 2.75
time taken by collect call: 42.41
Query to calculate artist stats proccessed in 45.24 s
total time taken by batch: 45.24
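
The log above reports 1091 users processed in batches of 100, with most of the batch time (42.41 s of 45.24 s) spent in the collect call. A minimal sketch of that batching-and-timing pattern follows. The helper names `batches` and `time_batches`, the `process` callback, and the generated user names are hypothetical and not taken from the original script; the real per-batch queries and the collect call are not shown in the log.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def time_batches(
    users: Sequence[str],
    size: int,
    process: Callable[[Sequence[str]], None],
) -> List[Tuple[int, float]]:
    """Run `process` on each batch and return (batch_length, seconds) pairs."""
    timings = []
    for batch in batches(users, size):
        started = time.perf_counter()
        # Hypothetical stand-in for the per-batch queries and the collect call.
        process(batch)
        timings.append((len(batch), time.perf_counter() - started))
    return timings


if __name__ == "__main__":
    # Same user count as in the log; the names themselves are made up.
    users = [f"user{i}" for i in range(1091)]
    results = time_batches(users, 100, lambda batch: None)
    print(len(results), "batches; last batch has", results[-1][0], "users")
```

With 1091 users and a batch size of 100 this gives ten full batches and a final batch of 91. Timing each batch separately, as the log does, makes it easier to see whether the cost sits in the queries or in collecting the results.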

## namenode
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 WARN net.NetworkTopology: The cluster does not contain node: /default-rack/10.0.0.56:50010
18/12/19 15:45:03 INFO hdfs.StateChange: BLOCK* allocate blk_1073741882_1058{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-0048a7ab-07dc-4ead-9662-953be463b484:NORMAL:10.0.0.56:50010|RBW], ReplicaUC[[DISK]DS-cf419549-7b27-4d64-8f9e-3addcffb4784:NORMAL:10.0.0.98:50010|RBW], ReplicaUC[[DISK]DS-7b8a44e2-b9e4-48fc-9875-3bb881f428ac:NORMAL:10.0.0.106:50010|RBW]]} for /data/fdajklafjds.j3jkiop4/_temporary/0/_temporary/attempt_20181219154503_0015_m_000002_0/part-00002-5235a4ab-6792-4d4f-b6c8-4c58dfe6f2e5-c000.json
18/12/19 15:45:03 WARN

## gist:ee188838e283e931203fec21fcb49fbd
2018-12-18 10:27:46,511 INFO hdfs.StateChange: DIR* completeFile: /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000000_0/part-00000-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json is closed by DFSClient_NONMAPREDUCE_-31797042_86
2018-12-18 10:27:46,584 INFO hdfs.StateChange: BLOCK* allocate blk_1073743002_2178, replicas=10.0.0.37:9866, 10.0.0.24:9866, 10.0.0.23:9866 for /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000007_0/part-00007-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json
2018-12-18 10:27:46,592 INFO hdfs.StateChange: BLOCK* allocate blk_1073743003_2179, replicas=10.0.0.23:9866, 10.0.0.24:9866, 10.0.0.37:9866 for /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000005_0/part-00005-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json
2018-12-18 10:27:46,595 INFO hdfs.StateChange: BLOCK* allocate blk_1073743004_2180, replicas=10.0.0.23:9866, 10.0.0.37:9866, 10.0.0.24:9866 for /data/

## gist:d426df3307e559c4f252f80f2f23bdb1
>>> spark.createDataFrame(sc.parallelize([1, 2, 3]).map(lambda x: Row(val=x))).write.format('json').save('hdfs://hadoop-master.spark-network:9000/data/listenbrainz/fdajklafjss.json')

[Stage 4:====================================>                      (5 + 3) / 8]2018-12-18 10:27:46 WARN  DFSClient:557 - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/listenbrainz/fdajklafjss.json/_temporary/0/_temporary/attempt_20181218102746_0004_m_000005_0/part-00005-35dbdc62-ad4a-46a5-a7c6-b8150a593fe1-c000.json could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2117)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:287)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2691)
	at org.apache.had