{
"metadata": {
"name": "bigquery discovering data"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "import bq\nimport datetime\nimport pandas as pd\nimport tabulate\nclient = bq.Client.Get()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We'll play with the [bigquery-samples:reddit.full] dataset.\n\nFirst question is: How big is this dataset?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT COUNT(*) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r6927b004530b87f9_0000014161ef1857_2 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n c\n-------\n2267098"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 3
},
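{
"cell_type": "markdown",
"metadata": {},
"source": "We'll repeat this Query + ReadSchemaAndRows + tabulate pattern in every cell below. As an optional convenience (just a sketch built from the same bq client calls; the cells below keep the explicit pattern for clarity), we could wrap it in a small helper:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def run(query, max_rows=1000):\n    # Convenience sketch: run a query and print its result as a table.\n    # Uses the same client.Query / client.ReadSchemaAndRows calls as the other cells.\n    table = client.Query(query)['configuration']['query']['destinationTable']\n    fields, data = client.ReadSchemaAndRows(table, max_rows=max_rows)\n    print tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": []
},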
{
"cell_type": "markdown",
"metadata": {},
"source": "2,267,098 rows! Each representing one submission. What columns do we have?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "schema = client.GetTableSchema({'projectId': 'bigquery-samples', 'datasetId': 'reddit', 'tableId': 'words'})\nprint tabulate.tabulate([[x['name'], x['type']] for x in schema['fields']])\n\n",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "------------ ---------\nranked INTEGER\nsubr STRING\nwordpos INTEGER\ncreated_utc TIMESTAMP\nscore INTEGER\ndomain STRING\nid STRING\ntitle STRING\nauthor STRING\nups INTEGER\ndowns INTEGER\nnum_comments INTEGER\ntotal INTEGER\n------------ ---------\n"
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Let's explore this dataset.\n\nCOUNT(DISTINCT x) is a fast approximate counter of distinct values.\n\nTip: If you need a precise count, you can adjust COUNT DISTINCT, or use GROUP EACH."
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT COUNT(DISTINCT subr) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r72978a08a67354ae_0000014161ef2ebf_3 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n c\n----\n2526"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": "There are 2,526 different 'subr' (subreddits) in this dataset. Can we get a precise count?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT COUNT(DISTINCT subr, 5000) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n\n\nquery = \"\"\"\nSELECT COUNT(*) c FROM (SELECT subr FROM [bigquery-samples:reddit.full] GROUP BY 1)\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_rfee8101457737db_0000014161ef398c_4 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n c\n----\n2499"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\rWaiting on bqjob_r672b541ac2d01a42_0000014161ef3e7e_5 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n c\n----\n2499"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Another cool, fast, approximate function is TOP(). We can quickly find the most repeated elements on a column."
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT TOP(author) author, COUNT(*) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_ra0906b7992e6d7e_0000014161ef4db5_6 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nauthor c\n------------------ -----\n 77143\nanutensil 4119\natticus138 2866\ngst 1850\nmepper 1482\ndavidreiss666 1436\nNickster79 1324\nkirbyfood 1262\nAschebescher 1255\nsoupyhands 1160\nMind_Virus 1143\ncoconutwaters 1118\nFenrirIII 1039\ntone_is_everything 1005\nrising_threads_bot 999\nPoliticBot 998\nserendipitybot 996\nmarquis_of_chaos 983\ntabledresser 954\ncongressbot 947"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": "So from the 23,231,049 top posts, this tells us that there are some users that have authored a lot of them.\n\nCheck them out on http://www.reddit.com/user/anutensil, http://www.reddit.com/user/mepper, http://www.reddit.com/user/davidreiss666, etc.\n\nWe also see that a lot of posts don't have an author. Maybe they are deleted accounts? \n "
},
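{
"cell_type": "markdown",
"metadata": {},
"source": "As a quick sanity check (a sketch following the same pattern as above; it assumes the missing authors are stored as empty strings, as the TOP() output suggests), we can get an exact count of the authorless posts:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT COUNT(*) c FROM [bigquery-samples:reddit.full]\nWHERE author = ''\n\"\"\"\n# Exact count of posts with an empty author string;\n# TOP() above estimated roughly 77,143 of them.\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": []
},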
{
"cell_type": "markdown",
"metadata": {},
"source": "What are the top voted submissions?\n\nLEFT(x, 80) cuts the title so the table doesn't overflow."
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, score, LEFT(title, 80)\nFROM [bigquery-samples:reddit.full]\nORDER BY 2 DESC\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r7aa1cec801843bc3_0000014161ef5be0_7 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr score f0_\n------ ------- --------------------------------------------------------------------------------\npics 21868 test post please ignore\nfunny 21304 The Bus Knight\nfunny 18793 I live in the same valley as Adam West. I decided to look him up in the phone bo\nvideos 17369 This guy is a reporter on Fox 2 here in Detroit. His name is Charlie LeDuff. He\nfunny 14883 After searching FB for people with the same name as me, I'd replicate their prof\nIAmA 14749 I am Barack Obama, President of the United States -- AMA\npics 14568 I sent Tom Hanks a 1934 Smith Corona typewriter with a typed invitation to come\nfunny 14148 Babe caught me slippin'.\nfunny 12962 I hate my job...\nfunny 12942 When I found out I could upvote by pressing 'A'"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Funny how a post titled \"please ignore\" is the one with most votes in all history.\n\nWhat are the subreddits that get the most votes?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, SUM(score) sum_score\nFROM [bigquery-samples:reddit.full]\nGROUP BY 1\nORDER BY 2 DESC\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r392994d37bd93528_0000014161ef69fa_8 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr sum_score\n----------------- -----------\npics 3752388\nfunny 3713684\nvideos 3201899\npolitics 3157350\ntechnology 2894626\nworldnews 2876818\naww 2857179\nWTF 2665757\ntodayilearned 2636914\ngaming 2633408\ngifs 2615749\nAdviceAnimals 2614179\nmildlyinteresting 2487454\n4chan 2443942\nreactiongifs 2320331\nscience 2320319\ncringepics 2320307\natheism 2254504\ntrees 2239064\nAskReddit 2198059"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Interesting. Now what is the #1 post for each subreddit. Window functions, let's do a simple example first:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date\nFROM [bigquery-samples:reddit.full]\nWHERE subr = 'pics'\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r1b63d608e75fe8b_0000014161ef76d8_9 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr rank title date\n------ ------ ---------------------------------------- ----------\npics 1 test post please ignore 2009-07-18\npics 2 I sent Tom Hanks a 1934 Smith Corona typ 2012-10-09\npics 3 My friend calls him \"Mr Ridiculously Pho 2012-04-03\npics 4 The only way my 3 day-old daughter would 2012-12-28\npics 5 Photo I took of the Chicago skyline from 2012-07-13\npics 6 This is called humanity. 2011-07-22\npics 7 The frog that has lived in my shower for 2012-12-28\npics 8 Came across an overgrown cemetery in the 2013-03-12\npics 9 Tried taking a picture of a sink drainin 2012-11-12\npics 10 Poster ad for the Canadian Paralympics 2012-05-04\npics 11 NY Cop giving a brand new pair boots to 2012-11-28\npics 12 I'm 7 foot. For Halloween I went as a no 2012-04-11\npics 13 I wish I was a dog just so I could wear 2012-08-09\npics 14 Dropped my camera while taking a multish 2012-12-28\npics 15 Totally worth battling traffic, the cold 2012-06-07\npics 16 Photo of a North Korean family being del 2012-08-19\npics 17 Pipe Cleaner Wolf 2013-06-07\npics 18 Just discovered Lost Egyptian city found 2013-06-08\npics 19 Kinetic light sculptures created by Brit 2013-04-08\npics 20 My friend and his daughter dressed as a 2012-10-28"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": "So RANK() adds a column with the position of each post, depending on the order. Now we can filter with an external query only posts that have rank==1."
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n)\nWHERE rank = 1\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r2f14af0a937fa494_0000014161ef8276_10 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n0x10c The reason for slow 2012-08-16 430\n2007scape What Rank 235,661 cooking looks like 2013-03-09 463\n24hoursupport My reddit account of 3+ years (and other 2013-02-08 69\n30ROCK No big deal. Just a picture of Tina Fey 2013-02-01 1874\n3DS Quick question. Don't upvote please. 2013-02-04 575\n3Dprinting We 3D Printed someone's face 2013-07-17 259\n3FrameMovies [3FM] The Dark Knight Trilogy 2012-09-27 744\n3amjokes Why was the soldier pinned down? 2013-05-15 518\n45thworldproblems [PR_PHECY: 10b/2m/1o] The atomic chorus 2012-12-11 141\n49ers Update: The downvote button is now a Rav 2013-01-22 868\n4chan It's my gf 2012-12-28 7237\n4x4 While I was eating breakfast at my hotel 2013-07-20 217\n52weeksofcooking Official Weekly Challenge List 2011-11-29 108\n90daysgoal My motivation 2011-10-07 65\n90sHipHop The Pharcyde - Passin' Me By 2013-01-23 113\n90scartoons Grandpa Lou doesn't mess around 2012-11-17 208\n911truth Most Convincing Explanation for Why 9/11 2011-11-21 111\nABraThatFits What I picture every time I visit this s 2013-06-25 649\nADHD The most accurate representation of my d 2013-03-04 743\nAFL I am Mitch Robinson from the Carlton Foo 2013-08-15 308"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Can we only look at the top 20 subreddits? Easy with another subquery and the IN keyword:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n WHERE subr IN (\n SELECT subr FROM (\n SELECT subr, SUM(score) sum_score\n FROM [bigquery-samples:reddit.full]\n GROUP BY 1\n ORDER BY 2 DESC\n LIMIT 20))\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r5d6091dd80d946b5_0000014161f5776f_30 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Instead of always writing a long sub-query, it would be more efficient if we create a temporary table with the results of the top subreddits query.\n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\n SELECT subr, SUM(score) sum_score\n FROM [bigquery-samples:reddit.full]\n GROUP BY 1\n ORDER BY 2 DESC\n LIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(\n client.Query(query, destination_table='fh-bigquery:reddit.top20', write_disposition='WRITE_TRUNCATE')['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r1332d487b3a199af_0000014161f6fe7a_34 ... (0s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r1332d487b3a199af_0000014161f6fe7a_34 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr sum_score\n----------------- -----------\npics 3752388\nfunny 3713684\nvideos 3201899\npolitics 3157350\ntechnology 2894626\nworldnews 2876818\naww 2857179\nWTF 2665757\ntodayilearned 2636914\ngaming 2633408\ngifs 2615749\nAdviceAnimals 2614179\nmildlyinteresting 2487454\n4chan 2443942\nreactiongifs 2320331\nscience 2320319\ncringepics 2320307\natheism 2254504\ntrees 2239064\nAskReddit 2198059"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now the previous query gets simpler:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n WHERE subr IN (\n SELECT subr FROM [fh-bigquery:reddit.top20])\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r2d934f3c94d36a4_0000014161f71c34_35 ... (0s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r2d934f3c94d36a4_0000014161f71c34_35 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 36
},
{
"cell_type": "markdown",
"metadata": {},
"source": "The IN keyword is equivalent to doing a JOIN"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT a.subr, title, date, score FROM (\n SELECT a.subr, RANK() OVER(PARTITION BY a.subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full] a\n JOIN [fh-bigquery:reddit.top20] b\n ON a.subr = b.subr\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_re5ab3e58cec13c3_0000014161f5be38_33 ... (0s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_re5ab3e58cec13c3_0000014161f5be38_33 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_subr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 34
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now let's play with words. The table [bigquery-samples:reddit.words] has a row for each word in a title, making it easier to analyze individual words.\n\nCan we count how many times a word appeared on a top title each day?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT title, DATE(created_utc), COUNT(*)\nFROM [bigquery-samples:reddit.words]\nGROUP BY 1, 2\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (0s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (10s) Current status: DONE "
},
{
"ename": "BigqueryInvalidQueryError",
"evalue": "Error processing job 'data-sensing-lab:bqjob_r4cda068db9995646_0000014161efcb11_15': Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mBigqueryInvalidQueryError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-16-aec099f5b94e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[0mLIMIT\u001b[0m \u001b[1;36m10\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 6\u001b[0m \"\"\"\n\u001b[1;32m----> 7\u001b[1;33m \u001b[0mfields\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mReadSchemaAndRows\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mQuery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'configuration'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'destinationTable'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_rows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1000\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 8\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mtabulate\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'name'\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mfields\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mQuery\u001b[1;34m(self, query, destination_table, create_disposition, write_disposition, priority, preserve_nulls, allow_large_results, dry_run, use_cache, **kwds)\u001b[0m\n\u001b[0;32m 1678\u001b[0m \u001b[0mrequest\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mquery_config\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1679\u001b[0m \u001b[0m_ApplyParameters\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdry_run\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdry_run\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1680\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mExecuteJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1681\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1682\u001b[0m def Load(self, destination_table_reference, source,\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mExecuteJob\u001b[1;34m(self, configuration, sync, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1450\u001b[0m job = self.RunJobSynchronously(\n\u001b[0;32m 1451\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mproject_id\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mproject_id\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mupload_file\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mupload_file\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1452\u001b[1;33m job_id=job_id)\n\u001b[0m\u001b[0;32m 1453\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1454\u001b[0m job = self.StartJob(\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRunJobSynchronously\u001b[1;34m(self, configuration, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1439\u001b[0m \u001b[0mjob_reference\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mBigqueryClient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mConstructObjectReference\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1440\u001b[0m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mWaitJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mjob_reference\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1441\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mRaiseIfJobError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1442\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1443\u001b[0m def ExecuteJob(self, configuration, sync=None,\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRaiseIfJobError\u001b[1;34m(job)\u001b[0m\n\u001b[0;32m 816\u001b[0m raise BigqueryError.Create(\n\u001b[0;32m 817\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror_ls\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 818\u001b[1;33m job_ref=BigqueryClient.ConstructObjectReference(job))\n\u001b[0m\u001b[0;32m 819\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mjob\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 820\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mBigqueryInvalidQueryError\u001b[0m: Error processing job 'data-sensing-lab:bqjob_r4cda068db9995646_0000014161efcb11_15': Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "It fails! There are too many groups for GROUP BY to group by. But there is a workaround: GROUP EACH. GROUP EACH introduces some limitations on queries, but allows much larger groupings."
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT title, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nGROUP EACH BY 1, 2\nHAVING c > 2\nORDER BY 2 DESC, 3\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r5b50db9b3c30c23e_0000014161f023db_16 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\ntitle date c\n------- ---------- ---\ni 2013-08-20 3\n2 2013-08-20 3\nwhy 2013-08-20 3\ngo 2013-08-20 3\nMy 2013-08-20 3\nA 2013-08-20 3\nIf 2013-08-20 3\nwho 2013-08-20 3\nThis 2013-08-20 3\nit's 2013-08-20 3"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": "As we try to count words we can see a problem here: They are not normalized, including caps and punctuation. Luckily BigQuery can also run regular expressions over massive datasets with ease:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nWHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\nGROUP EACH BY 1, 2\nHAVING c > 2\nORDER BY 2 DESC, 3\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r3a31e6a53d706d7_0000014161f02dd3_17 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nword date c\n---------- ---------- ---\nyear 2013-08-20 3\nstill 2013-08-20 3\nhad 2013-08-20 3\nsql 2013-08-20 3\nmap 2013-08-20 3\nexperience 2013-08-20 3\nhere 2013-08-20 3\nreally 2013-08-20 3\nnow 2013-08-20 3\nlong 2013-08-20 3"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": "(the over-complicated regular expression allows for some traditionally used punctuation inside words)\n\nLet's look at what has been the timeline of the 'sql' word in the reddit top posts:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nWHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\nGROUP EACH BY 1, 2\nHAVING word = 'sql'\nORDER BY 2 DESC\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r4dd564868c41fdda_0000014161f03b3c_18 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nword date c\n------ ---------- ---\nsql 2013-08-20 3\nsql 2013-08-19 1\nsql 2013-08-17 1\nsql 2013-08-16 3\nsql 2013-08-15 1\nsql 2013-08-14 4\nsql 2013-08-13 1\nsql 2013-08-12 3\nsql 2013-08-11 1\nsql 2013-08-09 4\nsql 2013-08-08 2\nsql 2013-08-07 1\nsql 2013-08-06 3\nsql 2013-08-05 1\nsql 2013-08-04 2\nsql 2013-07-31 4\nsql 2013-07-30 2\nsql 2013-07-29 2\nsql 2013-07-28 1\nsql 2013-07-25 1\nsql 2013-07-23 1\nsql 2013-07-19 2\nsql 2013-07-18 1\nsql 2013-07-16 3\nsql 2013-07-14 1\nsql 2013-07-13 3\nsql 2013-07-12 2\nsql 2013-07-11 3\nsql 2013-07-10 3\nsql 2013-07-09 1"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now let's try something fun: Can we find what other word has behaved like the word 'sql' within the top posts?\n\nFirst, we'll join this table to itself:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT a.word, b.word, a.date, a.c, b.c \nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n HAVING word = 'sql'\n) a\nJOIN (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (0s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (155s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (156s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (157s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (158s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (159s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (160s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (161s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (162s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (163s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (164s) Current status: RUNNING"
},
{
"output_type": "stream",
"stream": "stdout",
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (164s) Current status: DONE "
},
{
"ename": "BigqueryInvalidQueryError",
"evalue": "Error processing job 'data-sensing-lab:bqjob_r5b325e72a15a0bfd_0000014161f0448d_19': Table too large for JOIN. Consider using JOIN EACH. For more details, https://developers.google.com/bigquery/docs/query-reference#joins",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mBigqueryInvalidQueryError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-20-729781de8f2e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 17\u001b[0m \u001b[0mLIMIT\u001b[0m \u001b[1;36m30\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 18\u001b[0m \"\"\"\n\u001b[1;32m---> 19\u001b[1;33m \u001b[0mfields\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mReadSchemaAndRows\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mQuery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'configuration'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'destinationTable'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_rows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1000\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 20\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mtabulate\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'name'\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mfields\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mQuery\u001b[1;34m(self, query, destination_table, create_disposition, write_disposition, priority, preserve_nulls, allow_large_results, dry_run, use_cache, **kwds)\u001b[0m\n\u001b[0;32m 1678\u001b[0m \u001b[0mrequest\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mquery_config\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1679\u001b[0m \u001b[0m_ApplyParameters\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdry_run\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdry_run\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1680\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mExecuteJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1681\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1682\u001b[0m def Load(self, destination_table_reference, source,\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mExecuteJob\u001b[1;34m(self, configuration, sync, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1450\u001b[0m job = self.RunJobSynchronously(\n\u001b[0;32m 1451\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mproject_id\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mproject_id\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mupload_file\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mupload_file\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1452\u001b[1;33m job_id=job_id)\n\u001b[0m\u001b[0;32m 1453\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1454\u001b[0m job = self.StartJob(\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRunJobSynchronously\u001b[1;34m(self, configuration, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1439\u001b[0m \u001b[0mjob_reference\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mBigqueryClient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mConstructObjectReference\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1440\u001b[0m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mWaitJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mjob_reference\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1441\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mRaiseIfJobError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1442\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1443\u001b[0m def ExecuteJob(self, configuration, sync=None,\n",
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRaiseIfJobError\u001b[1;34m(job)\u001b[0m\n\u001b[0;32m 816\u001b[0m raise BigqueryError.Create(\n\u001b[0;32m 817\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror_ls\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 818\u001b[1;33m job_ref=BigqueryClient.ConstructObjectReference(job))\n\u001b[0m\u001b[0;32m 819\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mjob\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 820\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mBigqueryInvalidQueryError\u001b[0m: Error processing job 'data-sensing-lab:bqjob_r5b325e72a15a0bfd_0000014161f0448d_19': Table too large for JOIN. Consider using JOIN EACH. For more details, https://developers.google.com/bigquery/docs/query-reference#joins"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": "That didn't work. We can only JOIN if the second table is small enough. Luckily, we can use JOIN EACH for arbitrarily big tables.\n"
},
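{
"cell_type": "markdown",
"metadata": {},
"source": "As an aside (a sketch, not part of the original analysis): if you hit this error programmatically, you can catch it and retry with a shuffle join. This assumes the BigqueryInvalidQueryError class lives in the bigquery_client module, as the traceback above suggests."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Sketch only: retry with JOIN EACH when BigQuery rejects a plain JOIN\n# because the right-side table is too large. Assumes bigquery_client\n# exposes BigqueryInvalidQueryError, per the traceback above.\nimport re\nfrom bigquery_client import BigqueryInvalidQueryError\n\ndef query_with_join_each_fallback(client, query):\n    try:\n        return client.Query(query)\n    except BigqueryInvalidQueryError as e:\n        if 'too large for JOIN' in str(e):\n            # Upgrade every plain JOIN to a shuffle join and retry once.\n            return client.Query(re.sub(r'\\bJOIN\\b(?! EACH)', 'JOIN EACH', query))\n        raise",
"language": "python",
"metadata": {},
"outputs": []
},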
{
"cell_type": "code",
"collapsed": false,
"input": "query = \"\"\"\nSELECT a.word, b.word, a.date, a.c, b.c \nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n HAVING word = 'sql'\n LIMIT 10\n) a\nJOIN EACH (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nWHERE a.word != b.word\nAND b.word > 'a'\nORDER BY 2, 3 DESC\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r668abdc1c6cb4bc1_0000014161f35c27_20 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word a_date a_c b_c\n-------- --------- ---------- ----- -----\nsql a-18 2013-06-11 2 1\nsql a-basin 2012-07-03 2 1\nsql a-listers 2012-11-15 2 1\nsql a-rod 2012-10-08 3 1\nsql a-z 2013-06-11 2 1\nsql a10 2013-02-04 2 1\nsql a1c 2013-06-11 2 1\nsql a1c 2012-11-15 2 1\nsql a6m5 2012-11-15 2 1\nsql aaa 2013-02-04 2 1\nsql aaa 2011-12-27 1 1\nsql aaaand 2012-07-03 2 1\nsql aaaawwww 2012-11-15 2 1\nsql aaah 2012-10-08 3 1\nsql aaahh 2013-06-11 2 1\nsql aalegends 2012-10-08 3 2\nsql aan 2013-06-11 2 1\nsql aapa 2012-10-08 3 1\nsql aaron 2013-06-11 2 1\nsql aaron 2013-02-04 2 1\nsql aaron 2012-10-08 3 1\nsql aarseth 2012-03-26 1 1\nsql ab-soul 2013-02-04 2 1\nsql ab-soul 2012-11-15 2 1\nsql abandon 2013-06-11 2 1\nsql abandon 2013-02-04 2 1\nsql abandon 2012-10-08 3 1\nsql abandoned 2013-06-11 2 2\nsql abandoned 2013-02-04 2 8\nsql abandoned 2012-11-15 2 6"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You might be wondering what I'm doing here: Creating timelines of how many times a word has been mentioned, and putting their count side by side with a JOIN. Once I have that, I can look for correlations: Words that have behaved in the same way."
},
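{
"cell_type": "markdown",
"metadata": {},
"source": "CORR here is the Pearson correlation aggregate. As a quick local illustration of the same statistic (toy monthly counts, not data from the table), pandas computes it too:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Sketch: the same Pearson statistic that BigQuery's CORR computes,\n# checked locally on toy monthly counts (illustrative values only).\ns1 = pd.Series([10, 40, 35, 80, 60])  # pretend monthly counts for word A\ns2 = pd.Series([12, 38, 30, 85, 55])  # pretend monthly counts for word B\nprint s1.corr(s2)  # Pearson correlation, ~0.99 for these values",
"language": "python",
"metadata": {},
"outputs": []
},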
{
"cell_type": "code",
"collapsed": false,
"input": "base_query = \"\"\"\nSELECT a.word, b.word, CORR(a.c, b.c) corr, COUNT(*) c\nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, UTC_USEC_TO_MONTH(TIMESTAMP_TO_USEC(created_utc)) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") = '%s'\n GROUP EACH BY 1, 2\n) a\nJOIN EACH (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, UTC_USEC_TO_MONTH(TIMESTAMP_TO_USEC(created_utc)) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nWHERE a.word != b.word\nGROUP BY 1, 2\nHAVING c > 30\nORDER BY 3 DESC\nLIMIT 5\n\"\"\"\nquery = base_query % 'sql'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r5b346ff48c1cfff4_0000014161f366c2_21 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nsql server 0.962023 53\nsql david 0.959003 56\nsql policy 0.956993 55\nsql reality 0.95693 53\nsql across 0.95687 56"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'religion'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r2ee255a470a0edbb_0000014161f36d12_22 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nreligion brain 0.978323 61\nreligion maps 0.972462 50\nreligion network 0.970603 58\nreligion society 0.969936 56\nreligion human 0.969745 61"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'holidays'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r56e55315cff890ef_0000014161f37461_23 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nholidays christmas 0.994448 31\nholidays santa 0.990515 33\nholidays holiday 0.968293 33\nholidays gifts 0.942386 31\nholidays presents 0.859437 31"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'pizza'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r19f4aff7d827e811_0000014161f37b6d_24 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- -------- -------- ---\npizza great 0.988512 48\npizza office 0.986625 47\npizza food 0.986477 48\npizza making 0.98625 48\npizza better 0.986143 47"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'crazy'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r6c83abe5755ac7dc_0000014161f38240_25 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- -------- -------- ---\ncrazy coming 0.995498 62\ncrazy girl 0.995198 62\ncrazy used 0.995093 60\ncrazy john 0.99475 61\ncrazy fire 0.994627 57"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'election'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r1c170cbe5d0407c2_0000014161f38800_26 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nelection gaza 0.937955 40\nelection polling 0.912526 34\nelection nov 0.911871 37\nelection palestine 0.907705 31\nelection november 0.896156 46"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'homework'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r1f297722592a09b0_0000014161f38d5f_27 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nhomework dorm 0.917931 31\nhomework lecture 0.905694 40\nhomework class 0.904165 41\nhomework college 0.903107 41\nhomework campus 0.898893 36"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'neighbor'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r404b8034f90b36b_0000014161f3932f_28 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nneighbor dog 0.987744 40\nneighbor apartment 0.985745 37\nneighbor town 0.984668 41\nneighbor tried 0.98371 41\nneighbor edge 0.983477 39"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": "query = base_query % 'volleyball'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\rWaiting on bqjob_r2847fa7a71dd67b4_0000014161f398db_29 ... (0s) Current status: DONE "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\na_word b_word corr c\n---------- -------- -------- ---\nvolleyball olympic 0.843229 31\nvolleyball olympics 0.835825 33\nvolleyball athletes 0.805992 33\nvolleyball rises 0.803989 32\nvolleyball ceremony 0.770325 33"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 30
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Note that by the nature of this dataset and looking for correlations by month by month, the best results come up when trying seasonal words that repeat year after year. "
}
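{
"cell_type": "markdown",
"metadata": {},
"source": "To eyeball that seasonality yourself, here is a sketch (not part of the original analysis) that pulls one word's monthly timeline into a pandas DataFrame. It reuses the client, table, and word-extraction regexp from above; the word 'christmas' is an arbitrary pick."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Sketch: fetch one word's monthly timeline to inspect its seasonality.\n# Reuses the client and table from above; 'christmas' is an arbitrary pick.\nquery = \"\"\"\nSELECT UTC_USEC_TO_MONTH(TIMESTAMP_TO_USEC(created_utc)) month, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nWHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") = 'christmas'\nGROUP EACH BY 1\nORDER BY 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\ndf = pd.DataFrame(data, columns=[x['name'] for x in fields])\nprint df",
"language": "python",
"metadata": {},
"outputs": []
}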
],
"metadata": {}
}
]
}