Last active: December 23, 2015 23:09
{ | |
"metadata": { | |
"name": "bigquery discovering data" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "import bq\nimport datetime\nimport pandas as pd\nimport tabulate\nclient = bq.Client.Get()", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "We'll play with the [bigquery-samples:reddit.full] dataset.\n\nFirst question is: How big is this dataset?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT COUNT(*) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r6927b004530b87f9_0000014161ef1857_2 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n c\n-------\n2267098" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "2,267,098 rows! Each representing one submission. What columns do we have?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "schema = client.GetTableSchema({'projectId': 'bigquery-samples', 'datasetId': 'reddit', 'tableId': 'words'})\nprint tabulate.tabulate([[x['name'], x['type']] for x in schema['fields']])\n\n", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "------------ ---------\nranked INTEGER\nsubr STRING\nwordpos INTEGER\ncreated_utc TIMESTAMP\nscore INTEGER\ndomain STRING\nid STRING\ntitle STRING\nauthor STRING\nups INTEGER\ndowns INTEGER\nnum_comments INTEGER\ntotal INTEGER\n------------ ---------\n" | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Let's explore this dataset.\n\nCOUNT(DISTINCT x) is a fast, approximate counter of distinct values.\n\nTip: If you need a precise count, you can raise the approximation threshold with a second argument, COUNT(DISTINCT x, n), or use GROUP EACH BY in a subquery." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT COUNT(DISTINCT subr) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r72978a08a67354ae_0000014161ef2ebf_3 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n c\n----\n2526" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "COUNT(DISTINCT) reports 2,526 different 'subr' (subreddits) in this dataset, but remember that's an approximation. Can we get a precise count?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT COUNT(DISTINCT subr, 5000) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n\n\nquery = \"\"\"\nSELECT COUNT(*) c FROM (SELECT subr FROM [bigquery-samples:reddit.full] GROUP BY 1)\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])\n", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_rfee8101457737db_0000014161ef398c_4 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n c\n----\n2499" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n\rWaiting on bqjob_r672b541ac2d01a42_0000014161ef3e7e_5 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n c\n----\n2499" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Another cool, fast, approximate function is TOP(). With it we can quickly find the most repeated values in a column." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT TOP(author) author, COUNT(*) c FROM [bigquery-samples:reddit.full]\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_ra0906b7992e6d7e_0000014161ef4db5_6 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nauthor c\n------------------ -----\n 77143\nanutensil 4119\natticus138 2866\ngst 1850\nmepper 1482\ndavidreiss666 1436\nNickster79 1324\nkirbyfood 1262\nAschebescher 1255\nsoupyhands 1160\nMind_Virus 1143\ncoconutwaters 1118\nFenrirIII 1039\ntone_is_everything 1005\nrising_threads_bot 999\nPoliticBot 998\nserendipitybot 996\nmarquis_of_chaos 983\ntabledresser 954\ncongressbot 947" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "So among the dataset's 2,267,098 top posts, this tells us that some users have authored a great many of them.\n\nCheck them out on http://www.reddit.com/user/anutensil, http://www.reddit.com/user/mepper, http://www.reddit.com/user/davidreiss666, etc.\n\nWe also see that many posts have no author. Maybe they come from deleted accounts?" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "What are the top-voted submissions?\n\nLEFT(x, 80) truncates the title so the table doesn't overflow." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, score, LEFT(title, 80)\nFROM [bigquery-samples:reddit.full]\nORDER BY 2 DESC\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r7aa1cec801843bc3_0000014161ef5be0_7 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr score f0_\n------ ------- --------------------------------------------------------------------------------\npics 21868 test post please ignore\nfunny 21304 The Bus Knight\nfunny 18793 I live in the same valley as Adam West. I decided to look him up in the phone bo\nvideos 17369 This guy is a reporter on Fox 2 here in Detroit. His name is Charlie LeDuff. He\nfunny 14883 After searching FB for people with the same name as me, I'd replicate their prof\nIAmA 14749 I am Barack Obama, President of the United States -- AMA\npics 14568 I sent Tom Hanks a 1934 Smith Corona typewriter with a typed invitation to come\nfunny 14148 Babe caught me slippin'.\nfunny 12962 I hate my job...\nfunny 12942 When I found out I could upvote by pressing 'A'" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Funny how a post titled \"please ignore\" is the highest-scored post of all time.\n\nWhich subreddits get the most votes?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, SUM(score) sum_score\nFROM [bigquery-samples:reddit.full]\nGROUP BY 1\nORDER BY 2 DESC\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r392994d37bd93528_0000014161ef69fa_8 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr sum_score\n----------------- -----------\npics 3752388\nfunny 3713684\nvideos 3201899\npolitics 3157350\ntechnology 2894626\nworldnews 2876818\naww 2857179\nWTF 2665757\ntodayilearned 2636914\ngaming 2633408\ngifs 2615749\nAdviceAnimals 2614179\nmildlyinteresting 2487454\n4chan 2443942\nreactiongifs 2320331\nscience 2320319\ncringepics 2320307\natheism 2254504\ntrees 2239064\nAskReddit 2198059" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Interesting. Now, what is the #1 post for each subreddit? That calls for window functions; let's do a simple example first:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date\nFROM [bigquery-samples:reddit.full]\nWHERE subr = 'pics'\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r1b63d608e75fe8b_0000014161ef76d8_9 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr rank title date\n------ ------ ---------------------------------------- ----------\npics 1 test post please ignore 2009-07-18\npics 2 I sent Tom Hanks a 1934 Smith Corona typ 2012-10-09\npics 3 My friend calls him \"Mr Ridiculously Pho 2012-04-03\npics 4 The only way my 3 day-old daughter would 2012-12-28\npics 5 Photo I took of the Chicago skyline from 2012-07-13\npics 6 This is called humanity. 2011-07-22\npics 7 The frog that has lived in my shower for 2012-12-28\npics 8 Came across an overgrown cemetery in the 2013-03-12\npics 9 Tried taking a picture of a sink drainin 2012-11-12\npics 10 Poster ad for the Canadian Paralympics 2012-05-04\npics 11 NY Cop giving a brand new pair boots to 2012-11-28\npics 12 I'm 7 foot. For Halloween I went as a no 2012-04-11\npics 13 I wish I was a dog just so I could wear 2012-08-09\npics 14 Dropped my camera while taking a multish 2012-12-28\npics 15 Totally worth battling traffic, the cold 2012-06-07\npics 16 Photo of a North Korean family being del 2012-08-19\npics 17 Pipe Cleaner Wolf 2013-06-07\npics 18 Just discovered Lost Egyptian city found 2013-06-08\npics 19 Kinetic light sculptures created by Brit 2013-04-08\npics 20 My friend and his daughter dressed as a 2012-10-28" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "So RANK() adds a column with each post's position according to the given ordering. Now we can use an outer query to keep only the posts with rank = 1." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n)\nWHERE rank = 1\nLIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r2f14af0a937fa494_0000014161ef8276_10 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n0x10c The reason for slow 2012-08-16 430\n2007scape What Rank 235,661 cooking looks like 2013-03-09 463\n24hoursupport My reddit account of 3+ years (and other 2013-02-08 69\n30ROCK No big deal. Just a picture of Tina Fey 2013-02-01 1874\n3DS Quick question. Don't upvote please. 2013-02-04 575\n3Dprinting We 3D Printed someone's face 2013-07-17 259\n3FrameMovies [3FM] The Dark Knight Trilogy 2012-09-27 744\n3amjokes Why was the soldier pinned down? 2013-05-15 518\n45thworldproblems [PR_PHECY: 10b/2m/1o] The atomic chorus 2012-12-11 141\n49ers Update: The downvote button is now a Rav 2013-01-22 868\n4chan It's my gf 2012-12-28 7237\n4x4 While I was eating breakfast at my hotel 2013-07-20 217\n52weeksofcooking Official Weekly Challenge List 2011-11-29 108\n90daysgoal My motivation 2011-10-07 65\n90sHipHop The Pharcyde - Passin' Me By 2013-01-23 113\n90scartoons Grandpa Lou doesn't mess around 2012-11-17 208\n911truth Most Convincing Explanation for Why 9/11 2011-11-21 111\nABraThatFits What I picture every time I visit this s 2013-06-25 649\nADHD The most accurate representation of my d 2013-03-04 743\nAFL I am Mitch Robinson from the Carlton Foo 2013-08-15 308" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Can we look at only the top 20 subreddits? Easy with another subquery and the IN keyword:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n WHERE subr IN (\n SELECT subr FROM (\n SELECT subr, SUM(score) sum_score\n FROM [bigquery-samples:reddit.full]\n GROUP BY 1\n ORDER BY 2 DESC\n LIMIT 20))\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r5d6091dd80d946b5_0000014161f5776f_30 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 31 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Instead of writing the same long sub-query every time, it's more efficient to save the results of the top-subreddits query to a temporary table.\n" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\n SELECT subr, SUM(score) sum_score\n FROM [bigquery-samples:reddit.full]\n GROUP BY 1\n ORDER BY 2 DESC\n LIMIT 20\n\"\"\"\nfields, data = client.ReadSchemaAndRows(\n client.Query(query, destination_table='fh-bigquery:reddit.top20', write_disposition='WRITE_TRUNCATE')['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r1332d487b3a199af_0000014161f6fe7a_34 ... (0s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r1332d487b3a199af_0000014161f6fe7a_34 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr sum_score\n----------------- -----------\npics 3752388\nfunny 3713684\nvideos 3201899\npolitics 3157350\ntechnology 2894626\nworldnews 2876818\naww 2857179\nWTF 2665757\ntodayilearned 2636914\ngaming 2633408\ngifs 2615749\nAdviceAnimals 2614179\nmildlyinteresting 2487454\n4chan 2443942\nreactiongifs 2320331\nscience 2320319\ncringepics 2320307\natheism 2254504\ntrees 2239064\nAskReddit 2198059" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 35 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Now the previous query gets simpler:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT subr, title, date, score FROM (\n SELECT subr, RANK() OVER(PARTITION BY subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full]\n WHERE subr IN (\n SELECT subr FROM [fh-bigquery:reddit.top20])\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r2d934f3c94d36a4_0000014161f71c34_35 ... (0s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r2d934f3c94d36a4_0000014161f71c34_35 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nsubr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 36 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The IN keyword is equivalent to doing a JOIN:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT a.subr, title, date, score FROM (\n SELECT a.subr, RANK() OVER(PARTITION BY a.subr ORDER BY score desc) rank, LEFT(title, 40) title, DATE(created_utc) date, score\n FROM [bigquery-samples:reddit.full] a\n JOIN [fh-bigquery:reddit.top20] b\n ON a.subr = b.subr\n)\nWHERE rank = 1\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_re5ab3e58cec13c3_0000014161f5be38_33 ... (0s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_re5ab3e58cec13c3_0000014161f5be38_33 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_subr title date score\n----------------- ---------------------------------------- ---------- -------\n4chan It's my gf 2012-12-28 7237\nAdviceAnimals Every time I talk on the phone 2012-11-28 7866\nAskReddit Throwaway time! What's your secret that 2012-05-01 12914\nWTF HOLY SHIT THE ONION 2012-09-13 9409\natheism Hurt me good r/atheism, $.50 to Doctors 2011-12-03 6303\naww My local humane society posts pictures o 2012-09-16 10813\ncringepics Text Me. 2012-12-29 10502\nfunny The Bus Knight 2012-07-28 21304\ngaming I've been playing the same game of Civil 2012-06-12 12060\ngifs 24 Hz sound + 24 fps camera + water = ma 2013-03-12 7095\nmildlyinteresting Buried a friend knee deep in sand. When 2013-04-09 7353\npics test post please ignore 2009-07-18 21868\npolitics Obama wins the Presidency! 2008-11-05 8535\nreactiongifs When the computer freezes while watching 2012-12-27 7951\nscience Mars Science Laboratory Curiosity has la 2012-08-06 5785\ntechnology \"Dear Google Fiber: Please, please, plea 2012-11-28 9486\ntodayilearned TIL that after needing 13 liters of bloo 2013-01-11 8198\ntrees Can't believe I met [10] Guy today 2012-07-01 9759\nvideos This guy is a reporter on Fox 2 here in 2012-10-16 17369\nworldnews \"What we have... is... concrete proof of 2013-06-08 8797" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 34 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Now let's play with words. The table [bigquery-samples:reddit.words] has a row for each word in a title, making it easier to analyze individual words.\n\nCan we count how many times a word appeared in a top title each day?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT title, DATE(created_utc), COUNT(*)\nFROM [bigquery-samples:reddit.words]\nGROUP BY 1, 2\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (0s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (1s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (2s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (3s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (4s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (5s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (6s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (8s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (9s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (10s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r4cda068db9995646_0000014161efcb11_15 ... (10s) Current status: DONE " | |
}, | |
{ | |
"ename": "BigqueryInvalidQueryError", | |
"evalue": "Error processing job 'data-sensing-lab:bqjob_r4cda068db9995646_0000014161efcb11_15': Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby", | |
"output_type": "pyerr", | |
"traceback": [ | |
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mBigqueryInvalidQueryError\u001b[0m Traceback (most recent call last)", | |
"\u001b[1;32m<ipython-input-16-aec099f5b94e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[0mLIMIT\u001b[0m \u001b[1;36m10\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 6\u001b[0m \"\"\"\n\u001b[1;32m----> 7\u001b[1;33m \u001b[0mfields\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mReadSchemaAndRows\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mQuery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'configuration'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'destinationTable'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_rows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1000\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 8\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mtabulate\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'name'\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mfields\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mQuery\u001b[1;34m(self, query, destination_table, create_disposition, write_disposition, priority, preserve_nulls, allow_large_results, dry_run, use_cache, **kwds)\u001b[0m\n\u001b[0;32m 1678\u001b[0m \u001b[0mrequest\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mquery_config\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1679\u001b[0m \u001b[0m_ApplyParameters\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdry_run\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdry_run\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1680\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mExecuteJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1681\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1682\u001b[0m def Load(self, destination_table_reference, source,\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mExecuteJob\u001b[1;34m(self, configuration, sync, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1450\u001b[0m job = self.RunJobSynchronously(\n\u001b[0;32m 1451\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mproject_id\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mproject_id\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mupload_file\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mupload_file\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1452\u001b[1;33m job_id=job_id)\n\u001b[0m\u001b[0;32m 1453\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1454\u001b[0m job = self.StartJob(\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRunJobSynchronously\u001b[1;34m(self, configuration, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1439\u001b[0m \u001b[0mjob_reference\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mBigqueryClient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mConstructObjectReference\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1440\u001b[0m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mWaitJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mjob_reference\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1441\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mRaiseIfJobError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1442\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1443\u001b[0m def ExecuteJob(self, configuration, sync=None,\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRaiseIfJobError\u001b[1;34m(job)\u001b[0m\n\u001b[0;32m 816\u001b[0m raise BigqueryError.Create(\n\u001b[0;32m 817\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror_ls\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 818\u001b[1;33m job_ref=BigqueryClient.ConstructObjectReference(job))\n\u001b[0m\u001b[0;32m 819\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mjob\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 820\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;31mBigqueryInvalidQueryError\u001b[0m: Error processing job 'data-sensing-lab:bqjob_r4cda068db9995646_0000014161efcb11_15': Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "It fails! There are too many distinct groups for GROUP BY to handle. But there is a workaround: GROUP EACH BY. It introduces some limitations on queries, but allows much larger groupings." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT title, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nGROUP EACH BY 1, 2\nHAVING c > 2\nORDER BY 2 DESC, 3\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r5b50db9b3c30c23e_0000014161f023db_16 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\ntitle date c\n------- ---------- ---\ni 2013-08-20 3\n2 2013-08-20 3\nwhy 2013-08-20 3\ngo 2013-08-20 3\nMy 2013-08-20 3\nA 2013-08-20 3\nIf 2013-08-20 3\nwho 2013-08-20 3\nThis 2013-08-20 3\nit's 2013-08-20 3" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "As we count words, a problem appears: they are not normalized, so capitalization and punctuation create separate entries for the same word. Luckily, BigQuery can also run regular expressions over massive datasets with ease:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nWHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\nGROUP EACH BY 1, 2\nHAVING c > 2\nORDER BY 2 DESC, 3\nLIMIT 10\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r3a31e6a53d706d7_0000014161f02dd3_17 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nword date c\n---------- ---------- ---\nyear 2013-08-20 3\nstill 2013-08-20 3\nhad 2013-08-20 3\nsql 2013-08-20 3\nmap 2013-08-20 3\nexperience 2013-08-20 3\nhere 2013-08-20 3\nreally 2013-08-20 3\nnow 2013-08-20 3\nlong 2013-08-20 3" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 18 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "(The somewhat elaborate regular expression allows punctuation that traditionally appears inside words, such as apostrophes and hyphens.)\n\nLet's look at the timeline of the word 'sql' in reddit's top posts:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\nFROM [bigquery-samples:reddit.words]\nWHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\nGROUP EACH BY 1, 2\nHAVING word = 'sql'\nORDER BY 2 DESC\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r4dd564868c41fdda_0000014161f03b3c_18 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\nword date c\n------ ---------- ---\nsql 2013-08-20 3\nsql 2013-08-19 1\nsql 2013-08-17 1\nsql 2013-08-16 3\nsql 2013-08-15 1\nsql 2013-08-14 4\nsql 2013-08-13 1\nsql 2013-08-12 3\nsql 2013-08-11 1\nsql 2013-08-09 4\nsql 2013-08-08 2\nsql 2013-08-07 1\nsql 2013-08-06 3\nsql 2013-08-05 1\nsql 2013-08-04 2\nsql 2013-07-31 4\nsql 2013-07-30 2\nsql 2013-07-29 2\nsql 2013-07-28 1\nsql 2013-07-25 1\nsql 2013-07-23 1\nsql 2013-07-19 2\nsql 2013-07-18 1\nsql 2013-07-16 3\nsql 2013-07-14 1\nsql 2013-07-13 3\nsql 2013-07-12 2\nsql 2013-07-11 3\nsql 2013-07-10 3\nsql 2013-07-09 1" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 19 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Now let's try something fun: can we find which other words have behaved like 'sql' within the top posts?\n\nFirst, we'll join this table to itself:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT a.word, b.word, a.date, a.c, b.c \nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n HAVING word = 'sql'\n) a\nJOIN (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (0s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (157s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (158s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (159s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (160s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (161s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (162s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (163s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (164s) Current status: RUNNING" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": " \rWaiting on bqjob_r5b325e72a15a0bfd_0000014161f0448d_19 ... (164s) Current status: DONE " | |
}, | |
{ | |
"ename": "BigqueryInvalidQueryError", | |
"evalue": "Error processing job 'data-sensing-lab:bqjob_r5b325e72a15a0bfd_0000014161f0448d_19': Table too large for JOIN. Consider using JOIN EACH. For more details, https://developers.google.com/bigquery/docs/query-reference#joins", | |
"output_type": "pyerr", | |
"traceback": [ | |
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mBigqueryInvalidQueryError\u001b[0m Traceback (most recent call last)", | |
"\u001b[1;32m<ipython-input-20-729781de8f2e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 17\u001b[0m \u001b[0mLIMIT\u001b[0m \u001b[1;36m30\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 18\u001b[0m \"\"\"\n\u001b[1;32m---> 19\u001b[1;33m \u001b[0mfields\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mReadSchemaAndRows\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mclient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mQuery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'configuration'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'destinationTable'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmax_rows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1000\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 20\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mtabulate\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtabulate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'name'\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mfields\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mQuery\u001b[1;34m(self, query, destination_table, create_disposition, write_disposition, priority, preserve_nulls, allow_large_results, dry_run, use_cache, **kwds)\u001b[0m\n\u001b[0;32m 1678\u001b[0m \u001b[0mrequest\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;34m'query'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[0mquery_config\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1679\u001b[0m \u001b[0m_ApplyParameters\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdry_run\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdry_run\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1680\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mExecuteJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1681\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1682\u001b[0m def Load(self, destination_table_reference, source,\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mExecuteJob\u001b[1;34m(self, configuration, sync, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1450\u001b[0m job = self.RunJobSynchronously(\n\u001b[0;32m 1451\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mproject_id\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mproject_id\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mupload_file\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mupload_file\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1452\u001b[1;33m job_id=job_id)\n\u001b[0m\u001b[0;32m 1453\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1454\u001b[0m job = self.StartJob(\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRunJobSynchronously\u001b[1;34m(self, configuration, project_id, upload_file, job_id)\u001b[0m\n\u001b[0;32m 1439\u001b[0m \u001b[0mjob_reference\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mBigqueryClient\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mConstructObjectReference\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1440\u001b[0m \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mWaitJob\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mjob_reference\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1441\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mRaiseIfJobError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1442\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1443\u001b[0m def ExecuteJob(self, configuration, sync=None,\n", | |
"\u001b[1;32m/usr/local/lib/python2.7/dist-packages/bigquery-2.0.14-py2.7.egg/bigquery_client.pyc\u001b[0m in \u001b[0;36mRaiseIfJobError\u001b[1;34m(job)\u001b[0m\n\u001b[0;32m 816\u001b[0m raise BigqueryError.Create(\n\u001b[0;32m 817\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0merror_ls\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 818\u001b[1;33m job_ref=BigqueryClient.ConstructObjectReference(job))\n\u001b[0m\u001b[0;32m 819\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mjob\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 820\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", | |
"\u001b[1;31mBigqueryInvalidQueryError\u001b[0m: Error processing job 'data-sensing-lab:bqjob_r5b325e72a15a0bfd_0000014161f0448d_19': Table too large for JOIN. Consider using JOIN EACH. For more details, https://developers.google.com/bigquery/docs/query-reference#joins" | |
] | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 20 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "That didn't work. We can only JOIN if the second table is small enough. Luckily, we can use JOIN EACH for arbitrarily big tables.\n" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = \"\"\"\nSELECT a.word, b.word, a.date, a.c, b.c \nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n HAVING word = 'sql'\n LIMIT 10\n) a\nJOIN EACH (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, DATE(created_utc) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nWHERE a.word != b.word\nAND b.word > 'a'\nORDER BY 2, 3 DESC\nLIMIT 30\n\"\"\"\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r668abdc1c6cb4bc1_0000014161f35c27_20 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word a_date a_c b_c\n-------- --------- ---------- ----- -----\nsql a-18 2013-06-11 2 1\nsql a-basin 2012-07-03 2 1\nsql a-listers 2012-11-15 2 1\nsql a-rod 2012-10-08 3 1\nsql a-z 2013-06-11 2 1\nsql a10 2013-02-04 2 1\nsql a1c 2013-06-11 2 1\nsql a1c 2012-11-15 2 1\nsql a6m5 2012-11-15 2 1\nsql aaa 2013-02-04 2 1\nsql aaa 2011-12-27 1 1\nsql aaaand 2012-07-03 2 1\nsql aaaawwww 2012-11-15 2 1\nsql aaah 2012-10-08 3 1\nsql aaahh 2013-06-11 2 1\nsql aalegends 2012-10-08 3 2\nsql aan 2013-06-11 2 1\nsql aapa 2012-10-08 3 1\nsql aaron 2013-06-11 2 1\nsql aaron 2013-02-04 2 1\nsql aaron 2012-10-08 3 1\nsql aarseth 2012-03-26 1 1\nsql ab-soul 2013-02-04 2 1\nsql ab-soul 2012-11-15 2 1\nsql abandon 2013-06-11 2 1\nsql abandon 2013-02-04 2 1\nsql abandon 2012-10-08 3 1\nsql abandoned 2013-06-11 2 2\nsql abandoned 2013-02-04 2 8\nsql abandoned 2012-11-15 2 6" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 21 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "You might be wondering what I'm doing here: Creating timelines of how many times a word has been mentioned, and putting their count side by side with a JOIN. Once I have that, I can look for correlations: Words that have behaved in the same way." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "base_query = \"\"\"\nSELECT a.word, b.word, CORR(a.c, b.c) corr, COUNT(*) c\nFROM (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, UTC_USEC_TO_MONTH(TIMESTAMP_TO_USEC(created_utc)) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") = '%s'\n GROUP EACH BY 1, 2\n) a\nJOIN EACH (\n SELECT REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") word, UTC_USEC_TO_MONTH(TIMESTAMP_TO_USEC(created_utc)) date, COUNT(*) c\n FROM [bigquery-samples:reddit.words]\n WHERE REGEXP_EXTRACT(LOWER(title), \"([a-z0-9][a-z-0-9]+[';-]?[a-z0-9]+)\") != ''\n GROUP EACH BY 1, 2\n) b\nON a.date=b.date\nWHERE a.word != b.word\nGROUP BY 1, 2\nHAVING c > 30\nORDER BY 3 DESC\nLIMIT 5\n\"\"\"\nquery = base_query % 'sql'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r5b346ff48c1cfff4_0000014161f366c2_21 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nsql server 0.962023 53\nsql david 0.959003 56\nsql policy 0.956993 55\nsql reality 0.95693 53\nsql across 0.95687 56" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 22 | |
}, | |
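{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "As a side note, pandas was imported at the top but hasn't been used yet. As a sketch (not part of the original analysis), the rows returned by any of these queries can be wrapped in a DataFrame for local exploration:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "# Sketch: wrap the last query's rows in a pandas DataFrame.\n# Assumes `fields` and `data` from the previous cell are still in scope.\ndf = pd.DataFrame(data, columns=[x['name'] for x in fields])\nprint df.head()", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |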
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'religion'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r2ee255a470a0edbb_0000014161f36d12_22 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nreligion brain 0.978323 61\nreligion maps 0.972462 50\nreligion network 0.970603 58\nreligion society 0.969936 56\nreligion human 0.969745 61" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 23 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'holidays'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r56e55315cff890ef_0000014161f37461_23 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nholidays christmas 0.994448 31\nholidays santa 0.990515 33\nholidays holiday 0.968293 33\nholidays gifts 0.942386 31\nholidays presents 0.859437 31" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 24 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'pizza'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r19f4aff7d827e811_0000014161f37b6d_24 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- -------- -------- ---\npizza great 0.988512 48\npizza office 0.986625 47\npizza food 0.986477 48\npizza making 0.98625 48\npizza better 0.986143 47" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 25 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'crazy'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r6c83abe5755ac7dc_0000014161f38240_25 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- -------- -------- ---\ncrazy coming 0.995498 62\ncrazy girl 0.995198 62\ncrazy used 0.995093 60\ncrazy john 0.99475 61\ncrazy fire 0.994627 57" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 26 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'election'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r1c170cbe5d0407c2_0000014161f38800_26 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nelection gaza 0.937955 40\nelection polling 0.912526 34\nelection nov 0.911871 37\nelection palestine 0.907705 31\nelection november 0.896156 46" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 27 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'homework'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r1f297722592a09b0_0000014161f38d5f_27 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- -------- -------- ---\nhomework dorm 0.917931 31\nhomework lecture 0.905694 40\nhomework class 0.904165 41\nhomework college 0.903107 41\nhomework campus 0.898893 36" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 28 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'neighbor'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r404b8034f90b36b_0000014161f3932f_28 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n-------- --------- -------- ---\nneighbor dog 0.987744 40\nneighbor apartment 0.985745 37\nneighbor town 0.984668 41\nneighbor tried 0.98371 41\nneighbor edge 0.983477 39" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 29 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "query = base_query % 'volleyball'\nfields, data = client.ReadSchemaAndRows(client.Query(query)['configuration']['query']['destinationTable'], max_rows = 1000)\nprint tabulate.tabulate(data, headers=[x['name'] for x in fields])", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\rWaiting on bqjob_r2847fa7a71dd67b4_0000014161f398db_29 ... (0s) Current status: DONE " | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\na_word b_word corr c\n---------- -------- -------- ---\nvolleyball olympic 0.843229 31\nvolleyball olympics 0.835825 33\nvolleyball athletes 0.805992 33\nvolleyball rises 0.803989 32\nvolleyball ceremony 0.770325 33" | |
}, | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "\n" | |
} | |
], | |
"prompt_number": 30 | |
}, | |
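{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The last several cells repeat the same three lines with a different word each time. A small helper function (not in the original notebook, just a sketch over the same client calls) captures the pattern:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "# Hypothetical helper wrapping the query-and-print pattern used above.\ndef correlated_words(word):\n    query = base_query % word\n    table = client.Query(query)['configuration']['query']['destinationTable']\n    fields, data = client.ReadSchemaAndRows(table, max_rows = 1000)\n    print tabulate.tabulate(data, headers=[x['name'] for x in fields])\n\ncorrelated_words('sql')", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |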
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Note that by the nature of this dataset and looking for correlations by month by month, the best results come up when trying seasonal words that repeat year after year. " | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |