jiankuang/ComputeEngine.md

## bigquery.md

      
### Concepts

#### Tables

- Native tables
- External tables
- Views

BigQuery views are logical views, not materialized views: the query that defines a view is re-executed every time the view is queried.
A view can only reference tables and other views in the same dataset location.
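
The logical-view behaviour can be sketched locally. This block uses Python's built-in `sqlite3` as a stand-in for BigQuery (SQLite views are also logical views), and the `words` table and `th_words` view are hypothetical. Because a view stores only its query, a row inserted into the base table after the view was created still shows up the next time the view is queried.

```python
import sqlite3


def logical_view_demo():
    """Return the view's row count before and after the base table changes."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical base table and view (SQLite stand-ins, not BigQuery objects).
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.execute(
        "CREATE VIEW th_words AS SELECT word FROM words WHERE word LIKE '%th%'"
    )
    conn.executemany("INSERT INTO words VALUES (?)", [("the",), ("cat",)])
    before = conn.execute("SELECT COUNT(*) FROM th_words").fetchone()[0]

    # The view stores only its SELECT; querying it re-runs that SELECT,
    # so this new row is visible without refreshing anything.
    conn.execute("INSERT INTO words VALUES ('this')")
    after = conn.execute("SELECT COUNT(*) FROM th_words").fetchone()[0]
    conn.close()
    return before, after


print(logical_view_demo())  # (1, 2)
```

In BigQuery the practical consequence is cost: each query against a view is billed for the underlying query it re-executes.
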


#### Storage

BigQuery stores data in Capacitor, its columnar storage format.

#### Slots

Under the hood, analytics throughput is measured in BigQuery slots. A BigQuery slot is a unit of computational capacity required to execute SQL queries. BigQuery automatically calculates how many slots are required by each query, depending on query size and complexity.

To check how many slots your account uses, see Monitoring BigQuery Using Stackdriver.
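
Slot usage for a finished query can be estimated with simple arithmetic. A sketch, assuming you already have the job's total slot-milliseconds (reported as `totalSlotMs` in the job statistics) and its start and end times in milliseconds: the average number of slots is slot-milliseconds divided by elapsed milliseconds. The numbers in the example are made up.

```python
def average_slots(total_slot_ms: int, start_ms: int, end_ms: int) -> float:
    """Average number of slots held over the job's wall-clock duration."""
    elapsed_ms = end_ms - start_ms
    if elapsed_ms <= 0:
        raise ValueError("end_ms must be later than start_ms")
    return total_slot_ms / elapsed_ms


# Hypothetical job: 600,000 slot-ms spread over 10 seconds.
print(average_slots(600_000, 0, 10_000))  # 60.0
```
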
### Loading Data

#### Preparing Data for Loading

##### Loading denormalized data

In BigQuery you should typically denormalize your data to get the fastest queries. JOINs are possible, and workable on small datasets, but they are not as performant as querying a denormalized structure.
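
A minimal sketch of denormalizing before a load, assuming hypothetical `customers` and `orders` records that would otherwise need a JOIN. Each customer row embeds its orders as a repeated nested field (a `RECORD` column in `REPEATED` mode), and the rows are written as newline-delimited JSON, a format `bq load` accepts. Flattening (repeating the customer fields on every order row) would be the other way to denormalize.

```python
import json
from collections import defaultdict


def denormalize(customers, orders):
    """Nest each customer's orders inside the customer row, so no JOIN is needed later."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(
            {"order_id": order["order_id"], "sku": order["sku"], "qty": order["qty"]}
        )
    rows = []
    for customer_id, customer in sorted(customers.items()):
        rows.append(
            {
                "customer_id": customer_id,
                "name": customer["name"],
                "orders": orders_by_customer.get(customer_id, []),
            }
        )
    return rows


def to_ndjson(rows):
    """One JSON object per line, as used for NEWLINE_DELIMITED_JSON loads."""
    return "".join(json.dumps(row) + "\n" for row in rows)


# Hypothetical normalized source data.
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
orders = [
    {"order_id": 10, "customer_id": 1, "sku": "A", "qty": 2},
    {"order_id": 11, "customer_id": 1, "sku": "B", "qty": 1},
    {"order_id": 12, "customer_id": 2, "sku": "A", "qty": 5},
]
print(to_ndjson(denormalize(customers, orders)), end="")
```
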
##### Transforming XML data

When dealing with large XML files, don't use a DOM-based parser: it tries to load the entire file into memory as a tree.

Instead, we suggest a pull-based parser or a SAX parser, which reads the file incrementally and is far more memory-efficient. Parsing a large XML file this way is simple in Python.
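
A minimal sketch using `xml.etree.ElementTree.iterparse` from Python's standard library, an incremental, event-driven parser rather than a DOM. It assumes each record is a direct child of the root element with simple child elements as fields; the `<rows>`/`<row>` names are hypothetical. Clearing the root after each record keeps memory bounded regardless of file size, and each record is emitted as one line of newline-delimited JSON.

```python
import io
import json
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element without keeping the whole tree.

    Assumes records are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            root.clear()  # discard records that were already processed


# Hypothetical input; a file path or open file works in place of BytesIO.
xml_bytes = (
    b"<rows><row><id>1</id><name>a</name></row>"
    b"<row><id>2</id><name>b</name></row></rows>"
)
for record in iter_records(io.BytesIO(xml_bytes), "row"):
    print(json.dumps(record))
```
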
#### Import Data from Avro file

Avro files are self-describing, so BigQuery generates the table schema automatically from the schema embedded in the file.
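### Querying Data

#### Legacy SQL

##### Legacy SQL & Standard SQL

To enable standard SQL in the BigQuery web UI: click Compose Query, click Show Options, then uncheck the Use Legacy SQL checkbox. From the command line, pass `--use_legacy_sql=false` to `bq query`.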
##### Data Types

`STRING`, `BYTES`, `INTEGER`, `FLOAT`, `BOOLEAN`, `RECORD`, `TIMESTAMP`, `DATE`, `TIME`, `DATETIME`
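
These type names are also what a JSON schema file for `bq load` uses. A sketch that builds such a schema with Python's `json` module; the field names are hypothetical. Each entry has `name`, `type` and `mode` (`NULLABLE`, `REQUIRED` or `REPEATED`), and a `RECORD` carries its nested `fields`.

```python
import json


def field(name, field_type, mode="NULLABLE", fields=None):
    """One schema entry; nested `fields` is only set for RECORD columns."""
    entry = {"name": name, "type": field_type, "mode": mode}
    if fields is not None:
        entry["fields"] = fields
    return entry


# Hypothetical table schema covering several of the types listed above.
schema = [
    field("word", "STRING", mode="REQUIRED"),
    field("word_count", "INTEGER"),
    field("score", "FLOAT"),
    field("is_valid", "BOOLEAN"),
    field("first_seen", "TIMESTAMP"),
    field(
        "sources",
        "RECORD",
        mode="REPEATED",
        fields=[field("corpus", "STRING"), field("corpus_date", "DATE")],
    ),
]

print(json.dumps(schema, indent=2))
```

Saved as a file such as `schema.json`, it can be passed as the third positional argument, e.g. `bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable data.json schema.json` (dataset, table and file names are placeholders).
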
##### TOP() examples

TOP() produces the same kind of result as GROUP BY ... ORDER BY ... LIMIT and executes much faster, but it may return approximate results.

With TOP():

```sql
SELECT
  TOP(word, 10), COUNT(*)
FROM
  ds.Table
WHERE
  word CONTAINS 'th';
```


Equivalent query without TOP():

```sql
SELECT
  word, COUNT(*) AS cnt
FROM
  ds.Table
WHERE
  word CONTAINS 'th'
GROUP BY
  word
ORDER BY
  cnt DESC
LIMIT
  10;
```
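
To make the semantics concrete, a small Python sketch of what the GROUP BY ... ORDER BY ... LIMIT query computes, run over a hypothetical in-memory list of words with `collections.Counter`. It is an exact count and says nothing about how BigQuery executes TOP() internally.

```python
from collections import Counter


def top_words(words, substring="th", limit=10):
    """Count words containing `substring` and return the `limit` most frequent.

    Mirrors: WHERE word CONTAINS ... GROUP BY word ORDER BY cnt DESC LIMIT n.
    """
    counts = Counter(word for word in words if substring in word)
    return counts.most_common(limit)


# Hypothetical input rows.
words = ["the", "this", "the", "cat", "that", "the", "this"]
print(top_words(words, limit=2))  # [('the', 3), ('this', 2)]
```
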

### Table Decorators

#### Snapshot decorators

You can use snapshot decorators to undelete a table within 2 days of its deletion.
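
A snapshot decorator appends `@<time>` to a table name, where `<time>` is milliseconds since the Unix epoch, or `@-<offset>` for an offset in milliseconds relative to now. The sketch below only builds those strings and checks them against the 2-day window; the table names are hypothetical. It assumes the legacy undelete workflow of copying the snapshot to a new table with `bq cp`, and it prints the command rather than running it.

```python
import time

# Undelete window as stated in these notes, in milliseconds.
UNDELETE_WINDOW_MS = 2 * 24 * 60 * 60 * 1000


def absolute_snapshot(table, snapshot_ms, now_ms=None):
    """Return 'dataset.table@<ms since epoch>' if it is inside the window."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not now_ms - UNDELETE_WINDOW_MS <= snapshot_ms <= now_ms:
        raise ValueError("snapshot time is outside the undelete window")
    return f"{table}@{snapshot_ms}"


def relative_snapshot(table, ms_ago):
    """Return 'dataset.table@-<offset ms>' for a snapshot ms_ago in the past."""
    if not 0 < ms_ago <= UNDELETE_WINDOW_MS:
        raise ValueError("offset is outside the undelete window")
    return f"{table}@-{ms_ago}"


# Hypothetical restore: the table as it was one hour ago.
source = relative_snapshot("mydataset.mytable", 60 * 60 * 1000)
print(["bq", "cp", source, "mydataset.mytable_restored"])
```
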
### bq Command-Line Tool

```shell
bq load --autodetect <destination_table> <data_source_uri>
```

`--autodetect` enables automatic detection of the schema and options for formats, such as CSV and JSON, that are not self-describing. The default is `--noautodetect`.

Autodetection can even convert yes/no values to booleans (true/false).

For example:

```shell
bq load --autodetect healthcare.business_rules BusinessRules.csv
```
### The relationship between BigQuery and Google Analytics

- Google Analytics Premium can export website data into BigQuery.
- Not sure whether Google Analytics can import data from BigQuery.


## bigtable.md

      
### Cloud Bigtable and other storage options

- Google Cloud SQL: full SQL support for an online transaction processing (OLTP) system (not our use case)
- Google BigQuery: interactive querying in an online analytical processing (OLAP) system (maybe our use case)
- Google Cloud Storage: storage for immutable blobs larger than 10 MB, such as large images or movies (not our use case)
- Cloud Datastore: highly structured objects, or workloads that need ACID transactions and SQL-like queries (not sure)

  
## ComputeEngine.md

      
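### SSH

```shell
gcloud compute ssh jian_kuang@compute-engine-instance-with-bigtable-access
```

### Copy files

```shell
gcloud compute copy-files Anaconda2-4.2.0-Linux-x86_64.sh jian_kuang@genia-datalab-compute:Anaconda2-4.2.0-Linux-x86_64.sh --zone us-central1-b
```

Newer gcloud releases deprecate `copy-files` in favor of `gcloud compute scp`, which uses the same `[USER@]INSTANCE:PATH` syntax.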

  
## ContainerEngine.md

      
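### Create a three-node Container Engine cluster

```shell
gcloud container clusters create guestbook --num-nodes 3
```

### Deploy NGINX in Kubernetes

```shell
kubectl run nginx --image=nginx --replicas=3
```

In kubectl 1.18 and later, `kubectl run` creates a bare pod instead of a deployment, so the `kubectl expose deployment` step below would find no deployment. On newer versions, create the deployment explicitly with `kubectl create deployment nginx --image=nginx --replicas=3` (the `--replicas` flag there needs kubectl 1.19 or later).

List the pods and the nodes they are scheduled on:

```shell
kubectl get pods -o wide
```

Expose the NGINX deployment as an external service:

```shell
kubectl expose deployment nginx --port=80 --target-port=80 --type=LoadBalancer
```

Check the service; its external IP shows as pending until the load balancer is provisioned:

```shell
kubectl get service nginx
```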