@leoricklin
Last active May 15, 2024 06:12
20221109

2.1.Industry solutions

2.2.Application modernization

2.3.Artificial intelligence

2.4.APIs and applications

2.5.Databases

2.6.Data cloud

2.7 Digital transformation

2.8 Infrastructure modernization

2.9 Productivity and collaboration

2.10 Security

Anomaly detection

Data monetization

Environmental, social, and governance

General analytics

Health care and life sciences

Log analytics

Pattern recognition

Predictive forecasting

Real-time clickstream analytics

Time series analytics

Working with data lakes

2.12.Startups and SMB

2.13.Featured partner solutions

3.Products

The official set of icons to build architectural diagrams of Google Cloud Platform

  • GPUs on Compute Engine
  • GPUs on Google Kubernetes Engine
  • Attaching GPUs to Dataproc clusters

Getting started

Big Data and analysis

Machine learning

Architectural diagrams

7.Events

Tune in every week to hear from our experts and partners who will help you discover how to get the most from Google Cloud.

Cloud OnBoard is a free online instructor-led training program that enables developers and IT professionals to expand their skill set into the cloud. Google Cloud Platform (GCP) Fundamentals Series brings the Google Cloud Community together for three consecutive days of interactive learning and hands-on labs. Choose one, two, or all three online half-day programs and take your skills to new heights:

  • Core Infrastructure
  • Big Data & Machine Learning
  • Application Development with GCP

8.Resources

  • Google Cloud Tech YouTube
  • Google Cloud Tech at Google I/O 2021, https://youtube.com/playlist?list=PLIivdWyY5sqL-MezjPxyWdB5L7iLNroUM
    • Developer Keynote (Google I/O '21) - American Sign Language
    • Go full-stack with Kotlin or Dart on Google Cloud | Session
    • Serverless demo derby | Session
    • AI/ML demo derby | Session
    • Build end-to-end solutions with Vertex AI | Session
      • Introducing Vertex AI
      • Streamlined model development
      • Productionizing models with pipelines
      • Monitoring deployed models
    • AI in the Cloud | Q&A
    • Secure and reliable development with Go | Q&A
    • Build voice bots for mobile with Dialogflow and Flutter | Workshop
    • Fuel your custom models on the Cloud AI Platform | Demo
    • ML Ops on Google Cloud | Demo
    • Strike a pose: Training a vision model on the AI Platform | Demo

9.Google Code

Google Developers Codelabs provide a guided, tutorial, hands-on coding experience. Most codelabs will step you through the process of building a small application, or adding a new feature to an existing application. They cover a wide range of topics such as Android Wear, Google Compute Engine, Project Tango, and Google APIs on iOS.

Colaboratory is a free Jupyter notebook environment that requires no setup to use and runs entirely in the cloud. With Colaboratory, you can write and execute code, save and share your analyses, and access powerful computing resources, all free of charge from your browser.

The framework was created by seasoned experts at Google Cloud, including customer engineers, solution architects, cloud reliability engineers, and members of the professional service organization. It consists of the following series of articles:

  • Overview (this article)
  • Google Cloud system design considerations
  • Operational excellence
  • Security, privacy, and compliance
  • Reliability
  • Performance and cost optimization

Each principle section provides details on strategies, best practices, design questions, recommendations, key Google Cloud services, and links to resources.

  • Use recommended tools and products

    The following table lists recommended tools and products for each phase of the ML workflow as outlined in this document:

Machine learning workflow step | Recommended tools and products
ML environment setup | Notebooks, Vertex SDK for Python
ML development | BigQuery, Cloud Storage, Notebooks, Vertex Data Labeling, Vertex Explainable AI, Vertex Feature Store, Vertex TensorBoard, Vertex Training
Data processing | BigQuery, Dataflow, Dataproc, Managed datasets, TensorFlow Extended
Operationalized training | Cloud Storage, PyTorch, TensorFlow Core, Vertex Feature Store, Vertex Pipelines, Vertex Training
Model deployment and serving | Vertex Prediction, ML workflow orchestration, Kubeflow Pipelines, TensorFlow Extended, Vertex Pipelines
Artifact organization | Artifact Registry
Model monitoring | Vertex Explainable AI, Vertex Model Monitoring
  • MLOps

    • CI/CD pipeline compared to CT pipeline: The availability of new data is one trigger to retrain the ML model. The availability of a new implementation of the ML pipeline (including new model architecture, feature engineering, and hyperparameters) is another important trigger to re-execute the ML pipeline. This new implementation of the ML pipeline serves as a new version of the model prediction service, for example, a microservice with a REST API for online serving. The difference between the two cases is as follows:

      • To train a new ML model with new data, the previously deployed CT pipeline is executed. No new pipelines or components are deployed; only a new prediction service or newly trained model is served at the end of the pipeline.

      • To train a new ML model with a new implementation, a new pipeline is deployed through a CI/CD pipeline. The following diagram shows the relationship between the CI/CD pipeline and the ML CT pipeline.

        Figure 1. CI/CD and ML CT pipelines.

    • Designing a TFX-based ML system

    • Orchestrating the ML system using Kubeflow Pipelines

    • Setting up CI/CD for ML on Google Cloud

      Figure 6: High-level overview of CI/CD for Kubeflow pipelines.

An ML model is useful only if it's deployed and ready to make predictions, but building an adapted ML serving system requires the following:

  • Knowing whether you need to provide predictions in real time or offline.
  • Balancing between model predictive performance and prediction latency.
  • Managing the input features required by the model in a low read-latency lookup store.
  • Knowing whether at least some of the predictions can be precomputed offline to be served online.

The article addresses these points in detail. It assumes that you're familiar with BigQuery, Dataflow, Cloud Storage, and AI Platform.

  • MLOps level 0: Manual process

  • MLOps level 1: ML pipeline automation

  • MLOps level 2: CI/CD pipeline automation

Figure 5. Stages of the CI/CD automated ML pipeline.

The service mesh control plane enables the proxies to perform the following functions:

  • Service discovery
  • Service routing
  • Load balancing
  • Authentication and authorization
  • Observability

Reference architecture

This section describes a conceptual architecture for building a passive data lineage system for a SQL-like data warehouse. You can implement the architecture in different ways.

Migrating data warehouses to BigQuery

The following diagram shows a high-level legacy architecture before the migration. It illustrates the catalog of available data sources, legacy data pipelines, legacy operational pipelines and feedback loops, and legacy BI reports and dashboards that are accessed by your end users.

<img src="https://cloud.google.com/architecture/dw2bq/images/dw-bq-migration-overview-architecture-before-migration.svg" style="display:block;margin-left:0;width:800px;"/>

During migration, you run both your legacy data warehouse and BigQuery, as detailed in this document. The reference architecture in the following diagram highlights that both data warehouses offer similar functionality and paths—both can ingest from the source systems, integrate with the business applications, and provide the required user access. Importantly, the diagram also highlights that data is synchronized from your data warehouse to BigQuery. This allows use cases to be offloaded during the entire duration of the migration effort.

<img src="https://cloud.google.com/architecture/dw2bq/images/dw-bq-migration-overview-architecture-during-migration.svg" style="display:block;margin-left:0;width:800px;"/>
  • Schema and data transfer overview
  • Data governance
  • Data pipelines
  • Reporting and analysis
  • Performance optimization

2.Resources

1.TensorFlow Developer Certificate

2.Google Cloud Certified - Professional Data Engineer

3.Cloud Architect

The Professional Cloud Architect certification exam assesses your ability to:

  • Design and plan a cloud solution architecture
  • Manage and provision the cloud solution infrastructure
  • Design for security and compliance
  • Analyze and optimize technical and business processes
  • Manage implementations of cloud architecture
  • Ensure solution and operations reliability

1.Designing and planning a cloud solution architecture

1.1.Designing a solution infrastructure that meets business requirements. Considerations include:

First Second
Business use cases and product strategy -
Cost optimization -
Supporting the application design -
Integration with external systems -
Movement of data -
Design decision trade-offs -
Build, buy, modify, or deprecate -
Success measurements (e.g., key performance indicators [KPI], return on investment [ROI], metrics) -
Compliance and observability -

1.2.Designing a solution infrastructure that meets technical requirements. Considerations include:

First Second
High availability and failover design -
Elasticity of cloud resources with respect to quotas and limits -
Scalability to meet growth requirements -
Performance and latency -

1.3.Designing network, storage, and compute resources. Considerations include:

First Second
Integration with on-premises/multi-cloud environments -
Cloud-native networking (VPC, peering, firewalls, container networking) -
Choosing data processing technologies -
Choosing appropriate storage types (e.g., object, file, databases) -
Choosing compute resources (e.g., preemptible, custom machine type, specialized workload) -
Mapping compute needs to platform products -

1.4.Creating a migration plan (i.e., documents and architectural diagrams). Considerations include:

First Second
Integrating solutions with existing systems -
Migrating systems and data to support the solution -
Software license mapping -
Network planning -
Testing and proofs of concept -
Dependency management planning -

1.5.Envisioning future solution improvements. Considerations include:

First Second
Cloud and technology improvements -
Evolution of business needs -
Evangelism and advocacy -

2.Managing and provisioning a solution infrastructure

2.1.Configuring network topologies. Considerations include:

First Second
Extending to on-premises environments (hybrid networking) -
Extending to a multi-cloud environment that may include Google Cloud to Google Cloud communication -
Security protection (e.g. intrusion protection, access control, firewalls) -

2.2.Configuring individual storage systems. Considerations include:

First Second
Data storage allocation -
Data processing/compute provisioning -
Security and access management -
Network configuration for data transfer and latency -
Data retention and data life cycle management -
Data growth planning -

2.3.Configuring compute systems. Considerations include:

First Second
Compute resource provisioning -
Compute volatility configuration (preemptible vs. standard) -
Network configuration for compute resources (Google Compute Engine, Google Kubernetes Engine, serverless networking) -
Infrastructure orchestration, resource configuration, and patch management -
Container orchestration -

3.Designing for security and compliance

3.1.Designing for security. Considerations include:

First Second
Identity and access management (IAM) -
Resource hierarchy (organizations, folders, projects) -
Data security (key management, encryption, secret management) -
Separation of duties (SoD) -
Security controls (e.g., auditing, VPC Service Controls, context aware access, organization policy) -
Managing customer-managed encryption keys with Cloud Key Management Service -
Remote access -

3.2.Designing for compliance. Considerations include:

First Second
Legislation (e.g., health record privacy, children’s privacy, data privacy, and ownership) -
Commercial (e.g., sensitive data such as credit card information handling, personally identifiable information [PII]) -
Industry certifications (e.g., SOC 2) -
Audits (including logs) -

4.Analyzing and optimizing technical and business processes

4.1.Analyzing and defining technical processes. Considerations include:

First Second
Software development life cycle (SDLC) -
Continuous integration / continuous deployment -
Troubleshooting / root cause analysis best practices -
Testing and validation of software and infrastructure -
Service catalog and provisioning -
Business continuity and disaster recovery -

4.2.Analyzing and defining business processes. Considerations include:

First Second
Stakeholder management (e.g. influencing and facilitation) -
Change management -
Team assessment / skills readiness -
Decision-making processes -
Customer success management -
Cost optimization / resource optimization (capex / opex) -

4.3.Developing procedures to ensure reliability of solutions in production (e.g., chaos engineering, penetration testing)

5.Managing implementation

5.1.Advising development/operation team(s) to ensure successful deployment of the solution. Considerations include:

First Second
Application development -
API best practices -
Testing frameworks (load/unit/integration) -
Data and system migration and management tooling -

5.2.Interacting with Google Cloud programmatically. Considerations include:

First Second
Google Cloud Shell -
Google Cloud SDK (gcloud, gsutil and bq) -
Cloud Emulators (e.g. Cloud Bigtable, Datastore, Spanner, Pub/Sub, Firestore) -

6.Ensuring solution and operations reliability

6.1.Monitoring/logging/profiling/alerting solution
6.2.Deployment and release management
6.3.Assisting with the support of deployed solutions
6.4.Evaluating quality control measures

Resources

Coursera

Google Cloud Platform Big Data and Machine Learning Fundamentals
Modernizing Data Lakes and Data Warehouses with GCP
Building Batch Data Pipelines on GCP
Building Resilient Streaming Analytics Systems on GCP
Smart Analytics, Machine Learning, and AI on GCP
Preparing for the Google Cloud Professional Data Engineer Exam
1. Google Cloud Platform Fundamentals: Core Infrastructure
  • 01.Introducing Google Cloud Platform
  • 02.Getting Started with Google Cloud Platform
  • 03.Virtual Machines in the Cloud
  • 04.Storage in the Cloud
  • 05.Containers in the Cloud
  • 06.Applications in the Cloud
  • 07.Developing, Deploying and Monitoring in the Cloud
  • 08.Big Data and Machine Learning in the Cloud
2. Essential Google Cloud Infrastructure: Foundation
  • 01.Introduction to GCP
  • 02.Virtual Networks
  • 03.Virtual Machines
3. Essential Google Cloud Infrastructure: Core Services
  • 01.Cloud IAM
  • 02.Storage and Database Services
  • 03.Resource Management
  • 04.Resource Monitoring
4. Elastic Google Cloud Infrastructure: Scaling and Automation
  • 01.Interconnecting Networks
  • 02.Load Balancing and Autoscaling
  • 03.Infrastructure Automation
  • 04.Managed Services
5. Reliable Google Cloud Infrastructure: Design and Process
  • 00.Introduction: Architecting systems is a matter of weighing the pros and cons of various solutions and trying to find the best solution given your requirements and constraints.
  • 01.Defining Services
  • 02.Microservice Design and Architecture
  • 03.DevOps Automation
  • 04.Choosing Storage Solutions
  • 05.Google Cloud and Hybrid Network Architecture
  • 06.Deploying Applications to Google Cloud
  • 07.Designing Reliable Systems
  • 08.Security
  • 09.Maintenance and Monitoring
6. Architecting with Google Kubernetes Engine: Foundations
  • 01.Introduction to Google Cloud
  • 02.Introduction to Containers and Kubernetes
  • 03.Kubernetes Architecture
7. Preparing for the Google Cloud Professional Cloud Architect Exam
  • 01.Welcome to Preparing for the Professional Cloud Architect Exam
  • 02.Sample Case Studies
  • 03.Designing and Implementing
  • 04.Optimizing and Operating
  • 05.Resources and next steps

6.Training,

Choose your path, build your skills, and validate your knowledge. All in one place. Register here before November 6th to claim your one month free training offer.

Choose from end-to-end training created by the Google Developers Training team, materials and tutorials for self-study, online courses and Nanodegrees through Udacity, and more. And when you're ready, you can take a Google Developers Certification exam to gain recognition for your development skills.

ML Concepts
  • Introduction to ML (3 min)
  • Framing (15 min)
  • Descending into ML (20 min)
  • Reducing Loss (60 min)
  • First Steps with TF (65 min)
  • Generalization (15 min)
  • Training and Test Sets (25 min)
  • Validation Set (35 min)
  • Representation (35 min)
  • Feature Crosses (70 min)
  • Regularization: Simplicity (40 min)
  • Logistic Regression (20 min)
  • Classification (90 min)
  • Regularization: Sparsity (20 min)
  • Neural Networks (65 min)
  • Training Neural Nets (10 min)
  • Multi-Class Neural Nets (45 min)
  • Embeddings (50 min)
ML Engineering
  • Production ML Systems (3 min)
  • Static vs. Dynamic Training (7 min)
  • Static vs. Dynamic Inference (7 min)
  • Data Dependencies (14 min)
  • Fairness (70 min)
ML Systems in the Real World
  • Cancer Prediction (5 min)
  • Literature (5 min)
  • Guidelines (2 min)

AI and ML

App modernization

Cloud basics

Data analytics

What is Apache Hadoop?

Learn the basics of Apache Hadoop, including what it is, how it’s used, and what advantages it brings to big data environments.

What is Apache Kafka?

Learn about Apache Kafka, a platform for collecting, processing, and storing streaming data.

What is Apache Spark?

Learn about Apache Spark, an analytics engine for large-scale data processing.

What is big data?

Learn about big data with an overview, characteristics, and examples.

What is business intelligence?

Learn about Business intelligence (BI), the process of analyzing company data to improve operations.

Data governance defined

Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle.

What is data integration?

Learn about data integration—the process of unifying data from different sources into a more useful view.

What is a data lake?

Learn how data lakes store, process, and secure large amounts of data.

What is a data warehouse?

Learn about data warehouses (DW), which are systems for data analysis and reporting.

What is ETL?

Learn how ETL lets companies convert structured and unstructured data to drive business decisions.

What is predictive analytics?

Learn how predictive analytics uses data, statistics, modeling, and machine learning to help predict and plan for future events, or find opportunities.

What is Presto?

Learn how Presto, an open source distributed SQL query engine created by Facebook developers, runs interactive analytics against large volumes of data.

What is streaming analytics?

Learn about streaming analytics, which processes and analyzes data from sources that continuously send data.

What is time series?

Learn how to model historical time-series data in order to make predictions about future time points and common use cases.

Databases

Infrastructure

Security

Storage

Take the next step

Day 1 | Google Cloud Fundamentals and Data Cloud & Analytics, March 28

Welcome & Introduction, Russell Nash
Get started with Google Cloud, Vamsi Ramakrishnan
The new dynamics of Infrastructure - Workload Optimisation, Gustavo Fuchs
Customer Spotlight - GoPay's transformation with Google Cloud, Gaurav Anand, Giovanni Sakti Nugraha
Where should I run my stuff, Shikha Saxena
Google Developer Expert Panel - Top tips for building in the Cloud, Thu Ya Kyaw, Rendy B Junior, Rushabh Vasa, Jean-Klaas Gunnink
Developer Community Session - My experience as a woman in tech, Chanel Greco
Quiz Results & Break, Priyanka Vergadia, Russell Nash
Unleashing the Power of Cloud Analytics with Looker - From Data to Insights, Sudipto Guha
5 Reasons to use BigQuery as the Heart of your Data Analytics Platform, Russell Nash
Spotlight on BigQuery - Featuring Inshorts & MongoDB, Russell Nash, Rohan Walia, Saurabh Jain
Event-driven Architecture and Analytics, Prasanna Keny
Build HTAP and AI-powered applications with AlloyDB, Subhash Guddad
Ask the Experts, Priyanka Vergadia, Russell Nash, Sudipto Guha, Subhash Guddad
Quiz Results & Close, Priyanka Vergadia, Russell Nash

Day 2 | Applied Machine Learning / AI and Application Modernisation, March 29

Welcome & Introduction, Priyanka Vergadia, Russell Nash
Generative AI without a Phd, Erwin Huizenga
AI-UX - Transforming the user experience with Vertex AI and Discovery, Kaz Sato
Spotlight on Vertex AI - featuring Neo4j, Ezhil Vendhan
Computer Vision with Google Cloud, Priyanka Vergadia
Spotlight on Vision AI - Featuring 99.co, Ye Lin Aung
Certifications & Learning Paths - Featuring AirAsia & Tata Consultancy Services (TCS), Priyanka Vergadia, Erwin Huizenga, Pablo Sanz Salcedo, Mukul Sharma
Quiz Results & Break, Priyanka Vergadia, Russell Nash
Secure Software Supply Chain - Shift Left Security like Google with SLSA and Software Delivery Shield, Andrew Haschka
The Low-Ops way to reliably and scalably running your containers, Ashmita Kapoor
Spotlight on GKE - Featuring Mahindra Group, Rajesh Shewani, Abhishek Sukhwal
Don't leave your Database behind in your cloud journey, Anant Damle
Increase Productivity and Efficiency with low-code application integrations, Meng-Wai Tan, Aditya MP
Introducing the Cloud Hero Challenge, Russell Nash
Quiz Results & Close, Priyanka Vergadia, Russell Nash

Build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified AI platform.

  • Build with the groundbreaking ML tools that power Google, developed by Google Research
  • Deploy more models, faster, with 80% fewer lines of code required for custom modeling
  • Use MLOps tools to easily manage your data and models with confidence and repeat at scale
  • USE CASES: Explore common ways to take advantage of Vertex AI
    • Data readiness

1.1.Guides

  • Overview
  • Training and tutorials
  • Use cases
  • Code samples

Get started

Prepare data and manage datasets

Train AutoML models

Create a model using custom training

Import models

Configure models and get predictions

Configure custom-trained models

Vertex AI provides Docker container images that you run as pre-built containers for serving predictions and explanations from trained model artifacts. These containers, which are organized by machine learning (ML) framework and framework version, provide HTTP prediction servers that you can use to serve predictions with minimal configuration. In many cases, using a pre-built container is simpler than creating your own custom container for prediction.
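
As a rough sketch of how a pre-built container is used, the following assumes a scikit-learn artifact already exported to Cloud Storage; the project, bucket, and image tag are placeholders, and the Vertex AI SDK calls shown are one common way to register and deploy such a model.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

# Register the model artifact together with a pre-built scikit-learn serving container
model = aiplatform.Model.upload(
    display_name="sklearn-demo",
    artifact_uri="gs://my-bucket/models/sklearn-demo/",  # folder containing the saved model (assumed)
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",  # illustrative tag
)

# Deploy to an endpoint for online predictions
endpoint = model.deploy(machine_type="n1-standard-2")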

Available container images

  • TensorFlow
  • scikit-learn
  • XGBoost

Deploy a model

Get predictions

To get batch predictions from a custom-trained model, prepare your input data in one of the following ways:

  • JSON Lines
  • TFRecord
  • CSV

File list:

Create a text file where each row is the Cloud Storage URI to a file. Vertex AI reads each URI as binary, then base64-encodes it and sends it in a JSON instance to the container that serves your model's predictions.

If you plan to use the Google Cloud Console to get batch predictions, paste your file list directly into the Cloud Console. Otherwise save your file list in a Cloud Storage bucket.
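
A minimal sketch of requesting batch predictions with the Vertex AI SDK, assuming a JSON Lines input file; the model ID, bucket paths, and machine type are placeholders.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

model = aiplatform.Model("1234567890")  # placeholder model ID

batch_job = model.batch_predict(
    job_display_name="demo-batch-prediction",
    gcs_source="gs://my-bucket/batch/input.jsonl",          # assumed JSON Lines input
    gcs_destination_prefix="gs://my-bucket/batch/output/",  # assumed output prefix
    instances_format="jsonl",
    machine_type="n1-standard-4",
)
batch_job.wait()
print(batch_job.output_info)  # where the prediction files were written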

Track model quality using Vertex AI Model Monitoring

Orchestrate your ML workflow using Vertex AI pipelines

  • Understanding ML pipelines

    Key Point:

    • Pipelines allow you to automate, monitor, and experiment with interdependent parts of an ML workflow.
    • ML Pipelines are portable, scalable, and based on containers.
    • Each individual part of your pipeline workflow (for example, creating a dataset or training a model) is defined by code. This code is referred to as a component. Each instance of a component is called a step.

    You can use Vertex Pipelines to run pipelines that were built using the Kubeflow Pipelines SDK or TensorFlow Extended. Learn more about choosing between the Kubeflow Pipelines SDK and TFX.

  • Which pipelines SDK should I use?

    Vertex Pipelines can run pipelines built using the Kubeflow Pipelines SDK v1.6 or higher, or TensorFlow Extended v0.30.0 or higher.

    • If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX.
    • For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud pipeline components. Google Cloud pipeline components make it easier to use Vertex AI services like AutoML in your pipeline.
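
To make the choice concrete, here is a minimal sketch of a pipeline written with the Kubeflow Pipelines SDK (v2-style syntax) and compiled into a spec that Vertex AI Pipelines can run; the component, pipeline name, and output path are illustrative.

from kfp import dsl, compiler

@dsl.component
def say_hello(name: str) -> str:
    # A trivial component; each instance of a component becomes a pipeline step
    return f"Hello, {name}!"

@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(recipient: str = "Vertex AI"):
    say_hello(name=recipient)

# Compile to a pipeline spec file
compiler.Compiler().compile(
    pipeline_func=hello_pipeline,
    package_path="hello_pipeline.json",
)

The compiled hello_pipeline.json can then be submitted to Vertex AI Pipelines, for example with aiplatform.PipelineJob(template_path=...).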

Google Cloud Pipeline Components

The Google Cloud Pipeline Components (GCPC) SDK provides a set of prebuilt components that are production quality, consistent, performant, and easy to use in Vertex AI Pipelines. You can use these components to perform ML tasks. For example, you can use components to complete the following:

  • Create a new dataset and load different data types into the dataset (image, tabular, text, or video).
  • Export data from a dataset to Cloud Storage.
  • Use AutoML to train a model using image, tabular, text, or video data.
  • Run a custom training job using a custom container or a Python package.
  • Upload an existing model to Vertex AI for batch prediction.
  • Create a new endpoint and deploy a model to it for online predictions.

Track and analyze ML metadata

Use Vertex AI Workbench

Experiment with Vertex AI TensorBoard

Optimize with Vertex AI Vizier

Manage features using Vertex AI Feature Store

1.2.Resources

ML Experiment Tracking with Vertex AI, https://youtu.be/a_YXZ5UltkU

1.3.Tutorial

overview
Intro to Vertex AI
Use Case Overview
Set up your environment
Initial Setup Steps in Your Notebook
Let's Build our Pipeline
Identify the Best Performing Run
Cleanup
Overview
Intro to Vertex AI
Cloud environment setup
Vertex Pipelines setup
Creating your first pipeline
Creating an end-to-end ML pipeline
Cleanup

The focus of this demo is to show how you can use Vertex AI to train and deploy an ML model. It assumes that you are familiar with machine learning, even though the training code is provided to you. You will use Datasets for dataset creation and management, and custom training to train a scikit-learn model. Finally, you will deploy the trained model and get online predictions. The dataset used in this demo is the Titanic dataset.

Better Programming: Vertex AI Tutorial Series

The official notebooks are a collection of curated and non-curated notebooks authored by Google Cloud staff members. The curated notebooks are linked to in the Vertex AI online web documentation.

The community notebooks are not officially maintained by Google.

2.AI Platform

2.1.Guides

2.2.Resources

20210401 No-cost AI and machine learning training opportunities from Google Cloud | Google Cloud Blog, EN, https://cloud.google.com/blog/topics/training-certifications/ai-ml-training-opportunities-from-google-cloud

2.3.Tutorials

3.AutoML

3.1.Guides

4.AutoML Vision

4.1.Guide

  • Before you begin: Set up your Google Cloud Platform project, authentication, and enable AutoML Vision.
  • Preparing your training data: Learn best practices in organizing and annotating the images you will use to train your model, as well as format a training CSV file.
  • Creating datasets and importing images: Create the dataset and import the training data used to train your model.
  • Training Cloud-hosted models: Train your custom model hosted on the Cloud and get the status of the training operation.
  • Training Edge (exportable) models: Train your custom exportable Edge model and get the status of the training operation.
  • Evaluating models: Review the performance of your model.
  • Deploying models: Deploy your model for use after training completes.
  • Making individual predictions: Use your custom model to annotate an individual prediction image with labels and bounding boxes online.
  • Making batch predictions: Use your custom model to annotate a batch of prediction images with labels and bounding boxes.
  • Exporting Edge models: Export your different trained Edge model formats to Google Cloud Storage and for use on edge devices.
  • Undeploying models: Undeploy your model after you are done using them to avoid further hosting charges.
  • Managing datasets: Manage datasets associated with your project.
  • Managing models: Manage your custom models.

1.Analytics

1.1 Guides

1.2 Resources

1.3 Solutions

1.4 Tutorials

2.BigQuery

2.1 Guides


2.1.1.Discover

Get started with BigQuery

Explore BigQuery

  • BigQuery storage
  • BigQuery analytics
  • BigQuery administration

BigQuery resources

APIs, tools, and references

BigQuery roles and resources

BigQuery video tutorials

  • What is BigQuery? (4:39)
  • Using the BigQuery sandbox (3:05)
  • Asking questions, running queries (5:11)
  • Visualizing query results (5:38)
  • Managing access with IAM (5:23)
  • Protecting sensitive data with authorized views (7:12)
  • Querying external data with BigQuery (5:49)
  • What are user-defined functions? (4:59)

What's next

  • For an overview of BigQuery storage, see Overview of BigQuery storage.
  • For an overview of BigQuery queries, see Overview of BigQuery analytics.
  • For an overview of BigQuery administration, see Introduction to BigQuery administration.
  • For an overview of BigQuery security, see Overview of data security and governance.

2.1.2.Get started

2.1.3.Migrate

2.1.4.Design

Work with Datasets

Work with Tables

Limitations

  • Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement.
  • Rows that were written to a table recently by using streaming (the tabledata.insertall method or the Storage Write API) cannot be modified with UPDATE, DELETE, or MERGE statements. The recent writes are those that occur within the last 30 minutes. All other rows in the table remain modifiable by using UPDATE, DELETE, or MERGE statements. The streamed data can take up to 90 minutes to become available for copy operations.
  • Correlated subqueries within a when_clause, search_condition, merge_update_clause or merge_insert_clause are not supported for MERGE statements.
  • Queries that contain DML statements cannot use a wildcard table as the target of the query. For example, a wildcard table can be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.

When to use clustering

Both partitioning and clustering can improve performance and reduce query cost.

Use clustering under the following circumstances:

  • You don't need strict cost guarantees before running the query.
  • You need more granularity than partitioning alone allows. To get clustering benefits in addition to partitioning benefits, you can use the same column for both partitioning and clustering.
  • Your queries commonly use filters or aggregation against multiple particular columns.
  • The cardinality of the number of values in a column or group of columns is large.

Use partitioning under the following circumstances:

  • You want to know query costs before a query runs. Partition pruning is done before the query runs, so you can get the query cost after partitioning pruning through a dry run. Cluster pruning is done when the query runs, so the cost is known only after the query finishes.
  • You need partition-level management. For example, you want to set a partition expiration time, load data to a specific partition, or delete partitions.
  • You want to specify how the data is partitioned and what data is in each partition. For example, you want to define time granularity or define the ranges used to partition the table for integer range partitioning.

Prefer clustering over partitioning under the following circumstances:

  • Partitioning results in a small amount of data per partition (approximately less than 1 GB).
  • Partitioning results in a large number of partitions beyond the limits on partitioned tables.
  • Partitioning results in your mutation operations modifying most partitions in the table frequently (for example, every few minutes).

You can also combine partitioning with clustering. Data is first partitioned and then data in each partition is clustered by the clustering columns.
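
As a sketch of the dry-run point above: because partition pruning happens before execution, a dry run against a partitioned table already reflects the pruned cost. The table and filter below reuse the ClusteredSalesData example and are illustrative.

from google.cloud import bigquery

client = bigquery.Client()

# Dry run: the query is validated and priced but not executed
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = """
    SELECT SUM(totalSale)
    FROM `mydataset.ClusteredSalesData`
    WHERE DATE(timestamp) = '2022-11-01'
"""
job = client.query(query, job_config=job_config)
print(f"This query would process {job.total_bytes_processed} bytes.")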

Modify clustering specification

You can change or remove a table's clustering specifications, or change the set of clustered columns in a clustered table. This method of updating the clustering column set is useful for tables that use continuous streaming inserts because those tables cannot be easily swapped by other methods.

You can change the clustering specification in the following ways:

  • Call the tables.update or tables.patch API method.
  • Call the bq command-line tool's bq update command with the --clustering_fields flag.
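
A short sketch of the same change through the Python client (the equivalent of calling tables.update or bq update --clustering_fields); the table name reuses the sample above.

from google.cloud import bigquery

client = bigquery.Client()

table = client.get_table("mydataset.ClusteredSalesData")
table.clustering_fields = ["customer_id", "product_id"]  # new clustering column set
client.update_table(table, ["clustering_fields"])
# Setting clustering_fields to None instead removes the clustering specification.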

Best practices

To get the best performance from queries against clustered tables, use the following best practices.

  • Sample table used in the examples
CREATE TABLE
  `mydataset.ClusteredSalesData`
PARTITION BY
  DATE(timestamp)
CLUSTER BY
  customer_id,
  product_id,
  order_id
  • Filter clustered columns by sort order
SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id = 10000
  AND product_id LIKE 'gcp_analytics%'
  • Do not use clustered columns in complex filter expressions

    For example, the following query will not prune blocks because a clustered column—customer_id—is used in a function in the filter expression.

SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  CAST(customer_id AS STRING) = "10000"
  • Do not compare clustered columns to other columns

    The following query does not prune blocks because the filter expression compares a clustered column—customer_id to another column—order_id.

SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id = order_id

Set partition filter requirements

For information on adding the Require partition filter option when you create a partitioned table, see Creating partitioned tables.

If a partitioned table has the Require partition filter setting, then every query on that table must include at least one predicate that only references the partitioning column. Queries without such a predicate return the following error:

Cannot query over table 'project_id.dataset.table' without a filter that can be used for partition elimination.

Require a partition filter in queries

When you create a partitioned table, you can require the use of predicate filters by enabling the Require partition filter option. When this option is applied, attempts to query the partitioned table without specifying a WHERE clause produce the following error:

Cannot query over table 'project_id.dataset.table' without a filter that can be used for partition elimination.

There must be at least one predicate that only references a partition column for the filter to be considered eligible for partition elimination (the predicate must use a literal value; it cannot be the result of a subquery). For example, for a table partitioned on column partition_id with an additional column f in its schema, both of the following WHERE clauses satisfy the requirement:

WHERE partition_id = "foo"
WHERE partition_id = "foo" AND f = "bar"
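
A sketch of enabling this option on an existing table with the Python client; the table name is a placeholder.

from google.cloud import bigquery

client = bigquery.Client()

table = client.get_table("mydataset.partitioned_table")  # placeholder table
table.require_partition_filter = True
client.update_table(table, ["require_partition_filter"])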

2.1.5.Load

2.1.6.Analyze

Querying BigQuery Data

BigQuery writes all query results to a table. The table is either explicitly identified by the user (a destination table), or it is a temporary, cached results table. Temporary, cached results tables are maintained per-user, per-project. There are no storage costs for temporary tables, but if you write query results to a permanent table, you are charged for storing the data.

All query results, including both interactive and batch queries, are cached in temporary tables for approximately 24 hours with some exceptions.
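
A sketch of writing results to a permanent destination table instead of relying on the temporary cached table; the destination table ID is a placeholder and the query uses a public dataset.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination="my-project.mydataset.query_results",  # placeholder destination table
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
"""
client.query(query, job_config=job_config).result()  # storage charges apply to the destination table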

Best practice: Use nested and repeated fields to denormalize data storage and increase query performance.

Query data with SQL

Querying nested arrays

WITH Races AS (
  SELECT "800M" AS race,
    [STRUCT("Rudisha" AS name, [23.4, 26.3, 26.4, 26.1] AS laps),
     STRUCT("Makhloufi" AS name, [24.5, 25.4, 26.6, 26.1] AS laps),
     STRUCT("Murphy" AS name, [23.9, 26.0, 27.0, 26.0] AS laps),
     STRUCT("Bosse" AS name, [23.6, 26.2, 26.5, 27.1] AS laps),
     STRUCT("Rotich" AS name, [24.7, 25.6, 26.9, 26.4] AS laps),
     STRUCT("Lewandowski" AS name, [25.0, 25.7, 26.3, 27.2] AS laps),
     STRUCT("Kipketer" AS name, [23.2, 26.1, 27.3, 29.4] AS laps),
     STRUCT("Berian" AS name, [23.7, 26.1, 27.0, 29.3] AS laps)]
       AS participants)
SELECT
  race,
  participant
FROM Races AS r,UNNEST(r.participants) AS participant;

Querying external data sources

BigQuery offers support for querying data directly from:

  • Cloud Bigtable
  • Cloud Storage
  • Google Drive
  • Cloud SQL

2.1.7.Administer

Manage resources

Optimize resources

Query performance

When evaluating query performance in BigQuery, the amount of work required depends on a number of factors:

  • Input data and data sources (I/O): How many bytes does your query read?
  • Communication between nodes (shuffling): How many bytes does your query pass to the next stage? How many bytes does your query pass to each slot?
  • Computation: How much CPU work does your query require?
  • Outputs (materialization): How many bytes does your query write?
  • Query anti-patterns: Are your queries following SQL best practices?

The following best practices provide guidance on controlling query computation.

  • Avoid repeatedly transforming data through SQL queries
  • Avoid JavaScript user-defined functions
  • Use approximate aggregation functions
  • Use aggregate analytic function to obtain the latest record
  • Order query operations to maximize performance
  • Optimize your join patterns
  • Use INT64 data types in joins to reduce cost and improve comparison performance
  • Prune partitioned queries
  • Avoid multiple evaluations of the same Common Table Expressions (CTEs)
  • Split complex queries into multiple smaller ones

2.1.8.Govern

2.1.9.Develop

The BigQuery Storage Read API provides fast access to BigQuery-managed storage by using an RPC-based protocol.

Background

Historically, users of BigQuery have had two mechanisms for accessing BigQuery-managed table data:

  • Record-based paginated access by using the tabledata.list or jobs.getQueryResults REST API methods. The BigQuery API provides structured row responses in a paginated fashion appropriate for small result sets.
  • Bulk data export using BigQuery extract jobs that export table data to Cloud Storage in a variety of file formats such as CSV, JSON, and Avro. Table exports are limited by daily quotas and by the batch nature of the export process.

The BigQuery Storage Read API provides a third option that represents an improvement over prior options. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format. This allows for additional parallelism among multiple consumers for a set of results.

The Storage Read API does not provide functionality related to managing BigQuery resources such as datasets, jobs, or tables.
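
A sketch of reading rows with the Storage Read API Python client; the billing project is a placeholder and the table is a public dataset.

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_2013",
    data_format=types.DataFormat.AVRO,
)
session = client.create_read_session(
    parent="projects/my-project",  # placeholder billing project
    read_session=requested_session,
    max_stream_count=1,
)

# Each stream could be handed to a separate consumer for parallel reads
reader = client.read_rows(session.streams[0].name)
for row in reader.rows(session):
    print(row)
    break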

2.2 Reference

2.2.1.BigQuery APIs

2.2.2.BigQuery CLI

2.2.3.SQL in BigQuery, Standard SQL reference

Statements

from google.cloud import bigquery

client = bigquery.Client()

# Look up an existing job by its ID and check its state
job = client.get_job(job_id='45f069ef-660f-41d1-8204-2f0dcf127673')
print(job.state)

# Request cancellation of the same job
client.cancel_job(job_id='45f069ef-660f-41d1-8204-2f0dcf127673')

2.3 Resources

Built-in Python Tutorials:

Vertex AI -> Notebook -> /tutorials/bigquery

2.4 Solutions

3.BigQuery ML

3.1.Guides

3.2.Resources

3.3.1.GCP Tutorials

This tutorial uses a linear regression model in BigQuery to predict the weight of a penguin.

This tutorial uses a binary logistic regression model in BigQuery ML to predict the income range of respondents in the US Census Dataset

This tutorial uses a k-means model in BigQuery ML to identify clusters of data in the London Bicycle Hires public dataset.

This tutorial uses the public movielens dataset to create a model from explicit feedback that generates movie recommendations for a user.

This tutorial uses the BigQuery Google Analytics sample to create a model from implicit feedback that recommends content for a visitor to a website.

This tutorial creates a time series model to perform single time-series forecasts using the google_analytics_sample.ga_sessions sample table.

This tutorial creates a set of time-series models to perform multiple time-series forecasts with a single query. You will use the new_york.citibike_trips data. This data contains information about Citi Bike trips in New York City.

This tutorial uses a set of techniques to enable 100x faster forecasting without sacrificing much forecasting accuracy. It enables forecasting millions of time series within hours using a single query.

This tutorial uses the BigQuery TRANSFORM clause for feature engineering to create a model that predicts the birth weight of a child.

This tutorial uses the tlc_yellow_trips_2018 sample table to create a model with hyperparameter tuning that predicts the tip of a taxi trip.

This tutorial imports a TensorFlow model into a BigQuery ML dataset and uses it to make predictions from a SQL query.

This tutorial exports a BigQuery ML model and then deploys the model either on AI Platform or on a local machine. You will use the iris table from the BigQuery public datasets.

4.1 Guides

Overview

Sample tags attached to a data entry

Data Lineage

Discover Data

Simple search

In its simplest form, a Data Catalog search query comprises a single predicate. Such a predicate can match several pieces of metadata:

  • A substring of a name, display name, or description of a data asset
  • Exact type of a data asset
  • A substring of a column name (or nested column name) in the schema of a data asset
  • A substring of a project ID
  • The value of a public tag, the name of a public tag template, or a field name in a public tag template attached to a data entry.
  • (Preview) A string for an email address or name for a data steward
  • (Preview) A string from an overview description
Qualified predicates
  • An equal sign (=) restricts the search to an exact match.
  • A colon (:) after the key matches the predicate to either a substring or token within the value in search results.

Data Catalog supports the following qualifiers:

Qualifier | Description
name:x | Matches x as a substring of the data asset ID.
displayname:x | Matches x as a substring of the data asset display name.
column:x | Matches x as a substring of the column name (or nested column name) in the schema of the data asset. Currently, you can search for a nested column by its path using the AND logical operator. For example, column:(foo bar) matches a nested column with the foo.bar path.
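
A sketch of issuing such a search with the Data Catalog Python client; the project ID and query string are illustrative.

from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Restrict the search scope to one project (placeholder ID)
scope = datacatalog_v1.SearchCatalogRequest.Scope(include_project_ids=["my-project"])

results = client.search_catalog(
    request={"scope": scope, "query": "column:customer_id type=table"}
)
for result in results:
    print(result.relative_resource_name)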

APIs & Reference

4.2 Resources

Unfortunately, there is currently no versioning in Data Catalog.

This tutorial suggests a way to create a historical record of Data Catalog tag metadata: capture the audit logs from Cloud Logging, parse and process them in real time with Pub/Sub and Dataflow, and append the resulting change records to a BigQuery table for historical analysis.

Prebuilt templates

Another common question we hear from potential clients is: Do you have prebuilt templates to help us get started with creating our own? Due to the popularity of this request, we created a few examples to illustrate the types of templates being deployed by our users. You can find them in YAML format below and through a GitHub repo. There is also a script in the same repo that reads the YAML-based templates and creates the actual templates in Data Catalog.

BigQuery stores metadata about each object stored in it. You can query these metadata tables to get a better understanding of a dataset and its contents. See documentation.

BQ nested and repeated columns allow you to achieve the performance benefits of denormalization while retaining the structure of the data.

To illustrate, consider this query against a Bitcoin dataset. The query joins the blocks and transactions tables to find the max transaction ID for each block.

4.3 Tutorials

5.1 Guides

Objectives

In this guide, you use the following Dataplex entities to build a data mesh architecture:

  • Create a Dataplex lake that will act as the domain for your data mesh.
  • Add zones to your lake that will represent individual teams within each domain and provide managed data contracts.
  • Attach assets that map to data stored in Cloud Storage.

Dataplex data quality tasks enable you to define and execute data quality checks across tables in BigQuery and Cloud Storage. Dataplex data quality tasks allow you to apply regular data controls in BigQuery environments.

When to create Dataplex data quality tasks

  • You want to validate data as part of the data production pipeline.
  • You want to routinely monitor quality of datasets against your expectations.
  • You want to build data quality reports for regulatory requirements.

1.Cloud Run

Services and jobs: two ways to run your code

On Cloud Run, your code can either run continuously as a service or as a job. Both services and jobs run in the same environment and can use the same integrations with other services on Google Cloud.

  • Cloud Run services. Used to run code that responds to web requests, or events.
  • Cloud Run jobs. Used to run code that performs work (a job) and quits when the work is done.
When to use Cloud Run services

Cloud Run services are great for code that handles requests or events. Example use cases include:

  • Websites and web applications

    Build your web app using your favorite stack, access your SQL database, and render dynamic HTML pages.

  • APIs and microservices

    You can build a REST API, or a GraphQL API or private microservices that communicate over HTTP or gRPC.

  • Streaming data processing

    Cloud Run services can receive messages from Pub/Sub push subscriptions and events from Eventarc.

When to use Cloud Run jobs

Cloud Run jobs are well-suited to run code that performs work (a job) and quits when the work is done. Here are a few examples:

  • Script or tool

    Run a script to perform database migrations or other operational tasks.

  • Array job

    Perform highly parallelized processing of all files in a Cloud Storage bucket.

  • Scheduled job

    Create and send invoices at regular intervals, or save the results of a database query as XML and upload the file every few hours.

1.1.Guides

Concepts

import os

# Retrieve job-defined env vars
TASK_INDEX = os.getenv("CLOUD_RUN_TASK_INDEX", 0)
TASK_ATTEMPT = os.getenv("CLOUD_RUN_TASK_ATTEMPT", 0)
# Retrieve user-defined env vars
SLEEP_MS = os.getenv("SLEEP_MS", 0)
FAIL_RATE = os.getenv("FAIL_RATE", 0)


# Define main script
def main(sleep_ms=0, fail_rate=0):
    ...  # body omitted in these notes

gcloud run jobs create job-quickstart \
    --image gcr.io/PROJECT_ID/logger-job \
    --tasks 50 \
    --set-env-vars SLEEP_MS=10000 \
    --set-env-vars FAIL_RATE=0.5 \
    --max-retries 5 \
    --region REGION \
    --project=PROJECT_ID

Execute background jobs

You can structure a job as a single task or as multiple, independent tasks (up to 10,000 tasks) that can be executed in parallel. Each task runs one container instance and can be configured to retry in case of failure. Each task is aware of its index, which is stored in the CLOUD_RUN_TASK_INDEX environment variable. The overall count of tasks is stored in the CLOUD_RUN_TASK_COUNT environment variable. If you are processing data in parallel, your code is responsible for determining which task handles which subset of the data.
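
A sketch (with a hypothetical file list) of how each task can use its index and the task count to pick its own subset of the work:

import os

TASK_INDEX = int(os.getenv("CLOUD_RUN_TASK_INDEX", 0))
TASK_COUNT = int(os.getenv("CLOUD_RUN_TASK_COUNT", 1))

# Hypothetical list of input objects to process
files = [f"gs://my-bucket/input/part-{i:05d}.csv" for i in range(1000)]

# Round-robin sharding: task k handles every TASK_COUNT-th file starting at k
for path in files[TASK_INDEX::TASK_COUNT]:
    print(f"task {TASK_INDEX}: processing {path}")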

gcloud run jobs create JOB_NAME --image IMAGE_URL OPTIONS

Execute Jobs

gcloud run jobs execute JOB_NAME

gcloud run jobs create JOB_NAME --execute-now

How-to guides

Develop

In order to be a good fit for Cloud Run, your app needs to meet all of the following criteria. See the Cloud Run container contract for more information.

  • Serves requests, streams, or events delivered via HTTP, HTTP/2, WebSockets, or gRPC, or executes to completion.
  • Does not require a local persistent file system, but either a local ephemeral file system or a network file system.
  • Is built to handle multiple instances of the app running simultaneously.
  • Does not require more than 8 CPU and 32 GiB of memory per instance.
  • Meets one of the following criteria:
    • Is containerized.
    • Is written in Go, Java, Node.js, Python, or .NET.
    • You can otherwise containerize it.

1.2.Resources

1.4.Tutorials

codelab

1.BigTable

Guide

2.MySQL

Guides

Create and manage

Use best practices

Instance configuration and administration

Data architecture

Application implementation

Data import and export

Backup and recovery

The Cloud SQL SLA excludes outages "caused by factors outside of Google's reasonable control". This page describes some of the user-controlled configurations that can cause a Cloud SQL instance outage to be excluded from the SLA.

1.Data Pipelines

1.1.Guides

1.2.Resources

  • 20210618 Orchestrating your data workloads in Google Cloud, EN

    Services like Data Fusion, Dataflow and Dataproc are great for ingesting, processing and transforming your data. These services are designed to operate directly on big data and can build both batch and real time pipelines that support the performant aggregation (shuffling, grouping) and scaling of data. This is where you should build your data pipelines and you can use Composer to manage the execution of these services as part of a wider workflow.

Google Cloud’s first general purpose workflow orchestration tool was Cloud Composer.

However, if you want to process events or chain APIs in a serverless way—or have workloads that are bursty or latency-sensitive—we recommend Workflows.

2.Dataproc

2.1.Guides

Compute options

Dataproc provides the ability for graphics processing units (GPUs) to be attached to the master and worker Compute Engine nodes in a Dataproc cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.

2.2.Resources

Dataproc projects

Dataproc initialization actions

GCP Token Broker

Dataproc Custom Images

Dataproc Spawner

Connectors

Hadoop/Spark GCS Connector

Hadoop BigQuery Connector

Spark Pubsub Connector

Spark Spanner Connector

Hive Bigquery Storage Handler

Kubernetes Operators

Spark kubernetes operator

Flink kubernetes operator

Examples

Dataproc Python examples

Dataproc Pubsub Spark Streaming example

Dataproc Java Bigtable sample

Dataproc Spark-Bigtable samples

  • Dataproc Serverless for Spark
  • BigQuery Stored Procedure for Apache Spark
  • Serverless Spark with Vertex AI for Spark MLOps
  • SparkSQL Workbench for Data Exploration in Dataplex
  • Dataproc on GKE for Spark

2.3.Tutorials

  • before you begin

    • Copy public data to your Cloud Storage bucket. Copy a public data Shakespeare text snippet into the input folder of your Cloud Storage bucket:
gsutil cp gs://pub/shakespeare/rose.txt \
    gs://${BUCKET_NAME}/input/rose.txt
  • Prepare the Spark wordcount job

3.Dataproc Serverless

3.1.Guides

Setup
Submit a Spark batch workload
Estimate workload costs

Use the spark-bigquery-connector with Apache Spark to read and write data from and to BigQuery. This tutorial demonstrates a PySpark application that uses the spark-bigquery-connector.

Dataproc Serverless for Spark runs workloads within Docker containers. The container provides the runtime environment for the workload's driver and executor processes. By default, Dataproc Serverless for Spark uses a container image that includes the default Spark, Java, Python and R packages associated with a runtime release version. The Dataproc Serverless for Spark batches API allows you to use a custom container image instead of the default image. Typically, a custom container image adds Spark workload Java or Python dependencies not provided by the default container image. Important: Do not include Spark in your custom container image; Dataproc Serverless for Spark will mount Spark into the container at runtime.

When you submit your Spark workload, Dataproc Serverless for Spark can dynamically scale workload resources, such as the number of executors, to run your workload efficiently. Dataproc Serverless autoscaling is the default behavior, and uses Spark dynamic resource allocation to determine whether, how, and when to scale your workload.

You can set Spark properties when you submit a Spark batch workload.
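
A sketch of submitting a PySpark batch workload with explicit Spark properties through the batches API; the project, region, bucket, and batch ID are placeholders.

from google.cloud import dataproc_v1

region = "us-central1"  # placeholder region
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/wordcount.py",  # placeholder job file
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={"spark.executor.memory": "4g"},  # example Spark property
    ),
)

operation = client.create_batch(
    parent=f"projects/my-project/locations/{region}",  # placeholder project
    batch=batch,
    batch_id="wordcount-batch-001",
)
print(operation.result().state)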

3.2.Resources

Google is providing this collection of pre-implemented Dataproc templates as a reference and to provide easy customization for developers wanting to extend their functionality.

This repository contains Serverless Spark on GCP solution accelerators built around common use cases - helping data engineers and data scientists with Apache Spark experience ramp up faster on Serverless Spark on GCP.

4.Data Fusion

4.1.Guides

Quickstarts

This tutorial shows how to build a reusable pipeline that reads data from Cloud Storage, performs data quality checks, and writes to Cloud Storage.

Reusable pipelines have a regular pipeline structure, but you can change the configuration of each pipeline node based on configurations provided by an HTTP server. For example, a static pipeline might read data from Cloud Storage, apply transformations, and write to a BigQuery output table. If instead you want the transformation and BigQuery output table to change based on the Cloud Storage file that the pipeline reads, you create a reusable pipeline.

5.Dataflow

5.2.Guide

When Dataflow launches worker VMs, it uses Docker container images to launch containerized SDK processes on the workers. You can specify a custom container image instead of using one of the default Apache Beam images. When you specify a custom container image, Dataflow launches workers that pull the specified image. A common reason to use a custom container is to preinstall pipeline dependencies that are not included in the default SDK images; a sketch follows.
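
A sketch of pointing a Beam pipeline at a custom SDK container image when running on Dataflow; the project, bucket, and image URI are placeholders, and sdk_container_image is the option used by recent Beam releases.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # placeholder
    sdk_container_image="us-docker.pkg.dev/my-project/repo/beam-custom:latest",  # placeholder image
)

with beam.Pipeline(options=options) as p:
    (p | beam.Create(["hello", "world"]) | beam.Map(print))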

GKE

Guides

Resources

Virtual Private Cloud

Guides

Access APIs and services

Serverless VPC Access

Serverless VPC Access makes it possible for you to connect directly to your Virtual Private Cloud network from serverless environments such as Cloud Run, App Engine, or Cloud Functions. Configuring Serverless VPC Access allows your serverless environment to send requests to your VPC network using internal DNS and internal IP addresses (as defined by RFC 1918 and RFC 6598). The responses to these requests also use your internal network.

Serverless VPC Access example (click to enlarge)

Guides

For your application to submit traces to Cloud Trace, it must be instrumented. You can instrument your code by using the Google client libraries. However, it's recommended that you use OpenTelemetry or OpenCensus to instrument your application. These are open source tracing packages. OpenTelemetry is actively in development and is the preferred package.
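
A sketch of instrumenting Python code with OpenTelemetry and exporting spans to Cloud Trace, assuming the opentelemetry-exporter-gcp-trace package is installed; the span name is illustrative.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

# Send spans to Cloud Trace in batches
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("do_work"):
    pass  # traced work goes here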

Guides

Metric kind

Each time series includes the metric kind (type MetricKind) for its data points. The kind of metric data tells you how to interpret the values relative to each other. Cloud Monitoring metrics are one of three kinds:

  • A gauge metric, in which the value measures a specific instant in time. For example, metrics measuring CPU utilization are gauge metrics; each point records the CPU utilization at the time of measurement. Another example of a gauge metric is the current temperature.

  • A delta metric, in which the value measures the change since it was last recorded. For example, metrics measuring request counts are delta metrics; each value records how many requests were received since the last data point was recorded.

  • A cumulative metric, in which the value constantly increases over time. For example, a metric for “sent bytes” might be cumulative; each value records the total number of bytes sent by a service at that time.
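
As a quick way to inspect a metric's kind (a sketch using the Monitoring API; the project ID is a placeholder and the metric type is just an example), fetch its metric descriptor and look at the metricKind field:

# The response for CPU utilization should report metricKind: GAUGE and valueType: DOUBLE
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors/compute.googleapis.com/instance/cpu/utilization"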

Guides

Monitor Logs

Create metrics from logs

Log-based metrics derive metric data from the content of log entries. For example, you can use a log-based metric to count the number of log entries that contain a particular message or to extract latency information recorded in log entries. You can use log-based metrics in Cloud Monitoring charts and alerting policies.

This document explains how to create a counter-type log-based metric by using the Google Cloud console, the Logging API, and the Google Cloud CLI.
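
A minimal gcloud sketch (the metric name and log filter are illustrative):

# Create a counter-type log-based metric that counts ERROR-severity Cloud Run log entries
gcloud logging metrics create error_count \
  --description="Count of ERROR log entries" \
  --log-filter='severity>=ERROR AND resource.type="cloud_run_revision"'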

Guides

  • Cloud Resources Detector Example
  • Cloud Trace Exporter Example
  • Cloud Trace Propagator Example
  • End-to-End Example with Flask

Resources

Authentication

Guides

Resources

  • Python client for Google Auth, googleapis.dev/python/google-auth/latest/index.html
  • IAM code samples, cloud.google.com/iam/docs/samples

Solutions

Cloud Shell

env vars

| Name | Description | Example |
| --- | --- | --- |
| ${GOOGLE_CLOUD_PROJECT} | GCP Project ID | qwiklabs-gcp-02-e9a6b0bbe66d |

commands

examples

gsutil cp -z html -a public-read cattypes.html tabby.jpeg gs://mycats
gsutil -m cp -r dir gs://my-bucket
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket
gsutil ls -l gs://[BUCKET_NAME]/**  # or -lr for a recursive long listing
gsutil lifecycle set [LIFECYCLE_JSON-CONFIG_FILE] gs://[BUCKET_NAME]
gsutil mb -c nearline gs://archive_bucket

gcloud auth

# list the active account name
gcloud auth list 
Credentialed accounts:
- google1623327_student@qwiklabs.net

gcloud config

# list the project ID
gcloud config list project 
[core]
project = qwiklabs-gcp-44776a13dea667a6

# set your Project ID
gcloud config set project <YOUR_PROJECT_ID> 

# set it as an environment variable
export PROJECT_ID=$(gcloud config get-value project) 

gcloud container

gcloud container clusters resize NAME (--num-nodes=NUM_NODES | --size=NUM_NODES) [--async] [--node-pool=NODE_POOL] [--region=REGION | --zone=ZONE, -z ZONE]  [GCLOUD_WIDE_FLAG …]
gcloud container clusters update CLUSTER_NAME --enable-autoscaling --min-nodes=1 --max-nodes=10

gcloud projects

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:EMAIL_ADDRESS" --role="roles/ROLE_NAME"

gcloud services

gcloud services list --enabled
gcloud services list --available

1.Service Choreography / Orchestration

1.2.Resources

20211006 Service orchestration on Google Cloud, EN
  • Service Choreography - With service choreography, each service works independently and interacts with other services in a loosely coupled way through events. Loosely coupled services can be changed and scaled independently, which means there is no single point of failure. However, with so many events flying around between services, the system becomes hard to monitor. Business logic is distributed and spans multiple services, so there is no single, central place to go for troubleshooting and no central source of truth for understanding the system. Understanding, updating, and troubleshooting are all distributed.

  • Service Orchestration - To handle the monitoring challenges of choreography, developers need to bring structure to the flow of events, while retaining the loosely coupled nature of event-driven services. Using service orchestration, the services interact with each other via a central orchestrator that controls all interactions between the services. This orchestrator provides a high-level view of the business processes to track execution and troubleshoot issues. In Google Cloud, Workflows handles service orchestration.

Product descriptions:
  • Cloud Workflows: Supports service orchestration. Workflows is a fully managed, serverless service to orchestrate and automate Google Cloud and HTTP-based API services with serverless workflows. Workflows is particularly helpful with Google Cloud services that perform long-running operations, because Workflows will wait for them to complete, even if they take hours. With callbacks, Workflows can wait for external events for days or months. You can use either YAML or JSON to express your workflow.
  • Cloud Pub/Sub: Supports service choreography. Pub/Sub enables services to communicate asynchronously, with latencies on the order of 100 milliseconds. Pub/Sub is used as messaging-oriented middleware for service integration or as a queue to parallelize tasks.
  • Cloud Eventarc: Supports service choreography. Eventarc enables you to build event-driven architectures without having to implement, customize, or maintain the underlying infrastructure. Any service with Audit Log integration, or any application that can send a message to a Pub/Sub topic, can be an event source for Eventarc.
  • Cloud Tasks: Supports service choreography. Cloud Tasks lets you separate out pieces of work that can be performed independently, outside of your main application flow, and send them off to be processed asynchronously using handlers that you create. The difference from Pub/Sub: Pub/Sub supports implicit invocation, where a publisher implicitly causes the subscribers to execute by publishing an event; Cloud Tasks is aimed at explicit invocation, where the publisher retains full control of execution, including specifying an endpoint where each message is to be delivered. Unlike Pub/Sub, Cloud Tasks provides tools for queue and task management, including scheduling specific delivery times, rate controls, retries, and deduplication.
  • Cloud Scheduler: Supports both orchestration and choreography. With Cloud Scheduler, you set up scheduled units of work to be executed at defined times or regular intervals, commonly known as cron jobs. Cloud Scheduler can trigger a workflow (orchestration) or generate a Pub/Sub message (choreography). Cloud Scheduler uses cron scheduling to trigger the execution of HTTP-based services at a schedule you define.
20210422 Choosing the right orchestrator in Google Cloud, EN

20210116 Eventarc: A unified eventing experience in Google Cloud | Google Cloud Blog

Eventarc provides an easier path to receive events not only from Pub/Sub topics but from a number of Google Cloud sources with its 'Audit Log' and Pub/Sub integration. Any service with Audit Log integration or any application that can send a message to a Pub/Sub topic can be event sources for Eventarc.

In Eventarc, different events from different sources are converted to 'CloudEvents' compliant events. CloudEvents is a specification for describing event data in a common way with the goal of consistency, accessibility and portability.
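
For illustration (service, region, and service account are placeholders; the trigger's service account needs permission to invoke the target service), a Pub/Sub-sourced trigger pointing at a Cloud Run service might look like this:

# Route Pub/Sub messages to a Cloud Run service via Eventarc
gcloud eventarc triggers create my-trigger \
  --location=us-central1 \
  --destination-run-service=my-service \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.pubsub.topic.v1.messagePublished" \
  --service-account=my-sa@my-project.iam.gserviceaccount.com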

20201202 Better service orchestration with Workflows, EN

In Orchestration, a central service defines and controls the flow of communication between services. With centralization, it becomes easier to change and monitor the flow and apply consistent timeout and error policies. 

In Choreography, each service registers for and emits events as needed. There's usually a central event broker to pass messages around, but it does not define or direct the flow of communication. This allows services to be truly independent, at the expense of less traceable and manageable flows and policies.

In the orchestration vs. choreography debate, there is no single right answer. If you're implementing a well-defined process with a bounded context, something you can picture with a flow diagram, orchestration is often the right solution. If you're creating a distributed architecture across different domains, choreography can help those systems work together.

After Lambda: Exactly-once processing in Cloud Dataflow,

2.Cloud Pub/Sub

2.1.Guides

In this scenario, there are two publishers publishing messages on a single topic. There are two subscriptions to the topic.

The first subscription has two subscribers, meaning messages will be load-balanced across them, with each subscriber receiving a subset of the messages.

The second subscription has one subscriber that will receive all of the messages.

The bold letters represent messages. Message A comes from Publisher 1 and is sent to Subscriber 2 via Subscription 1, and to Subscriber 3 via Subscription 2. Message B comes from Publisher 2 and is sent to Subscriber 1 via Subscription 1 and to Subscriber 3 via Subscription 2.
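
The same topology can be sketched with gcloud (topic and subscription names are illustrative):

# One topic, two subscriptions: each published message is delivered to both subscriptions;
# subscribers attached to the same subscription share its messages between them
gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create sub-1 --topic=my-topic
gcloud pubsub subscriptions create sub-2 --topic=my-topic
gcloud pubsub topics publish my-topic --message="A"
gcloud pubsub subscriptions pull sub-2 --auto-ack --limit=10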

  • Building streaming pipelines with Pub/Sub
  • Pub/Sub and Dataflow integration features
    • Low latency watermarks
    • High watermark accuracy
    • Efficient deduplication

Pull subscription

  • Large volume of messages (GBs per second).
  • Efficiency and throughput of message processing is critical.
  • Environments where a public HTTPS endpoint with a non-self-signed SSL certificate is not feasible to set up.

Push subscription

  • Multiple topics that must be processed by the same webhook.
  • App Engine Standard and Cloud Functions subscribers.
  • Environments where Google Cloud dependencies (such as credentials and the client library) are not feasible to set up.

Export subscription

  • Large volume of messages that can scale up to multiple millions of messages per second.
  • Messages are directly sent to a Google Cloud resource without any additional processing.
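
For illustration (endpoint, table, and names are placeholders; the --bigquery-table flag assumes a gcloud version with BigQuery export subscriptions), the subscription type is chosen when the subscription is created:

# Push subscription: Pub/Sub delivers each message to an HTTPS endpoint you operate
gcloud pubsub subscriptions create push-sub \
  --topic=my-topic \
  --push-endpoint=https://my-service-abc123-uc.a.run.app/pubsub

# Export subscription: messages are written directly to a BigQuery table
gcloud pubsub subscriptions create bq-sub \
  --topic=my-topic \
  --bigquery-table=my-project:my_dataset.my_table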

Pub/Sub offers a broader range of features, per-message parallelism, global routing, and automatically scaling resource capacity.

Pub/Sub Lite can be as much as an order of magnitude less expensive, but offers lower availability and durability. In addition, Pub/Sub Lite requires you to manually reserve and manage resource capacity.

Resource limits:
  • Project: 10,000 topics; 10,000 attached or detached subscriptions; 5,000 snapshots; 10,000 schemas
  • Topic: 10,000 attached subscriptions; 5,000 attached snapshots
  • Subscription: Retains unacknowledged messages in persistent storage for 7 days from the moment of publication. There is no limit on the number of retained messages. If subscribers don't use a subscription, the subscription expires; the default expiration period is 31 days.
  • Schema: Schema size (the definition field): 10KB
  • Publish request: 10MB (total size); 1,000 messages
  • Message: Message size (the data field): 10MB; attributes per message: 100; attribute key size: 256 bytes; attribute value size: 1,024 bytes
  • Push outstanding messages: 3,000 * N by default; 30,000 * N for subscriptions that acknowledge >99% of messages and average <1s of push request latency, where N is the number of publish regions. For more information, see Using push subscriptions.
  • StreamingPull streams: 10 MB/s per open stream
  • Pull/StreamingPull messages: The service might impose limits on the total number of outstanding StreamingPull messages per connection. If you run into such limits, increase the rate at which you acknowledge messages and the number of connections you use.

Quota mismatches

Quota mismatches can happen when published or received messages are smaller than 1000 bytes. For example:

If you publish 10 500-byte messages in separate requests, your publisher quota usage will be 10,000 bytes. This is because messages that are smaller than 1000 bytes are automatically rounded up to the next 1000-byte increment.

If you receive those 10 messages in a single pull response, your subscriber quota usage might be only 5 kB, since the actual size of each message is combined to determine the overall quota.

The inverse is also true. The subscriber quota usage might be greater than the publisher quota usage if you publish multiple messages in a single publish request or receive the messages in separate Pull requests.

2.3.Resources

Cloud Pub/Sub Overview - ep. 1
What is Cloud Pub/Sub? - ep. 2
Cloud Pub/Sub in Action - ep. 3
Cloud Pub/Sub Publishers - ep. 4
Cloud Pub/Sub Subscribers - ep. 5
Push or Pull Subscriber? - ep. 6
Receiving messages using Pull - ep. 7
Receiving Messages using Push to Cloud Function - ep.8
Using Cloud Pub/Sub with Cloud Run - ep. 9
Replaying and discarding messages - ep. 10
Choosing Pub Sub or Pub Sub Lite? - ep. 11

3.Cloud Task

3.1.Guides

| Feature | Cloud Scheduler | Cloud Tasks |
| --- | --- | --- |
| Triggering | Triggers actions at regular fixed intervals. You set up the interval when you create the cron job, and the rate does not change for the life of the job. | Triggers actions based on how the individual task object is configured. If the scheduleTime field is set, the action is triggered at that time. If the field is not set, the queue processes its tasks in a non-fixed order. |
| Setting rates | Initiates actions on a fixed periodic schedule. Once a minute is the most fine-grained interval supported. | Initiates actions based on the amount of traffic coming through the queue. You can set a maximum rate when you create the queue, for throttling or traffic smoothing purposes, up to 500 dispatches per second. |
| Naming | Except for the time of execution, each run of a cron job is exactly the same as every other run of that cron job. | Each task has a unique name, and can be identified and managed individually in the queue. |
| Handling failure | If the execution of a cron job fails, the failure is logged. If retry behavior is not specifically configured, the job is not rerun until the next scheduled interval. | If the execution of a task fails, the task is retried until it succeeds. You can limit retries based on the number of attempts and/or the age of the task, and you can control the interval between attempts in the configuration of the queue. |
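
A rough sketch of the difference in practice (URLs, schedule, and names are placeholders; the --location flags assume a reasonably recent gcloud release):

# Cloud Scheduler: invoke the same HTTP endpoint on a fixed cron schedule
gcloud scheduler jobs create http nightly-report \
  --schedule="0 3 * * *" \
  --uri=https://example.com/run-report \
  --location=us-central1

# Cloud Tasks: enqueue an individually managed task for asynchronous delivery
gcloud tasks queues create work-queue --location=us-central1
gcloud tasks create-http-task \
  --queue=work-queue \
  --location=us-central1 \
  --url=https://example.com/handler \
  --body-content='{"orderId": 42}'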

4.Cloud Workflow

4.1.Guides

[Choose Workflows or Cloud Composer for service orchestration](https://cloud.google.com/workflows/docs/choose-orchestration)

Workflows orchestrates multiple HTTP-based services into a durable and stateful workflow. It has low latency and can handle a high number of executions. It's also completely serverless.

Workflows is great for chaining microservices together, automating infrastructure tasks like starting or stopping a VM, and integrating with external systems. Workflows connectors also support simple sequences of operations in Google Cloud services such as Cloud Storage and BigQuery.
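
A minimal sketch of deploying and executing a workflow (file, workflow name, and region are illustrative):

# Deploy a YAML workflow definition, then execute it and wait for the result
gcloud workflows deploy order-workflow \
  --source=order-workflow.yaml \
  --location=us-central1
gcloud workflows run order-workflow --location=us-central1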

Cloud Composer is designed to orchestrate data driven workflows (particularly ETL/ELT). It's built on the Apache Airflow project, but Cloud Composer is fully managed. Cloud Composer supports your pipelines wherever they are, including on-premises or across multiple cloud platforms. All logic in Cloud Composer, including tasks and scheduling, is expressed in Python as Directed Acyclic Graph (DAG) definition files.

Cloud Composer is best for batch workloads that can handle a few seconds of latency between task executions. You can use Cloud Composer to orchestrate services in your data pipelines, such as triggering a job in BigQuery or starting a Dataflow pipeline. You can use pre-existing operators to communicate with various services, and there are over 150 operators for Google Cloud alone.

Detailed feature comparison

| Feature | Workflows | Cloud Composer |
| --- | --- | --- |
| Syntax | Workflows syntax in YAML or JSON format | Python |
| State model | Imperative flow control | Declarative DAG with automatic dependency resolution |
| Integrations | HTTP requests and connectors | Airflow Operators and Sensors |
| Passing data between steps | 64KB for variables | 48KB for XCom |
| Execution triggers and scheduling | gcloud CLI, Cloud Console, Workflows API, Workflows client libraries, Cloud Scheduler | Cron-like schedules in the DAG definition file, Airflow Sensors |
| Asynchronous patterns | Polling; waiting for long-running Google Cloud operations | Polling |
| Parallel execution | Available via experimental.executions.map | Automatic based on dependencies |
| Execution latency | Milliseconds | Seconds |
| Based on open source | No | Yes (Apache Airflow) |
| Scaling model | Serverless (scales up to demand and down to zero) | Provisioned |
| Billing model | Usage-based (per step executed) | Based on provisioned capacity |
| Data processing features | No | Backfills, ability to re-run DAGs |