GCP-Architect-QuickNotes
The access pattern fits the Nearline storage class requirements, and Nearline is a more cost-effective storage approach than Multi-Regional.
An object lifecycle management policy that deletes the data is the correct choice here, versus one that changes the storage class to Coldline.
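For illustration only (bucket name hypothetical), such a delete-after-age rule can be added with the google-cloud-storage Python client:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("example-logs-bucket")  # hypothetical bucket

    # Delete objects once they pass the retention threshold (example: 365 days)
    bucket.add_lifecycle_delete_rule(age=365)
    bucket.patch()  # push the updated lifecycle configuration to the bucket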
Google Cloud Storage supports Multi-Regional buckets that synchronize data across regions automatically.
Google Cloud SQL instances are deployed within a single region.
Google Cloud Bigtable data is stored within a single region.
Google Cloud Datastore data is stored within a single region.
Using a shared VPC allows each team to individually manage their own application resources, while enabling the applications to communicate with each other securely over RFC1918 address space.
Deploying services into a single project results in every team accessing and managing the same project resources. This is difficult to
manage and control as the number of teams involved increases.
HTTPS is a valid option; however, this answer does not address the need to ensure management of individual projects.
The global load balancer uses a public IP address, and therefore it does not conform to the requirement of communication over RFC1918
address space.
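As a rough sketch of the Shared VPC setup, assuming hypothetical host and service project IDs, the Compute Engine API's XPN methods can wire this up:

    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    # Enable the host project for Shared VPC (called XPN in the API)
    compute.projects().enableXpnHost(project="host-project-id").execute()

    # Attach a service project so each team's resources can use the shared
    # network and communicate over RFC1918 addresses
    compute.projects().enableXpnResource(
        project="host-project-id",
        body={"xpnResource": {"id": "service-project-id", "type": "PROJECT"}},
    ).execute()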
Since the data is accessed frequently within the first 30 days, Google Cloud Regional Storage is the most cost-effective solution for storing and accessing the data. For videos older than 30 days, Google Cloud Coldline Storage offers the most cost-effective solution, since the data won't be accessed.
While Google Cloud Coldline storage is cost-effective for long-term video storage, Google Cloud Nearline Storage would not be an
effective solution for the first 30 days as the data is expected to be accessed frequently.
While Google Cloud Regional Storage is the most cost-effective solution for the first 30 days, Google Cloud Nearline Storage is not
cost effective for long-term storage.
While Google Cloud Regional Storage is the most cost-effective solution for the first 30 days, storing the data on Google Cloud Persistent Disk would not be cost-effective for long-term storage.
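A minimal sketch of the 30-day Regional-to-Coldline transition described above, assuming a hypothetical video bucket:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("example-video-bucket")  # hypothetical bucket

    # Move objects to Coldline once they are 30 days old
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=30)
    bucket.patch()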
The second disk can only be attached to one instance in read/write mode.
Selection of zone does not impact the number of instances started.
Using a custom sourceImage does not impact the number of instances started.
Instances require at least one network interface. Removing the network interface will create an invalid configuration.
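On the disk-sharing point: a sketch of attaching an existing disk to a second instance, which must be done in read-only mode, via the Compute Engine API (all names hypothetical):

    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    # A persistent disk can be attached to many instances, but only READ_ONLY;
    # READ_WRITE mode limits the disk to a single instance.
    compute.instances().attachDisk(
        project="example-project",
        zone="us-central1-a",
        instance="second-instance",
        body={
            "source": "projects/example-project/zones/us-central1-a/disks/shared-disk",
            "mode": "READ_ONLY",
        },
    ).execute()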
Google Cloud Source Repositories provides a Git version-controlled development environment. Google Cloud Container Builder builds Docker images from source repositories like Google Cloud Source Repositories. Finally, Google Container Engine can run and manage the Docker containers received from the Jenkins deployment pipeline.
Google Cloud Storage and Pub/Sub do not offer a means to manage or deploy application code.
Google Cloud Shell does not offer a means to build Docker images, so while source code can live in Google Cloud Storage, Cloud Shell would not be the appropriate solution.
This option does not provide for any solution which builds the necessary docker images.
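A minimal sketch of kicking off a Docker image build through the Container Builder API, assuming a hypothetical repository and project:

    from googleapiclient import discovery

    cloudbuild = discovery.build("cloudbuild", "v1")

    build_body = {
        # Source comes from a Cloud Source Repositories repo (name hypothetical)
        "source": {"repoSource": {"repoName": "my-app", "branchName": "master"}},
        # Build the image with the hosted docker builder
        "steps": [{
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", "gcr.io/$PROJECT_ID/my-app", "."],
        }],
        # Push the result to Google Container Registry for Container Engine
        "images": ["gcr.io/$PROJECT_ID/my-app"],
    }

    cloudbuild.projects().builds().create(
        projectId="example-project", body=build_body
    ).execute()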
Deploying a new version without assigning it as the default version will not create downtime for the application. Using traffic splitting allows for easily redirecting a small amount of traffic to the new version and can also be quickly reverted without application downtime.
Deploying the application version as default requires moving all traffic to the new version. This could impact all users and disable the service.
Deploying a second project requires data synchronization and an external traffic-splitting solution to direct traffic to the new application. While this is possible, with Google App Engine these manual steps are not required.
App Engine services are intended for hosting different service logic. Using different services would require the consumers of those services to be manually configured around the deployment process, and who is accessing which service would have to be managed from the consumer side.
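A sketch of the traffic-splitting step with the App Engine Admin API (project and version IDs hypothetical):

    from googleapiclient import discovery

    appengine = discovery.build("appengine", "v1")

    # Send 5% of traffic to the new version and keep 95% on the stable one;
    # reverting is just another patch call with the allocations swapped back.
    appengine.apps().services().patch(
        appsId="example-project",
        servicesId="default",
        updateMask="split",
        body={"split": {
            "shardBy": "IP",
            "allocations": {"stable-v1": 0.95, "new-v2": 0.05},
        }},
    ).execute()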
Since you know that there is a burst of log lines, you can set up a metric that identifies those lines. Stackdriver will also allow you to set up a text, email, or messaging alert that notifies you promptly when the error is detected, so you can hop onto the system to debug.
Logging into an individual machine may not reveal the specific performance problem: there may be multiple machines in the configuration, which reduces the chances of catching an intermittent problem on the one machine you happen to inspect.
Error Reporting won't necessarily catch the log lines unless they are stack traces in the proper format. Additionally, just because there is a pattern doesn't mean you will know exactly when and where to log in to debug.
Trace may tell you where time is being spent, but it won't let you home in on the exact host where the problem is occurring, because you generally only send samples of traces. There is also no alerting on traces to notify you exactly when the problem is happening.
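A sketch of such a logs-based metric with the google-cloud-logging client; the filter text is a hypothetical stand-in for the known burst pattern:

    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()

    # Count log entries matching the burst pattern; a Stackdriver alerting
    # policy on this metric can then notify by email/SMS when it spikes.
    metric = client.metric(
        "error-burst-count",
        filter_='resource.type="gce_instance" AND textPayload:"ERROR burst"',
        description="Counts the known burst-of-errors log lines",
    )
    metric.create()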
There is no requirement to migrate the current jobs to a different technology. Using managed services where possible is a requirement. Using Google Cloud Dataproc allows the current jobs to be executed within Google Cloud Platform on a managed services offering.
Migrating the existing data jobs to a different technology, such as Google BigQuery, is not a requirement.
Migrating existing data jobs to a different technology, such as Google Cloud Dataflow, is not a requirement.
Using managed services where possible is a requirement. The current jobs can run on a Hadoop/Spark cluster in Google Compute Engine but it is not a managed services solution.
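A sketch of submitting one of the existing Spark jobs, unchanged, to a Dataproc cluster (cluster, jar, and project names hypothetical):

    from google.cloud import dataproc_v1

    job_client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    job = {
        "placement": {"cluster_name": "existing-cluster"},
        # The same jar that ran on-premises runs unmodified on Dataproc
        "spark_job": {
            "main_class": "com.example.DataJob",
            "jar_file_uris": ["gs://example-bucket/data-job.jar"],
        },
    }

    job_client.submit_job(
        project_id="example-project", region="us-central1", job=job
    )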
The Cloud Storage API easily allows a write-only bucket for the image uploads from the client. The upload event is then pushed into Pub/Sub, triggering the Cloud Function to grab the file, push it through the Vision API, and send the metadata into Pub/Sub, where Dataflow will see the message, process the file from GCS, and store the metadata into Cloud SQL.
An App Engine app could be written to accept image uploads, but Datastore is not for storing image files.
An App Engine app could be written to accept image uploads, but natively Dataflow needs either a GCS bucket or a Pub/Sub topic to listen to for event processing. Connecting Dataflow to App Engine is a highly unusual architecture.
Connecting users directly to Dataflow for image uploads will not handle the bursty nature of user upload traffic efficiently and thus won't give users a reliable experience.
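A sketch of the Cloud Function step in that pipeline, triggered by the upload event and publishing Vision API labels for Dataflow to consume (project, bucket, and topic names hypothetical):

    from google.cloud import pubsub_v1, vision

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("example-project", "image-metadata")

    def process_image(event, context):
        """Triggered by the object-finalize event for an uploaded image."""
        uri = "gs://{}/{}".format(event["bucket"], event["name"])

        # Run the uploaded image through the Vision API to extract labels
        annotator = vision.ImageAnnotatorClient()
        image = vision.Image(source=vision.ImageSource(image_uri=uri))
        labels = annotator.label_detection(image=image).label_annotations

        # Publish the metadata for Dataflow to process and store in Cloud SQL
        payload = ",".join(label.description for label in labels)
        publisher.publish(topic_path, payload.encode("utf-8"), uri=uri)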
Having the scanners located outside the cloud environment will best emulate end-user penetration testing. Using the public internet to access the environments best emulates end-user traffic.
Google does not require notification for customers conducting security scanning on their own applications.
Deploying the security scanners within the cloud environment may not test the border security configuration that end users would normally pass through. This does not emulate end-user traffic as closely as possible.
Deploying the security scanners over the VPN between the on-premises and cloud environments may not test the border security configuration that end users would normally pass through. VPN traffic may be trusted more highly than public internet traffic and so would not emulate end-user traffic as closely as possible.
This meets the criteria of doing this automatically and simultaneously.
Federated mode allows for deployment in a federated way but does not do anything beyond that; you still need a tool like Jenkins to cover the "automated" part of the question, and with Jenkins you can accomplish the goal without necessarily needing federation to be enabled.
This may work in very simple examples, but as complexity grows this will become unmanageable.
Google Container Builder does not offer a way to push images to different clusters; images are published to Google Container Registry.
BigQuery is the only one of these Google products that supports an SQL interface and carries a high enough SLA (99.9%) to make it readily available. Cloud Storage does not have an SQL interface.
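For illustration, a query through the BigQuery Python client (dataset and table hypothetical):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Standard SQL over a fully managed service with a 99.9% availability SLA
    query = """
        SELECT name, COUNT(*) AS total
        FROM `example-project.example_dataset.events`
        GROUP BY name
    """
    for row in client.query(query).result():
        print(row.name, row.total)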
This approach meets all of the requirements; it is easy to do and works cross-project and cross-region.
This approach affects the performance of the existing machine and incurs significant network costs.
This approach does not allow you to create the VM in the new project since snapshots are limited to the project in which they are taken.
dd will not work correctly on a mounted disk.
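A sketch of the image-based approach with the Compute Engine API (project, zone, and disk names hypothetical):

    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    # With the instance stopped, create an image from its disk in the source
    # project; per the note above, images (unlike snapshots here) can be used
    # from another project.
    compute.images().insert(
        project="source-project",
        body={
            "name": "migrated-vm-image",
            "sourceDisk": "projects/source-project/zones/us-central1-a/disks/vm-disk",
        },
    ).execute()

    # The new project can then boot a VM from
    # projects/source-project/global/images/migrated-vm-image (IAM permitting).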
This grants the least privilege required to access the data and minimizes the risk of accidentally granting access to the wrong people.
Signed URLs could potentially be leaked.
This is needlessly permissive; users only require one permission in order to get access.
This is security through obscurity, also known as no security at all.
Gives the security team read-only access to everything your company produces; anything else gives them the ability to, accidentally or otherwise, change things.
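A sketch of such a least-privilege grant, assuming a hypothetical auditors group, using the storage client's IAM helpers:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("example-audit-bucket")  # hypothetical bucket

    # Grant read-only object access on just this bucket to just this group
    policy = bucket.get_iam_policy()
    policy["roles/storage.objectViewer"].add("group:auditors@example.com")
    bucket.set_iam_policy(policy)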
This is a seamless way to ensure the last known good version of the static content is always available.
This allows for easy management of the VMs and lets GCE take care of updating each instance.
This copy process is unreliable and makes it tricky to keep things in sync; it also doesn't provide a way to roll back once a bad version of the data has been written to the copy.
This would add a great deal of overhead to the process and would cause conflicts in association between different Deployment Manager deployments, which could lead to unexpected behavior if an old version is changed.
This approach doesn't scale well; there is a lot of management work involved.
Allows for extensive testing of the application in the green environment before sending traffic to it. Typically the two environments are otherwise identical, which gives the highest level of testing assurance.
Allows for smaller, more incremental rollouts of updates (each microservice can be updated individually) which will reduce the likelihood of an error in each rollout.
Would remove a well-proven step from the general release strategy; a canary release platform is not a replacement for QA, it should be additive.
Doesn't really help the rollout strategy; there is no inherent property of a relational database that makes it more subject to failed releases than any other type of data storage.
Doesn’t really help either since NoSQL databases do not offer anything over relational databases that would help with release quality.
The HTTP(S) load balancer in GCP handles WebSocket traffic natively. Backends that use WebSocket to communicate with clients can use the HTTP(S) load balancer as a front end, for scale and availability.
There is no compelling reason to move away from websockets as part of a move to GCP.
This may be a good exercise anyway, but it doesn’t really have any bearing on the GCP migration.
There is no compelling reason to move away from websockets as part of a move to GCP.
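One practical detail when fronting WebSocket backends with the HTTP(S) load balancer: the backend service timeout bounds connection lifetime, so long-lived WebSocket sessions may need it raised. A sketch (names hypothetical):

    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    # WebSocket connections are closed when the backend service timeout
    # expires, so raise it from the default 30s (here: one hour).
    compute.backendServices().patch(
        project="example-project",
        backendService="websocket-backend",
        body={"timeoutSec": 3600},
    ).execute()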