Go SQL

Database Environments

A schema contains a group of tables. A database contains a group of schemas.

In prod, we use a Cloud SQL PostgreSQL instance. The reports DB is for prod only. Its presence should be a gatekeeper to prevent destructive teardowns.

Everywhere else, we assume a local PostgreSQL running on port 5432 with two existing databases. The khan_test DB is for integration tests. The khan_dev DB is for local development.

We can use the postgres user in all three environments, but in prod we want to vary the user to better differentiate the access pattern metrics.

The schemas for all three should be kept the same.

We do this with migrations.

Migrations

How to manually run these

Assuming you have the golang-migrate CLI installed (e.g. brew install golang-migrate).

NOTE: Running migrations in prod is super-dangerous! Measure twice, cut once!

golang-migrate provides a CLI that can be used to manually perform the database migrations.

For instance to get the initial schema created:

migrate -verbose -path . -database "postgres://localhost:5432/khan_test?sslmode=disable" goto 1

For an existing database that already has the schemas applied, you can set the version without applying migrations via:

migrate -verbose -path . -database "postgres://localhost:5432/khan_test?sslmode=disable" force 1

A complete tutorial is available here.
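If you'd rather run migrations from Go code (e.g. at service startup) instead of the CLI, golang-migrate also exposes a library API. Here is a minimal sketch, assuming the migration files live under reports/migrations and a local dev DSN; adjust the path and credentials to your setup:

package main

import (
    "log"

    "github.com/golang-migrate/migrate/v4"
    // Blank imports register the postgres database driver and the file source driver.
    _ "github.com/golang-migrate/migrate/v4/database/postgres"
    _ "github.com/golang-migrate/migrate/v4/source/file"
)

func main() {
    // Assumed local-dev values, mirroring the CLI examples above.
    m, err := migrate.New(
        "file://reports/migrations",
        "postgres://postgres@localhost:5432/khan_dev?sslmode=disable")
    if err != nil {
        log.Fatalf("migrate.New: %v", err)
    }
    // Apply all pending up migrations; ErrNoChange just means we are already current.
    if err := m.Up(); err != nil && err != migrate.ErrNoChange {
        log.Fatalf("migrate up: %v", err)
    }
}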

khan_dev=# \l
                           List of databases
   Name    |  Owner   | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+---------+-------+-------------------
 khan_dev  | postgres | UTF8     | C       | C     |
 khan_test | postgres | UTF8     | C       | C     |
 postgres  | steve    | UTF8     | C       | C     |

This is an example of a script that will set up the empty databases for dev and test:

# A simple script to create the test / dev postgres databases needed for webapp
# tests and development that depend on postgres.

# Our test and dev environments assume that the user has created a postgres
# user with create db privs.  This should have been done within dotfiles, but
# we try our best to do it here anyways.

if ! psql -tc "SELECT rolname from pg_catalog.pg_roles"  postgres \
  | grep -c 'postgres' > /dev/null 2>&1 ; then
    psql --quiet -c "CREATE ROLE postgres LOGIN SUPERUSER;" postgres;
fi

# Create the db that we'll use in our tests.
if ! psql -U postgres -l | grep khan_test ; then
  createdb khan_test -U postgres -O postgres
fi
psql -U postgres -qc "create extension if not exists btree_gist" khan_test;
psql -U postgres -qc "create extension if not exists pg_stat_statements" khan_test;

# We'll also create the dev db, in case we run tests that use the
# dev_appserver
if ! psql -U postgres -l | grep khan_dev ; then
  createdb khan_dev -U postgres -O postgres
fi
psql -U postgres -qc "create extension if not exists btree_gist" khan_dev;
psql -U postgres -qc "create extension if not exists pg_stat_statements" khan_dev;

How to add migrations

Please note the naming of the files in this directory.

They need to be named so that the migrations can be ordered lexicographically. This is often accomplished by using a numbered prefix like 001_something.sql.

If you added another file you would name it 002_whatever_you_want.sql.

Another possibility is yyyymmdd_whatever_you_want.sql (e.g. 20200921_something.sql), but you would need to rename all the files that already exist here to follow this convention.

How to Create The Initial Migration

The initial migrations in 001_initial_schema.up.sql were generated by running:

cloud_sql_proxy -dir=/Users/steve/cloudsql -instances=$DBHOST=tcp:5433 &
pg_dump -U $DBUSER -h 127.0.0.1 -p 5433 $DBNAME -s -f 001_initial_schema.up.sql

I deleted a few out-of-the-box functions, as well as an old deprecated table or two.

Subsequent migrations should be placed in files that sort lexicographically after the existing ones.

How to update the migration bindata for unit and integration tests

Assuming you have installed go-bindata using go get -u github.com/go-bindata/go-bindata/...:

cd ../..
go-bindata -pkg migrations -ignore bindata -nometadata -prefix reports/migrations/ -o ./reports/migrations/bindata.go ./reports/migrations

This will allow the unit and integration tests to verify that the database schema is in sync with the code.
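For reference, this is roughly how the generated bindata can be wired into golang-migrate inside a test. This is a sketch, not the actual test code: the test package name, the import path for the generated bindata package, and the test DSN are assumptions.

package migrations_test

import (
    "testing"

    "github.com/golang-migrate/migrate/v4"
    _ "github.com/golang-migrate/migrate/v4/database/postgres"
    bindata "github.com/golang-migrate/migrate/v4/source/go_bindata"

    // Hypothetical import path for the generated bindata package.
    migrations "example.com/webapp/reports/migrations"
)

func TestMigrationsApplyCleanly(t *testing.T) {
    // Wrap the generated AssetNames/Asset functions as a migrate source.
    res := bindata.Resource(migrations.AssetNames(),
        func(name string) ([]byte, error) {
            return migrations.Asset(name)
        })
    src, err := bindata.WithInstance(res)
    if err != nil {
        t.Fatal(err)
    }
    m, err := migrate.NewWithSourceInstance("go-bindata", src,
        "postgres://postgres@localhost:5432/khan_test?sslmode=disable")
    if err != nil {
        t.Fatal(err)
    }
    // Running all up migrations against khan_test checks that schema and code agree.
    if err := m.Up(); err != nil && err != migrate.ErrNoChange {
        t.Fatal(err)
    }
}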

No ORM

I prefer to write queries by hand. Maximum control and minimum mental overhead.

Generally ORMs are more trouble than they are worth, especially since the Go standard library has nice support out of the box. Beyond the database/sql package, there is also the handy https://github.com/jmoiron/sqlx.
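To make that concrete, a hand-written query against database/sql looks something like the sketch below; the reports table and its columns are made up for illustration.

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // registers the "postgres" driver name
)

// Report is a hypothetical row type for illustration only.
type Report struct {
    ID    int64
    Title string
}

func main() {
    db, err := sql.Open("postgres",
        "postgres://postgres@localhost:5432/khan_dev?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Plain parameterized SQL: no ORM, no query builder.
    rows, err := db.QueryContext(context.Background(),
        `SELECT id, title FROM reports WHERE id > $1 ORDER BY id`, 0)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var r Report
        if err := rows.Scan(&r.ID, &r.Title); err != nil {
            log.Fatal(err)
        }
        fmt.Println(r.ID, r.Title)
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}

With sqlx, the Scan loop collapses into a single db.Select(&reports, query, args...) call.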

Postgres driver choices

  • github.com/jackc/pgx - nice type support
  • github.com/lib/pq - has been around for ages and is battle tested. Cloud SQL Proxy uses this.
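Either driver plugs into database/sql, so switching is mostly a matter of the import and the driver name. A sketch of connecting through pgx's stdlib adapter (assuming pgx v4, whose adapter registers the "pgx" driver name):

package main

import (
    "database/sql"
    "log"

    _ "github.com/jackc/pgx/v4/stdlib" // registers the "pgx" driver name
)

func main() {
    // Same database/sql API as lib/pq; only the driver name and import change.
    db, err := sql.Open("pgx",
        "postgres://postgres@localhost:5432/khan_dev?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }
    log.Println("connected via pgx stdlib adapter")
}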

DB setup and migrations

I like to do migrations externally myself. The libs always seemed overly complex and loaded with features that my teams did not need. If a library is needed, then golang-migrate is worth a look.

Testing databases stuff

Make sure you can run go test ./... without that Docker setup. Skip tests that require a database unless it's explicitly configured via an environment variable or build tag, IMO.
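One way to gate database tests on an environment variable is a small test helper like the sketch below; the variable name KHAN_TEST_POSTGRES_DSN is invented for illustration.

package reports_test

import (
    "database/sql"
    "os"
    "testing"

    _ "github.com/lib/pq"
)

// testDB skips the calling test unless a Postgres DSN is explicitly provided,
// so a bare `go test ./...` stays green without Docker or a local database.
func testDB(t *testing.T) *sql.DB {
    t.Helper()
    dsn := os.Getenv("KHAN_TEST_POSTGRES_DSN") // hypothetical variable name
    if dsn == "" {
        t.Skip("set KHAN_TEST_POSTGRES_DSN to run database tests")
    }
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        t.Fatal(err)
    }
    t.Cleanup(func() { db.Close() })
    return db
}

func TestReportsQuery(t *testing.T) {
    db := testDB(t)
    if err := db.Ping(); err != nil {
        t.Fatal(err)
    }
}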

I’ve also used SQLite under test and PostgreSQL in production and worked around the (few) incompatibilities.

Google Cloud SQL

Postgres Connection Pooling In Go

Cloud SQL Postgres has connection limits.

Briefly, if the number of clients is more than ~100, the performance of Postgres degrades. That happens because of Postgres-side implementation quirks: launching a dedicated process for every connection, snapshot-taking mechanism, and using shared memory for interactions — all these factors are relevant. You can brush up on the reasons behind that in this brilliant article brandur.org/postgres-connections.

http://go-database-sql.org/connection-pool.html

Generally, prefer Go’s connection pool over PgBouncer. PgBouncer’s transaction pooling doesn’t play well with the prepared statements that Go’s drivers use under the hood, so it’s a no-go IMO. On GCP, you are already connecting through their proxy connector. That said, it is still challenging as you scale up the number of services talking to PostgreSQL, and GCP doesn’t let you change the connection limit parameter for their managed PostgreSQL service.

A good step is to introspect either the service count or the connection count and call SetMaxOpenConns based on it.
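Concretely, the pool limits live on *sql.DB. A sketch with made-up numbers, sized so that instances × MaxOpenConns stays comfortably under the Cloud SQL connection limit:

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/jackc/pgx/v4/stdlib"
)

func main() {
    db, err := sql.Open("pgx",
        "postgres://postgres@localhost:5432/khan_dev?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Example numbers only: with ~10 service instances and a ~100-connection
    // Cloud SQL limit, cap each instance well below limit / instances.
    db.SetMaxOpenConns(8)
    db.SetMaxIdleConns(4)
    db.SetConnMaxLifetime(30 * time.Minute)
}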

Need more scaling for Postgres?

PostgreSQL and go demo app

Data stuff:

Parquet is a columnar format that's faster than CSV:

NextCloud

Prometheus Metrics

GraphQL Client

Google Cloud Run

  • https://medium.com/google-cloud/how-to-run-serverless-batch-jobs-on-google-cloud-ca45a4e33cb1
  • https://medium.com/google-cloud/cloud-run-and-cloud-function-what-i-use-and-why-12bb5d3798e1
  • https://medium.com/google-cloud/cloud-run-vs-cloud-functions-whats-the-lowest-cost-728d59345a2e
  • https://medium.com/google-cloud/making-requests-to-cloud-run-with-the-service-account-620014dc1486

Google Cloud Composer

Google Cloud Composer is basically Apache Airflow. We handle a wide variety of workflows on a daily basis that require advanced tooling such as complex dependency management, fan-in and fan-out, and more. In addition, Airflow provides useful tools for logging, metrics, and monitoring.

Airflow, in its design, provides the wrong abstraction. Airflow Operators, instead of simply orchestrating work to be executed, actually implement some of the functional work themselves. This means that Airflow Operators inherently combine orchestration bugs with execution bugs.

  1. Each step of this DAG is a different functional task, so each step is created using a different Airflow Operator. Developers must spend time researching, understanding, using, and debugging the Operator they want to use. This also means that each time a developer wants to perform a new type of task, they must repeat all of these steps with a new Operator. And, as we found at Bluecore, Operators are often buggy. Developer after developer moved a previously-working workflow over to Airflow only to have it brought down by an issue with an Airflow Operator itself.

  2. Developers can quickly get started creating DAGs using the plug-and-play nature of the Operators, but in the face of any issues, the Operators themselves complicate root-cause analysis. This is because Operators themselves often handle the bulk of the work! This can range from creating connections, querying databases, parsing results, and more. Ultimately, this is abstracting away functionality that the developer should, ideally, totally understand!

  3. Operators are executed on the Airflow workers themselves. The Airflow Scheduler, which runs on Kubernetes Pod A, will indicate to a Worker, which runs on Kubernetes Pod B, that an Operator is ready to be executed. At that point, the Worker will pick up the Operator and execute the work directly on Pod B. This will happen for every Operator that it executes. This means that all Python package dependencies from each workflow will need to be installed on each Airflow Worker for Operators to be executed successfully. Different workflows can have very different requirements. In the best case, this means soaking up valuable memory by loading all of the packages onto each Worker (and costing more money). In the worst case, Python package conflicts could prevent workflows from being run on the same Airflow instance altogether. These are the issues that arise simply from using Airflow in its prescribed way.

In lieu of a growing list of functionality-specific Operators, we believe that there should be a single, bug-free Operator that would be able to execute any arbitrary task. This shift would allow us to separate workflow management from workflow execution, simplifying our understanding of Airflow and our ability to quickly debug issues.

Other Google Cloud stuff

Interesting opinion stuff
