
@Winslett
Last active May 16, 2022 14:24

Database Traceability: using SQL Comments like a stacktrace (+ a strong emotional connection to Ruby on Rails)

Which of my applications ran this SQL statement? Have you ever traced a SQL statement back to its source application? Of course you have! And if you won't admit the adventure was time consuming, then either you've blocked the experience from your memory, you have a small codebase, or you were already using SQL comments.

SQL statements can hide in plain sight. They hide behind ORM code and behind distributed systems built around a centralized database (no shame, we have all done it).

Using Ruby as an example below (BTW, I <3 Ruby; I will see you at RailsConf!), ORM code generally looks like the following. If this were the source of your performance issues, how would you know?

Message
	.where(receipient_id: params[:id])
	.includes(:receipient, :sender)
	.order(sent_at: :desc)

The SQL that is generated looks nothing like the ORM code that is written. The above code will output the following SQL statements:

-- get messages
SELECT "messages".* FROM "messages" WHERE "messages"."receipient_id" = $1 ORDER BY "messages"."sent_at" DESC

-- get receipient
SELECT "users".* FROM "users" WHERE "users"."id" = $1

-- get senders
SELECT "users".* FROM "users" WHERE "users"."id" IN (/* list of args whose length depends on number of senders*/)

But that's just Ruby, right? Node's Prisma …

I built a similar structure in the Prisma ORM, and ran the following code:

prisma.message.findMany({
    orderBy: [
    {
      sent_at: 'desc',
    }],
    where: {
      receipient_id: 10088,
    },
    include: {
      receipient: true,
      sender: true,
    }
  })

The queries generated look like the following:

-- get messages
SELECT "public"."Message"."id", "public"."Message"."message", "public"."Message"."receipient_id", "public"."Message"."sender_id", "public"."Message"."sent_at", "public"."Message"."created_at", "public"."Message"."updated_at" FROM "public"."Message" WHERE "public"."Message"."receipient_id" = $1 ORDER BY "public"."Message"."sent_at" DESC OFFSET $2

-- get receipient
SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."created_at", "public"."User"."updated_at" FROM "public"."User" WHERE "public"."User"."id" IN ($1) OFFSET $2

-- get sender
SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."created_at", "public"."User"."updated_at" FROM "public"."User" WHERE "public"."User"."id" IN (/* list of args whose length depends on number of senders*/) OFFSET $25

What do SQL comments do?

Enough with the ORM bashing! But you get the point, right? A system under high load, slow responses, a database log flowing with slow queries, and then a land-mine of a query is thrown into the database. How can we fix this? How can we trace it back to its source?

Use SQL Comments!

What would SQL comments look like?

-- get messages from Rails, with a SQL comment appended
SELECT "messages".* FROM "messages" WHERE "messages"."receipient_id" = $1 ORDER BY "messages"."sent_at" DESC /*application:CustomerFacingFrontEndApp,line:/app/controllers/messages_controller:5:in `<main>'*/

By looking at the comment, we know the query was run by executing line 5 of the MessagesController in the CustomerFacingFrontEndApp. Query comments are recorded in your slow query logs and in pg_stat_statements.
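On the log-analysis side, these key:value comments are easy to pull back out of a query string. A minimal Ruby sketch (the regex and the helper name are my own, based on the comment format shown above):

```ruby
# Extract key:value pairs from a trailing /* ... */ SQL comment,
# as produced by sqlcommenter-style annotations.
def parse_sql_comment(sql)
  match = sql.match(%r{/\*(.*)\*/\s*\z}m)
  return {} unless match

  match[1].split(",").each_with_object({}) do |pair, tags|
    key, value = pair.split(":", 2)
    tags[key.strip] = value.strip if key && value
  end
end

sql = %(SELECT "messages".* FROM "messages" WHERE "messages"."receipient_id" = $1 ) +
      %(/*application:CustomerFacingFrontEndApp,controller:messages,action:index*/)
parse_sql_comment(sql)
# => {"application"=>"CustomerFacingFrontEndApp", "controller"=>"messages", "action"=>"index"}
```

A script like this can turn a pile of slow-query log lines into per-application or per-controller counts.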

NICE! So, how can we do that? It all depends …

Python, Java, Node.js, & PHP:

  • sqlcommenter: this project has wide breadth, so it may or may not be plug-and-play for your environment, but it is a good starting point.

Rails (which I love) has 2 native features in Rails 7: the `annotate` query method and query log tags (`config.active_record.query_log_tags_enabled`).

Prisma:

Go's database/sql: write your own comments into the SQL :)
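If your driver has no helper, appending the comment yourself takes only a few lines. A Ruby sketch that tags a query with its caller's file and line, echoing the stacktrace idea (the helper name is mine, not a library API):

```ruby
# Append a stacktrace-style comment to a SQL string by hand.
# caller_locations(1, 1) grabs the immediate caller's file and line number.
def with_sql_comment(sql, application:)
  location = caller_locations(1, 1).first
  "#{sql} /*application:#{application},line:#{location.path}:#{location.lineno}*/"
end

sql = with_sql_comment(
  %(SELECT "messages".* FROM "messages" WHERE "messages"."receipient_id" = $1),
  application: "CustomerFacingFrontEndApp"
)
puts sql
```

The same pattern ports directly to Go, PHP, or anywhere else you build query strings before handing them to the driver.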

But wait, there is more!

Combine SQL Comments with the following Postgres features, and you'll level up your traceability:

  • pg_stat_statements - records cumulative query performance statistics, queryable from a Postgres view
  • log_min_duration_statement - if a query takes longer than a specified time, records the query in the Postgres logs.
  • auto_explain - if a query takes longer than a specified time, records the query's execution plan in the Postgres logs.
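All three can be wired up with a few postgresql.conf lines. A minimal sketch (the 250 ms thresholds are illustrative, not recommendations):

```
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements,auto_explain'

# log any statement slower than 250 ms
log_min_duration_statement = 250

# log the execution plan of any statement slower than 250 ms
auto_explain.log_min_duration = 250
```

After a restart, run `CREATE EXTENSION pg_stat_statements;` in the target database to expose the statistics view.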

Below, I've added the Ruby on Rails annotate method to track how a slow query is being executed:

Message
  .where(receipient_id: params[:id])
  .includes(:receipient, :sender)
  .order(sent_at: :desc)
  .annotate("Github Issue #345")

In my log management system, I can search for "Github Issue #345" (of course, you should tie this back to your issue tracking system!). The logs would show the following:

LOG:  duration: 542.136 ms  plan:
	Query Text: SELECT "messages".* FROM "messages" WHERE "messages"."receipient_id" = $1 /* Github Issue #345 */ ORDER BY "messages"."sent_at" DESC
	Gather Merge  (cost=37979.87..37983.37 rows=30 width=68)
	  Workers Planned: 2
	  ->  Sort  (cost=36979.85..36979.89 rows=15 width=68)
	        Sort Key: sent_at DESC
	        ->  Parallel Seq Scan on messages  (cost=0.00..36979.56 rows=15 width=68)
	              Filter: (receipient_id = 10088)

Yuck! This query just performed a parallel sequential scan over the entire messages table! Read more in [Demystifying Database Performance for Developers](https://www.crunchydata.com/blog/demystifying-database-performance-for-developers). Step #1: add an index and re-measure. Step #2: measure and repeat.
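For this particular plan, Step #1 would be an index that serves both the filter and the sort. A sketch against the schema above (the index name is mine):

```sql
-- Matches WHERE receipient_id = $1 ORDER BY sent_at DESC,
-- letting Postgres replace the parallel seq scan with an index scan
CREATE INDEX index_messages_on_receipient_id_and_sent_at
  ON messages (receipient_id, sent_at DESC);
```

Re-run the query, check auto_explain's output again, and confirm the sequential scan is gone before moving on.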

Where to start?

Even if you don't have slow queries now, go ahead and start commenting and annotating your queries. Your future-self will thank your past-self.
