Skip to content

Instantly share code, notes, and snippets.

@kevindew
Last active February 12, 2018 12:51
Show Gist options
  • Save kevindew/b7259baebad6a1f3227531dba909b8fd to your computer and use it in GitHub Desktop.
Save kevindew/b7259baebad6a1f3227531dba909b8fd to your computer and use it in GitHub Desktop.

How it started

We seem to be seeing the 504s begin at 16:54:27 which seems to be the start of the snowball.

None of the content changes leading up to that seem particularly remarkable for the amount of subscription contents they generated:

irb(main):086:0> pp SubscriptionContent.where(content_change_id: 21300..21335, digest_run_subscriber_id: nil).joins(:content_change).group(:content_change_id, "content_changes.content_id", "content_changes.created_at").order("content_changes.created_at").count
{[21300, "25000916-b720-4734-ada9-5e132cfb531f", Thu, 08 Feb 2018 16:19:57 UTC +00:00]=>128,
 [21301, "c28088ef-056f-49f3-932f-8d52e2fc6094", Thu, 08 Feb 2018 16:21:52 UTC +00:00]=>128,
 [21302, "a2d1f8b0-7f1c-476e-818e-cc8de5ba0660", Thu, 08 Feb 2018 16:22:01 UTC +00:00]=>1142,
 [21303, "77ecb4f6-db29-4988-8480-76e3b851994a", Thu, 08 Feb 2018 16:23:55 UTC +00:00]=>128,
 [21304, "40045056-cd2e-4f7a-90be-3f7896153c6c", Thu, 08 Feb 2018 16:25:31 UTC +00:00]=>128,
 [21305, "7fccc33a-4df5-4df7-af61-48bb0d8407ed", Thu, 08 Feb 2018 16:27:00 UTC +00:00]=>128,
 [21306, "76125ba7-8ed6-4c5a-8cc0-fec717b2197c", Thu, 08 Feb 2018 16:28:38 UTC +00:00]=>128,
 [21307, "4d8cd561-9235-4463-ada2-182c9ebf6b97", Thu, 08 Feb 2018 16:30:07 UTC +00:00]=>128,
 [21308, "fd844695-2d93-4702-bd93-65211383c07e", Thu, 08 Feb 2018 16:31:14 UTC +00:00]=>861,
 [21309, "57d72545-48cb-49f2-8772-9cb9ff8ec672", Thu, 08 Feb 2018 16:32:42 UTC +00:00]=>737,
 [21310, "2553904c-e9a7-4afe-b193-7973d7d21171", Thu, 08 Feb 2018 16:32:57 UTC +00:00]=>2169,
 [21311, "e288bf95-c490-4d10-98cb-b6cc4f4f2e34", Thu, 08 Feb 2018 16:34:33 UTC +00:00]=>1448,
 [21312, "03be0dc5-c4be-4063-9d1a-116c54c74a4c", Thu, 08 Feb 2018 16:37:06 UTC +00:00]=>1491,
 [21313, "c90dd329-ea2a-413a-b927-a67bd3cbbb84", Thu, 08 Feb 2018 16:44:21 UTC +00:00]=>10001,
 [21314, "d04468d8-16a7-42a1-9f97-3508e6635b4e", Thu, 08 Feb 2018 16:44:43 UTC +00:00]=>451,
 [21315, "5fd954f4-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:46:07 UTC +00:00]=>1196,
 [21316, "e2bc228e-e675-4269-9871-bda0fdda3cde", Thu, 08 Feb 2018 16:47:27 UTC +00:00]=>730,
 [21317, "131e0adb-68e7-48a9-be6d-5896bcba51df", Thu, 08 Feb 2018 16:47:39 UTC +00:00]=>3728,
 [21318, "41011c51-ed1a-44c7-a533-0ebe9d3f92e6", Thu, 08 Feb 2018 16:48:03 UTC +00:00]=>2169,
 [21319, "5faa235a-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:49:06 UTC +00:00]=>1196,
 [21320, "834d05c3-7a29-4c18-a6eb-9653b826ee12", Thu, 08 Feb 2018 16:49:28 UTC +00:00]=>304,
 [21321, "67c92acb-7a04-4db0-b648-a15bd5980d98", Thu, 08 Feb 2018 16:51:00 UTC +00:00]=>1196,
 [21322, "5faa23f6-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:52:58 UTC +00:00]=>1196,
 [21323, "51959543-9fff-4768-a610-4af3e978378a", Thu, 08 Feb 2018 16:54:29 UTC +00:00]=>730,
 [21324, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:55:55 UTC +00:00]=>1196,
 [21325, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:55:56 UTC +00:00]=>1196,
 [21326, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:01 UTC +00:00]=>1196,
 [21327, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:08 UTC +00:00]=>1196,
 [21328, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:13 UTC +00:00]=>1196,
 [21329, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:18 UTC +00:00]=>1196,
 [21330, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:26 UTC +00:00]=>1196,
 [21331, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:30 UTC +00:00]=>1196,
 [21332, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:37 UTC +00:00]=>1196,
 [21333, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:40 UTC +00:00]=>1196,
 [21334, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:50 UTC +00:00]=>1196,
 [21335, "5faa3004-7631-11e4-a3cb-005056011aef", Thu, 08 Feb 2018 16:56:51 UTC +00:00]=>1196}

There don't seem to be remarkable amounts of any type of traffic before the problems emerge just a general build up.

Lots of emails sent to users were duplicated. Some emails may have been sent up 118 times

We may have had ~80 attempts to sign up to email lists which failed

504s per day

January 31st 2018, 00:00:00.000    3
February 1st 2018, 00:00:00.000    6
February 2nd 2018, 00:00:00.000    7,248
February 5th 2018, 00:00:00.000    164
February 6th 2018, 00:00:00.000    11,226
February 7th 2018, 00:00:00.000    58,327
February 8th 2018, 00:00:00.000    80,929

Content changes by hour

irb(main):016:0> pp ContentChange.where("created_at >= ? AND created_at < ?", Date.new(2018, 2, 8), Date.new(2018, 2, 9)).group("date_trunc('hour', created_at)").order("date_trunc('hour', created_at)").count
{"2018-02-08T00:00:00.000Z"=>3,
 "2018-02-08T03:00:00.000Z"=>1,
 "2018-02-08T05:00:00.000Z"=>2,
 "2018-02-08T06:00:00.000Z"=>4,
 "2018-02-08T07:00:00.000Z"=>9,
 "2018-02-08T08:00:00.000Z"=>16,
 "2018-02-08T09:00:00.000Z"=>79,
 "2018-02-08T10:00:00.000Z"=>44,
 "2018-02-08T11:00:00.000Z"=>36,
 "2018-02-08T12:00:00.000Z"=>30,
 "2018-02-08T13:00:00.000Z"=>20,
 "2018-02-08T14:00:00.000Z"=>40,
 "2018-02-08T15:00:00.000Z"=>16,
 "2018-02-08T16:00:00.000Z"=>131,
 "2018-02-08T17:00:00.000Z"=>302,
 "2018-02-08T18:00:00.000Z"=>1,
 "2018-02-08T20:00:00.000Z"=>1,
 "2018-02-08T21:00:00.000Z"=>1,
 "2018-02-08T22:00:00.000Z"=>2}

Emails sent per hour

irb(main):015:0> pp Email.where("created_at >= ? AND created_at < ?", Date.new(2018, 2, 8), Date.new(2018, 2, 9)).group("date_trunc('hour', created_at)").order("date_trunc('hour', created_at)").count
{"2018-02-08T00:00:00.000Z"=>3221,
"2018-02-08T03:00:00.000Z"=>2555,
"2018-02-08T05:00:00.000Z"=>3139,
"2018-02-08T06:00:00.000Z"=>1383,
"2018-02-08T07:00:00.000Z"=>3106,
"2018-02-08T08:00:00.000Z"=>52947,
"2018-02-08T09:00:00.000Z"=>76325,
"2018-02-08T10:00:00.000Z"=>77844,
"2018-02-08T11:00:00.000Z"=>36343,
"2018-02-08T12:00:00.000Z"=>47120,
"2018-02-08T13:00:00.000Z"=>9132,
"2018-02-08T14:00:00.000Z"=>62290,
"2018-02-08T15:00:00.000Z"=>25897,
"2018-02-08T16:00:00.000Z"=>59669,
"2018-02-08T17:00:00.000Z"=>317313,
"2018-02-08T18:00:00.000Z"=>142001,
"2018-02-08T19:00:00.000Z"=>161789,
"2018-02-08T20:00:00.000Z"=>2357,
"2018-02-08T21:00:00.000Z"=>3674,
"2018-02-08T22:00:00.000Z"=>10493}

Delivery Attempts updated by hour (status delivered)

Note callbacks stopped coming from Notify at 17:49:

irb(main):022:0> DeliveryAttempt.where(status: "delivered").maximum(:updated_at)
=> "2018-02-08T17:49:33.322Z"
irb(main):021:0> pp DeliveryAttempt.where(status: "delivered").where("updated_at >= ? AND updated_at < ?", Date.new(2018, 2, 8), Date.new(2018, 2, 9)).group("date_trunc('hour', created_at)").order("date_trunc('hour', created_at)").count
{"2018-02-08T00:00:00.000Z"=>3221,
"2018-02-08T03:00:00.000Z"=>2555,
"2018-02-08T05:00:00.000Z"=>3139,
"2018-02-08T06:00:00.000Z"=>1383,
"2018-02-08T07:00:00.000Z"=>3106,
"2018-02-08T08:00:00.000Z"=>52947,
"2018-02-08T09:00:00.000Z"=>75950,
"2018-02-08T10:00:00.000Z"=>77373,
"2018-02-08T11:00:00.000Z"=>37162,
"2018-02-08T12:00:00.000Z"=>47120,
"2018-02-08T13:00:00.000Z"=>9132,
"2018-02-08T14:00:00.000Z"=>62290,
"2018-02-08T15:00:00.000Z"=>25897,
"2018-02-08T16:00:00.000Z"=>59319,
"2018-02-08T17:00:00.000Z"=>81232}

Delivery attempts created by hour

irb(main):023:0> pp DeliveryAttempt.where("created_at >= ? AND created_at < ?", Date.new(2018, 2, 8), Date.new(2018, 2, 9)).group("date_trunc('hour', created_at)").order("date_trunc('hour', created_at)").count
{"2018-02-08T00:00:00.000Z"=>3221,
"2018-02-08T03:00:00.000Z"=>2555,
"2018-02-08T05:00:00.000Z"=>3139,
"2018-02-08T06:00:00.000Z"=>1383,
"2018-02-08T07:00:00.000Z"=>3106,
"2018-02-08T08:00:00.000Z"=>52947,
"2018-02-08T09:00:00.000Z"=>75977,
"2018-02-08T10:00:00.000Z"=>77373,
"2018-02-08T11:00:00.000Z"=>37162,
"2018-02-08T12:00:00.000Z"=>47120,
"2018-02-08T13:00:00.000Z"=>9132,
"2018-02-08T14:00:00.000Z"=>62290,
"2018-02-08T15:00:00.000Z"=>25897,
"2018-02-08T16:00:00.000Z"=>59669,
"2018-02-08T17:00:00.000Z"=>317335,
"2018-02-08T18:00:00.000Z"=>142001,
"2018-02-08T19:00:00.000Z"=>161789,
"2018-02-08T20:00:00.000Z"=>2357,
"2018-02-08T21:00:00.000Z"=>3674,
"2018-02-08T22:00:00.000Z"=>10493}

Content changes per content_id / public_updated_at

SELECT CONCAT(content_id, ' ', public_updated_at), COUNT(*)
FROM content_changes
WHERE created_at >= '2018-02-08 16:00' AND created_at < '2018-02-08 18:00'
GROUP BY CONCAT(content_id, ' ', public_updated_at);

                          concat                          | count
----------------------------------------------------------+-------
 03be0dc5-c4be-4063-9d1a-116c54c74a4c 2018-02-08 16:37:05 |     1
 bfd97e59-88ea-4568-a4a6-8b07474330b7 2018-02-08 17:53:18 |     1
 7d082591-2383-4c77-b880-82065b1e44d0 2018-02-08 17:34:01 |     9
 db0fa880-6f3b-44fe-875e-68a65351ce02 2018-02-08 17:21:47 |    30
 4fa65aaa-3d81-401f-8a77-bb3e9283cf52 2018-02-08 17:00:05 |     1
 fd20622d-63fc-4b61-92f4-708ef4869147 2018-02-08 16:02:03 |     1
 d04468d8-16a7-42a1-9f97-3508e6635b4e 2018-02-08 16:44:41 |     1
 5faa3004-7631-11e4-a3cb-005056011aef 2018-02-08 16:55:43 |    59
 a0396ea3-2acd-4f1a-9294-86bdc4a9b5e7 2018-02-08 16:59:02 |    11
 68fff272-d57c-4532-ba36-2e616db29f08 2018-02-08 16:04:34 |     1
 fcf794e6-bb85-494e-ad99-e35bd88b4547 2018-02-08 16:18:35 |     1
 7fccc33a-4df5-4df7-af61-48bb0d8407ed 2018-02-08 16:27:00 |     1
 baf007ba-0e86-4f2c-a0e0-3a114cfdc3a4 2018-02-08 16:56:57 |    28
 5faa235a-7631-11e4-a3cb-005056011aef 2018-02-08 16:49:03 |     1
 5fd954f4-7631-11e4-a3cb-005056011aef 2018-02-08 16:46:02 |     1
 e288bf95-c490-4d10-98cb-b6cc4f4f2e34 2018-02-08 16:34:32 |     1
 a2d1f8b0-7f1c-476e-818e-cc8de5ba0660 2018-02-08 16:22:00 |     1
 696c511c-a5a6-491d-b226-4d6a64af7f03 2018-02-08 17:25:13 |    42
 57d72545-48cb-49f2-8772-9cb9ff8ec672 2018-02-08 16:32:41 |     1
 77ecb4f6-db29-4988-8480-76e3b851994a 2018-02-08 16:23:55 |     1
 e2bc228e-e675-4269-9871-bda0fdda3cde 2018-02-08 16:47:25 |     1
 4b5a6a08-40b9-449c-b13d-9b58bd824ad6 2018-02-08 17:01:03 |     1
 25000916-b720-4734-ada9-5e132cfb531f 2018-02-08 16:19:57 |     1
 40045056-cd2e-4f7a-90be-3f7896153c6c 2018-02-08 16:25:31 |     1
 3a7e1996-6bb8-4afd-bb2d-7d9ce46fe63a 2018-02-08 17:53:12 |     1
 5faa23f6-7631-11e4-a3cb-005056011aef 2018-02-08 16:52:55 |     1
 4d8cd561-9235-4463-ada2-182c9ebf6b97 2018-02-08 16:30:07 |     1
  b944addf-9f0d-47a1-9126-78adb45cd055 2018-02-08 16:56:57 |    46
 b0324068-5cef-4c24-9cee-9817f9d5c59f 2018-02-08 17:17:19 |     9
 4202a493-9fa8-437c-ad8d-f433f66d8ba2 2018-02-08 17:53:00 |     1
 ca16b0bd-f6b3-49b1-9885-015d4d6ec027 2018-02-08 16:12:31 |     1
 76125ba7-8ed6-4c5a-8cc0-fec717b2197c 2018-02-08 16:28:38 |     1
 67c92acb-7a04-4db0-b648-a15bd5980d98 2018-02-08 16:50:58 |     1
 fd844695-2d93-4702-bd93-65211383c07e 2018-02-08 16:31:13 |     1
 c90dd329-ea2a-413a-b927-a67bd3cbbb84 2018-02-08 16:43:40 |     1
 7a82e1f8-1de7-4ee5-89be-37ced7eb8212 2018-02-08 17:00:05 |     1
 e4ad31ae-10e4-4e84-8dc9-72d3eef5248e 2018-02-08 17:53:31 |     1
 c28088ef-056f-49f3-932f-8d52e2fc6094 2018-02-08 16:21:52 |     1
 31ceb582-bc44-40f1-821c-0e6acb1fbdb8 2018-02-08 17:53:06 |     1
 2553904c-e9a7-4afe-b193-7973d7d21171 2018-02-08 16:32:55 |     1
 5c9d4d26-7631-11e4-a3cb-005056011aef 2018-02-08 16:57:57 |    24
 131e0adb-68e7-48a9-be6d-5896bcba51df 2018-02-08 16:47:38 |     1
 41011c51-ed1a-44c7-a533-0ebe9d3f92e6 2018-02-08 16:48:02 |     1
 51959543-9fff-4768-a610-4af3e978378a 2018-02-08 16:54:24 |     1
 834d05c3-7a29-4c18-a6eb-9653b826ee12 2018-02-08 16:49:26 |     1
 8cfdbefe-570e-426b-b53c-4d568446db4c 2018-02-08 16:19:48 |     1
 15c219da-80fe-4af0-b484-2de2cfb3ca16 2018-02-08 17:53:24 |     1
 a84f2d1c-a891-4ba4-b4c7-50adb34e6e46 2018-02-08 16:09:34 |     1
 5fa8a97a-7631-11e4-a3cb-005056011aef 2018-02-08 16:58:12 |    18
 a7e9e0fd-20d5-471e-ba4b-0d325e4951b6 2018-02-08 17:06:08 |   118
(50 rows)

Subscription contents for each content change (plucking out duplicates)

SELECT COUNT(*)
FROM
(
        SELECT DISTINCT CONCAT(content_id, ' ', public_updated_at) AS uniq, id
        FROM content_changes
        WHERE created_at >= '2018-02-08 16:00' AND created_at < '2018-02-08 18:00'
        ORDER BY id ASC
) AS cc
INNER JOIN subscription_contents AS sc ON sc.content_change_id = cc.id AND sc.digest_run_subscriber_id IS NULL;

 count
--------
 679778
(1 row)
SELECT cc.uniq, COUNT(*)
FROM
(
        SELECT DISTINCT CONCAT(content_id, ' ', public_updated_at) AS uniq, id
        FROM content_changes
        WHERE created_at >= '2018-02-08 16:00' AND created_at < '2018-02-08 18:00'
        ORDER BY id ASC
) AS cc
INNER JOIN subscription_contents AS sc ON sc.content_change_id = cc.id AND sc.digest_run_subscriber_id IS NULL
GROUP BY cc.uniq;

                            uniq                           | count
----------------------------------------------------------+--------
 03be0dc5-c4be-4063-9d1a-116c54c74a4c 2018-02-08 16:37:05 |   1491
 bfd97e59-88ea-4568-a4a6-8b07474330b7 2018-02-08 17:53:18 |   1160
 7d082591-2383-4c77-b880-82065b1e44d0 2018-02-08 17:34:01 |  13401
 db0fa880-6f3b-44fe-875e-68a65351ce02 2018-02-08 17:21:47 |  28980
 4fa65aaa-3d81-401f-8a77-bb3e9283cf52 2018-02-08 17:00:05 |    697
 fd20622d-63fc-4b61-92f4-708ef4869147 2018-02-08 16:02:03 |   1591
 d04468d8-16a7-42a1-9f97-3508e6635b4e 2018-02-08 16:44:41 |    451
 5faa3004-7631-11e4-a3cb-005056011aef 2018-02-08 16:55:43 |  70564
 a0396ea3-2acd-4f1a-9294-86bdc4a9b5e7 2018-02-08 16:59:02 |  11517
 68fff272-d57c-4532-ba36-2e616db29f08 2018-02-08 16:04:34 |   1279
 fcf794e6-bb85-494e-ad99-e35bd88b4547 2018-02-08 16:18:35 |    128
 7fccc33a-4df5-4df7-af61-48bb0d8407ed 2018-02-08 16:27:00 |    128
 baf007ba-0e86-4f2c-a0e0-3a114cfdc3a4 2018-02-08 16:56:57 |  53088
 5faa235a-7631-11e4-a3cb-005056011aef 2018-02-08 16:49:03 |   1196
 5fd954f4-7631-11e4-a3cb-005056011aef 2018-02-08 16:46:02 |   1196
 e288bf95-c490-4d10-98cb-b6cc4f4f2e34 2018-02-08 16:34:32 |   1448
 a2d1f8b0-7f1c-476e-818e-cc8de5ba0660 2018-02-08 16:22:00 |   1142
 696c511c-a5a6-491d-b226-4d6a64af7f03 2018-02-08 17:25:13 | 310548
 57d72545-48cb-49f2-8772-9cb9ff8ec672 2018-02-08 16:32:41 |    737
 77ecb4f6-db29-4988-8480-76e3b851994a 2018-02-08 16:23:55 |    128
 e2bc228e-e675-4269-9871-bda0fdda3cde 2018-02-08 16:47:25 |    730
 4b5a6a08-40b9-449c-b13d-9b58bd824ad6 2018-02-08 17:01:03 |   4036
 25000916-b720-4734-ada9-5e132cfb531f 2018-02-08 16:19:57 |    128
 40045056-cd2e-4f7a-90be-3f7896153c6c 2018-02-08 16:25:31 |    128
 3a7e1996-6bb8-4afd-bb2d-7d9ce46fe63a 2018-02-08 17:53:12 |   1160
 5faa23f6-7631-11e4-a3cb-005056011aef 2018-02-08 16:52:55 |   1196
 4d8cd561-9235-4463-ada2-182c9ebf6b97 2018-02-08 16:30:07 |    128
 b944addf-9f0d-47a1-9126-78adb45cd055 2018-02-08 16:56:57 |  33580
 b0324068-5cef-4c24-9cee-9817f9d5c59f 2018-02-08 17:17:19 |  23760
 4202a493-9fa8-437c-ad8d-f433f66d8ba2 2018-02-08 17:53:00 |   1160
 76125ba7-8ed6-4c5a-8cc0-fec717b2197c 2018-02-08 16:28:38 |    128
 ca16b0bd-f6b3-49b1-9885-015d4d6ec027 2018-02-08 16:12:31 |   1121
 67c92acb-7a04-4db0-b648-a15bd5980d98 2018-02-08 16:50:58 |   1196
 c90dd329-ea2a-413a-b927-a67bd3cbbb84 2018-02-08 16:43:40 |  10001
 fd844695-2d93-4702-bd93-65211383c07e 2018-02-08 16:31:13 |    861
 7a82e1f8-1de7-4ee5-89be-37ced7eb8212 2018-02-08 17:00:05 |   1554
 e4ad31ae-10e4-4e84-8dc9-72d3eef5248e 2018-02-08 17:53:31 |   1160
 c28088ef-056f-49f3-932f-8d52e2fc6094 2018-02-08 16:21:52 |    128
 31ceb582-bc44-40f1-821c-0e6acb1fbdb8 2018-02-08 17:53:06 |   1160
 2553904c-e9a7-4afe-b193-7973d7d21171 2018-02-08 16:32:55 |   2169
 5c9d4d26-7631-11e4-a3cb-005056011aef 2018-02-08 16:57:57 |  17784
 131e0adb-68e7-48a9-be6d-5896bcba51df 2018-02-08 16:47:38 |   3728
 41011c51-ed1a-44c7-a533-0ebe9d3f92e6 2018-02-08 16:48:02 |   2169
 51959543-9fff-4768-a610-4af3e978378a 2018-02-08 16:54:24 |    730
 834d05c3-7a29-4c18-a6eb-9653b826ee12 2018-02-08 16:49:26 |    304
 15c219da-80fe-4af0-b484-2de2cfb3ca16 2018-02-08 17:53:24 |   1160
 8cfdbefe-570e-426b-b53c-4d568446db4c 2018-02-08 16:19:48 |   4624
 a84f2d1c-a891-4ba4-b4c7-50adb34e6e46 2018-02-08 16:09:34 |   3165
 5fa8a97a-7631-11e4-a3cb-005056011aef 2018-02-08 16:58:12 |  21528
 a7e9e0fd-20d5-471e-ba4b-0d325e4951b6 2018-02-08 17:06:08 |  38232
(50 rows)

Subscription contents associated with content changes

SELECT COUNT(*)
FROM subscription_contents
INNER JOIN content_changes ON subscription_contents.content_change_id = content_changes.id
WHERE content_changes.created_at >= '2018-02-08 16:00' AND content_changes.created_at < '2018-02-08 18:00'
AND subscription_contents.digest_run_subscriber_id IS NULL;

 count
--------
 679778
(1 row)
SELECT content_changes.content_id, content_changes.public_updated_at, COUNT(*)
FROM subscription_contents
INNER JOIN content_changes ON subscription_contents.content_change_id = content_changes.id
WHERE content_changes.created_at >= '2018-02-08 16:00' AND content_changes.created_at < '2018-02-08 18:00'
AND subscription_contents.digest_run_subscriber_id IS NULL
GROUP BY content_changes.content_id, content_changes.public_updated_at
ORDER BY content_changes.public_updated_at;

              content_id              |  public_updated_at  | count
--------------------------------------+---------------------+--------
 fd20622d-63fc-4b61-92f4-708ef4869147 | 2018-02-08 16:02:03 |   1591
 68fff272-d57c-4532-ba36-2e616db29f08 | 2018-02-08 16:04:34 |   1279
 a84f2d1c-a891-4ba4-b4c7-50adb34e6e46 | 2018-02-08 16:09:34 |   3165
 ca16b0bd-f6b3-49b1-9885-015d4d6ec027 | 2018-02-08 16:12:31 |   1121
 fcf794e6-bb85-494e-ad99-e35bd88b4547 | 2018-02-08 16:18:35 |    128
 8cfdbefe-570e-426b-b53c-4d568446db4c | 2018-02-08 16:19:48 |   4624
 25000916-b720-4734-ada9-5e132cfb531f | 2018-02-08 16:19:57 |    128
 c28088ef-056f-49f3-932f-8d52e2fc6094 | 2018-02-08 16:21:52 |    128
 a2d1f8b0-7f1c-476e-818e-cc8de5ba0660 | 2018-02-08 16:22:00 |   1142
 77ecb4f6-db29-4988-8480-76e3b851994a | 2018-02-08 16:23:55 |    128
 40045056-cd2e-4f7a-90be-3f7896153c6c | 2018-02-08 16:25:31 |    128
 7fccc33a-4df5-4df7-af61-48bb0d8407ed | 2018-02-08 16:27:00 |    128
 76125ba7-8ed6-4c5a-8cc0-fec717b2197c | 2018-02-08 16:28:38 |    128
 4d8cd561-9235-4463-ada2-182c9ebf6b97 | 2018-02-08 16:30:07 |    128
 fd844695-2d93-4702-bd93-65211383c07e | 2018-02-08 16:31:13 |    861
 57d72545-48cb-49f2-8772-9cb9ff8ec672 | 2018-02-08 16:32:41 |    737
 2553904c-e9a7-4afe-b193-7973d7d21171 | 2018-02-08 16:32:55 |   2169
 e288bf95-c490-4d10-98cb-b6cc4f4f2e34 | 2018-02-08 16:34:32 |   1448
 03be0dc5-c4be-4063-9d1a-116c54c74a4c | 2018-02-08 16:37:05 |   1491
 c90dd329-ea2a-413a-b927-a67bd3cbbb84 | 2018-02-08 16:43:40 |  10001
 d04468d8-16a7-42a1-9f97-3508e6635b4e | 2018-02-08 16:44:41 |    451
 5fd954f4-7631-11e4-a3cb-005056011aef | 2018-02-08 16:46:02 |   1196
 e2bc228e-e675-4269-9871-bda0fdda3cde | 2018-02-08 16:47:25 |    730
 131e0adb-68e7-48a9-be6d-5896bcba51df | 2018-02-08 16:47:38 |   3728
 41011c51-ed1a-44c7-a533-0ebe9d3f92e6 | 2018-02-08 16:48:02 |   2169
 5faa235a-7631-11e4-a3cb-005056011aef | 2018-02-08 16:49:03 |   1196
 834d05c3-7a29-4c18-a6eb-9653b826ee12 | 2018-02-08 16:49:26 |    304
 67c92acb-7a04-4db0-b648-a15bd5980d98 | 2018-02-08 16:50:58 |   1196
 5faa23f6-7631-11e4-a3cb-005056011aef | 2018-02-08 16:52:55 |   1196
 51959543-9fff-4768-a610-4af3e978378a | 2018-02-08 16:54:24 |    730
 5faa3004-7631-11e4-a3cb-005056011aef | 2018-02-08 16:55:43 |  70564
 b944addf-9f0d-47a1-9126-78adb45cd055 | 2018-02-08 16:56:57 |  33580
 baf007ba-0e86-4f2c-a0e0-3a114cfdc3a4 | 2018-02-08 16:56:57 |  53088
 5c9d4d26-7631-11e4-a3cb-005056011aef | 2018-02-08 16:57:57 |  17784
 5fa8a97a-7631-11e4-a3cb-005056011aef | 2018-02-08 16:58:12 |  21528
 a0396ea3-2acd-4f1a-9294-86bdc4a9b5e7 | 2018-02-08 16:59:02 |  11517
 7a82e1f8-1de7-4ee5-89be-37ced7eb8212 | 2018-02-08 17:00:05 |   1554
 4fa65aaa-3d81-401f-8a77-bb3e9283cf52 | 2018-02-08 17:00:05 |    697
 4b5a6a08-40b9-449c-b13d-9b58bd824ad6 | 2018-02-08 17:01:03 |   4036
 a7e9e0fd-20d5-471e-ba4b-0d325e4951b6 | 2018-02-08 17:06:08 |  38232
 b0324068-5cef-4c24-9cee-9817f9d5c59f | 2018-02-08 17:17:19 |  23760
 db0fa880-6f3b-44fe-875e-68a65351ce02 | 2018-02-08 17:21:47 |  28980
 696c511c-a5a6-491d-b226-4d6a64af7f03 | 2018-02-08 17:25:13 | 310548
 7d082591-2383-4c77-b880-82065b1e44d0 | 2018-02-08 17:34:01 |  13401
 4202a493-9fa8-437c-ad8d-f433f66d8ba2 | 2018-02-08 17:53:00 |   1160
 31ceb582-bc44-40f1-821c-0e6acb1fbdb8 | 2018-02-08 17:53:06 |   1160
 3a7e1996-6bb8-4afd-bb2d-7d9ce46fe63a | 2018-02-08 17:53:12 |   1160
 bfd97e59-88ea-4568-a4a6-8b07474330b7 | 2018-02-08 17:53:18 |   1160
 15c219da-80fe-4af0-b484-2de2cfb3ca16 | 2018-02-08 17:53:24 |   1160
 e4ad31ae-10e4-4e84-8dc9-72d3eef5248e | 2018-02-08 17:53:31 |   1160
(50 rows)

Content changes with duplicates

Forgive the grim query wasn't sure how to alias it 🙃

SELECT
  (SELECT id FROM content_changes AS cc_id WHERE cc_id.content_id = cc.content_id AND cc_id.public_updated_at = cc.public_updated_at ORDER BY created_at DESC LIMIT 1),
  (SELECT created_at FROM content_changes AS cc_id WHERE cc_id.content_id = cc.content_id AND cc_id.public_updated_at = cc.public_updated_at ORDER BY created_at DESC LIMIT 1),
  cc.content_id,
  cc.public_updated_at,
  COUNT(*)
FROM content_changes AS cc
INNER JOIN content_changes AS cc2
  ON cc.content_id = cc2.content_id
  AND cc.public_updated_at = cc2.public_updated_at
  AND cc.id != cc2.id
GROUP BY cc.content_id, cc.public_updated_at
ORDER BY (SELECT created_at FROM content_changes AS cc_id WHERE cc_id.content_id = cc.content_id AND cc_id.public_updated_at = cc.public_updated_at ORDER BY created_at DESC LIMIT 1);

  id   |         created_at         |              content_id              |  public_updated_at  | count
-------+----------------------------+--------------------------------------+---------------------+-------
 21091 | 2018-02-08 09:34:28.194236 | 5e323e1a-7631-11e4-a3cb-005056011aef | 2018-02-08 09:34:17 |     2
 21093 | 2018-02-08 09:34:44.788541 | 5e323aa1-7631-11e4-a3cb-005056011aef | 2018-02-08 09:34:33 |     2
 21098 | 2018-02-08 09:42:25.641314 | 0cbe45ad-dd3b-4ab3-a32d-dd3c9139dfe3 | 2018-02-08 09:41:35 |     2
 21111 | 2018-02-08 10:02:37.758382 | 5ad4c51b-d802-44b1-8cd6-4107d5097489 | 2018-02-08 10:01:48 |     2
 21112 | 2018-02-08 10:02:38.53114  | c679efd6-97af-493d-ad8a-95524274d0f5 | 2018-02-08 10:02:24 |     2
 21134 | 2018-02-08 10:31:10.368531 | 6c0f189f-7a7d-4fe2-9b03-39d5de742ee0 | 2018-02-08 10:31:02 |     2
 21251 | 2018-02-08 14:04:05.546118 | 6ae77661-de1b-4391-ae6b-b85cb65a476e | 2018-02-08 14:03:46 |    12
 21266 | 2018-02-08 14:37:58.42666  | b578fab3-95f1-4287-a8d7-97ccf267927d | 2018-02-08 14:37:22 |     2
 21269 | 2018-02-08 14:38:46.634347 | 678cc26f-766a-4105-af1f-916040d8cd62 | 2018-02-08 14:37:58 |     2
 21424 | 2018-02-08 16:59:58.853122 | baf007ba-0e86-4f2c-a0e0-3a114cfdc3a4 | 2018-02-08 16:56:57 |   756
 21469 | 2018-02-08 17:01:27.597638 | 5faa3004-7631-11e4-a3cb-005056011aef | 2018-02-08 16:55:43 |  3422
 21470 | 2018-02-08 17:01:28.822403 | b944addf-9f0d-47a1-9126-78adb45cd055 | 2018-02-08 16:56:57 |  2070
 21497 | 2018-02-08 17:02:26.754877 | 5c9d4d26-7631-11e4-a3cb-005056011aef | 2018-02-08 16:57:57 |   552
 21502 | 2018-02-08 17:02:36.593478 | a0396ea3-2acd-4f1a-9294-86bdc4a9b5e7 | 2018-02-08 16:59:02 |   110
 21512 | 2018-02-08 17:03:17.743744 | 5fa8a97a-7631-11e4-a3cb-005056011aef | 2018-02-08 16:58:12 |   306
 21630 | 2018-02-08 17:16:28.023858 | a7e9e0fd-20d5-471e-ba4b-0d325e4951b6 | 2018-02-08 17:06:08 | 13806
 21639 | 2018-02-08 17:18:07.231665 | b0324068-5cef-4c24-9cee-9817f9d5c59f | 2018-02-08 17:17:19 |    72
 21669 | 2018-02-08 17:24:24.816013 | db0fa880-6f3b-44fe-875e-68a65351ce02 | 2018-02-08 17:21:47 |   870
 21711 | 2018-02-08 17:28:58.239512 | 696c511c-a5a6-491d-b226-4d6a64af7f03 | 2018-02-08 17:25:13 |  1722
 21720 | 2018-02-08 17:34:52.951955 | 7d082591-2383-4c77-b880-82065b1e44d0 | 2018-02-08 17:34:01 |    72
(20 rows)

500-599 responses during 16:00 to 18:00 to sign up to email

Source: https://kibana.logit.io/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2018-02-08T16:00:00.000Z',mode:absolute,to:'2018-02-08T18:00:00.000Z'))&_a=(columns:!(request,status,beat.hostname,govuk_request_id,request_time),index:'*-*',interval:h,query:(query_string:(analyze_wildcard:!t,query:'status:%20%5B500%20TO%20599%5D%20AND%20NOT%20application:%20email-alert-api-public.publishing.service.gov.uk-json.event.access%20AND%20NOT%20application:%20email-alert-api.publishing.service.gov.uk-json.event.access%20AND%20request:%20*email*%20AND%20tags:%20nginx%20AND%20beat.hostname:%20cache*')),sort:!('@timestamp',asc),uiState:(spy:(mode:(fill:!f,name:table))))

@timestamp per hour 	Count 
February 8th 2018, 16:00:00.000	25
February 8th 2018, 17:00:00.000	56

Delivery Attempts created just before the errors began

irb(main):010:0> pp Email.where("created_at >= ? AND created_at < ?", Time.new(2018, 2, 8, 16), Time.new(2018, 2, 8, 17)).group("date_trunc('minute', created_at)").count
{"2018-02-08T16:34:00.000Z"=>1449,
 "2018-02-08T16:09:00.000Z"=>3166,
 "2018-02-08T16:04:00.000Z"=>1280,
 "2018-02-08T16:55:00.000Z"=>1,
 "2018-02-08T16:22:00.000Z"=>1001,
 "2018-02-08T16:25:00.000Z"=>129,
 "2018-02-08T16:12:00.000Z"=>1122,
 "2018-02-08T16:44:00.000Z"=>1002,
 "2018-02-08T16:54:00.000Z"=>3898,
 "2018-02-08T16:28:00.000Z"=>129,
 "2018-02-08T16:57:00.000Z"=>4019,
 "2018-02-08T16:47:00.000Z"=>1002,
 "2018-02-08T16:37:00.000Z"=>1492,
 "2018-02-08T16:18:00.000Z"=>129,
 "2018-02-08T16:31:00.000Z"=>862,
 "2018-02-08T16:58:00.000Z"=>4030,
 "2018-02-08T16:33:00.000Z"=>2170,
 "2018-02-08T16:59:00.000Z"=>3030,
 "2018-02-08T16:23:00.000Z"=>3151,
 "2018-02-08T16:50:00.000Z"=>2000,
 "2018-02-08T16:56:00.000Z"=>5010,
 "2018-02-08T16:45:00.000Z"=>1000,
 "2018-02-08T16:46:00.000Z"=>1001,
 "2018-02-08T16:48:00.000Z"=>1000,
 "2018-02-08T16:27:00.000Z"=>129,
 "2018-02-08T16:02:00.000Z"=>1592,
 "2018-02-08T16:51:00.000Z"=>1001,
 "2018-02-08T16:32:00.000Z"=>738,
 "2018-02-08T16:20:00.000Z"=>1001,
 "2018-02-08T16:49:00.000Z"=>1003,
 "2018-02-08T16:30:00.000Z"=>129,
 "2018-02-08T16:53:00.000Z"=>7001,
 "2018-02-08T16:21:00.000Z"=>1002,
 "2018-02-08T16:52:00.000Z"=>3000}

Delivery Attempts updated at the errors began

irb(main):007:0> pp DeliveryAttempt.where("updated_at >= ? AND updated_at < ?", Time.new(2018, 2, 8, 16), Time.new(2018, 2, 8, 17)).group("date_trunc('minute', created_at)").count
{"2018-02-08T15:58:00.000Z"=>1,
 "2018-02-08T15:59:00.000Z"=>425,
 "2018-02-08T16:02:00.000Z"=>1592,
 "2018-02-08T16:04:00.000Z"=>1280,
 "2018-02-08T16:10:00.000Z"=>3166,
 "2018-02-08T16:12:00.000Z"=>1122,
 "2018-02-08T16:18:00.000Z"=>129,
 "2018-02-08T16:20:00.000Z"=>1001,
 "2018-02-08T16:21:00.000Z"=>1002,
 "2018-02-08T16:22:00.000Z"=>1001,
 "2018-02-08T16:23:00.000Z"=>3151,
 "2018-02-08T16:25:00.000Z"=>129,
 "2018-02-08T16:27:00.000Z"=>129,
 "2018-02-08T16:28:00.000Z"=>129,
 "2018-02-08T16:30:00.000Z"=>129,
 "2018-02-08T16:31:00.000Z"=>862,
 "2018-02-08T16:32:00.000Z"=>738,
 "2018-02-08T16:33:00.000Z"=>2170,
 "2018-02-08T16:34:00.000Z"=>1449,
 "2018-02-08T16:37:00.000Z"=>1492,
 "2018-02-08T16:44:00.000Z"=>972,
 "2018-02-08T16:45:00.000Z"=>1030,
 "2018-02-08T16:46:00.000Z"=>1001,
 "2018-02-08T16:47:00.000Z"=>1002,
 "2018-02-08T16:48:00.000Z"=>1000,
 "2018-02-08T16:49:00.000Z"=>1003,
 "2018-02-08T16:50:00.000Z"=>1828,
 "2018-02-08T16:51:00.000Z"=>1172,
 "2018-02-08T16:52:00.000Z"=>2603,
 "2018-02-08T16:53:00.000Z"=>6940,
 "2018-02-08T16:54:00.000Z"=>4131,
 "2018-02-08T16:55:00.000Z"=>1,
 "2018-02-08T16:56:00.000Z"=>2534,
 "2018-02-08T16:57:00.000Z"=>12,
 "2018-02-08T16:58:00.000Z"=>57,
 "2018-02-08T16:59:00.000Z"=>261}

Content Changes sent in the minutes before errors

SELECT content_changes.id, CONCAT(content_changes.content_id, ' ', content_changes.public_updated_at), COUNT(*)
FROM delivery_attempts
INNER JOIN emails ON delivery_attempts.email_id = emails.id
INNER JOIN subscription_contents ON subscription_contents.email_id = emails.id
INNER JOIN content_changes ON subscription_contents.content_change_id = content_changes.id
WHERE delivery_attempts.created_at >= '2018-02-08T16:52:00' AND delivery_attempts.created_at < '2018-02-08T16:56'
GROUP BY content_changes.id, CONCAT(content_changes.content_id, ' ', content_changes.public_updated_at);

  id   |                          concat                          | count
-------+----------------------------------------------------------+-------
 21322 | 5faa23f6-7631-11e4-a3cb-005056011aef 2018-02-08 16:52:55 |  1196
 21313 | c90dd329-ea2a-413a-b927-a67bd3cbbb84 2018-02-08 16:43:40 |  1001
 21317 | 131e0adb-68e7-48a9-be6d-5896bcba51df 2018-02-08 16:47:38 |  3728
 21314 | d04468d8-16a7-42a1-9f97-3508e6635b4e 2018-02-08 16:44:41 |   451
 21321 | 67c92acb-7a04-4db0-b648-a15bd5980d98 2018-02-08 16:50:58 |  1196
 21315 | 5fd954f4-7631-11e4-a3cb-005056011aef 2018-02-08 16:46:02 |  1196
 21320 | 834d05c3-7a29-4c18-a6eb-9653b826ee12 2018-02-08 16:49:26 |   304
 21318 | 41011c51-ed1a-44c7-a533-0ebe9d3f92e6 2018-02-08 16:48:02 |  2169
 21319 | 5faa235a-7631-11e4-a3cb-005056011aef 2018-02-08 16:49:03 |  1196
 21323 | 51959543-9fff-4768-a610-4af3e978378a 2018-02-08 16:54:24 |   730
 21316 | e2bc228e-e675-4269-9871-bda0fdda3cde 2018-02-08 16:47:25 |   730
(11 rows)

Cause of duplicate emails being sent out

Email Alert API appeared to be functioning correctly and didn't produe additional requests to Govdelivery. However Email Alert Service was having problems communicating with Email Alert API and was receiving 504 responses, by retrying these Email Alert API was populated with duplicate notifications which it duly sent out.

Cause of requests to Email Alert API returning 504

The calls from Notify to /status-updates have shown evidence of receiving 504 responses on a number of prior occassions. Therefore it seems to be a common pattern that a large volume of emails could cause 504s. If this occurs during requests coming in this can also cause 504s which caused the re-attempts of content-change requests.

Load balancing may be sending too much stuff to backend-3

This doesn't seem to be the case according to these graphs

Resolution Ideas

  • Return a 409 conflict response in Email Alert API if we receive a duplicate content change - DONE
  • Isolate email-alert-api public endpoints to run on separate app so to limit affects
  • Perform testing in staging environment to see what we can actually handle request wise and set rate-limiting accordingly (this was tested previously - so may need adjusting)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment