@emileswarts
Last active July 21, 2020 13:38
Logging finale

We've spent quite a bit of time finessing the current system. Given it's been running for a while and we have alarms in place, it's quite easy to tell whether we've broken something.

Final updates

This means:

  1. Tightening up security
  2. Ensuring a good developer experience for the next team, including making sure READMEs are up to date
  3. Collaborating with OST to make sure the data coming through is in the correct format.

We found that some payloads weren't being decoded properly; luckily, because we tackled the integration quite early on, we had an opportunity to address that. We also made sure the versions of Functionbeat we use are compatible with OST, and improved tagging for navigating logs.

Something that was quite time-consuming was getting logs out of Shared Services into OST, given it's provisioned differently to the other accounts (through CI).

We also sent over the architecture diagram to pentesters so they can start getting ready.

We've automated the retention policy on all data we keep to be 7 days, which is in line with the DPIA and allows for disaster recovery if anything were to happen.
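As a rough sketch of what that looks like in Terraform (the log group name here is a placeholder, not one of our actual resources):

```hcl
# Hypothetical log group; the 7-day retention matches the DPIA requirement.
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  name              = "/example/vpc-flow-logs"
  retention_in_days = 7
}
```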

Demo

We did the demo to the wider stakeholders, and it seemed to go well.

Palo Alto

Tim confirmed the data sent by Palo Alto against the data spreadsheet.

Log subscriptions

We've subscribed to all the log sources that are needed by the OST.
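For illustration only, a subscription forwarding one of those log groups to the OST destination might look something like this in Terraform (the ARNs and names are placeholders):

```hcl
# Hypothetical subscription filter; destination_arn would point at the
# OST ingestion endpoint (e.g. a Kinesis Firehose delivery stream).
resource "aws_cloudwatch_log_subscription_filter" "to_ost" {
  name            = "send-to-ost"
  log_group_name  = aws_cloudwatch_log_group.vpc_flow_logs.name
  filter_pattern  = "" # empty pattern forwards every event
  destination_arn = "arn:aws:firehose:eu-west-2:111111111111:deliverystream/ost-ingest"
  role_arn        = "arn:aws:iam::111111111111:role/cwl-to-firehose"
}
```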

Go live

We will be putting this system down for a bit as we start DNS / DHCP. When OST have stood up their production platform, we will tackle 4 things, in this order:

Monitoring and alerting

We've set up alarms to notify us if anything goes wrong with any of the AWS services that make up this platform. We've been monitoring them for long enough to fine-tune them so they don't fire simply because they're too sensitive.
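As an illustrative sketch (not our exact configuration), an alarm on errors from the Functionbeat Lambda might be declared roughly like this:

```hcl
# Hypothetical alarm: fires if the Functionbeat Lambda reports any errors
# over a five-minute window, and notifies the alerting SNS topic.
resource "aws_cloudwatch_metric_alarm" "functionbeat_errors" {
  alarm_name          = "functionbeat-errors"
  namespace           = "AWS/Lambda"
  metric_name         = "Errors"
  dimensions          = { FunctionName = "functionbeat" }
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}
```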

We'll also get alerts for any lag or errors on the OST side, so we now have quite a bit of visibility over their system too. When testing these alarms, we would go in and turn off various services to see how we were alerted. It was quite fun.

This was all built in Terraform, so it can easily be recreated.

These alerts are sent via email, and in the long term we imagine we'll set up an integration with ServiceNow.
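A minimal sketch of how that email delivery could be wired up (the topic name and address are made up):

```hcl
# Hypothetical SNS topic the alarms above publish to, with an email subscriber.
resource "aws_sns_topic" "alerts" {
  name = "logging-platform-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "team@example.com" # placeholder address
}
```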
