Skip to content

Instantly share code, notes, and snippets.

@jhyland87
Last active October 26, 2018 18:11
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jhyland87/9b8c8ca3f807ddfe572d8a5e4ed13290 to your computer and use it in GitHub Desktop.
Save jhyland87/9b8c8ca3f807ddfe572d8a5e4ed13290 to your computer and use it in GitHub Desktop.
Question:
Tell me about a time that you strongly disagreed with your
manager on something you deemed to be very important to the
business. What was it about and how did you handle it?
Answer:
I think the most prevalent example would be the mentality in which my
previous manager and I had towards potential problems related to
monitoring, logging and security weaknesses. It seemed to me that he
had more of a reactive oriented mentality, where as I thought it was
in our best into be as proactive as possible. An example scenario
would be the lack of monitoring of our servers and applications. We
had a Nagios server setup already and actively monitoring some
services, but historically, services were only added to Nagios *after*
outages.
For example, we had MySQL replication setup in multiple environments,
and while the servers themselves were monitored for CPU and memory
utilization, ping and uptime, other vital services weren't monitored,
such as disk utiliation, the mysql daemon status or even the replication
status.
I feel I was vindicated when eventually the mount point that the
MySQL data was stored on reached full capacity, resulting in the
mysql service hanging and replication failing. While this may not
have taken down the production website directly, anything that relied
on the slave server was impacted, such as reports, backups, and some
automated scripts. After the disk usage and replication was resolved,
all mysql servers in every environment was added to Nagios, as well as
the replication status, and anything else that could hinder replication.
Question:
Which LP was the above question likely applicable to?
(https://www.amazon.jobs/principles)
Answers:
Principal: Insist on high standards - I've always believed that one of
the most important tools in maintaining a robust infrastructure and
preventing or remediating outages is to monitor as many services as
possible. Monitoring not only helps prevent outages, but the statistics
and historical data can also be invaluable.
Principal: Have Backbone; Disagree and Commit - I made it known (on more
than one occasion) that I believed monitoring core services (such as
MySQL and replication) was a critical aspect in keeping the infrastructure
and production services running smoothly and keeping downtime to a minimum,
and as such, it should be a priority. But I was ignored, probably because
I got busted downloading shit via Tor (CentOS OS, NOT porn), followed by
getting busted sharing that email thread with my BFF. FML.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment