Hosted Graphite are hiring an ops/automation engineer!
|Two co-founders, 11 engineers, and traffic to our service has more than doubled in the last year.
|We run a hosted version of the popular Graphite open source metric and monitoring software, and we
|have customers all over the world.
|We need another back end engineer to help work on scaling, reliability and automation. We have over
|500 systems to manage now, mostly physical hardware, and automation is more important than ever.
|Instead of hiring a pure ops person, we want to hire someone capable of automating as much ops work
|as possible. More automation = more sleep.
|We're looking for a sysadmin/engineer who wants to be part of an early stage startup with all the
|ups, downs, risks and benefits that go with it. This is not a comfortable corporate job, but then
|there aren't any TPS reports or middle managers either...
|Significant Linux system administration experience. You need to know how to use package managers
|correctly and tools like tcpdump, lsof, mtr, rsync, iptables, ntp, strace, etc. when diagnosing
|common application, system and network problems.
|Some Puppet experience would be good - we've been using Puppet since server #1 and we're pretty
|pleased with it so far.
|An eye for performance is important - your contributions will be exercised by more than
|fifty billion events per day. We always have to think about how something will scale and fail.
|We don't really care about your level of formal education, mathematical skill and so on. We want
|to see that you have relevant experience, that you like automating away repetitive work, that
|you have good attention to detail, an aptitude for learning new skills and that you have empathy
|for your team-mates and our customers.
|The job and the challenges
|While the frontend is three Django apps, we have more than ten different backend and internal
|services, and many of them talk to each other. We'll need your help to scale them individually,
|and to decide when to throw away and rebuild others. This is not your typical website and
|database scaling problem, though we have those too!
|While this role involves a lot of ops work, the biggest challenges come from how our traffic has
|doubled every year for the last few years (and we expect this to continue), and how we need to
|continue automating deployments of our services across our clusters of servers. We need someone
|to help us identify weak points and to build auto-remediation tools for when things fail.
|We have eight Riak clusters, which you'll need to learn to maintain. We use a lot of big Redis
|instances. We're using Serf for distributed service discovery/cluster management and we're trying
|to make our backend tolerate a failure without waking anybody up.
|Being in the on-call rotation is part of the job. That usually means not being more than a few
|minutes from an internet connection. Sometimes it means getting woken up by a phone at 4am. We
|have weeks go by with zero incidents, and other weeks with several. On-call always sucks, so
|we're interested in making it suck as little as possible.
|Every on-call shift ends on a Friday morning with the rest of the day off, giving you a three-day
|weekend that's not counted against your holiday allowance. We want relaxed, well-rested ops people.
|Being on call does not mean watching graphs - nobody has time for that. We try to rely on our
|alerting and we try to only alert for actionable things that are already broken, or will be
|broken soon. As a monitoring company it's important that we constantly try to make sure our own
|monitoring is up to scratch too.
|Most of the team works out of the Dublin office, but we're flexible about working from home and one
|of our co-founders is living in the US, so we're partially remote and we have to be good at
|communicating. We use Slack, Google Docs, Trello, Workflowy and video chat tools like appear.in to
|keep in touch.
|Location and hours
|While we're a partially remote team, our office is in Dublin, Ireland and we'd like you to be
|there with most of the team. We have a bright, spacious office on Drury St in the city centre
|with many good lunch and transport options nearby.
|Our working hours are typically 10:00-18:00, but they vary by person.
|Once you've settled in you'll have the opportunity to work from home regularly.
|A competitive salary. 25 days of paid holiday, one day off after every on-call shift, plus
|the usual 9 public holidays.
|Health insurance for you and your family.
|Since you'll be on-call, we'll pay your phone bill. We also provide a company laptop,
|typically a MacBook Air, but the brand/model is up for discussion.
|How to apply
|Email jobs at hostedgraphite dot com and tell us why your skills, experience and personality make
|you a good fit. If you want to submit a CV, make sure it's txt or pdf. We'd like to see
|some of your code, but it's not essential.
|No ninjas, rockstars or brogrammers, please; just nice, caring humans.
|We don't work with recruitment agencies.