Who are we?

We're urbanData Analytics, a Big Data & Machine Learning company based in Madrid. Those aren't just buzzwords; we use all of them every single day. If you want to know a bit more about us, you can watch this talk (or this other one, in Spanish).

We develop software, data, and services for the Real Estate industry, trying to bring a bit of light and transparency to a traditionally opaque sector. So we're riding the PropTech wave, one of the few big sectors still on its way to being fully digitalized.

What are we looking for?

A Head of Big Data Engineering for our data team, mainly working on pipelines that process huge amounts of data.

Our stack uses:

  • Python: our lingua franca, used for several processes
  • Pandas: for data analysis, and as the shared language between our data scientists and data engineers
  • Scala: we're moving some ETLs to Scala in order to improve type safety. We like functional Scala, but don't go crazy, please (:
  • Apache Beam and Scio executed on Google Dataflow: our main tools for big ETLs (there's a toy example right after this list).
  • Kubernetes: to run containers with Python processes or Flask APIs.
  • Airflow: to orchestrate all workflows.
  • PostgreSQL and PostGIS: we are mad about them. We have some key former CARTO employees here. We also use other storage engines: Redis, Google Storage, MySQL, etc.
  • Message brokers: RabbitMQ, Google PubSub... because we cannot swallow everything at once!
  • GCP: yes, we run all our ETLs in the cloud.
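
In case you haven't touched Beam yet, here's a rough idea of what a toy pipeline looks like in Python. The data and step names are purely illustrative (not part of our real ETLs); the same code can be sent to Google Dataflow by switching the runner and adding the GCP options.

```python
# Toy Beam pipeline: sums prices per city. Runs locally with the default
# DirectRunner; on Dataflow you'd pass DataflowRunner plus project/region/
# temp_location options instead.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([("madrid", 100_000), ("madrid", 200_000), ("sevilla", 250_000)])
        | "SumByCity" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```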

What can we offer?

  • A key position in one of the most important teams of the company: you'll arrive at the perfect time, being able to contribute from the very beginning.
  • Competitive salary + bonus.
  • Positive + inclusive + respectful working environment.
  • Comfortable office in Torre Europa, Madrid (Metro Santiago Bernabeu): fresh fruit, coffee... you know. Breathtaking views from your desk.
  • Free home-made lunch every day served by Jbfood.
  • Full remote is welcome. Spanish timezone and frequent travel (once a week) are required.
  • PluralSight account to learn during your working hours.
  • A budget for conferences and events.

Do you fit?

We're looking for a proactive individual, passionate about the Big Data world, able to learn a lot and fast, and with a good track record in Big Data products.

Someone able to:

  • Make difficult decisions, evaluate alternatives and results, and fill the gaps if the result is not good enough (failing is part of the process...)
  • Understand the language and needs of data scientists and business people. Always keep a product mindset and develop strategies to solve data problems in a consistent and robust way.
  • Create technology from scratch, but without blindly following trends and buzzwords. You need to understand the benefits behind every technology or tool.
  • Deal with our current code base (not too old, just 2 years old): understand the reasons behind every workaround, accept the tradeoffs made, evolve it iteratively, and have an opinion about when it's the right time to throw it away or keep it (starting from scratch is not always an option).
  • Handle hundreds of millions of rows in our databases, updated weekly and with proper monitoring.
  • Operate and evolve our Kubernetes clusters (one for the live API, the other for parallel data processing) and optimize the development environment based on Docker.
  • And last but not least, lead, inspire, and motivate a team of 4-6 data engineers.

What will your day-to-day look like?

  • Be the Head of the Data Engineering team, mentoring and growing a team of 4-6 data engineers.
  • Develop data pipelines: Apache Beam, Airflow, vanilla Python, SQL, or whatever tool is suitable for the job.
  • Integrate quality checks into your pipelines: validate input data with preconditions and output data with postconditions, and test your code with unit-testing best practices (see the sketch after this list).
  • Monitor your pipeline runs: add alerts and notifications when an execution fails or input/output data don't meet expectations.
  • Design and evolve our data lake (based on Google Storage).
  • Evaluate, choose, and implement our next columnar database, based on business needs: ClickHouse? BigQuery?
  • Operate and evolve our Google Cloud infrastructure: Docker, Kubernetes, Stackdriver...
  • Evolve the current deployment of Machine Learning models (based on sharding by country).
  • Evaluate and choose tools to improve our Machine Learning experiment cycle: Michelangelo? MLflow?
  • Automate deployments of code and data (we call it "data-deploy" because we need to deploy our datasets with the same confidence and control as we do with the code). Consider tools like Apache NiFi for data-deploys.
  • Be a technical reference in the team. Your code will be "the way to go" for other data engineers.
  • Help evolve the data acquisition infrastructure: RabbitMQ, Scala, OpenVPN, Proxmox, Linux sysadmin...
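
To give a feel for what we mean by preconditions and postconditions, here's a minimal sketch around a Pandas transformation. The column names and checks are purely illustrative, not our real schema.

```python
# Hypothetical example: wrap a Pandas transformation with input preconditions
# and output postconditions, failing early when data doesn't meet expectations.
import pandas as pd


def check_input(df: pd.DataFrame) -> None:
    # Preconditions: expected columns exist and the key is unique.
    assert {"listing_id", "price", "surface_m2"} <= set(df.columns), "missing columns"
    assert df["listing_id"].is_unique, "duplicated listing ids in input"


def check_output(df: pd.DataFrame) -> None:
    # Postconditions: the derived metric exists and is within a sane range.
    assert "price_per_m2" in df.columns, "missing derived column"
    assert (df["price_per_m2"] > 0).all(), "non-positive price per m2"


def transform(df: pd.DataFrame) -> pd.DataFrame:
    check_input(df)
    out = df.assign(price_per_m2=df["price"] / df["surface_m2"])
    check_output(out)
    return out
```

In a real pipeline these checks would be wired into the Airflow/Beam steps, so a failed check raises an alert instead of silently producing bad data.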

What do the first months look like?

After the first month...

  • You've understood the overall architecture.
  • You've implemented your first tasks (features or bugfixes), and reduced tech debt in the short term.
  • You've understood our development process.
  • You've improved the development environment (based on Docker).
  • You know our main git repos, and where to commit each piece of code.
  • You've made at least 4 to 6 deployments to prod.
  • You feel confident enough to tackle bigger and more complex challenges.
  • As a manager, you've met with your team and have a good idea about their current skills.

After your first three months...

  • You've decided which parts need to be refactored, and which ones can stay for a few more months.
  • You've made your first architectural decisions: what's the best way to implement the monitoring and alerting system?
  • You've made at least one deploy every 2 or 3 days.
  • As a manager, you have more familiarity with your team and a good idea about their motivations.

After your first nine months...

  • The architecture has changed a lot: it's more resilient, better monitored, more scalable... thanks to your contributions and decisions!
  • Our data-deploys run smoothly and are monitored; alerts are raised when data doesn't meet the expected quality.
  • You've achieved a Continuous Delivery model: when a branch hits master, it gets deployed right away.
  • You've improved the development process: better task tracking, better branching, better documentation process...
  • You're the team's go-to person for Big Data development best practices.
  • You're able to break down the tasks required to implement new features: identify dependencies and the critical path, and decide who is best suited to implement each task.
  • As a manager, you trust your team and your team trusts your skills and intuition. You're able to inspire them and help them grow.
  • You've been a speaker at at least one conference, talking about our experience and stack.

I'm all in! What should I do?

If you think you have the commitment and the experience, don't hesitate! Send your resume or LinkedIn and Git(hub|Lab) profiles to jm@urbandataanalytics.com, along with a brief explanation of why you want to join us!

Come on! Don't miss the train! 🚂
