## open-positions.md

    Central Data Science Team at NEW WORK SE / XING


Open positions
Team
Topics that we work on
Technologies that we are using
Modus operandi: how we work
Contact

 
Open positions

Data Science at NEW WORK SE (the parent company of XING) continues to grow. We are constantly looking for Data Engineers and Data Scientists. At the moment, we have open positions in our central Data Science team. For example, we are looking for:

(Junior, Mid-level or Senior) Data Engineers (Spain, Germany, Portugal) who would help us build up our Data Platform and Machine Learning Platform. See for example the following job ads: (Senior) Data Engineer, Junior Data Engineer, Data Platform Engineer
(Mid-level or Senior) Scala Developers (Spain, Germany, Portugal) who also contribute to developing search and recommender services and help us automate machine learning processes.

Team


we are 25 people and quite an international team (people from Spain, Portugal, Russia, Turkey, Switzerland, US, Germany, Syria)
almost half of us work in Hamburg at the headquarters, which is located at the harbour; 5 people work in XING's Barcelona office, 3 in or near Valencia, 2 in Porto and 1 in Madrid.
we organize our team in smaller sub-teams that collaborate very closely

Topics that we work on


Search: everything related to search and discovery.
Personalization: Recommender Systems such as Members-You-May-Know recommendations, job recommendations, etc.
Data Quality: enriching existing data assets, entity recognition, text processing, various classification tasks, etc.
Data Science Platform: tooling and libraries for building data services and facilitating the usage of machine learning & Co.
Data Science Infrastructure: Hadoop infrastructure, Cassandra cluster, Kafka cluster and libraries that are required here and there

Topics that we will work on in 2022:

Data Platform: making it easier for developers to build data pipelines, modernizing our search indexing infrastructure
Machine Learning platform: investing further in CI/CD for ML, improving Python support (e.g. making it easier to bring ML models to production), improving the ML development experience
Data Quality: improving the robustness of data pipelines (e.g. SLOs on data level), enhancing monitoring and alerting for data quality

Technologies that we are using


Scala is our main programming language (both for the service layer, i.e. REST services, and for writing MapReduce batch jobs or custom Hive UDFs)
Python is used for analyses, data visualization and also for some of our workflows running in production.
Elasticsearch is our search technology/infrastructure
Hortonworks is the Hadoop distribution that we are using as infrastructure for batch processing. In particular, we use: Spark and Hive both for exploratory analyses and for workflows that are running in production, plain MapReduce, Oozie and our own Scala DSL for specifying Oozie workflows, ...
Cassandra is the key-value store that we often use for making pre-computed results available to the REST services that provide, e.g., MYMK (Members-You-May-Know) recommendations, job recommendations & Co.
Kubernetes is used as deployment infrastructure on which we run our delivery services
Kafka and Akka often come into play when we do stream processing
...

Problems that we are dealing with


Machine learning problems ranging from classification problems to learning-to-rank tasks, hyper-parameter optimization and automated machine learning
Automating machine learning processes, CI/CD for machine learning
Search, personalized search, inferring meaning of search queries
Recommender systems: trying to predict which items a XING user may be interested in
Realtime processing pipelines: building pipelines that process new incoming data on the fly
Scalable REST services: developing search/recommender/... REST services that receive thousands of requests per second
Data quality: measuring and monitoring data quality and trying to enhance the value/utility of XING's data assets
Entity resolution: given some text, we try to understand which entity this text may refer to (e.g. location, XING member, company, job role, etc.)
Matching problems: (a) given an entity, try to find similar entities or (b) given two entities, try to find the commonalities between those entities
...

Modus operandi: how we work


we focus on getting things done and are flexible regarding the tools, methods, organizational structures, etc. that we use
we rely heavily on A/B tests to prove that new things really help our users
we try to automate as much as possible (including deployment of services, training of ML models, A/B testing; see e.g.: Automating ML for RecSys)
hierarchies are flat and there is no strict separation between job roles (e.g. Software Engineers are free to dive into data analyses and Data Scientists push code to production). In fact, all of us write code.
we collaborate with folks from universities (e.g. with our friends from CrowdRec or as part of the ACM RecSys Challenge that we organized in 2016 and 2017)
we attend and organize conferences, meetups and workshops, and give talks
most of us drink more than 5 cups of coffee per day
we work both on our own services/tools/products and on services that help other teams in the company build nice products
our services/apps get a huge amount of traffic (around 500M requests per day). Hence, when joining our team, you can have quite a big impact on XING and our users (e.g. by building stuff that helps people find a job, connect with other people, etc.)
See also what others think about working at NEW WORK SE / XING: reviews on Kununu for NEW WORK SE and XING

Contact

Feel free to drop me (Fabian Abel) a message, e.g. via email (pattern: firstname.lastname@xing.com).