@fabianabel
Last active October 7, 2021 12:30
Data Science - Open positions

Central Data Science Team at NEW WORK SE / XING

Open positions

Data Science at NEW WORK SE (the parent company of XING) continues to grow. We are constantly looking for Data Engineers and Data Scientists. At the moment, we have open positions in our central Data Science team; for example, we are looking for:

  • (Junior, Mid-level or Senior) Data Engineers (Spain, Germany, Portugal) to help us build up our Data Platform and Machine Learning Platform. See for example the following job ads: (Senior) Data Engineer, Junior Data Engineer, Data Platform Engineer
  • (Mid-level or Senior) Scala Developers (Spain, Germany, Portugal) who also contribute to developing search and recommender services and help us automate machine learning processes.

Team

  • we are 25 people and quite an international team (people from Spain, Portugal, Russia, Turkey, Switzerland, US, Germany, Syria)
  • almost half of us work in Hamburg at the headquarters, which is located in the harbour; 5 people work in XING's Barcelona office, 3 in or near Valencia, 2 in Porto and 1 in Madrid.
  • we organize our team in smaller sub-teams that collaborate very closely

Topics that we work on

  • Search: everything related to search and discovery.
  • Personalization: Recommender Systems such as Members-You-May-Know recommendations, job recommendations, etc.
  • Data Quality: enriching existing data assets, entity recognition, text processing, various classification tasks, etc.
  • Data Science Platform: tooling and libraries for building data services and facilitating the usage of machine learning & Co.
  • Data Science Infrastructure: Hadoop infrastructure, Cassandra cluster, Kafka cluster and libraries that are required here and there

Topics that we will work on in 2022:

  • Data Platform: making it easier for developers to build data pipelines, modernizing our search indexing infrastructure
  • Machine Learning platform: investing further in CI/CD for ML, improving Python support (e.g. making it easier to bring ML models to production), improving the ML development experience
  • Data Quality: improving the robustness of data pipelines (e.g. SLOs on data level), enhancing monitoring and alerting for data quality

Technologies that we are using

  • Scala is our main programming language (both for the service layer (= REST services) and for writing MapReduce batch jobs or custom Hive UDFs)
  • Python is used for analyses, data visualization and also for some of our workflows running in production.
  • Elasticsearch is our search technology/infrastructure
  • Hortonworks is the Hadoop distribution that we use as infrastructure for batch processing. In particular, we use: Spark and Hive both for exploratory analyses and for workflows that run in production, plain MapReduce, Oozie and our own Scala DSL for specifying Oozie workflows, ...
  • Cassandra is the key-value store that we often use for making pre-computed stuff available to the REST services that provide, e.g. MYMK recommendations, job recommendations & Co.
  • Kubernetes is used as deployment infrastructure on which we run our delivery services
  • Kafka and Akka often come into play when we do stream processing
  • ...

Problems that we are dealing with

  • Machine learning problems ranging from classification problems to learning-to-rank tasks, hyper-parameter optimization and automated machine learning
  • Automating machine learning processes, CI/CD for machine learning
  • Search, personalized search, inferring meaning of search queries
  • Recommender systems: trying to predict which items a XING user may be interested in
  • Realtime processing pipelines: building pipelines that process new incoming data on the fly
  • Scalable REST services: developing search/recommender/... REST services that receive thousands of requests per second
  • Data quality: measuring and monitoring data quality and trying to enhance the value/utility of XING's data assets
  • Entity resolution: given some text, we try to understand which entity this text may refer to (e.g. location, XING member, company, job role, etc.)
  • Matching problems: (a) given an entity, try to find similar entities or (b) given two entities, try to find the commonalities between those entities
  • ...
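
To make the two variants of the matching problem a bit more concrete, here is a minimal Python sketch. It is only an illustration and not how our services work: the Entity class, the tag sets and the choice of Jaccard similarity are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical entity (e.g. a job role) described by a set of tags."""
    name: str
    tags: frozenset


def commonalities(a: Entity, b: Entity) -> frozenset:
    """Variant (b): the tags that two entities have in common."""
    return a.tags & b.tags


def jaccard(a: Entity, b: Entity) -> float:
    """Jaccard similarity: number of shared tags divided by number of distinct tags."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def most_similar(query: Entity, candidates: list, k: int = 3) -> list:
    """Variant (a): the k candidates that are most similar to the query entity."""
    others = [c for c in candidates if c != query]
    return sorted(others, key=lambda c: jaccard(query, c), reverse=True)[:k]


# Made-up example data, for illustration only.
data_engineer = Entity("Data Engineer", frozenset({"scala", "spark", "kafka", "hive"}))
data_scientist = Entity("Data Scientist", frozenset({"python", "spark", "statistics"}))
designer = Entity("UX Designer", frozenset({"figma", "research"}))

print(commonalities(data_engineer, data_scientist))
print([e.name for e in most_similar(data_engineer, [data_scientist, designer], k=1)])
```

In practice, entities would be described by much richer features than a handful of tags, and the similarity function would typically be learned rather than fixed.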

Modus operandi: how we work

  • we focus on getting things done and are flexible regarding the tools, methods, organizational structures, etc. that we use
  • we rely heavily on A/B tests to prove that new things really help our users
  • we try to automate as much as possible (including deployment of services, training of ML models, A/B testing; see e.g.: Automating ML for RecSys)
  • hierarchies are flat and there is no strict separation between job roles (e.g. Software Engineers are free to dive into data analyses and Data Scientists push code to production). In fact, all of us write code.
  • we collaborate with folks from universities (e.g. with our friends from CrowdRec or as part of the ACM RecSys Challenge that we organized in 2016 and 2017)
  • we attend and organize conferences, meetups and workshops, and give talks
  • most of us drink more than 5 cups of coffee per day
  • we work both on our own services/tools/products and on services that help other teams in the company build nice products
  • our services/apps get a huge amount of traffic (around 500M requests per day). Hence, when joining our team, you can have quite a big impact on XING and our users (e.g. by building stuff that helps people to find a job, connect with other people, etc.)
  • See also what others think about working at NEW WORK SE / XING: reviews on Kununu for NEW WORK SE and XING

Contact

Feel free to drop me (Fabian Abel) a message, e.g. via email (pattern: firstname.lastname@xing.com).
