Available

Last updated: 01.04.2024

Senior Data Engineer / MLOps

Degree: not specified
Hourly/daily rate: on request
Language skills: German (native) | English (business fluent) | French (basic knowledge)

Attachments

CV-Jan-Brusch-Senior-Data-Engineer-MLOps_151223.pdf

Skills

I am a Senior Data Engineer with a background in software development and project management.

Technically, my particular focus is on the Apache tool stack: Spark, Kafka, Flink, Airflow, Hadoop, Hive, Cassandra. I also have extensive experience in cloud environments, primarily with AWS, and with cloud-native technologies: Kubernetes, Docker, Terraform, and Ansible.

Beyond the technical, I stand out for a good eye for the big picture, strong communication, leadership, and intercultural competence.

Project history

04/2023 - present
Industrial Machine Monitoring
Industrial Machine Supplier (industrial and mechanical engineering, >10,000 employees)

A manufacturer and supplier of industrial machines wants to offer customers an additional after-sales service: real-time machine monitoring that notifies operators on the shop floor of suspicious sensor readings. Customers configure rule-based monitoring for their shop floor in a custom backend, and the analytics engine can also produce real-time insights based on advanced analytics. As an additional challenge, as more and more customers adopt the service, the DevOps side of the analytics engine has to be migrated from a monolithic environment into a flexible, cloud-based setup to account for the individual requirements and constraints of each customer.

Key Achievements:
• Senior / Lead Developer in a team of 6 people
• Lead Research and Conception for an Infrastructure Redesign that:
  ◦ distributes the monolithic analytics engine into a fleet of small, independent AWS-native deployments
  ◦ reduced the end-to-end processing time of a machine monitoring message by up to 80%
• Conception and Development of new Features

Technologies: Apache Flink, Java, Apache Maven, Docker, AWS Kinesis, AWS Kinesis Analytics, AWS DynamoDB, AWS CDK, AWS CloudWatch, AWS EKS, AWS Lambda, AWS Athena, Kubernetes, Helm, AWS EC2, TypeScript, AngularJS, Python
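To illustrate the rule-based monitoring pattern described above, here is a minimal sketch of an AWS Lambda handler consuming sensor events from a Kinesis stream. The rule thresholds and field names (machine_id, spindle_temp_c) are invented for the example and not taken from the actual project.

    import base64
    import json

    # Hypothetical per-customer rules; in a real setup these would be
    # loaded from the customer's configuration backend (e.g. DynamoDB).
    RULES = {
        "spindle_temp_c": {"max": 95.0},
        "vibration_mm_s": {"max": 4.5},
    }

    def handler(event, context):
        """Lambda entry point for a Kinesis-triggered rule check."""
        alerts = []
        for record in event["Records"]:
            # Kinesis delivers record payloads base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            for field, rule in RULES.items():
                value = payload.get(field)
                if value is not None and value > rule["max"]:
                    alerts.append({
                        "machine_id": payload.get("machine_id"),
                        "field": field,
                        "value": value,
                        "limit": rule["max"],
                    })
        # A production system would forward alerts to shop-floor
        # operators, e.g. via SNS or a push notification service.
        return {"alerts": alerts}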

07/2022 - 12/2023
Cluster Migration of Internal Data Warehouse
(consumer goods and retail, 500-1,000 employees)

As data volumes for eCommerce companies continue to grow and the number of data consumers within the organization increases, old infrastructure sometimes cannot keep up with the challenges. In this particular case, the computing and warehousing cluster additionally has to be on-premise for data security reasons. After new cluster infrastructure had been provided by an external provider, all data warehouse and computing logic had to be migrated from the old infrastructure to the new one. An additional challenge was to maintain backwards compatibility of the migrated processes at all times.

Key Achievements:
  • Migration and Deployment of 30+ Airflow DAGs with 20 – 50 tasks each on the new infrastructure
  • Co-development of a Python client library for Apache Livy that is used by 100+ Airflow tasks
  • Deployment of 20+ Apache Hive databases with 10 – 50 tables each across three data warehouse layers via Ansible
  • Code review of 5-10 merge requests per week

Technologies: Apache Airflow, Python, Apache Hive, Apache Spark, PySpark, Apache Livy, Apache Hadoop, Ansible
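As an illustration of what such a Livy client library might look like, here is a minimal sketch against Livy's documented /batches REST endpoints; the class and method names are hypothetical, and the real library's interface may differ.

    import time

    import requests

    class LivyBatchClient:
        """Minimal client for Apache Livy's batch REST API."""

        def __init__(self, base_url):
            self.base_url = base_url.rstrip("/")

        def submit(self, file, args=None):
            """Submit a Spark batch job and return the Livy batch id."""
            resp = requests.post(
                f"{self.base_url}/batches",
                json={"file": file, "args": args or []},
            )
            resp.raise_for_status()
            return resp.json()["id"]

        def wait(self, batch_id, poll_seconds=30):
            """Poll until the batch reaches a terminal state."""
            while True:
                state = requests.get(
                    f"{self.base_url}/batches/{batch_id}/state"
                ).json()["state"]
                if state in ("success", "dead", "killed"):
                    return state
                time.sleep(poll_seconds)

An Airflow task can then submit a Spark job via client.submit(...) and block on client.wait(...), so that downstream tasks only start once the job has finished.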

01/2023 - 03/2023
Real Time Trading Application
Stock Exchange (banking and financial services, 50-250 employees)

The volume of financial transactions on trading exchanges is steadily increasing, as is the execution speed of those transactions. FinTech companies such as trading exchanges, market makers, and traders have to be able to conduct their business in this accelerating environment. This requires robust software that can automatically facilitate most trades on top of a robust data streaming infrastructure. Exceptional trades have to be routed to human traders in order to correct the price and facilitate a manual trade.

Key Achievements:
• Fixed a long-running bug in the desktop trading application; in-house traders are now able to finalize up to 10 additional manual trade opportunities per day
• Integrated 3 additional international marketplaces into the automated trading and market making process
• Enabled automated security testing for > 5 applications, allowing engineers to spot and close more than 10 previously undetected security issues

Technologies: Java, JUnit, Apache Maven, Python, Jython, TIBCO, MySQL, PostgreSQL, GitLab CI, GitLab Static Application Security Testing (SAST), Jenkins, JIRA, Confluence
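For context on the automated security testing item: GitLab SAST is typically switched on by including the maintained CI template in a project's pipeline definition, roughly as below. Whether this exact mechanism was used in the project is an assumption.

    # .gitlab-ci.yml - enable GitLab SAST via the maintained template
    include:
      - template: Security/SAST.gitlab-ci.yml

    stages:
      - test   # the template attaches its SAST jobs to the test stage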

02/2020 - 06/2022
Platform for Real Time Fraud Detection in eCommerce
neuland - Büro für Informatik (internet and information technology, 50-250 employees)

To prevent financial and reputational loss on eCommerce platforms, automated detection of fraud patterns in online shops is needed. The software should be able to scale out over multiple shop systems and data sources. Further requirements are monitoring traffic in real time and incorporating expert knowledge alongside machine learning models.

Key Achievements:

  • Lead design of the platform

  • Implementation of a proof of concept from which 80% of the code made it into the first product iteration

  • Technical Lead for a team of 5 Developers

  • Successful deployment and zero-downtime operations on customer premises at around 15 million events per day

  • Design of a cloud-based testing environment that can be brought up in less than 15 minutes (Infrastructure as Code) and handle up to 10 times the production workload

Technologies: Apache Flink, Apache Kafka, Redis, Terraform, AWS, Kubernetes, Helm, Docker, Datadog
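The platform itself was built on Apache Flink; as a simplified stand-in, the following sketch shows the underlying pattern of an expert rule applied to a real-time event stream, here with a plain Kafka consumer. Topic, field names, and the rule are invented for the example.

    import json
    from collections import defaultdict, deque
    from time import time

    from kafka import KafkaConsumer  # pip install kafka-python

    WINDOW_S = 60      # sliding window length in seconds
    MAX_ORDERS = 5     # expert rule: > 5 orders per minute is suspicious

    recent = defaultdict(deque)  # account_id -> timestamps of recent orders

    def is_suspicious(event):
        """Expert rule: too many orders from one account within the window."""
        now = time()
        q = recent[event["account_id"]]
        q.append(now)
        while q and now - q[0] > WINDOW_S:
            q.popleft()
        return len(q) > MAX_ORDERS

    consumer = KafkaConsumer(
        "shop-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for msg in consumer:
        if is_suspicious(msg.value):
            print("suspicious account:", msg.value["account_id"])

In Flink, the same logic maps onto keyed state and windows, which is what lets it scale out over multiple shop systems.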


06/2018 - 12/2021
Webtracking Event Pipeline with Snowplow in AWS
(consumer goods and retail, 500-1,000 employees)

For an eCommerce platform it is crucial to have a detailed picture of customer behaviour, either in real time or from the data warehouse, on which business decisions can be based. This requires a flexible, scalable, and field-tested solution that can run in the cloud. Additionally, all browser events need a custom enrichment with business information from the backend in order to provide the necessary context, e.g. for "Add to Cart" events.

Key Achievements:

  • Integration of the Snowplow event pipeline into a cloud-based shop architecture

  • Day-to-day operations of the event pipeline at ca. 4 million events per day

  • Co-engineering of a custom enrichment in the webshop backend (ca. 1,000+ lines of code) and handover of ownership to the backend team

  • Setup of custom real-time event monitoring (< 1s latency) with Elasticsearch and Kibana

  • Setup of custom scheduling and deployment processes for 5 components of the Snowplow event pipeline

Technologies: Snowplow, Kubernetes, Amazon EMR, Amazon Kinesis, Amazon Redshift, Apache Airflow, Kibana, Elasticsearch, NodeJS, GitLab CI
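A minimal sketch of the custom-enrichment idea described above, assuming a hypothetical product lookup: a raw "Add to Cart" browser event is joined with backend product data before entering the pipeline. The field names and catalog are invented; the real enrichment lived in the NodeJS webshop backend.

    import json

    # Stand-in for a product lookup against the webshop backend.
    PRODUCT_CATALOG = {
        "sku-123": {"name": "Espresso Machine", "price_eur": 249.0},
    }

    def enrich_add_to_cart(raw_event):
        """Attach backend product data to a raw browser event."""
        product = PRODUCT_CATALOG.get(raw_event["sku"], {})
        return {**raw_event, "product": product}

    event = {"event": "add_to_cart", "sku": "sku-123", "session": "abc"}
    print(json.dumps(enrich_add_to_cart(event)))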


01/2018 - 06/2021
Product Recommendation Engines: Collaborative Filtering and Item Similarity with Neural Nets
(consumer goods and retail, 500-1,000 employees)

To enrich the customer's shopping experience and to drive additional sales, the eCommerce platform should be able to recommend additional products to customers. Two orthogonal strategies are employed: product similarity based on neural network embeddings, and collaborative filtering based on user behaviour. Additionally, performance monitoring for the recommendations is needed.

Key Achievements:

  • Productionization of both models based on proofs of concept by an ML engineer, including data acquisition, model execution, and data output

  • Scheduling and operations of the productionized models, including 3 different code bases and more than 5 regularly scheduled jobs

  • Operationalization of 10+ performance metrics across 5 dashboards for stakeholders

Technologies: Python Keras, Python pandas, Amazon EMR, Apache Mahout, Amazon Redshift, Apache Airflow, Apache Superset
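To make the embedding-based strategy concrete: once each product has an embedding vector (e.g. from a neural network's bottleneck layer), "similar products" reduces to a nearest-neighbour lookup under cosine similarity. A toy sketch with invented SKUs and vectors:

    import numpy as np

    # Hypothetical product embeddings, e.g. taken from a trained network.
    embeddings = {
        "sku-1": np.array([0.9, 0.1, 0.3]),
        "sku-2": np.array([0.8, 0.2, 0.4]),
        "sku-3": np.array([0.1, 0.9, 0.7]),
    }

    def most_similar(sku, k=2):
        """Rank the other products by cosine similarity to `sku`."""
        v = embeddings[sku]
        scores = {
            other: float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))
            for other, w in embeddings.items()
            if other != sku
        }
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(most_similar("sku-1"))  # -> ['sku-2', 'sku-3']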


01/2018 - 06/2021
ETL Pipeline with Apache Airflow and Kubernetes
(consumer goods and retail, 500-1,000 employees)

A data-driven company needs reliable and scalable infrastructure as a key component of corporate decision making. Engineers as well as analysts need to be able to create ETL processes and ad-hoc reports without having to consult a data engineer. The company's data architecture needs to provide scalability, a clear separation between testing and production, and ease of use.

Key Achievements:

  • Leading the conception of a cloud-based infrastructure based on the above requirements

  • Initial training of 5 developers and onboarding of more than 10 developers since

  • Initial setup and operation of Apache Airflow with initially ca. 10 jobs, scaling up to more than 100 regularly scheduled jobs at present

Technologies: Apache Airflow, Kubernetes, Docker, AWS, GitLab CI
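A minimal sketch of the pattern that lets engineers and analysts ship ETL steps without a data engineer: each Airflow task runs its own container image on Kubernetes. The DAG id, images, and schedule are invented; the import path is the modern cncf-kubernetes provider package, whereas the original 2018 setup predates it.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    with DAG(
        dag_id="nightly_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",  # every night at 02:00
        catchup=False,
    ) as dag:
        extract = KubernetesPodOperator(
            task_id="extract_orders",
            name="extract-orders",
            image="registry.example.com/etl/extract-orders:latest",
            cmds=["python", "extract.py"],
        )
        load = KubernetesPodOperator(
            task_id="load_warehouse",
            name="load-warehouse",
            image="registry.example.com/etl/load-warehouse:latest",
            cmds=["python", "load.py"],
        )
        extract >> load  # load only runs after a successful extract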


08/2017 - 01/2021
A/B Testing Platform
(consumer goods and retail, 500-1,000 employees)

In order to enable an eCommerce organization to become data-driven, there must be (among other things) a framework in place to compare different versions of the website against each other. Many members of the organization across departments need to be able to create and conduct experiments without the assistance of a data engineer. Another important factor for the framework was the use of Bayesian statistics.

Key Achievements:

  • Leading the conception of the testing framework, including randomization logic, statistical modelling, and graphical presentation in the frontend

  • Implementation of a proof of concept for the statistical engine

  • Implementation of production code for frontend, backend, and statistical engine

  • Training of stakeholders from 3 different departments in the methodology and statistical background of A/B testing

Technologies: Python PyMC3, Python SciPy, Apache Spark, PySpark, Apache Airflow, Docker, Kubernetes, VueJS, Redshift
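To illustrate the Bayesian approach the framework was built on, here is a minimal PyMC3 sketch of a two-variant conversion test with a Beta-Binomial model; the visitor and conversion counts are invented example data.

    import pymc3 as pm

    # Invented example data: visitors and conversions per variant.
    n_a, k_a = 1000, 118
    n_b, k_b = 1000, 141

    with pm.Model():
        p_a = pm.Beta("p_a", alpha=1, beta=1)  # uniform priors
        p_b = pm.Beta("p_b", alpha=1, beta=1)
        pm.Binomial("obs_a", n=n_a, p=p_a, observed=k_a)
        pm.Binomial("obs_b", n=n_b, p=p_b, observed=k_b)
        uplift = pm.Deterministic("uplift", p_b - p_a)
        trace = pm.sample(2000, tune=1000, progressbar=False)

    # Posterior probability that variant B converts better than A.
    print("P(B > A) =", (trace["uplift"] > 0).mean())

Reporting P(B > A) together with the uplift distribution is what allows non-technical stakeholders to read experiment results without p-value training.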


01/2015 - 01/2017
Back End Developer
(consumer goods and retail, 500-1,000 employees)

Backend development for a large eCommerce system with Java, SQL, and NodeJS.

Willingness to travel

Available worldwide
I work full-time remote. However, individual on-site days can of course always be arranged.