Updated 02.01.2026

**** ******** ****
Premium customer
100% available

Data Engineer / Architect

Sulzbach, Germany
Worldwide
Master of Science Business Informatics (Focus: Data Mining)

Profile attachments

profil_marc_brissier.pdf

Skills

  • Programming languages
    • Python
    • Scala
    • Java
    • JavaScript
    • SQL
    • (C#, R)
  • Framework/Tools
    • AWS S3, SageMaker, Elastic Container Service
    • Google Cloud Platform BigQuery, Dataproc, Cloud Composer, Cloud Storage
    • Azure Data Lake Storage Gen2, Data Factory, DevOps, Databricks, SQL Database, Machine Learning Studio
    • Apache Spark / PySpark / MLlib / Structured Streaming
    • Databricks / Delta Lake
    • Docker
    • Kubernetes
    • Terraform
    • Apache Hive
    • Apache Kafka
    • Apache Airflow
    • HDFS
    • scikit-learn, pandas
    • PyTorch
    • JupyterLab, Jupyter Notebook, Apache Zeppelin
    • SSIS
    • React, React Native
    • Django
  • Databases
    • PostgreSQL
    • MongoDB
    • MS SQL Server

Data Engineering & Machine Learning Expert | Cloud Specialist | Full-Stack Developer
With over 8 years of experience as a technical consultant and freelancer, I specialize in designing and deploying production-ready Machine Learning pipelines and Data Lakes across top cloud platforms, including GCP, AWS, and Azure.
  • Proven Results: Successfully developed an advanced dashboard and reporting system to track carbon emissions, boosting data transparency and decision-making for stakeholders (2024).
  • Versatile Expertise: Led the delivery of two high-impact, data-driven products by seamlessly integrating data engineering with machine learning, providing clients with reliable predictive analytics (2022-2023).
  • Cloud Migration Success: Completed a full-scale migration to Google Cloud Platform (GCP) within a year, ensuring smooth transitions and system optimization (2021).
  • End-to-End Solutions: I bring proficiency in full-stack development, enabling me to contribute to every stage of the software development process with agility and precision.



Trainings and Certificates
  • AWS - Data Engineer (2024)
  • Convolutional Neural Networks with PyTorch
    • Udemy - 09.2021
  • Full week Apache Spark Training on Databricks
    • Databricks - 01.2020
  • Udacity Deep Learning Nanodegree
    • Udacity - 07.2019
  • CRT010: Databricks Certified Developer: Apache Spark 2.X with Scala
    • Databricks - 02.2019
  • Three full days of Scrum training
    • inovex GmbH - 03.2018



Additional experience from private projects
  • Boda Wedding Web-App -> see https://boda-app.de/
  • React Native App -> see https://mbrissier.github.io/scood/
  • I developed an ML pipeline that extracts nutritional facts from pictures of nutrition tables with an accuracy of 70 percent. Model: Convolutional Neural Network; frameworks: PyTorch and PyTorch Mobile

Languages

  • German: native speaker
  • English: business fluent
  • Spanish: good

Project history

Data Engineer

Biggest German car marketplace

Internet and Information Technology

250-500 employees

Migration to Google Cloud Platform (GCP)
  • Responsible for the technical migration to GCP, coming from an on-premise Hadoop environment with multiple Apache Spark (Scala) ETL jobs.
  • Finding architectural solutions for the following challenges and unknowns:
    • Dynamic partition overwrite with BigQuery
    • Connecting GCP to on-premise data stores such as Kafka and MongoDB
    • Scheduling jobs in GCP with Cloud Composer (Apache Airflow)
    • Implementing deployment and testing in GCP
    • Introducing Terraform for a collaborative management of infrastructure as code
  • Reengineering all on-premise Apache Spark (Scala) ETL jobs and Jenkins pipelines for the GCP migration.
    • Creating Scala sbt build scripts for multiple projects and Jenkins pipelines for versioning and deployment of fat JARs and Airflow DAGs.
  • Orchestrating and executing the migration phase with various dependencies between Spark jobs, departments as data consumers and newly developed features by the team.
  • Introducing the new GCP setup to the developer team.
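The dynamic-partition-overwrite challenge above means replacing only the partitions present in a new batch while leaving every other partition untouched, as Spark's `partitionOverwriteMode=dynamic` does. A minimal stdlib sketch of those semantics (the dict-based table is illustrative):

```python
def dynamic_partition_overwrite(table, batch, partition_key):
    """Replace only the partitions present in `batch`; keep all others.

    `table` and the return value map partition value -> list of rows;
    `batch` is a flat list of row dicts. This mimics Spark's
    spark.sql.sources.partitionOverwriteMode=dynamic semantics.
    """
    new_partitions = {}
    for row in batch:
        new_partitions.setdefault(row[partition_key], []).append(row)
    result = dict(table)           # start from the existing table
    result.update(new_partitions)  # overwrite only the touched partitions
    return result

# Example: only the 2021-01-02 partition is replaced.
table = {
    "2021-01-01": [{"day": "2021-01-01", "clicks": 10}],
    "2021-01-02": [{"day": "2021-01-02", "clicks": 7}],
}
batch = [{"day": "2021-01-02", "clicks": 9}]
updated = dynamic_partition_overwrite(table, batch, "day")
```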

Big Data Scientist

inovex GmbH

Internet and Information Technology

250-500 employees

Technical Consultant for Big Data projects.
  • At inovex I worked as a technical consultant for various customers, taking on the roles of Data Engineer and Machine Learning Engineer.
  • I worked on various projects with different technology stacks in on-premises and cloud environments.
  • I developed most of my skills in and around the following stacks and environments: Hadoop Ecosystem (especially Apache Spark), SciPy, Google Cloud Platform, AWS, Azure, on-premises clusters.
  • Please see the project history for more details. The customers' real names are withheld under an NDA.

Machine Learning Engineer

Biggest German private TV channel

Media and Publishing

>10,000 employees

Creating a Data Science Process Framework

  • Evaluating different ML frameworks for experiment tracking and model serving (e.g. MLflow, Neptune, …).
  • Creating a custom Cookiecutter Data Science project template for better project structure, git integration, Python packaging, Python environment management, and secret management.
  • Setting up a private PyPI repository for Data Science project packages on Google Cloud Platform (GCP).
  • Development of a central Python API for easier integration and deployment of ML artifacts to GCP, especially for GCS and BigQuery.
  • Elaborating a Data Science process framework and establishing it across different Data Science departments, in combination with the developed artifacts Python API, the PyPI repository, the Data Science project template, and the selected ML frameworks.
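The shape of such a central artifacts API might look roughly like the sketch below; the `ArtifactStore` name and the in-memory backend are illustrative stand-ins for the real GCS/BigQuery clients:

```python
import json

class ArtifactStore:
    """Hypothetical sketch of a central API for ML artifacts.

    A real implementation would delegate `_backend` to GCS and
    BigQuery clients; here a dict keeps the example self-contained.
    """
    def __init__(self):
        self._backend = {}

    def save(self, project, name, version, obj):
        key = f"{project}/{name}/{version}"
        self._backend[key] = json.dumps(obj)
        return key

    def load(self, project, name, version):
        return json.loads(self._backend[f"{project}/{name}/{version}"])

store = ArtifactStore()
store.save("churn", "model-config", "1.0.0", {"max_depth": 8})
config = store.load("churn", "model-config", "1.0.0")
```

Versioned keys like this keep artifact lookups reproducible across Data Science projects regardless of the storage backend.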

Data Engineer

German retailer

Consumer Goods and Retail

>10,000 employees

Realising a Hadoop Roadmap

  • Responsible for the project budget and the successful realisation of a Hadoop Roadmap with a team of three Data Engineers.
  • Developing and presenting a PoC together with a Principal Azure Cloud Architect from Microsoft as a blueprint for the planned migration to Azure.
  • Design of the new data lake architecture, conception and implementation of Apache Spark (PySpark) ETL jobs.
  • Lecturer of a company-wide Apache Spark 2.x (PySpark) training for the Data Engineering and Analytics team.
  • Training the Analytics team with the Apache Spark ML module for large-scale ML scenarios.
  • Reengineering of legacy Hadoop MapReduce Java processes by introducing a CI/CD pipeline based on Ansible and Bitbucket together with new build and test scripts.
  • Presenting different options for data security on HDFS within the MapR distribution.

Data Engineer

German intralogistics manufacturer

Industry and Mechanical Engineering

>10,000 employees


Building a data lake on Azure with Databricks
During this project I worked for the customer about two days a week.

  • Lead architect for the upcoming data lake in the Azure Cloud.
  • Configuration and setup of Databricks in Azure.
  • Training the data team in Python and Apache Spark (PySpark) within the Databricks development environment.
  • Using the Databricks Delta format for performant update, insert, and delete capabilities in the new data lake.
  • Developing Apache Spark ETL jobs and scheduling them via Azure Data Factory.
  • Setting up Azure Data Lake Storage Gen2 as the storage layer for Databricks.
  • Configuration of an external Hive metastore with a new Azure SQL database.
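Delta's update/insert/delete capability revolves around the MERGE operation; a stdlib sketch of its upsert semantics, where lists of dicts stand in for the Delta table and `merge_upsert` is an illustrative name:

```python
def merge_upsert(target, updates, key):
    """Mimic Delta Lake's MERGE: update rows whose key matches,
    insert rows whose key is new, keep everything else."""
    by_key = {row[key]: row for row in target}
    for row in updates:
        by_key[row[key]] = row   # matched -> update, new -> insert
    return list(by_key.values())

customers = [{"id": 1, "city": "Mainz"}, {"id": 2, "city": "Köln"}]
changes = [{"id": 2, "city": "Bonn"}, {"id": 3, "city": "Ulm"}]
merged = merge_upsert(customers, changes, "id")
```

On Delta the same operation is a transactional MERGE over Parquet files, which is what makes row-level updates and deletes cheap in the lake.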

Machine Learning Engineer

German private TV channel

Media and Publishing

5,000-10,000 employees

Implementing a large-scale ML-Pipeline in AWS

  • Reengineering an existing scikit-learn ML project with Apache Spark.
  • Implementing Scala Spark and PySpark jobs for preprocessing, training and prediction.
  • Deployment of the model to AWS by managing all related resources and handling the ML pipeline orchestration with Apache Airflow DAGs.
  • Enhancing overall performance by using Scala user-defined functions (UDFs) and sparse vectors.
  • Identifying the best performing model via Spark's hyperparameter tuning (model: random forest classifier).
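Spark's tuning utilities (`ParamGridBuilder` plus `CrossValidator`) evaluate each parameter combination and keep the best-scoring model. A stdlib sketch of that grid-search loop, with a hypothetical `evaluate` function standing in for cross-validated training:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and return the best one,
    analogous to Spark ML's ParamGridBuilder + CrossValidator."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Illustrative scoring function: rewards more trees, moderate depth.
def evaluate(p):
    return p["numTrees"] - abs(p["maxDepth"] - 8)

best, score = grid_search({"numTrees": [20, 50], "maxDepth": [4, 8, 16]}, evaluate)
```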

Data Engineer

German private TV channel

Media and Publishing

5,000-10,000 employees

Building a Hadoop Data Lake

  • Implementing ETL jobs with Apache Spark (Scala) and Hive.
  • Ensuring the integrity and availability of different data access layers.
  • Orchestration and scheduling with Apache Airflow.
  • Developing Java-based download clients for heterogeneous data sources.
  • Building a Python-based crawler for thousands of websites and ingesting the data into Hive. Additionally returning the top k relevant websites for a given set of n keywords using spaCy and Spark MLlib (Algorithm: tf-idf).
  • Construction of an Apache Spark Streaming and Apache Zeppelin prototype. Presenting the prototype, ingesting and transforming live app data, in front of 150 attendees (in English).
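The top-k ranking mentioned above amounts to scoring each site by the summed tf-idf weight of the keywords; a compact stdlib version of the idea (Spark MLlib distributes the same computation across the cluster):

```python
import math
from collections import Counter

def top_k_sites(docs, keywords, k):
    """Rank documents by summed tf-idf of the keywords.

    `docs` maps site -> token list; mirrors the Spark MLlib
    tf-idf approach on a single machine.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))          # document frequency per term

    def score(tokens):
        tf = Counter(tokens)
        return sum(
            tf[w] / len(tokens) * math.log(n / df[w])
            for w in keywords if w in tf
        )

    return sorted(docs, key=lambda s: score(docs[s]), reverse=True)[:k]

docs = {
    "a.de": ["spark", "etl", "hive", "spark"],
    "b.de": ["news", "tv", "app"],
    "c.de": ["spark", "news", "tv", "app"],
}
top = top_k_sites(docs, ["spark"], 2)
```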

Machine Learning Engineer

German automotive OEM

Automotive and Vehicle Manufacturing

>10,000 employees

PoC: Prediction of currency fluctuations

  • Refactoring and debugging an existing R-based prediction project (Model: deep neural network).
  • Design of an Azure SQL database structure as a prediction sink.
  • Implementing a Microsoft Azure Machine Learning pipeline including the data ingestion, R code execution and prediction storage.
  • Offering the predictions as a web service for a given day.
  • Cooperation with Microsoft, as Azure Machine Learning Studio was in an early release state.

Data Engineer

German publisher

Media and Publishing

500-1,000 employees

Building an export pipeline to AWS

  • Implementing the synchronisation between an on-premise SQL DB and Azure SQL DB with SSIS.
  • Converting millions of SQL DB rows to a JSON representation.
  • Building a custom SSIS component with C# for the upload to AWS SQS.
  • Ensuring the future pipeline maintenance by implementing scheduling, monitoring and auto alarming.
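The row-to-JSON conversion step can be sketched in plain Python (in the actual pipeline this happened inside the C# SSIS component before the upload to SQS); the column names are illustrative:

```python
import json
from datetime import date

def row_to_message(row, columns):
    """Serialize one DB row (a tuple) to a JSON message string,
    converting dates to ISO strings so the payload stays portable."""
    record = {
        col: (val.isoformat() if isinstance(val, date) else val)
        for col, val in zip(columns, row)
    }
    return json.dumps(record, sort_keys=True)

columns = ("id", "title", "published")
msg = row_to_message((42, "Ausgabe 7", date(2017, 3, 1)), columns)
```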

Data Engineer

German online retailer

Consumer Goods and Retail

>10,000 employees

Generating synthetic data

  • Design of an automatic anonymisation tool and data security concept in a challenging big data context.
  • Implementing Apache Spark (Scala) ETL jobs, including the randomisation and hashing of personal data to generate synthetic data for machine learning projects.
  • Ensuring data integrity with different layers of Apache Hive tables and views.
  • Configuration of permissions with Apache Sentry for Hive and ACLs for HDFS.
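The hashing part of such an anonymisation job can be sketched with the standard library; the inline salt here is illustrative, as a production setup would manage it as a secret:

```python
import hashlib

def pseudonymize(value, salt):
    """Replace a personal identifier with a salted SHA-256 digest:
    deterministic (joins across tables still work) but not
    reversible without knowing the salt."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

token_a = pseudonymize("max.mustermann@example.com", salt="s3cret")
token_b = pseudonymize("max.mustermann@example.com", salt="s3cret")
```

In a Spark job the same function would run inside a UDF or `map` over the personal-data columns.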

Data Engineer

German private TV channel

Media and Publishing

5,000-10,000 employees

Enhancing a Cross-device analytics platform

  • Developing ETL jobs with Pentaho Data Integration and Apache Hive.
  • Ensuring the daily processing of heterogeneous sources and sinks like Google BigTable and MongoDB.
  • Building an automatic validation of the analytical outputs.
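Automatic validation of analytical outputs typically boils down to cheap invariant checks run after each load; a minimal stdlib sketch with illustrative rules:

```python
def validate_output(rows, required, min_rows=1):
    """Return a list of human-readable findings; an empty list
    means the output passed all checks."""
    findings = []
    if len(rows) < min_rows:
        findings.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                findings.append(f"row {i}: missing value for '{col}'")
    return findings

rows = [{"device": "tv", "users": 120}, {"device": "mobile", "users": None}]
findings = validate_output(rows, required=("device", "users"))
```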
