
Benjamin Bluhm

Senior Data Scientist

Available
  • Freelancer in 60329 Frankfurt
  • Degree: PhD in Statistics/Econometrics
  • Hourly/daily rate: 100 €/hour
    Remote and onsite hourly rates depend on location and number of onsite days
  • Languages: German (native) | English (business fluent)
  • Last update: 07.04.2020

SKILLS
  • Several years of project experience in data science and big data across different industries (banking, logistics, telecommunications, retail, pharma)
  • Prototyping and operationalization of predictive analytics solutions based on machine learning methods
  • Data cleansing and implementation of complex feature engineering logic on heterogeneous data sources
  • Development of production-ready code in Python and R, with in-depth knowledge of the relevant libraries (e.g. pandas, numpy, scikit-learn, dplyr, caret)
  • Development of scalable workflows in Apache Spark, in particular PySpark, with in-depth knowledge of Spark SQL, the Spark DataFrame API and Spark MLlib (see the sketch after this list)
  • Cloud-based implementation of scalable machine learning algorithms on AWS EMR, S3, RDS, Batch, SageMaker and Metaflow
  • Scrum / work in agile teams
  • Visualization of results with standard libraries such as matplotlib, plotly and ggplot
  • Confident use of common development environments such as PyCharm, Visual Studio Code, Jupyter and RStudio
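
The skills above typically come together in workflows like the following minimal PySpark sketch, shown purely as an illustration (all paths, table names and column names are hypothetical):

```python
# Minimal PySpark sketch: join, clean and aggregate raw transactions into features
# (paths and column names are hypothetical examples, not from a specific project)
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-engineering-sketch").getOrCreate()

transactions = spark.read.parquet("s3://example-bucket/transactions/")
regions = spark.read.parquet("s3://example-bucket/regions/")

features = (
    transactions
    .dropna(subset=["customer_id", "amount"])      # basic cleaning
    .join(regions, on="zip_code", how="left")      # enrich with region information
    .groupBy("region_id")                          # aggregate to regional level
    .agg(
        F.countDistinct("customer_id").alias("n_customers"),
        F.avg("amount").alias("avg_amount"),
        F.sum("amount").alias("total_amount"),
    )
)

features.write.mode("overwrite").parquet("s3://example-bucket/regional_features/")
```
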
PROJECT HISTORY
  • 06/2019 - present

    • Boehringer Ingelheim
    • >10,000 employees
    • Pharmaceuticals and medical technology
  • Senior Consultant Data Science
  • Project lead for a regional-level analysis of the U.S. drug market. The objective is to create customer value by producing regional patient clusters as an input for optimizing regional patient assistance programs. The project involves data preparation tasks, including feature engineering, as well as model training with clustering algorithms to produce the regions. The entire project is implemented in Python and Spark.

    • Development and prototyping of different clustering algorithms in Jupyter using scikit-learn (see the sketch after this project)
    • Data exploration, joining and cleaning of different data sources containing patient-level pharmacy transactions using Spark SQL and the DataFrame API
    • Creation of new features from raw transaction data and aggregation of features to the regional level using Spark SQL and the DataFrame API
    • Visualization of results in plotly and matplotlib
    • Implementation of production-ready code in Visual Studio Code
    • Regular presentation of results to stakeholders in the U.S.
    • Key technologies: Python, PySpark, Jupyter, Visual Studio Code, Parquet, Plotly 
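
A minimal sketch of the clustering step referenced above, assuming the Spark pipeline has already written a table of aggregated regional features (the file name, feature names and cluster counts are hypothetical):

```python
# Sketch: cluster regions on aggregated features with scikit-learn
# (feature names and the number of clusters are illustrative assumptions)
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

regional = pd.read_parquet("regional_features.parquet")  # hypothetical output of the Spark job
X = regional[["n_customers", "avg_amount", "total_amount"]]
X_scaled = StandardScaler().fit_transform(X)

# Compare a few cluster counts using the silhouette score
for k in range(3, 8):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X_scaled)
    print(k, round(silhouette_score(X_scaled, labels), 3))

# Assign the final clusters (k chosen here only as an example)
regional["cluster"] = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(X_scaled)
```
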

  • 06/2017 - 05/2019

    • REWE Systems
    • >10,000 employees
    • Consumer goods and retail
  • Senior Consultant Data Science
  • Design and implementation of a large-scale demand forecasting system using linear and non-linear regression models, with the objective of improving product availability and reducing out-of-stock rates in REWE food stores.

    • Development and prototyping of machine learning algorithms and classical statistical approaches for time series demand forecasting
    • Development of approaches to handle typical time series patterns, including outliers, seasonality, structural breaks and holiday effects
    • Hyperparameter tuning using grid search and randomized search (see the sketch after this project)
    • Prototyping of potential new features to improve prediction accuracy
    • Implementation of a distributed machine learning system on a Hadoop cluster using PySpark and scikit-learn
    • Implementation of an interactive dashboard for monitoring KPIs and production models
    • Key technologies: Python, PySpark, PyCharm, Jupyter, RStudio, RMarkdown, Zeppelin, HDFS, Drill, Parquet, Teradata, DB2
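
An illustrative sketch of the hyperparameter tuning step mentioned above, using randomized search with a time-series-aware cross-validation split (the data set, feature names, model and parameter ranges are assumptions, not the project's actual configuration):

```python
# Sketch: hyperparameter tuning for a demand-forecasting regressor with
# randomized search and a time-series cross-validation split
# (features, target and parameter ranges are illustrative assumptions)
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

data = pd.read_parquet("store_item_sales.parquet")        # hypothetical training data
feature_cols = ["lag_7", "lag_14", "promo", "holiday", "weekday"]
X, y = data[feature_cols], data["units_sold"]

param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [2, 3, 4, 5],
    "learning_rate": np.linspace(0.01, 0.2, 20),
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=30,
    cv=TimeSeriesSplit(n_splits=5),                        # respects temporal order
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```
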

  • 05/2017 - 06/2017

    • GLS Group
    • >10,000 employees
    • Transport and logistics
  • Senior Consultant Data Science
  • Proof-of-concept design for recipient segmentation to optimize last-mile package delivery using different clustering algorithms.

    • Data exploration, data cleansing and feature engineering in Python
    • Implementation of a simple K-means baseline and extension to a Gaussian mixture model to improve discriminatory power across clusters using probability thresholds (see the sketch after this project)
    • Development of a generic workflow to test and evaluate different clustering approaches in Zeppelin notebooks
    • Visualization of key results with the ggplot library
    • Key technologies: Spark MLlib, PySpark, Zeppelin, RStudio
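
A minimal sketch of the K-means baseline and its Gaussian-mixture extension with a probability threshold, using scikit-learn on placeholder data (the feature matrix, cluster count and threshold value are illustrative assumptions):

```python
# Sketch: K-means baseline vs. Gaussian mixture model with a probability
# threshold for ambiguous assignments (placeholder data, illustrative settings)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(1000, 4)))  # placeholder recipient features

# Baseline: hard assignments from K-means
kmeans_labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X)

# Extension: soft assignments from a Gaussian mixture model
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
probs = gmm.predict_proba(X)

# Keep only recipients whose maximum cluster probability exceeds a threshold;
# the rest are flagged as ambiguous (-1). The threshold is an example value.
threshold = 0.8
labels = np.where(probs.max(axis=1) >= threshold, probs.argmax(axis=1), -1)
print("share of confident assignments:", (labels != -1).mean())
```
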

  • 02/2017 - 04/2017

    • Deutsche Telekom
    • >10,000 employees
    • Telecommunications
  • Senior Consultant Data Science
  • Development of an application for predicting customer satisfaction on the basis of technical data from public WiFi hotspots.

    • Prototyping of forecasting algorithms using autoregressive time series models to predict short-term dynamics of hotspot data (see the sketch after this project)
    • Implementation of a distributed forecasting system using a Spark time series library
    • Use of the forecasts as input to a classification algorithm predicting future customer satisfaction at different hotspot locations
    • Data loading and writing via Apache Cassandra 
    • Data Visualization in Zeppelin and R
    • Key technologies: Scala, Cassandra, Zeppelin, R
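
The project itself used Scala and a Spark time series library; the Python sketch below only illustrates the autoregressive forecasting idea on a synthetic hotspot metric (the series, lag order and forecast horizon are assumptions):

```python
# Illustrative autoregressive short-term forecast for one hotspot metric
# (synthetic hourly series; lag order and horizon are assumptions)
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

idx = pd.date_range("2017-02-01", periods=24 * 14, freq="h")
values = (50
          + 10 * np.sin(np.arange(len(idx)) * 2 * np.pi / 24)   # daily usage cycle
          + np.random.default_rng(0).normal(0, 2, len(idx)))    # noise
series = pd.Series(values, index=idx)

# Fit an AR(24) model on the history and forecast the next 24 hours
result = AutoReg(series, lags=24).fit()
forecast = result.predict(start=len(series), end=len(series) + 23)

# These forecasts would then feed the downstream classifier that predicts
# customer satisfaction per hotspot location
print(forecast.head())
```
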

  • 10/2013 - 10/2015

    • European Central Bank
    • 5,000-10,000 employees
    • Public sector
  • Research Analyst
  • Development of an analytical toolset for the evaluation of statistical models used in the ECB's quarterly projection exercise, and contribution to the econometric research agenda of the ECB's economic research department.

    • Implementation of a Bayesian model averaging (BMA) framework in R for data-driven identification of policy-relevant factors using posterior model and inclusion probabilities (see the sketch after this project)
    • Co-author of an ECB working paper on panel data estimation in a Bayesian setting (see publication list)
    • Co-author of an ECB occasional paper on addressing model uncertainty using Bayesian model averaging
    • Programming of a user-friendly R routine for BMA estimation, which has been used in various projects at the ECB and the Bank of England
    • Preparation and coordination of two ECB research workshops held in Madrid and Lisbon 
    • Key technologies: RStudio, Matlab
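
The original BMA routine was written in R; purely as an illustration, the sketch below shows the underlying idea in Python, approximating posterior model probabilities with BIC weights and deriving posterior inclusion probabilities on synthetic data (regressor names and data are hypothetical):

```python
# Sketch of the Bayesian model averaging idea (original routine was in R):
# approximate posterior model probabilities via BIC weights and derive
# posterior inclusion probabilities (PIPs) on synthetic data.
from itertools import combinations
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, candidates = 200, ["x1", "x2", "x3", "x4"]
X = pd.DataFrame(rng.normal(size=(n, len(candidates))), columns=candidates)
y = 1.0 + 2.0 * X["x1"] - 1.5 * X["x3"] + rng.normal(scale=1.0, size=n)  # toy data

models, bics = [], []
for k in range(len(candidates) + 1):
    for subset in combinations(candidates, k):
        exog = sm.add_constant(X[list(subset)]) if subset else np.ones((n, 1))
        bics.append(sm.OLS(y, exog).fit().bic)
        models.append(set(subset))

# BIC approximation: posterior model probability proportional to exp(-BIC/2)
weights = np.exp(-(np.array(bics) - min(bics)) / 2)
post_model_prob = weights / weights.sum()

# PIP of a regressor: sum of probabilities of all models containing it
for var in candidates:
    pip = post_model_prob[[var in m for m in models]].sum()
    print(f"PIP({var}) = {pip:.3f}")
```
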

  • 10/2013 - 11/2014

    • Deutsche Bank
    • >10,000 employees
    • Banking and financial services
  • Research Analyst
  • Prototyping and implementation of statistical models to predict financial market trends, supporting data-driven investment decisions in the mutual fund segment of DWS asset management.

    • Development of a machine learning algorithm for selecting the most important predictors among equity return drivers
    • Implementation of a two-step procedure combining linear regularization and dimensionality reduction approaches (see the sketch after this project)
    • Extensive evaluation of different in-sample estimation windows and forecasting horizons 
    • Key technologies: Matlab
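
The original implementation was in Matlab; the scikit-learn sketch below only illustrates the two-step idea (regularized screening of return drivers, then dimensionality reduction before a forecasting regression) on synthetic data, with all settings chosen as examples:

```python
# Sketch of a two-step predictor-selection procedure (original was in Matlab):
# (1) Lasso-based screening of candidate return drivers,
# (2) PCA on the survivors before a simple forecasting regression.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                       # 40 candidate equity-return drivers (synthetic)
y = 0.5 * X[:, 0] - 0.3 * X[:, 5] + rng.normal(scale=0.5, size=500)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(Lasso(alpha=0.05))),  # step 1: regularized screening
    ("pca", PCA(n_components=0.95)),                 # step 2: keep 95% of variance
    ("ols", LinearRegression()),
])

# Simple chronological 80/20 split as an illustrative out-of-sample evaluation
split = 400
model.fit(X[:split], y[:split])
print("out-of-sample R^2:", round(model.score(X[split:], y[split:]), 3))
```
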
