Profile picture of Rudy Pastel, Data Scientist from Munich

Rudy Pastel

Available

Last update: 01.06.2021

Data Scientist

Degree: Doctorate in applied mathematics
Languages: German (business fluent) | English (business fluent) | French (native) | Spanish (basic)

Attachments

Lebenslauf.docx
Résumé.docx
Development of a R based ETL solution to replace MS SSIS.docx

Skills

Experience
Cross-departmental and cross-brand consulting, data analysis and prototype development.
  • Departments: marketing, accounting, after-sales
  • Mathematics: time series forecasting, anomaly detection, association analysis
  • IT: development of R packages and Shiny visualisations
Mentoring and training of new data scientists
  • R and Shiny reference person of the team
  • The intern I supervised was hired
Coordination of 10+ data scientists between 07/2015 and 07/2016
  • Technical assessment of applicants and of their fit with the team
  • Chairing the weekly team meeting and standing in for management in capacity planning
  • Building the competence matrix and listing the desired training courses
Mathematics
  • Statistics: linear regression, logistic regression, elastic net, time series analysis
  • Machine learning: decision tree, random forest, support vector machine, clustering, nearest-neighbour classification, bagging, cross-validation
  • Data mining: Apriori, frequent itemset mining, frequent sequence mining
IT skills

R [3 years]: package development
Shiny [3 years]: taught best practices
Git [3 years]: systematic use of BitBucket
SQL [2 years]: basic ETL

Language skills

French: native
English: business fluent, C2
German: fluent, C1.2 (DSH 3)
Spanish: advanced beginner, A2

Project history

03/2019 - present
Development of an R-based ETL solution to replace Microsoft SSIS
(Insurance, >10,000 employees)

At the heart of the reinsurance business lies the transfer of risk and premium fees.

  1. Policyholders, such as individuals insuring their cars and businesses insuring their stock, transfer their risks to insurers in exchange for premiums.

  2. Insurers in turn transfer their risks to reinsurers in exchange for a premium.

  3. Reinsurers transfer their risks to financial markets in exchange for a premium.

The transfers manifest as a constant flow of data to and from reinsurers.

 

The client, the international life reinsurance business unit of an international reinsurer from Bavaria, processes data from various sources such as primary insurers, market quote providers and trade simulators on a daily basis. Microsoft SQL Server Integration Services (SSIS) used to be the linchpin of the "Extract Transform Load" (ETL) part of that process. Unfortunately, SSIS-based ETL proved hard to develop, maintain and update, so few team members could use SSIS and it became a bottleneck. Furthermore, Microsoft can stop supporting a given version of SSIS at any time and force the business unit to move to a version that is not backward compatible. I was hired to develop an R-based replacement for SSIS to tackle those problems.

 

The R-based solution I developed covers all the required SSIS functionality, meets the desired non-functional requirements and goes beyond them:

  • Any team member can quickly and independently implement an ETL process

  • ETL processes are no longer a bottleneck

  • The business unit owns the system and evolves it as it sees fit

  • New processes can be automated, freeing up time for the team

SSIS is now decommissioned and the R-based solution I developed is the new linchpin: it saves time, money and headaches.

  Technicalities

The data import framework I developed consists of five R packages that are

  • structured following Hadley Wickham's recommendations

  • documented individually via roxygen2

  • documented globally via bookdown

  • unit tested via testthat

  • version controlled via git hosted on Azure DevOps

  • released in a private R package repository.



12/2020 - 01/2021
Replication of a chemical compound
(Industry and mechanical engineering, >10,000 employees)

The properties of a chemical compound depend on its molecular composition. When the compound is expensive, rare or difficult to procure, one may want to replicate its composition by mixing available precursors. The difficulty is creating a mix that appropriately balances the properties and the price of the replicate.

 

The chemical company relies on its subject matter experts for this task. Because expertise takes long to acquire, is difficult to share and is time-consuming to apply, the company was looking for a way to automate as much of the process as possible, so that the experts could focus on the most valuable work. My task was to enable the experts to start from a reasonable, data-driven replicate.

 

I proceeded as follows.

  1. I discussed the chemistry with experts.

  2. I modelled the problem and embedded the solution into a graphical interface.

  3. I integrated the compound editing tool that the experts use into the graphical interface.

 

The tool saves the chemists considerable time by providing a sound starting point, which they then refine via the graphical interface and practical experiments.

  Technical tools  
  • Mathematics: Linear programming optimisation

  • Data analysis: R

  • Visualisation: R-Shiny

  • Data source: Excel files
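
The blending problem described above can be sketched as a small linear programme. The sketch below is written in Python for illustration only (the project itself used R) and rests on loudly stated assumptions that are not in the project description: a single numeric property that mixes linearly, an exact target value for it, and hypothetical precursor data. The function name `cheapest_blend` is likewise hypothetical.

```python
from itertools import combinations


def cheapest_blend(properties, prices, target, tol=1e-12):
    """Minimise blend price subject to: fractions >= 0, fractions sum to 1,
    and the linearly mixed property equals `target` (assumed mixing rule).

    With two equality constraints, an optimum of this linear programme is
    attained at a basic solution, which uses at most two precursors, so
    enumerating single precursors and pairs is sufficient.
    Returns (price, fractions), or None if the target is unreachable.
    """
    n = len(properties)
    best = None

    def consider(fractions):
        nonlocal best
        price = sum(f * p for f, p in zip(fractions, prices))
        if best is None or price < best[0]:
            best = (price, fractions)

    # A single precursor that already matches the target.
    for i in range(n):
        if abs(properties[i] - target) <= tol:
            fractions = [0.0] * n
            fractions[i] = 1.0
            consider(fractions)

    # Pairs: solve x * a_i + (1 - x) * a_j = target for x in [0, 1].
    for i, j in combinations(range(n), 2):
        gap = properties[i] - properties[j]
        if abs(gap) <= tol:
            continue  # identical properties: the pair adds nothing new
        x = (target - properties[j]) / gap
        if -tol <= x <= 1.0 + tol:
            x = min(1.0, max(0.0, x))
            fractions = [0.0] * n
            fractions[i] = x
            fractions[j] = 1.0 - x
            consider(fractions)

    return best


# Hypothetical data: property value and price per unit of three precursors.
result = cheapest_blend([10.0, 30.0, 40.0], [2.0, 8.0, 6.0], target=25.0)
print(result)
```

With several properties and tolerance bands instead of one exact target, a general-purpose LP solver would be the natural choice; the enumeration here only serves to keep the example dependency-free.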



03/2019 - 09/2019
Prediction of debt collection over 180 months
(Banking and financial services)

In the debt collection business, predicting future collection is crucial when pricing debt portfolios, budgeting resources and acquiring clients. The debt collection company hired me to replace its time-consuming manual process with a data-driven prediction tool that any member of the controlling department can use.

 

I created the prediction tool in collaboration with the controllers as follows.

  1. The controllers taught me the basics of their business.

  2. The head data engineer introduced me to their information system and database.

  3. I developed a flexible model of how debts are collected, together with ways to test it.

  4. I developed a graphical user interface so that the controllers can run the prediction process themselves until it is fully integrated into the monthly IT processes.

 

Predicting debt collection used to take the whole team 6 weeks. It now takes 3 hours plus some manual adjustment.

  Technical tools
  • Mathematics: Modelling and quantile regression

  • Data analysis: R

  • Visualisation: Shiny

  • ETL: Oracle
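
As an illustration of the quantile regression listed above, here is a minimal sketch in Python (the project itself used R). It is not the model built for the client: the one-parameter model `y ≈ b * x`, the function names and the data are all hypothetical assumptions chosen to keep the example short.

```python
def pinball_loss(residual, tau):
    """Quantile (pinball) loss of one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def fit_quantile_slope(x, y, tau):
    """Fit y ~ b * x at quantile level tau (0 < tau < 1).

    The total pinball loss is convex and piecewise linear in b, with kinks
    at the ratios y_i / x_i, so a minimiser lies among those ratios.
    """
    candidates = [yi / xi for xi, yi in zip(x, y) if xi != 0]

    def total_loss(b):
        return sum(pinball_loss(yi - b * xi, tau) for xi, yi in zip(x, y))

    return min(candidates, key=total_loss)


# Hypothetical data: the last observation is an outlier.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 50]
median_slope = fit_quantile_slope(x, y, tau=0.5)
upper_slope = fit_quantile_slope(x, y, tau=0.9)
print(median_slope, upper_slope)
```

The median fit (tau = 0.5) ignores the outlier, while a high quantile level follows it, which is what makes quantile fits useful for producing cautious and optimistic scenarios side by side.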

Willingness to travel

Available worldwide
I am available in Munich from July 2021.
If the project is located outside Munich, remote work must be agreed.