Updated 30.11.2025


Verified
Premium customer
100% available
Lead Data Engineer
Berlin, Germany
Germany +6 (Germany, Austria, Switzerland, Belgium, Ireland, Luxembourg, Netherlands)
Bachelor of Actuarial Science
About me
Lead Data Engineer with 6+ years building enterprise platforms. Led teams of 8 engineers delivering lakehouse migrations and multi-tenant architectures. Expert in Databricks, Snowflake, AWS, Azure, and GCP. Track record: 10x performance gains, greenfield CI/CD, and GDPR/HIPAA compliance.
Skills
JavaScript, Artificial Intelligence, Apache Airflow, Aviation, Amazon Web Services, Amazon Elastic Compute Cloud, Amazon S3, Data Analysis, Architecture, Microsoft Azure, Business Intelligence, Google BigQuery, Strategic Management, Cloud Computing, Cloud Storage
Lead Data Engineer / Data Architect
Lead Data Engineer with 6+ years building enterprise platforms for Johnson & Johnson, Aer Lingus, and Australian Football League. Led teams of 8 engineers delivering lakehouse migrations and multi-tenant architectures. Expert in Databricks, Snowflake, AWS, Azure, and GCP. Track record: 10x performance gains, greenfield CI/CD, and GDPR/HIPAA compliance.
Skills
Platforms: Databricks, Snowflake, Delta Lake, Azure Synapse, BigQuery
Cloud: AWS, Azure, GCP
Programming: Python, PySpark, SQL, Scala
Orchestration: Airflow, Delta Live Tables, Azure Data Factory, Matillion
DevOps: GitHub Actions, Jenkins, Terraform, Databricks Asset Bundles
Data Quality: Great Expectations, dbt, DLT Expectations
AI/ML: OpenAI API, LangChain, Vertex AI, Prophet, scikit-learn
Visualisation: Power BI, Tableau, Looker
Languages
English (Native), German (A2)
Languages
German (good), English (native)
Project history
- Led 8 engineers migrating fragmented Oracle, Informatica, Snowflake, and Airflow landscape to unified Databricks medallion architecture
- Ingesting 100M+ daily events from reservations, departure control, baggage handling, flight telemetry, finance, and HR systems
- Built parameterized ingestion framework with Delta Live Tables cutting pipeline delivery from 3 weeks to 3 days
- Migrated 60+ Airflow DAGs to Databricks Workflows, eliminating orchestration tool sprawl and reducing maintenance overhead
- Implemented CI/CD framework from scratch using Databricks Asset Bundles and GitHub Actions, enabling automated testing and deployment across all environments
- Established standardized data quality and testing framework with unit tests, integration tests, and DLT expectations embedded into CI/CD pipeline
- Hackathon Winner 2025: Built AI-powered crew recovery system using LLMs and MCP protocols, reducing disruption response time from hours to minutes
- Reduced platform costs by €400K/year through platform consolidation, spot instances, auto-scaling clusters, and retiring legacy jobs
- Enforced GDPR compliance via Unity Catalog row/column security and dynamic PII masking on 50+ tables
- Tech: Databricks, Unity Catalog, Delta Live Tables, PySpark, AWS S3, Terraform, GitHub Actions
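
For illustration only: a minimal sketch of the idea behind a parameterized ingestion framework, written in plain Python rather than Delta Live Tables so it runs anywhere. It is not the project's code. The names `SourceSpec`, `build_ingest`, `SOURCES`, and the column names are hypothetical, and the required-column check merely stands in for a data quality expectation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical per-source parameters; a real framework would carry more."""
    name: str
    required_columns: Tuple[str, ...] = ()


def build_ingest(spec: SourceSpec) -> Callable[[Iterable[Row]], Dict[str, object]]:
    """Return an ingest function specialised for one source."""

    def ingest(rows: Iterable[Row]) -> Dict[str, object]:
        accepted: List[Row] = []
        rejected: List[Row] = []
        for row in rows:
            # A row is rejected when any required column is missing or None.
            missing = [c for c in spec.required_columns if row.get(c) is None]
            (rejected if missing else accepted).append(row)
        return {
            "table": f"bronze_{spec.name}",
            "accepted": accepted,
            "rejected": rejected,
        }

    return ingest


# Onboarding a new source means adding one entry here, not writing a new pipeline.
SOURCES = [
    SourceSpec("reservations", required_columns=("booking_id",)),
    SourceSpec("baggage", required_columns=("bag_tag", "flight_no")),
]
PIPELINES = {spec.name: build_ingest(spec) for spec in SOURCES}
```

The point of the design is that one generic function is specialised by configuration, which is the kind of reuse that can shorten delivery of each new pipeline.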
- Led 6 engineers replacing legacy SQL Server and SSIS pipelines with Azure Databricks lakehouse architecture
- Processing batch and near-real-time data from 25+ manufacturing lines across pharmaceutical and medical device production
- Built automated data quality framework with Great Expectations and DLT expectations, catching defects before they impact €45M+ product lines
- Architected fault-tolerant pipelines with circuit breaker patterns, dead-letter handling, and PagerDuty integration
- Implemented Jenkins CI/CD pipeline from scratch with Terraform IaC, eliminating manual deployments and reducing release cycles from days to hours
- Cut production incidents by 70% through automated testing gates and environment promotion workflows
- Achieved 35% average query performance improvement through Delta Lake migration, Z-ORDER clustering, and adaptive query execution
- Reduced annual platform costs by €280K through reserved instances, spot pools, and automated lifecycle policies
- Enforced HIPAA compliance via row-level security, dynamic column masking, and immutable audit logging
- Tech: Azure Databricks, Delta Lake, Azure Data Factory, PySpark, SQL Server, Jenkins, Terraform
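
For illustration only: a minimal sketch of how a circuit breaker can be combined with dead-letter handling, in plain Python. It is not the project's implementation. The class and function names (`CircuitBreaker`, `process_batch`), the thresholds, and the injected `clock` are assumptions chosen to keep the example small and testable.

```python
import time
from typing import Any, Callable, Iterable, List, Optional, Tuple


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def process_batch(records: Iterable[Any], handler: Callable[[Any], Any],
                  breaker: CircuitBreaker) -> Tuple[int, List[Tuple[Any, str]]]:
    """Process records; failed ones go to a dead-letter list instead of aborting."""
    dead_letters: List[Tuple[Any, str]] = []
    processed = 0
    for record in records:
        try:
            breaker.call(handler, record)
            processed += 1
        except Exception as exc:  # includes CircuitOpenError
            dead_letters.append((record, repr(exc)))
    return processed, dead_letters
```

In a setup like this, the dead-letter list would typically be persisted for replay, and an open breaker is the natural point to raise an alert.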
- Led 4 engineers building company's first centralized data platform on GCP, replacing fragmented spreadsheets and siloed manual processes
- Ingesting data from 20+ sources including Twitter/X API, NewsAPI, Salesforce CRM, and Google Analytics
- Designed dimensional model in BigQuery supporting campaign analytics, media monitoring, and sentiment tracking across 50+ Fortune 500 client accounts
- Built Airflow orchestration (Cloud Composer) managing 150+ daily DAGs across ingestion, dbt transformations, and ML inference
- Implemented dbt transformation layer with 200+ models, automated schema tests, and freshness SLAs
- Built NLP pipeline using GPT-3.5-turbo for sentiment and topic classification at scale and GPT-4 for high-value client deliverables; processed 50K+ documents daily with a 40% accuracy improvement over the keyword baseline
- Implemented LangChain orchestration for prompt chaining, context management, and structured output parsing
- Deployed inference endpoints on Vertex AI with cost controls, rate limiting, and response caching
- Reduced analyst reporting workload by 60% through self-service Looker dashboards with row-level client isolation
- Tech: GCP, BigQuery, Airflow, dbt, Python, OpenAI API, Vertex AI, Looker
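
For illustration only: a minimal sketch of response caching combined with rate limiting in front of a model call, in plain Python. It does not use the Vertex AI or OpenAI APIs and is not the project's code. `TokenBucket`, `CachedClassifier`, and the `model_call` callable are hypothetical stand-ins for a real inference client.

```python
import hashlib
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CachedClassifier:
    """Wrap a model call with an in-memory cache and a rate limiter."""

    def __init__(self, model_call: Callable[[str], str], bucket: TokenBucket) -> None:
        self.model_call = model_call
        self.bucket = bucket
        self.cache: Dict[str, str] = {}

    def classify(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hit: no model call, no token spent
        if not self.bucket.try_acquire():
            raise RuntimeError("rate limit exceeded; retry later")
        label = self.model_call(text)
        self.cache[key] = label
        return label
```

Checking the cache before the limiter means repeated documents cost neither a model call nor a rate-limit token, which is where most of the saving would come from on duplicate-heavy feeds.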