Keywords
Skills
System Architect/Data Engineer with more than 15 years of experience building reliable ETL pipelines on scalable cloud infrastructure. I help organizations design and implement fully automated, data-driven processes that support the business case using modern technology and data infrastructure.
Building data-driven backends
I thrive on building new things from scratch, bringing in best practices from previous projects and experience in designing, building, deploying and operating data processing software and infrastructure.
Requirements engineering
Understanding customer needs and business requirements and translating them into infrastructure and software is one of my core competencies, applied throughout the lifetime of every project. My focus on the essential core value proposition delivers value to the customer early on.
Enabling teams
I support and enable teams to efficiently build and operate data-driven processes. Sharing knowledge and best practices on designing and implementing reliable ETL processes and the required infrastructure components is part of my daily routine.
Using technology
Having worked on more than 20 projects has allowed me to learn and use a broad variety of tools and services for designing and building data-driven backend processes. This breadth helps me choose the right tools and technologies for the task at hand.
Project History
- Consulting services on data models, deployment strategies and process design
- Design and implementation of various ETL processes
- Workshops and knowledge transfer on cloud and data architecture, process design
- Design and implementation of a cloud-based data warehouse infrastructure based on AWS Managed Services
- Help in migrating various data processing jobs from Cloudera to AWS Managed Services
- Design and implementation of deployment pipelines for testing and rollout
- Workshops and knowledge transfer on cloud and data architecture and process design
- Implemented workflows to fetch data from various third-party providers
- Building and enabling a team to create new ETL processes
- Real-time integration of a marketing automation software suite
- Implemented modern ETL processing environment using Airflow, Spark and Kubernetes
- General advice on data architecture and data management
Design and implementation of a cloud-based data warehouse for evaluation of vehicle data. Design and implementation of a data science environment.
- Extended a prototype and brought it to operational readiness for a production environment
- Design and implementation of a CI / CD pipeline
- Setup of project structure and release management
- Implemented various ETL pipelines for car measurement data collection, validation and transformation
Technologies: AWS Cloud, Lambda, IAM, Airflow, Kubernetes, Terraform, Python, Jenkins
Design of AWS managed infrastructure platform for sensor data processing, extension of an existing data science environment.
- Advice on design and tools for building a Kubernetes based infrastructure platform for sensor data processing
- Design and implementation of infrastructure components on Kubernetes
- Design and implementation of the ETL pipeline for data collection
- Design and implementation of CI/CD pipeline with Bamboo & Kubernetes
Supported the conception and implementation of migrating a monolith into a microservice architecture.
- Alignment and coordination of different teams regarding technology usage
- Introduction of Kafka as the central message bus for microservice communication
- Introduction of Liquibase for database schema management
- Professional / technical support for a specific microservice
Technologies: Microservices, Java, Docker, Kafka, Liquibase, Jenkins
Design and implementation of a cloud-based data warehouse & data science environment.
- Designing data warehouse architecture & data storage strategies
- Architecture proposal of a dynamically scalable data warehouse
- Implementation of infrastructure components with Terraform & Kubernetes
- Development of ETL pipelines for data collection
Design and implementation of a big data warehouse in the AWS Cloud for market research analytics.
- Technical project management
- Design of an architecture based on AWS cloud infrastructure and managed services
- Implementation of ETL data pipelines
- Development of data warehouse / workflow management
- Data preparation / process management
Technologies: Spark, SparkR, Hadoop, Hive, Jupyter, AWS Cloud, R, Bamboo, Terraform
Support in evaluating big data providers.
- Acquisition and documentation of the technical requirements for setting up and operating an Apache Hadoop based data warehouse
- Obtaining offers from various providers, preparing information for decision-making
- Implementation of a prototype for data collection
Technologies: Hortonworks, Cloudera, SAP Cloud, Apache NiFi, AWS Cloud, MS Azure, Terraform
Architecture review plus design and implementation of a real-time aggregator for machine statistics.
- Review and assessment of the existing architecture and data model design
- Workshop on implementing data management / the Lambda architecture
- Design and implementation of a real-time layer with Spark Streaming
Technologies: Hadoop, Spark, AWS Cloud, Scala, MapReduce, JCascalog, Redshift
Workshop Big Data Technologies - Introduction and Getting Started.
- Conducting a 3-day workshop
- Introduction to Big Data / Hadoop ecosystem
- Practical exercise using big data tools in the AWS Cloud
Technologies: Hadoop, Spark, AWS Cloud, MapReduce, Hive, Pig, R, Terraform
Conception and development of a web application.
- Conception of the application
- Implementation of website and backend
- Set up deployment process + hosting environment
- Setting up a fully automated Apache Hadoop deployment process in the Amazon and OpenStack clouds
Technologies: Apache Hadoop, Python, Puppet, AWS, OpenStack, Git, RedHat Linux
Design and implementation of a continuous deployment & delivery pipeline for data-driven applications in cloud environments.
- Design and implementation of a big data infrastructure in the AWS Cloud
- Design and implementation of a continuous deployment pipeline
- Technical management of a customer-internal team
Technologies: AWS Cloud, Hadoop, Spark, Bamboo, Git, Terraform, Vagrant, InfluxDB
Support in the development of ETL processes on a Hadoop based DWH.
- Planning and implementation of a Hive export module
- Implementation of a Kafka & Redis export module as part of an open source project
- Implementation of an analysis algorithm for click stream analytics
Technologies: Hadoop, Hive, Spark, Redis, Kafka, Avro, Scala, HCatalog, Schedoscope
Conception and implementation of a data warehouse based on big data technologies (OLAP workload).
- Planning and implementation of the cluster infrastructure
- Evaluation of various input formats with regard to performance
- Preparation, execution and documentation of load tests
Technologies: Hadoop, Impala, Hive, ETL, AWS Cloud
Design and implementation of a big data architecture for evaluating telecommunications data.
- Planning and implementation of the network setup
- Planning and implementation of a medium sized Hadoop cluster
- Set up deployment process, including monitoring
- Implementation of a data integration framework for high volume data storage
Technologies: Apache Hadoop, Hive, Flume, Java, Spring, Puppet, Ubuntu Linux, AWS
Design and implementation of a big data system for batch and real-time data processing of machine generated data.
- Planning and implementation of the deployment environment
- Evaluation of various technologies for data acquisition / data processing
- Implementation of a distributed, fail-safe high throughput messaging and analysis system for machine data (Lambda Architecture)
- Technical management of a team
Technologies: Hadoop, Samza, Spark, Kafka, Java, ETL, AWS
Design and implementation of a Hadoop-based data warehouse for online game analytics.
- Planning and implementation of a data warehouse
- Evaluation of different approaches for data collection
- Selection of suitable technologies
- Technical management / coordination of a distributed team (GER, CN, CAN)
- Implementation of a distributed, fail-safe high throughput messaging system
Technologies: Hadoop, MapReduce, Kafka, Hive, ETL, Java, Linux
Design and implementation of a big data infrastructure in virtualized environments.
- Planning and implementation of a big data deployment infrastructure
- Implementation of an on-demand deployment process for Hadoop clusters in a virtualized environment
- Prototype implementation of various algorithms with the MapReduce framework
Technologies: Hadoop, OpenStack, Opscode Chef, Java, Linux
Design and implementation of a Hadoop cluster.
- Advice and conception of a Hadoop cluster
- Selection of the suitable hardware
- Set up a deployment process and roll out the cluster
- Porting of existing statistics routines to MapReduce
Technologies: Apache Hadoop, Hive, Pig, Python, Java, Maven, Puppet, Debian Linux
Reimplementation of an analysis tool as a MapReduce application.
- Analysis and integration of an existing implementation into the MapReduce framework using the Hadoop Streaming API
- Installation and configuration of a Hadoop cluster including monitoring
- Set up a deployment process
Technologies: Apache Hadoop / HBase, Java, Maven, Ganglia, Chef, PHP, Debian Linux
Integration of a payment provider in the existing backend.
- Data preparation, conversion and import into database
- Mapping of the data, text matching with an existing database
- Integration of a payment provider
Technologies: Ruby / Rails, OAuth, MySQL, Git, Debian Linux
Integration of a signature component in an email program.
- Set up the debug environment
- Integration of signature components in KMail
- Testing the implementation
Technologies: C++, Qt, KDE, Ubuntu Linux
Implementation and refactoring of an analysis tool in C++.
- Set up a build environment for C++ projects
- Refactoring the prototype
- Adaptation and extension of the software for the production environment (logging, error handling, unit testing)
- Set up a deployment process
- Setting up a build server (continuous integration)
Willingness to Travel
Other Information
I am not available for projects under an ANÜ (temporary staffing) contract. Please refrain from inquiries about ANÜ projects. Thank you!