* Preparation, consolidation, and transformation of large (un)structured
data using modern big data technologies (Hadoop, MapReduce, Presto,
PySpark, Python, NumPy, pandas, Hive, HBase, Livy, JupyterLab)
* Mainly responsible for the independent design, creation, deployment, and
management of big data pipelines within the AWS cloud infrastructure
(IoT Core, Kinesis, S3, Lambda, SQS, Glue, EMR, Athena,
Redshift, ELK, Kubernetes, Docker, Git)
* Build a data lake as a centralized repository to store structured and
unstructured data for advanced analytics: data ingestion, big data
processing, real-time analytics (S3, Glue Data Catalog, Athena, EMR,
Redshift, Delta Lake, Databricks)
* Automated setup of EMR clusters and continuous performance
improvement using AWS CloudFormation
* Develop AI models for customer-oriented projects (focusing on XGBoost)
* Responsible for 15+ POCs (customers from the automotive industry) as
Senior Data Engineer
* Automation of data preparation using ETL tools (Talend, Alteryx, AWS
Glue)
* App development using the low-code framework Mendix