
malhar parve, Frankfurt am Main

Partially available

Last update: 11.03.2024

Data Engineer | Python | PySpark | Azure Data Factory | Spark | Databricks | Snowflake

Degree: Bachelor of Engineering in Computer Science
Hourly/daily rate: on request
Language skills: German (basic knowledge) | English (native speaker)

File attachments

CV-MALHAR-PARVE-en_110324.pdf

Skills

Summary:
  • Senior Data Engineer with 15+ years of IT consulting experience, working with clients like Deutsche Bank, Deutsche Boerse, Commerzbank, Aldi Sued, KARSTADT, etc. on their various data migration, data integration, data lake and BI/DW projects.
  • Highly proficient in the architecture, design, development, implementation and support of ETL or ELT data processing pipelines for data warehouses or data lakes using ETL tools like Microsoft Azure Data Factory V2, Informatica PowerCenter and Talend.
  • Proficient in designing & customizing data models using data modelling techniques like dimensional modelling, data vault modelling, etc.
  • Worked with various structured and semi-structured data sources like SAP, databases, REST APIs, CSV, XML, JSON, Parquet, etc.
  • Designed various data layers, e.g. stage layer, core layer, reporting layer, etc., for efficient and timely data processing with high quality.
  • Worked closely with data architects, business analysts, product owners and tech leads to help draft designs, architecture, requirement specifications, etc., for developing ETL/ELT data processing pipelines.
  • Extensively worked in Scrum projects with high involvement in various phases, e.g. sprint planning, creating user stories, tasks, etc.
Skills
ETL Tools
Azure Data Factory V2, Informatica PowerCenter 10.4, Informatica Intelligent Cloud Services, i.e. IICS (Data Integration & Application Integration), Talend Data Integration 7.2

Services/Application Integration Tools
Informatica Intelligent Cloud Services (IICS) Data & Application Services, Talend Open Studio for ESB, Talend Real-Time Big Data Platform 7.2

Databases
Azure SQL Database, Oracle 19c, Oracle 12c, Oracle Exadata 12c, Microsoft SQL Server 2016, IBM DB2, Hadoop HDFS, PostgreSQL 10.7, MySQL

Large Language Model Tools:
ChatGPT, Google Bard

Cloud-Based Data Warehouses:
Snowflake, Azure Synapse Analytics

Data Processing Frameworks/Platforms:
Apache Spark, Databricks

Big Data Ecosystem:
Hadoop 2.0, HDFS, Sqoop, Hive

Cloud Technologies:
Azure Storage Accounts, Azure Batch Account, Azure Data Factory, Azure SQL Server, Azure Data Lake Storage, Azure Functions, Azure Logic Apps, Azure Key Vault, Azure DevOps, Amazon Web Services (AWS), AWS S3

Modeling
3-NF, Dimensional Modeling (Star & Snowflake), Data Vault (Raw Vault & Business Vault)

Modeling Tools
PowerDesigner

Software Development Methods
Agile, SCRUM, Waterfall

Programming Languages
Python 3, PySpark, JAVA, SQL, T-SQL, PL/SQL, UNIX/Bash Shell scripting

Scheduling Tools
BMC Control-M 9.0.19, Automic UC4, Informatica Scheduler, Talend Management Console (TMC)

Database Deployment & Change Control Tools:
Liquibase 4.21.1

Version Control
Subversion, GitHub

Operating Systems
Windows 7/10, Linux, Solaris

Other Tools
Informatica Mapping Architect for Visio, Atlassian Jira, Atlassian Confluence, GitHub, Hue, Eclipse, Toad, SQL Developer, SQL*Plus, PL/SQL Developer, Oracle SQL Developer, FTP, sFTP, WinSCP, FileZilla, PuTTY, HP Quality Center, Aginity Workbench for IBM Netezza, SQL Workbench/J, Citrix, Postman (REST API testing), pgAdmin 4, SnowSQL, DBeaver

Project history

07/2023 - present
Senior Data Engineer
(Other, 5,000-10,000 employees)

Responsibilities:

Data Pipeline/ELT:

- Design, develop and maintain ETL/data pipelines using Azure Data Factory and Python.
- Designed and led the implementation of end-to-end data pipelines on Azure Data Factory, ensuring efficient data movement and transformation across multiple sources. This resulted in a 30% reduction in data processing time and improved data accuracy.
- Set up all metadata tables, their configurations, stored procedures and views for pipeline reusability, so that loads run through generic import pipelines.
- Reduced development time for source-to-data-lake and data-lake-to-staging-layer mappings by 60-70% by developing generic ADF pipelines.
- Set up all database objects needed for logging pipeline run information.
- Creation of ADF linked services, datasets and pipelines to read data from SAP tables using the SAP Table linked service and load it into Azure Data Lake Storage Gen2.
- Creation of the various data sources, linked services, pipelines, global variables, triggers and other ADF objects required for pipeline development.
- Creation of global, linked service, data source and pipeline parameters for reusability.
- Created ADF pipelines using various activities like Copy Data, Web, Lookup, ForEach, Stored Procedure, Execute Pipeline, etc.
- Used various data flow transformations such as select, filter, join, derived column, exists, sequence, etc.
- Created an ADF self-hosted integration runtime to read data from on-premises source systems like SAP.
- Debugging ADF pipelines using Data Flow debug clusters to verify the data and transformation results.
- Creation of generic SCD Type 2 pipelines for loading data into historized tables (see the sketch after this list).
- Creation of documentation of various processes, data models, data flow diagrams and the ETL architecture on Confluence.
- Configuration of Git repositories for the various environments and releases.
- Creation of an Azure Key Vault resource for password encryption in data pipelines.
- Creation of Azure Pipelines to execute PySpark notebooks from the Azure Databricks workspace.
- Creation of PySpark notebooks in Azure Databricks to perform various transformations and loads.
- Creation of various Azure resource consumption reports for budget optimization.
- Creation of Azure Logic App workflows for email notification in case of data pipeline failures or fatal errors.
- Used ChatGPT to research performance optimization and data testing techniques.
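
A minimal sketch of what such a generic SCD Type 2 load can look like in a Databricks PySpark notebook with Delta tables; all table, column and path names below are illustrative placeholders, not the actual project objects.

from pyspark.sql import functions as F
from delta.tables import DeltaTable

# `spark` is provided by the Databricks notebook runtime.
updates = spark.read.parquet("/mnt/datalake/staging/customer")   # staged source extract
dim = DeltaTable.forName(spark, "core.dim_customer")

# Step 1: close the current version of every business key whose attributes changed.
(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.row_hash <> s.row_hash",
        set={"is_current": "false", "valid_to": "current_timestamp()"})
    .execute())

# Step 2: append a new current version for changed keys and for brand-new keys.
current_keys = spark.table("core.dim_customer").filter("is_current = true").select("customer_id")
new_versions = (updates.join(current_keys, "customer_id", "left_anti")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_timestamp())
    .withColumn("valid_to", F.lit(None).cast("timestamp")))
new_versions.write.format("delta").mode("append").saveAsTable("core.dim_customer")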

Database Tasks:

- Creation of various Azure SQL Server database objects such as schemas, tables, sequences, stored procedures, views, etc.
- Helped business analysts identify the various dimensions per report requirements and optimize the model.
- Created various master data and metadata tables, views & stored procedures for data enrichment and job-run logging information.

Documentation:

- Creation of documentation of various processes, data models, data flow diagrams, ETL Architecture, Data Pipelines, Database Objects on Confluence.

Team Activities:

- Participating in various SCRUM meetings for creating user stories, estimation, backlog grooming, retrospective, etc.

DevOps:

- Creation of code repositories in Azure DevOps and development of CI/CD release pipelines for deployment to the UAT & PROD environments.
- Creation of CI/CD release pipelines to automatically deploy application code objects from the Dev to the UAT & PRD DevOps repositories.
- Creation of Azure Key Vault and credentials, and their integration with ADF linked services and activities for retrieving secrets (a minimal sketch follows this list).
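
For illustration, a minimal sketch of reading a Key Vault secret from Python code (e.g. in a notebook or an Azure Function), assuming the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are hypothetical. Within ADF itself, secrets are referenced through the Key Vault linked service rather than through code.

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves a managed identity in Azure or a local az-cli login.
credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://my-keyvault.vault.azure.net", credential=credential)

# Fetch the secret at runtime instead of keeping it in configuration files.
sql_password = client.get_secret("sql-server-password").value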

09/2020 - 06/2023
Data Pipeline Development and Support
Uniper Energy (Energy, water and environment, >10,000 employees)

Contract Type: Contract
Role: Data Engineer
Project: Energy Data Lake
Project Technology Stack
Cloud Applications: Microsoft Azure
Source System: REST API, MS SQL Server, Snowflake, CSVs, XMLs
Target System: MS SQL Server, Snowflake
ETL Tool/Programming Language: Azure Data Factory, Talend Data Integration, Python
Other programming languages: Python, SQL, SnowSQL
Scheduling Tool: Azure Batch Service, Talend Management Console

10/2022 - 01/2023
Azure Data Factory Development, Azure Data Migration, Azure Synapse Analytics, Azure SQL Database

- Implement data pipelines using Azure Data Factory
- Migration from the old system to the new system
- Connect, process and store data from the source systems
- Data processing with the Azure SQL database
- Create SQL procedures that contain the data processing logic

03/2020 - 12/2020
Data Pipeline Development and Support
Credit Suisse (Banking and financial services, >10,000 employees)

Contract Type: Contract
Role: ETL Developer
Project: Trade & Transaction Regulatory Reporting (TCIS/TAPI)
MiFIR/EMIR transaction regulatory reporting to various NCAs.

Project Technology Stack
Source System: XML Files, Flat Files, Oracle
Target System: Oracle 19c, XML, CSV
ETL Tool: Informatica PowerCenter 10.2
Other programming languages: Python, Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Control-M


08/2019 - 11/2019
Data Pipeline Development and Support
Deutsche Börse (Banking and financial services, >10,000 employees)

Contract Type: Freelance
Role: ETL Developer
Project: Regulatory Reporting Hub (RRH)
MiFIR/EMIR transaction regulatory reporting to NCAs, e.g. BaFin, AMF, etc.

Project Technology Stack
Source System: XML Files, Flat Files, Oracle
Target System: Oracle, XML, CSV
ETL Tool: Informatica PowerCenter 10.2
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Control-M

Other tools: GitHub, Jira, SQL Developer, WinSCP

10/2017 - 12/2018
Data Pipeline Development and Support
Commerzbank (Banking and financial services, >10,000 employees)

Contract Type: Freelance
Role: ETL Developer
Project
Compliance (CMC & CAF) – Anti-Money Laundering & Regulatory Reporting - Frankfurt & Singapore
This was a data integration project which included providing transaction data from various banking applications like Murex Cash, Murex Equity, Murex Currency, Summit Frontoffice, Summit Backoffice, etc., to CMC Market Abuse (Singapore) and CAF (Germany) for compliance reporting.
 

Project Technology Stack
Source System: Flat Files, MS SQL Server
Target System: Oracle, Flat Files, Hadoop HDFS
ETL Tool: Informatica PowerCenter 10.1
Other programming languages: Oracle SQL & PLSQL, Python Scripting, Unix Shell Scripting, UC4 Scripting
Scheduling Tool: Automic UC4


06/2015 - 09/2017
ETL Development and Support
Capgemini GmbH / Templeton & Partners Ltd (Consumer goods and retail, >10,000 employees)

ETL Tech Lead
Aldi Sued
Aldi Sued, Mülheim an der Ruhr through Capgemini GmbH / Templeton & Partners Ltd
Contract Type: Freelance
Role: ETL Tech Lead
Project: Retail Enterprise Data Warehouse
 

Project Technology Stack
Source System: MS SQL Server, Flat Files, Oracle
Target System: Oracle Exadata
ETL Tool: Informatica PowerCenter 10.1
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Informatica Scheduler
Project Methodology: Scrum/Agile

Responsibilities
- Participate in scoping, source system data analysis, target system requirements, volume analysis and migration window determination.

- Design & Develop Informatica PowerCenter ETL SCD Type-1 and Type-2 mappings to load dimensions in Sales Data Warehouse.

- Developed Informatica mappings to load fact tables using various transformations like sorter, aggregator, joiner, lookup, update strategy, sequence generator, etc.
- Perform data cleansing tasks using expression transformation, etc.
- Contact point for problems in the Production environment and Defects Tracking with business. (3rd-Level-Support)
- Developed Informatica PowerCenter mappings to move data from stage to Core and Data Mart Layer.

- Implement various loads like Daily Loads, Weekly Loads and Monthly Loads.
- Developed PL/SQL packages, procedures and functions in accordance with business requirements, e.g. loading the time dimension (see the sketch after this list).
- Documented various POCs and ETL solutions in Confluence.
- Debugging and troubleshooting Sessions using the Informatica Debugger and Workflow Monitor.
- Responsible for finding various bottlenecks and performance tuning at various stages like source, mapping, transformation and session.
- Created Materialized Views and partitioning tables for performance reasons.
- Worked on various back end Procedures and Functions using PL/SQL.
- Designed tables, constraints, views, indexes, etc.
- Developed database objects including tables, Indexes, views, sequences, packages, triggers and procedures.

- Participated in Scrum Daily meetings, estimate subtasks of user stories, sprint analysis, etc.
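
To illustrate the time-dimension load mentioned above, a minimal Python sketch of generating its rows; the project implemented this in PL/SQL, and the column names here are purely illustrative.

import datetime

def date_dim_rows(start: datetime.date, end: datetime.date):
    """Yield one row per calendar day between start and end (inclusive)."""
    day = start
    while day <= end:
        yield {
            "date_key": int(day.strftime("%Y%m%d")),    # surrogate key, e.g. 20150101
            "calendar_date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),             # 1 = Monday ... 7 = Sunday
            "is_weekend": day.isoweekday() >= 6,
        }
        day += datetime.timedelta(days=1)

# Example: rows covering the project period, ready to be bulk-inserted into a DIM_TIME table.
rows = list(date_dim_rows(datetime.date(2015, 1, 1), datetime.date(2017, 12, 31)))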


01/2015 - 05/2015
ETL Development and Support
Informationsfabrik GmbH (Consumer goods and retail, >10,000 employees)

Senior ETL Consultant
HRS
HRS, Köln through Informationsfabrik GmbH

Contract Type: Freelance
Role: Senior ETL Consultant
Project: Hotel Enterprise Data Warehouse
 

Project Technology Stack
Source System: MS SQL Server, Flat Files, Oracle, XML
Target System: Sybase IQ
ETL Tool: Informatica PowerCenter
Other programming languages: Oracle SQL & PLSQL, T-SQL, Unix Shell Scripting
Scheduling Tool: Control-M
Project Methodology: Waterfall
Data Modeling: Dan Linstedt Data Vault Modeling

Responsibilities
- Use of Data Vault as Data Modeling approach for the Hotel Enterprise Data Warehouse.
- Define the ETL Architecture to load the Data Vault model and data mart.
- Analysed source data to identify candidates for Hub, Satellite and Link tables.
- Developed Informatica PowerCenter mappings, sessions and workflows to load Hub, Link & Satellite tables (a Hub-load sketch follows this list).
- Added Hub, Link & Satellite tables including business keys, Surrogate keys, descriptive satellite information to Data Vault Model.
- Implement various loads like Daily Loads, Weekly Loads.
- Perform various data cleansing tasks.
- Performed tests using sample test data in accordance with the client's data migration/integration needs.
- Contact point for problems in the Production environment and Defects Tracking with business.
- Developed Informatica PowerCenter mappings to move data from stage to core and data mart layer.
- Documented various input databases and data sources.
- Debugging and troubleshooting Sessions using the Informatica Debugger and Workflow Monitor.
- Developed UNIX shell scripts to perform various user requirements.
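
A minimal sketch of the Hub-load pattern referenced above: distinct business keys are taken from staging, hashed into a surrogate hub key, and inserted only if not already present. The project implemented this as Informatica PowerCenter mappings; the PySpark rendering and all table/column names here are purely illustrative.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

stage = spark.table("stage.booking")           # staged source data
hub = spark.table("raw_vault.hub_hotel")       # hub rows loaded so far

new_hub_rows = (
    stage
    .select(F.upper(F.trim(F.col("hotel_code"))).alias("hotel_code"))   # standardised business key
    .dropDuplicates()
    .withColumn("hub_hotel_hk", F.md5(F.col("hotel_code")))             # surrogate hash key
    .withColumn("load_dts", F.current_timestamp())
    .withColumn("record_source", F.lit("HRS.BOOKING"))
    .join(hub.select("hub_hotel_hk"), on="hub_hotel_hk", how="left_anti")  # keep only unseen keys
)

new_hub_rows.write.mode("append").saveAsTable("raw_vault.hub_hotel")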

01/2014 - 09/2014
ETL Development and Support
IBM Deutschland / Questax AG (Consumer goods and retail, >10,000 employees)

Senior ETL Consultant
KARSTADT
Karstadt, Essen through IBM Deutschland / Questax AG

Contract Type: Freelance
Role: Senior ETL Consultant

Project:
Karstadt information systems for measures and analytics (KARISMA)

The goal of this project was to create a centralized analytical and reporting system for Karstadt Warenhaus GmbH. The major part of the project was to replace the existing SAP BW reporting system and create a new enterprise data warehouse, with Informatica PowerCenter 9.5.1 for ETL and Cognos 10 for reporting. Informatica PowerExchange 9.5.1 with BCI (Business Content Integration) and data integration using ABAP methods were used to connect to the Karstadt SAP Retail system and read data from SAP standard and customized data sources. IBM Netezza 7 was used as the target system with Informatica PowerExchange for Netezza.
 

Project Technology Stack
Source System: SAP, IDOC, Flat Files, XML
Target System: IBM Netezza
ETL Tool: Informatica PowerCenter 9.5, Informatica Powerexchange 9.5
Other programming languages: Oracle SQL & PLSQL, Unix Shell Scripting
Scheduling Tool: Informatica Scheduler
Project Methodology: Waterfall

07/2011 - 12/2013
ETL Development and Support
CSC Deutschland GmbH / Datamatics Global Solutions GmbH & Hays AG (Banking and financial services, >10,000 employees)

Senior ETL Consultant
Deutsche Bank
Deutsche Bank, Frankfurt through CSC Deutschland GmbH / Datamatics Global Solutions GmbH & Hays AG
Job Type: Employee & Contract Type: Freelance

Role: Senior ETL Consultant
 

Responsibilities
Informatica PowerCenter ETL tool development & support for data migration and data integration projects.

Projects:
#1 Retail Banking - Postbank Savings Deposit Accounts Migration

This project involved the migration of savings account deposit data from mainframe IBM z/OS systems to the SAP Deposit Management application using PowerCenter 9.1 HF1 and PowerExchange 9.1. It involved reading data from flat files, mainframe data sets and Oracle, and writing data into flat files which were then uploaded into the SAP Deposit Management application. PowerExchange 9.1 was used for connecting to the mainframe and reading mainframe data sets, while Informatica PowerCenter 9.1 handled the extraction, transformation, and loading of data into the target systems. The project had only a single load, i.e. a one-time migration, covering data extracts, transformations and loads of 250 to 500 million records.

#2 Retail Banking - Postbank Savings Deposit Accounts Integration
This project involved the integration of savings deposit account data from SAP and mainframe systems into the Oracle enterprise data warehouse. ETL activities included loading this data into the Oracle enterprise data warehouse used for retail banking reporting at Deutsche Bank, Germany. Informatica PowerCenter 9.1 HF1 was used for the extraction, transformation, and loading of data into the target systems. The project had several loads, such as daily, weekly, monthly, quarterly and YTD loads, implemented using incremental loading (Change Data Capture) and Slowly Changing Dimension mappings. This project involved data extracts, transformations and loads of 30 to 50 million records.
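
Conceptually, the incremental (Change Data Capture style) part of such loads comes down to extracting only rows changed since the last successful run and advancing a watermark afterwards. A minimal Python sketch of that idea follows; the table and column names are hypothetical, and the project itself implemented this with Informatica PowerCenter CDC and SCD mappings.

import datetime

def build_incremental_query(last_load_ts: datetime.datetime) -> str:
    """Return the extraction SQL for rows changed since the previous successful load."""
    watermark = last_load_ts.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "SELECT account_id, balance, last_modified_ts "
        "FROM savings_deposits "
        f"WHERE last_modified_ts > TO_TIMESTAMP('{watermark}', 'YYYY-MM-DD HH24:MI:SS')"
    )

# After a successful load, the maximum last_modified_ts processed becomes the new watermark.
print(build_incremental_query(datetime.datetime(2013, 6, 30, 22, 0, 0)))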

#3 Retail Banking - Auto Deployment
Deutsche Bank started this project to save significant time in deploying all ETL components, e.g. Informatica PowerCenter workflows, Informatica PowerExchange data maps, parameter files, shell scripts, etc. It helped Deutsche Bank reduce the time spent by deployers across multiple environments, achieve error-free deployments, and hence reduce cost.

#4 Retail Banking - LDAP Integration


11/2010 - 06/2011
ETL Development and Support
Hitachi Consulting Pvt Ltd (Other, >10,000 employees)

Senior ETL Consultant
American Home Mortgage Servicing Inc
American Home Mortgage Servicing Inc, Texas through Hitachi Consulting Pvt Ltd, Pune
Job Type: Employee
Role: Senior ETL Consultant

Project:
Home Mortgage Enterprise Data Warehouse


10/2008 - 06/2010
Oracle/Unix/Java Production Support
Sigma Systems (Other, 1,000-5,000 employees)

Software Engineer
Sigma Systems
Sigma Systems, Pune
Job Type: Employee


Role: Software Engineer
Oracle, Unix, Java Development & Support

Certificates

Liquibase Certified Practitioner
2023
Databricks Lakehouse Fundamentals
2022
Azure Data Engineer Associate
2021
Azure Fundamentals
2021
WebUI Essentials
2020

Willingness to travel

Available worldwide: Yes