Data engineer with the ability to design, implement, and deliver maintainable, high-quality code in Python/PySpark.
• Designed and built data infrastructure, a data platform, and a Delta Lakehouse in Databricks.
• Developed and optimized Compass, an internal Adecco product that analyzes supply and demand in the job market.
• Migrated the legacy Azure Blob Storage file-system data lake to Databricks Delta Lake (see the migration sketch after this list).
• Loaded data from various sources into the Delta Lake, applying data-processing and aggregation expertise to produce the final Delta tables; loaded the final data into Cosmos DB (document model) to improve application performance (see the Cosmos DB write sketch after this list).
• Orchestrated PySpark scripts seamlessly through Databricks Workflows.
• Used Airflow to orchestrate the legacy scripts (a minimal DAG sketch follows this list).
• Built a data-normalization and company-name-matching algorithm to match internal and external company records (see the matching sketch below).
• Built data models in dbt, implemented singular and generic tests to validate data integrity, and automated the execution of these tests within CI/CD pipelines using GitHub Actions.
• Performed peer code reviews and managed the migration process.
• Conducted knowledge-sharing sessions with team members to foster collaboration and enhance overall team expertise.
• Implemented radius-search functionality in Compass, an internal data product, using GeoPandas, enhancing its ability to analyze and visualize location-based insights (see the radius-search sketch below).
• Designed and implemented performance-driven metrics, including CPA (Cost Per Application) and CPSH (Cost Per Successful Hire), to optimize marketing spend and enhance ROI through actionable insights (see the metrics sketch below).
• Developed scalable dimensional models and dashboards for multidimensional analysis of campaign performance, enabling efficient tracking across time, channel, and customer segments.
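
The Blob Storage to Delta Lake migration follows a read-then-write pattern; a minimal sketch, assuming hypothetical container, storage-account, and table names:

```python
# Minimal sketch of the Blob Storage -> Delta Lake migration pattern.
# Container, account, path, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

# Read a legacy dataset from the Blob-Storage-backed file system.
legacy_df = spark.read.parquet(
    "wasbs://legacy-container@storageaccount.blob.core.windows.net/jobs/"
)

# Write it out as a managed Delta table in the lakehouse.
(
    legacy_df.write.format("delta")
    .mode("overwrite")
    .saveAsTable("lakehouse.jobs")
)
```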
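A minimal sketch of loading a final Delta table into Cosmos DB via the Azure Cosmos DB Spark 3 OLTP connector; the endpoint, secret scope, database, and container names are hypothetical, and `spark`/`dbutils` are assumed to be provided by the Databricks runtime:

```python
# Minimal sketch: write a final Delta table into Cosmos DB.
# All account/database/container names below are placeholders.
final_df = spark.read.table("lakehouse.jobs_aggregated")

(
    final_df.write.format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", dbutils.secrets.get("scope", "cosmos-key"))
    .option("spark.cosmos.database", "compass")
    .option("spark.cosmos.container", "job_market")
    .mode("append")
    .save()
)
```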
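A minimal sketch of orchestrating legacy scripts with Airflow; the script paths, schedule, and task layout are illustrative, not the production DAG:

```python
# Minimal Airflow DAG chaining two legacy scripts (paths are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="legacy_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="python /opt/legacy/extract.py",
    )
    transform = BashOperator(
        task_id="transform",
        bash_command="python /opt/legacy/transform.py",
    )
    # Run the legacy scripts in order.
    extract >> transform
```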
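A minimal sketch of the normalization-plus-fuzzy-matching idea behind the company-name matcher, using only the standard library; the suffix list and scoring threshold are illustrative, not the production algorithm:

```python
# Minimal sketch: normalize company names, then score their similarity.
import re
from difflib import SequenceMatcher

# Illustrative list of legal-form suffixes to strip before matching.
LEGAL_SUFFIXES = r"\b(inc|llc|ltd|gmbh|sa|sas|bv|plc|corp)\b\.?"

def normalize(name: str) -> str:
    """Lowercase, strip legal suffixes and punctuation, collapse whitespace."""
    name = name.lower()
    name = re.sub(LEGAL_SUFFIXES, "", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return re.sub(r"\s+", " ", name).strip()

def match_score(internal: str, external: str) -> float:
    """Similarity of two normalized company names in [0, 1]."""
    return SequenceMatcher(None, normalize(internal), normalize(external)).ratio()

assert match_score("Adecco Group, Inc.", "adecco group") > 0.9
```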
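A minimal sketch of a GeoPandas radius search, projecting to a metric CRS so distances come out in meters; the coordinates, radius, and column names are illustrative, and `estimate_utm_crs` assumes GeoPandas 0.9 or newer:

```python
# Minimal sketch: keep every site within a fixed radius of a center point.
import geopandas as gpd
from shapely.geometry import Point

# Candidate locations (lon/lat, WGS84); values are illustrative.
gdf = gpd.GeoDataFrame(
    {"site": ["A", "B", "C"]},
    geometry=gpd.points_from_xy([2.35, 2.29, 4.85], [48.86, 48.88, 45.76]),
    crs="EPSG:4326",
)
center = gpd.GeoSeries([Point(2.35, 48.85)], crs="EPSG:4326")

# Project to a local UTM zone so distances are measured in meters.
utm = gdf.estimate_utm_crs()
gdf_m = gdf.to_crs(utm)
center_m = center.to_crs(utm).iloc[0]

# Keep every site within a 10 km radius of the center.
within_radius = gdf_m[gdf_m.distance(center_m) <= 10_000]
print(within_radius["site"].tolist())  # ['A', 'B']
```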
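The CPA and CPSH metrics reduce to spend divided by applications and by successful hires, aggregated over the dimensional grain; a minimal PySpark sketch, assuming a hypothetical fact table and column names and reusing the `spark` session from the earlier sketch:

```python
# Minimal sketch: compute CPA and CPSH per reporting dimension.
from pyspark.sql import functions as F

campaigns = spark.read.table("marketing.campaign_facts")  # hypothetical table

metrics = (
    campaigns.groupBy("report_month", "channel", "customer_segment")
    .agg(
        F.sum("spend").alias("spend"),
        F.sum("applications").alias("applications"),
        F.sum("successful_hires").alias("successful_hires"),
    )
    .withColumn("cpa", F.col("spend") / F.col("applications"))          # Cost Per Application
    .withColumn("cpsh", F.col("spend") / F.col("successful_hires"))     # Cost Per Successful Hire
)
```

The groupBy keys mirror the dimensional model's grain (time, channel, customer segment), so the same aggregate feeds the campaign-performance dashboards.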