Data migration - Freelance - Remote

Berlin  ‐ Remote


Data Migration, Apache Hadoop, Open Source, Airflow, GitHub, Python, Data Governance, Apache Spark, Automation, Big Data, Continuous Integration, Information Engineering, ETL, DevOps, Hadoop Distributed File System, JSON, Scala, Extensible Markup Language, Parquet, Testing, Avro, Docker


If you are a Data Migration consultant looking for a new freelance position, then I have a great opportunity for you.

Location: Anywhere in Europe

Contract Duration: 12 months+

Language Requirements: Fluent in English

Job Title: Data Migration Engineer

Job Description:

We are seeking a skilled Data Migration Engineer to join our team. The primary responsibility of this role is to migrate data from our current Hadoop-based platform to a new on-premises data platform built with open-source tools. The ideal candidate will have strong experience in working with Spark, Scala, Airflow, Hadoop, Python, and GitHub.

  • Lead the migration of data from Hadoop to a new on-premises data platform using open-source tools.
  • Develop and implement ETL processes to ensure smooth data transition.
  • Design, develop, and schedule data processing jobs using Apache Airflow.
  • Write efficient and maintainable code in Scala and Python for data processing tasks.
  • Work closely with data engineers, data scientists, and other stakeholders to ensure data accuracy and integrity during the migration process.
  • Monitor and troubleshoot data migration processes to ensure high performance and reliability.
  • Document the migration process, including architecture, design, and implementation details.
  • Ensure compliance with data governance and security standards throughout the migration process.
  • Collaborate with the DevOps team to integrate CI/CD pipelines using GitHub for automated deployment and testing.
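To illustrate the kind of integrity checking this role involves, here is a minimal sketch of a post-migration validation step in Python. The function names and record layout are illustrative assumptions, not part of this project's actual codebase; in practice the same idea would typically be applied to Spark DataFrames on both platforms.

```python
# Hypothetical post-migration validation: compare row counts and an
# order-insensitive content checksum between source and target datasets.
import hashlib
import json


def record_checksum(rows):
    """Order-insensitive checksum over JSON-serializable records."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()


def validate_migration(source_rows, target_rows):
    """Report whether counts and contents match after a migration run."""
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "counts_match": len(source_rows) == len(target_rows),
        "checksums_match": record_checksum(source_rows)
        == record_checksum(target_rows),
    }


if __name__ == "__main__":
    src = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    # Same data in a different order still validates, since the
    # checksum is computed over sorted per-record digests.
    tgt = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
    print(validate_migration(src, tgt))
```

Sorting the per-record digests before hashing makes the check independent of row ordering, which is useful when the target platform does not preserve the source's write order.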

Requirements:

  • Proven experience in data migration projects, particularly from Hadoop to other platforms.
  • Strong expertise in Apache Spark and Scala for big data processing.
  • Proficiency in Python for scripting and data manipulation tasks.
  • Experience with Apache Airflow for workflow automation and scheduling.
  • Hands-on experience with Hadoop ecosystem tools and technologies.
  • Familiarity with various data storage solutions, including HDFS and object storage systems.
  • Solid understanding of data formats such as AVRO, Parquet, JSON, and XML.
  • Experience with CI/CD tools and practices, particularly with GitHub.
  • Excellent problem-solving skills and the ability to work independently and as part of a team.
  • Strong communication skills to effectively collaborate with technical and non-technical stakeholders.

Preferred Qualifications:
  • Experience with additional open-source data processing tools and frameworks.
  • Knowledge of data governance and security best practices.
  • Prior experience in on-premises data platform development and management.
  • Familiarity with containerization and orchestration tools such as Docker and Kubernetes.

Join our team and play a critical role in transforming our data infrastructure to leverage cutting-edge open-source technologies. Apply today to be part of an exciting and challenging project!

Darwin Recruitment is acting as an Employment Business in relation to this vacancy.