Description
You will:
Lead the implementation and continuous improvement of a data lake architecture, optimised for analytics and machine learning.
Apply extensive knowledge of AWS services such as S3, IAM, CloudWatch, EC2, Glue, Athena, Lambda and Kinesis.
Deliver data through various ingestion options such as batch, streaming, flat files, databases and APIs.
Get stuck into scripting for both data and file processing, e.g. SQL, Python, Shell, Perl.
Contribute within an Agile/DevOps methodology to CI/CD automation, version control (Git) and test-driven development.
Experience with data orchestration platforms, preferably Airflow.
Requirements:
Technically adept, with deep hands-on knowledge of data engineering patterns, practices and architectural principles.
Highly motivated, driven and adaptable.
Passionate about learning.
Experience providing mentorship for team members.
Leverage the right tools for the job to deliver testable, maintainable and modern data solutions.
Demonstrable experience in maintaining strong technical standards in a fast-paced environment.
Familiar with pair programming, with an interest in helping to grow the skills of your fellow engineers.
Technologies you will be familiar with:
Cloud fundamentals – preferably several of the following on AWS: S3, IAM, EC2, CloudWatch, EMR, Glue, DynamoDB, Athena, Lake Formation, Redshift, SageMaker, Lambda.
Plus some of the following: Teradata EDW (not essential); Airflow; PowerBI or similar visualisation tools; test-driven development (TDD); version control systems (preferably Git); experience of centralising data within a service-oriented architecture; Grafana or similar monitoring/alerting tools.