Description
Skillset:
- Very good understanding of the Apache Hadoop stack: HDFS, Oozie, Hue, Spark, Hive
- Experience with Cloudera Impala
- Good skills in Python or another non-Java language supported by AWS Lambda
- Knowledge of AWS Lambda preferred
- Understanding of Apache Tez and experience with Datameer are a plus
- Knowledge of the ELK (Elastic) Stack is a plus
Responsibilities:
- Assessment of data sources: are existing ones sufficient to satisfy a particular demand, or are new sources required?
- Technical integration discussions with the owners of new data sources
- Development of data pipelines (cleansing, joining, transformation, aggregation)
- Development of data pipeline monitoring (did all jobs run successfully? If not, trigger an alarm)
- Documentation of pipelines
- Investigation of data mismatches
- Creation, deployment, evolution and monitoring of ingestion scripts (AWS Lambda)