Description
Are you an SRE looking to take on an innovative AI integration project?
Our client, a leading household appliance company leveraging AI technologies, is seeking a freelance Site Reliability Engineer for a 12-month engagement. In this role, you will play a key part in building scalable, resilient systems to support a major AI-driven cloud integration initiative.
Project Scope:
Enhance and manage scalable cloud infrastructure to support AI workloads and data pipelines
Improve reliability and observability across distributed systems using monitoring, alerting, and logging best practices
Build and optimize CI/CD pipelines to enable consistent, automated deployments
Drive infrastructure-as-code practices using Terraform to manage cloud environments
Collaborate with software engineering and platform teams to ensure system uptime, performance, and incident response readiness
Champion SRE principles, including error budgets, SLAs/SLIs, and automated remediation
What You Bring:
Strong expertise in site reliability engineering or production-grade DevOps
Proficiency with CI/CD tools and pipeline design
Solid experience in container orchestration with Kubernetes in high-availability setups
Hands-on knowledge of cloud platforms and infrastructure automation with Terraform
Experience with monitoring/observability stacks
Understanding of distributed systems and strategies for resilience and fault tolerance
Excellent communication skills in English and German