Description
MLOps SRE - Fully Remote - 12+ month contract (4+ year project)
We are seeking an experienced MLOps SRE to work on a long-term, critical project for one of our leading global clients.
Key Responsibilities and Experience:
MLOps SRE/MLRE (Site Reliability Engineer/ML Reliability Engineer) - CNO (Cloud-native Operations) for MLOps, based on CNO for DevOps, with knowledge of and hands-on experience with:
Keeping the infrastructure healthy and ensuring the reliability of infrastructure, apps, services, and databases
Ensuring availability of inference services to products
Solving operational infrastructure-related issues (including Kubernetes-related issues and issues related to the selected frameworks, such as Pachyderm and Kubeflow)
Linux, including standard shell scripting skills
GCP/Cloud knowledge
Systems and software architecture concepts
Experience with Kubernetes and AppDynamics would be a bonus
For successful applicants, interviews are guaranteed within 24 hours, with same-day decisions. I have an excellent relationship with my client and can guarantee a response within the hour if you apply today.
Email: (see below)