Site Reliability Engineer: Azure: Global Energy Co

London - On-site
This project is archived and no longer active.

Description

A world-leading energy company has an exciting opportunity for a Site Reliability Engineer to be responsible for keeping all user-facing services and other DSX production systems running smoothly. As a Site Reliability Engineer, you will apply sound engineering principles, operational discipline and mature automation to their environments.

About the role:
- Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents.
- Use your on-call shift to prevent incidents from happening.
- Run their infrastructure with Terraform and Kubernetes.
- Use monitoring and alerting to alert on symptoms, not outages.
- Document every action so that your findings turn into repeatable actions (playbooks) and then into automation.
- Improve the deployment process.
- Design, build and maintain core infrastructure pieces that allow DSX to scale to support hundreds and then thousands of concurrent users.
- Debug production issues across services and levels of the stack.
- Plan the growth of the DSX infrastructure.

About you:
- Think about systems, and particularly edge cases and failure modes.
- Know your way around Linux and the Unix Shell.
- Strong programming skills, preferably in Node.js, but it could be Python, Go, .NET or even Ruby.
- An urge to collaborate and communicate asynchronously.
- Document all the things so you don't need to learn the same thing twice.
- An enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
- Delivering quickly and iterating fast.
- Experience with Nginx, Docker, Kubernetes, Terraform, or similar technologies.
- Good experience with GitHub.

Projects you could work on:
- Coding infrastructure automation with GitHub Actions and Terraform.
- Improving Prometheus monitoring or building new metrics.
- Helping to deploy new versions of DSX.
- Helping to plan, prepare for, and execute the migration of DSX from virtual machines running on Azure to cloud-native container-based deployments with Kubernetes using Azure Kubernetes Service.

Detailed Description

Technical:
General knowledge of 4 of the following areas of technical expertise, with deep knowledge in 1 area:
- Implement "Infrastructure as Code" using Terraform and GitHub CI/CD for automation.
- Load balancing of the application including Proxies and CDN.
- Kubernetes and containerising our system.
- Administering a high-availability MSSQL cluster.
- Monitoring and Metrics in Prometheus and Grafana, and their integrations with Slack/PagerDuty.
- Logging infrastructure.
- Backend storage management and scaling.
- Disaster Recovery and High Availability strategy.
- Contributing to code for services and automation.

Execution:
1. Provide emergency response either by being on-call or by reacting to symptoms according to monitoring and escalation when needed.
2. Propose ideas and solutions within the infrastructure team to reduce the workload by automation.
3. Plan, design and execute solutions within the team to reach specific, agreed-upon goals.
4. Plan and execute configuration change operations both at the application and the infrastructure level.
5. Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.

You should combine this with a positive attitude and the ability to work within a large, globally dispersed project team in a multicultural environment. You also need to be a self-starter, a logical thinker and a quick learner, with strong initiative and excellent communication, interpersonal and presentation skills, and able to write clearly and concisely.

We believe in equality of opportunity for all job applicants regardless of gender, marital status, race, colour, nationality, ethnic origin, creed or religion, disability, sexual orientation or age.

Specialising within Energy Trading, Oil & Gas, Financial Markets and TV & Entertainment, Eaglecliff Recruitment is ISO accredited, a Member of REC and listed within the top 4% for financial stability by Dun & Bradstreet. Please telephone for an immediate response or email your CV for a reply within one hour. Eaglecliff Ltd is acting in the capacity of an employment agency for permanent recruitment and an employment business for contractor resourcing.
Start: 2021-12-06
Duration: 12 months initially
From: Eaglecliff Recruitment
Posted: 2021-11-19
Project ID: 2254413
Contract type: Freelance