Site Reliability Engineering (SRE) Manager - Python, Go, GCP (Google Cloud), Ansible

Job title : Site Reliability Engineering (SRE) Manager - Python, Go, GCP (Google Cloud), Ansible
Location : London
Job type : Permanent
Salary : £0 - £95,000 per annum
Sector : Digital & Media
Reference : BHJOB6067_BH-17687

Our client is a leading media/broadcasting organisation. Due to a massive increase in exciting customer-focused technologies, they are now searching for two Site Reliability Engineering (SRE) Managers.

You will be leading teams to develop and support technologies across the business, globally. No matter the device, the time, or the place, you will make sure the business can engage large, diverse audiences with premium entertainment.

You will be responsible for defining and managing the objectives of a team to support delivery of software, balanced with the operational readiness, resiliency, and quality standards of the OTT Reliability Engineering team.

Responsibilities include: 
  • Collaborating with development, architecture and other teams to provide a path to production that supports development and reliability objectives.
  • Implementing infrastructure as code, monitoring as code, everything as code.
  • Engaging with teams to improve resilience, conducting formal operational readiness reviews of proposed software designs and controls, and developing test plans.
  • Performing incident analysis, providing recommendations and driving continuous improvement of the systems within your remit.
  • Advising engineers and engineering managers in the development of safer and more defensible software.
  • Developing and improving the capabilities of your team, allowing them to deliver better solutions and become experts in their chosen areas of focus.

What you'll bring:
  • Demonstrable breadth of experience in technology architecture, design, and development.
  • Strong background in system administration/architecture in the cloud (GCP).
  • Strong background in configuration and management of large-scale platforms (Terraform, virtualization, cloud, Unix, Java, Puppet, NoSQL databases, Kubernetes, Docker).
  • Strong background in monitoring and logging of large-scale platforms (Nagios, Prometheus, Splunk, Icinga, etc.).
  • Proven experience of implementing change to enforce high availability on large-scale platforms (e.g. circuit breakers, fail fast/fail silent/stubbed fallback, etc.).
  • Strong scripting knowledge (Ansible, JavaScript, Shell, C) with some of the following protocols: Web/HTTP, Java, Java-Web, Web Services.
  • Understanding of Agile and deep understanding of DevOps practices, e.g. Continuous Delivery.

A fantastic benefits package is on offer, coupled with exposure to the latest technology and the ability to fast track your career.

Cornwallis Elt is an Employment Agency & Employment Business and has been listed 3 times in The Sunday Times Virgin Fast Track 100 of the UK's fastest growing private companies, as well as in the Recruitment International Top 250, Top 50 in IT and the Recruiter Fast 50 & Hot 100 reports.