ComputerJobs: Where IT people and jobs click

Key Privacy Information

When you apply for a job, ComputerJobs will collect the information you provide in the application and disclose it to the advertiser of the job.

If the advertiser wishes to contact you, they have agreed to use your information in accordance with data protection law.

ComputerJobs will keep a copy of the application for 90 days.


Job Details

Site Reliability Engineer - SRE (Permanent)

Location: Hampshire, England
Country: UK
Rate: £65k - £68k per year + bonus + benefits

One of our biggest customers, based in the Financial Services sector, is looking for an experienced Site Reliability Engineer (SRE) to join a newly created team.

Site Reliability Engineer:

We have an exciting new opportunity to join a dynamic IT team as a Site Reliability Engineer. We are looking for an expert in this field with extensive experience managing APM tools such as Dynatrace and at least 3 years of demonstrable experience as a Site Reliability Engineer.

The Site Reliability Engineer (SRE) will take ownership of the observability suite, using deep DevOps skills and experience to proactively improve the performance and stability of APIs and applications. This role will play a crucial part in ensuring reliability and scalability, including managing APM tools such as Dynatrace or New Relic.

Main responsibilities as a Site Reliability Engineer:

Take ownership of the observability suite, including monitoring, logging, and alerting tools, to ensure comprehensive visibility into system performance and health.
Configure and manage APM tools such as Dynatrace or New Relic, utilizing their capabilities to monitor application performance and troubleshoot issues effectively.
Utilize deep DevOps skills and experience to implement and maintain infrastructure as code (IaC) practices, automating deployment, scaling, and management processes.
Proactively measure and identify performance bottlenecks and reliability issues in APIs and applications and implement solutions to mitigate these issues.
Collaborate with development teams to optimize application performance, improve resource utilization, and enhance scalability.
Implement and maintain robust incident response and post-incident review processes to minimize downtime and prevent recurrence of issues.
Drive continuous improvement initiatives to enhance the reliability, scalability, and efficiency of infrastructure and services, anticipating customer needs.
Participate in on-call rotation and provide support for incident resolution and troubleshooting as needed.

Skills and experience you need as a Site Reliability Engineer:

Demonstrable experience (at least 3 years) as a Site Reliability Engineer or similar role, with a focus on maintaining high availability, reliability, and scalability of production systems.
Strong expertise in monitoring, logging, and alerting tools such as Prometheus, the ELK stack, Grafana, and Azure Monitor, with the ability to take ownership of the observability suite.
Experience managing APM tools such as Dynatrace or New Relic, utilizing their capabilities to monitor application performance effectively.
Deep understanding of DevOps principles and practices, including infrastructure as code (IaC) using Terraform, automated deployment, and configuration management and its associated tooling.
Experience with Node.js, Java, and JavaScript frameworks.
Experience with cloud technologies, preferably Azure, and proficiency in managing cloud-based infrastructure.
Proven ability to proactively identify and resolve performance bottlenecks and reliability issues in APIs and applications.
Strong collaboration and communication skills, with the ability to work effectively with cross-functional teams.
Experience with incident response and post-incident review processes, and a commitment to minimizing downtime and preventing recurrence of issues.
A proactive mindset with a focus on continuous improvement, constantly seeking opportunities to enhance the reliability, scalability, and efficiency of infrastructure and services.
Resilient work ethic and the ability to thrive in a fast-paced and dynamic environment, including participation in on-call rotation for incident response and troubleshooting.

Due to the volume of applications received, we are unable to respond to every applicant; only candidates considered suitable for interview will be contacted.

Proactive Appointments Limited operates as an employment agency and employment business and is an equal opportunities organisation.

We take our obligations to protect your personal data very seriously. Any information provided to us will be processed as detailed in our Privacy Notice, a copy of which can be found on our website.

Posted Date: 29 Apr 2024
Reference: JS10416JB
Employment Agency: Proactive Appointments
Contact: Resource19
