SRE Manager

Digit Insurance
  • Posted On: 2025-12-11 20:16:51
  • Openings: 1
  • Applicants: 0
Job Description

The Manager - SRE plays a pivotal role in driving the success of multiple teams within the organization. This position involves leading, guiding, and empowering the teams responsible for Major Incident Management, Event Monitoring, Release Engineering, and Automation Testing.

The role is critical in ensuring the reliability, performance, and availability of software systems within the organization. This position involves overseeing a team of Site Reliability Engineers (SREs) and collaborating with cross-functional teams to maintain robust and efficient systems.

Responsibilities

1. Incident Management:

  • Lead the Major Incident Management team in handling critical incidents.
  • Establish and maintain incident response procedures.
  • Ensure timely communication, tracking, and resolution of major incidents.
  • Coordinate cross-functional efforts during major incidents.
  • Analyze equipment failure data, performance reports, and incidents to identify trends and areas for improvement.
  • Focus on root cause analysis and implement long-term solutions to prevent recurring issues.
  • Influence and improve the incident management lifecycle to identify, mitigate, and learn from reliability risks.
  • Embed into SRE projects and on-call rotations to stay close to operational workflows and address issues promptly.
  • Lead and manage a team of SREs responsible for monitoring, automating, and improving system reliability.
  • Foster a healthy work environment, promote collaboration, and ensure the team's professional development.
  • Compile key performance indicators (KPIs) and advocate for best practices related to performance and reliability.

2. Event Monitoring:

  • Oversee the Event Monitoring team’s activities.
  • Proactively identify and address potential incidents.
  • Ensure effective detection and response to critical events.
  • Automate processes to enhance system reliability and performance.
  • Build effective monitoring systems to proactively detect and address anomalies.

3. Release Engineering:

  • Provide leadership to the Release Engineering team.
  • Manage incidents related to software deployments, updates, and releases.
  • Collaborate with other teams to resolve deployment-related issues.

4. Automation Testing:

  • Lead the Automation Testing team.
  • Address incidents related to automated testing processes and tools.
  • Optimize testing workflows and ensure efficient resolution of issues.

Qualifications

  • Minimum 8 years of experience in team leadership or management roles.
  • Proficiency in incident management and crisis resolution.
  • Familiarity with ITIL and ITSM practices.
  • Technical knowledge in areas such as cloud platforms (AWS, Azure), networking, and infrastructure support.
  • Strong situational awareness and decisive decision-making skills.


More Info
  • Full Time
  • English
  • Education: Any Graduate
Required Skills
Major Incident Management, MIM, Change Management, Problem Management, Incident Management, Critical Incident Management, Release Management, SRE

Contact Details
Digit Insurance
+91 987654567
grievance@godigit.com
  • Experience: 6+ years
  • Salary: Above 10 lakhs annually
  • Location for Hiring: Bengaluru