

    Principal Site Reliability Engineer


    Full Time


    8-10 Years Exp

    Job Description

    The Principal Site Reliability Engineer plays a vital role in our Site Reliability Engineering team. As the technical leader at the Center for Operational Excellence, you will guide our technology strategies and set standards for engineering practices. Drawing on extensive expertise in software development, platform engineering, and systems engineering, you will lead complex, mission-critical projects and drive innovation across our tech stack. Your role is pivotal in strategic decision-making: you will influence both business and technology outcomes and mentor future leaders within our teams.

    In this position, you are expected to demonstrate exceptional mastery of software, systems, and platform engineering. You will ensure that our digital platforms are not only resilient but also incorporate cutting-edge technologies and practices. Your influence will be crucial in driving the adoption of modern and emerging practices, shaping the technological future of our organization, and maintaining our leadership in innovative technology solutions.

    Roles and Responsibilities

      • Provide technical leadership to the Insights and Incident Response teams, promoting an AI-forward approach in operations and fostering innovation, collaboration, and continuous improvement.
      • Oversee the development and enhancement of the Insights team’s self-serve observability platform, and guide both the Insights and Incident Response teams towards proactive and predictive analysis that maintains high SLAs, preempts system issues, and leverages AIOps for greater operational efficiency.
      • Cultivate an in-depth understanding of the enterprise, cloud, and production infrastructure and of the associated products, applications, and services. Analyze and troubleshoot these complex distributed systems, advocate best practices based on observed design and incident patterns, and provide domain expertise for early design guidance and decision-making.
      • Lead, together with the Principal DevOps Engineer, the development of secure, robust, and high-performing infrastructure architectures, and work closely with the InfoSec and Enterprise Architecture teams to set and enforce standards across crucial infrastructure components such as Kubernetes cluster designs, serverless architectures, data planes, and local VPC networks.
      • Establish and enforce reliability standards and gated thresholds, including Non-Functional Requirements (NFRs), in partnership with Enterprise Architecture, so that all products, services, and applications, whether vendor-sourced, open-source, or internally developed, meet the required levels of observability, resilience, and reliability.
      • Lead the post-mortem process, ensuring that each incident receives thorough analysis and that the resulting action items are addressed by the identified owners. Act as the primary contact during incidents, guiding the response team to practical solutions and fostering a culture of proactive learning and system resilience.
      • Design and implement internal tools and software solutions to address gaps in observability and reliability, bridging existing capabilities with emerging needs and ensuring these solutions enhance system monitoring, incident response, and overall infrastructure resilience.

    Desired Candidate Profile

      Software Engineering:

      • Exceptional expertise in programming and scripting, with mastery of languages such as Java, C#, Python, Go, JavaScript, TypeScript, Bash, and PowerShell, and a record of applying these skills to advanced automation, process optimization, and innovative solution development.
      • Expert use of monitoring and incident response tools (Dynatrace, Datadog, Grafana, New Relic, PagerDuty, OpsGenie, Splunk OnCall), applying strategic approaches to incident response, system troubleshooting, and performance optimization.
      • Advanced skills in analyzing, evaluating, and integrating vendor solutions and open-source projects. Proficiency in creating custom, high-performance tools and solutions to bridge gaps in technology, striking a balance between innovative in-house development and external technological advances.
      • Deep expertise in Agile methodologies and DevOps practices, and mastery of Continuous Integration/Continuous Deployment (CI/CD) pipelines, coupled with sophisticated release management practices to ensure efficient, reliable software deployment.

      Systems Engineering:

      • Mastery of Infrastructure as Code, with extensive experience using tools like Terraform, CloudFormation, Azure Resource Manager, Google Deployment Manager, Ansible, and Pulumi for effective, scalable, and secure infrastructure management.
      • Comprehensive experience in managing systems across multiple and hybrid cloud environments, showcasing proficiency in optimizing for operational efficiency, latency, security, and compliance.
      • In-depth expertise in data storage solutions (SQL, NoSQL), queueing systems (Kafka, RabbitMQ), and transient data solutions (Redis, Memcached), ensuring high data integrity and optimal performance.
      • Extensive knowledge and expertise in network architecture, including mastery of VPCs, DNS, CDN, load balancing, and network security practices, essential for designing robust, scalable systems.

      Platform Engineering:

      • Demonstrated mastery in planning, executing, and optimizing complex system architectures, with deep expertise in microservices and serverless frameworks. Proven ability in handling scalability and efficiency challenges.
      • Profound expertise across major cloud platforms (AWS, Azure, Google Cloud Platform), designing, deploying, and optimizing sophisticated cloud-native solutions. Expertise in high availability, disaster recovery, and scalability in cloud environments.
      • Extensive hands-on experience with container technologies, particularly Kubernetes, demonstrating advanced capabilities in deploying, managing, and scaling applications. Proven ability to construct and maintain hyper-scalable, fault-tolerant infrastructures.
      • Leadership in technological innovation and cross-platform integrations, with the ability to foresee emerging technology trends and apply them in forward-thinking solutions, driving the adoption of cutting-edge technologies and methodologies to maintain and enhance the company's competitive edge in platform engineering.

      Preferred Qualifications

      • Experience in the lending technology sector or related financial services industries, demonstrating an understanding of industry-specific challenges and the ability to tailor technological solutions to meet these unique requirements.
      • Advanced certifications in critical areas like cloud technologies, AI, data management, networking, and cybersecurity, reflecting deep and broad technical expertise that enhances our technology strategies.
      • Proven experience in leading technical projects with cross-functional and multinational teams, showcasing strong leadership skills and the ability to drive successful outcomes in complex, collaborative environments.
      • A track record of innovative problem-solving, with examples of implementing cutting-edge solutions or pioneering new technological approaches.
      • Strong capabilities in communicating complex technical concepts to non-technical stakeholders, fostering effective collaboration across different departments.
      • A demonstrated commitment to continual professional development, staying abreast of the latest industry trends and technologies, and adaptability in applying this knowledge to evolving challenges.
      • Active involvement in open-source projects, technical forums, or professional communities, showcasing dedication to the broader tech community and a commitment to collaborative growth and learning.

    Location: Chennai

    Work mode: Hybrid