DevOps and Site Reliability Engineer
Join Lyzr as a DevOps & SRE engineer to design reliable multi-cloud infra, automate pipelines with Terraform & Python, and ensure production resilience.
Experience: 2–5 Years
Location: Onsite / Hybrid
About Lyzr
At Lyzr, we aren't just building infrastructure; we're architecting the backbone of the GenAI revolution. We are looking for a Cloud Reliability & DevOps Architect who thrives at the intersection of automation and operational excellence.
Role Overview
We are looking for a high-agility Cloud Reliability & DevOps Architect to join our engineering team. This is a hybrid role designed for a professional who sits at the intersection of infrastructure automation (DevOps) and operational excellence (SRE).
You will be responsible for architecting resilient multi-cloud environments, automating complex delivery pipelines, and ensuring the absolute reliability and cost-efficiency of our production systems. From writing modular Terraform code to leading deep-dive Root Cause Analysis (RCA), you will own the entire lifecycle of our infrastructure.
Key Responsibilities
1. IaC & Automation Architecture
Advanced Development: Architect and maintain complex infrastructure using Terraform (multi-cloud) and AWS CloudFormation.
Modular Design: Create reusable, version-controlled modules to standardize deployments and eliminate code duplication.
Eliminate Toil: Apply SRE principles to automate repetitive operational tasks and manual provisioning through Python, Bash, or Go.
2. Multi-Cloud Operations & Connectivity
Core Management: Optimize production environments across AWS (EC2, EKS, Lambda, VPC) and Azure (VMs, VNet, Functions).
Cross-Cloud Networking: Design secure connectivity solutions between disparate cloud providers and on-premises systems.
3. System Reliability & Observability
End-to-End Ownership: Own the health of production systems, ensuring High Availability (HA) and meeting strict SLOs tracked through well-defined SLIs.
Incident Management: Lead the RCA process for outages and implement architectural changes to prevent recurrence.
Observability Frameworks: Build and maintain comprehensive monitoring and alerting (Prometheus, Grafana, ELK Stack, CloudWatch) for early anomaly detection.
4. Security, Compliance & FinOps
Security by Design: Build infrastructure with strict IAM roles, secrets management (HashiCorp Vault/KMS), and automated compliance checks (SOC 2/ISO).
Cost Optimization: Actively drive FinOps initiatives: rightsizing instances, managing Reserved/Spot instances, and identifying idle resources to reduce waste.
Disaster Recovery: Design and lead periodic DR failover drills to ensure business continuity.
5. CI/CD & Performance Tuning
Pipeline Ownership: Design end-to-end CI/CD pipelines (GitHub Actions, GitLab CI, or Jenkins) for seamless delivery.
Self-Healing Systems: Implement auto-remediation workflows to resolve common system issues without human intervention.
Technical Qualifications
Must-Have Skills:
Experience: 2–5 years in SRE, DevOps, or Cloud Engineering roles.
Cloud Mastery: Hands-on experience managing production workloads in AWS (Expert level) and Azure.
IaC Proficiency: Expert-level knowledge of Terraform (State management, Modules) and CloudFormation.
Scripting: Strong automation skills in Python and Bash.
Monitoring: Hands-on experience with Grafana, Prometheus, or Datadog.
Preferred Qualifications:
Containers: Experience with Kubernetes (EKS/AKS) and orchestration.
Certifications: HashiCorp Certified: Terraform Associate or AWS/Azure DevOps Professional.
Data: Understanding of database administration (PostgreSQL, MySQL, or DynamoDB).
Work Environment & Soft Skills
Global Flexibility: We support clients across IST, GMT, and EST. You must be flexible with working hours for deployments and on-call rotations.
Detective Mindset: You are relentless in debugging and won't stop until you find the root cause of a distributed system issue.
Financial Awareness: You treat cloud resources as real money and take pride in running a lean, efficient infrastructure.
Tech Agility: You are not married to one tool; you use the best tool for the job and pivot as technology evolves.
Department: Applied AI
Location: Bengaluru