Site Reliability Engineer
If you want real ownership, real impact, and infrastructure that doesn’t page us at 3 AM — apply.
About The Job
Most infrastructure roles are about keeping the lights on.
This one is about owning the power grid.
We’re building systems that need to be reliable, secure, fast, and cost-efficient at scale—and we’re looking for an SRE who treats production like a product and cloud spend like real money.
If broken systems annoy you, flaky alerts offend you, and wasted AWS dollars physically hurt—you’ll fit right in.
What You’ll Own
End-to-end ownership of production systems
High availability, SLOs, and incident response
Root Cause Analysis (RCA) and permanent fixes
Disaster Recovery design, testing, and failover drills
Security & compliance (IAM, patching, SOC2 / ISO / HIPAA)
Monitoring, logging, and alerting that actually work
What You’ll Build
Automated, self-healing infrastructure
IaC using Terraform / CloudFormation
Observability with Grafana, Prometheus, ELK, Datadog
Performance tuning across infra, apps, and databases
Cost-efficient AWS architectures (no waste)
Must-Have Skills
2–5 years of experience in SRE / DevOps / Systems Engineering
Strong AWS expertise (EC2, EKS/ECS, RDS, S3, Lambda, VPC, Route53)
Python or Bash for automation
Hands-on with monitoring & logging tools
Experience with AWS Cost Explorer / Trusted Advisor / CloudHealth
Nice to Have
Kubernetes / EKS
Hybrid cloud (AWS + On-Prem)
Databases: Postgres, MySQL, DynamoDB
How You Think
You fix root causes, not symptoms
You automate before scaling humans
You treat AWS spend as real money
You stay calm during incidents and ruthless after
Work Style
Work across IST / GMT / EST
On-call ownership when needed
Team-first, zero-ego execution
- Department: Product
- Locations: Bengaluru
- Remote status: Hybrid