About The Job

Most infrastructure roles are about keeping the lights on.

This one is about owning the power grid.

We’re building systems that need to be reliable, secure, fast, and cost-efficient at scale—and we’re looking for an SRE who treats production like a product and cloud spend like real money.

If broken systems annoy you, flaky alerts offend you, and wasted AWS dollars physically hurt—you’ll fit right in.

What You’ll Own

End-to-end ownership of production systems
High availability, SLOs, and incident response
Root Cause Analysis (RCA) and permanent fixes
Disaster Recovery design, testing, and failover drills
Security & compliance (IAM, patching, SOC 2 / ISO / HIPAA)
Monitoring, logging, and alerting that actually work

What You’ll Build

Automated, self-healing infrastructure
IaC using Terraform / CloudFormation
Observability with Grafana, Prometheus, ELK, Datadog
Performance tuning across infra, apps, and databases
Cost-efficient AWS architectures (no waste)

Must-Have Skills

2–5 years in SRE / DevOps / Systems Engineering
Strong AWS expertise (EC2, EKS/ECS, RDS, S3, Lambda, VPC, Route 53)
Python or Bash for automation
Hands-on with monitoring & logging tools
Experience with AWS Cost Explorer / Trusted Advisor / CloudHealth

Nice to Have

Kubernetes / EKS
Hybrid cloud (AWS + On-Prem)
Databases: Postgres, MySQL, DynamoDB

How You Think

You fix root causes, not symptoms
You automate before scaling humans
You treat AWS spend as real money
You stay calm during incidents and ruthless after

Work Style

Collaborate across IST / GMT / EST time zones
On-call ownership when needed
Team-first, zero-ego execution

Site Reliability Engineer

If you want real ownership, real impact, and infrastructure that doesn’t page us at 3 AM — apply.