Amazon Lab126 is an inventive research and development company that designs and engineers high-profile consumer electronics. Lab126 began in 2004 as a subsidiary of Amazon.com, Inc., originally creating the best-selling Kindle family of products. Since then, Lab126 has produced devices like Fire tablets, Fire TV, Amazon Echo, and Dash Button. The Device OS team is a big part of creating these innovative devices at Lab126, providing the core OS, platform, features, and components.
In the Device OS group, we are inventing the future for consumer electronics and are looking for a System Development Engineer III to help us bring the vision into reality and solve real-world challenges that will transform our customers’ experiences in ways we can’t even imagine yet. The team develops scalable cloud solutions that enable our partners to build and launch services and devices quickly and cost-effectively. If you love hands-on work designing and implementing a quality platform for our consumer electronic devices while working with a world-class, highly accomplished team, we would love to talk with you.
This role is specifically for engineers who have extensive experience in DevOps/SRE roles and who also meet the SysDev guidelines.
As a System Development Engineer III, you will contribute technically to a complex charter of building and delivering T1 cloud services for Device OS. You will implement initiatives defined in the roadmap for a multi-year, business-critical cloud technology solution that will be rolled out across multiple device types and millions of devices. You will deliver technical initiatives that drive cost optimization across various AWS environments, manage the availability, latency, and performance of our mission-critical services, and build automation to prevent problem recurrence. You will periodically take part in reviews of capacity planning, sizing, and optimization for the cloud platform. You will work closely with platform and application teams to ensure the highest level of quality for the Device OS deliverables.
You will act as a technical leader, driving architecture decisions, improving system reliability, mentoring engineers, and partnering with product and development teams to deliver resilient solutions at scale.
Key job responsibilities
• Design, implement, and operate highly available, fault-tolerant systems on AWS
• Architect solutions using AWS services such as EC2, EKS, ECS, ALB/NLB, API Gateway, Lambda, RDS, DynamoDB, S3, CloudFront
• Lead design reviews and influence long-term platform and reliability strategy
• Define and manage SLIs, SLOs, SLAs, and error budgets
• Drive improvements in availability, latency, performance, and scalability
• Lead incident response, root cause analysis (RCA), and post-incident reviews
• Reduce toil through automation and operational best practices
Automation & Infrastructure as Code
• Build and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or CDK
• Automate deployments, scaling, and recovery using CI/CD pipelines
• Develop internal tools and scripts using Python, Java, Go, or Bash
• Implement robust monitoring, logging, and alerting using CloudWatch
• Improve capacity planning and reduce cost using AWS cost optimization techniques
• Ensure systems meet security, compliance, and operational standards
Leadership & Collaboration
• Act as a technical mentor for junior and mid-level engineers
• Collaborate with product, application, and security teams to deliver end-to-end solutions
• Influence engineering best practices and operational standards across teams
A day in the life
• Design and operate scalable, reliable AWS-based systems
• Develop automation and tooling using Python/Java/Go
• Manage infrastructure using Terraform/CloudFormation/CDK
• Monitor production systems and proactively resolve issues
• Participate in on-call, lead incident response, and drive RCAs
• Define and improve SLIs, SLOs, and error budgets
• Optimize system performance, availability, and AWS costs
• Collaborate with application teams and mentor engineers