Job Description
Our mission at Tensorwave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.
About the role
We’re building and operating high-performance, large-scale AI workload data centers — and Kubernetes is at the heart of everything we do. As a Senior Kubernetes Platform Engineer, you’ll work closely with a team of experienced engineers to design, implement, and optimize secure, bare-metal Kubernetes infrastructure for both internal workloads and managed customer environments.
You will partner on architectural initiatives, drive innovation across ingress/egress solutions, and help harden our multi-tenant Kubernetes offerings. This role is ideal for deeply technical engineers who are passionate about performance, scale, and reliability in a fast-paced AI environment.
Responsibilities
- Design and deploy bare-metal Kubernetes clusters at scale using RKE2
- Collaborate with senior engineers on architectural improvements, infrastructure planning, and automation
- Lead the design and implementation of Ingress and Egress traffic solutions, leveraging HAProxy, Cilium, and other components
- Contribute to multi-tenant environment designs including VPC-level isolation, network policy enforcement, and secure shared services
- Drive continuous improvement around observability using Prometheus and related tooling
- Serve as a subject matter expert in core Linux, networking, and Kubernetes internals
- Collaborate cross-functionally with AI platform teams and internal/external customers
Required Experience
- 7+ years of experience in infrastructure engineering roles in CSP or hyperscaler environments
- 5+ years of hands-on experience managing Kubernetes in bare-metal environments
- Proven expertise in designing multi-tenant Kubernetes clusters with strong network isolation
- Deep understanding of Linux systems internals, networking (iptables, CNI plugins, BGP), and DNS
- Experience with ingress controllers, load balancing, and service mesh (e.g., HAProxy, Cilium, Envoy)
- Strong infrastructure-as-code mindset using tools like Helm, Terraform, or Ansible
- Experience monitoring Kubernetes workloads with Prometheus and related observability tools
Preferred Experience
- Familiarity with RKE2, Rancher, or other downstream Kubernetes distributions
- Exposure to AI/ML infrastructure workloads or GPU resource scheduling
- Experience in infrastructure compliance or secure multi-tenancy (e.g., PCI, SOC2)
What We Bring
- Mission-driven company
- Competitive Salary
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Flexible PTO
- Paid Holidays
- 401(k)
- Parental Leave
- Flexible Spending Account
- Short Term Disability Insurance
- Life and Voluntary Supplemental Insurance
- Mental Health Benefits through Spring Health
We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at Exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.
Tensorwave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or veteran status.