Job Description
- Platform Architecture & Infrastructure Leadership
  - Lead the design, evolution, and operation of Eagle Eye’s global Kubernetes-based platform across multiple private data centers.
  - Guide architectural decisions around cluster topology, networking, storage systems, compute utilization, and platform scalability.
  - Drive improvements to core platform components including ingress, service mesh, container runtimes, secret management, and automation frameworks.
- Team Leadership & Mentorship
  - Manage, mentor, and grow a team of Platform Engineers, fostering a high-performance, collaborative, and ownership-driven culture.
  - Provide technical leadership through design reviews, roadmap planning, and hands-on guidance.
  - Develop clear engineering expectations, career paths, and a culture of continuous improvement.
- Developer Platform & Tooling
  - Own the internal developer experience, including CI/CD frameworks, deployment tooling, Helm charts, automation pipelines, and environment standardization.
  - Partner with application teams to simplify onboarding, improve reliability, and accelerate feature delivery.
- Observability Platform Ownership
  - Oversee the design and reliability of metrics, logging, and tracing infrastructure used across all EEN services.
  - Ensure observability systems (Prometheus, Grafana, VictoriaMetrics, OpenSearch/VictoriaLogs, etc.) are scalable, well-architected, and aligned to engineering needs.
- Data Systems & Storage Integration
  - Provide technical leadership around the integration, performance, and reliability of large-scale distributed data systems (e.g., durable storage clusters, messaging systems, search/indexing systems, and stateful workloads).
  - Oversee platform support for distributed storage technologies including Ceph and other high-throughput, high-availability storage systems.
  - Ensure the platform and data layers interoperate efficiently across global regions.
- Strategic Planning & Execution
  - Translate company goals into a clear, actionable platform roadmap.
  - Own cross-functional initiatives that improve platform resilience, developer productivity, and operational efficiency.
  - Evaluate emerging technologies and guide strategic bets that move the platform forward.
Key Responsibilities
- Experience with large-scale distributed storage (e.g., Ceph) or data platforms (search, messaging, or distributed databases).
- Background with service meshes, advanced networking, or multi-cluster architectures.
- Knowledge of security architecture for Kubernetes and containerized platforms.
- Experience in hybrid or fully private-cloud environments with globally distributed workloads.
- Proficiency in Go, Python, or a similar language commonly used for platform tooling.
Skills, Knowledge and Expertise
- 8+ years of experience in Platform Engineering, Infrastructure Engineering, or a related field, with at least 3 years in a leadership or managerial role.
- Strong hands-on experience with Kubernetes in production, including cluster design, scaling, observability, and platform integrations.
- Deep understanding of distributed systems fundamentals: networking, storage, scheduling, high availability, and stateful workloads.
- Experience leading and mentoring engineers, with proven ability to deliver large-scale platform initiatives.
- Expertise with infrastructure automation (Helm, Terraform, Ansible, GitOps tools, etc.).
- Strong technical communication skills and the ability to influence architecture across teams.
- Experience designing or operating observability platforms.