Graphics Processing Unit (GPU) Engineer - Top Secret/SCI

Job Description

Location: Bethesda, MD

Category: Systems Engineer

Travel Required: No

Remote Type: Onsite

Clearance: Top Secret/SCI

Sunayu, LLC is looking for a highly skilled Systems Engineer with deep expertise in operating systems, hardware, GPUs, and high-speed networking. In this role, you will design, develop, and optimize GPU clusters that power enterprise AI for mission customers.

This is a 100% on-site position. All work must be performed at the customer site in Bethesda at the Intelligence Community Campus.

Primary Responsibilities

  • GPU Cluster Engineering: Design, configure, and maintain GPU Clusters. Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power efficiency, and feature requirements.
  • Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates.
  • Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
  • Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt.
  • Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards.

Basic Qualifications

  • Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree.
  • 10+ years of relevant systems engineering experience
  • Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s).
  • Knowledge of enterprise server components (storage/network controllers, HBA, SSDs).
  • Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle, and Rocky).
  • Excellent problem-solving skills and the ability to collaborate within a team.
  • Candidate must, at a minimum, meet DoD 8140/8570 IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification is also acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).

Clearance

  • Due to the nature of the government contracts we support, US Citizenship is required.
  • TS/SCI clearance with polygraph required, or TS/SCI clearance and a willingness to obtain a polygraph prior to starting.

Qualifications

  • Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow).
  • Familiarity with GPU virtualization and cloud computing.
  • Experience with Prometheus/Grafana for monitoring.
  • Knowledge of distributed resource scheduling systems (Slurm, LSF, etc.).