
GPU Infrastructure & AI Platform Engineer

Role Overview

We are seeking a hands-on engineer to deliver GPU infrastructure and AI/GenAI environments end-to-end within a lab or data center. The role covers hardware installation, platform setup, infrastructure optimization, and monitoring implementation, with the goal of a fully operational and validated environment.

Key Responsibilities

  • Install and rack-mount GPU servers, including cabling, firmware/OS baseline configuration, driver installation, and integration testing
  • Set up AI/GenAI environments using container runtimes (Docker/Kubernetes) and deploy inference tooling, delivering at least one validated use case
  • Perform rack modernization and infrastructure cleanup, including audit, optimized rack design, equipment reorganization, and structured power/data cable remediation
  • Implement monitoring solutions for GPU servers and lab infrastructure, including dashboards, alerts, agent deployment, and documentation handover

Required Skills

  • Experience with GPU servers and data center environments (rack, power, cabling)
  • Strong Linux administration and system configuration
  • Knowledge of GPU drivers, CUDA, and performance validation
  • Experience with Docker and/or Kubernetes
  • Familiarity with AI/GenAI inference tools (e.g., Triton, vLLM, Ollama, or similar)
  • Experience with monitoring tools (Prometheus/Grafana, Zabbix, or equivalent)

Experience

  • 5+ years in systems, infrastructure, or data center engineering
  • Proven experience delivering GPU or AI infrastructure deployments

Profile Summary

A hands-on infrastructure engineer capable of delivering GPU hardware, AI platform setup, rack optimization, and monitoring end-to-end.