
GPU Infrastructure & AI Platform Engineer

Role Overview

We are seeking a hands-on engineer to deliver GPU infrastructure and AI/GenAI environments end-to-end within a lab or data center. The role covers hardware installation, platform setup, infrastructure optimization, and monitoring implementation, with the goal of a fully operational and validated environment.

Key Responsibilities

  • Install and rack-mount GPU servers, including cabling, firmware/OS baseline configuration, driver installation, and integration testing
  • Set up AI/GenAI environments using container runtimes (Docker/Kubernetes) and deploy inference tooling, delivering at least one validated use case
  • Perform rack modernization and infrastructure cleanup, including audit, optimized rack design, equipment reorganization, and structured power/data cable remediation
  • Implement monitoring solutions for GPU servers and lab infrastructure, including dashboards, alerts, agent deployment, and documentation handover

Required Skills

  • Experience with GPU servers and data center environments (rack, power, cabling)
  • Strong Linux administration and system configuration
  • Knowledge of GPU drivers, CUDA, and performance validation
  • Experience with Docker and/or Kubernetes
  • Familiarity with AI/GenAI inference tools (e.g., Triton, vLLM, Ollama, or similar)
  • Experience with monitoring tools (Prometheus/Grafana, Zabbix, or equivalent)

Experience

  • 5+ years in systems, infrastructure, or data center engineering
  • Proven experience delivering GPU or AI infrastructure deployments

Profile Summary

A hands-on infrastructure engineer capable of delivering GPU hardware, AI platform setup, rack optimization, and monitoring end-to-end.