Job Description

Director of Platform Engineering & Operations

Location: Charlotte, NC (Onsite)
Reporting to: Chief Technology Officer (CTO)

Software company based in South End, Charlotte. This is an in-office role.

Role Summary

The Director of Platform Engineering & Operations is responsible for COMPANY’s entire technology platform—overseeing both customer-facing systems and internal infrastructure to ensure 24x7 availability, security, and scalability across Azure cloud and on-premise environments. This hands-on leadership role balances technical execution and strategic management, including building a high-performing team, driving operational excellence, implementing security controls, and supporting the company’s rapid growth.

Key Responsibilities

Leadership & Strategy

· Build, mentor, and retain a team of 8 engineers across infrastructure/network, DevOps/SRE, and desktop/end-user support, providing technical coaching, career development, and performance management.

· Define and track operational KPIs (availability, MTTR, change success rate, incident volume, cloud cost efficiency) and present regular updates to the CTO and executive team.

· Take full ownership of platform strategy, roadmap, and execution aligned with business objectives, product needs, and customer SLAs.

· Establish operational cadence: incident reviews, change advisory board, service desk metrics, team retrospectives, and continuous improvement culture.

Platform Operations & Architecture

· Own the design, implementation, and 24x7 operation of COMPANY’s hybrid infrastructure (Azure + on-premise) supporting both production and internal corporate systems.

· Ensure high availability, scalability, performance, security, and cost efficiency across all environments.

· Provide hands-on architecture and implementation of cloud infrastructure, networking, identity management (Azure AD/Entra, RBAC), storage, backup, monitoring, and observability.

· Drive cloud optimization initiatives: rightsizing, reserved capacity, architectural improvements, and cost governance across Azure workloads.

· Define and enforce platform standards for networking, security, logging, alerting, and operational discipline.

DevOps & Site Reliability Engineering

· Lead DevOps and SRE transformation: implement CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep), containerization (Kubernetes), and modern deployment practices

· Provide hands-on implementation of Kubernetes clusters, container orchestration, service mesh, and cloud-native architecture patterns

· Establish SRE principles: error budgets, SLOs/SLIs, blameless postmortems, observability (metrics/logs/traces), and reliability engineering culture

· Build and optimize CI/CD tooling and workflows to improve release velocity, reduce deployment risk, and increase developer productivity

· Implement robust change management processes (risk assessment, testing, communication, rollback procedures) that balance speed, safety, and audit readiness

Information Security & Compliance

· Implement security and compliance controls, including access management, logging and monitoring, vulnerability management, incident response, and audit evidence collection.

· Establish security best practices across infrastructure: network segmentation, firewall rules, encryption (data at rest/in transit), secrets management, privileged access management.

· Lead incident response for infrastructure and platform issues, including root cause analysis, remediation, and process improvements.

· Own Disaster Recovery strategy and execution: define RPO/RTO targets, architect multi-region and hybrid DR solutions, develop runbooks, and conduct regular DR testing

· Ensure backup and restore capabilities across all critical systems with documented procedures and validated recovery processes

Desktop & End-User Support

· Oversee desktop, endpoint, and telecom services (laptops, mobile devices, productivity tools, collaboration platforms, voice/conferencing) to deliver reliable, secure employee experiences

· Implement IT service management practices (incident, request, problem, asset management) with clear SLAs and user satisfaction metrics

· Manage vendor relationships across infrastructure, telecom, SaaS, and managed services—evaluate contracts, optimize licensing, and ensure service quality

Required Qualifications

· 10+ years of progressive experience in IT infrastructure and operations, with at least 3–5 years in a leadership role managing teams delivering hybrid cloud environments.

· Deep expertise with Microsoft Azure, including compute (VMs, App Services, Functions), networking (VNets, NSGs, load balancers), identity (Azure AD/Entra, RBAC), security, monitoring, and cost management.

· Proven track record architecting and operating highly available, mission-critical systems supporting 24x7 customer-facing platforms at enterprise scale.

· Strong background in security and compliance, with experience implementing controls

· Demonstrated leadership of DevOps/SRE teams with hands-on experience building CI/CD pipelines, managing Kubernetes clusters, implementing Infrastructure as Code (Terraform, ARM/Bicep), and operating observability platforms

· Solid understanding and ownership of change management processes (ITIL or similar) including change advisory boards, risk assessment, and audit-ready documentation.

· Hands-on experience designing and executing Disaster Recovery strategies in cloud and data center environments, including DR testing and runbook development.

· Experience overseeing desktop/end-user support and telecom services in a growing, distributed organization.

· Proven ability to recruit, develop, and retain high-performing technical teams with a coaching-oriented leadership style

· Excellent communication and stakeholder management skills—ability to translate technical complexity into business impact for executive and non-technical audiences

· Thrives in fast-paced, dynamic environments with rapidly changing priorities and ambiguity

· Strong ownership mentality: you take accountability for outcomes, drive issues to resolution, and lead by example

Preferred Skills

· Experience in B2B SaaS, telematics, fleet management, IoT, or other real-time, data-intensive platforms serving enterprise customers

· Familiarity with ITSM tools (Jira Service Management, ServiceNow), configuration management databases (CMDB), and IT asset management practices

· Experience with observability and monitoring platforms (Datadog, New Relic, Prometheus/Grafana, Azure Monitor, Application Insights)

· Background supporting real-time GPS tracking, vehicle telematics, or IoT device management platforms

· Relevant certifications: Microsoft Certified: Azure Solutions Architect Expert, Azure Administrator Associate, CISSP, CISM, ITIL Foundation or higher

· Experience scaling infrastructure to support rapid business growth (2x–3x revenue in 2–3 years)

· Prior experience operating in regulated or compliance-driven environments (SOC 2, ISO 27001, HIPAA, FedRAMP)

· Hands-on experience with Azure Kubernetes Service (AKS), Azure DevOps, GitHub Actions, or similar CI/CD platforms

· Understanding of fleet management industry compliance requirements (FMCSA, ELD mandates, hours-of-service regulations)
