Job Description
Director of Platform Engineering & Operations
Location: Charlotte, NC (Onsite)
Reporting to: Chief Technology Officer (CTO)
South End Charlotte, Software company.
In Office
Role Summary
The Director of Platform Engineering & Operations is responsible for COMPANY’s entire technology platform—overseeing both customer-facing systems and internal infrastructure to ensure 24x7 availability, security, and scalability across Azure cloud and on-premise environments. This hands-on leadership role balances technical execution and strategic management, including building a high-performing team, driving operational excellence, implementing security controls, and supporting the company’s rapid growth.
Key Responsibilities
Leadership & Strategy
· Build, mentor, and retain a team of 8 engineers across infrastructure/network, DevOps/SRE, and desktop/end-user support, providing technical coaching, career development, and performance management.
· Define and track operational KPIs (availability, MTTR, change success rate, incident volume, cloud cost efficiency) and present regular updates to the CTO and executive team.
· Take full ownership of platform strategy, roadmap, and execution aligned with business objectives, product needs, and customer SLAs.
· Establish operational cadence: incident reviews, change advisory board, service desk metrics, team retrospectives, and continuous improvement culture.
Platform Operations & Architecture
· Own the design, implementation, and 24x7 operation of COMPANY’s hybrid infrastructure (Azure + on-premise) supporting both production and internal corporate systems.
· Ensure high availability, scalability, performance, security, and cost efficiency across all environments.
· Hands-on architecture and implementation of cloud infrastructure, networking, identity management (Azure AD/Entra, RBAC), storage, backup, monitoring, and observability.
· Drive cloud optimization initiatives: rightsizing, reserved capacity, architectural improvements, and cost governance across Azure workloads.
· Define and enforce platform standards for networking, security, identity, logging, alerting, and operational discipline.
DevOps & Site Reliability Engineering
· Lead DevOps and SRE transformation: implement CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep), containerization (Kubernetes), and modern deployment practices
· Hands-on implementation of Kubernetes clusters, container orchestration, service mesh, and cloud-native architecture patterns
· Establish SRE principles: error budgets, SLOs/SLIs, blameless postmortems, observability (metrics/logs/traces), and reliability engineering culture
· Build and optimize CI/CD tooling and workflows to improve release velocity, reduce deployment risk, and increase developer productivity
· Implement robust change management processes (risk assessment, testing, communication, rollback procedures) that balance speed, safety, and audit readiness
Information Security & Compliance
· Implement security and compliance controls, including access management, logging and monitoring, vulnerability management, incident response, and audit evidence collection.
· Establish security best practices across infrastructure: network segmentation, firewall rules, encryption (data at rest/in transit), secrets management, privileged access management.
· Lead incident response for infrastructure and platform issues, including root cause analysis, remediation, and process improvements.
· Own Disaster Recovery strategy and execution: define RPO/RTO targets, architect multi-region and hybrid DR solutions, develop runbooks, and conduct regular DR testing
· Ensure backup and restore capabilities across all critical systems with documented procedures and validated recovery processes
Desktop & End-User Support
· Oversee desktop, endpoint, and telecom services (laptops, mobile devices, productivity tools, collaboration platforms, voice/conferencing) to deliver reliable, secure employee experiences
· Implement IT service management practices (incident, request, problem, asset management) with clear SLAs and user satisfaction metrics
· Manage vendor relationships across infrastructure, telecom, SaaS, and managed services—evaluate contracts, optimize licensing, and ensure service quality
Required Qualifications
· 10+ years of progressive experience in IT infrastructure and operations, with at least 3–5 years in a leadership role managing teams delivering hybrid cloud environments.
· Deep expertise with Microsoft Azure including compute (VMs, App Services, Functions), networking (VNets, NSGs, load balancers), identity (Azure AD/Entra, RBAC), security, monitoring, and cost management.
· Proven track record architecting and operating highly available, mission-critical systems supporting 24x7 customer-facing platforms at enterprise scale.
· Strong background in security and compliance, with experience implementing controls
· Demonstrated leadership of DevOps/SRE teams with hands-on experience building CI/CD pipelines, managing Kubernetes clusters, implementing Infrastructure as Code (Terraform, ARM/Bicep), and operating observability platforms
· Solid understanding and ownership of change management processes (ITIL or similar) including change advisory boards, risk assessment, and audit-ready documentation.
· Hands-on experience designing and executing Disaster Recovery strategies in cloud and data center environments, including DR testing and runbook development.
· Experience overseeing desktop/end-user support and telecom services in a growing, distributed organization.
· Proven ability to recruit, develop, and retain high-performing technical teams with a coaching-oriented leadership style
· Excellent communication and stakeholder management skills—ability to translate technical complexity into business impact for executive and non-technical audiences
· Thrives in fast-paced, dynamic environments with rapidly changing priorities and ambiguity
· Strong ownership mentality: you take accountability for outcomes, drive issues to resolution, and lead by example
Skills
· Experience in B2B SaaS, telematics, fleet management, IoT, or other real-time, data-intensive platforms serving enterprise customers
· Familiarity with ITSM tools (Jira Service Management, ServiceNow), configuration management databases (CMDB), and IT asset management practices
· Experience with observability and monitoring platforms (Datadog, New Relic, Prometheus/Grafana, Azure Monitor, Application Insights)
· Background supporting real-time GPS tracking, vehicle telematics, or IoT device management platforms
· Relevant certifications: Microsoft Certified: Azure Solutions Architect Expert, Azure Administrator Associate, CISSP, CISM, ITIL Foundation or higher
· Experience scaling infrastructure to support rapid business growth (2x–3x revenue in 2–3 years)
· Prior experience operating in regulated or compliance-driven environments (SOC 2, ISO 27001, HIPAA, FedRAMP)
· Hands-on experience with Azure Kubernetes Service (AKS), Azure DevOps, GitHub Actions, or similar CI/CD platforms
· Understanding of fleet management industry compliance requirements (FMCSA, ELD mandates, hours-of-service regulations)