Job Description

At Holiday Inn Club Vacations, we believe in strengthening families. And we look for people who exhibit the courage, caring and creativity to help us become the most loved brand in family travel. We’re committed to growing our people, memberships, resorts and guest love. That’s why we need individuals who are passionate in life and bring those qualities to work every day. Do you instill confidence, trust and respect in those around you? Do you encourage success and build relationships? If so, we’re looking for you.

The Manager, Site Reliability Engineering (SRE) is a hands‑on technical leader responsible for the health, performance, and reliability of all infrastructure systems across our hybrid environment. This includes on‑premises data centers, private cloud, and public cloud platforms. The SRE Manager leads a distributed team of five administrators and engineers, driving the adoption of automation, observability, and continuous improvement to support business agility and scalability. This role excludes enterprise networking, which is managed by a separate team.

KEY RESPONSIBILITIES

· Lead day‑to‑day infrastructure operations across data centers and cloud platforms (Azure, VMware, Hyper‑V, etc.).

· Drive adoption of site reliability engineering principles including automation, self‑healing systems, and infrastructure as code.

· Implement and manage modern monitoring and observability tools (e.g., Datadog) to proactively identify and address issues.

· Collaborate with security, architecture, and application teams to support cloud transformation initiatives.

· Own availability, capacity, and performance metrics for infrastructure platforms.

· Execute patching, upgrades, and operational maintenance with minimal downtime.

· Participate in disaster recovery planning and execution.

· Develop documentation, runbooks, and standard operating procedures for operational tasks.

· Coach and mentor a team of administrators and engineers, fostering a culture of ownership and accountability.

· Participate in a 24/7 on‑call rotation for infrastructure outages.

QUALIFICATIONS:

· 7+ years of experience in infrastructure engineering and operations.

· Deep hands‑on experience with Azure IaaS, VMware, Hyper‑V, and Windows/Linux servers.

· Working knowledge of automation and infrastructure‑as‑code (Terraform, PowerShell, etc.).

· Familiarity with CI/CD and DevOps tooling.

· Experience with backup/recovery, storage systems (Commvault, NetApp, Synology), and high‑availability design.

· Strong analytical and troubleshooting skills.

Certifications (preferred but not required):

· Microsoft Azure Administrator or Architect

· VMware VCP

· Linux+/RHCSA

WORK SCHEDULE/HOURS:

Hybrid work model: Monday – Thursday onsite, Friday remote. May include nights, weekends, and a 24/7 on‑call rotation during outages.

Manager - Site Reliability Engineering
