Job Description
Join one of the nation’s leading retail automotive chains in a full-time, on-site role within our expanding enterprise IT environment. The NOC Analyst plays a key role in strengthening our monitoring capabilities by ensuring alerts are properly escalated and resolved in a timely manner. They will analyze current and historical data to identify recurring issues and drive proactive improvements. They will collaborate closely with IT teams to develop monitoring tests, enhance alerting systems, and ensure efficient escalation and resolution processes. This position is critical to maintaining stability and rapid incident response across our data centers, cloud services, enterprise applications, and networks that support business operations across the organization.
Essential Duties & Responsibilities
● Work closely with cross-functional teams to design and implement testing for network, server, and application health, including performance evaluation, trend analysis, process failure detection, and uptime verification.
● Perform initial triage and resolve or escalate incidents according to established procedures and SLAs.
● Monitor enterprise systems, servers, cloud workloads, networks, and business-critical applications for performance and availability issues.
● Coordinate with Network, Systems, Cloud, Application Support, Development, and Security teams to resolve incidents.
● Maintain accurate incident documentation and ensure proper communication throughout incident lifecycles.
● Conduct system health checks and validate stability following maintenance activities or changes.
● Identify recurring issues and provide recommendations for improvements to monitoring, alerting, or processes.
Qualifications
● 2–5 years of experience in a NOC, IT operations center, or enterprise systems observability role.
● Experience in a large corporate or distributed enterprise environment; retail industry experience preferred but not required.
● Proficiency in web application technologies, with experience in APIs, HTTP status/error codes, and end-to-end request tracing.
● Proficiency with enterprise monitoring and alerting tools (e.g., SolarWinds, Datadog, Splunk, Dynatrace, Zabbix).
● Understanding of networking fundamentals (TCP/IP, VPN, DNS, VLANs).
● Experience writing SQL queries.
● Experience supporting Windows/Linux servers, virtualization technologies, and cloud environments (AWS/Azure/GCP).
● Strong communication and analytical problem-solving skills.
● Ability to manage multiple incidents simultaneously and work effectively under time-sensitive conditions.
● Collaborative, detail-oriented, and able to work with cross-functional technical teams.