Job Description
This role supports upcoming deployments in production-grade NPI / Production Lab environments. It owns end-to-end execution for bringing up, expanding, and operating NPI
infrastructure in Ashburn and Santa Clara, partnering closely with hardware, software, operations,
and capacity planning teams to deliver reliable, production-parity lab environments.
NPI / Prod Lab Execution (Core)
- Execute onsite builds and expansions of production NPI environments (rack/stack readiness coordination, bring-up, validation, and handoff).
- Physical requirements: ability to lift 40–50 lbs, stand for extended periods, and work on ladders.
- Drive deployment plans, cutovers, and upgrades with clear documentation, checklists, method of procedure (MOP) writing, peer review, change approval processes, risk management, and rollback planning.
- Coordinate with cross-functional partners and vendors to manage dependencies and resolve blockers quickly.
Data Center Network (DCN) / Fabric Engineering (Core)
- Build, configure, and troubleshoot data center fabrics (e.g., spine-leaf / Clos), with a working understanding of server and switch architecture.
- Operate L2/L3 networks using protocols such as BGP/MP-BGP, EVPN/VXLAN, and an IGP (e.g., OSPF, where applicable).
- Perform migrations and hardware/software upgrades with minimal disruption.
Optical Switching / Transport, Copper/DAC Cabling & Physical Layer (Core)
- Configure, deploy, and troubleshoot optical switching platforms such as Calient and/or Telescent (experience with either is valuable).
- Work with multi-vendor optics/cabling; apply best practices for fiber handling, inspection, cleaning, and troubleshooting.
- Use optical diagnostic tools as needed (e.g., OTDR, power meters) and interpret transceiver diagnostics (DOM/DDM, BER, loss).
Test, Validation & Commissioning
- Execute lab and production acceptance testing for new deployments and upgrades.
- Use traffic/validation tooling (e.g., IXIA and/or Spirent TestCenter) to validate throughput, loss, latency, and failover behavior.
- Support regression and repeatability efforts by documenting test plans and results.
- Inventory and asset management: track hardware (switches, optics, patch panels, etc.) and manage RMA processes and inventory reconciliation.
Observability / Monitoring / Alert Triage
- Implement and use monitoring/telemetry to maintain reliability and speed up troubleshooting, using combinations of:
  - gNMI-based telemetry
  - IPFIX/NetFlow/sFlow, syslog, SNMP
  - Active probes (ICMP, RPM/IP SLA-style)
- Triage monitoring alerts during deployments and participate in break/fix activities, escalation paths, and incident triage during and after bring-up.
- Support deployment and cutover activities during maintenance windows, including evenings and weekends as required.
Required Qualifications
- Hands-on experience delivering and troubleshooting network infrastructure in lab and/or production environments.
- Minimum years of relevant experience in network engineering or infrastructure deployment.
- Experience with data center switching/routing concepts and operational troubleshooting.
- Familiarity with a subset of the following (depth can vary based on focus area):
  - DCN fabric concepts (spine-leaf/Clos), routing/switching fundamentals
  - BGP (and related routing concepts), EVPN/VXLAN, OSPF (where applicable)
  - Telemetry/monitoring and incident-style troubleshooting
- Strong cross-functional coordination skills and clear written/verbal communication.
- Willingness to work onsite in Ashburn and/or Santa Clara as required by build and deployment schedules.
- Hands-on experience with IPv6, DHCPv6, and SLAAC.
- Experience with different transceivers (SFP+, QSFP, OSFP, etc.) and cables (DAC, AEC, fiber, etc.).
- Knowledge of server architecture and components.
- Familiarity with internal network management and provisioning tools (training provided).
Qualifications (Nice to Have)
- Hands-on with Calient and/or Telescent optical switching platforms.
- Experience with IXIA / Spirent TestCenter or other traffic tools.
- Strong optical fundamentals: fiber standards, inspection/cleaning, OTDR/power testing, DOM/DDM, and BER analysis.
- Familiarity with technologies such as flow telemetry (IPFIX, sFlow) and Precision Time Protocol (PTP).
- Experience with gNMI telemetry and modern monitoring stacks.
- Experience with version control (e.g., Git), configuration management, or automation frameworks (helpful, but not required).
- Scripting/automation experience (Python/Go/Bash) for improving repeatability and reducing manual operational load.