Outage on 11-30-2018
Posted Nov. 30, 2018 by: wilhelms
Initial symptoms indicated a route block through one of
our IP providers. We contacted our upstream provider to
drop the IP route for this provider. After detailed log
analysis, upstream provided additional information
indicating that the issue was with our BGP routes
flapping. A detailed on-site inspection revealed that a core
switch was showing both (redundant) 6000W power
supplies in a failure state.
Both were fully operational up until the outage. The
chances of both failing at the same time are
minuscule. Our network engineer was able to restore a
stable system on one of the power supplies. We suspect
a short in one was causing both to indicate a failed
status and provide only partial power to the
switch.
This partial power state was causing BGP routes to flap,
so the IP provider mentioned above was blocking
this unstable routing state. We are going to continue
to troubleshoot both power supplies and our spares
to get redundant power restored. We already had plans
in progress to replace this switch.