Our first line of defense is a resilient system design that allows our software to compensate for many changing conditions and possible points of failure. We maintain an array of sensors, logs, and measurements that allow us to address many problems through normal operational procedures before a customer can see their effects.
When a customer issue can't be solved by technical support within Customer Care, or when our sensors detect a problem outside of normal operations, we declare an incident. Incidents are typically handled through a cooperative effort among engineering/systems development, network operations, and Customer Care personnel. We grade incidents by severity from 4 (mild) through 1 (severe). In general, the more severe the incident, the more people are involved in working on it.
In all incidents, the goals are to resolve the problem quickly, keep customers informed and happy, ensure the network is safe, and focus the work of those on the incident while minimizing the impact on the rest of the company.
We regard our incident process as one of the security measures on the Akamai system. So do our auditors.
Incidents normally start in phase one, which lasts until the immediate problem is controlled. In phase two, we work to return the system to normal operation. Often, customer communication is a focus in phase two. Phase three is when we learn from the incident and take longer-term measures for future safety.
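As an illustration only, the severity grades and three phases described above could be modeled along the following lines. This is a minimal sketch, not a description of any actual Akamai tooling: the names `Phase`, `Incident`, `advance`, and `is_more_severe_than` are all hypothetical, and the only facts taken from the text are the 4 (mild) through 1 (severe) scale and the order of the three phases.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Hypothetical labels for the three phases described in the text."""
    CONTROL = 1  # phase one: lasts until the immediate problem is controlled
    RESTORE = 2  # phase two: return the system to normal operation
    LEARN = 3    # phase three: learn and take longer-term measures


@dataclass
class Incident:
    summary: str
    severity: int  # 4 (mild) through 1 (severe)
    phase: Phase = Phase.CONTROL  # incidents normally start in phase one

    def __post_init__(self) -> None:
        if self.severity not in (1, 2, 3, 4):
            raise ValueError("severity must be 1 (severe) through 4 (mild)")

    def advance(self) -> Phase:
        """Move to the next phase; phase three is the last one."""
        if self.phase is Phase.LEARN:
            raise ValueError("incident is already in its final phase")
        self.phase = Phase(self.phase.value + 1)
        return self.phase

    def is_more_severe_than(self, other: "Incident") -> bool:
        """A lower grade number means a more severe incident."""
        return self.severity < other.severity
```

In this sketch, an incident created with `Incident("example outage", severity=2)` begins in `Phase.CONTROL`, and each call to `advance()` moves it one phase forward. Keeping the severity as a plain number mirrors the text, which names only the two ends of the scale.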
For all severity levels, we have an Incident Manager on hand to evaluate the severity of a situation and coordinate with others working on the problem. Many employees can receive incident management training and can volunteer as an Incident Manager when an issue arises.
In fact, most technical departments in the company have people who are trained to step in and manage an incident in cooperation with other departments. This cross-disciplinary Incident Manager coordinates a short-lived project team that forms when needed and then disbands. Participants temporarily put aside their primary duties to focus on the incident at hand.
The following is a breakdown of the roles employees take on to deal with a typical incident.