What follows is a glimpse of the procedures Akamai uses to enhance the security of software during the development process. All engineering teams follow this process, which helps minimize cases where instabilities may be introduced during the build process.
While Akamai uses multiple QA tests, traffic across the entire Internet sometimes exercises edge cases that apply only to a single geographic area, ISP, or customer.
First, some background: Akamai engineers develop software that is then installed on the hundreds of thousands of servers that make up the Akamai distributed networks. The Akamai software development process employs an operationally focused model similar to that practiced by other large Software as a Service (SaaS) providers. It is an adaptive model, with an emphasis on teamwork, division of roles, and risk management. In addition to software developed in-house, Akamai also makes extensive use of software written by others, including free/libre and open-source software (FLOSS) and proprietary software from upstream vendors.
Akamai modifies these components for reasons such as improving scalability or troubleshooting, reporting telemetry and health checks to monitoring systems, or using only a particular piece of the software.
Akamai Engineering uses a revision control system (RCS) that restricts how much change is allowed and tracks source code submissions by sending notifications at check-in time. Check-out and check-in of source code are authenticated with public-key cryptography, which provides both access control and non-repudiation.
Changes are rolled out in stages, called “phases”, to minimize impact on Akamai services. Akamai installs the software over secure connections, with safeguards that check that each component is of the correct revision and has not been modified in transit.
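The revision and in-transit checks in the paragraph above can be illustrated with a content digest. The following is a minimal sketch, not Akamai's implementation: it assumes a hypothetical manifest that records each component's expected revision and SHA-256 digest, and it assumes the manifest itself arrives over a trusted, authenticated channel (a digest shipped alongside a tampered payload would prove nothing). `ManifestEntry`, `verify_component`, and the sample names are illustrative only.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest record for one component."""
    name: str
    revision: str  # revision the rollout expects to install
    sha256: str    # hex digest recorded before the component was shipped


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a payload."""
    return hashlib.sha256(data).hexdigest()


def verify_component(entry: ManifestEntry, revision: str, payload: bytes) -> bool:
    """Accept a received component only if it is the expected revision
    and its digest matches the one recorded in the manifest."""
    if revision != entry.revision:
        return False  # wrong revision: refuse to install
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sha256_hex(payload), entry.sha256)


if __name__ == "__main__":
    payload = b"component build r1234"
    entry = ManifestEntry("edge-proxy", "r1234", sha256_hex(payload))
    print(verify_component(entry, "r1234", payload))         # True
    print(verify_component(entry, "r1234", payload + b"!"))  # False: modified in transit
    print(verify_component(entry, "r1233", payload))         # False: wrong revision
```

A transport such as TLS already protects data in transit; a digest check like this one adds a second, independent confirmation at install time that the bytes on disk are the ones that were approved.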
Once each phase has proved to be running stably, the decision is made to deploy to the rest of the network. Akamai calls this the “world” phase. At every phase there is a documented process for rolling back the install. This matters because the software will not always stop sending traffic to newly installed regions on its own when there is a problem, although in many cases the impacted servers will self-suspend when a rollback is needed.
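The phase-by-phase progression and rollback path can be sketched as a loop that advances only once the current phase is stable. This is a hypothetical illustration, not Akamai's deployment system: the `install`, `is_stable`, and `rollback` callbacks stand in for real tooling, the phase names other than “world” are invented, and rolling back every completed phase (most recent first) on the first failure is an assumed policy. Akamai's process only specifies that a documented rollback exists at every phase.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    phases: Sequence[str],
    install: Callable[[str], None],
    is_stable: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Install phase by phase, advancing only while each phase is stable.

    On the first unstable phase, roll back every phase installed so far
    (most recent first) and stop. Returns the phases left installed.
    """
    installed: List[str] = []
    for phase in phases:
        install(phase)
        installed.append(phase)
        if not is_stable(phase):
            # Assumed rollback policy: undo in reverse order of installation.
            for done in reversed(installed):
                rollback(done)
            return []
    return installed


if __name__ == "__main__":
    log: List[str] = []
    result = staged_rollout(
        ["phase-1", "phase-2", "world"],
        install=lambda p: log.append(f"install {p}"),
        is_stable=lambda p: p != "phase-2",  # simulate a problem in phase-2
        rollback=lambda p: log.append(f"rollback {p}"),
    )
    print(result)  # []
    print(log)     # ['install phase-1', 'install phase-2', 'rollback phase-2', 'rollback phase-1']
```

Because the “world” phase is only reached after every earlier phase passes its stability check, a problem caught in an early phase never reaches the full network.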
When the rollout goes global, software installs are typically done in “groups” to minimize impact on Akamai services. To ensure continuity, Akamai does not install entire regions at once: if a region has 3 servers, for example, only 1 server would be installed at a time, so the other 2 keep serving traffic.
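The region-aware grouping can be sketched as a small planning function. This is an illustrative sketch, not Akamai's tooling: `plan_install_groups`, the region and server names, and the `max_per_region` cap are hypothetical, generalized from the 3-server example. It also assumes that every region with two or more servers should keep at least one server out of each group.

```python
from typing import Dict, List


def plan_install_groups(
    regions: Dict[str, List[str]], max_per_region: int = 1
) -> List[List[str]]:
    """Split each region's servers across ordered install groups.

    No group takes more than `max_per_region` servers from one region, and
    a region with two or more servers always keeps at least one server out
    of every group, so no multi-server region is installed all at once.
    """
    if max_per_region < 1:
        raise ValueError("max_per_region must be at least 1")
    groups: List[List[str]] = []
    for servers in regions.values():
        # Cap the batch size so at least one server in the region stays up.
        cap = min(max_per_region, max(1, len(servers) - 1))
        for index, start in enumerate(range(0, len(servers), cap)):
            if index == len(groups):
                groups.append([])  # open the next install group
            groups[index].extend(servers[start:start + cap])
    return groups


if __name__ == "__main__":
    regions = {"region-a": ["a1", "a2", "a3"], "region-b": ["b1", "b2"]}
    for number, group in enumerate(plan_install_groups(regions), start=1):
        print(f"group {number}: {group}")
    # group 1: ['a1', 'b1']
    # group 2: ['a2', 'b2']
    # group 3: ['a3']
```

Groups are meant to be installed one after another. A single-server region cannot be split, so this sketch simply places it in the first group; how such regions are handled in practice is not described here.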
These controls are designed to minimize cases where a software change has an unintended impact on the network during installs, allowing us to continue serving our customers.