Founded in 1821, Douglas is a leading supplier of beauty products and cosmetics in the European retail market across 24 countries. Through 2,400 brick-and-mortar stores, an online shop, and its mobile application, Douglas offers more than 50,000 products.
In recent years, this multi-billion-dollar retailer faced burgeoning competition from exclusive cosmetic chains and bargain pricing by discounters, drug stores, and online retailers. Like many retailers, it began experiencing downward pricing and margin pressure due to the heavy operational costs of its physical stores. Exorbitant rents and store staffing costs weigh especially heavily when sales are declining. As a result, it became a strategic imperative for Douglas to expand its online revenues.
With a growing online business, Douglas needed to push new code and configurations multiple times per day. It’s common for things to break while operating at high velocity. In one instance, a code release caused a few broken internal links, resulting in HTTP 404 errors for site visitors and search engine crawlers.
Unfortunately, it took some time before the operations team discovered this because the data wasn’t available in real time. Yet such downtime is expensive from both a revenue and brand perspective, as it negatively impacts SEO and search engine visibility.
The DevOps teams use application performance monitoring (APM) tools for distributed tracing and customer analytics tools for end-user browser activity and traffic monitoring. However, this data is either geared toward business insights or limited by selective page instrumentation with JS tags. The DevOps teams needed to go one layer deeper into HTTP logs to discover, trace, and fix errors faster. They also needed real-time visibility into all layers of their technology stack, including the CDN-enabled middle mile.
Businesses like Douglas must know what’s happening at the edge of their networks in real time and bake that information into a holistic system health monitoring view with integrated log feeds from other layers of the stack. This visibility and control become increasingly indispensable as companies progressively move content and application logic to the edge, away from congested origins.
With DataStream from Akamai, Douglas’ DevOps teams could quickly trace and fix errors. More specifically, they gained programmatic access to real-time log data, enabling high-velocity, streamlined development and operational workflows. In addition to enabling faster discovery and a lower mean time to recovery, this provides the foundation for the agile DevOps model that Douglas is progressively implementing.
With the help of Push APIs, Douglas can operate on a low-cost, scalable serverless architecture, eliminating the need for servers that continuously poll APIs for data. Log collection runs automatically and regularly, with DataStream pushing six raw log streams to the respective cloud storage buckets through the processing pipeline. At the same time, Douglas has the necessary controls to individually turn data streams on or off. Once the logs have been pre-processed and HTTP status codes are aggregated at the 3xx, 4xx, and 5xx levels, operators can stay constantly informed about anomalies.
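The aggregation step described above can be sketched as follows. This is a minimal illustration, not Douglas’ actual pipeline code; the `statusCode` field name is an assumption about the pre-processed log format.

```python
from collections import Counter

def aggregate_status_classes(log_records):
    """Count HTTP responses by status class (2xx, 3xx, 4xx, 5xx).

    `log_records` is assumed to be an iterable of dicts with a
    'statusCode' field, as in a pre-processed log line.
    """
    counts = Counter()
    for record in log_records:
        status = int(record["statusCode"])
        counts[f"{status // 100}xx"] += 1
    return counts

# Three hypothetical log records, including two broken-link errors
records = [
    {"statusCode": 200},
    {"statusCode": 404},
    {"statusCode": 404},
]
counts = aggregate_status_classes(records)
print(dict(counts))  # {'2xx': 1, '4xx': 2}
```

In a real deployment this aggregation would run per time window inside the processing pipeline, so that downstream dashboards receive compact counts rather than raw log lines.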
The risk of code breaks is particularly high following a new deployment. The aggregated logs from DataStream or third-party streaming services such as Amazon Kinesis are piped to dashboarding and alerting tools like CloudWatch, Athena, or Grafana. When the aggregated metrics indicate unusual error patterns or trigger anomaly alerts — such as an error count above a predefined threshold — operators are informed in near real time. They can then pull raw logs to drill down to the cause and correlate with data from other layers of the stack for the time preceding the anomaly. It’s also possible for them to score anomalies and detect patterns. Both the raw and aggregated logs are retained, serving as a short-term data buffer and as long-term storage for historical analysis.
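A threshold-based alert like the one described above can be sketched in a few lines. The field names and threshold value here are illustrative assumptions, not Douglas’ actual alerting configuration.

```python
def check_error_anomaly(window_counts, threshold=100):
    """Return True when the 5xx count in one time window exceeds a
    predefined threshold -- the trigger condition described above.

    `window_counts` maps status classes to counts for one window,
    e.g. {'2xx': 9500, '4xx': 40, '5xx': 180}.
    """
    return window_counts.get("5xx", 0) > threshold

# Two hypothetical windows: normal traffic, then an error spike
# right after a deployment.
alerts = [check_error_anomaly(w) for w in [
    {"2xx": 9800, "5xx": 12},
    {"2xx": 9100, "5xx": 180},
]]
print(alerts)  # [False, True]
```

On a `True` result, the operator’s next step is the one the article describes: pull the raw logs for the window preceding the spike and correlate them with data from other layers of the stack.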
Douglas runs a proprietary analytics engine that enables detailed and customized downstream aggregation by attaching useful qualifiers to logs — namely, URL pattern ID or user agents. Because these qualifiers help categorize the logs by page groups (e.g., product page, search page, category page), the logs can be piped to the respective code owners in the dev teams and can help provide meaningful alerting for the right people. Now both the dev and ops teams have the same near real-time visibility, augmenting their DevOps agility.
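The qualification step might look like the following sketch. The URL patterns, page-group names, and team names are all hypothetical — the real engine and its pattern IDs are proprietary to Douglas.

```python
import re

# Hypothetical URL patterns mapping request paths to page groups
# and the dev teams that own them.
PAGE_GROUPS = [
    (re.compile(r"^/p/"),     "product-page",  "team-product"),
    (re.compile(r"^/search"), "search-page",   "team-search"),
    (re.compile(r"^/c/"),     "category-page", "team-category"),
]

def qualify(log_record):
    """Attach page-group and owner qualifiers to a log record so it
    can be piped to the code owners responsible for that page type."""
    path = log_record["path"]
    for pattern, group, owner in PAGE_GROUPS:
        if pattern.match(path):
            return {**log_record, "pageGroup": group, "owner": owner}
    return {**log_record, "pageGroup": "other", "owner": "team-platform"}

print(qualify({"path": "/search?q=mascara", "statusCode": 404}))
```

With qualifiers like these attached, downstream aggregation and alerting can be scoped per page group, so a spike in 404s on search pages reaches the search team rather than everyone.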
Douglas is exploring how to tie together and correlate the data from DataStream with APM and customer analytics systems. Doing so will enable the teams to correlate a site visitor’s browser activity with how the back-end system did or didn’t respond to it. In this case, DataStream will show what was served from cache and the latencies between origin, edge, and end user. With this insight, the operations team can quickly and efficiently remediate errors and tune CDN performance for maximum business value.
“Now both our development and operations teams can see errors in near real time, act fast and mitigate quickly, minimizing downtime,” concludes an online retail IT technology specialist for Douglas.
With around 2,400 stores and high-growth online shops in 24 European countries, Douglas is Europe’s leading premium beauty retailer. In fiscal year 2017/2018, the company generated sales of 3.3 billion euros. Around 20,000 Douglas Beauty Advisors strive daily to make their customers more beautiful and therefore happier. Douglas offers around 50,000 high-quality products from more than 650 brands in the fields of perfumery, decorative cosmetics, and skin care, as well as nutritional supplements and accessories. With around 40 million Beauty Card holders, Douglas has one of the largest customer loyalty programs in Europe. Thanks to its excellent advice and unique services, Douglas is one of the top addresses for beauty — both in stores and online.