Protect Client-Side Code and Certify the Authenticity of Data Collection
It is well known that efficient protection of web applications and websites requires using JavaScript for client-side data collection from the browser. This typically consists of device and browser characteristics, user preferences (fingerprinting), and data that reflects user interaction with their devices, such as mouse movements, touch, and key presses (telemetry).
Web security practitioners and vendors process data through various detection methods (from simple rules to advanced AI models) to verify the likelihood that a legitimate request originated from a legitimate device controlled by a human.
Combining the different data points also helps differentiate users and assess their activity over time. This is the core principle that bot management and fraud detection products use to help detect attacks such as credential stuffing, account takeover, account opening abuse, and content scraping, among others.
Data collection integrity
Ensuring the authenticity and integrity of the data is key to accurately assessing the user interaction with the site and to flagging threats. How can anyone assert the authenticity and integrity of the data while knowing that anything executed on the client side could be tampered with and manipulated?
When executing client-side JavaScript code, there are two reasons for ensuring that the code is well protected.
1. JavaScript code is part of an organization’s intellectual property. It must be protected as much as possible against threat actors and competitors.
2. Data integrity is critical to properly understanding the environment and its risk factors. Protected JavaScript ensures that the data is trustworthy because it was genuinely collected by running the script and was not manipulated or transformed.
How to protect client-side code and ensure data authenticity
As with any other aspect of security, there is no single solution that will address the problem. In this blog post, we present a collection of methods that Akamai uses to protect JavaScript code, enforce its execution, and ensure the authenticity of the data collected, including:
- Code obfuscation
- Data integrity check
- VM obfuscation
- Misleading and extra code insertion
- JavaScript code rotation
- Dynamic field rotation
- JavaScript build pipeline and data validation
Should you decide to follow similar practices to protect your own code, we recommend using a combination of these methods based on the needs of your team, organization, and tech stack.
Code obfuscation
Obfuscation is one of the most common methods used to protect JavaScript code. Obfuscation makes it more difficult to follow and understand the code.
Sound development practices recommend that function and variable names be as descriptive as possible and that code be structured logically for ease of debugging and maintenance. Although this is a valuable time- and effort-saving practice, clean code is an easy target for reverse engineering.
When obfuscation is applied, these sound development practices are broken, and variables and functions with descriptive names are replaced with random ones. They may be reordered and encoded, and some logic may be split. A web browser can still execute the code without trouble, and the outcome will be the same. However, anyone trying to reverse engineer the code will have a more challenging time.
Developers still use well-structured code for maintenance and enhancements. Once a new version is ready, the code is run through an obfuscation engine before being released. Several free/open source and commercial products, such as Code Beautify, JScrambler, and Digital.ai, are available to quickly and easily obfuscate JavaScript code.
Figure 1 is an example of a simple JavaScript function commonly used when fingerprinting, designed to extract various device characteristics, shown before obfuscation.
function getDeviceInfo() {
  return {
    userAgent: navigator.userAgent,
    hardwareConcurrency: navigator.hardwareConcurrency || "unknown",
    screenOrientation: screen.orientation.type,
  };
}
Fig. 1: Original code before obfuscation
You can see how simple it is to understand the code in its original state. Even someone with limited coding knowledge can grasp the intended purpose and understand how it achieves its goal.
Figure 2 is the same JavaScript function after being run through the Code Beautify online tool.
(function(_0xbf521e,_0x43c80b){var _0x4ad763=_0x3e09,_0x18fc85=_0xbf521e();while(!![]){try{var_0x40d2a7=parseInt(_0x4ad763(0xfc))/(0x18d1+-0xe6d+-0xa63)+-parseInt(_0x4ad763(0xf6))/(0x2*-0x7e4+0x171a+-0x750)+-parseInt(_0x4ad763(0xfb))/(-0x2e7*-0xb+0x6b*0x1f+-0x2cdf)*(parseInt(_0x4ad763(0xef))/(0x40f*-0x4+-0x897+0x18d7))+-parseInt(_0x4ad763(0xf3))/(0x3*-0xb5f+0x462+0x1dc*0x10)*(parseInt(_0x4ad763(0xf0))/(-0xb87*-0x1+0x18e8+-0x3*0xc23))+-parseInt(_0x4ad763(0xfa))/(0x2258+0x8f7+-0x2b48)*(-parseInt(_0x4ad763(0xee))/(0x3e9+-0xe93+0xab2))+parseInt(_0x4ad763(0xf1))/(0x1*-0x81e+0x525*-0x5+0x4*0x878)+parseInt(_0x4ad763(0xed))/(-0x59*-0x1f+0x779+-0x6f*0x2a);if(_0x40d2a7===_0x43c80b)break;else _0x18fc85['push'](_0x18fc85['shift']());}catch(_0x4460fc){_0x18fc85['push'](_0x18fc85['shift']());}}}(_0x1950,-0x1f*-0x38cb+0x17f2fa+-0x10aebf));function getDeviceInfo(){var _0x7a196=_0x3e09,_0x52340e={'VEDsL':_0x7a196(0xf8)};return{'userAgent':navigator[_0x7a196(0xf4)],'hardwareConcurrency':navigator[_0x7a196(0xf2)+_0x7a196(0xfd)]||_0x52340e[_0x7a196(0xf5)],'screenOrientation':screen[_0x7a196(0xf9)+'n'][_0x7a196(0xf7)]};}function _0x3e09(_0x56cbb3,_0x1167d0){var _0xddc250=_0x1950();return _0x3e09=function(_0x363b57,_0x27d74c){_0x363b57=_0x363b57-(-0x6d9+0x1316*0x1+-0xb50);var _0x1b2eec=_0xddc250[_0x363b57];return _0x1b2eec;},_0x3e09(_0x56cbb3,_0x1167d0);}function _0x1950(){var _0x1d7105=['ncurrency','20162890GviEyp','2488DLGTpn','4rCTHCm','65154TKsGUe','7673175smCphy','hardwareCo','670lOXWEG','userAgent','VEDsL','1749116JlgXKK','type','unknown','orientatio','12971xihUJr','2027775PnQRTc','487370FufNiT'];_0x1950=function(){return _0x1d7105;};return _0x1950();}
Fig. 2: Obfuscated code (via Code Beautify)
If for no other reason than its length, the obfuscated code is clearly more difficult to understand. It may look complex, but methods to reverse these simpler obfuscation techniques exist and are well understood by threat actors. Still, obfuscation raises the bar enough to deter the least sophisticated and least knowledgeable among them.
Half of the battle in security is exhausting the threat actor and/or making the prospect of targeting your organization unattractive based on the perceived or actual effort required to orchestrate a successful attack.
Data integrity check
As we’ve seen, code obfuscation is a good starting point, but it is not sufficient on its own to deter motivated threat actors, as deobfuscation methods and tools exist to reverse the code to its original format. In addition to obfuscation methods, implementing extra code and data integrity check functions can further protect the integrity of the collected information.
Code and data integrity checks are small functions added at various locations throughout the code to verify that the output produced by the script is indeed legitimate. The checks typically use multiple variables, including the output of existing core JavaScript functions along with a unique seed specific to a user session to produce a secondary output.
Figure 3 is an example of a function that takes three variables as input, uses the variables within a simple mathematical formula and a hash function, and returns the result. Variables a and b could correspond to the output of two core functions, and variable c could be a unique seed. In this example, all properties must be numerical values.
function IntegrityCheck(a, b, c) {
  const mathResult = a + b * c;
  const stringResult = String(mathResult);
  let hash = 0;
  for (let i = 0; i < stringResult.length; i++) {
    hash = (hash * 31 + stringResult.charCodeAt(i)) >>> 0;
  }
  return hash;
}
Fig. 3: Example of code with multiple variables for data integrity
More concretely, the screen.colorDepth and the navigator.hardwareConcurrency properties that both return numerical values could be used as the variables a and b in the simple function in Figure 3. That function is not actually limited to properties that return a numerical value, since any value can be hashed and transformed into an integer before being fed to the integrity check function. It was merely done that way for the sake of our simple example.
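As a minimal sketch of that idea, any property value, numeric or not, can first be reduced to an integer and then fed to the IntegrityCheck function from Figure 3. The toInt helper and the hard-coded sample values below are illustrative only; in a browser, the inputs would come from live properties such as screen.colorDepth and navigator.userAgent.

```javascript
// Illustrative helper: reduce any value to a 32-bit unsigned integer
// so it can feed a numeric integrity check.
function toInt(value) {
  const s = String(value);
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Same shape as the IntegrityCheck function shown in Figure 3.
function IntegrityCheck(a, b, c) {
  const mathResult = a + b * c;
  const stringResult = String(mathResult);
  let hash = 0;
  for (let i = 0; i < stringResult.length; i++) {
    hash = (hash * 31 + stringResult.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Fixed stand-ins for browser properties so the sketch runs anywhere.
const colorDepth = 24;            // e.g. screen.colorDepth
const userAgent = "Mozilla/5.0";  // e.g. navigator.userAgent (non-numeric)
const sessionSeed = 7919;         // assumed unique per-session seed

const check = IntegrityCheck(colorDepth, toInt(userAgent), sessionSeed);
```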
For variety, some integrity check functions may hash the output of the core function, as shown in the example in Figure 4.
import { createHash } from 'crypto';

function hashTwoVariables(a, b) {
  const concatenatedString = String(a) + String(b);
  const hash = createHash('sha256').update(concatenatedString).digest('hex');
  return hash;
}
Fig. 4: Example hash output
There may be dozens of such small functions, each performing different operations and consuming different output from core functions, sprinkled throughout the code to protect key data points. As a final check, you may also “sign” the entire payload, including all fingerprint and behavioral data as well as the results of the individual integrity check functions. One way to do this is by hashing the whole payload and comparing that hash with one computed independently on the receiving side. If the hashes match on both the sender’s and receiver’s sides, the payload is considered safe and unaltered.
VM obfuscation
These simple integrity check functions cannot be left in the open or be hidden using simple obfuscation methods. This is where the more advanced virtual machine (VM) obfuscation technique comes into play, making it harder for the threat actor to understand what’s happening under the hood and how to produce a valid payload.
VM obfuscation transforms code into virtual machine bytecode: something that a machine can interpret, yet a lot more challenging for threat actors to reverse engineer.
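To illustrate the principle only (this is a toy sketch, not any vendor's actual product), the example below encodes the computation a + b * c as bytecode for a tiny stack machine; the opcode numbering is arbitrary, which is precisely why the program's intent is no longer visible in the source.

```javascript
// Arbitrary opcode numbers for a toy stack-based virtual machine.
const PUSH_ARG = 0, PUSH_CONST = 1, ADD = 2, MUL = 3;

// Interpret a bytecode array against a stack.
function runVM(bytecode, args) {
  const stack = [];
  let pc = 0;
  while (pc < bytecode.length) {
    const op = bytecode[pc++];
    if (op === PUSH_ARG) stack.push(args[bytecode[pc++]]);
    else if (op === PUSH_CONST) stack.push(bytecode[pc++]);
    else if (op === ADD) stack.push(stack.pop() + stack.pop());
    else if (op === MUL) stack.push(stack.pop() * stack.pop());
  }
  return stack.pop();
}

// Bytecode equivalent of: a + b * c
const program = [PUSH_ARG, 0, PUSH_ARG, 1, PUSH_ARG, 2, MUL, ADD];
```

Real VM obfuscators generate a unique instruction set and dispatcher per build, so even this mapping from opcodes to operations must be reverse engineered before any logic can be read.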
Several vendors offer VM obfuscation methods, but VM obfuscation does not always support all types of function logic. When using VM obfuscation, follow your vendor’s guidelines and thoroughly regression test your code.
Regression testing is a great practice in general, not just for VM obfuscation, and is worth implementing as part of your security routine. However, it is particularly useful in conjunction with VM obfuscation, considering the complex code output of the method.
Misleading and extra code insertion
To make things more challenging for the threat actor attempting to reverse engineer the code, an additional layer involves adding code that serves no real purpose to the core logic. This is designed to lead threat actors off track, frustrate them, and compel them to abandon their efforts.
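A small illustrative example of such a decoy is shown below; the function name and constants are invented for this sketch. The routine looks like a fingerprint collector and produces a plausible value, but nothing in the real payload depends on it.

```javascript
// Decoy routine: resembles a hashing or timing collector, but its
// result never influences the real payload.
function collectRenderMetrics() {
  let acc = 0;
  for (let i = 1; i <= 64; i++) {
    acc = (acc + i * 2654435761) >>> 0; // plausible-looking mixing step
  }
  return acc;
}

// The decoy is computed, then simply ignored by the real logic.
const decoyValue = collectRenderMetrics();
const realDataPoint = { hardwareConcurrency: 8 };
```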
Similarly, you can consider varying the structure of the integrity check functions to make deobfuscation and reverse engineering more challenging. One way to achieve this is by developing several structurally distinct but equivalent functions that produce the same output.
A functionally identical but structurally different function will result in a different encoding of the function after it has undergone VM obfuscation, making the code much more complex to reverse engineer.
Figure 5 is an example of three such functions that always return the same output but are all slightly different.
function IntegrityCheck_1(a, b) {
  return a + b * 1;
}

function IntegrityCheck_2(a, b) {
  return a + 0 + b;
}

function IntegrityCheck_3(a, b, c) {
  return a + b + c * 0;
}
Fig. 5: Three examples of differing code that achieve the same output
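At build time, one of these equivalent variants could be selected per iteration. The deterministic selection by iteration number below is an assumption made for this sketch; note that the third variant from Figure 5 expects a dummy third argument, so all variants are called with three arguments here.

```javascript
// The three equivalent variants from Figure 5.
function IntegrityCheck_1(a, b) { return a + b * 1; }
function IntegrityCheck_2(a, b) { return a + 0 + b; }
function IntegrityCheck_3(a, b, c) { return a + b + c * 0; }

const variants = [IntegrityCheck_1, IntegrityCheck_2, IntegrityCheck_3];

// Pick a variant deterministically per build iteration (illustrative).
function pickVariant(iteration) {
  return variants[iteration % variants.length];
}

// Always pass three arguments so IntegrityCheck_3 gets its dummy c.
const check = pickVariant(42)(5, 7, 9);
```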
JavaScript code rotation
Having misleading code, advanced obfuscation, and integrity checks in place is good, but threat actors can be very persistent, and no static code is ever impossible to reverse engineer given enough time, effort, and skill. That is, unless we limit the validity of the script.
Imagine generating thousands of unique iterations of the same functionally equivalent code, each with different integrity check functions, for every new JavaScript code release. Each iteration is used and valid for only 10 to 20 minutes, and controls are in place to force the client to reload a new iteration regularly, quickly making older iterations obsolete and invalid.
The goal of this method is to overwhelm the threat actor with complexity and to outpace their efficiency, so that they have no other option but to execute the JavaScript through a browser and remain unaware of what the code does.
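One way to enforce this validity window could look like the sketch below, where each generated iteration embeds its build time (the constant names, values, and reload URL are assumptions for illustration).

```javascript
// Stamped into each iteration at build time (illustrative values).
const ITERATION_BUILT_AT = 1700000000000; // epoch milliseconds
const VALIDITY_MS = 15 * 60 * 1000;       // within the 10-20 minute window

// Decide whether this iteration is past its validity window.
function isIterationExpired(nowMs) {
  return nowMs - ITERATION_BUILT_AT > VALIDITY_MS;
}

// In a browser, an expired iteration would trigger loading a fresh
// script, e.g.:
// if (isIterationExpired(Date.now())) {
//   const s = document.createElement("script");
//   s.src = "/collector.js?cb=" + Date.now(); // hypothetical URL
//   document.head.appendChild(s);
// }
```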
Dynamic field rotation
The code may be difficult to read and decipher, but one can often infer its purpose by examining the output and the data collected and sent. Some of the information sent to the server may be obvious, especially regarding details such as device and browser characteristics.
However, it would be more difficult to infer the intent for functions that simply return a boolean, or for an integrity check function that returns an integer.
One way to make the payload structure less predictable and more confusing to the threat actors is to change the field names used to report each collected data point, as well as their relative position in the payload for each iteration.
As we’ve discussed, each JavaScript iteration has a unique set of code integrity checks. Additionally, the payload will use different field names, and the position of a given data point changes with each iteration.
The field names and their positions are defined at JavaScript build time by a predefined algorithm that the server processing the data can also execute, allowing it to locate each piece of information critical for accurate bot and fraud detection.
Figure 6 illustrates how each field name and its position may vary from one iteration to another. The field names must be nondescriptive so that they reveal as little as possible about the data they carry.
Payload Iteration #1
mx01: [user-agent]
mx02: [display-mode]
mx03: [hardconcur]
mx04: [pixelDepth]
mx05: [language]
mx06: [WebGL_Rend]
mx07: [intg_chck_1]
Payload Iteration #2
yw01: [display-mode]
yw02: [intg_chck_1]
yw03: [user-agent]
yw04: [pixelDepth]
yw05: [hardconcur]
yw06: [WebGL_Rend]
yw07: [language]
Payload Iteration #3
za01: [language]
za02: [WebGL_Rend]
za03: [hardconcur]
za04: [pixelDepth]
za05: [intg_chck_1]
za06: [user-agent]
za07: [display-mode]
Fig. 6: Examples of field name iterations
With only seven fields in the output (as in the above example), it’s easy to spot the change from one iteration to another, but imagine doing this when hundreds of data points are collected and returned.
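The shared algorithm that both the build system and the server run could be sketched as below; the seeded PRNG (mulberry32, a well-known public-domain generator) and the two-letter prefix scheme are assumptions for illustration, not a documented algorithm.

```javascript
// Small seeded PRNG (mulberry32) so build system and server can
// reproduce the exact same sequence from a shared seed.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Derive nondescriptive field names and a shuffled order per iteration.
function fieldMapForIteration(seed, dataPoints) {
  const rand = mulberry32(seed);
  // Two-letter prefix derived from the seed, e.g. "mx" or "yw".
  const prefix =
    String.fromCharCode(97 + Math.floor(rand() * 26)) +
    String.fromCharCode(97 + Math.floor(rand() * 26));
  // Shuffle the data points (Fisher-Yates).
  const order = dataPoints.slice();
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order.map((dp, i) => [prefix + String(i + 1).padStart(2, "0"), dp]);
}

const points = ["user-agent", "display-mode", "hardconcur", "language"];
const mapA = fieldMapForIteration(1, points);
const mapB = fieldMapForIteration(2, points);
```

Because the mapping is a pure function of the seed, the server never needs the mapping transmitted; it only needs to know which iteration produced the payload.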
JavaScript build pipeline and data validation
The various methods used to protect the JavaScript code and ensure the integrity of the data collected require developing a complex build pipeline and release process. First, developers will update the raw and well-formatted JavaScript file, test the functionality, and run regression tests.
Next, developers will use an algorithm to generate the thousands of iterations, each a unique version with different:
- Data integrity check functions that vary in the core data points they consume, the mathematical/hash functions they use, and their relative position in the overall logic
- Sets of misleading or unused code
- Payload output field names
- Payload output field orders
Once these unique components are generated, the JavaScript file iteration goes through the following processes:
- Obfuscate the data integrity check and other critical functions via VM
- Obfuscate the overall code
- Upload the iteration to the web server
Once all the iterations have been generated and uploaded, the new set of JavaScript must be enabled in production. This change is coordinated with the server running the bot and fraud detection engine that receives the data. That server must run part of the algorithm used in the JavaScript build system to be able to:
- Validate that the client is sending the payload of the current JavaScript iteration and not an outdated one
- Parse the different fields of the payload according to the JavaScript iteration with which it was generated
- Validate the code integrity check values by running equivalent functions
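These three server-side steps could be sketched as below; the iteration registry, field map, payload shape, and integrity inputs are illustrative assumptions, not a real protocol.

```javascript
// Registry of currently valid iterations and their field maps
// (illustrative; in practice generated by the build pipeline).
const currentIterations = new Set(["iter-7841"]);
const fieldMaps = {
  "iter-7841": { yw01: "display-mode", yw02: "intg_chck_1", yw03: "user-agent" },
};

// Server-side equivalent of the client's integrity check function.
function IntegrityCheck(a, b, c) {
  return a + b * c;
}

function validatePayload(iterationId, payload, seed) {
  // 1. Reject payloads generated by outdated iterations.
  if (!currentIterations.has(iterationId)) {
    return { ok: false, reason: "stale iteration" };
  }
  // 2. Parse fields according to this iteration's map.
  const map = fieldMaps[iterationId];
  const data = {};
  for (const [field, meaning] of Object.entries(map)) {
    data[meaning] = payload[field];
  }
  // 3. Recompute the integrity check value and compare
  //    (inputs assumed known to both sides for this sketch).
  const expected = IntegrityCheck(24, 8, seed);
  if (data["intg_chck_1"] !== expected) {
    return { ok: false, reason: "integrity mismatch" };
  }
  return { ok: true, data };
}

const result = validatePayload(
  "iter-7841",
  { yw01: "browser", yw02: IntegrityCheck(24, 8, 7919), yw03: "Mozilla/5.0" },
  7919
);
```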
The final product, with the final obfuscation, must be thoroughly tested end to end in pre-production before release to ensure that all components are in sync and produce the expected result. This makes for a somewhat complex JavaScript build workflow.
Still, when the code must be kept from curious competitors and threat actors, and its output affects the security of internet users and the websites they visit, the effort is worthwhile.
Conclusion
Client-side JavaScript code used to collect fingerprints and telemetry, along with the custom logic designed to detect bots and fraud, must be secured. Several strategies exist to protect the code and the data, but implementing only one or two will provide merely marginal protection against the most sophisticated threat actors.
Securing client-side code and its payload requires a complex strategy that involves multiple layers of defense and technologies, including code obfuscation, misleading or unused code, code integrity check functions combined with VM obfuscation, randomizing the payload structure to make it less predictable, and regularly updating the code.
The equation in Figure 7 summarizes how these strategies combine to provide efficient protection.
[ [JS code obfuscation
  + Misleading code
  + Unused code
  + VM obfuscation [code integrity check]
  + Unique field names
  + Field relative position shift]
x [Number of unique iterations]
+ Limited version validity (10 minutes)
+ Force JS reload ]
Fig. 7: Equation of JavaScript protection strategies
Ultimately, this combination forces the client to execute the JavaScript, reducing their opportunity to tamper with the data and defeat the detection engine. To limit the development effort, commercial solutions are strongly recommended for some of the most complex steps, such as VM obfuscation. Some strategies, however — such as code integrity checks, misleading code snippets, and multiple iterations — should be built and maintained in-house to provide protection in case a deobfuscator built by threat actors becomes available.