Executive summary
The use of AI in vulnerability discovery brings both significant benefits and risks, particularly the potential for generating false positives and inaccurate vulnerability reports.
An influx of unverified, AI-generated CVEs can overwhelm security databases, erode trust in the research process, and divert attention from genuine threats.
Real-world cases, such as the shutdown of curl’s bug bounty program, highlight the operational challenges and negative impacts of low-quality AI-driven submissions.
Human oversight is crucial for validating AI findings, ensuring that only legitimate vulnerabilities are reported, and maintaining the integrity of the CVE system.
In the ever-evolving landscape of cybersecurity, the application of artificial intelligence (AI) in vulnerability discovery has emerged as a powerful tool. However, this tool, like any technology, carries risks that must be carefully managed.
One significant concern is the potential for AI systems to generate false positives, leading to an influx of nonsensical vulnerability reports. If these unverified findings are submitted without proper validation, they can result in an abundance of bogus CVE IDs, which would complicate the identification of real security threats.
The nature of AI in vulnerability detection
AI tools can analyze code and systems to identify potential vulnerabilities, often by leveraging patterns and data from past CVEs. Although these tools can be effective, they are not infallible. They might misinterpret code snippets, match patterns that are unrelated to actual exploits, or flag benign code as vulnerable, all of which lead to false positives.
The problem of false positives
False positives in AI-driven vulnerability detection can manifest in various ways. An AI tool might detect a pattern resembling a known exploit but fail to validate whether it truly poses a risk.
This oversight can result in the creation of a new CVE, even when no actual vulnerability exists. Such false positives can then be submitted to MITRE, leading to the assignment of CVE IDs that may lack genuine significance.
The consequences of unverified submissions
The proliferation of false CVEs can have severe implications. These IDs can clutter the CVE database, making it more challenging to identify and address actual security issues. This would lead to information overload and decreased efficiency in addressing real vulnerabilities. Worse, malicious actors might exploit this system to generate CVE IDs for their own purposes, further complicating the security landscape.
There is also the potential for a barrage of false-positive vulnerability reports to damage a vendor's reputation and erode consumer confidence. If a product or platform is perceived to be laden with dangerous bugs, a customer may feel that it presents too high a risk for their organization and look into switching vendors.
Additionally, if a wave of unverified vulnerabilities is published without coordination or validation with the vendor, it can create a lot of unnecessary work and bog down a company's resources. For example, the public relations department may have to quickly issue statements to address the situation, and the engineers on the back end will have to rush through cycles to investigate and validate the numerous claims being made.
A real-world example
The curl project is shutting down its bug bounty program due to a flood of AI "slop" bug reports. We believe this will become more of a problem as vulnerability researchers turn to AI when discovering new vulnerabilities. AI will surely also lead to the discovery of verifiable and legitimate bugs, but those will be slower to verify as developers and vendors must sift through all bug reports, legitimate and otherwise.
The closure of curl's bug bounty program because of an influx of low-quality AI-generated bug reports highlights the challenges faced by developers and vendors in verifying and addressing legitimate vulnerabilities. This situation has led to a slower verification process as humans sort through numerous, often inaccurate, AI-driven reports.
The shutdown also risks demotivating contributors and potentially influencing other companies to follow suit, which would shrink the overall pool of resources available for identifying security issues.
Addressing this issue requires improved systems for filtering and validating reports. Such systems may change how bug bounty programs manage AI contributions and help ensure the quality and reliability of submissions.
An AI-generated example
The following figure is a sample report generated by Claude Code against Vivotek firmware that I had been reverse engineering. The vulnerability isn't exploitable: the user-supplied input is an integer, not a string, so injecting shell commands isn't possible.
OS Command Injection in apply_ipfilter_rule()
CVSS score: 9.8 (CRITICAL)
CWE: CWE-78
Location: 0x0000a0cc
Description
Complex iptables rule construction with unsanitized IP range parameters passed to shell commands via popen().
Vulnerable code
// Constructs a command like: /usr/sbin/confclient -s ipfilter_ipv4list_0_"0;malicious;echo"
snprintf(PTR_DAT_0000a4fc, 0xff, PTR_s__sipfilter_ipv4list_i_d__s__s_____0000a530,
         PTR_DAT_0000a4dc,  // confclient script
         param_3,           // list index (user controlled)
         param_1,           // IP range START (unvalidated)
         param_2);          // IP range END (unvalidated)
pFVar5 = popen(PTR_DAT_0000a4fc, "r");  // passes to the shell
The role of the researcher: Human oversight is crucial
AI is a valuable tool, but it must be tempered with human expertise. Researchers who use AI tools without understanding their underlying logic run the risk of missing errors and creating unreliable findings.
It is not responsible to rely solely on AI; human oversight is essential to verify and validate results to ensure that only genuine vulnerabilities are reported.
Mitigating the risks
To mitigate these risks, the security community must adopt a cautious approach. Researchers should verify AI findings through manual analysis and experimentation before submission.
Additionally, guidelines for the responsible use of AI in vulnerability research should be established. These guidelines should include mandatory verification steps and training for researchers to enhance their understanding of the limitations of AI.
The broader implications
The community faces not just technical challenges from the use of AI in vulnerability discovery but must also confront broader implications. A flood of low-quality CVEs can erode trust in the security research process and divert attention from critical vulnerabilities. It is imperative to maintain the integrity of the CVE system by ensuring that it remains a reliable resource for developers and organizations.
Many organizations offer payouts to vulnerability researchers in the form of bug bounties after the discovery and responsible disclosure of a vulnerability in a vendor’s product. Although these programs have a lot of value to both parties, they can sometimes get bogged down by false-positive reporting.
AI tools: More speed and accessibility with little to no validation
Researchers who participate in these programs have a strong financial incentive to churn out as many vulnerability reports as they can, and by using AI tools, the speed at which they can produce these reports increases dramatically.
Individuals with little to no programming experience are already creating apps and services through "vibe coding" (AI-assisted code generation), which may sound great in theory but, if used improperly, will introduce security flaws and efficiency issues.
The same dangers exist for AI-assisted vulnerability research. Individuals with little to no knowledge of vulnerability research may be inclined to do the research equivalent of vibe coding and outsource all the work to AI tools, with little to no validation.
Who cares if the AI-generated vulnerability report is accurate when your actual effort is low and you can pump out reports far quicker than doing it manually? All you need is for a few of them to be valid for you to get a quick payout.
Meanwhile, this clogs up cycles for the vendors and organizations running these bug bounty programs, decreases their confidence in third-party reports, and lowers their incentive to rely on them.
A call for caution and responsibility
AI holds immense potential for enhancing cybersecurity, but it must be used judiciously. By integrating human oversight and adopting responsible practices, the security community can harness AI's power without compromising the accuracy and reliability of vulnerability reporting.
Let’s move forward with a balanced approach, combining the strengths of AI with the wisdom of human expertise to navigate the complex landscape of cybersecurity.