Evaluating dependence on NVD


As I mentioned at the beginning of this year, I am trying to do a monthly blog post on what might be termed “Major Security Events”. In particular this year, I’ve written about the Ivanti meltdown, LockBit ransomware, and the xz backdoor. These events usually emerge cacophonously and suddenly into the cybersecurity landscape, and generally get everyone’s attention “real quick”. Bitsight has good data, leading to interesting and unique insights into what is happening with these events. My bleeding-edge conceptions1 tend to come a little after the ambulance has passed and others have chased, as they are the result of careful data analysis, but I like to think the insights are more interesting. One event that is less of a loud bang and more of a slowly unfolding disaster is the reduction in capacity of the US National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD).

Discussing this now is timely because I gave a talk with Sander Vinberg at VulnCon in March covering the history and evolution of the CVE landscape, and it is finally available to stream! This talk is a “Data Science” perspective on how vulnerability reporting through CVE has changed over the years, how these changes manifest in data (and how careless data analysis can trip over them, leading to erroneous conclusions), and some waxing poetic about what CVE is for and what happens in its absence. This is one of my favorite pieces of research I have ever done, and I want to especially thank Sander: I am not a trained historian, but he is, and his insights really pulled my faffing around with data together in a meaningful way.

Why NVD is/was important

Nearly all of the data analysis in that talk relied on NVD. This reliance was based largely on the fact that NVD does a few things very well:

  1. Every published CVE makes it into NVD relatively quickly.
  2. They provide third-party analysis of those vulnerabilities in the form of the Common Vulnerability Scoring System (CVSS), Common Weakness Enumeration (CWE), and Common Platform Enumeration (CPE).
  3. They have a nice API folks can query to get all the above information (a minimal query sketch follows this list).
  4. They maintain a changelog so you can go back in time and see the state of the NVD at any point.
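Because that API is what so many organizations have wired themselves to, it is worth seeing how simple the integration is. Here is a minimal sketch, not our production pipeline, of pulling one CVE; the endpoint and the cveId parameter come from NVD's public API documentation, while the function name and example CVE are purely illustrative.

```python
# Minimal sketch of pulling one CVE from the NVD 2.0 API. Unauthenticated use
# is heavily rate limited, so real pipelines should request an API key.
import requests

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_nvd_record(cve_id: str) -> dict:
    """Return NVD's record (CVSS, CWE, CPE, status, ...) for one CVE ID."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    vulns = resp.json().get("vulnerabilities", [])
    return vulns[0]["cve"] if vulns else {}

if __name__ == "__main__":
    record = fetch_nvd_record("CVE-2021-44228")
    # The analysis pieces this post keeps referring to live under "metrics"
    # (CVSS), "weaknesses" (CWE), and "configurations" (CPE applicability).
    print(record.get("vulnStatus"))
    print(list(record.get("metrics", {}).keys()))
```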

This means that more than a few organizations rely on NVD as the one true source of vulnerability information. So when a banner got posted on NVD on February 13, 2024, indicating:

“February 13, 2024: NIST is working to establish a consortium to improve the NVD program, and there will be some temporary delays in analysis efforts. For more information please review the NVD program transition announcement page.”

Speculation was rife. There was also a fair bit of noise about “WHERE ARE WE GOING TO GET VULNERABILITY INFORMATION NOW?!?!” So how bad has this degradation in service been since this announcement? Others have made similar figures, but I always like to get my hands dirty and do it myself:

Figure 1 Red tower of fear. That gap in analysis is… significant.

Figure 1 paints a picture of a growing backlog throughout 2024. However, there does seem to be a light at the end of the tunnel starting on June 2nd, when NVD got back in the game. Time will tell how long it will take to handle that backlog.

That isn’t the whole story, and there have been a number of significant events since that initial announcement. Tanya Brewer bravely took the stage at VulnCon to give an update on the status of the NVD, indicating that a consortium was in the works to help fund, build, and back NVD, and that NIST was prioritizing vulnerabilities, particularly those in CISA’s Known Exploited Vulnerabilities (KEV) Catalog and those from Microsoft’s Patch Tuesday.

NIST seems to be holding up at least part of that bargain. Of the 1,117 vulns in the CISA KEV catalog as of May 31, 2024, all but 2 have been analyzed by NVD2, and those two Microsoft vulns are currently “Undergoing Analysis”. The “Patch Tuesday” part is less apparent: since February 13, only 93 of Microsoft’s 357 vulnerabilities have been analyzed.
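If you want to reproduce that KEV tally yourself, a rough sketch looks like the following. The KEV feed URL and its "cveID"/"vulnerabilities" fields come from CISA's published JSON schema, the status check uses the same NVD API as above, and your numbers will drift from mine because both datasets move daily.

```python
# Pull CISA's KEV feed, then ask NVD for each entry's analysis status.
# Slow without an NVD API key because of the public rate limits.
import requests

KEV_URL = ("https://www.cisa.gov/sites/default/files/feeds/"
           "known_exploited_vulnerabilities.json")
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

kev = requests.get(KEV_URL, timeout=30).json()["vulnerabilities"]
kev_ids = [entry["cveID"] for entry in kev]

pending = []
for cve_id in kev_ids:
    data = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30).json()
    status = data["vulnerabilities"][0]["cve"]["vulnStatus"]
    if status in ("Received", "Awaiting Analysis", "Undergoing Analysis"):
        pending.append((cve_id, status))

print(f"{len(kev_ids)} KEV entries, {len(pending)} not yet analyzed by NVD")
```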

Where else can we look for data?

As someone who is absolutely mired in vulnerability data, I was perplexed by some of the anxiety around the degraded capacity of NVD. NVD is not even the original font from which all CVEs and their constituent parts spring. That would be MITRE and the menagerie of CVE Numbering Authorities (CNAs) that exist. A quick primer on how a vuln (in an ideal world) becomes a CVE3:

  1. A researcher finds a bug that turns out to be a vulnerability.
  2. They report the bug to the software vendor and a totally mutually agreeable and smooth disclosure process takes place.
  3. If the vendor is a CNA, they are able to issue a CVE for the vulnerability themselves. If they aren’t, another CNA can issue a CVE with everyone’s agreement. MITRE acts as the CNA of last resort.
  4. A CVE is assigned, which eventually becomes “PUBLISHED” and then shows up in various places:
    1. on cve.org
    2. in the CVEv5 json schema via their API
    3. eventually in MITRE’s CVE GitHub repository.
  5. NVD then analyzes the CVE and publishes more information about the CVE4.

Of note in step 4 is the CVEv5 json schema. That schema is vast, and includes a myriad of fields that can be filled in by the CNA relating to an alphabet soup of frameworks. One issue is that all but the description, references, and “affected” are optional, and most CNAs don’t bother to fill them out. Even though “affected” is required, a full 37% of the time the vendor and product information is missing5. For example, CVE-2019-0103 manages to fill in “n/a” for the product, vendor, and version, though it does have this information in the description. Extracting that information from the description is significantly harder than finding it where it should be. Anyone with kids understands the concept of “Just put it back where it’s supposed to go and everyone’s life will be much easier”. Figure 2 indicates the full extent to which CNAs can’t be bothered.
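Before the figure, here is roughly what that vendor/product check looks like against a CVE v5 record. The container and field names follow the CVE v5 record format; the "n/a" heuristic is my own simplification, and the file path just assumes you have cloned MITRE's cvelistV5 repo.

```python
# Flag whether a CVE v5 record's "affected" entries name a real vendor and
# product, or just punt with "n/a" the way CVE-2019-0103 does.
import json

def affected_is_useful(record: dict) -> bool:
    affected = record.get("containers", {}).get("cna", {}).get("affected", [])
    for entry in affected:
        vendor = str(entry.get("vendor", "n/a")).strip().lower()
        product = str(entry.get("product", "n/a")).strip().lower()
        if vendor not in ("", "n/a") and product not in ("", "n/a"):
            return True
    return False

with open("cves/2019/0xxx/CVE-2019-0103.json") as fh:  # path in cvelistV5 repo
    print(affected_is_useful(json.load(fh)))           # expect False
```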

Figure 2 Information provided by CNAs in the current MITRE CVE list.

One important caveat for Figure 2 is that just because the field exists doesn’t mean that useful information is available within that field. For example, for many CVEs “exploit” simply has information such as “at the current time we have no knowledge of exploitation”. Similarly, “Credit” and “Timeline” are inconsistent across entries, with some describing the full timeline of the who and when of discovery, disclosure, and publication, and others simply reiterating the publication date and the vendor responsible.

Speaking of inconsistencies, a major one is where exactly certain types of information get included. Importantly, NVD has always included information on the vendor, product, and version of software affected via CPE, and maintains a large dictionary of CPEs others can use6. In contrast, Figure 2 indicates that most CNAs use “version”, which is simply a subschema of “affected” in the CVE format that can include version information7. A similar thing happens with the “weakness”: NVD uses the CWE framework, but only one fifth of vulns include it, with some others using the “Other Problem Type” field to describe the bug in a less structured manner.

This analysis looks at approximately a quarter century of CVEs, and it’s worthwhile to ask whether CNAs are getting any better at including this information. To examine that, I am going to take a shortcut and collapse CWE and “other problem type” together, as well as CPE and “version”. I do this because if you are a human looking at vulnerability information, those might be the two things you care about, without worrying too much about the exact format the data is in.8
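In code, that shortcut is nothing fancy. A toy version might look like the sketch below, with hypothetical per-CVE flags standing in for what parsing the actual records would produce; the full-data version of this is roughly what Figure 3 summarizes.

```python
# Collapse "has CWE" / "has other problem type" into one vulnerability-type
# flag, and "has CPE" / "has version" into one version-info flag, then compute
# per-year fractions for a Figure 3-style trend.
import pandas as pd

cves = pd.DataFrame({  # hypothetical flags; real ones come from the CVE records
    "published": ["2022-03-01", "2023-07-15", "2023-11-02", "2024-01-20"],
    "has_cwe": [True, False, True, False],
    "has_other_problem_type": [False, True, False, False],
    "has_cpe": [False, False, True, True],
    "has_version": [True, True, False, False],
})

cves["has_vuln_type"] = cves["has_cwe"] | cves["has_other_problem_type"]
cves["has_version_info"] = cves["has_cpe"] | cves["has_version"]

trend = (
    cves.assign(year=pd.to_datetime(cves["published"]).dt.year)
        .groupby("year")[["has_vuln_type", "has_version_info"]]
        .mean()  # fraction of CVEs each year carrying each kind of info
)
print(trend)
```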

Figure 3 Different types of CVE information percentages over time.

The good news in Figure 3 is that the percentage of CVEs that contain specific bits of information does seem to be increasing. In particular, organizations seem more and more likely to include information about version, type, who discovered the thing and when, and remediation tips.

Two fields are bucking the trend, though, with exploits and CAPEC information on the decline9. CAPECs are an interesting beast; they are similar to and predate ATT&CK TTPs but never gained wide adoption. They peaked towards the end of 2023, but are rarely used. This is likely because it’s a little tough to understand exactly how an attacker may utilize a vulnerability in a particular campaign. Also, maybe folks just prefer ATT&CK. Exploits are also interesting. As I’ve noted, sometimes that field just includes “No known exploitation” or “We don’t know of any PoC exploit code”. It’s very popular among some CNAs, but rarely utilized by others.

Speaking of which CNAs utilize which fields, it’s interesting to check who exactly is filling this stuff out in Figure 4.

Figure 4 Field completion rate by CNA. This is for the top 20 CNAs by total CVEs. We exclude the “Description” and “References” columns because those are 100% across the board.

What’s interesting here is that no CNA is doing everything. Most of the big CNAs are doing Vulnerability Type and Version Info, but not everyone. Linux and Jenkins refuse to provide any info on the type of vulnerability. Intel and Cisco seem averse to providing version information. We can see that the CAPEC charge was led by Patchstack. MITRE, the OG CNA and one of two Root CNAs, refuses to do anything but Description and Reference (almost) all of the time. There is really no “bad” or “good” here because, remember, the only required things are a description, some sort of reference, and no blank “affected” fields, so any other information is above and beyond.

That said, can we measure how good or bad CNAs are at filling in their CVE entries? To grade CNAs, I pulled out a favorite technique of mine, Item Response Theory (IRT), which is normally used to score standardized tests. In Figure 5 below, the horizontal axis is the rate at which the CNA produces CVEs, and the vertical axis is the “CVE completeness score”. As I cannot resist including Bitsight data10, I have also colored the values by the prevalence of that CNA’s CVEs in our scanning, that is, the percentage of organizations we scan in which we’ve detected at least one CVE for which that CNA is responsible.
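For the IRT-curious, one simple flavor of the idea is a one-parameter (Rasch) model: the probability a CNA fills in a field depends on that CNA's "completeness" minus the field's "difficulty". The sketch below is a bare-bones fit on a toy matrix, not necessarily the exact model behind Figure 5.

```python
# Fit a Rasch model by gradient ascent on the Bernoulli log-likelihood:
# P(CNA i fills field j) = sigmoid(theta_i - b_j). Rows are CNAs, columns are
# CVE fields, entries are 1 if the CNA filled that field.
import numpy as np

def fit_rasch(X: np.ndarray, steps: int = 2000, lr: float = 0.05):
    n_cnas, n_fields = X.shape
    theta = np.zeros(n_cnas)   # per-CNA "completeness" score
    b = np.zeros(n_fields)     # per-field "difficulty"
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = X - p                          # gradient of the log-likelihood
        theta += lr * resid.sum(axis=1) / n_fields
        b -= lr * resid.sum(axis=0) / n_cnas
        theta -= theta.mean()                  # pin the scale (identifiability)
    return theta, b

# Toy data: 4 CNAs x 4 fields (say CWE, version, credit, exploit).
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
theta, b = fit_rasch(X)
print("CNA completeness scores:", np.round(theta, 2))
```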

Figure 5 CNA completeness score by CVE rate, and prevalence of CVEs.

Way out there on the right is MITRE, which is a clear outlier in multiple ways. They have obviously produced the most CVEs, and continue to crank them out at the highest rate. Because for a long time they were the only game in town, CVEs for which MITRE is the CNA have the highest prevalence. Apache (smack dab in the middle) also has a high detection rate (due to, you know, all the webservers), but also does a pretty good job of filling out their CVE worksheets. Please peruse the interactive figure above at your leisure.

Enter Vulnrichment

Around the time of this year’s RSAC conference, during which I participated in a panel on CVEs, gave an invited talk, gave a sponsored talk, and generally took in all that our glorious industry has to offer, CISA came out with their own announcement. They were going to start a “vulnrichment” program in which they would pick up the slack from NVD on vulns they felt were important. Specifically, when the CNA fails to do so, they would fill in CVSS, CPE, and CWE information. They are technically doing this as an Authorized Data Provider (ADP), but let's not get bogged down in the bureaucratic details of CVE schemas and just know they are filling in the data11.

So have they succeeded in picking up the slack in the last month? They’ve certainly put a dent in it. If we consider something “NVD Analyzed” once it has version information of any kind, vulnerability type information of any kind, and a CVSS score, then we can revisit Figure 1 to see the proportion of 2024 CVEs that now have complete information.

Figure 6 Percentage of CVEs in 2024 with complete analysis information.

So rather than ~75% going unanalyzed, the figure is more like 27%. And for the most part, that is due to CNAs doing an OK job of filling in the data, with CISA’s vulnrichment adding a bit of whipped cream on top.
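For completeness, here is roughly what that "complete analysis" test looks like against a CVE v5 record, pooling the CNA container with any ADP containers such as CISA's vulnrichment. Container and field names follow the CVE v5 record format; the test itself is just this post's working definition, not anything official, and the file path again assumes a local cvelistV5 checkout.

```python
# A CVE counts as "analyzed" if version info, a vulnerability type, and a CVSS
# score appear in either the CNA container or any ADP container.
import json

def all_containers(record: dict):
    c = record.get("containers", {})
    return [c.get("cna", {})] + list(c.get("adp", []))

def is_analyzed(record: dict) -> bool:
    has_version = has_type = has_cvss = False
    for cont in all_containers(record):
        affected = cont.get("affected", [])
        if any(a.get("versions") or a.get("cpes") for a in affected):
            has_version = True
        if cont.get("problemTypes"):
            has_type = True
        if any("cvss" in key.lower() for metric in cont.get("metrics", [])
               for key in metric):
            has_cvss = True
    return has_version and has_type and has_cvss

with open("cves/2024/3xxx/CVE-2024-3400.json") as fh:  # any cvelistV5 record
    print(is_analyzed(json.load(fh)))
```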

That’s not to say there aren’t downsides here. Data living in multiple places makes collecting it for aggregate analysis harder, and usually means more places to look when trying to gather intelligence. And while we examined CNAs based on whether they filled in specific fields or not, different CNAs may take different approaches to the frameworks, and the quality of their data may vary. Folks complain about NVD data quality, but at least NVD was consistent in format and coverage.

Road to recovery from NVD dependence

There are bright signs ahead for NVD. They indicate they will clear the backlog by the end of the fiscal year 🤞, and they have signed a contract with Analygence to help them do so. Hopefully that will come with other promised improvements, like integration of the CVE v5 schema rather than NVD’s own format.

Vulnrichment seems to have a bright future as well. They are analyzing vulns, growing fast, and every day covering more of what CNAs are missing. Moreover, they are also including yet another scoring system known as SSVC. That’s a subject we’ll have to return to another time, but it’s great to see more useful, actionable information around vulns being put out for the public.

So what should you do if you’ve been feeling the pain from lack of NVD information in your life? If your organization relies on the NVD API for vulnerability information, explore some other options. As stated above, MITRE and CISA both provide feeds, but some engineering work will be needed to reconcile all that data. There are, of course, folks who will package it up for you (not us) and sell you some other proprietary blobs of json served via an API and displayed in a shiny dashboard.
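As one concrete starting point, the CVE Program exposes the records behind cve.org via a public lookup API. The endpoint below is the one cve.org appears to use; treat the URL as an assumption and verify it before building anything on it.

```python
# Fetch a CVE record straight from the CVE Program's record API and print the
# CNA-provided description, i.e. the same data you would read on cve.org.
import requests

def fetch_cve_record(cve_id: str) -> dict:
    resp = requests.get(f"https://cveawg.mitre.org/api/cve/{cve_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()

record = fetch_cve_record("CVE-2024-3400")
print(record["containers"]["cna"]["descriptions"][0]["value"][:200])
```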

If you are an in-the-trenches defender, your first stop for info may just need to be cve.org instead of nvd.nist.gov. There at least you can find the same reference links to security notifications and potential remediation information. But you’re probably going to have to get used to an increasingly federated and inconsistent landscape of vulnerability info. Even if NVD does clear that backlog, they will still only provide a fraction of the potential information that could be filled in for those CVEs. Everything else is going to be scattered around and relatively inconsistent. The days of plenty may have come to an end; we must tighten our belts and roam the land for that which once spilled forth from a single fountain.

1 “Thought Leadership”
2 This is to say nothing of "other" KEV catalogs.
3 Not all bugs are vulns, and not all discovered vulns are reported. Some discovered vulns never reach the official status of CVE because of a breakdown in any of the above steps.
4 This division of labor has a long history and is something that we covered in our VulnCon talk. In short, in its early days, folks were worried about MITRE being the one and only source of CVEs as a private organization. Government funding and some heroic private sector intervention in the early 2000s made sure MITRE’s CVE issuing and NVD’s analysis were separately governed.
5 When this does occur, version information is also missing.
6 While NVD is the sole maintainer of that dictionary, which is based on an extensively documented standard, the entries are not always consistent, correct, or even easy to parse. There are many complicated reasons for that, and future Bitsight research will examine them.
7 Don’t get me started on how “version” information is allowed to be collected. It’s a hot mess.
8 OK, I am gonna get started. If you are a data scientist this is super annoying. CNAs can provide a single version, indicate "less than version" or "greater than version" (or both!), or they can provide a list of affected versions. Note the versions are not always in nice SemVer format, but often in whatever the developer has decided is a good versioning system, or git commit hashes. Extracting consistent information in a systematic way from these requires endless coding of corner cases. It's honestly a hot mess. There was another very interesting talk at VulnCon about OSV by Andrew Pollack at Google.
9 CAPEC is not explicitly in the actual full CVEv5 schema, but rather would fall under the “taxonomyMappings” section of the schema. Interestingly, in that section of the schema they mention ATT&CK but not CAPEC, even though CAPEC is the one that gets used.
10 Not by contract or anything, it’s just a personal impulse because we’ve got so much.
11 If you're interested in wading through the schema swamp, ADPs are a separate but mirrored section of the CVE schema where others can provide data that does not come from the CNA. My guess is it's going to be fertile ground for additional information as well as disagreement about the nature of vulnerabilities. In more breaking news while this blog was being written, CISA's ADP section is now in the official MITRE feed.