AI models target cybersecurity gaps

Richland, WA – Imagine you are the new janitor of a large apartment building and someone has stolen one of your keys, but you don’t know which one. Was it the key to a first-floor apartment? The mail room? Perhaps it is the master key for all units.

As far as you know, every lock is vulnerable, and you would need to change all of them to be completely secure.

But if you knew exactly which key was missing, you could target your efforts, changing only the relevant lock and quickly eliminating the threat.

Multiply this problem thousands of times over and you can see what cyber defenders are up against. There are more than 213,800 known “keys” available: unofficial entry points into computer systems, better known as vulnerabilities or bugs, that have already fallen into the hands of criminals. There are probably many more that are not yet known. How can all of these threats and attacks be tracked, prioritized, and prevented?

That is impossible for any one person or team. Computer analysts share leads by entering information into multiple databases, but they don’t have a map of how adversaries could exploit most of these bugs to wreak havoc.

Now, a team of scientists at the Department of Energy’s Pacific Northwest National Laboratory, Purdue University, Carnegie Mellon University, and Boise State University is turning to artificial intelligence to help solve the problem. The researchers have brought together three large databases of information about computer vulnerabilities, weaknesses, and likely attack patterns.

The AI-based model automatically links vulnerabilities to specific lines of attack that adversaries could use to compromise computer systems. This work should help defenders spot and prevent attacks more often and more quickly. The work is open source, and part of it is now available on GitHub. The team plans to release the rest of the code soon.

“Cyber defenders are flooded with information and lines of code. What they need is interpretation and support for prioritization. Where are we vulnerable? What actions can we take?” said Mahantesh Halappanavar, a chief computer scientist at PNNL, who led the overall effort.

“If you are a cyber defender, you may be dealing with hundreds of vulnerabilities in a day. You need to know how they could be exploited and what you need to do to mitigate those threats, and that’s the crucial missing piece,” Halappanavar added. “You want to know the impact of a bug, how it can be exploited, and how to stop the threat.”

Developed by Mahantesh Halappanavar and colleagues at PNNL, Purdue, Carnegie Mellon, and Boise State University, the AI algorithm showed high accuracy in connecting computer weaknesses and vulnerabilities to attack patterns, despite relying on limited training data. (Photo credit: Andrea Starr | Pacific Northwest National Laboratory)

From CVE to CWE to CAPEC: The Path to Better Cybersecurity

A new AI model uses natural language processing and supervised learning to bridge information in three separate cybersecurity databases.

Vulnerabilities – Specific pieces of computer code that may be an opening for attack. More than 200,000 of these “Common Vulnerabilities and Exposures,” or CVEs, are listed in the National Vulnerability Database maintained by the National Institute of Standards and Technology’s Information Technology Laboratory.
Weaknesses – A leaner set of definitions that classify vulnerabilities into categories based on what could happen if the vulnerability were exploited. The Common Weakness Enumeration database maintained by MITRE Corp. lists approximately 1,000 “Common Weakness Enumerations,” or CWEs.
Attacks – What a real-world attack exploiting a vulnerability or weakness might look like. More than 500 potential attack routes or “vectors,” known as “CAPECs,” are included in the Common Attack Pattern Enumeration and Classification resource maintained by MITRE.

All three databases contain important information for cyber defenders, but there have been few attempts to integrate all three so that users can quickly detect and understand potential threats and their origins, and then mitigate or prevent those threats and attacks.
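To make the idea of integrating the three resources concrete, here is a minimal Python sketch of what a defender could do once the links exist. It is not the team's code. The CVE identifiers are invented placeholders, the two tables are tiny hand-written stand-ins for links that would in practice come from the databases or from a model, and the function names are hypothetical.

```python
# Illustrative sketch only. The CVE identifiers are made-up placeholders,
# and the links are a tiny hand-written sample, not data from the project.
cve_to_cwe = {
    "CVE-0000-0001": ["CWE-89"],   # CWE-89: SQL injection
    "CVE-0000-0002": ["CWE-79"],   # CWE-79: cross-site scripting
    "CVE-0000-0003": [],           # not yet linked to any weakness
}
cwe_to_capec = {
    "CWE-89": ["CAPEC-66"],        # CAPEC-66: SQL Injection
    "CWE-79": ["CAPEC-63"],        # CAPEC-63: Cross-Site Scripting (XSS)
}


def attack_patterns_for(cve_id):
    """Follow CVE -> CWE -> CAPEC links; return sorted unique attack patterns."""
    patterns = set()
    for cwe_id in cve_to_cwe.get(cve_id, []):
        patterns.update(cwe_to_capec.get(cwe_id, []))
    return sorted(patterns)


def vulnerabilities_behind(capec_id):
    """Reverse lookup: which CVEs could enable a given attack pattern?"""
    return sorted(
        cve_id
        for cve_id, cwe_ids in cve_to_cwe.items()
        if any(capec_id in cwe_to_capec.get(cwe_id, []) for cwe_id in cwe_ids)
    )


if __name__ == "__main__":
    print(attack_patterns_for("CVE-0000-0001"))
    print(vulnerabilities_behind("CAPEC-63"))
```

The reverse lookup reflects the prioritization question raised earlier: given an attack pattern a defender is worried about, which known bugs would need fixing first?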

“If we can classify vulnerabilities into general categories and know exactly how attacks progress, we will be able to neutralize threats more effectively,” Halappanavar said. “The more sophisticated the bug classification, the more threats we can stop with a single action. The ideal goal is to prevent all exploits.”

Ashutosh Dutta, former researcher at PNNL (Photo credit: Ashutosh Dutta)

The research won the Best Paper Award at the IEEE International Symposium on Technologies for Homeland Security, held in November. The work was funded by DOE’s Office of Science and PNNL’s Data-Model Convergence Initiative.

In addition to Halappanavar, the team includes lead author Siddhartha Shankar Das of Purdue University, who was an intern at PNNL; former PNNL scientist Ashutosh Dutta, now at Amazon; Sumit Purohit of PNNL; Edoardo Serra of Boise State University, a PNNL joint appointee; and Alex Pothen of Purdue.

In previous research, the team used AI to link two of the resources: vulnerabilities and weaknesses. That work, which produced the model V2W-BERT, earned the team of Das, Pothen, Halappanavar, Serra, and Ehab Al-Shaer of Carnegie Mellon University the Best Applied Paper Award at the 2021 IEEE International Conference on Data Science and Advanced Analytics.

AI automatically links computer bugs to potential cyberattacks

Former PNNL intern Siddhartha Shankar Das (Photo credit: Siddhartha Shankar Das)

The new model, VWC-MAP, extends the project to a third category: attack actions.

“There are thousands of bugs and vulnerabilities out there, and new bugs and vulnerabilities are created and discovered every day,” said Das, a PhD student at Purdue University who has led the development of this research since his internship at PNNL in 2019. “We need to develop ways to stay ahead of these vulnerabilities, not just the known ones, but also those yet to be discovered.”

The team’s model automatically associates vulnerabilities with the appropriate weaknesses with up to 87 percent accuracy, and weaknesses with the appropriate attack patterns with up to 80 percent accuracy. These numbers are far better than what today’s tools offer, but the scientists caution that the new techniques need to be tested more extensively.

One hurdle is the lack of labeled data for training. For example, fewer than 1 percent of vulnerabilities are currently linked to specific attacks, which leaves very little data to train on.

To overcome the data scarcity, the team fine-tuned pre-trained natural language models, using both an autoencoder (BERT) and a sequence-to-sequence model (T5). The first approach used a language model to associate CVEs with CWEs, and then CWEs with CAPECs, through binary link prediction. The second approach used sequence-to-sequence techniques to translate CWEs to CAPECs, with intuitive prompts used to rank the associations. The two approaches produced very similar results, which were then validated by the cybersecurity expert on the team.
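The link prediction idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the team's implementation: a simple word-overlap score stands in for the fine-tuned language model, and the identifiers (CWE-A, CAPEC-A, and so on), the descriptions, and the function names are all invented for the example. What it shows is the shape of the pipeline: score every candidate pair, rank the candidates, keep the best link, and then repeat the step from weakness to attack pattern.

```python
import re


def tokens(text):
    """Lowercase word set for a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_score(source_text, target_text):
    """Toy stand-in for a trained link classifier: Jaccard overlap in [0, 1].

    A real system would return a model's estimated probability that the
    two descriptions are linked; this overlap score is only a placeholder.
    """
    a, b = tokens(source_text), tokens(target_text)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_links(source_text, candidates, top_k=1):
    """Score every candidate and return the top_k ids, best first."""
    scored = [(link_score(source_text, desc), cid) for cid, desc in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [cid for _, cid in scored[:top_k]]


# Hypothetical descriptions, written for this example only.
CWES = {
    "CWE-A": "improper neutralization of input in an sql command",
    "CWE-B": "improper neutralization of input during web page generation",
}
CAPECS = {
    "CAPEC-A": "attacker injects sql command through input",
    "CAPEC-B": "attacker injects script into web page",
}


def map_vulnerability(cve_text):
    """Chain the two steps: CVE text -> best CWE -> best CAPEC."""
    cwe_id = rank_links(cve_text, CWES)[0]
    capec_id = rank_links(CWES[cwe_id], CAPECS)[0]
    return cwe_id, capec_id


if __name__ == "__main__":
    print(map_vulnerability("login form passes unsanitized input into an sql command"))
```

Framing each candidate pair as a yes-or-no link question is what lets a model rank weaknesses or attack patterns it has seen few labeled examples of, since it only needs to compare two text descriptions.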

“We will publish this for others to test, look for vulnerabilities and make sure the model bins them properly,” Harappanavar said. “We sincerely hope that cybersecurity professionals will be able to test this open source platform.”
