zhangj111/VulTeller

VulTeller: Learning to Locate and Describe Vulnerabilities

This repository includes the experimental data for our work published at ASE 2023. Automatically discovering software vulnerabilities is a long-standing pursuit for software developers and security analysts. Since detection tools usually provide limited information for vulnerability inspection, recent work has turned its attention to identifying fine-grained vulnerabilities, i.e., vulnerable statements. However, existing work on vulnerability localization struggles to capture long-range and integral dependency information due to the bottleneck of Graph Neural Networks (GNNs). Moreover, little research has been done to help developers understand detected vulnerabilities, leaving vulnerability diagnosis a challenging task. In this paper, we propose VulTeller, a deep learning-based approach that can automatically locate vulnerable statements in a function and, more importantly, describe the vulnerability. Our approach focuses on extracting precise control and data dependencies in the code, which it achieves by modeling control flow paths and employing taint analysis. We design a novel neural model that encodes the control flows and the taint flows residing in the control flow paths, and decodes them via node classification (for localization) and an attentional decoder (for description). We conduct extensive experiments with real-world vulnerabilities to evaluate the proposed approach. The evaluation results, including quantitative measurement and human evaluation, demonstrate that our approach is highly effective and outperforms state-of-the-art approaches. Our work formulates the problem of vulnerability description generation for the first time and takes a step further towards automated vulnerability diagnosis.

Because Desay SV is involved in and supports this project, and because the work is being used for patent-related purposes, we are unable to release the tool as open source. We are exploring ways to contribute to the community that do not conflict with these obligations, potentially by sharing non-core parts of the project that do not infringe on any intellectual property rights.

Dataset

The archive data/data.rar contains the dataset splits train.csv, valid.csv and test.csv. Each file has the columns id, function, location and description, and is constructed from BigVul.
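Once the archive is extracted, a split can be read with Python's standard csv module. The following is a minimal sketch, not the authors' loading code. It assumes the files carry a header row and use standard CSV quoting for multi-line function bodies. The helper name read_split and the sample row, including the value shown for location, are hypothetical and only illustrate the four columns listed above.

```python
import csv
import io
import sys

# Column names as listed in this README. Whether the CSV files
# include a header row is an assumption (see has_header below).
COLUMNS = ["id", "function", "location", "description"]


def read_split(handle, has_header=True):
    """Yield one dict per sample from an open CSV file handle."""
    # Function bodies can be long, so raise the per-field size limit.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # discard the header row
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not have exactly four fields
        yield dict(zip(COLUMNS, row))


# Hypothetical in-memory sample standing in for a real split file.
sample = io.StringIO(
    "id,function,location,description\n"
    '1,"int f(char *s) {\n  char b[8];\n  strcpy(b, s);\n}",3,"Buffer overflow in f"\n'
)
rows = list(read_split(sample))
print(rows[0]["id"], rows[0]["location"], rows[0]["description"])
```

For a real split, pass an open file instead of the in-memory sample, for example open("train.csv", newline="", encoding="utf-8"), where the path and encoding are assumptions about how the archive was extracted.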
