diff --git a/README.md b/README.md
index 318cec8..94f54f3 100644
--- a/README.md
+++ b/README.md
@@ -1,69 +1,15 @@
-# Graph.js: A Static Vulnerability Scanner for _npm_ packages
+# Efficient Static Vulnerability Analysis for JavaScript with Multiversion Dependency Graphs
-Graph.js is a static vulnerability scanner specialized in analyzing _npm_
-packages and detecting taint-style and prototype pollution vulnerabilities.
- - Currently, detects 4 types of vulnerabilities:
-   - _Path Traversal_ (CWE-22);
-   - _Command Injection_ (CWE-94);
-   - _Code Execution_ (CWE-78);
-   - _Prototype Pollution_ (CWE-1321).
-- Our evaluation on two curated datasets (VulcaN [1]; SecBench) shows that it significantly
-  outperforms ODGen, the state-of-the-art tool, with lower false negatives and shorter analysis time.
-
----
-
-### Publications and Open-Source Repositories
-
-The development of Graph.js relates to additional research performed by this group.
-
-#### 1. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages
-This work comprises an empirical study of static code analysis tools for detecting vulnerabilities in Node.js code.
-We created a curated dataset of 957 Node.js code vulnerabilities, characterized and annotated by analyzing the information contained in _npm_ advisory reports.
-
-The dataset is available [here](https://github.com/VulcaN-Study/Supplementary-Material).
-
-The publication associated with this work is:
-- **VulcaN Dataset [1]**: Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, Nuno Santos:
-*"Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages"*,
-in *IEEE Transactions on Reliability 2023 (ToR 2023)*.
-```
-@inproceedings{vulcan_tor,
-  author = {Brito, Tiago and Ferreira, Mafalda and Monteiro, Miguel and Lopes, Pedro and Barros, Miguel and Santos, José Fragoso and Santos, Nuno},
-  booktitle = {IEEE Transactions on Reliability},
-  title = {Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages},
-  year = {2023},
-  pages = {1-16},
-  doi = {10.1109/TR.2023.3286301},
-}
-```
+Mafalda Ferreira, Miguel Monteiro, Tiago Brito, Miguel E. Coimbra, Nuno Santos, Limin Jia, and José Fragoso
+Santos. 2024. Efficient Static Vulnerability Analysis for JavaScript with Multiversion Dependency Graphs.
+https://doi.org/XXXX
-#### 2. RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks
-In this work we developed a prototype of RuleKeeper, a GDPR-aware policy compliance system for web frameworks.
-RuleKeeper uses Graph.js to automatically check for the presence of GDPR compliance bugs in Node.js servers.
+## Artifact evaluation
+The [Artifact Evaluation](./artifact-evaluation) folder contains all the necessary instructions and scripts used to reproduce the results and the figures from the original paper.
-The prototype is available [here](https://github.com/rulekeeper/rulekeeper).
+[//]: [![DOI](https://zenodo.org/badge/724237294.svg)](https://zenodo.org/badge/latestdoi/724237294)
-The publication associated with this work is:
-- **RuleKeeper**:
-Mafalda Ferreira, Tiago Brito, José Fragoso Santos, Nuno Santos:
-*"RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks"*,
-in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023.
-```
-@inproceedings{ferreira_sp23,
-  author = {Ferreira, Mafalda and Brito, Tiago and Santos, José Fragoso and Santos, Nuno},
-  title = {RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks},
-  booktitle = {Proceedings of 44th IEEE Symposium on Security and Privacy (S&P'23)},
-  year = {2023},
-  doi = {10.1109/SP46215.2023.00058},
-  pages = {1014-1031},
-  publisher = {IEEE Computer Society},
-  address = {Los Alamitos, CA, USA},
-}
-```
-
----
 ## Team
 ### Main Contributors
@@ -83,27 +29,93 @@ in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023
 #### Collaborators
- - [Tiago Brito](https://www.dpss.inesc-id.pt/blog/tiago-brito/)
- - [Miguel Coimbra](https://www.dpss.inesc-id.pt/~mcoimbra/)
- - [Limin Jia](https://www.andrew.cmu.edu/user/liminjia/)
- - [Miguel Monteiro](https://www.linkedin.com/in/miguel-monteiro-229b86195/)
+- [Tiago Brito](https://www.dpss.inesc-id.pt/blog/tiago-brito/)
+- [Miguel Coimbra](https://www.dpss.inesc-id.pt/~mcoimbra/)
+- [Limin Jia](https://www.andrew.cmu.edu/user/liminjia/)
+- [Miguel Monteiro](https://www.linkedin.com/in/miguel-monteiro-229b86195/)
 ---
-## Tool Installation
+## Graph.js: A Static Vulnerability Scanner for _npm_ packages
+Graph.js is a static vulnerability scanner specialized in analyzing _npm_
+packages and detecting taint-style and prototype pollution vulnerabilities.
-Graph.js generates a graph using [npm](https://www.npmjs.com/)/[node](https://nodejs.org/en) and uses [Neo4j](https://neo4j.com/) to query the graph.
-This last component can be executed in a docker container (easier setup) or locally.
+Its execution flow is composed of two phases: **graph construction**
+and **graph queries**. In the first phase, Graph.js builds a
+Multiversion Dependency Graph (MDG) of the program to be analyzed.
+This graph-based data structure coalesces into the same
+representation the abstract syntax tree, control flow graph, and
+data dependency graph. This phase has two outputs:
+1. Graph output: nodes and edges in .csv format.
+2. Graph metrics: graph_stats.json
+
+In the second phase, Graph.js imports the graph to a Neo4j graph
+database, and executes graph queries, written in Cypher, to capture
+vulnerable code patterns, e.g. data dependency paths connecting
+unreliable sources to dangerous sinks.
+
+- Currently, Graph.js detects four types of vulnerabilities: prototype
+pollution (CWE-1321), OS command injection (CWE-78),
+arbitrary code execution (CWE-94), and path traversal (CWE-22).
+
+---
-#### Requirements
+
+## Installation
+
+Graph.js generates a graph using [Node](https://nodejs.org/en) and uses [Neo4j](https://neo4j.com/) to query the graph.
+It can be executed locally, or in a Docker container (easier and more robust setup).
+
+### Using Docker
+#### Requirements:
+- [Python3](https://www.python.org/downloads/)
+- [Docker](https://www.docker.com/)
+
+Build the Docker container by running the command:
+```
+docker build -t graphjs .
+```
+
+### Run locally
+#### Requirements:
 - [Node](https://nodejs.org/en) (I've tested v18+).
-- [Python3](https://www.python.org/downloads/).
-- **Option 1 (Local queries)**: [Neo4j v5](https://neo4j.com/). Instructions: https://neo4j.com/docs/operations-manual/current/installation/linux/
-- **Option 2 (Docker)**: [Docker](https://www.docker.com/).
+- [Neo4j v5](https://neo4j.com/). Instructions: https://neo4j.com/docs/operations-manual/current/installation/linux/
+
+Set up the local environment by running the command:
+```
+./setup.sh
+```
+
 ---
 ## Usage
+### Using Docker
+
+Graph.js provides a command-line interface. Run it with **-h** for a short description.
+
+```console
+Usage: ./graphjs_docker.sh -f [options]
+Description: Run Graph.js for a given file in a Docker container.
+
+Required:
+-f Filename (.js).
+
+Options:
+-o Path to store analysis results.
+-l Store docker logs.
+-e Create exploit template.
+-s Silent mode: Does not save graph .svg.
+-h Print this help.
+```
+
+To run Graph.js, run the command:
+```bash
+./graphjs_docker.sh -f [options]
+```
+
+### Run locally
+
 Graph.js provides a command-line interface. Run it with **-h** for a short description.
 ```console
@@ -119,6 +131,12 @@ Options:
 -e, --exploit Generates symbolic tests.
 ```
+To run Graph.js, run the command:
+```bash
+python3 graphjs.py -f [options]
+```
+
+---
 By default, all the results are stored in a *graphjs-results* folder, in the root of the project, with the following structure:
 ```
@@ -131,43 +149,38 @@ graphjs-results
 └── taint_summary_detection.json (detection results)
 ```
-
-#### Run
-- Execute inside the root folder
-- If first time, execute the setup (`./setup.sh`)
-- To run with docker:
-  - Have docker service running
-  - Use flag **-d**
-
-```bash
-python3 graphjs.py -f -s [-d]
-```
-
 ---
-### Graph.js phases
-
-#### 1. Build the code property graph (representation of source code)
-
-This stage builds the code property graph of the program to be analyzed, a graph-based data structure that coalesces into the same representation the abstract syntax tree, control flow graph, and data dependency graph of the given program.
+## Reusability
-The code for the code property graph is in the [parser](./parser) folder.
-
-This step outputs:
-- Normalized javascript file of the program
-- Graph outputs (svg and/or csv)
-- Graph metrics (graph_stats.json)
-
-#### 2. Query the graph
-
-This stage queries the graphs to capture vulnerable code patterns, e.g. a data dependency paths connecting unreliable sources to dangerous sinks.
-
-The code for the queries is in the [detection](./detection) folder.
+Graph.js code is designed to enable straightforward usage by others, and can be easily adapted to accommodate
+new scenarios. As described before, Graph.js is composed of two phases: graph construction and graph queries.
+The graph construction code is located in the `graphjs/parser/src` folder, and the most relevant files are organized as follows:
+```
+src
+├── parser.ts
+├── output # Code to generate outputs (.csv and .svg)
+├── traverse # Parsing algorithms
+├── dependency
+│   ├── structures/dependency_trackers.ts
+│   └── dep_builder.ts
+├── ast-builder.ts
+├── cfg-builder.ts
+└── cg-builder.ts
+```
+The code referring to the MDG construction algorithm is located
+in `src/traverse/dependency`, where the file `structures/dependency_trackers.ts`
+contains the rules and structures referred to in the paper.
+The MDG is intended to be generic, so all the building steps can be
+adapted to new scenarios by creating new types of nodes and edges.
-This step uses the graph csv output and produces a summary file (*taint_summary.json*) with the detection results.
+The code for the queries is located in the `graphjs/detection`
+folder. The queries are entirely customizable, so it is possible not
+only to modify the existing queries but also to create new queries that
+search for new and different patterns in the graph.
-### Generate only the graph
+## Generate only the graph
 - Execute inside the *parser* folder
@@ -189,3 +202,56 @@ npm start -- -f [options]
 | Set array of functions to ignore in graph figure | --if=[...] | _[]_ | No | _graph_ |
 | Show the code in each statement in graph figure | --sc | _false_ | No | _graph_ |
 | Silent mode (not verbose) | --silent | _false_ | No | - |
+
+---
+
+
+### Publications and Open-Source Repositories
+
+The development of Graph.js relates to additional research performed by this group.
+
+#### 1. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages
+This work comprises an empirical study of static code analysis tools for detecting vulnerabilities in Node.js code.
+We created a curated dataset of 957 Node.js code vulnerabilities, characterized and annotated by analyzing the information contained in _npm_ advisory reports.
+
+The dataset is available [here](https://github.com/VulcaN-Study/Supplementary-Material).
+
+The publication associated with this work is:
+- **VulcaN Dataset [1]**: Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, Nuno Santos:
+  *"Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages"*,
+  in *IEEE Transactions on Reliability 2023 (ToR 2023)*.
+```
+@inproceedings{vulcan_tor,
+  author = {Brito, Tiago and Ferreira, Mafalda and Monteiro, Miguel and Lopes, Pedro and Barros, Miguel and Santos, José Fragoso and Santos, Nuno},
+  booktitle = {IEEE Transactions on Reliability},
+  title = {Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.js Packages},
+  year = {2023},
+  pages = {1-16},
+  doi = {10.1109/TR.2023.3286301},
+}
+```
+
+
+#### 2. RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks
+In this work we developed a prototype of RuleKeeper, a GDPR-aware policy compliance system for web frameworks.
+RuleKeeper uses Graph.js to automatically check for the presence of GDPR compliance bugs in Node.js servers.
+
+The prototype is available [here](https://github.com/rulekeeper/rulekeeper).
+
+The publication associated with this work is:
+- **RuleKeeper**:
+  Mafalda Ferreira, Tiago Brito, José Fragoso Santos, Nuno Santos:
+  *"RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks"*,
+  in *Proceedings of 44th IEEE Symposium on Security and Privacy (S&P’23)*, 2023.
+```
+@inproceedings{ferreira_sp23,
+  author = {Ferreira, Mafalda and Brito, Tiago and Santos, José Fragoso and Santos, Nuno},
+  title = {RuleKeeper: GDPR-Aware Personal Data Compliance for Web Frameworks},
+  booktitle = {Proceedings of 44th IEEE Symposium on Security and Privacy (S&P'23)},
+  year = {2023},
+  doi = {10.1109/SP46215.2023.00058},
+  pages = {1014-1031},
+  publisher = {IEEE Computer Society},
+  address = {Los Alamitos, CA, USA},
+}
+```
\ No newline at end of file
diff --git a/artifact-evaluation/README.pdf b/artifact-evaluation/README.pdf
new file mode 100644
index 0000000..e69de29
diff --git a/artifact-evaluation/graphjs_ae.zip b/artifact-evaluation/graphjs_ae.zip
new file mode 100644
index 0000000..e69de29
diff --git a/graphjs_docker.sh b/graphjs_docker.sh
index eca43b5..56c02f6 100755
--- a/graphjs_docker.sh
+++ b/graphjs_docker.sh
@@ -80,7 +80,6 @@ if [ "$DOCKER_LOGS" = true ]; then
         /bin/bash -c "python3 /graphjs/graphjs.py -f /input-file.js -o /output_path -s &> /docker_logs/graphjs-debug.log; cp /var/log/neo4j/debug.log /docker_logs/neo4j-debug.log"
     mv docker_logs ${output_path}/
-    docker system prune -f
 else
     docker run -it \
         -v "${filename}":/input-file.js \