feat: indexer (#5)
* Replaces the `index` method by the `Indexer` class
* Adds a lot of tests
* Better readme
nohehf authored May 21, 2024
1 parent b0fddeb commit afd0d4f
Showing 12 changed files with 417 additions and 38 deletions.
94 changes: 82 additions & 12 deletions README.md
@@ -2,57 +2,93 @@

Opinionated Python bindings for the [tree-sitter-stack-graphs](https://github.com/github/stack-graphs) rust library.

It exposes very few, easy to use functions to index files and query references.
It exposes a minimal, opinionated API to leverage the stack-graphs library for reference resolution in source code.

This is a proof of concept draft, to test scripting utilities using stack-graphs easily.
The rust bindings are built using [PyO3](https://pyo3.rs) and [maturin](https://maturin.rs).

It uses pyo3 and maturin to generate the bindings.
Note that this is a work in progress, and the API is subject to change. This project is not affiliated with GitHub.

## Installation & Usage

```bash
pip install stack-graphs-python-bindings # or poetry, ...
pip install stack-graphs-python-bindings
```

### Example

Given the following directory structure:

```bash
tests/js_sample
├── index.js
└── module.js
```

`index.js`:

```javascript
import { foo } from "./module"
const baz = foo
```

`module.js`:

```javascript
export const foo = "bar"
```

The following Python script:

```python
import os
from stack_graphs_python import index, Querier, Position, Language
from stack_graphs_python import Indexer, Querier, Position, Language

db_path = os.path.abspath("./db.sqlite")
dir = os.path.abspath("./tests/js_sample")

# Index the directory (creates stack-graphs database)
index([dir], db_path, language=Language.JavaScript)
indexer = Indexer(db_path, [Language.JavaScript])
indexer.index_all([dir])

# Instantiate a querier
querier = Querier(db_path)

# Query a reference at a given position (0-indexed line and column):
# Query a reference at a given position (0-indexed line and column):
# foo in: const baz = foo
source_reference = Position(path=dir + "/index.js", line=2, column=12)
results = querier.definitions(source_reference)

for r in results:
    print(f"{r.path}, l:{r.line}, c: {r.column}")
    print(r)
```

Will result in:
Will output:

```bash
[...]/stack-graphs-python-bindings/tests/js_sample/index.js, l:0, c: 9
[...]/stack-graphs-python-bindings/tests/js_sample/module.js, l:0, c: 13
Position(path="[...]/tests/js_sample/index.js", line=0, column=9)
Position(path="[...]/tests/js_sample/module.js", line=0, column=13)
```

That translates to:

```javascript
// index.js
import { foo } from "./module"
// ^ line 0, column 9

// module.js
export const foo = "bar"
// ^ line 0, column 13
```

> **Note**: All the paths are absolute, and line and column numbers are 0-indexed (first line is 0, first column is 0).
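
To make the 0-indexed convention concrete, here is a minimal sketch in plain Python that does not use the bindings. `annotate_position` is a hypothetical helper, not part of `stack_graphs_python`; it returns a source line with a caret under the given 0-indexed line and column, which is handy for checking a `Position` by eye.

```python
def annotate_position(source: str, line: int, column: int) -> str:
    """Return the target line with a caret under the 0-indexed column.

    Hypothetical helper for illustration; not part of stack_graphs_python.
    """
    lines = source.splitlines()
    if not 0 <= line < len(lines):
        raise IndexError(f"line {line} is out of range ({len(lines)} lines)")
    text = lines[line]
    if not 0 <= column < len(text):
        raise IndexError(f"column {column} is out of range ({len(text)} columns)")
    # Pad with `column` spaces so the caret sits under the addressed character.
    return f"{text}\n{' ' * column}^ line {line}, column {column}"


module_js = 'export const foo = "bar"\n'
print(annotate_position(module_js, line=0, column=13))
```

The caret lands under the `f` of `foo`, matching the definition position reported for `module.js` above.
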
## Known stack-graphs / tree-sitter issues

- Python: module resolution / imports seem to be broken: <https://github.com/github/stack-graphs/issues/430>
- TypeScript: module resolution doesn't work with file extensions (e.g. `import { foo } from "./module"` is ok, but `import { foo } from "./module.ts"` is not). **An issue should be opened on the stack-graphs repo**. See: `tests/ts_ok_test.py`
- TypeScript: tree-sitter-typescript fails when passing a generic type to a decorator: <https://github.com/tree-sitter/tree-sitter-typescript/issues/283>
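
Until the file-extension issue is resolved upstream, one possible workaround is to strip `.ts` / `.tsx` extensions from relative import specifiers in a copy of the sources before indexing. The sketch below is an assumption, not something the bindings or stack-graphs provide: `strip_ts_extensions` is a hypothetical helper, and its regular expression only covers simple `import ... from "./x.ts"` and `import "./x.ts"` forms.

```python
import re

# Matches a relative specifier ending in .ts or .tsx after `from` or a bare `import`.
# Group 1: keyword and whitespace, group 2: quote character, group 3: path without extension.
_TS_IMPORT = re.compile(r"""(from\s+|import\s+)(["'])(\.{1,2}/[^"']+?)\.tsx?\2""")


def strip_ts_extensions(source: str) -> str:
    """Drop .ts/.tsx from relative import specifiers (hypothetical workaround)."""
    return _TS_IMPORT.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3)}{m.group(2)}",
        source,
    )


print(strip_ts_extensions('import { foo } from "./module.ts"'))
```

Rewriting the specifier shortens the import line, so columns after the specifier differ from the original file; positions of the imported names before `from` are unchanged.
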

## Development

### Resources
@@ -67,7 +103,7 @@ https://pyo3.rs/v0.21.2/getting-started
### Setup

```bash
# Setup venv and install maturin through pip
# Setup venv and install dev dependencies
make setup
```

@@ -76,3 +112,37 @@ make setup
```bash
make test
```

### Manual testing

```bash
# build the package
make develop
# activate the venv
. venv/bin/activate
```

### Roadmap

Before releasing 0.1.0, which I expect to be the first stable API, the following needs to be done:

- [ ] Add more testing, especially:
  - [ ] Test all supported languages (Java, ~~Python~~, ~~TypeScript~~, ~~JavaScript~~)
  - [ ] Test failing cases, e.g. files that cannot be indexed
- [ ] Add options to the classes:
  - [ ] Verbosity
  - [ ] Force for the Indexer
  - [ ] Fail on error for the Indexer, or continue indexing
- [ ] Handle the storage (database) in a dedicated class, and pass it to the Indexer and Querier
- [ ] Add methods to query the indexing status (e.g. which files have been indexed, which failed, etc.)
- [ ] Rely on the main branch of stack-graphs, and update the bindings accordingly
- [ ] Better error handling, return clear errors, test them and add them to the `.pyi` interface
- [ ] Lint and format the rust code
- [ ] CI/CD for the rust code
- [ ] Lint and format the python code
- [ ] Proper changelog, starting in 0.1.0

I'd also like to add the following features, after 0.1.0:

- [ ] Expose the exact, lower-level API of stack-graphs, for more flexibility, in a separate module (eg. `stack_graphs_python.core`)
- [ ] Benchmark performance
41 changes: 38 additions & 3 deletions src/classes.rs
@@ -2,10 +2,11 @@ use std::fmt::Display;

use pyo3::prelude::*;

use stack_graphs::storage::SQLiteReader;
use stack_graphs::storage::{SQLiteReader, SQLiteWriter};
use tree_sitter_stack_graphs::cli::util::{SourcePosition, SourceSpan};
use tree_sitter_stack_graphs::loader::Loader;

use crate::stack_graphs_wrapper::query_definition;
use crate::stack_graphs_wrapper::{index_all, new_loader, query_definition};

#[pyclass]
#[derive(Clone)]
@@ -62,7 +63,41 @@ impl Querier {
}
}

// TODO(@nohehf): Indexer class
#[pyclass]
pub struct Indexer {
    db_writer: SQLiteWriter,
    db_path: String,
    loader: Loader,
}

#[pymethods]
impl Indexer {
    #[new]
    pub fn new(db_path: String, languages: Vec<Language>) -> Self {
        Indexer {
            db_writer: SQLiteWriter::open(db_path.clone()).unwrap(),
            db_path: db_path,
            loader: new_loader(languages),
        }
    }

    pub fn index_all(&mut self, paths: Vec<String>) -> PyResult<()> {
        let paths: Vec<std::path::PathBuf> =
            paths.iter().map(|p| std::path::PathBuf::from(p)).collect();

        match index_all(paths, &mut self.loader, &mut self.db_writer) {
            Ok(_) => Ok(()),
            Err(e) => Err(e.into()),
        }
    }

    // @TODO: Add a method to retrieve the status of the files (indexed, failed, etc.)
    // This might be done on a separate class (Database / Storage), as it is tied to the storage, not a specific indexer

    fn __repr__(&self) -> String {
        format!("Indexer(db_path=\"{}\")", self.db_path)
    }
}

#[pymethods]
impl Position {
7 changes: 4 additions & 3 deletions src/lib.rs
@@ -3,7 +3,7 @@ use pyo3::prelude::*;
mod classes;
mod stack_graphs_wrapper;

use classes::{Language, Position, Querier};
use classes::{Indexer, Language, Position, Querier};

/// Formats the sum of two numbers as string.
#[pyfunction]
@@ -20,10 +20,10 @@ fn index(paths: Vec<String>, db_path: String, language: Language) -> PyResult<()
    let paths: Vec<std::path::PathBuf> =
        paths.iter().map(|p| std::path::PathBuf::from(p)).collect();

    Ok(stack_graphs_wrapper::index(
    Ok(stack_graphs_wrapper::index_legacy(
        paths,
        &db_path,
        language.into(),
        &language.into(),
    )?)
}

@@ -35,5 +35,6 @@ fn stack_graphs_python(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Position>()?;
    m.add_class::<Language>()?;
    m.add_class::<Querier>()?;
    m.add_class::<Indexer>()?;
    Ok(())
}
40 changes: 37 additions & 3 deletions src/stack_graphs_wrapper/mod.rs
@@ -21,7 +21,7 @@ impl std::convert::From<StackGraphsError> for PyErr {
}
}

fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
pub fn get_langauge_configuration(lang: &Language) -> LanguageConfiguration {
match lang {
Language::Python => {
tree_sitter_stack_graphs_python::language_configuration(&NoCancellation)
@@ -36,10 +36,10 @@ fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
}
}

pub fn index(
pub fn index_legacy(
paths: Vec<PathBuf>,
db_path: &str,
language: Language,
language: &Language,
) -> Result<(), StackGraphsError> {
let configurations = vec![get_langauge_configuration(language)];

@@ -81,6 +81,40 @@ pub fn index(
}
}

pub fn new_loader(languages: Vec<Language>) -> Loader {
    let configurations = languages
        .iter()
        .map(|l| get_langauge_configuration(l))
        .collect();

    Loader::from_language_configurations(configurations, None).unwrap()
}

pub fn index_all(
    paths: Vec<PathBuf>,
    loader: &mut Loader,
    db_writer: &mut SQLiteWriter,
) -> Result<(), StackGraphsError> {
    let reporter = ConsoleReporter::none();

    let mut indexer = Indexer::new(db_writer, loader, &reporter);

    // For now, force reindexing
    indexer.force = true;

    let paths = canonicalize_paths(paths);

    // https://github.com/github/stack-graphs/blob/7db914c01b35ce024f6767e02dd1ad97022a6bc1/tree-sitter-stack-graphs/src/cli/index.rs#L107
    let continue_from_none: Option<PathBuf> = None;

    match indexer.index_all(paths, continue_from_none, &NoCancellation) {
        Ok(_) => Ok(()),
        Err(e) => Err(StackGraphsError {
            message: format!("Failed to index: {}", e),
        }),
    }
}

pub fn query_definition(
reference: SourcePosition,
db_reader: &mut SQLiteReader,
41 changes: 39 additions & 2 deletions stack_graphs_python.pyi
@@ -7,6 +7,13 @@ class Language(Enum):
Java = 3

class Position:
    """
    A position in a given file:
    - path: the path to the file
    - line: the line number (0-indexed)
    - column: the column number (0-indexed)
    """

    path: str
    line: int
    column: int
@@ -16,8 +23,38 @@ class Position:
    def __repr__(self) -> str: ...

class Querier:
    """
    A class to query the stack graphs database
    - db_path: the path to the database
    Usage: see Querier.definitions
    """
    def __init__(self, db_path: str) -> None: ...
    def definitions(self, reference: Position) -> list[Position]: ...
    def definitions(self, reference: Position) -> list[Position]:
        """
        Get the definitions of a given reference
        - reference: the position of the reference
        - returns: a list of positions of the definitions
        """
        ...
    def __repr__(self) -> str: ...

class Indexer:
    """
    A class to build the stack graphs of a given set of files
    - db_path: the path to the database
    - languages: the list of languages to index
    """
    def __init__(self, db_path: str, languages: list[Language]) -> None: ...
    def index_all(self, paths: list[str]) -> None:
        """
        Index all the files in the given paths, recursively
        """
        ...
    def __repr__(self) -> str: ...

def index(paths: list[str], db_path: str, language: Language) -> None: ...
def index(paths: list[str], db_path: str, language: Language) -> None:
    """
    DeprecationWarning: The 'index' function is deprecated. Use 'Indexer' instead.
    """
    ...
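
The stub documents the deprecation only in a docstring. As a sketch of how a Python-level shim could surface it at runtime, the following emits a `DeprecationWarning` through the standard `warnings` module and then delegates to the new API. The `Indexer` class here is a stand-in that only records calls, so the example runs without the compiled extension; the shim itself is an assumption and is not part of this commit.

```python
import warnings


class Indexer:
    """Stand-in for stack_graphs_python.Indexer (records calls only)."""

    def __init__(self, db_path: str, languages: list) -> None:
        self.db_path = db_path
        self.languages = languages
        self.indexed: list[str] = []

    def index_all(self, paths: list[str]) -> None:
        self.indexed.extend(paths)


def index(paths: list[str], db_path: str, language) -> Indexer:
    """Deprecated shim: warn, then delegate to Indexer."""
    warnings.warn(
        "The 'index' function is deprecated. Use 'Indexer' instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    indexer = Indexer(db_path, [language])
    indexer.index_all(paths)
    # Returned only so the example can be inspected; the real function returns None.
    return indexer
```

Python hides `DeprecationWarning` by default outside `__main__` and test runners, so callers may need `-W default` or `warnings.simplefilter("default")` to see it.
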
6 changes: 3 additions & 3 deletions tests/helpers/virtual_files.py
@@ -46,7 +46,7 @@ def _get_positions_in_file(file_path: str, contents: str) -> dict[str, Position]


@contextlib.contextmanager
def string_to_virtual_repo(
def string_to_virtual_files(
string: str,
) -> Iterator[tuple[str, dict[str, Position]]]:
"""
@@ -62,7 +62,7 @@ def string_to_virtual_repo(
^{pos2}
\"""
with string_to_virtual_repo(string) as (repo_path, positions):
with string_to_virtual_files(string) as (repo_path, positions):
...
```
@@ -104,7 +104,7 @@ def string_to_virtual_repo(
When parsed via:
```py
with string_to_virtual_repo(string) as (repo_path, positions):
with string_to_virtual_files(string) as (repo_path, positions):
...
```