feat: indexer #5

Merged
merged 4 commits into from
May 21, 2024
94 changes: 82 additions & 12 deletions README.md
@@ -2,57 +2,93 @@

Opinionated Python bindings for the [tree-sitter-stack-graphs](https://github.com/github/stack-graphs) rust library.

It exposes very few, easy to use functions to index files and query references.
It exposes a minimal, opinionated API to leverage the stack-graphs library for reference resolution in source code.

This is a proof of concept draft, to test scripting utilities using stack-graphs easily.
The rust bindings are built using [PyO3](https://pyo3.rs) and [maturin](https://maturin.rs).

It uses pyo3 and maturin to generate the bindings.
Note that this is a work in progress, and the API is subject to change. This project is not affiliated with GitHub.

## Installation & Usage

```bash
pip install stack-graphs-python-bindings # or poetry, ...
pip install stack-graphs-python-bindings
```

### Example

Given the following directory structure:

```bash
tests/js_sample
├── index.js
└── module.js
```

`index.js`:

```javascript
import { foo } from "./module"
const baz = foo
```

`module.js`:

```javascript
export const foo = "bar"
```

The following Python script:

```python
import os
from stack_graphs_python import index, Querier, Position, Language
from stack_graphs_python import Indexer, Querier, Position, Language

db_path = os.path.abspath("./db.sqlite")
dir = os.path.abspath("./tests/js_sample")

# Index the directory (creates stack-graphs database)
index([dir], db_path, language=Language.JavaScript)
indexer = Indexer(db_path, [Language.JavaScript])
indexer.index_all([dir])

# Instantiate a querier
querier = Querier(db_path)

# Query a reference at a given position (0-indexed line and column):
# Query a reference at a given position (0-indexed line and column):
# foo in: const baz = foo
source_reference = Position(path=dir + "/index.js", line=1, column=12)
results = querier.definitions(source_reference)

for r in results:
    print(f"{r.path}, l:{r.line}, c: {r.column}")
    print(r)
```

Will result in:
Will output:

```bash
[...]/stack-graphs-python-bindings/tests/js_sample/index.js, l:0, c: 9
[...]/stack-graphs-python-bindings/tests/js_sample/module.js, l:0, c: 13
Position(path="[...]/tests/js_sample/index.js", line=0, column=9)
Position(path="[...]/tests/js_sample/module.js", line=0, column=13)
```

That translates to:

```javascript
// index.js
import { foo } from "./module"
// ^ line 0, column 9

// module.js
export const foo = "bar"
// ^ line 0, column 13
```

> **Note**: All the paths are absolute, and line and column numbers are 0-indexed (first line is 0, first column is 0).
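The 0-indexed convention can be double-checked without the bindings. The following is a minimal sketch using a hypothetical helper, `find_position`, which is not part of `stack_graphs_python`. It assumes columns are counted in characters, which is the same as bytes for the ASCII sources used here.

```python
def find_position(source: str, needle: str) -> tuple[int, int]:
    """Return the 0-indexed (line, column) of the first occurrence of needle.

    Hypothetical helper for illustration; not part of stack_graphs_python.
    """
    for line_number, line in enumerate(source.splitlines()):
        column = line.find(needle)
        if column != -1:
            return line_number, column
    raise ValueError(f"{needle!r} not found")


# Contents of the two sample files shown above
index_js = 'import { foo } from "./module"\nconst baz = foo\n'
module_js = 'export const foo = "bar"\n'

print(find_position(index_js, "foo"))   # (0, 9)
print(find_position(module_js, "foo"))  # (0, 13)
```

Both tuples match the line and column values reported by the querier in the example output.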

## Known stack-graphs / tree-sitter issues

- Python: module resolution / imports seem to be broken: <https://github.com/github/stack-graphs/issues/430>
- TypeScript: module resolution doesn't work with file extensions (e.g. `import { foo } from "./module"` is ok, but `import { foo } from "./module.ts"` is not). **An issue should be opened on the stack-graphs repo**. See: `tests/ts_ok_test.py`
- TypeScript: tree-sitter-typescript fails when passing a generic type to a decorator: <https://github.com/tree-sitter/tree-sitter-typescript/issues/283>

## Development

### Resources
@@ -67,7 +103,7 @@ https://pyo3.rs/v0.21.2/getting-started
### Setup

```bash
# Setup venv and install maturin through pip
# Setup venv and install dev dependencies
make setup
```

@@ -76,3 +112,37 @@ make setup
```bash
make test
```

### Manual testing

```bash
# build the package
make develop
# activate the venv
. venv/bin/activate
```

### Roadmap

Before releasing 0.1.0, which I expect to be a first stable API, the following needs to be done:

- [ ] Add more testing, especially:
  - [ ] Test all supported languages (Java, ~~Python~~, ~~TypeScript~~, ~~JavaScript~~)
  - [ ] Test failing cases, e.g. files that cannot be indexed
- [ ] Add options to the classes:
  - [ ] Verbosity
  - [ ] Force for the Indexer
  - [ ] Fail on error for the Indexer, or continue indexing
- [ ] Handle the storage (database) in a dedicated class, and pass it to the Indexer and Querier
- [ ] Add methods to query the indexing status (eg. which files have been indexed, which failed, etc.)
- [ ] Rely on the main branch of stack-graphs, and update the bindings accordingly
- [ ] Better error handling, return clear errors, test them and add them to the `.pyi` interface
- [ ] Lint and format the rust code
- [ ] CI/CD for the rust code
- [ ] Lint and format the python code
- [ ] Proper changelog, starting in 0.1.0

I'd also like to add the following features, after 0.1.0:

- [ ] Expose the exact, lower-level API of stack-graphs, for more flexibility, in a separate module (eg. `stack_graphs_python.core`)
- [ ] Benchmark performance
41 changes: 38 additions & 3 deletions src/classes.rs
@@ -2,10 +2,11 @@ use std::fmt::Display;

use pyo3::prelude::*;

use stack_graphs::storage::SQLiteReader;
use stack_graphs::storage::{SQLiteReader, SQLiteWriter};
use tree_sitter_stack_graphs::cli::util::{SourcePosition, SourceSpan};
use tree_sitter_stack_graphs::loader::Loader;

use crate::stack_graphs_wrapper::query_definition;
use crate::stack_graphs_wrapper::{index_all, new_loader, query_definition};

#[pyclass]
#[derive(Clone)]
@@ -62,7 +63,41 @@ impl Querier {
}
}

// TODO(@nohehf): Indexer class
#[pyclass]
pub struct Indexer {
    db_writer: SQLiteWriter,
    db_path: String,
    loader: Loader,
}

#[pymethods]
impl Indexer {
    #[new]
    pub fn new(db_path: String, languages: Vec<Language>) -> Self {
        Indexer {
            db_writer: SQLiteWriter::open(db_path.clone()).unwrap(),
            db_path: db_path,
            loader: new_loader(languages),
        }
    }

    pub fn index_all(&mut self, paths: Vec<String>) -> PyResult<()> {
        let paths: Vec<std::path::PathBuf> =
            paths.iter().map(|p| std::path::PathBuf::from(p)).collect();

        match index_all(paths, &mut self.loader, &mut self.db_writer) {
            Ok(_) => Ok(()),
            Err(e) => Err(e.into()),
        }
    }

    // @TODO: Add a method to retrieve the status of the files (indexed, failed, etc.)
    // This might be done on a separate class (Database / Storage), as it is tied to the storage, not a specific indexer

    fn __repr__(&self) -> String {
        format!("Indexer(db_path=\"{}\")", self.db_path)
    }
}

#[pymethods]
impl Position {
7 changes: 4 additions & 3 deletions src/lib.rs
@@ -3,7 +3,7 @@ use pyo3::prelude::*;
mod classes;
mod stack_graphs_wrapper;

use classes::{Language, Position, Querier};
use classes::{Indexer, Language, Position, Querier};

/// Formats the sum of two numbers as string.
#[pyfunction]
@@ -20,10 +20,10 @@ fn index(paths: Vec<String>, db_path: String, language: Language) -> PyResult<()
let paths: Vec<std::path::PathBuf> =
paths.iter().map(|p| std::path::PathBuf::from(p)).collect();

Ok(stack_graphs_wrapper::index(
Ok(stack_graphs_wrapper::index_legacy(
paths,
&db_path,
language.into(),
&language.into(),
)?)
}

@@ -35,5 +35,6 @@ fn stack_graphs_python(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_class::<Position>()?;
m.add_class::<Language>()?;
m.add_class::<Querier>()?;
m.add_class::<Indexer>()?;
Ok(())
}
40 changes: 37 additions & 3 deletions src/stack_graphs_wrapper/mod.rs
@@ -21,7 +21,7 @@ impl std::convert::From<StackGraphsError> for PyErr {
}
}

fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
pub fn get_langauge_configuration(lang: &Language) -> LanguageConfiguration {
match lang {
Language::Python => {
tree_sitter_stack_graphs_python::language_configuration(&NoCancellation)
@@ -36,10 +36,10 @@ fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
}
}

pub fn index(
pub fn index_legacy(
paths: Vec<PathBuf>,
db_path: &str,
language: Language,
language: &Language,
) -> Result<(), StackGraphsError> {
let configurations = vec![get_langauge_configuration(language)];

@@ -81,6 +81,40 @@ pub fn index(
}
}

pub fn new_loader(languages: Vec<Language>) -> Loader {
    let configurations = languages
        .iter()
        .map(|l| get_langauge_configuration(l))
        .collect();

    Loader::from_language_configurations(configurations, None).unwrap()
}

pub fn index_all(
    paths: Vec<PathBuf>,
    loader: &mut Loader,
    db_writer: &mut SQLiteWriter,
) -> Result<(), StackGraphsError> {
    let reporter = ConsoleReporter::none();

    let mut indexer = Indexer::new(db_writer, loader, &reporter);

    // For now, force reindexing
    indexer.force = true;

    let paths = canonicalize_paths(paths);

    // https://github.com/github/stack-graphs/blob/7db914c01b35ce024f6767e02dd1ad97022a6bc1/tree-sitter-stack-graphs/src/cli/index.rs#L107
    let continue_from_none: Option<PathBuf> = None;

    match indexer.index_all(paths, continue_from_none, &NoCancellation) {
        Ok(_) => Ok(()),
        Err(e) => Err(StackGraphsError {
            message: format!("Failed to index: {}", e),
        }),
    }
}

pub fn query_definition(
reference: SourcePosition,
db_reader: &mut SQLiteReader,
41 changes: 39 additions & 2 deletions stack_graphs_python.pyi
@@ -7,6 +7,13 @@ class Language(Enum):
Java = 3

class Position:
    """
    A position in a given file:
    - path: the path to the file
    - line: the line number (0-indexed)
    - column: the column number (0-indexed)
    """

    path: str
    line: int
    column: int
@@ -16,8 +23,38 @@ class Position:
    def __repr__(self) -> str: ...

class Querier:
    """
    A class to query the stack graphs database
    - db_path: the path to the database

    Usage: see Querier.definitions
    """
    def __init__(self, db_path: str) -> None: ...
    def definitions(self, reference: Position) -> list[Position]: ...
    def definitions(self, reference: Position) -> list[Position]:
        """
        Get the definitions of a given reference
        - reference: the position of the reference
        - returns: a list of positions of the definitions
        """
        ...
    def __repr__(self) -> str: ...

class Indexer:
    """
    A class to build the stack graphs of a given set of files
    - db_path: the path to the database
    - languages: the list of languages to index
    """
    def __init__(self, db_path: str, languages: list[Language]) -> None: ...
    def index_all(self, paths: list[str]) -> None:
        """
        Index all the files in the given paths, recursively
        """
        ...
    def __repr__(self) -> str: ...

def index(paths: list[str], db_path: str, language: Language) -> None: ...
def index(paths: list[str], db_path: str, language: Language) -> None:
    """
    DeprecationWarning: The 'index' function is deprecated. Use 'Indexer' instead.
    """
    ...
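
The stub documents the deprecation of `index` only in a docstring. If a runtime warning is also wanted, one possible approach is a thin Python wrapper around the compiled function. The sketch below is an assumption, not something this PR does: `deprecated_index` and the stand-in `_native_index` are hypothetical names, and only the standard `warnings` module is used.

```python
import warnings


def _native_index(paths: list[str], db_path: str, language: object) -> None:
    """Stand-in for the compiled `index` function (hypothetical)."""
    return None


def deprecated_index(paths: list[str], db_path: str, language: object) -> None:
    """Call the legacy entry point after emitting a DeprecationWarning."""
    warnings.warn(
        "The 'index' function is deprecated. Use 'Indexer' instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    _native_index(paths, db_path, language)
```

With `stacklevel=2`, the warning points at the code that still calls the legacy function, which is usually the line that needs migrating to `Indexer`.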
6 changes: 3 additions & 3 deletions tests/helpers/virtual_files.py
@@ -46,7 +46,7 @@ def _get_positions_in_file(file_path: str, contents: str) -> dict[str, Position]


@contextlib.contextmanager
def string_to_virtual_repo(
def string_to_virtual_files(
string: str,
) -> Iterator[tuple[str, dict[str, Position]]]:
"""
@@ -62,7 +62,7 @@ def string_to_virtual_repo(
^{pos2}
\"""

with string_to_virtual_repo(string) as (repo_path, positions):
with string_to_virtual_files(string) as (repo_path, positions):
...
```

@@ -104,7 +104,7 @@ def string_to_virtual_repo(

When parsed via:
```py
with string_to_virtual_repo(string) as (repo_path, positions):
with string_to_virtual_files(string) as (repo_path, positions):
...
```
