Add support for Java #58
base: main
Fixes #50

Add support for Java code analysis using tree-sitter.

* Add `api/analyzers/java/analyzer.py` to implement the `JavaAnalyzer` class for parsing Java code and extracting method and class details.
* Modify `api/analyzers/source_analyzer.py` to import `JavaAnalyzer` and add `.java` to the list of supported analyzers.
* Add the `tree-sitter-java` dependency to `pyproject.toml`.
* Modify `api/__init__.py` to import `JavaAnalyzer`.
* Modify `api/analyzers/__init__.py` to import `JavaAnalyzer`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/FalkorDB/code-graph-backend/issues/50?shareId=XXXX-XXXX-XXXX-XXXX).
Walkthrough

The pull request introduces Java language support to the code analysis system. A new …

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant SA as SourceAnalyzer
    participant JA as JavaAnalyzer
    participant G as Graph
    SA->>JA: first_pass(path, file, graph)
    JA->>JA: Parse Java source file
    JA->>G: Add class entities
    JA->>G: Add method entities
    SA->>JA: second_pass(path, file, graph)
    JA->>JA: Identify method calls
    JA->>G: Connect method relationships
```
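The two-pass flow in the diagram can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the PR's implementation: the dict-based graph, the `Method` record, and the `first_pass`/`second_pass` helpers here are hypothetical stand-ins for the real `Graph` and tree-sitter parsing. The point it shows is why two passes are used: the first pass registers every method, so the second pass can resolve calls to methods defined later in the file (or in another file).

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    # Hypothetical stand-in for a parsed Java method declaration.
    name: str
    calls: list[str] = field(default_factory=list)


def first_pass(methods: list[Method], graph: dict) -> None:
    # Pass 1: register every method entity before any edges exist.
    for m in methods:
        graph.setdefault("entities", set()).add(m.name)


def second_pass(methods: list[Method], graph: dict) -> None:
    # Pass 2: connect caller -> callee, but only for callees registered in pass 1.
    known = graph.get("entities", set())
    edges = graph.setdefault("calls", [])
    for m in methods:
        for callee in m.calls:
            if callee in known:
                edges.append((m.name, callee))


# `main` calls `helper`, which is declared after it; `println` is external.
methods = [Method("main", calls=["helper", "println"]), Method("helper")]
graph: dict = {}
first_pass(methods, graph)
second_pass(methods, graph)
print(graph["calls"])  # [('main', 'helper')]
```

A single pass over the same list would miss the `main -> helper` edge, because `helper` would not yet be registered when `main` is processed.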
Actionable comments posted: 1
🧹 Nitpick comments (4)
api/analyzers/java/analyzer.py (3)
1-3: Replace star imports with explicit imports.

Using `from ..utils import *` and `from ...entities import *` can unintentionally pollute the namespace, causing potential naming conflicts and making it harder to track dependencies.

```diff
-from ..utils import *
-from ...entities import *
+from ..utils import find_child_of_type  # or any specific methods
+from ...entities import Function, Class, File  # etc.
```

🧰 Tools
🪛 Ruff (0.8.2)

3-3: `from ..utils import *` used; unable to detect undefined names (F403)
79-109: Include docstrings or comments for class extraction logic.

The `process_class_declaration` method works as intended, but adding more descriptive docstrings or inline comments explaining how class modifiers (e.g., `public`, `abstract`) might be handled in future versions can improve maintainability.

🧰 Tools
🪛 Ruff (0.8.2)

* 79-79: `Class` may be undefined, or defined from star imports (F405)
* 92-92: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 106-106: `Class` may be undefined, or defined from star imports (F405)
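As an illustration of the modifier handling such a comment could describe, the sketch below collects keyword modifiers from a class node. It assumes that the tree-sitter-java grammar places keywords such as `public` and `abstract` as children of a `modifiers` node under `class_declaration`, with annotations appearing as `annotation` or `marker_annotation` nodes; this should be checked against the grammar version in use. `FakeNode` and `class_modifiers` are hypothetical: the fake node exposes only the `type` and `children` attributes that py-tree-sitter's `Node` also provides, so the logic can run without a parser.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    # Minimal stand-in for tree_sitter.Node: only `type` and `children`.
    type: str
    children: list["FakeNode"] = field(default_factory=list)


def class_modifiers(class_node) -> list[str]:
    """Return keyword modifiers (e.g. 'public', 'abstract') of a class node.

    Annotations such as @Deprecated are skipped; only keyword children of
    the `modifiers` node are returned, in source order.
    """
    for child in class_node.children:
        if child.type == "modifiers":
            return [m.type for m in child.children
                    if m.type not in ("annotation", "marker_annotation")]
    return []  # no modifiers node: a package-private top-level class


# Roughly: @Deprecated public abstract class Foo { }
node = FakeNode("class_declaration", [
    FakeNode("modifiers", [FakeNode("marker_annotation"),
                           FakeNode("public"), FakeNode("abstract")]),
    FakeNode("class"), FakeNode("identifier"), FakeNode("class_body"),
])
print(class_modifiers(node))  # ['public', 'abstract']
```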
164-223: Add defensive checks for node traversal in `second_pass`.

When traversing the AST (`caller.parent.parent`), an unexpected structure can raise an `AttributeError` (accessing `.parent` on `None`) or pass `None` on to `captures`. Consider validating intermediate nodes to avoid runtime errors.

```diff
-method_calls = query_call_exp.captures(caller.parent.parent)
+caller_parent = caller.parent
+if caller_parent is None or caller_parent.parent is None:
+    continue
+method_calls = query_call_exp.captures(caller_parent.parent)
```

🧰 Tools
🪛 Ruff (0.8.2)

219-219: `Function` may be undefined, or defined from star imports (F405)
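The defensive check generalizes to a small helper that climbs a fixed number of parents and returns `None` instead of raising when the tree is shallower than expected. `nth_ancestor` is a hypothetical name, not part of the PR or of py-tree-sitter; it relies only on the `parent` attribute, which py-tree-sitter nodes expose and which is `None` at the root. `FakeNode` is a stand-in so the sketch runs without a parser.

```python
from typing import Optional


class FakeNode:
    # Hypothetical stand-in for tree_sitter.Node exposing only `parent`.
    def __init__(self, name: str, parent: "Optional[FakeNode]" = None):
        self.name = name
        self.parent = parent


def nth_ancestor(node, n: int):
    """Walk up `n` parents; return None if the chain ends early."""
    current = node
    for _ in range(n):
        if current is None:
            return None
        current = current.parent
    return current


root = FakeNode("program")
method = FakeNode("method_declaration", root)
call = FakeNode("identifier", method)
print(nth_ancestor(call, 2).name)  # program
print(nth_ancestor(root, 2))       # None
```

With such a helper, `caller.parent.parent` in `second_pass` could become `nth_ancestor(caller, 2)` followed by a `None` check before `captures` is called.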
api/analyzers/__init__.py (1)
2-2: Consider adding `JavaAnalyzer` to `__all__` or referencing it within the module.

If the analyzer is imported here primarily for others to use, listing it in `__all__` clarifies that it's part of the public API. If it's not used in this module, the import could be flagged as unused.

```diff
 __all__ = [
     "SourceAnalyzer",
+    "JavaAnalyzer",
 ]
```
🧰 Tools

🪛 Ruff (0.8.2)

2-2: `.java.analyzer.JavaAnalyzer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
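To illustrate why listing the name in `__all__` matters: `__all__` defines what `from package import *` exports, and Ruff treats names listed there as intentional re-exports rather than unused imports. The sketch below builds a throwaway module in memory (the module name `demo_analyzers` and the placeholder classes are hypothetical) to show that a star import honors `__all__`.

```python
import sys
import types

# Build a throwaway module in memory instead of a real package on disk.
mod = types.ModuleType("demo_analyzers")
exec(
    "class SourceAnalyzer: pass\n"
    "class JavaAnalyzer: pass\n"
    "class Helper: pass\n"
    "__all__ = ['SourceAnalyzer', 'JavaAnalyzer']\n",
    mod.__dict__,
)
sys.modules["demo_analyzers"] = mod

# A star import only pulls in the names listed in __all__.
namespace: dict = {}
exec("from demo_analyzers import *", namespace)
exported = sorted(k for k in namespace if not k.startswith("__"))
print(exported)  # ['JavaAnalyzer', 'SourceAnalyzer']
```

`Helper` is defined in the module but, because it is absent from `__all__`, it is not exported by the star import.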
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

* `api/analyzers/__init__.py` (1 hunk)
* `api/analyzers/java/analyzer.py` (1 hunk)
* `api/analyzers/source_analyzer.py` (2 hunks)
* `pyproject.toml` (1 hunk)
🧰 Additional context used
🪛 Ruff (0.8.2)
`api/analyzers/__init__.py`

* 2-2: `.java.analyzer.JavaAnalyzer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

`api/analyzers/java/analyzer.py`

* 3-3: `from ..utils import *` used; unable to detect undefined names (F403)
* 5-5: `from ...entities import *` used; unable to detect undefined names (F403)
* 22-22: `Function` may be undefined, or defined from star imports (F405)
* 35-35: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 44-44: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 52-52: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 59-59: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 60-60: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 70-70: `Function` may be undefined, or defined from star imports (F405)
* 79-79: `Class` may be undefined, or defined from star imports (F405)
* 92-92: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 106-106: `Class` may be undefined, or defined from star imports (F405)
* 130-130: `File` may be undefined, or defined from star imports (F405)
* 219-219: `Function` may be undefined, or defined from star imports (F405)
🔇 Additional comments (3)
api/analyzers/java/analyzer.py (1)
110-163: Check query capture structure.

Your query usage checks `'class' in captures` and `'method' in captures`. Ensure the `query.captures()` method returns a dictionary with those keys; some Tree-sitter APIs return lists of `(node, capture_name)` pairs instead. If so, consider a different approach to iterating over captures.

```diff
-captures = query.captures(tree.root_node)
-if 'class' in captures:
-    classes = captures['class']
+captures_list = query.captures(tree.root_node)
+for (node, capture_name) in captures_list:
+    if capture_name == 'class':
+        # process node
```

🧰 Tools
🪛 Ruff (0.8.2)

130-130: `File` may be undefined, or defined from star imports (F405)
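Since the return shape of `Query.captures()` has changed across py-tree-sitter releases (the 0.23 series returns a dict mapping capture names to lists of nodes, while earlier releases returned a list of `(node, capture_name)` pairs), one hedge is to normalize both shapes before iterating. With the `^0.23.2` pin in `pyproject.toml`, the dict shape is the one expected to apply, which would make the existing `'class' in captures` check correct; this is worth confirming against the installed version. The helper `group_captures` is hypothetical and is exercised here with plain strings in place of nodes.

```python
from collections import defaultdict


def group_captures(captures) -> dict:
    """Normalize Query.captures() output to {capture_name: [node, ...]}.

    Accepts either a dict of name -> list of nodes (py-tree-sitter 0.23)
    or an iterable of (node, name) pairs (older releases).
    """
    if isinstance(captures, dict):
        return {name: list(nodes) for name, nodes in captures.items()}
    grouped = defaultdict(list)
    for node, name in captures:
        grouped[name].append(node)
    return dict(grouped)


# Strings stand in for tree_sitter.Node objects.
new_style = {"class": ["Foo"], "method": ["bar", "baz"]}
old_style = [("Foo", "class"), ("bar", "method"), ("baz", "method")]
assert group_captures(new_style) == group_captures(old_style)
for node in group_captures(old_style).get("method", []):
    print(node)  # prints bar, then baz
```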
api/analyzers/source_analyzer.py (1)
11-11: Java analyzer addition looks good.

The `.java` extension is now mapped to `JavaAnalyzer`. This integration is consistent with existing analyzers, ensuring that Java files are processed in the same pipeline. Good job!

Also applies to: 20-21
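The mapping the comment describes amounts to dispatching on the file suffix. The sketch below is a hypothetical reduction of that idea: the `ANALYZERS` table, `analyzer_for`, and the placeholder analyzer classes are illustrative rather than copied from `api/analyzers/source_analyzer.py`, and the `.c` and `.py` entries are assumed from the existing `tree-sitter-c` and `tree-sitter-python` dependencies.

```python
from pathlib import Path
from typing import Optional


# Placeholder analyzers; the real classes live under api/analyzers/.
class CAnalyzer: ...
class PythonAnalyzer: ...
class JavaAnalyzer: ...


# Hypothetical suffix -> analyzer table mirroring the idea in the PR.
ANALYZERS = {
    ".c": CAnalyzer(),
    ".py": PythonAnalyzer(),
    ".java": JavaAnalyzer(),
}


def analyzer_for(path: Path) -> Optional[object]:
    # Path.suffix keeps the leading dot, matching the table keys.
    return ANALYZERS.get(path.suffix)


print(type(analyzer_for(Path("src/Main.java"))).__name__)  # JavaAnalyzer
print(analyzer_for(Path("README.md")))                     # None
```

Keeping the table as the single place where languages are registered is what makes adding Java a one-line change to the dispatch.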
pyproject.toml (1)
16-16: Dependency addition is consistent.

`tree-sitter-java` is correctly added to support the Java analyzer. Ensure the version `^0.23.2` remains compatible with your other tree-sitter dependencies.

✅ Verification successful

Tree-sitter dependencies are consistently versioned

The verification confirms that all tree-sitter dependencies are using compatible versions:

* `tree-sitter`: ^0.23.2
* `tree-sitter-c`: ^0.23.2
* `tree-sitter-java`: ^0.23.2
* `tree-sitter-python`: ^0.23.6 (patch version bump, still compatible)

The newly added `tree-sitter-java` dependency aligns with the core `tree-sitter` version and the other language parsers, keeping versions consistent across the tree-sitter packages.

🏁 Scripts executed
The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Check installed versions for consistent tree-sitter libraries
grep "tree-sitter" pyproject.toml
```

Length of output: 143
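The compatibility claim rests on Poetry's caret semantics: the left-most non-zero component of the version may not change, so for a `0.x.y` version, `^0.x.y` allows `>=0.x.y` up to but excluding `0.(x+1).0`. Both `^0.23.2` and `^0.23.6` therefore stay within the 0.23 series. The functions below are a simplified sketch of that rule for plain numeric `major.minor.patch` strings only (no pre-releases or wildcards); they are not Poetry's own resolver.

```python
def caret_range(spec: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (lower, upper_exclusive) for a caret spec such as '^0.23.2'."""
    parts = tuple(int(p) for p in spec.lstrip("^").split("."))
    # Bump the left-most non-zero component; if all are zero, bump the last one.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + (parts[idx] + 1,) + (0,) * (len(parts) - idx - 1)
    return parts, upper


def satisfies(version: str, spec: str) -> bool:
    lower, upper = caret_range(spec)
    v = tuple(int(p) for p in version.split("."))
    return lower <= v < upper


print(caret_range("^0.23.2"))          # ((0, 23, 2), (0, 24, 0))
print(satisfies("0.23.6", "^0.23.2"))  # True
print(satisfies("0.24.0", "^0.23.2"))  # False
```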
```python
def process_method_declaration(self, node: Node, path: Path, source_code: str) -> Optional[Function]:
    """
    Processes a method declaration node to extract method details.

    Args:
        node (Node): The AST node representing a method declaration.
        path (Path): The file path where the method is defined.

    Returns:
        Optional[Function]: A Function object containing details about the method, or None if the method name cannot be determined.
    """

    # Extract method name
    res = find_child_of_type(node, 'identifier')
    if res is None:
        return None

    identifier = res[0]
    method_name = identifier.text.decode('utf-8')
    logger.info(f"Method declaration: {method_name}")

    # Extract method return type
    res = find_child_of_type(node, 'type')
    ret_type = 'Unknown'
    if res is not None:
        ret_type = res[0]
        ret_type = ret_type.text.decode('utf-8')

    # Extract method parameters
    args = []
    res = find_child_of_type(node, 'formal_parameters')
    if res is not None:
        parameters = res[0]

        # Extract arguments and their types
        for child in parameters.children:
            if child.type == 'formal_parameter':
                arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
                arg_name = find_child_of_type(child, 'identifier')[0].text.decode('utf-8')
                args.append((arg_name, arg_type))

    # Extract method definition line numbers
    start_line = node.start_point[0]
    end_line = node.end_point[0]

    # Create Function object
    docs = ''
    src = source_code[node.start_byte:node.end_byte]
    f = Function(str(path), method_name, docs, ret_type, src, start_line, end_line)

    # Add arguments to Function object
    for arg in args:
        name, type_ = arg
        f.add_argument(name, type_)

    return f
```
💡 Codebase verification
The review comment is correct: direct indexing of `find_child_of_type` results is unsafe.

The implementation of `find_child_of_type` in `utils.py` returns either `None` or a tuple of `(Node, int)`. The code in `analyzer.py` correctly checks for `None` in most places, but the parameter-processing section indexes the result with `[0]` directly, without checking:

```python
arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
arg_name = find_child_of_type(child, 'identifier')[0].text.decode('utf-8')
```

This raises a `TypeError` (`'NoneType' object is not subscriptable`) whenever `find_child_of_type` returns `None`. The suggested fix, checking the result before indexing and providing a fallback value, is the correct approach.
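The fallback pattern endorsed above can be factored into one helper so that every lookup in the parameter loop gets the same check. This sketch assumes, as the verification states, that `find_child_of_type` returns either `None` or a `(Node, int)` tuple; the stub version here, the `child_text` name, and `FakeNode` (a stand-in with `type`, `children`, and UTF-8 `text` bytes) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeNode:
    # Stand-in for tree_sitter.Node: type, UTF-8 text bytes, and children.
    type: str
    text: bytes = b""
    children: list = field(default_factory=list)


def find_child_of_type(node, child_type: str) -> Optional[tuple]:
    # Stub matching the described contract: (child, index) or None.
    for i, child in enumerate(node.children):
        if child.type == child_type:
            return (child, i)
    return None


def child_text(node, child_type: str, default: str = "Unknown") -> str:
    """Decode the first child of `child_type`, or return `default`."""
    res = find_child_of_type(node, child_type)
    if res is None:
        return default
    return res[0].text.decode("utf-8")


# Roughly: the parameter `String name`
param = FakeNode("formal_parameter", children=[
    FakeNode("type_identifier", b"String"),
    FakeNode("identifier", b"name"),
])
print(child_text(param, "identifier"))  # name
print(child_text(param, "type"))        # Unknown
```

In the fake tree the `'type'` lookup falls back to `'Unknown'` because the child's node type is `type_identifier`. If the real tree-sitter-java grammar behaves the same way (concrete kinds such as `type_identifier` or `integral_type` rather than a node literally named `type`), py-tree-sitter's field-based `child.child_by_field_name('type')` may be the more reliable lookup; this is worth confirming against the grammar.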
🔗 Analysis chain
Validate the existence of method type and parameter identifiers.

Indexing `[0]` on calls like `find_child_of_type(child, 'type')[0]` fails if no matching child is found. Consider checking the result before indexing.

```diff
-arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
+arg_type_nodes = find_child_of_type(child, 'type')
+if arg_type_nodes:
+    arg_type = arg_type_nodes[0].text.decode('utf-8')
+else:
+    arg_type = 'Unknown'
```
Let me verify the concern by checking the implementation of `find_child_of_type` and its usage in the codebase.

Let me check the implementation of `find_child_of_type` to understand its return type.
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:

```shell
#!/bin/bash
# Find the implementation of find_child_of_type function
ast-grep --pattern 'def find_child_of_type'
# Search for other usages of find_child_of_type to understand the pattern
rg "find_child_of_type" -A 2
```

Length of output: 2948

Script:

```shell
#!/bin/bash
# Get the content of utils.py to see the implementation
cat api/analyzers/utils.py
```

Length of output: 674
🧰 Tools

🪛 Ruff (0.8.2)

* 22-22: `Function` may be undefined, or defined from star imports (F405)
* 35-35: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 44-44: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 52-52: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 59-59: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 60-60: `find_child_of_type` may be undefined, or defined from star imports (F405)
* 70-70: `Function` may be undefined, or defined from star imports (F405)
Summary by CodeRabbit

* New Features
* Dependencies
  * `tree-sitter-java` library to project dependencies
* Improvements