Add support for JavaScript #59

Draft
wants to merge 1 commit into
base: main

Conversation

gkorland
Contributor

@gkorland gkorland commented Jan 3, 2025

Fixes #51

Add support for JavaScript code analysis using tree-sitter.

  • Add api/analyzers/javascript/analyzer.py implementing JavaScriptAnalyzer class using tree-sitter for JavaScript.
    • Implement methods for first and second pass analysis.
    • Use tree-sitter to parse JavaScript code.
    • Extract functions and classes from JavaScript code.
    • Connect entities in the graph.
  • Update api/analyzers/source_analyzer.py to include JavaScriptAnalyzer in the analyzers list.
  • Add tree-sitter-javascript dependency to pyproject.toml.
  • Add utility functions for JavaScript analysis in api/analyzers/utils.py.

For more details, open the Copilot Workspace session.

Summary by CodeRabbit

  • New Features

    • Added support for analyzing JavaScript source files
    • Introduced JavaScript-specific code parsing and analysis capabilities
  • Dependencies

    • Added Tree-sitter JavaScript library for parsing JavaScript code
  • Improvements

    • Enhanced source code analysis to include JavaScript file processing
    • Implemented functions to extract function and class names from JavaScript AST nodes


vercel bot commented Jan 3, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
code-graph-backend ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 3, 2025 0:57am


coderabbitai bot commented Jan 3, 2025

Walkthrough

This pull request introduces comprehensive support for JavaScript source code analysis in the Code-Graph backend. A new JavaScriptAnalyzer class is implemented to process JavaScript files, utilizing Tree-sitter for parsing. The implementation includes methods for extracting function and class declarations, performing two-pass analysis to build a graph representation of the code structure. The changes extend the existing source analyzer infrastructure to handle JavaScript files with .js extension, adding utility functions for name extraction and including the necessary JavaScript Tree-sitter grammar dependency.

Changes

| File | Changes |
|---|---|
| api/analyzers/javascript/analyzer.py | Added JavaScriptAnalyzer class with methods for processing JavaScript source files, including function and class declaration handling, first and second pass analysis |
| api/analyzers/source_analyzer.py | Updated analyzers dictionary to include JavaScript file support with .js extension |
| api/analyzers/utils.py | Added extract_js_function_name and extract_js_class_name utility functions for JavaScript AST node name extraction |
| pyproject.toml | Added tree-sitter-javascript dependency for JavaScript parsing support |

Sequence Diagram

sequenceDiagram
    participant SA as SourceAnalyzer
    participant JSA as JavaScriptAnalyzer
    participant TS as Tree-sitter Parser
    participant G as Code Graph

    SA->>JSA: Analyze JavaScript file
    JSA->>TS: Parse source code
    TS-->>JSA: Return AST
    JSA->>JSA: First pass: Extract functions/classes
    JSA->>G: Add function/class entities
    JSA->>TS: Parse source code again
    TS-->>JSA: Return AST
    JSA->>JSA: Second pass: Link function calls
    JSA->>G: Establish function relationships

Assessment against linked issues

Objective Addressed Explanation
Add support for JavaScript Code-Graph generation
Use tree-sitter for JavaScript parsing

Poem

🐰 A JavaScript adventure begins today,
Code graphs dancing in a playful array,
Tree-sitter parsing with rabbit-like glee,
Functions and classes now set free!
Hop, hop, hooray for code insight! 🚀



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
api/analyzers/javascript/analyzer.py (3)

22-71: Enhance function declaration processing and handle arrow functions.

Currently, process_function_declaration only captures standard function declarations. If you'd like to capture arrow functions or function expressions, consider expanding the query or logic. Also, note that using find_child_of_type(node, 'identifier') might skip function declarations without a named identifier (like anonymous functions).

🧰 Tools
🪛 Ruff (0.8.2)

22-22: Function may be undefined, or defined from star imports

(F405)


35-35: find_child_of_type may be undefined, or defined from star imports

(F405)


45-45: find_child_of_type may be undefined, or defined from star imports

(F405)


62-62: Function may be undefined, or defined from star imports

(F405)
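One way to approach the arrow-function point in the 22-71 comment is to fall back to the variable an anonymous function is assigned to. The sketch below is not the PR's code: `StubNode` is a hypothetical stand-in for a tree-sitter node, `function_name` and `first_identifier` are illustrative helpers, and the node type names (`function_expression`, `arrow_function`, `variable_declarator`) are assumed from the tree-sitter-javascript grammar and should be checked against the installed version.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in exposing the node attributes used below."""
    type: str
    text: bytes = b""
    children: List["StubNode"] = field(default_factory=list)
    parent: Optional["StubNode"] = field(default=None, repr=False, compare=False)

    def add(self, child: "StubNode") -> "StubNode":
        """Attach a child and set its parent link."""
        child.parent = self
        self.children.append(child)
        return child


def first_identifier(node: StubNode) -> Optional[str]:
    """Return the text of the first direct `identifier` child, if any."""
    for child in node.children:
        if child.type == "identifier":
            return child.text.decode("utf-8")
    return None


def function_name(node: StubNode) -> Optional[str]:
    """Name a function node, falling back to the variable it is assigned to."""
    # An arrow function's direct identifier child is a parameter (x => x),
    # so only declarations and function expressions are checked directly.
    if node.type in ("function_declaration", "function_expression"):
        name = first_identifier(node)
        if name is not None:
            return name
    # const add = (a, b) => a + b
    #   -> variable_declarator(identifier "add", arrow_function)
    parent = node.parent
    if parent is not None and parent.type == "variable_declarator":
        return first_identifier(parent)
    return None
```

Functions passed inline as arguments or assigned to object properties still return `None` here and would need their own handling.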


72-102: Extend class declaration handling for inheritance.

This method correctly extracts the class name from the identifier child. You may want to handle extends clauses (e.g., class Foo extends Bar) or keep track of implemented interfaces in the future.

🧰 Tools
🪛 Ruff (0.8.2)

72-72: Class may be undefined, or defined from star imports

(F405)


85-85: find_child_of_type may be undefined, or defined from star imports

(F405)


99-99: Class may be undefined, or defined from star imports

(F405)
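For the `extends` suggestion in the 72-102 comment, a minimal sketch of reading the base class alongside the class name. `StubNode` and `class_name_and_base` are hypothetical names, and the `class_heritage` node type is assumed from the tree-sitter-javascript grammar rather than taken from this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StubNode:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    text: bytes = b""
    children: List["StubNode"] = field(default_factory=list)


def class_name_and_base(node: StubNode) -> Tuple[Optional[str], Optional[str]]:
    """Return (class name, base class expression) for a class declaration node."""
    name: Optional[str] = None
    base: Optional[str] = None
    for child in node.children:
        if child.type == "identifier" and name is None:
            name = child.text.decode("utf-8")
        elif child.type == "class_heritage":
            # class_heritage is assumed to hold the `extends` keyword
            # followed by the base expression.
            for part in child.children:
                if part.type != "extends":
                    base = part.text.decode("utf-8")
    return name, base
```

The base is returned as source text, so a member expression such as `React.Component` is kept whole rather than resolved to an entity.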


157-220: Protect against missing function entities and arrow function calls.

  1. The second pass currently assumes function declarations are always standard. Arrow functions won't be captured, so the calls might remain unresolved.
  2. assert(caller_f is not None) may crash if the function is somehow not recognized. Consider a safer check, logging a warning, or creating a placeholder entity to avoid halting the entire analysis.
-assert(caller_f is not None)
+if caller_f is None:
+    logger.warning(f"Caller function '{caller_name}' not found. Skipping relationship.")
+    continue
🧰 Tools
🪛 Ruff (0.8.2)

216-216: Function may be undefined, or defined from star imports

(F405)
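The warn-and-skip pattern suggested in the 157-220 comment can be sketched in isolation as follows. This is an illustration only: `link_calls` is a hypothetical helper, the `functions` dict stands in for the graph's lookup by name, and skipping unknown callees as well as unknown callers is an assumption, not something the PR states.

```python
import logging
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def link_calls(
    functions: Dict[str, int],
    calls: Iterable[Tuple[str, str]],
) -> List[Tuple[int, int]]:
    """Resolve (caller, callee) name pairs to ID pairs, skipping unknown names."""
    edges: List[Tuple[int, int]] = []
    for caller_name, callee_name in calls:
        caller_id = functions.get(caller_name)
        if caller_id is None:
            # Replaces the assert: one unresolved caller no longer halts the pass.
            logger.warning("Caller function %r not found. Skipping relationship.", caller_name)
            continue
        callee_id = functions.get(callee_name)
        if callee_id is None:
            logger.warning("Callee function %r not found. Skipping relationship.", callee_name)
            continue
        edges.append((caller_id, callee_id))
    return edges
```

Logging at warning level keeps unresolved names visible without aborting analysis of the remaining files.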

api/analyzers/utils.py (2)

25-38: Augment arrow function or unnamed function handling.

extract_js_function_name assumes there's an identifier child. Consider fallback logic for arrow or anonymous functions if needed (e.g., generating a placeholder name).
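A possible shape for the placeholder fallback, assuming the caller passes the node's zero-based `(row, column)` start position as tree-sitter's `start_point` reports it. `name_or_placeholder` and the `<anonymous@line:col>` format are hypothetical choices, not part of the PR.

```python
def name_or_placeholder(name: str, start_row: int, start_col: int) -> str:
    """Return `name`, or a position-based placeholder when the name is empty."""
    if name:
        return name
    # Rows are zero-based in tree-sitter; add 1 to match editor line numbers.
    return f"<anonymous@{start_row + 1}:{start_col}>"
```

Including the position keeps placeholders distinct within a file, which matters if the graph looks functions up by name.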


40-53: Handle anonymous or default-export classes.

Similar to functions, classes can sometimes be declared without a direct identifier (export default class, etc.). Consider a fallback name or a distinct approach for these cases.

api/analyzers/source_analyzer.py (1)

20-21: Consider broader JavaScript-related extensions.

Mapping .js to JavaScriptAnalyzer is a good start. If you plan to handle .mjs, .cjs, or .jsx, consider extending the analyzers dictionary or making it more flexible.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8ec9a4 and 754f412.

📒 Files selected for processing (4)
  • api/analyzers/javascript/analyzer.py (1 hunks)
  • api/analyzers/source_analyzer.py (2 hunks)
  • api/analyzers/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/analyzers/javascript/analyzer.py

3-3: from ..utils import * used; unable to detect undefined names

(F403)


5-5: from ...entities import * used; unable to detect undefined names

(F403)


22-22: Function may be undefined, or defined from star imports

(F405)


35-35: find_child_of_type may be undefined, or defined from star imports

(F405)


45-45: find_child_of_type may be undefined, or defined from star imports

(F405)


62-62: Function may be undefined, or defined from star imports

(F405)


72-72: Class may be undefined, or defined from star imports

(F405)


85-85: find_child_of_type may be undefined, or defined from star imports

(F405)


99-99: Class may be undefined, or defined from star imports

(F405)


123-123: File may be undefined, or defined from star imports

(F405)


216-216: Function may be undefined, or defined from star imports

(F405)

🔇 Additional comments (5)
api/analyzers/javascript/analyzer.py (3)

18-21: Analyzer constructor looks good.

The constructor properly initializes the Parser with JS_LANGUAGE. This is a succinct and clear approach.


103-156: Consider robust UTF-8 handling.

The code tries to decode the file content using UTF-8, which might fail if the file is in another encoding. This fallback is fine in many cases, but you might want to log or handle suspected encoding issues more explicitly if supporting diverse codebases.

🧰 Tools
🪛 Ruff (0.8.2)

123-123: File may be undefined, or defined from star imports

(F405)
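For the encoding concern in the 103-156 comment, one option is to validate the bytes up front and log when a file is not valid UTF-8. This is a sketch under the assumption that the analyzer has the raw bytes available; `ensure_utf8` is a hypothetical helper, not code from the PR.

```python
import logging

logger = logging.getLogger(__name__)


def ensure_utf8(raw: bytes, path: str) -> bytes:
    """Return `raw` if it is valid UTF-8, otherwise a copy with invalid bytes replaced."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError as exc:
        logger.warning("%s is not valid UTF-8 (%s); replacing invalid bytes", path, exc)
        # U+FFFD takes three bytes in UTF-8, so byte offsets after a
        # replaced sequence no longer match the file on disk.
        return raw.decode("utf-8", errors="replace").encode("utf-8")
```

Because replacement shifts byte offsets, source positions recorded for such files would be approximate.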


10-13: Confirm Tree-sitter JavaScript library availability.

The import statements import tree_sitter_javascript as tsjs and Language, Parser from tree_sitter look correct for parsing JavaScript. Just verify that the pyproject.toml changes (adding tree-sitter-javascript) are indeed installed in your environment.

✅ Verification successful

tree-sitter-javascript dependency is properly configured

The pyproject.toml file shows that tree-sitter-javascript version ^0.23.2 is correctly listed as a dependency in the project's Poetry configuration. This confirms that the import statement import tree_sitter_javascript as tsjs will work as expected when the dependencies are installed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Ensure tree-sitter-javascript is installed in the current environment
pip freeze | grep "tree-sitter-javascript"

Length of output: 136


Script:

#!/bin/bash
# Check if pyproject.toml exists and contains tree-sitter-javascript dependency
fd pyproject.toml --exec cat {}

Length of output: 877


Script:

#!/bin/bash
# Check if requirements.txt exists and contains tree-sitter-javascript dependency
fd requirements.txt --exec cat {}

Length of output: 65574


Script:

#!/bin/bash
# Check if setup.py exists and contains tree-sitter-javascript dependency
fd setup.py --exec cat {}

Length of output: 29


Script:

#!/bin/bash
# Look for any other Python dependency files
fd ".*requirements.*\.txt|poetry\.lock|Pipfile" --exec cat {}

Length of output: 65602

api/analyzers/source_analyzer.py (1)

11-11: JavaScript analyzer import addition confirmed.

Adding JavaScriptAnalyzer aligns with the pull request objective to support .js files.

pyproject.toml (1)

16-16: Dependency addition looks appropriate.

tree-sitter-javascript = "^0.23.2" is correctly specified. Validate that this version meets any advanced language feature requirements you may want to parse (e.g., newer JS syntax).

✅ Verification successful

Version 0.23.2 is appropriate for the JavaScript parsing requirements

Based on the code analysis, the tree-sitter-javascript dependency is used for basic JavaScript AST parsing, focusing on:

  • Function declarations and their parameters
  • Class declarations
  • Function call expressions
  • Basic identifier extraction

The specified version (^0.23.2) is actually newer than the latest PyPI version (0.23.1) and supports all the JavaScript syntax features currently being used in the analyzer. The codebase doesn't use any advanced JavaScript features that would require a newer version, as it's primarily focused on fundamental syntax elements like functions, classes, and method calls.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the newly added dependency is at the latest stable version
curl -s https://pypi.org/pypi/tree-sitter-javascript/json | jq '.info.version'

Length of output: 87


Script:

#!/bin/bash
# Check if there are any JavaScript files in the repository to understand the JS features being used
fd -e js -x head -n 50

Length of output: 24


Script:

#!/bin/bash
# Look for any configuration files that might specify JavaScript version/features
fd -g '*babel*' -g '*eslint*' -g '*tsconfig*' -g '*jsconfig*'

Length of output: 273


Script:

#!/bin/bash
# Check if tree-sitter-javascript is already in use and how
rg -l "tree-sitter-javascript"

Length of output: 57


Script:

#!/bin/bash
# Check the content of poetry.lock to see how tree-sitter-javascript is being used
cat poetry.lock | grep -A 10 "tree-sitter-javascript"

Length of output: 678


Script:

#!/bin/bash
# Look for Python files that might be using tree-sitter-javascript
rg -l "from tree_sitter" 

Length of output: 147


Script:

#!/bin/bash
# Check how tree-sitter-javascript is being used in the JavaScript analyzer
cat api/analyzers/javascript/analyzer.py

Length of output: 7853


Script:

#!/bin/bash
# Check utils.py to understand the common tree-sitter usage patterns
cat api/analyzers/utils.py

Length of output: 1429

Comment on lines +1 to +3
import io
import os
from ..utils import *

🛠️ Refactor suggestion

Avoid import * from relatively unknown modules.

Using star imports (from ..utils import * and from ...entities import *) can cause namespace conflicts and obfuscate which names are actually used. Consider importing only the required objects or using explicit imports to maintain clearer code and prevent accidental overrides.

-from ..utils import *
-from ...entities import *
+from ..utils import find_child_of_type  # and any other required imports
+from ...entities import File, Function, Class  # or whichever entities are actually used

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

3-3: from ..utils import * used; unable to detect undefined names

(F403)

Comment on lines +25 to +53
def extract_js_function_name(node: Node) -> str:
    """
    Extract the function name from a JavaScript function node.

    Args:
        node (Node): The AST node representing the function.

    Returns:
        str: The name of the function.
    """
    for child in node.children:
        if child.type == 'identifier':
            return child.text.decode('utf-8')
    return ''

def extract_js_class_name(node: Node) -> str:
    """
    Extract the class name from a JavaScript class node.

    Args:
        node (Node): The AST node representing the class.

    Returns:
        str: The name of the class.
    """
    for child in node.children:
        if child.type == 'identifier':
            return child.text.decode('utf-8')
    return ''
Contributor

Language-specific utilities shouldn't be added to utils.

Comment on lines +104 to +114
        """
        Perform the first pass processing of a JavaScript source file.

        Args:
            path (Path): The path to the JavaScript source file.
            f (io.TextIOWrapper): The file object representing the opened JavaScript source file.
            graph (Graph): The Graph object where entities will be added.

        Returns:
            None
        """
Contributor

The comment should say what the function actually does; "processing JavaScript file" is too general.
Specify which entities are extracted.
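A docstring along these lines might address the request. It is a sketch only: the entity names (File, Function, Class) are taken from the names flagged elsewhere in this review, and the exact behavior described is an assumption that should be checked against the implementation. The body is omitted.

```python
def first_pass(path, f, graph) -> None:
    """
    First pass over a JavaScript source file: collect declarations.

    Parses the file with tree-sitter, adds a File entity for `path`, then adds
    a Function entity for every function declaration and a Class entity for
    every class declaration found in the syntax tree. Relationships between
    entities, such as function calls, are left to the second pass.

    Args:
        path (Path): Path to the JavaScript source file.
        f (io.TextIOWrapper): Open file object for the source file.
        graph (Graph): Graph that receives the File, Function, and Class entities.
    """
```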

Comment on lines +182 to +188
        try:
            # Parse file
            content = f.read()
            tree = self.parser.parse(content)
        except Exception as e:
            logger.error(f"Failed to process file {path}: {e}")
            return
Contributor

I think this is a bit of a waste; we've already read and parsed the file on the first pass.
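One way to avoid the second read and parse is to keep the tree from the first pass and hand it to the second. The sketch below assumes both passes run in the same process; `ParseCache` and its method names are hypothetical, and `parse` stands in for a call such as the analyzer's parser.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class ParseCache(Generic[T]):
    """Keeps first-pass syntax trees so the second pass does not re-parse."""

    def __init__(self, parse: Callable[[bytes], T]) -> None:
        self._parse = parse
        self._trees: Dict[Path, T] = {}

    def parse(self, path: Path, content: bytes) -> T:
        """First pass: parse `content` and remember the tree for `path`."""
        tree = self._parse(content)
        self._trees[path] = tree
        return tree

    def take(self, path: Path) -> Optional[T]:
        """Second pass: return the cached tree and release it, or None if absent."""
        return self._trees.pop(path, None)
```

The trade-off is memory: every tree stays alive until the second pass reaches its file, which may matter for very large repositories.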

@gkorland gkorland marked this pull request as draft January 5, 2025 17:43
@gkorland gkorland requested a review from Copilot January 9, 2025 13:56
@Copilot Copilot AI left a comment

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

caller = function_def[0]
caller_name = caller.text.decode('utf-8')
caller_f = graph.get_function_by_name(caller_name)
assert(caller_f is not None)
Copilot AI Jan 9, 2025

Using assert in production code is unconventional. Consider handling this case more gracefully.

Suggested change
-assert(caller_f is not None)
+if caller_f is None:
+    logger.error(f'Caller function not found: {caller_name}')
+    continue

Successfully merging this pull request may close these issues.

Add support for JavaScript
2 participants