Document downloading data through BugBug #3873

Merged: 3 commits, Nov 30, 2023
2 changes: 2 additions & 0 deletions README.md
@@ -15,6 +15,8 @@ More information on the Mozilla Hacks blog:
- https://hacks.mozilla.org/2020/07/testing-firefox-more-efficiently-with-machine-learning/
- https://hacks.mozilla.org/2019/04/teaching-machines-to-triage-firefox-bugs/

Data generated by BugBug to train its models can be used independently of BugBug. See the [docs](docs/data.md) for details.

## Classifiers

- **assignee** - The aim of this classifier is to suggest an appropriate assignee for a bug.
54 changes: 54 additions & 0 deletions docs/data.md
@@ -0,0 +1,54 @@
# Downloading Data Using BugBug

BugBug relies on various types of data, such as bugs, commits, issues, and crash reports, to build its models. Although all this data is publicly available through different APIs, retrieving it from those APIs every time we train a model would be slow and wasteful. Instead, BugBug keeps a copy of the data as compressed files that can be downloaded through a simple API.

> **Note:**
> You can use the data outside this project by installing BugBug as a dependency (`pip install bugbug`).
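For context, the sketch below shows one way to read such a dataset file directly with the Python standard library. It assumes the file has already been decompressed and stores one JSON object per line (the JSON Lines layout); that layout, the `iter_records` helper, and the sample file are all hypothetical illustrations. In practice the `get_bugs()`, `get_revisions()`, and `get_commits()` helpers in the sections that follow take care of reading the data for you.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line of a JSON Lines file.

    Assumption (hypothetical): the downloaded dataset has already been
    decompressed and stores each bug/revision/commit as one JSON object per line.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


# Usage: write a tiny hypothetical dataset to a temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"id": 1}\n{"id": 2}\n\n')
    sample_path = tmp.name
ids = [record["id"] for record in iter_records(sample_path)]
os.remove(sample_path)
print(ids)  # [1, 2]
```

Reading lazily with a generator keeps memory use flat, which matters for datasets as large as the full bug history.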

## Bugzilla Bugs

```py
from bugbug import bugzilla, db

# Download the latest version of the dataset if it is not already downloaded
db.download(bugzilla.BUGS_DB)

# Iterate over all bugs in the dataset
for bug in bugzilla.get_bugs():
    # Each bug has the same structure as one retrieved through the Bugzilla REST API:
    # https://bmo.readthedocs.io/en/latest/api/core/v1/bug.html
    print(bug["id"])
```
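Because each bug is a plain dictionary, ordinary Python is enough to analyse the dataset. The sketch below counts bugs resolved as `FIXED` per product and component. It assumes the `product`, `component`, and `resolution` fields documented for the Bugzilla REST API are present on each bug; the `count_fixed_by_component` helper and the sample bugs are hypothetical. With the real data you would pass `bugzilla.get_bugs()` instead of the sample list.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_fixed_by_component(bugs: Iterable[Dict[str, Any]]) -> Counter:
    """Count bugs resolved as FIXED, keyed by (product, component).

    Assumption: each bug dict carries the "product", "component" and
    "resolution" fields returned by the Bugzilla REST API.
    """
    counts: Counter = Counter()
    for bug in bugs:
        if bug.get("resolution") == "FIXED":
            counts[(bug.get("product"), bug.get("component"))] += 1
    return counts


# Hypothetical sample data; replace with bugzilla.get_bugs() for the real dataset.
sample_bugs = [
    {"id": 1, "product": "Firefox", "component": "General", "resolution": "FIXED"},
    {"id": 2, "product": "Firefox", "component": "General", "resolution": "DUPLICATE"},
    {"id": 3, "product": "Core", "component": "DOM: Core & HTML", "resolution": "FIXED"},
    {"id": 4, "product": "Firefox", "component": "General", "resolution": "FIXED"},
]
counts = count_fixed_by_component(sample_bugs)
print(counts.most_common(1))  # [(('Firefox', 'General'), 2)]
```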

## Phabricator Revisions

```py
from bugbug import db, phabricator

# Download the latest version of the dataset if it is not already downloaded
db.download(phabricator.REVISIONS_DB)

# Iterate over all revisions in the dataset
for revision in phabricator.get_revisions():
    # Each revision combines the results retrieved from two API endpoints:
    # https://phabricator.services.mozilla.com/conduit/method/differential.revision.search/
    # https://phabricator.services.mozilla.com/conduit/method/transaction.search/
    print(revision["id"])
```
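Since revisions are built from Conduit responses, their fields can be read like any other dictionary. The sketch below groups revision IDs by status. It assumes each revision keeps the `fields` object from `differential.revision.search`, with a `status` entry whose `value` is a string such as `"published"`; check a real record before relying on that layout. The `group_by_status` helper and the sample revisions are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_status(revisions: Iterable[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Map each revision status value to the IDs of revisions with that status.

    Assumption: revision["fields"]["status"]["value"] holds the status, as in
    the differential.revision.search response.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for revision in revisions:
        status = revision["fields"]["status"]["value"]
        groups[status].append(revision["id"])
    return dict(groups)


# Hypothetical sample data; replace with phabricator.get_revisions() for the real dataset.
sample_revisions = [
    {"id": 101, "fields": {"status": {"value": "published"}}},
    {"id": 102, "fields": {"status": {"value": "needs-review"}}},
    {"id": 103, "fields": {"status": {"value": "published"}}},
]
print(group_by_status(sample_revisions))
# {'published': [101, 103], 'needs-review': [102]}
```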

## Repository Commits

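```py
from bugbug import db, repository

# Download the latest version of the dataset if it is not already downloaded
db.download(repository.COMMITS_DB)

# Iterate over all commits in the dataset
for commit in repository.get_commits():
    # "node" is the commit hash
    print(commit["node"])
```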
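Commits can be linked back to bugs in a similar way. The sketch below maps bug IDs to commit hashes. It assumes each commit dict has a `bug_id` key that is missing or empty for commits not tied to a bug; that key name is an assumption about the schema, so inspect a real record first. The `commits_by_bug` helper and the sample commits are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def commits_by_bug(commits: Iterable[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map each bug ID to the hashes ("node") of the commits that reference it.

    Assumption: commit["bug_id"] holds the referenced bug ID and is missing
    or falsy (e.g. None or 0) when the commit is not linked to a bug.
    """
    by_bug: Dict[int, List[str]] = defaultdict(list)
    for commit in commits:
        bug_id = commit.get("bug_id")
        if bug_id:  # skip commits without a linked bug
            by_bug[bug_id].append(commit["node"])
    return dict(by_bug)


# Hypothetical sample data; replace with repository.get_commits() for the real dataset.
sample_commits = [
    {"node": "a1b2c3", "bug_id": 1234},
    {"node": "d4e5f6", "bug_id": None},
    {"node": "0789ab", "bug_id": 1234},
    {"node": "cdef01", "bug_id": 5678},
]
print(commits_by_bug(sample_commits))
# {1234: ['a1b2c3', '0789ab'], 5678: ['cdef01']}
```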

## GitHub Issues

> _TODO_

## Mozilla Crash Reports

> _TODO_