Document downloading data through BugBug (#3873)
suhaibmujahid authored Nov 30, 2023
1 parent 066746d commit cdd80bb
Showing 2 changed files with 56 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -15,6 +15,8 @@ More information on the Mozilla Hacks blog:
- https://hacks.mozilla.org/2020/07/testing-firefox-more-efficiently-with-machine-learning/
- https://hacks.mozilla.org/2019/04/teaching-machines-to-triage-firefox-bugs/

Data generated by BugBug to train the models can be used independently from BugBug. See the [docs](docs/data.md) for details.

## Classifiers

- **assignee** - The aim of this classifier is to suggest an appropriate assignee for a bug.
54 changes: 54 additions & 0 deletions docs/data.md
@@ -0,0 +1,54 @@
# Downloading Data Using BugBug

BugBug relies on various types of data, such as bugs, commits, issues, and crash reports, to build its models. Although all this data is publicly available through different APIs, retrieving it from those APIs every time a model is trained is inefficient. BugBug therefore keeps a copy of the data as compressed files that can be downloaded through a simple API.

> **Note:**
> You can use the data outside this project by using BugBug as a dependency (`pip install bugbug`).

## Bugzilla Bugs

```py
from bugbug import bugzilla, db

# Download the latest version of the dataset if it is not already downloaded
db.download(bugzilla.BUGS_DB)

# Iterate over all bugs in the dataset
for bug in bugzilla.get_bugs():
    # This is the same as if you retrieved the bug through the Bugzilla REST API:
    # https://bmo.readthedocs.io/en/latest/api/core/v1/bug.html
    print(bug["id"])
```
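
Because each bug is a plain dictionary shaped like a Bugzilla REST API response, ordinary Python tooling can be applied to the iterator. The following is only a minimal sketch: `count_by_component` is a hypothetical helper that is not part of BugBug, and the sample records are made-up stand-ins that assume the standard `product` and `component` fields are present on each bug.

```py
from collections import Counter
from typing import Iterable


def count_by_component(bugs: Iterable[dict]) -> Counter:
    """Count bugs per "product::component" pair (hypothetical helper)."""
    counts = Counter()
    for bug in bugs:
        # Assumes each bug dict carries the "product" and "component" fields.
        counts[f'{bug["product"]}::{bug["component"]}'] += 1
    return counts


# Made-up stand-in records so the sketch runs without downloading anything.
sample_bugs = [
    {"id": 1, "product": "Core", "component": "DOM"},
    {"id": 2, "product": "Core", "component": "DOM"},
    {"id": 3, "product": "Firefox", "component": "General"},
]

component_counts = count_by_component(sample_bugs)
print(component_counts.most_common(1))
```

With the dataset downloaded, the same helper should accept `bugzilla.get_bugs()` in place of `sample_bugs`, since it only needs an iterable of bug dictionaries.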

## Phabricator Revisions

```py
from bugbug import phabricator, db

# Download the latest version of the revisions dataset if it is not already downloaded
db.download(phabricator.REVISIONS_DB)

for revision in phabricator.get_revisions():
    # The revision here combines the results retrieved from two API endpoints:
    # https://phabricator.services.mozilla.com/conduit/method/differential.revision.search/
    # https://phabricator.services.mozilla.com/conduit/method/transaction.search/
    print(revision["id"])
```

## Repository Commits

```py
from bugbug import repository, db

# Download the latest version of the commits dataset if it is not already downloaded
db.download(repository.COMMITS_DB)

for commit in repository.get_commits():
    print(commit["node"])
```

## GitHub Issues

> _TODO_

## Mozilla Crash Reports

> _TODO_
