Include documentation on how to use training on Taskcluster in a Pull Request #3844

Merged (5 commits) on Nov 22, 2023
18 changes: 18 additions & 0 deletions README.md
@@ -97,6 +97,24 @@ To use a model to classify a given bug, you can run `python -m scripts.bug_class

**testing** To use the model to classify a given bug, you can run `python -m scripts.bug_classifier defect --bug-id ID_OF_A_BUG_FROM_BUGZILLA`.

### Training on Taskcluster (Mozilla's CI platform)

You can run the model training task on Taskcluster. To do so, include `Train on Taskcluster: <model name>` in the pull request description.

#### Example

To train the `spambug` model on Taskcluster, add the following line to the pull request description, ideally at the bottom:

```
Train on Taskcluster: spambug
```

There are a few things to consider when training a model on Taskcluster:

- This is currently only supported in GitHub pull requests.
- The training task re-runs every time you push to the branch linked to the pull request. To avoid unnecessary training runs and wasted resources, push sparingly, or temporarily remove the `Train on Taskcluster` keyword from the pull request description.
- Currently, the training task extracts only the model's name from the keyword. Any arguments that follow the name are not considered.
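
To illustrate the last point, here is a minimal sketch of how a model name could be pulled out of a pull request description. It is not the parsing code used by the Taskcluster task: the `KEYWORD_RE` pattern and the `extract_model_name` helper are hypothetical names introduced only for this example, and the real task may match the keyword differently.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the project's actual code):
# match the keyword at the start of a line and capture the first
# whitespace-delimited token after it as the model name.
KEYWORD_RE = re.compile(r"^Train on Taskcluster:[ \t]*(\S+)", re.MULTILINE)


def extract_model_name(pr_description: str) -> Optional[str]:
    """Return the model name that follows the keyword, or None if it is absent."""
    match = KEYWORD_RE.search(pr_description)
    return match.group(1) if match else None


if __name__ == "__main__":
    description = "Improve feature extraction.\n\nTrain on Taskcluster: spambug"
    print(extract_model_name(description))
```

In this sketch, anything after the first token on the keyword line is dropped, which mirrors the documented behavior that only the model's name is used.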

### Running the repository mining script

Note: This section is only necessary if you want to make changes to the repository mining script. Otherwise, you can simply use the commits data we generate automatically.