A simple queue worker that produces Mozilla code repository telemetry.
The bin/process-queue-messages script reads messages about Mozilla source code pushes from the Mozilla Pulse messaging service. It adds some data about the code review system used by the commit author and submits the data to telemetry.mozilla.org, where we can build nifty dashboards.
You will need to create an account on Mozilla Pulse to collect messages about hgpush events.
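For orientation, here is a minimal sketch of the read-classify-send loop, assuming kombu's SimpleQueue for the Pulse (AMQP) connection. The connection URL, queue name, and helper functions are placeholders rather than the project's actual code, and the binding of the queue to the hgpush exchange is omitted.

from kombu import Connection

PULSE_URL = "amqps://PULSE_USER:PULSE_PASSWORD@pulse.mozilla.org:5671"  # placeholder credentials

def classify_review_system(push):
    # Placeholder: the real worker inspects the pushed commits to decide
    # which code review system the author used.
    return "unknown"

def send_ping(payload):
    # Placeholder: the real worker submits a ping matching the hgpush
    # ping schema to telemetry.mozilla.org.
    print("would send:", payload)

with Connection(PULSE_URL, ssl=True) as connection:
    # Pulse queue names are namespaced per user; this one is made up.
    queue = connection.SimpleQueue("queue/PULSE_USER/hgpush")
    # Raises Empty if nothing arrives within the timeout.
    message = queue.get(block=True, timeout=10)
    send_ping({"review_system": classify_review_system(message.payload)})
    message.ack()
    queue.close()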
These programs were designed to run on Heroku and follow the Heroku Architectural Principles. They read their settings from environment variables.
See the file dotenv.example.txt in the project root for possible values. Copy it and fill in the values that must be present in your local and/or Heroku execution environment:
$ cp dotenv.example.txt .env
$ vim .env
# Add your personal environment's configuration
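For illustration only, a hypothetical .env fragment showing just the two Pulse queue settings mentioned later in this document, with placeholder values; dotenv.example.txt is the authoritative list of settings.

PULSE_QUEUE_NAME=your-queue-name
PULSE_QUEUE_ROUTING_KEY=some/routing/key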
Run the following command to check that everything works. It won't send any data:
$ PYTHONPATH=. bin/process-queue-messages --no-send
$ PYTHONPATH=. bin/process-queue-messages
Read all push messages from the hgpush event queue, figure out which review system was used for each, and send the result to telemetry.mozilla.org.
Use --help for full command usage info.
Use --debug for full command debug output.
Use --no-send to gather all the data and build a payload, but not send any real pings. All push event messages remain in their queues, too. This is great for testing changes or diagnosing problems against a live queue.
$ PYTHONPATH=. bin/dump-telemetry SOME_COMMIT_SHA1
Calculate and print the ping that would have been sent to telemetry.mozilla.org for a given changeset ID. This command does not send any data to telemetry.mozilla.org. Useful for debugging troublesome changesets and testing service connectivity.
Use --help for full command usage info.
Use --debug for full command debug output.
$ PYTHONPATH=. bin/backfill-pushlog REPO_URL STARTING_PUSHID ENDING_PUSHID
Read the Mercurial repository pushlog at REPO_URL, fetch all pushes from STARTING_PUSHID to ENDING_PUSHID, then calculate and publish their telemetry. This can be used to back-fill pushes missed by service gaps.
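As a rough sketch of what a back-fill involves, pushes can be fetched from the Mercurial pushlog's json-pushes API. The repository URL and push ID range below are examples, and the fields the real script uses may differ.

import requests

REPO_URL = "https://hg.mozilla.org/integration/mozilla-inbound"  # example repo
START_ID, END_ID = 1000, 1010  # example push ID range

# version=2 of the pushlog API returns {"pushes": {push_id: {...}}, "lastpushid": N}
response = requests.get(
    REPO_URL + "/json-pushes",
    params={"version": 2, "startID": START_ID, "endID": END_ID},
)
response.raise_for_status()

for push_id, push in sorted(response.json()["pushes"].items(), key=lambda item: int(item[0])):
    # Each push lists the changeset hashes it contains; a real back-fill
    # builds and sends a telemetry ping for each push.
    print(push_id, push["user"], push["changesets"])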
Use --help for full command usage info.
Use --debug for full command debug output.
Use --no-send to gather all the data and build a payload, but not send any real pings.
Use pyenv to install the same Python version listed in the project's .python-version file:
$ pyenv install
Set up a virtual environment (e.g. with pyenv virtualenv) and install the project development dependencies:
$ pip install -r requirements.txt -r dev-requirements.txt
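If you use the pyenv-virtualenv plugin mentioned above, the virtual environment can be created and activated like this before installing the dependencies (the environment name is only an example):
$ pyenv virtualenv hgpush-telemetry
$ pyenv activate hgpush-telemetry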
Code formatting is done with black.
requirements.txt and dev-requirements.txt are updated using hashin.
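For example (the package names and versions passed to hashin below are illustrative):
$ black .
$ hashin -r requirements.txt some-package==1.0.0
$ hashin -r dev-requirements.txt some-dev-tool==1.0.0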
Push event messages are read from a Pulse message queue. You can inspect a live hgpush message queue with Pulse Inspector.
Messages use the hgpush message format.
Push events are generated from the mercurial repo pushlogs.
Pings (telemetry data) are sent to TMO using the hgpush ping schema. Make sure you match the schema or your pings will be dropped!
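One way to catch mismatches early is to validate a payload locally with the jsonschema package before sending anything; the schema file name below is an assumption, so point it at whatever copy of the hgpush ping schema you have.

import json
import jsonschema

# Load a local copy of the hgpush ping schema; the file name is an assumption.
with open("hgpush.schema.json") as schema_file:
    schema = json.load(schema_file)

# Replace this with a real payload, e.g. the output of bin/dump-telemetry.
ping = {"example": "payload"}

# Raises jsonschema.ValidationError if the payload does not match the schema.
jsonschema.validate(instance=ping, schema=schema)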
The unit test suite can be run with py.test.
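For example, from the project root (with PYTHONPATH set the same way as for the scripts above):
$ PYTHONPATH=. py.test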
Manual testing can be done with:
$ PYTHONPATH=. bin/dump-telemetry --debug <SOME_CHANGESET_SHA>
and
$ PYTHONPATH=. bin/process-queue-messages --no-send --debug
If you need a message queue with a lot of traffic for testing, you may want to listen for messages on integration/mozilla-inbound. To switch the message queue, set the following environment variables:
PULSE_QUEUE_NAME=hgpush-inbound-test-queue
PULSE_QUEUE_ROUTING_KEY=integration/mozilla-inbound
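For a one-off test run you can also set them for a single invocation, combined with the --no-send and --debug flags described above:
$ PULSE_QUEUE_NAME=hgpush-inbound-test-queue PULSE_QUEUE_ROUTING_KEY=integration/mozilla-inbound PYTHONPATH=. bin/process-queue-messages --no-send --debug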
After deploying a schema change, check these monitors:
- Graph of all pings for the last 8 days (successes and failures)
- List of the last 10 ingested pings (both successful and rejected)
- Reason for the last 10 ping rejections
You can also write custom monitors using hand-crafted CEP dashboards.
Ask in #datapipeline on IRC if you need help with this.