Blocking events
Changes:
* Refactor verdicts into the experiment_results and blocking_events model for the websites experiment
* Add version information to CLI tool
* Add support for older Python versions (>=3.7)
* Add support for running tests under CI
hellais authored Nov 7, 2022
1 parent 0646d24 commit 7b41136
Showing 23 changed files with 1,827 additions and 1,938 deletions.
56 changes: 56 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,56 @@
name: Tests
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - '**'
jobs:
  Tests:
    name: ${{ matrix.os }} / ${{ matrix.python-version }}
    runs-on: ${{ matrix.os }}-latest
    strategy:
      matrix:
        os: [Ubuntu, MacOS]
        python-version: [3.7, 3.8, 3.9, "3.10"]
    defaults:
      run:
        shell: bash

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Get full Python version
        id: full-python-version
        run: |
          echo ::set-output name=version::$(python -c "import sys; print('-'.join(str(v) for v in sys.version_info))")
      - name: Install poetry
        run: |
          curl -fsS https://install.python-poetry.org | python - --preview -y
      - name: Add poetry to PATH
        run: echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Set up cache
        uses: actions/cache@v3
        id: cache
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('**/poetry.lock') }}

      - name: Install dependencies
        run: poetry install

      - name: Run all tests
        run: poetry run pytest --cov=./ --cov-report=xml -q tests

      - name: Upload coverage to codecov
        uses: codecov/codecov-action@v3
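
The "Get full Python version" step makes the cache key specific to the exact interpreter: `sys.version_info` includes the micro version, release level and serial, so a patch-level upgrade of the runner's Python should produce a new key, and a `.venv` built against the old interpreter is presumably not restored. The Python sketch below reproduces the step's expression and the `key:` template; `runner_os` and `lock_hash` are hypothetical stand-ins for `runner.os` and `hashFiles('**/poetry.lock')`, whose hashing it does not reproduce.

```python
import sys
from typing import Iterable


def full_python_version(version_info: Iterable[object] = sys.version_info) -> str:
    # Same expression the workflow step runs:
    #   '-'.join(str(v) for v in sys.version_info)
    return "-".join(str(v) for v in version_info)


def venv_cache_key(runner_os: str, python_version: str, lock_hash: str) -> str:
    # Mirrors the `key:` template of the "Set up cache" step.
    # lock_hash is a placeholder for hashFiles('**/poetry.lock').
    return f"venv-{runner_os}-{python_version}-{lock_hash}"


print(full_python_version((3, 10, 8, "final", 0)))  # 3-10-8-final-0
```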

10 changes: 6 additions & 4 deletions .gitignore
@@ -1,7 +1,9 @@
__pycache__
/.coverage
/output
/asdata
/geoip
/.coverage*
/coverage.xml
/tests/data/datadir/*
/tests/data/measurements/*
/dist
/datadir
/output
/attic
104 changes: 83 additions & 21 deletions Readme.md
@@ -49,35 +49,97 @@ graph TD
MsmtProcessor --> HTTPObservations
```


The `measurement_processor` stage can be run either in a streaming fashion as
measurements are uploaded to the collector or in batch mode by reprocessing
existing raw measurements.

### Verdict generation
```mermaid
graph LR
P((Probe)) --> M{{Measurement}}
BE --> P
M --> PL[(Analysis)]
PL --> O{{Observations}}
O --> PL
PL --> BE{{ExperimentResult}}
BE --> E((Explorer))
O --> E
```

### ExperimentResult generation

A verdict is the result of interpreting one or more network observations
collected within a particular testing session. For example, a verdict could
conceptually look like "we think there's interference with TLS handshakes
to 8.8.4.4:443 using dns.google as the SNI in country ZZ and AS0".
The data flow of the blocking event generation pipeline looks as follows:
```mermaid
classDiagram
direction RL
ExperimentResult --* WebsiteExperimentResult
ExperimentResult --* WhatsAppExperimentResult
ExperimentResult : +String measurement_uid
ExperimentResult : +datetime timestamp
ExperimentResult : +int probe_asn
ExperimentResult : +String probe_cc
ExperimentResult : +String network_type
ExperimentResult : +struct resolver
ExperimentResult : +List[str] observation_ids
ExperimentResult : +List[BlockingEvent] blocking_events
ExperimentResult : +float ok_confidence
ExperimentResult : +bool anomaly
ExperimentResult : +bool confirmed
class WebsiteExperimentResult {
+String domain_name
+String website_name
}
An important component of verdict generation is having some form of baseline to
establish ground truth. This is necessary to determine whether the network
condition we are seeing is the result of the target being offline or the
result of blocking.
class WhatsAppExperimentResult {
+float web_ok_confidence
+String web_blocking_detail
+float registration_ok_confidence
+String registration_blocking_detail
+float endpoints_ok_confidence
+String endpoints_blocking_detail
}
class BlockingEvent {
blocking_type: +BlockingType
blocking_subject: +String
blocking_detail: +String
blocking_meta: +json
confidence: +float
}
class BlockingType {
<<enumeration>>
OK
BLOCKED
NATIONAL_BLOCK
ISP_BLOCK
LOCAL_BLOCK
SERVER_SIDE_BLOCK
DOWN
THROTTLING
}
```
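
The class diagram can be mirrored with Python dataclasses. This is a sketch under stated assumptions, not the oonidata implementation: field names follow the diagram, while the enum's string values, modeling the `struct` and `json` fields as plain dicts, and treating `WebsiteExperimentResult` as a subclass (the diagram draws `--*`) are all assumptions. The example event encodes the TLS verdict described above; its `blocking_detail` string and confidence value are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List


class BlockingType(Enum):
    # Members follow the diagram; the string values are an assumption.
    OK = "ok"
    BLOCKED = "blocked"
    NATIONAL_BLOCK = "national_block"
    ISP_BLOCK = "isp_block"
    LOCAL_BLOCK = "local_block"
    SERVER_SIDE_BLOCK = "server_side_block"
    DOWN = "down"
    THROTTLING = "throttling"


@dataclass
class BlockingEvent:
    blocking_type: BlockingType
    blocking_subject: str          # what is interfered with, e.g. "8.8.4.4:443"
    blocking_detail: str
    blocking_meta: Dict[str, Any]  # the diagram's "json" field, as a plain dict
    confidence: float


@dataclass
class ExperimentResult:
    measurement_uid: str
    timestamp: datetime
    probe_asn: int
    probe_cc: str
    network_type: str
    resolver: Dict[str, Any]       # the diagram's "struct" field, as a plain dict
    observation_ids: List[str]
    blocking_events: List[BlockingEvent]
    ok_confidence: float
    anomaly: bool
    confirmed: bool


@dataclass
class WebsiteExperimentResult(ExperimentResult):
    # Subclassing is an assumption about how the diagram's relation is meant.
    domain_name: str
    website_name: str


# Hypothetical event for "interference with TLS handshakes to 8.8.4.4:443
# using dns.google as the SNI"; detail string and confidence are made up.
event = BlockingEvent(
    blocking_type=BlockingType.BLOCKED,
    blocking_subject="8.8.4.4:443",
    blocking_detail="tls.connection_reset",
    blocking_meta={"sni": "dns.google"},
    confidence=0.8,
)
```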

The data flow of the verdict generation pipeline looks as follows:
```mermaid
graph TD
IPInfoDB[(IPInfoDB)] --> VerdictGenerator
FingerprintDB[(FingerprintDB)] --> VerdictGenerator
Observations --> GrouperTimeTarget[/"GROUP BY time_interval, target"/]
GrouperTimeTarget --> BaselineGenerator{{"baseline_generator()"}}
GrouperTimeTarget --> GrouperSession[/"GROUP BY session_id"/]
BaselineGenerator --> Baselines
Baselines --> VerdictGenerator
GrouperSession --> VerdictGenerator{{"verdict_generator()"}}
VerdictGenerator --> Verdicts
graph
M{{Measurement}} --> OGEN[[observationGen]]
OGEN --> |many| O{{Observations}}
O --> CGEN[[controlGen]]
O --> ODB[(ObservationDB)]
ODB --> CGEN
CGEN --> |many| CTRL{{Controls}}
CTRL --> A[[Analysis]]
FDB[(FingerprintDB)] --> A
NDB[(NetInfoDB)] --> A
O --> A
A --> |one| ER{{ExperimentResult}}
ER --> |many| BE{{BlockingEvents}}
```
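
The data-flow graph generates controls separately from analysis, and the baseline text above explains why: without ground truth, a failure cannot be told apart from the target being offline. A minimal Python sketch of that idea follows, assuming drastically simplified observations (an ASN, a target and a success flag). `Observation`, `control_gen()` and `analyze()` are hypothetical names, not oonidata's API; the sketch ignores time windows, FingerprintDB and NetInfoDB, and its 0.5 threshold and confidence formula are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    # Hypothetical, heavily simplified observation: did probing `target`
    # from network `probe_asn` succeed?
    probe_asn: int
    target: str
    success: bool


def control_gen(observation_db: List[Observation], obs: Observation) -> List[Observation]:
    """Hypothetical controlGen: the same target, observed from other networks."""
    return [
        o for o in observation_db
        if o.target == obs.target and o.probe_asn != obs.probe_asn
    ]


def analyze(obs: Observation, controls: List[Observation]) -> Optional[Tuple[str, float]]:
    """Return (blocking_type, confidence), or None when there is no baseline."""
    if obs.success:
        return ("OK", 1.0)
    if not controls:
        # No ground truth: the failure could be blocking or an offline target.
        return None
    ok_ratio = sum(c.success for c in controls) / len(controls)
    if ok_ratio >= 0.5:
        # Reachable from most other networks: the local failure looks like blocking.
        return ("BLOCKED", ok_ratio)
    # Failing from most other networks too: the target is most likely down.
    return ("DOWN", 1.0 - ok_ratio)


db = [
    Observation(1, "example.com", False),
    Observation(2, "example.com", True),
    Observation(3, "example.com", True),
]
print(analyze(db[0], control_gen(db, db[0])))  # ('BLOCKED', 1.0)
```

Returning `None` rather than guessing keeps "no baseline" distinct from a low-confidence verdict.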

Some precautions need to be taken when running the `verdict_generator()` in
6 changes: 6 additions & 0 deletions oonidata/__init__.py
@@ -0,0 +1,6 @@
try:
    import importlib.metadata as importlib_metadata
except ModuleNotFoundError:
    import importlib_metadata

__version__ = importlib_metadata.version(__name__)
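
`__version__` is read from the installed distribution's metadata, which the commit pairs with adding version information to the CLI tool. The CLI code itself is not shown, so the following is a hedged sketch assuming an argparse-based entry point; `get_version()`, `build_parser()` and the `0.0.0+unknown` fallback are hypothetical. It keeps the same stdlib/backport import fallback as `__init__.py`, and catches `PackageNotFoundError`, which `version()` raises when the distribution is not installed (for example, in an uninstalled source checkout).

```python
import argparse

try:  # Python >= 3.8 ships importlib.metadata in the standard library
    from importlib.metadata import PackageNotFoundError, version
except ModuleNotFoundError:  # Python 3.7: use the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def get_version(dist_name: str) -> str:
    """Return the installed version of dist_name, or a placeholder if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Hypothetical fallback for running from a source checkout.
        return "0.0.0+unknown"


def build_parser(dist_name: str) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=dist_name)
    # action="version" prints the formatted string to stdout and exits with 0.
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {get_version(dist_name)}",
    )
    return parser
```

With `action="version"`, argparse handles both printing and exiting, so the CLI needs no custom `--version` logic.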