Blocking events

Changes:

* Refactor verdicts into the experiment_results and blocking_events model for the websites experiment
* Add version information to the CLI tool
* Add support for older Python versions (>=3.7)
* Add support for running tests under CI
ooni · Nov 7, 2022 · 7b41136
1 parent 0646d24
Showing 23 changed files with 1,827 additions and 1,938 deletions.
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -0,0 +1,56 @@
+name: Tests
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - '**'
+jobs:
+  Tests:
+    name: ${{ matrix.os }} / ${{ matrix.python-version }}
+    runs-on: ${{ matrix.os }}-latest
+    strategy:
+      matrix:
+        os: [Ubuntu, MacOS]
+        python-version: [3.7, 3.8, 3.9, "3.10"]
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Get full Python version
+        id: full-python-version
+        run: |
+          echo ::set-output name=version::$(python -c "import sys; print('-'.join(str(v) for v in sys.version_info))")
+
+      - name: Install poetry
+        run: |
+          curl -fsS https://install.python-poetry.org | python - --preview -y
+
+      - name: Add poetry to PATH
+        run: echo "$HOME/.local/bin" >> $GITHUB_PATH
+
+      - name: Set up cache
+        uses: actions/cache@v3
+        id: cache
+        with:
+          path: .venv
+          key: venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('**/poetry.lock') }}
+
+      - name: Install dependencies
+        run: poetry install
+
+      - name: Run all tests
+        run: poetry run pytest --cov=./ --cov-report=xml -q tests
+
+      - name: Upload coverage to codecov
+        uses: codecov/codecov-action@v3
+
diff --git a/.gitignore b/.gitignore
@@ -1,7 +1,9 @@
 __pycache__
-/.coverage
-/output
-/asdata
-/geoip
+/.coverage*
+/coverage.xml
 /tests/data/datadir/*
 /tests/data/measurements/*
+/dist
+/datadir
+/output
+/attic
diff --git a/Readme.md b/Readme.md
@@ -49,35 +49,97 @@ graph TD
     MsmtProcessor --> HTTPObservations
 ```
 
+
 The `measurement_processor` stage can be run either in a streaming fashion as
 measurements are uploaded to the collector or in batch mode by reprocessing
 existing raw measurements.
 
-### Verdict generation
+```mermaid
+graph LR
+    P((Probe)) --> M{{Measurement}}
+    BE --> P
+    M --> PL[(Analysis)]
+    PL --> O{{Observations}}
+    O --> PL
+    PL --> BE{{ExperimentResult}}
+    BE --> E((Explorer))
+    O --> E
+```
+
+### ExperimentResult generation
 
-A verdict is the result of interpreting one or more network observations
-collected within a particular testing session. For example, a verdict could
-conceptually look like "we think there's interference with TLS handshakes
-to 8.8.4.4:443 using dns.google as the SNI in country ZZ and AS0".
+The data flow of the blocking event generation pipeline looks as follows:
+```mermaid
+classDiagram
+    direction RL
+
+    ExperimentResult --* WebsiteExperimentResult
+    ExperimentResult --* WhatsAppExperimentResult
+
+    ExperimentResult : +String measurement_uid
+    ExperimentResult : +datetime timestamp
+    ExperimentResult : +int probe_asn
+    ExperimentResult : +String probe_cc
+    ExperimentResult : +String network_type
+    ExperimentResult : +struct resolver
+    ExperimentResult : +List[str] observation_ids
+    ExperimentResult : +List[BlockingEvent] blocking_events
+    ExperimentResult : +float ok_confidence
+
+    ExperimentResult : +bool anomaly
+    ExperimentResult : +bool confirmed
+
+    class WebsiteExperimentResult {
+      +String domain_name
+      +String website_name
+    }
 
-An important component to verdict generation is having some form of baseline to
-establish some ground truth. This is necessary in order to establish if the
-network condition we are seeing is a result of the target being offline vs it
-being the result of blocking.
+    class WhatsAppExperimentResult {
+        +float web_ok_confidence
+        +String web_blocking_detail
+
+        +float registration_ok_confidence
+        +String registration_blocking_detail
+
+        +float endpoints_ok_confidence
+        +String endpoints_blocking_detail
+    }
+
+    class BlockingEvent {
+        blocking_type: +BlockingType
+        blocking_subject: +String
+        blocking_detail: +String
+        blocking_meta: +json
+        confidence: +float
+    }
+
+    class BlockingType {
+        <<enumeration>>
+        OK
+        BLOCKED
+        NATIONAL_BLOCK
+        ISP_BLOCK
+        LOCAL_BLOCK
+        SERVER_SIDE_BLOCK
+        DOWN
+        THROTTLING
+    }
+```
 
-The data flow of the verdict generation pipeline looks as follows:
 ```mermaid
-graph TD
-    IPInfoDB[(IPInfoDB)] --> VerdictGenerator
-    FingerprintDB[(FingerprintDB)] --> VerdictGenerator
-
-    Observations --> GrouperTimeTarget[/"GROUP BY time_interval, target"/]
-    GrouperTimeTarget --> BaselineGenerator{{"baseline_generator()"}}
-    GrouperTimeTarget --> GrouperSession[/"GROUP BY session_id"/]
-    BaselineGenerator --> Baselines
-    Baselines --> VerdictGenerator
-    GrouperSession --> VerdictGenerator{{"verdict_generator()"}}
-    VerdictGenerator --> Verdicts
+graph
+    M{{Measurement}} --> OGEN[[observationGen]]
+    OGEN --> |many| O{{Observations}}
+    O --> CGEN[[controlGen]]
+    O --> ODB[(ObservationDB)]
+    ODB --> CGEN
+    CGEN --> |many| CTRL{{Controls}}
+    CTRL --> A[[Analysis]]
+    FDB[(FingerprintDB)] --> A
+    NDB[(NetInfoDB)] --> A
+    O --> A
+    A --> |one| ER{{ExperimentResult}}
+    ER --> |many| BE{{BlockingEvents}}
 ```
 
 Some precautions need to be taken when running the `verdict_generator()` in
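
The class diagram added to Readme.md above could map onto Python dataclasses roughly as follows. This is a minimal sketch and not the code from this commit: the field names and enum members are taken from the diagram, while the enum string values, the use of a plain `dict` for the `resolver` struct and for `blocking_meta`, and the sample values at the bottom are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List


class BlockingType(Enum):
    # Member names follow the diagram; the string values are assumed.
    OK = "ok"
    BLOCKED = "blocked"
    NATIONAL_BLOCK = "national_block"
    ISP_BLOCK = "isp_block"
    LOCAL_BLOCK = "local_block"
    SERVER_SIDE_BLOCK = "server_side_block"
    DOWN = "down"
    THROTTLING = "throttling"


@dataclass
class BlockingEvent:
    blocking_type: BlockingType
    blocking_subject: str
    blocking_detail: str
    blocking_meta: Dict[str, Any]  # "json" in the diagram; a dict is assumed
    confidence: float


@dataclass
class ExperimentResult:
    measurement_uid: str
    timestamp: datetime
    probe_asn: int
    probe_cc: str
    network_type: str
    resolver: Dict[str, Any]  # "struct" in the diagram; a dict is assumed
    observation_ids: List[str]
    blocking_events: List[BlockingEvent]
    ok_confidence: float
    anomaly: bool
    confirmed: bool


# Hypothetical sample values: one ExperimentResult carrying one BlockingEvent.
example = ExperimentResult(
    measurement_uid="example-measurement-uid",
    timestamp=datetime(2022, 11, 7),
    probe_asn=0,
    probe_cc="ZZ",
    network_type="wifi",
    resolver={},
    observation_ids=["obs-1", "obs-2"],
    blocking_events=[
        BlockingEvent(
            blocking_type=BlockingType.ISP_BLOCK,
            blocking_subject="example.org",
            blocking_detail="dns",
            blocking_meta={},
            confidence=0.8,
        )
    ],
    ok_confidence=0.2,
    anomaly=True,
    confirmed=False,
)
```

The one-to-many relation in the last flow graph (`ER --> |many| BE`) is represented here by the `blocking_events` list on `ExperimentResult`.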

diff --git a/oonidata/__init__.py b/oonidata/__init__.py
@@ -0,0 +1,6 @@
+try:
+    import importlib.metadata as importlib_metadata
+except ModuleNotFoundError:
+    import importlib_metadata
+
+__version__ = importlib_metadata.version(__name__)
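
The version lookup above raises `PackageNotFoundError` when the distribution metadata is missing, for example when the code is run from a source checkout that was never installed. A hedged sketch of a more defensive variant is shown below; the helper name `get_version` and the `"unknown"` fallback are hypothetical and not part of this commit.

```python
try:
    import importlib.metadata as importlib_metadata  # stdlib on Python >= 3.8
except ModuleNotFoundError:
    import importlib_metadata  # backport package, needed on Python 3.7


def get_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of dist_name, or default if not installed."""
    try:
        return importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        # No distribution metadata found, e.g. running from an uninstalled checkout.
        return default
```

With this helper, `__version__ = get_version(__name__)` would behave like the original line when the package is installed and fall back to the default string otherwise.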