Dac 524 baseline v2, v3, NLU demo #21

Merged
merged 37 commits into from
Dec 12, 2023
37 commits
bb2381f
first version of V2 NLU
Aug 2, 2023
914d467
nlu v2, v3, demo and corrections
TimeaBagosiCrim Nov 30, 2023
5b1c20f
prop and target vdbs
TimeaBagosiCrim Nov 30, 2023
0ef1f55
fix jsons
TimeaBagosiCrim Dec 4, 2023
d0f2b4a
squashed
TimeaBagosiCrim Dec 5, 2023
ebadeb2
add gitignore and root level init.py
TimeaBagosiCrim Dec 5, 2023
12929dd
move duckling url to config
TimeaBagosiCrim Dec 5, 2023
4706a47
typings + formatting + import fixes
fmigneault Dec 6, 2023
63641a0
add GitHub CI tests workflow
fmigneault Dec 6, 2023
c907360
fix ci cache
fmigneault Dec 6, 2023
760a6e9
fix typo in nlp conda env file
fmigneault Dec 6, 2023
31c3ea7
loosen nlp packages deps to support various python versions
fmigneault Dec 6, 2023
202fd80
fix ci logic
fmigneault Dec 6, 2023
01d0ff6
help EO resolve sat-search version
fmigneault Dec 6, 2023
67f8997
fix conda envs to allow over-limiting python-dateutils<2.8 in older s…
fmigneault Dec 6, 2023
e36d4dd
second pass resolve sat-stac/pandas conflict over python-dateutil
fmigneault Dec 6, 2023
e9fe1cb
attempt 3 to resolve deps
fmigneault Dec 6, 2023
3ca7aa2
more formatting
fmigneault Dec 6, 2023
7cab8ff
remove sat-search/sat-stac deprecated (https://github.com/sat-utils/s…
fmigneault Dec 6, 2023
a834672
update intake-stac to avoid older/deprecated sat-stac dependency
fmigneault Dec 6, 2023
4134a5e
fix module resolution for pytest
fmigneault Dec 6, 2023
3301b62
remove used typing
fmigneault Dec 6, 2023
704df69
fix eval metric to allow numeric value to consider float/int directly
fmigneault Dec 7, 2023
b190ca2
Merge pull request #22 from crim-ca/baseline-v2-tests
fmigneault Dec 7, 2023
472e46c
ensure spacy-transformers is installed
fmigneault Dec 9, 2023
6e35b6b
fix spacy auto-download + pin transformers==4.30.2 to resolve depende…
fmigneault Dec 9, 2023
a15d98e
fix config paths for nlp pipelines
fmigneault Dec 9, 2023
f97c666
remove chroma caches
fmigneault Dec 9, 2023
6ecba06
update notebook outputs
fmigneault Dec 9, 2023
fedaa20
move spacy transformers dependencies under pip to resolve
fmigneault Dec 9, 2023
0d93ef7
add call to duckling a subprocess instead of sibling docker
fmigneault Dec 9, 2023
792c8ac
add checks and workarounds for flaky breaking ipywidgets
fmigneault Dec 11, 2023
20913d4
fix invalid function used in V2 pipeline for Vdb ngrams creation
fmigneault Dec 11, 2023
b4a0651
add input param to select STAC catalog override without flaky ipywidget
fmigneault Dec 11, 2023
3674410
fix PAVICS typos
fmigneault Dec 11, 2023
e8228c0
handle NL query bbox convertion to STAC client + end2end notebook exe…
fmigneault Dec 11, 2023
c885432
update nlu demo to provide verbosity flag options explicitly
fmigneault Dec 12, 2023
106 changes: 106 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,106 @@
# run test suites

name: Tests
on:
  - pull_request
  - push

jobs:
  # see: https://github.com/fkirc/skip-duplicate-actions
  skip_duplicate:
    continue-on-error: true
    runs-on: ubuntu-latest
    outputs:
      should_skip: ${{ steps.skip_check.outputs.should_skip }}
    steps:
      - id: skip_check
        uses: fkirc/skip-duplicate-actions@master
        with:
          concurrent_skipping: "same_content"
          skip_after_successful_duplicate: "true"
          do_not_skip: '["pull_request", "workflow_dispatch", "schedule"]'

  # NOTE:
  # Run all the steps even if there are no tests defined for a given domain sub-directory.
  # This is to make sure that the environment definition is at the very least buildable.
  tests:
    needs: skip_duplicate
    if: ${{ needs.skip_duplicate.outputs.should_skip != 'true' }}
    runs-on: ${{ matrix.os }}
    continue-on-error: ${{ matrix.allow-failure }}
    env:
      CACHE_NUMBER: 0 # increment to reset cache

    # ensure conda env activation is performed automatically
    defaults:
      run:
        shell: bash -el {0}

    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        # somehow mamba with python 3.12 doesn't resolve spacy although available...
        python-version: ["3.9", "3.10", "3.11"]
        allow-failure: [false]
        domain: ["eo", "nlp"]

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: "0"

      - name: Setup Mamba
        uses: conda-incubator/setup-miniconda@v3
        with:
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
          miniforge-variant: Mambaforge
          miniforge-version: latest
          activate-environment: github-ci-test-python${{ matrix.python-version }}-${{ matrix.domain }}
          use-mamba: true
          use-only-tar-bz2: true

      - name: Set cache date
        run: echo "DATE=$(date +'%Y%m%d')" >> $GITHUB_ENV

      - uses: actions/cache@v2
        id: cache
        with:
          path: ${{ env.CONDA }}/envs/github-ci-test-python${{ matrix.python-version }}-${{ matrix.domain }}
          key: conda-python${{ matrix.python-version }}-${{ matrix.domain }}-${{ hashFiles('${{ matrix.domain }}/environment.yml') }}-${{ env.DATE }}-${{ env.CACHE_NUMBER }}

      - name: Display Python
        run: which python

      - name: Update environment
        if: steps.cache.outputs.cache-hit != 'true'
        run: |
          echo "python=${{ matrix.python-version }}" > ${{ env.CONDA }}/envs/github-ci-test-python${{ matrix.python-version }}-${{ matrix.domain }}/conda-meta/pinned
          mamba env update \
            -n github-ci-test-python${{ matrix.python-version }}-${{ matrix.domain }} \
            -f ${{ matrix.domain }}/environment.yml

      - name: Display Packages
        if: ${{ matrix.python-version != 'none' }}
        run: pip freeze

      - name: Display Environment Variables
        run: |
          hash -r
          env | sort

      - name: Check Tests
        id: check_tests
        run: |
          echo "HAS_TEST_DIR=$(test -d ${{ matrix.domain }}/tests && echo 'true' || echo 'false')" >> $GITHUB_OUTPUT

      - name: Install Tests Dependencies
        if: ${{ steps.check_tests.outputs.HAS_TEST_DIR == 'true' }}
        run: pip install -r requirements-dev.txt

      - name: Run Tests
        if: ${{ steps.check_tests.outputs.HAS_TEST_DIR == 'true' }}
        run: |
          cd ${{ matrix.domain }}/notebooks
          python -m pytest -vvv ../tests
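
For reference, the `tests` job in this workflow runs once per combination of the matrix axes. The following is a minimal Python sketch, not part of this PR, that enumerates those combinations and the conda environment name each job activates. The helper names `expand_matrix` and `env_name` are hypothetical; the matrix values and the name template are copied from the workflow.

```python
from itertools import product

# Matrix values copied from the workflow's `strategy.matrix` section.
MATRIX = {
    "os": ["ubuntu-latest"],
    "python-version": ["3.9", "3.10", "3.11"],
    "allow-failure": [False],
    "domain": ["eo", "nlp"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


def env_name(job):
    """Mirror the `activate-environment` template used by the workflow."""
    return f"github-ci-test-python{job['python-version']}-{job['domain']}"


if __name__ == "__main__":
    # Print each environment name with the environment file it is built from.
    for job in expand_matrix(MATRIX):
        print(env_name(job), "->", f"{job['domain']}/environment.yml")
```

With three Python versions and two domains, this yields six jobs, each with its own cached conda environment.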
17 changes: 17 additions & 0 deletions .gitignore
@@ -1,8 +1,25 @@
### IDE
**/.idea/
**/.vscode/
**/*.code-workspace

### Caches
**/__pycache__/
**/cache/
**/*tree-tagger-linux*
**/.pytest_cache
**/condaenv.*.requirements.txt

## Chroma VDB caches
**/*.bin
**/*.pickle
**/*.sqlite3

### Binaries
**/*.jar

### Notebooks
# expect examples per domain
# disallow notebooks at root
./*.ipynb
**/.ipynb_checkpoints/
1 change: 0 additions & 1 deletion eo/environment.yml
@@ -9,7 +9,6 @@ dependencies:
   - intake-stac
   - pyproj
   - rasterio
-  - sat-search
   - shapely
 
   # TODO: These next packages could possibly be added to a more generic 'vision' image, from which 'eo' would be built
10 changes: 9 additions & 1 deletion nlp/Dockerfile
@@ -38,6 +38,15 @@ RUN cd heideltime && \
     rmdir heideltime-standalone && \
     rm heideltime-standalone-2.2.1.tar.gz
 
+# Setup Haskell for Duckling server
+# https://github.com/facebook/duckling
+RUN curl -sSL https://get.haskellstack.org/ | sh && \
+    git clone https://github.com/facebook/duckling && \
+    cd duckling && \
+    stack build && \
+    stack install && \
+    rm -fr duckling
+
 # Give read&write permission to jenkins for config
 RUN chown -R jenkins heideltime
 
@@ -46,4 +55,3 @@ RUN chown -R 1000:1000 /opt/conda/pkgs/cache
 
 # specify user because of problem running start-notebook.sh when being root
 USER jenkins
-
File renamed without changes.
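
The nlp/Dockerfile change above builds the Duckling server from source. As a rough sketch of how a Python client could query such a server once it is running: this assumes Duckling's example HTTP server is listening on `http://localhost:8000` and exposes the `/parse` endpoint with `locale` and `text` form fields, as described in the Duckling README. The function names and the default URL are illustrative and are not taken from this PR, which keeps the actual URL in a config file.

```python
import json
from urllib import parse, request

# Assumed address of a locally running Duckling example server (hypothetical default).
DUCKLING_URL = "http://localhost:8000/parse"


def build_parse_request(text, locale="en_US", url=DUCKLING_URL):
    """Build, without sending, the form-encoded POST request for the /parse endpoint."""
    body = parse.urlencode({"locale": locale, "text": text}).encode("utf-8")
    return request.Request(url, data=body, method="POST")


def parse_text(text, locale="en_US", url=DUCKLING_URL, timeout=10):
    """Send the request and return the decoded JSON response from the server."""
    req = build_parse_request(text, locale, url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the first function testable without a live server.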
33 changes: 20 additions & 13 deletions nlp/environment.yml
@@ -5,19 +5,26 @@ channels:
   - conda-forge
 
 dependencies:
-  - intake-esm==2021.1.15
-  - intake-stac==0.3.0
-  - sat-search==0.3.0
+  - intake-esm
+  - intake-stac>=0.4.0
   - threddsclient==0.4.2
-  - openjdk==8.0.152
-  # python-flair=0.8 only works with numpy<=1.19.5
-  - python-flair=0.8
-  - numpy<=1.19.5
-  - spacy==3.1.0
-  - python-dateutil==2.7.5
-  - python-levenshtein==0.12.2
-  - requests=2.25.1
-  - pip==20.3.3
+  - openjdk==8.0.152
+  - python-flair
+  - numpy
+  - pydantic<2
+  - python-levenshtein
+  - requests
+  - pip>=22
   - pip:
     - textsearch==0.0.21
-    - spacy==3.1.0
+    - osmnx
+    - langchain
+    - spacy>=3.5,<4
+    - spacy-transformers
+    - transformers<4.31
+    - sentence_transformers
+    - chromadb
+    - shapely
+    - ipywidgets
+    - nltk
+    - pystac_client
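
The remaining version bounds in nlp/environment.yml (`pydantic<2`, `spacy>=3.5,<4`, `transformers<4.31`) can be sanity-checked against candidate versions. The following is a minimal sketch that assumes plain dotted numeric versions only (no pre-release or local tags); `satisfies` is a hypothetical helper written for illustration, and a real project would more likely rely on the `packaging` library.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def _as_tuple(version, width=4):
    """Convert '4.30.2' to (4, 30, 2, 0), zero-padded so '4' compares equal to '4.0'."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](_as_tuple(version), _as_tuple(bound)):
            return False
    return True
```

For example, `satisfies("4.30.2", "<4.31")` holds for the `transformers==4.30.2` pin mentioned in the commit list, while `satisfies("4.31.0", "<4.31")` does not.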
27,431 changes: 27,431 additions & 0 deletions nlp/notebooks/NLU_demo.ipynb

Large diffs are not rendered by default.

File renamed without changes.