Changelog

Current (in progress)

Nothing yet

2.1.0 (2025-01-13)

Refactor ParseError to enrich Sentry with context and to help investigate Sentry errors like #4096 #218
Remove legacy routes #203
More explicit error reporting when sending to udata without raising errors for udata responding with a 404 #213
Minor cleaning: remove unused arg in function #219
Fix type issue regarding resource_id #220
Use bump'X #226
Get actual resource URL in case of 404 (change since last catalog load) #225
Add CLI util to insert or update a resource into the catalog (change since last catalog load) #228
Fix deadlock errors when purging CSV tables by refactoring purge_csv_tables to use atomic transactions #230
Improve timing of checks depending on changes since last check #163
Remove bad default value in CLI to insert a resource #235
Trigger GitLab deployment in CI when pushing on main #186
Fix GitLab deployment in CI #239
Add indexes to improve resource filtering and batch selection #240
Parallelize tests in CI #238
Refactor analysis logic to remove 5 unnecessary queries, using data already available in the code instead of re-querying it #227

2.0.5 (2024-11-08)

Fix minor types issues #204
Return resources statuses count in crawler status endpoint response #206
Fix deprecated CircleCI config #207
Fix Sentry issue #4195 #209
Clean docstrings for more consistent style #215
Fix some type hints #214
Add option to force analysis even if resource has not changed #205
Fix get all checks CRUD method #217
Deactivate parquet export for small CSVs #216

2.0.4 (2024-10-28)

Fix wrong resource status #196
Fix issue related to empty table_indexes column instead of default {} #197
Make the last_check column of the catalog table a foreign key to the checks table, to fix errors when crawling resources whose last check no longer exists #195
Fix analyse-csv CLI when called with a URL, and refactor errors for cases where the resource or URL is not found #200
Fix errors when sending malformed request, and make API error responses more consistent #202

2.0.3 (2024-10-22)

Save git commit hash in CI and use it for health check #182 and #185
Add comment column/field to resources exceptions #191
Add extra args and DB fields for parquet export #193
Fix CircleCI config for packaging version not to include commit hash when publishing #194

2.0.2 (2024-10-07)

Fix typos in README in curl commands examples #189
Bump csv-detective to 0.7.3 #192

2.0.1 (2024-10-04)

Refactor function to get no_backoff domains and add PostgreSQL indexes to improve DB query performance #171
Clean changelog and remove useless section in pyproject.toml #175
Refactor purge_checks CLI to use a date limit instead of a number #174
Fix resources exceptions routes responses, add resources exceptions tests #176
Fix CSV analysis CLI #181
Add a PUT /api/resources-exceptions/{id} route to update a resource exception #178
Add a quiet argument for purge_check and purge_csv_table CLIs #184
Fix wrong resource status #187
More informative error message in check resource CLI #188

2.0.0 (2024-09-24)

Use Python 3.11 instead of 3.9 for performance improvements and future compatibility #101
Refactor and split code from crawl.py into separate files using refactored db.Resource class methods and static methods #135
Allow routes with or without trailing slashes #158
Delete resource as a CRUD method #161
Refactor routes URLs to be more RESTful and separate legacy routes code from new routes code #132
Display app version and environment in health check endpoint #164
Use ENVIRONMENT from config file instead of env var #165
Manage large resources exceptions differently #148
Add checks aggregate route #167

1.1.0 (2024-09-26)

Use profiling option from csv-detective #54
Remove csv_analysis, integrate into checks #52
Add new types for csv parsing: json, date and datetime #51
Notify udata of csv parsing #51
Allow None values in udata notifications #51
Add tests for udata-triggered checks #49
Include migration files in package
Allow to configure a dedicated PostgreSQL schema #56
Fix typo in handle_parse_exception and schema in CLI #57
Skip archived dataset when loading catalog #58
Update resources expected dates in API following udata refactoring #60
Download csv resource only if first check #61
Send content-type and content-length info from header to udata #64
Add timezone values to dates sent to udata #63
Rename analysis filesize to content-length #66
Sleep between all batches #67
Support having multiple crawlers by setting a status column in the catalog table #68
Add a health route #69
Make temporary folder configurable #70
Fix conflict on updating catalog with multiple entries for a resource #73
Set check:available to None in case of a 429 #75
Improve conditional analysis logic and readability #76 #80
Use latest csv-detective version #89
Compare content type / length to check if changed #78 #79
Create a list of exceptions to analyse despite larger size #85
Enable csv.gz analysis #84
Add worker default timeout config #86
Return None value early when casting in csv analysis #87
Ping udata after loading a csv to database #91
Allow for none value in resource schema #93
Handle other file formats #92
Add a quiet option on load catalog #95
Select distinct parsing tables to delete #96
Enable parquet export #97
Update documentation #98 and #106
Add linter and formatter with pyproject.toml config, add lint and formatting step in CI, add pre-commit hook to lint and format, update docs and lint and format the code #99
Update sentry-sdk dependency, and update Sentry logic to be able to send environment, app version and profiling/performance info #100
Basic cleaning: use Python 3.11 in CI, remove Pandas in project dependencies, add type hints, fix wrong type hints, remove deprecated version field in docker compose files, update .gitignore #102 and #107
Add missing content-type for csv.gz #103
Remove deprecated pytz module #109
Refactor project structure to use DB classes for each DB table, with their factorized DB methods #112 and #55
Add tests coverage feature #122
Refactor routes #117
Fix Ruff configuration #125
Add some API tests to improve coverage #123
Fix health check endpoint route which was wrongly removed, and add test for API health check endpoint to make sure this endpoint is working as expected #128
Add basic authentication via API key using a bearer token auth for all POST/PUT/DELETE endpoints #130
Simplify getting Sentry info by loading pyproject.toml info in config #138
Add a POST /api/checks/ route for force crawling #118
Update csv-detective to 0.7.2 which doesn't include yanked version of requests anymore #142 and #144
Update resource statuses in DB when crawling and analysing, and add resource status route #119
Simplify save_as_parquet method, and fix type not compatible with Python 3.9; remove unused import #156
Fix and simplify project metadata loading #157
Pin Numpy version to 1.26.4 to avoid conflicts with pandas and csv-detective

1.0.1 (2023-01-04)

Packaging-fix release

1.0.0 (2023-01-04)

Initial version
