- Nothing yet
- Refactor ParseError to enrich Sentry with context and to inquire about Sentry errors like #4096 #218
- Remove legacy routes #203
- More explicit error reporting when sending to udata without raising errors for udata responding with a 404 #213
- Minor cleaning: remove unused arg in function #219
- Fix type issue regarding `resource_id` #220
- Use bump'X #226
- Get actual resource URL in case of 404 (change since last catalog load) #225
- Add CLI util to insert or update a resource into the catalog (change since last catalog load) #228
- Fix deadlock errors when purging CSV tables by refactoring `purge_csv_tables` to use atomic transactions #230
- Improve timing of checks depending on changes since last check #163
- Remove bad default value in CLI to insert a resource #235
- Trigger GitLab deployment in CI when pushing on `main` #186
- Fix GitLab deployment in CI #239
- Add indexes to improve resource filtering and batch selection #240
- Parallelize tests in CI #238
- Refactor analysis logic to remove 5 unnecessary queries, using the data already available in the code instead of re-querying it #227
- Fix minor types issues #204
- Return resources statuses count in crawler status endpoint response #206
- Fix deprecated CircleCI config #207
- Fix Sentry issue #4195 #209
- Clean docstrings for more consistent style #215
- Fix some type hints #214
- Add option to force analysis even if resource has not changed #205
- Fix get all checks CRUD method #217
- Deactivate parquet export for small CSVs #216
- Fix wrong resource status #196
- Fix issue related to empty `table_indexes` column instead of default `{}` #197
- Make the `last_check` column of the `catalog` table a foreign key to the `checks` table, in order to fix an error when crawling resources with last checks that don't exist anymore #195
- Fix `analyse-csv` CLI when used by URL, and refactor errors for cases when resource or URL is not found #200
- Fix errors when sending malformed requests, and make API error responses more consistent #202
- Save git commit hash in CI and use it for health check #182 and #185
- Add comment column/field to resources exceptions #191
- Add extra args and DB fields for parquet export #193
- Fix CircleCI config for packaging version not to include commit hash when publishing #194
- Refactor function to get no_backoff domains and add PostgreSQL indexes to improve DB queries perfs #171
- Clean changelog and remove useless section in pyproject.toml #175
- Refactor purge_checks CLI to use a date limit instead of a number #174
- Fix resources exceptions routes responses, add resources exceptions tests #176
- Fix CSV analysis CLI #181
- Add a `PUT` `/api/resources-exceptions/{id}` route to update a resource exception #178
- Add a `quiet` argument for `purge_check` and `purge_csv_table` CLIs #184
- Fix wrong resource status #187
- More informative error relative to check resource CLI #188
- Use Python 3.11 instead of 3.9 for performance improvements and future compatibility #101
- Refactor and split code from `crawl.py` into separate files using refactored `db.Resource` class methods and static methods #135
- Allow routes with or without trailing slashes #158
- Delete resource as a CRUD method #161
- Refactor routes URLs to be more RESTful and separate legacy routes code from new routes code #132
- Display app version and environment in health check endpoint #164
- Use ENVIRONMENT from config file instead of env var #165
- Manage large resources exceptions differently #148
- Add checks aggregate route #167
- Use profiling option from csv-detective #54
- Remove csv_analysis, integrate into checks #52
- Add new types for csv parsing: json, date and datetime #51
- Notify udata of csv parsing #51
- Allow `None` values in udata notifications #51
- Add tests for udata-triggered checks #49
- Include migration files in package
- Allow to configure a dedicated PostgreSQL schema #56
- Fix typo in handle_parse_exception and schema in CLI #57
- Skip archived dataset when loading catalog #58
- Update resources expected dates in API following udata refactoring #60
- Download csv resource only if first check #61
- Send content-type and content-length info from header to udata #64
- Add timezone values to dates sent to udata #63
- Rename analysis filesize to content-length #66
- Sleep between all batches #67
- Support having multiple crawlers by setting a status column in the catalog table #68
- Add a health route #69
- Make temporary folder configurable #70
- Fix conflict on updating catalog with multiple entries for a resource #73
- Set check:available to None in case of a 429 #75
- Improve conditional analysis logic and readability #76 #80
- Use latest csv-detective version #89
- Compare content type / length to check if changed #78 #79
- Create a list of exceptions to analyse despite larger size #85
- Enable csv.gz analysis #84
- Add worker default timeout config #86
- Return None value early when casting in csv analysis #87
- Ping udata after loading a csv to database #91
- Allow for none value in resource schema #93
- Handle other file formats #92
- Add a quiet option on load catalog #95
- Select distinct parsing tables to delete #96
- Enable parquet export #97
- Update documentation #98 and #106
- Add linter and formatter with `pyproject.toml` config, add lint and formatting step in CI, add pre-commit hook to lint and format, update docs and lint and format the code #99
- Update `sentry-sdk` dependency, and update Sentry logic to be able to send environment, app version and profiling/performance info #100
- Basic cleaning: use Python 3.11 in CI, remove Pandas in project dependencies, add type hints, fix wrong type hints, remove deprecated version field in docker compose files, update `.gitignore` #102 and #107
- Add missing content-type for csv.gz #103
- Remove deprecated `pytz` module #109
- Refactor project structure to use DB classes for each DB table, with their factorized DB methods #112 and #55
- Add tests coverage feature #122
- Refactor routes #117
- Fix Ruff configuration #125
- Add some API tests to improve coverage #123
- Fix health check endpoint route which was wrongly removed, and add test for API health check endpoint to make sure this endpoint is working as expected #128
- Add basic authentication via API key using a bearer token auth for all POST/PUT/DELETE endpoints #130
- Simplify getting Sentry info by loading pyproject.toml info in config #138
- Add a `POST` `/api/checks/` route for force crawling #118
- Update `csv-detective` to 0.7.2 which doesn't include yanked version of `requests` anymore #142 and #144
- Update resource statuses in DB when crawling and analysing, and add resource status route #119
- Simplify `save_as_parquet` method, and fix type not compatible with Python 3.9; remove unused import #156
- Fix and simplify project metadata loading #157
- Pin Numpy version to 1.26.4 to avoid conflicts with pandas and csv-detective
- Packaging-fix release
- Initial version