Changelog

Current (in progress)

  • Nothing yet

2.1.0 (2025-01-13)

  • Refactor ParseError to enrich Sentry with context and help investigate Sentry errors like #4096 #218
  • Remove legacy routes #203
  • Report errors more explicitly when sending to udata, without raising when udata responds with a 404 #213
  • Minor cleaning: remove unused arg in function #219
  • Fix type issue regarding resource_id #220
  • Use bump'X #226
  • Get actual resource URL in case of 404 (change since last catalog load) #225
  • Add CLI util to insert or update a resource into the catalog (change since last catalog load) #228
  • Fix deadlock errors when purging CSV tables by refactoring purge_csv_tables to use atomic transactions #230
  • Improve timing of checks depending on changes since last check #163
  • Remove bad default value in CLI to insert a resource #235
  • Trigger GitLab deployment in CI when pushing on main #186
  • Fix GitLab deployment in CI #239
  • Add indexes to improve resource filtering and batch selection #240
  • Parallelize tests in CI #238
  • Refactor analysis logic to remove 5 unnecessary queries by reusing data already available in the code instead of re-querying it #227
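
The atomic purge from #230 can be pictured with a minimal sketch. It is an illustration only, not the project's code: it uses SQLite from the standard library in place of PostgreSQL, and the `tables_index` table and `parsing_table` column are hypothetical names. Each table drop and its bookkeeping delete run inside one explicit transaction, so a failure leaves both untouched.

```python
import sqlite3


def purge_csv_tables(conn: sqlite3.Connection, table_names: list[str]) -> int:
    """Drop each parsing table together with its index row, one transaction per table.

    Assumes `conn` was opened with isolation_level=None so that BEGIN/COMMIT
    are controlled explicitly, and that `table_names` comes from a trusted source.
    """
    purged = 0
    for name in table_names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
        conn.execute("BEGIN")
        try:
            conn.execute(f"DROP TABLE IF EXISTS {quoted}")
            # Hypothetical bookkeeping table listing the parsing tables.
            conn.execute("DELETE FROM tables_index WHERE parsing_table = ?", (name,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            # Undo the drop as well, so the table and its row stay consistent.
            conn.execute("ROLLBACK")
            raise
        purged += 1
    return purged
```

Using one short transaction per table, rather than one long transaction for the whole purge, keeps locks brief; whether the project made the same choice is not stated in the entry above.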

2.0.5 (2024-11-08)

  • Fix minor types issues #204
  • Return resources statuses count in crawler status endpoint response #206
  • Fix deprecated CircleCI config #207
  • Fix Sentry issue #4195 #209
  • Clean docstrings for more consistent style #215
  • Fix some type hints #214
  • Add option to force analysis even if resource has not changed #205
  • Fix get all checks CRUD method #217
  • Deactivate parquet export for small CSVs #216

2.0.4 (2024-10-28)

  • Fix wrong resource status #196
  • Fix issue related to empty table_indexes column instead of default {} #197
  • Make the last_check column of the catalog table a foreign key to the checks table, to fix an error when crawling resources whose last check no longer exists #195
  • Fix analyse-csv CLI when called with a URL, and refactor errors for cases where the resource or URL is not found #200
  • Fix errors on malformed requests, and make API error responses more consistent #202

2.0.3 (2024-10-22)

  • Save git commit hash in CI and use it for health check #182 and #185
  • Add comment column/field to resources exceptions #191
  • Add extra args and DB fields for parquet export #193
  • Fix CircleCI config for packaging version not to include commit hash when publishing #194

2.0.2 (2024-10-07)

  • Fix typos in the curl command examples in the README #189
  • Bump csv-detective to 0.7.3 #192

2.0.1 (2024-10-04)

  • Refactor function to get no_backoff domains and add PostgreSQL indexes to improve DB query performance #171
  • Clean changelog and remove useless section in pyproject.toml #175
  • Refactor purge_checks CLI to use a date limit instead of a number #174
  • Fix resources exceptions routes responses, add resources exceptions tests #176
  • Fix CSV analysis CLI #181
  • Add a PUT /api/resources-exceptions/{id} route to update a resource exception #178
  • Add a quiet argument for purge_check and purge_csv_table CLIs #184
  • Fix wrong resource status #187
  • More informative error in the check resource CLI #188

2.0.0 (2024-09-24)

  • Use Python 3.11 instead of 3.9 for performance improvements and future compatibility #101
  • Refactor and split code from crawl.py into separate files using refactored db.Resource class methods and static methods #135
  • Allow routes with or without trailing slashes #158
  • Delete resource as a CRUD method #161
  • Refactor routes URLs to be more RESTful and separate legacy routes code from new routes code #132
  • Display app version and environment in health check endpoint #164
  • Use ENVIRONMENT from config file instead of env var #165
  • Manage large resources exceptions differently #148
  • Add checks aggregate route #167

1.1.0 (2024-09-26)

  • Use profiling option from csv-detective #54
  • Remove csv_analysis, integrate into checks #52
  • Add new types for csv parsing: json, date and datetime #51
  • Notify udata of csv parsing #51
  • Allow None values in udata notifications #51
  • Add tests for udata-triggered checks #49
  • Include migration files in package
  • Allow to configure a dedicated PostgreSQL schema #56
  • Fix typo in handle_parse_exception and schema in CLI #57
  • Skip archived dataset when loading catalog #58
  • Update resources expected dates in API following udata refactoring #60
  • Download csv resource only if first check #61
  • Send content-type and content-length info from header to udata #64
  • Add timezone values to dates sent to udata #63
  • Rename analysis filesize to content-length #66
  • Sleep between all batches #67
  • Support having multiple crawlers by setting a status column in the catalog table #68
  • Add a health route #69
  • Make temporary folder configurable #70
  • Fix conflict on updating catalog with multiple entries for a resource #73
  • Set check:available to None in case of a 429 #75
  • Improve conditional analysis logic and readability #76 #80
  • Use latest csv-detective version #89
  • Compare content type / length to check if changed #78 #79
  • Create a list of exceptions to analyse despite larger size #85
  • Enable csv.gz analysis #84
  • Add worker default timeout config #86
  • Return None value early when casting in csv analysis #87
  • Ping udata after loading a csv to database #91
  • Allow for none value in resource schema #93
  • Handle other file formats #92
  • Add a quiet option on load catalog #95
  • Select distinct parsing tables to delete #96
  • Enable parquet export #97
  • Update documentation #98 and #106
  • Add linter and formatter with pyproject.toml config, add lint and formatting step in CI, add pre-commit hook to lint and format, update docs and lint and format the code #99
  • Update sentry-sdk dependency, and update Sentry logic to be able to send environment, app version and profiling/performance info #100
  • Basic cleaning: use Python 3.11 in CI, remove Pandas in project dependencies, add type hints, fix wrong type hints, remove deprecated version field in docker compose files, update .gitignore #102 and #107
  • Add missing content-type for csv.gz #103
  • Remove deprecated pytz module #109
  • Refactor project structure to use DB classes for each DB table, with their factorized DB methods #112 and #55
  • Add test coverage feature #122
  • Refactor routes #117
  • Fix Ruff configuration #125
  • Add some API tests to improve coverage #123
  • Fix health check endpoint route which was wrongly removed, and add test for API health check endpoint to make sure this endpoint is working as expected #128
  • Add basic authentication via API key using a bearer token auth for all POST/PUT/DELETE endpoints #130
  • Simplify getting Sentry info by loading pyproject.toml info in config #138
  • Add a POST /api/checks/ route for force crawling #118
  • Update csv-detective to 0.7.2 which doesn't include yanked version of requests anymore #142 and #144
  • Update resource statuses in DB when crawling and analysing, and add resource status route #119
  • Simplify save_as_parquet method, and fix type not compatible with Python 3.9; remove unused import #156
  • Fix and simplify project metadata loading #157
  • Pin Numpy version to 1.26.4 to avoid conflicts with pandas and csv-detective
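
The API key check added in #130 (bearer token required on POST/PUT/DELETE endpoints) can be sketched with the standard library. This is a minimal illustration under assumptions: the `is_authorized` helper and its plain-dict `headers` argument are hypothetical and do not reflect the project's middleware.

```python
import hmac

# Methods that require an API key, per the changelog entry.
PROTECTED_METHODS = {"POST", "PUT", "DELETE"}


def is_authorized(method: str, headers: dict[str, str], api_key: str) -> bool:
    """Return True if the request may proceed."""
    if method.upper() not in PROTECTED_METHODS:
        return True  # read-only methods stay open
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking the key through timing.
    return hmac.compare_digest(token.encode(), api_key.encode())
```

For example, `is_authorized("POST", {"Authorization": "Bearer secret"}, "secret")` passes, while the same request without the header is rejected.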

1.0.1 (2023-01-04)

  • Packaging-fix release

1.0.0 (2023-01-04)

  • Initial version