Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to semantic versioning.

[Unreleased]

Added

Changed

Deprecated

Fixed

Removed

[0.5.0] - 2025-01-09

Added

  • Added a link to, and a description of, the easy_pipeline_run repo in README.md.

Changed

  • Modified list_files function in cdp/helpers/s3_utils.py to use pagination when listing objects from S3 buckets, improving handling of large buckets.
  • Added test cases for new pagination functionality in list_files function in tests/cdp/helpers/test_s3_utils.py.
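
As an illustration of the pagination change above, here is a minimal sketch, not the actual list_files implementation. It assumes a boto3-style S3 client that exposes get_paginator("list_objects_v2"); the names list_keys, FakePaginator and FakeS3Client are hypothetical and exist only so the example runs without AWS access.

```python
from typing import Any, Dict, Iterator, List


def list_keys(client: Any, bucket_name: str, prefix: str = "") -> List[str]:
    """Collect every object key under `prefix`, one page at a time."""
    # A single list_objects_v2 response holds at most 1,000 keys, so a
    # paginator is used to keep requesting pages until none remain.
    paginator = client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # The "Contents" entry is missing when a page has no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


class FakePaginator:
    """Hypothetical stand-in that replays pre-built pages."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self._pages = pages

    def paginate(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        return iter(self._pages)


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client, for offline runs."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self._pages = pages

    def get_paginator(self, operation_name: str) -> FakePaginator:
        assert operation_name == "list_objects_v2"
        return FakePaginator(self._pages)


pages = [
    {"Contents": [{"Key": "data/a.csv"}, {"Key": "data/b.csv"}]},
    {"Contents": [{"Key": "data/c.csv"}]},
    {},  # an empty page carries no "Contents" entry
]
all_keys = list_keys(FakeS3Client(pages), "example-bucket", prefix="data/")
print(all_keys)
```

With a real bucket, a client created by boto3.client("s3") could be passed in place of the fake one. The bucket name and prefix shown are placeholders.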

Deprecated

Fixed

Removed

[0.4.4] - 2024-12-13

Added

Changed

  • Modified insert_df_to_hive_table function in cdp/io/output.py. Added support for creating non-existent Hive tables, repartitioning by column or partition count, and handling missing columns with explicit type casting.

Deprecated

Fixed

Removed

[0.4.3] - 2024-12-05

Added

Changed

  • Updated CODEOWNERS file; changed email to GitHub username.

Deprecated

Fixed

Removed

[0.4.2] - 2024-11-26

Added

Changed

  • Updated ons-mkdocs-theme version from 1.1.2 to 1.1.3 to fix the crest not showing in the footer of the documentation site.

Deprecated

Fixed

Removed

[0.4.1] - 2024-11-25

Added

Changed

  • Updated the ons-mkdocs-theme version number in doc requirements in setup.cfg.

Deprecated

Fixed

Removed

[0.4.0] - 2024-11-21

Added

Changed

  • Unpinned pandas version in setup.cfg to allow for more flexibility in dependency management.
  • Removed numpy from setup.cfg as it will be installed automatically by pandas.

Deprecated

Fixed

Removed

[v0.3.7] - 2024-11-20

Added

  • Added write_csv function inside cdp/helpers/s3_utils.py.

Changed

Deprecated

Fixed

Removed

[v0.3.6] - 2024-10-16

Added

Changed

Deprecated

Fixed

  • Changed cut_lineage function inside helpers/pyspark.py to make it compatible with newer PySpark versions.

Removed

[v0.3.5] - 2024-10-04

Added

Changed

  • Added "How the Project is Organised" section to README.md.
  • Fixed docstring for test_load_json_with_encoding in test_s3_utils.py.

Deprecated

Fixed

Removed

[v0.3.4] - 2024-09-30

Added

  • Added load_json to s3_utils.py.

Changed

Deprecated

Fixed

Removed

[v0.3.3] - 2024-09-10

Added

  • Added InvalidS3FilePathError to exceptions.py.
  • Added validate_s3_file_path to s3_utils.py.

Changed

  • Fixed docstring for load_csv in helpers/pyspark.py.
  • save_csv_to_s3 now calls the validate_s3_file_path function.
  • cdp/helpers/s3_utils/load_csv now calls the validate_bucket_name and validate_s3_file_path functions.

Deprecated

Fixed

  • Improved truncate_external_hive_table to handle both partitioned and non-partitioned Hive tables, with enhanced error handling and support for table identifiers in <database>.<table> or <table> formats.
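
The two accepted identifier formats can be illustrated with a small sketch. This is an assumption about how such parsing might look, not the code used by truncate_external_hive_table; the helper name split_table_identifier is hypothetical.

```python
from typing import Optional, Tuple


def split_table_identifier(identifier: str) -> Tuple[Optional[str], str]:
    """Split '<database>.<table>' or '<table>' into (database, table)."""
    parts = identifier.strip().split(".")
    if len(parts) == 1 and parts[0]:
        # Bare table name: the database is left to the session default.
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    # Anything else (empty names, extra dots) is rejected early.
    raise ValueError(f"Invalid table identifier: {identifier!r}")


print(split_table_identifier("sales.orders"))
print(split_table_identifier("orders"))
```

Rejecting malformed identifiers before any Spark call keeps the error message close to the caller's mistake.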

Removed

[v0.3.2] - 2024-09-02

Added

  • Added load_csv to helpers/pyspark.py with kwargs parameter.
  • Added truncate_external_hive_table to helpers/pyspark.py.
  • Added get_tables_in_database to cdp/io/input.py.
  • Added load_csv to cdp/helpers/s3_utils.py. This loads a CSV from an S3 bucket into a pandas DataFrame.

Changed

  • Removed .config("spark.shuffle.service.enabled", "true") from create_spark_session(), as it is not compatible with CDP. Added .config("spark.dynamicAllocation.shuffleTracking.enabled", "true") & .config("spark.sql.adaptive.enabled", "true").
  • Changed mkdocs theme from mkdocs-tech-docs-template to ons-mkdocs-theme.
  • Added more parameters to load_and_validate_table() in cdp/io/input.py.

Deprecated

Fixed

Removed

[v0.3.1] - 2024-05-24

Added

  • Added zip_folder function to io/output.py.

Changed

  • Modified gcp_utils.py, added more helper functions for GCS.
  • Modified docstring for InvalidBucketNameError in exceptions.py.

Deprecated

Fixed

Removed

[v0.3.0] - 2024-05-20

Added

  • Added .isort.cfg to configure isort with the black profile and recognise rdsa-utils as a local repository.
  • Reformatted the entire codebase using black and isort.

Changed

  • Updated .pre-commit-config.yaml to include black and isort as pre-commit hooks for code formatting.
  • Updated setup.cfg to include black and isort in the dev requirements.
  • Updated README.md to include black formatting badge.
  • Updated ruff.toml to align with black's formatting rules.

Deprecated

Fixed

Removed

[v0.2.3] - 2024-05-20

Added

  • Added save_csv_to_s3 function in cdp/io/output.py.

Changed

  • Modified docstrings in cdp/helpers/s3_utils.py; removed type hints from docstrings, as they are already in the function signatures.
  • Added Examples section to the delete_folder function in s3_utils.py.
  • Modified docstrings in cdp/io/input.py & cdp/io/output.py; removed type hints from docstrings, as they are already in the function signatures.
  • Updated .gitignore to exclude metastore_db/ directory.
  • Standardised parameter names for consistency across S3 utility functions in s3_utils.py.

Deprecated

Fixed

Removed

[v0.2.2] - 2024-05-14

Added

  • Added s3_utils.py module located in cdp/helpers/.

Changed

  • Updated reference.md; included s3_utils.py.
  • Updated README.md; added Ruff and Python versions badges.

Deprecated

Fixed

Removed

[v0.2.1] - 2024-05-10

Added

Changed

  • Revised the "Further Reading on Reproducible Analytical Pipelines" section in the README.md for clarity.

Deprecated

Fixed

Removed

[v0.2.0] - 2024-05-10

Added

Changed

  • Breaking Change: Renamed module cdsw to cdp (Cloudera Data Platform).
  • Added a "Further Reading on Reproducible Analytical Pipelines" section to README.md to enhance resources on RAP best practices.
  • Added section on synchronising the development branch with main to the branch_and_deploy_guide.md file.

Deprecated

Fixed

  • Updated contribution_guide.md; fixed code block rendering issue in mkdocs by removing extra whitespace.

Removed

[v0.1.10] - 2024-05-08

Added

  • Updated branch_and_deploy_guide.md; added a section titled "Merging Development to Main: A Guide for Maintainers".

Changed

  • Updated README.md to include new badges for Deployment Status and PyPI version.

Deprecated

Fixed

Removed

[v0.1.9] - 2024-04-03

Added

  • Added mkdocs-mermaid2-plugin to the doc extras_require in setup.cfg, enhancing documentation with MermaidJS diagram support.
  • Added gitleaks and local restrict-filenames hooks to .pre-commit-config.yaml.
  • Enhanced README.md headers with relevant emojis for improved readability and engagement.

Changed

  • Modified README.md: Added Installation section and Git Workflow Diagram section with a MermaidJS diagram.
  • Improved the branch_and_deploy_guide.md and contribution_guide.md documentation on branching strategy.
  • Updated python_requires in setup.cfg to support Python versions >=3.8 and <3.12, including all 3.11.x versions.
  • Modified pull_request_workflow.yaml to add Python 3.11 to the testing matrix.
  • Moved pyspark from primary dependencies to dev section in extras_require to streamline installation for users with pre-installed environments, requiring manual installation where necessary.
  • Renamed isdir function in cdsw/helpers/hdfs_utils to is_dir for improved compliance with PEP 8 naming conventions.
  • Removed line stopping existing SparkSession in create_spark_session to prevent Py4JError and enable seamless SparkContext management on GCP.
  • Refactored save_csv_to_hdfs to use functions in /cdsw/helpers/hdfs_utils.py.
  • Added function delete_path in /cdsw/helpers/hdfs_utils.py, and refactored docstrings for delete_file and delete_dir.
  • Modified CHANGELOG.md; added a note on missing pre-v0.1.8 releases due to deploy_pypi.yaml issues.

Deprecated

Fixed

Removed

[v0.1.8] - 2024-02-28

Added

  • Added pyproject.toml and setup.cfg.

Changed

Deprecated

Fixed

Removed

  • Removed requirements.txt; requirements are now in setup.cfg.

[v0.1.7] - 2024-02-28

Added

Changed

Deprecated

Fixed

  • Added build dependency in .github/workflows/deploy_pypi.yaml

Removed

[v0.1.6] - 2024-02-28

Added

Changed

  • Modified Workflow Trigger in .github/workflows/deploy_pypi.yaml

Deprecated

Fixed

Removed

  • Removed .github/workflows/version_check.yaml

[v0.1.5] - 2024-02-28

Added

Changed

Deprecated

Fixed

  • Fix GitHub Branch Reference for deployment.

[v0.1.4] - 2024-02-28

Added

Changed

Deprecated

Fixed

  • Removed branch check for deployment.

[v0.1.3] - 2024-02-28

Added

Changed

  • Moved workflows out of nested folder to enable PyPI listing on merge to main branch.

Deprecated

Fixed

[v0.1.2] - 2024-02-28

Added

Changed

  • Updated workflows to enable PyPI listing on merge to main branch.

Deprecated

Fixed

[v0.1.1] - 2024-02-28

Added

Changed

Deprecated

Fixed

  • Fixed typo in the documentation on installing Python.

Removed

[v0.1.0] - 2024-02-28

Added

  • parametrize_cases and Case code for use in test scripts.
  • Add in PR template.
  • README with additional information and guidelines for contributors.
  • Pull request workflow with a test job that installs Poetry and runs tests.
  • Add .pre-commit-config.yaml for pre-commit hooks.
  • Add CODEOWNERS file to repository.
  • Add mkdocs; deploy_mkdocs.yaml and docs Folder.
  • Add the helpers_spark.py and test_helpers_spark.py modules from cprices-utils.
  • Add logging.py and test_logging.py module from cprices-utils.
  • Add the helpers_python.py and test_helpers_python.py modules from cprices-utils.
  • Add averaging_methods.py and test_averaging_methods.py.
  • Add init_logger_advanced in helpers/logging.py module.
  • Add in the general validation functions from cprices-utils.
  • Add invalidate_impala_metadata function to the cdsw/impala.py module.
  • Add "search" Plugin and mkdocs GOV UK Theme via mkdocs-tech-docs-template.
  • Add pipeline_runlog.py and hdfs_utils.py modules from epds_utils.
  • Add common custom exceptions.
  • Add config load class.
  • Add generic IO input functions.
  • Add docs/contribution_guide.md
  • Add functions from epds_utils into helpers/pyspark.py, io/input.py, io/output.py.
  • Add various I/O functions from the io.py module in cprices-utils.
  • Add modules to docs/reference.md
  • Add mkdocs Plugins: mkdocs-git-revision-date-localized-plugin, mkdocs-jupyter.
  • Add better navigation to mkdocs.yml.
  • Add save_csv_to_hdfs function to cdsw/io/output.py.
  • Add docs/branch_and_deploy_guide.md.
  • Add .github/workflows/deploy_pypi/version_check.yaml and .github/workflows/deploy_pypi/deploy_pypi.yaml.

Changed

  • Renamed _typing module to typing.
  • Renamed modules in helpers directory to remove helper_ from names.
  • Relocated logging.py and validation.py to root level.
  • Relocated Getting Started for Developers into docs/contribution_guide.md.
  • Migrated from poetry to setup.py for Python Code Packaging.
  • Upgrade mkdocs-tech-docs-template to 0.1.2.
  • Moved CDSW-related code from io/input.py & io/output.py into cdsw/io/input.py & cdsw/io/output.py.
  • Pinned pytest version to <8.0.0 due to TvoroG/pytest-lazy-fixture#65.
  • Updated the license information.

Deprecated

Fixed

  • Fix paths for get_window_spec in averaging_methods.py.
  • Fix deploy_mkdocs.yaml, changed mkdocs-material to mkdocs-tech-docs-template.
  • Fix module paths for unit test patches in tests/cdsw/.
  • Fix pull_request_workflow.yaml; ensured pytest failures are accurately reported in GitHub workflow by removing || true condition.
  • Fix deploy_mkdocs.yaml, fixed Python version to 3.10.
  • Fix deploy_mkdocs.yaml, missing quotes for Python version.

Removed

  • Remove _version.py.
  • Remove all references to Poetry.

Release Links

Note: Releases prior to v0.1.8 are not available on GitHub Releases and PyPI due to bugs in the GitHub Action deploy_pypi.yaml, which deploys to PyPI and GitHub Releases.