
Releases: gretelai/gretel-synthetics

RTFD!

10 May 12:15
b3a769e

📖 Module docs now available at https://gretel-synthetics.readthedocs.io

🚧 Minor internal updates to support better documentation

Colaboratory support

30 Apr 19:36
8def903

📚 Tutorial and doc improvements

  • Use the installed TensorFlow library by default (Colab ships an optimized TensorFlow build for TPUs)
  • Optionally, install a pinned version of TensorFlow with pip install gretel-synthetics[tf]

Improvements and Fixes

30 Apr 17:31
76ac70a

👍 Improvements

  • Calculate model perplexity after each training epoch (a metric for synthetic dataset quality)
  • Added a progress bar for the SentencePiece tokenizer, which can take a while on large datasets
  • Cleaned up logging
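Perplexity is conventionally defined as the exponential of the mean per-token cross-entropy loss, so a per-epoch perplexity can be derived directly from each epoch's loss. A minimal sketch, assuming the recorded loss is a mean cross-entropy in nats; the function names below are hypothetical and not part of the gretel-synthetics API:

```python
import math
from typing import List


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy loss (natural log) to perplexity."""
    # Assumes the loss is measured in nats; for a base-2 loss, use 2 ** loss instead.
    return math.exp(mean_cross_entropy)


def perplexity_per_epoch(epoch_losses: List[float]) -> List[float]:
    """Map each epoch's mean training loss to a perplexity value."""
    return [perplexity(loss) for loss in epoch_losses]


if __name__ == "__main__":
    # Hypothetical loss values, for illustration only.
    for epoch, ppl in enumerate(perplexity_per_epoch([2.3, 1.1, 0.7]), start=1):
        print(f"epoch {epoch}: perplexity {ppl:.2f}")
```

Lower perplexity means the model assigns higher probability to the training records.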

📚 Tutorial and doc improvements

  • Automatically save model parameters and training history to the model directory
  • Use the save_all_checkpoints config option to save either all checkpoints or only the best one (saving only the best saves disk space)

Improvements and fixes

21 Apr 00:31
8ce0df4

👍 Improvements

  • Support CRLF newlines in training datasets
  • Commas and newlines are treated by the tokenizer as user-defined symbols
  • Increased the default vocabulary size from 200 to 15000, which increases the number of records that pass validation in most test sets
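One common way to support CRLF input is to normalize Windows line endings to plain newlines before splitting a training file into records. A minimal sketch with hypothetical helper names; it illustrates the general idea and is not necessarily how gretel-synthetics handles CRLF internally:

```python
from typing import List


def normalize_newlines(text: str) -> str:
    """Convert Windows (CRLF) line endings to Unix (LF) line endings."""
    return text.replace("\r\n", "\n")


def split_records(text: str) -> List[str]:
    """Split training text into records, one per line, dropping empty lines."""
    # Empty lines (e.g. from a trailing newline) are skipped so they are not
    # treated as records.
    return [line for line in normalize_newlines(text).split("\n") if line]


if __name__ == "__main__":
    print(split_records("id,name\r\n1,alice\r\n2,bob\r\n"))
```

Without normalization, each record would keep a trailing \r character, which the tokenizer would then learn as part of the data.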

📚 Tutorial and doc improvements

  • Specify max_lines in the configuration file instead of max_chars (more intuitive)
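Limiting training input by lines keeps whole records intact, whereas a character budget can cut the last record in half. A minimal sketch contrasting the two, assuming one record per line; both function names are hypothetical, and here max_lines=None means "no limit" (an assumption for illustration, not the library's documented default):

```python
from itertools import islice
from typing import Iterable, List, Optional


def read_training_lines(lines: Iterable[str], max_lines: Optional[int] = None) -> List[str]:
    """Return at most max_lines records, or all records if max_lines is None."""
    stripped = (line.rstrip("\r\n") for line in lines)
    if max_lines is None:
        return list(stripped)
    return list(islice(stripped, max_lines))


def truncate_by_chars(text: str, max_chars: int) -> str:
    """Character-based limit: may stop partway through a record."""
    return text[:max_chars]


if __name__ == "__main__":
    data = ["a,1\n", "b,2\n", "c,3\n"]
    print(read_training_lines(data, max_lines=2))  # two complete records
    print(repr(truncate_by_chars("".join(data), 6)))  # second record is cut off
```

An open file object can be passed in place of the list, since iterating over a file yields one line at a time.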

Sentencepiece Tokenization

24 Mar 16:33

Added SentencePiece tokenization (https://github.com/google/sentencepiece), which allows fixed vocabulary sizes and character- or token-based training.

Hello world!

02 Mar 16:26
1ce0a47

Initial release of Gretel's synthetic data generation project.