Releases: gretelai/gretel-synthetics
RTFD!
📖 Module docs now available at https://gretel-synthetics.readthedocs.io
🚧 Minor updates to internals to support better documentation
Colaboratory support
📚 Tutorial and doc improvements
- Use the installed TensorFlow library by default (Colab uses a TensorFlow version optimized for TPU)
- Optionally, install a pinned version of TensorFlow with: pip install gretel-synthetics[tf]
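To see which TensorFlow an environment would pick up before deciding on the pinned extra, a quick check like the one below may help. It is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of gretel-synthetics.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tensorflow")
    if version is None:
        # No TensorFlow present: the pinned extra is one way to get it.
        print("TensorFlow not found; consider: pip install gretel-synthetics[tf]")
    else:
        print(f"Using installed TensorFlow {version}")
```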
Improvements and Fixes
👍 Improvements
- Calculate model perplexity per training epoch (metric for synthetic data set quality)
- Added progress bar for SentencePiece tokenizer (can take a while on large datasets)
- Cleaned up logging
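Perplexity is conventionally defined as the exponential of the average per-token negative log-likelihood, so lower values mean the model is less surprised by the text. The sketch below shows that standard relationship, assuming natural logarithms; it is not taken from the gretel-synthetics source, and the function name is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_probabilities: Sequence[float]) -> float:
    """Perplexity from the probability the model assigned to each observed
    token: exp of the mean negative log-likelihood."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(mean_nll)


# A model that spreads probability evenly over 4 choices at every step
# has a perplexity of about 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

Tracking this value per epoch gives a rough signal of whether additional training is still improving the model.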
📚 Tutorial and doc improvements
- Automatically save model parameters and training history to model directory
- Specify the save_all_checkpoints config option to save only the best checkpoint or all checkpoints (saving only the best conserves disk space)
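The effect of such an option can be pictured as a retention rule over per-epoch results. The following is a hypothetical sketch, not the library's implementation: the function `checkpoints_to_keep` and the use of the lowest loss as the "best" criterion are assumptions made for illustration.

```python
from typing import Dict, List


def checkpoints_to_keep(loss_by_epoch: Dict[int, float],
                        save_all_checkpoints: bool) -> List[int]:
    """Return the epochs whose checkpoints would be retained."""
    if not loss_by_epoch:
        return []
    if save_all_checkpoints:
        # Keep every epoch, in training order.
        return sorted(loss_by_epoch)
    # Keep only the epoch with the lowest loss (assumed "best").
    best_epoch = min(loss_by_epoch, key=loss_by_epoch.get)
    return [best_epoch]
```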
Improvements and fixes
👍 Improvements
- Support CRLF newlines in training datasets
- Commas and newlines are treated by the tokenizer as user-defined symbols
- Increased the default vocabulary size from 200 to 15000, which increases the number of successful record validations in most test sets
📚 Tutorial and doc improvements
- Specify max_lines in the configuration file instead of max_chars (more intuitive)
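As an illustration of two of the input-handling changes in this release, the sketch below reads at most `max_lines` records from a text stream while stripping both LF and CRLF line endings. The helper `read_training_lines` is hypothetical and only demonstrates the idea; the library's own loader may work differently.

```python
import io
from typing import List, TextIO


def read_training_lines(stream: TextIO, max_lines: int) -> List[str]:
    """Read up to max_lines lines, stripping LF or CRLF line endings."""
    lines: List[str] = []
    for raw in stream:
        if len(lines) >= max_lines:
            break
        lines.append(raw.rstrip("\r\n"))
    return lines


# newline="" leaves "\r\n" untranslated so the stripping is visible.
sample = io.StringIO("a,b\r\nc,d\r\ne,f\n", newline="")
records = read_training_lines(sample, max_lines=2)
```

Limiting by line count keeps whole records intact, whereas a character limit could cut the last record in half.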
Sentencepiece Tokenization
Added SentencePiece tokenization (https://github.com/google/sentencepiece) to allow fixed vocabulary sizes and character- or token-based training.
Hello world!
Initial release of Gretel's synthetic data generation project.