* fix interface of `get_sample_input`
* save configuration parameters
* AE wrapper implemented
* fix import
* add AEWrapper step
* from `set_model_to_dtype` to `prepare_model`
* fix eval mode during inference
* fix CLIP ONNX export; now it traces only the needed outputs
* fix T5 wrapper
* reorder Flux input names
* fix Flux input format for `text_ids` and `guidance`
* fix Flux imports and input scaling to prevent NaN; added `"latent": {0: "B", 1: "latent_dim"}` as additional dynamic axes
* add torch inference while tracing
* fix casting problem in ONNX trace
* solve optimization problem by removing cleanup steps
* rename to notes
* prevent NaN due to large inputs
* provide base implementation of `get_model`
* format
* add TRT export step
* add engine class for TRT build
* add `get_input_profile` and `get_minmax_dims` abstract methods
* add `build_strongly_typed` attribute
* implement `get_minmax_dims` and `get_input_profile`
* remove `static_shape` from `get_sample_input`
* remove static shape and batch flags
* add typing
* remove static shape and batch flags
* offload to CPU
* enable device offloading while tracing
* check CUDA is available while building engines
* CLIP TRT engine build
* add pinned transformer dependency
* fix NaN with ONNX and TRT when executed on CUDA
* AE needs to be traced in TF32, not FP16
* add `get_shape_dict` abstract method and device as a property
* AE should be traced in TF32
* AE explicitly on TF32 and reactivate full pipeline
* add input profile to Flux to enable TRT engine build
* format and add `input_profile` to T5 for TRT build
* add `TransformersModelWrapper`
* add TransformersModelWrapper support
* add `get_shape_dict` interface
* add TransformersModelWrapper support
* add `shape_dict` interface
* T5 in TF32 for numerical reasons
* remove unused options
* remove unused code
* add `get_shape_dict`
* remove custom optimization
* add garbage collector
* return error
* create wrapper specific to the ONNX export operation
* use OnnxWrapper
* create base wrapper for TRT engines
* moved to engine package
* moved to engine package
* forbid relative import of trt-builder
* remove wrapper and create BaseExporter or BaseEngine
* models not stored in builder class
* `_prepare_model_configs` as a pure function
* `_get_onnx_exporters` as a private method to get ONNX exporters
* remove unused dependencies
* from OnnxWrapper to OnnxEngine
* TRT engine class
* add `calculate_max_device_memory` to TRTBuilder
* `get_shape_dict` moved to TRT-engine interface
* add common inference code
* autoencoder inference wrapper
* add requirements.txt
* support guidance for dev model
* add support for TRT based on env variables
* format Flux
* remove stream from constructor
* fix iteration over ONNX exporters
* Flux is not strongly typed
* move back for numerical stability
* add logging
* fix dtype casting for bfloat16
* fix default value
* add version before merge
* hacky: get it building the engines
* requirements.txt
* add a separate `_engine.py` file for the Flux, T5 and CLIP engines
* boilerplate and plumbing: getting parameters handled when setting up the TRT engines
* remove `_version.py` from git
* create base mixin class to share parameters
* clip mixin parameters
* remove parameters that are part of the mixin class
* CLIP engine and exporter use common mixin for managing parameters
* use mixin class to build engine from exporter
* AE mixin for shared parameters
* Flux exporter and engine unified by mixin class
* formatting
* add common `get_latent_dims` method
* add `get_latent_dims` common method
* T5 based on mixin class
* build strongly typed Flux
* enable load with shared device memory
* remove boilerplate code to create engines
* add tokenizer to TRT engine
* use static shape to reduce memory consumption
* implement tokenizer in T5 engine
* fix `max_batch` size to 8
* add license
* add license
* enable TRT runtime tracking
* add static-batch and static-shape options
* add CUDA stream to load method
* add inference code
* add inference code
* enable static shape
* add `static_shape` option to reduce memory and `_build_engine` as a static method
* add `should_be_dtype` field to handle output type conversion
* from trtbuilder to trt_manager
* from TRTBuilder to TRTManager
* AE engine interface
* `trt_to_torch_dtype_dict` as a property
* CLIP engine inference
* implement Flux TRT engine inference process
* add `scale_factor` and `shift_factor`
* removed `should_be_dtype`
* removed `should_be_dtype`
* remove `should_be_dtype` from T5
* add scale and shift factor
* `max_batch` to 8
* implement `TRTManager`
* from AE to VAE to match DD
* remove autocast
* `pooled_embeddings` to match DD naming for CLIP
* rename `flux` to `transformer` engine
* from Flux to transformer mixin
* from Flux to transformer exporter
* fix TRTManager naming
* fix input names and dimensions; note that `img_ids` and `txt_ids` are without batch dim
* fix shape of inputs according to `text_maxlen` and `batch_size`
* reduce `max_batch`
* fix stage naming
* add support for DD model
* add support for DD models
* fix dtype configuration
* fix engine dtype
* transformers inference interface to match DD
* VAE inference script dtype mapping
* remove dtype checks as multiple can be active
* TF32 always active by default
* fix TRT engine names
* add wrapper for FluxModel to match DD ONNX configuration
* add autocast back in to match DD setup
* fix dependencies for TRT support
* support TRT
* add explicit kwargs
* VS Code setup
* add setup instructions for TRT
* `trt` dependencies not part of `all`
* from onnx_exporter to exporter
* hide ONNX parameters
* from onnx-exporter to exporter
* exporter responsible for building the TRT engine and ONNX export
* hide ONNX parameter
* remove build function from engine class
* remove unused import
* remove space
* manage T5 and VAE separately
* disable autocast
* strongly typed T5
* fix input type and max image size
* max image size
* T5 not strongly typed
* testing
* fix torch synchronize problem
* don't build already present engines
* remove torch save
* removed ONNX dependencies
* add TRT dependencies
* remove TRT dependencies from toml
* rename requirements and fix README
* remove unused files
* fix import format
* remove comments
* add gitignore
* reset dependencies
* add hidden setup files
* solve ruff check
* fix imports with ruff
* run ruff formatter
* update gitignore
* simplify dependencies
* remove gitignore
* add CLI formatting
* fix import order
* simplify dependencies
* solve VAE quality issue
* fix ruff format
* fix merge changes
* format and sort `src/flux/cli`
* fix merge conflicts
* add TRT import
* add static shape support (not completed)
* remove FP8 support
* add static shape
* add static shape to T5
* add static shape to transformer
* remove model opt code
* enable offloading with TRT engines
* add `stream` as part of `init_runtime`
* enable offloading
* `allocate_buffers` moved to call
* formatting
* add capability to compute `img_dim`
* enable dynamic or static shape
* split base engine and engine class
* CLIP as engine
* T5 as engine
* transformer as engine
* VAEDecoder as engine and VAEEngine as BaseEngine
* from `vae` to `vae_decoder`, `vae_encoder` and `vae`
* use `set_stream` and fix activate call
* fix import and remove stages in TRTManager
* from BaseEngine to BaseEngine and Engine
* fix imports
* add TRT support to `cli_controlnet`
* add `vae_encoder` to support ControlNet
* refactor VAE engine to use `load()` and `activate()` functions
* implement `vae_encoder_exporter` (not tested)
* fix imports
* add `static_batch` and `static_shape` to `cli.py` as additional options
* update dependencies
* revert formatting
* from `Self` to `Any` to be compatible with Python 3.10
* from `vae_decoder` to `vae` for compatibility with OSS engines
* missing torch import
* add `scale_factor` and `shift_factor` to VAE encoder
* add check whether VAE is traced
* offload while tracing
* default `text_maxlen` set to dev size instead of schnell
* remove line
* add warning when `text_maxlen` is not read from T5
* fix imports

---------

Co-authored-by: scavallari <[email protected]>
Co-authored-by: ahohl <[email protected]>