All notable changes to this project will be documented in this file. The format is based on Keep a Changelog.
- Added an example for training `Trompt` on multiple GPUs (#474)
- Added support for materializing datasets for train and test dataframes separately (#470)
- Added support for PyTorch 2.5 (#464)
- Added a benchmark script to compare PyTorch Frame with PyTorch Tabular (#398, #444)
- Added `is_floating_point` method to `MultiNestedTensor` and `MultiEmbeddingTensor` (#445)
- Added support for inferring `stype.categorical` from boolean columns in `utils.infer_series_stype` (#421)
- Added `pin_memory()` to `TensorFrame`, `MultiEmbeddingTensor`, and `MultiNestedTensor` (#437)
- Set `weights_only=True` in `torch_frame.load` from PyTorch 2.4 (#423)
- Dropped support for Python 3.8 (#462)
- Fixed size mismatch `RuntimeError` in `transforms.CatToNumTransform` (#446)
- Removed CUDA synchronizations from `nn.LinearEmbeddingEncoder` (#432)
- Removed CUDA synchronizations from N/A imputation logic in `nn.StypeEncoder` (#433, #434)
- Updated `ExcelFormer` implementation and related scripts (#391)
- Avoided a for-loop in `EmbeddingEncoder` (#366)
- Added `image_embedded` and one tabular image dataset (#344)
- Added benchmarking suite for encoders (#360)
- Added dataframe text benchmark script (#354, #367)
- Added `DataFrameTextBenchmark` dataset (#349)
- Added support for empty `TensorFrame` (#339)
- Changed the workflow of the encoder's `na_forward` method, resulting in a performance boost (#364)
- Removed ReLU applied in `FCResidualBlock` (#368)
- Fixed a bug in empty `MultiNestedTensor` handling (#369)
- Fixed the split of `DataFrameTextBenchmark` (#358)
- Fixed empty `MultiNestedTensor` column indexing (#355)
- Supported more stypes in `LinearModelEncoder` (#325)
- Added `stype_encoder_dict` to some models (#319)
- Added `HuggingFaceDatasetDict` (#287)
- Supported decoder embedding model in `examples/transformers_text.py` (#333)
- Removed implicit clones in `StypeEncoder` (#286)
- Fixed `TimestampEncoder` not applying `CyclicEncoder` to cyclic features (#311)
- Fixed NaN masking in `multicategorical` stype (#307)
- Added support for Boolean masks in `index_select` of `_MultiTensor` (#334)
- Added more text documentation (#291)
- Added `col_to_model_cfg` (#270)
- Supported saving/loading of GBDT models (#269)
- Added documentation on handling different stypes (#271)
- Added `TimestampEncoder` (#225)
- Added `LightGBM` (#248)
- Added time columns to the `MultimodalTextBenchmark` (#253)
- Added `CyclicEncoding` (#251)
- Added `PositionalEncoding` (#249)
- Added optional `col_names` argument in `StypeEncoder` (#247)
- Added `col_to_text_embedder_cfg` and used `MultiEmbeddingTensor` for `text_embedded` (#246)
- Added `col_encoder_dict` in `StypeWiseFeatureEncoder` (#244)
- Added `LinearEmbeddingEncoder` for `embedding` stype (#243)
- Added support for `torch_frame.text_embedded` in `GBDT` (#239)
- Supported `Metric` in `GBDT` (#236)
- Added auto-inference of `stype` (#221)
- Enabled `list` input in `multicategorical` stype (#224)
- Added `Timestamp` stype (#212)
- Added `multicategorical` to `MultimodalTextBenchmark` (#208)
- Added support for saving and loading of `TensorFrame` with complex `stypes` (#197)
- Added `stype.embedding` (#194)
- Added `TensorFrame` concatenation of complex stypes (#190)
- Added `text_tokenized` example (#174)
- Added Cohere embedding example (#186)
- Added `AmazonFineFoodReviews` dataset and OpenAI embedding example (#182)
- Added save and load logic for `FittableBaseTransform` (#178)
- Added `MultiEmbeddingTensor` (#181, #193, #198, #199, #217)
- Added `to_dense()` for `MultiNestedTensor` (#170)
- Added example for `multicategorical` stype (#162)
- Added `sequence_numerical` stype (#159)
- Added `MultiCategoricalEmbeddingEncoder` (#155)
- Added advanced indexing for `MultiNestedTensor` (#150, #161, #163, #165)
- Added `multicategorical` stype (#128, #151)
- Added `MultiNestedTensor` (#149)
- Set `stype.embedding` as the parent of `stype.text_embedded` and unified `stype.text_embedded` with its parent in `tensor_frame` (#277)
- Renamed `torch_frame.stype` module to `torch_frame._stype` (#275)
- Renamed `text_tokenized_cfg` to `col_to_text_tokenized_cfg` (#257)
- Made `Trompt` output 2-dim embeddings in `forward`
- Renamed `text_embedder_cfg` to `col_to_text_embedder_cfg`
- Removed the need to manually pass `in_channels` to `LinearEmbeddingEncoder` for `stype.text_embedded` (#222)
- Added basic `text_tokenized` (#157)
- Added `Mercari` dataset (#123)
- Added the model performance benchmark script (#114)
- Added `DataFrameBenchmark` (#107)
- Added concat and equal ops for `TensorFrame` (#100)
- Used ROC-AUC for binary classification in GBDT (#98)
- Inferred `task_type` in dataset (#97)
- Added `text_embedded` example (#95)
- Added `MultimodalTextBenchmark` (#92, #117)
- Renamed `x_dict` to `feat_dict` in `TensorFrame` (#86)
- Added `TabTransformer` example (#82)
- Added `TabNet` example (#85)
- Added dataset `tensorframe` and `col_stats` caching (#84)
- Added `TabTransformer` (#74)
- Added `TabNet` (#35)
- Added text embedded stype, mapper and encoder (#78)
- Added `ExcelFormer` example (#46)
- Added support for inductive `DataFrame` to `TensorFrame` transformation (#75)
- Added `CatBoost` baseline and tuned `CatBoost` example (#73)
- Added `na_strategy` as argument in `StypeEncoder` (#69)
- Added `NAStrategy` class and imputed NaN values in `MutualInformationSort` (#68)
- Added `XGBoost` baseline and updated tuned `XGBoost` example (#57)
- Added `CategoricalCatBoostEncoder` and `MutualInformationSort` transforms needed by `ExcelFormer` (#52)
- Added tutorial example script (#54)
- Added `ResNet` (#48)
- Added `ExcelFormerEncoder` (#42)
- Made `FTTransformer` take `TensorFrame` as input (#45)
- Added `Trompt` example (#39)
- Added `post_module` in `StypeEncoder` (#43)
- Added `FTTransformer` (#40, #41)
- Added `ExcelFormer` (#26)
- Added `Yandex` collections (#37)
- Added `TabularBenchmark` collections (#33)
- Added the `Bank Marketing` dataset (#34)
- Added the `Mushroom`, `Forest Cover Type`, and `Poker Hand` datasets (#32)
- Added `PeriodicEncoder` (#31)
- Added `NaN` handling in `StypeEncoder` (#28)
- Added `LinearBucketEncoder` (#22)
- Added `Trompt` (#25)
- Added `TromptDecoder` (#24)
- Added `TromptConv` (#23)
- Added `StypeWiseFeatureEncoder` (#16)
- Added indexing/shuffling and column select functionality in `Dataset` (#18, #19)
- Added `Adult Census Income` dataset (#17)
- Added column-level statistics and dataset materialization (#15)
- Added `FTTransformerConvs` (#12)
- Added `DataLoader` capabilities (#11)
- Added `TensorFrame.index_select` (#10)
- Added `Dataset.to_tensor_frame` (#9)
- Added base classes `TensorEncoder`, `FeatureEncoder`, `TableConv`, `Decoder` (#5)
- Added `TensorFrame` (#4)
- Added `Titanic` dataset (#3)
- Added `Dataset` base class (#3)