ExpFamilyPCA.jl is a Julia package for exponential family principal component analysis (EPCA), a versatile generalization of PCA designed to handle non-Gaussian data, enabling dimensionality reduction and data analysis across a wide variety of distributions (e.g., binary, count, and compositional data). It is designed for applications in machine learning (belief compression, text analysis), signal processing (denoising), and data science (sample debiasing, clustering, dimensionality reduction), but can be applied to other fields with diverse data types.
- Website: https://sisl.github.io/ExpFamilyPCA.jl/dev/
- Math: https://sisl.github.io/ExpFamilyPCA.jl/dev/math/intro/
- API Documentation: https://sisl.github.io/ExpFamilyPCA.jl/dev/api/
- Implements exponential family PCA (EPCA)
- Supports multiple exponential family distributions
- Flexible constructors for custom distributions
- Fast symbolic differentiation and optimization
- Numerically stable scientific computation
To install the package, use the Julia package manager. In the Julia REPL, type:
using Pkg; Pkg.add("ExpFamilyPCA")
The following distributions are supported:
Distribution | Description |
---|---|
BernoulliEPCA |
For binary data |
BinomialEPCA |
For count data with a fixed number of trials |
ContinuousBernoulliEPCA |
For probabilities between 0 and 1 |
GammaEPCA |
For positive continuous data |
GaussianEPCA |
Standard PCA for real-valued data |
NegativeBinomialEPCA |
For over-dispersed count data |
ParetoEPCA |
For heavy-tailed distributions |
PoissonEPCA |
For count and discrete distribution data |
WeibullEPCA |
For life data and survival analysis |
Each EPCA object supports the following methods:
fit!
: Trains the model and returns compressed training data.compress
: Compresses new input data.decompress
: Reconstructs original data from the compressed representation.
X = sample_from_poisson(n1, indim)
Y = sample_from_poisson(n2, indim)
epca = PoissonEPCA(indim, outdim)
X_compressed = fit!(epca, X)
Y_compressed = compress(epca, Y)
Y_reconstructed = decompress(epca, Y_compressed)
The sample_from_poisson
function is a placeholder for generating random Poisson-distributed data. It is not implemented in the code snippet to maintain clarity and focus on the core functionality of the example. If you wish to implement it, you can use the Distributions.jl
package. For instance, you could define it as:
using Distributions
function sample_from_poisson(n::Int, dim::Int)
d = Poisson()
rand(d, n, dim)
end
When working with custom distributions, certain specifications are often more convenient and computationally efficient than others. For example, inducing the gamma EPCA objective from the log-partition
In ExpFamilyPCA.jl
, we would write:
G(θ) = -log(-θ)
g(θ) = -1 / θ
gamma_epca = EPCA(indim, outdim, G, g, Val((:G, :g)); options = NegativeDomain())
A lengthier discussion of the EPCA
constructors and math is provided in the documentation.
Contributions are welcome! If you want to contribute, please fork the repository, create a new branch, and submit a pull request. Before contributing, please make sure to update tests as appropriate.