Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 414cfe5, commit f648051
Showing 4 changed files with 138 additions and 186 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,182 +1,108 @@ | ||
Metadata-Version: 2.1 | ||
Name: EDAspy | ||
Version: 0.1.2 | ||
Summary: Estimation of Distribution Algorithms | ||
Version: 1.0.0 | ||
Summary: EDAspy is a Python package that implements Estimation of Distribution Algorithms. EDAspy lets you either use existing implementations or customize the EDA baseline by building it from modules, so new research can be developed easily. It also includes several benchmarks for comparison. | ||
Home-page: https://github.com/VicentePerezSoloviev/EDAspy | ||
Download-URL: https://github.com/VicentePerezSoloviev/EDAspy/archive/1.0.0.tar.gz | ||
Author: Vicente P. Soloviev | ||
Author-email: [email protected] | ||
License: LGPLv2+ | ||
Download-URL: https://github.com/VicentePerezSoloviev/EDAspy/archive/0.1.2.tar.gz | ||
Description: [![PyPI version fury.io](https://badge.fury.io/py/EDAspy.svg)](https://pypi.python.org/pypi/EDAspy/) | ||
[![PyPI license](https://img.shields.io/pypi/l/EDAspy.svg)](https://pypi.python.org/pypi/EDAspy/) | ||
[![PyPI license](https://img.shields.io/pypi/dm/EDAspy)](https://pypi.python.org/pypi/EDAspy/) | ||
|
||
# EDAspy | ||
|
||
## Description | ||
|
||
This package implements some Estimation of Distribution Algorithms (EDAs). EDAs are a type of evolutionary algorithm. Depending on the type of EDA, different dependencies among the variables can be considered. | ||
|
||
Three EDAs have been implemented: | ||
* Binary univariate EDA. It can be used as a simple example of an EDA, or for feature selection. | ||
* Continuous univariate EDA. | ||
* Continuous multivariate EDA. | ||
|
||
## Examples | ||
|
||
#### Binary univariate EDA | ||
It can be used as a simple example of an EDA, or for feature selection. The cost function to optimize is the metric of the model. An example is shown below. | ||
```python | ||
from EDAspy.optimization.univariate import EDA_discrete as EDAd | ||
import pandas as pd | ||
|
||
def check_solution_in_model(dictionary): | ||
MAE = prediction_model(dictionary) | ||
return MAE | ||
|
||
vector = pd.DataFrame(columns=['param1', 'param2', 'param3']) | ||
vector.loc[0] = 0.5 | ||
|
||
EDA = EDAd(MAX_IT=200, DEAD_ITER=20, SIZE_GEN=30, ALPHA=0.7, vector=vector, | ||
cost_function=check_solution_in_model, aim='minimize') | ||
|
||
bestcost, solution, history = EDA.run(output=True) | ||
print(bestcost) | ||
print(solution) | ||
print(history) | ||
``` | ||
|
||
The example implements feature selection (FS) for a prediction model (prediction_model) that depends on some variables. The prediction model receives a dictionary with keys 'paramN' (N from 1 to the number of parameters) and values 1 or 0, depending on whether that variable should be included. The model returns a mean absolute error (MAE), which we intend to minimize. | ||
|
||
The EDA receives as input the maximum number of iterations, the number of iterations allowed without improvement of the best global cost, the generation size, the fraction of each generation selected as best individuals, the initial vector of probabilities for each variable, the cost function to optimize, and the aim ('minimize' or 'maximize') of the optimizer. | ||
|
||
The probabilities in the vector are usually initialized to 0.5 to start from an equilibrium situation. The EDA returns the best cost found, the best combination of variables, and the history of costs found, which can be plotted. | ||
|
||
#### Continuous univariate EDA | ||
|
||
This EDA is used when some continuous parameters must be optimized. | ||
```python | ||
from EDAspy.optimization.univariate import EDA_continuous as EDAc | ||
import pandas as pd | ||
import numpy as np | ||
|
||
weights = [20, 10, -4] | ||
|
||
def cost_function(dictionary): | ||
function = weights[0]*dictionary['param1']**2 + weights[1]*(np.pi/dictionary['param2']) - 2 - weights[2]*dictionary['param3'] | ||
if function < 0: | ||
return 9999999 | ||
return function | ||
|
||
vector = pd.DataFrame(columns=['param1', 'param2', 'param3']) | ||
vector['data'] = ['mu', 'std', 'min', 'max'] | ||
vector = vector.set_index('data') | ||
vector.loc['mu'] = [5, 8, 1] | ||
vector.loc['std'] = 20 | ||
vector.loc['min'] = 0 | ||
vector.loc['max'] = 100 | ||
|
||
EDA = EDAc(SIZE_GEN=40, MAX_ITER=200, DEAD_ITER=20, ALPHA=0.7, vector=vector, | ||
aim='minimize', cost_function=cost_function) | ||
bestcost, params, history = EDA.run() | ||
print(bestcost) | ||
print(params) | ||
print(history) | ||
``` | ||
|
||
In this case, the aim is to optimize a cost function which we want to minimize. The three parameters to optimize are continuous. This EDA must be initialized with some initial values (mu), and an initial range to search (std). Optionally, a minimum and a maximum can be specified. | ||
|
||
As in the binary EDA, the best cost found, the solution and the cost evolution are returned. | ||
|
||
#### Continuous multivariate EDA | ||
|
||
In this case, dependencies among the variables are considered and managed with a Gaussian Bayesian Network. This EDA must be initialized with historical records in order to try to find the optimum result. A parameter (beta) is included to control the influence of the historical records in the final solution. Some of the variables can be evidences (fixed values for which we want to find the optimum of the other variables). | ||
The optimizer will find the optimum values of the non-evidence-variables based on the value of the evidences. This is widely used in problems where dependencies among variables must be considered. | ||
|
||
```python | ||
from EDAspy.optimization.multivariate import EDA_multivariate as EDAm | ||
import pandas as pd | ||
|
||
blacklist = pd.DataFrame(columns=['from', 'to']) | ||
aux = {'from': 'param1', 'to': 'param2'} | ||
blacklist = blacklist.append(aux, ignore_index=True) | ||
|
||
data = pd.read_csv(path_CSV) # columns param1 ... param5 | ||
evidences = [['param1', 2.0],['param5', 6.9]] | ||
|
||
def cost_function(dictionary): | ||
return dictionary['param1'] + dictionary['param2'] + dictionary['param3'] + dictionary['param4'] + dictionary['param5'] | ||
|
||
EDAm = EDAm(MAX_ITER=200, DEAD_ITER=20, data=data, ALPHA=0.7, BETA=0.4, cost_function=cost_function, | ||
evidences=evidences, black_list=blacklist, n_clusters=6, cluster_vars=['param1', 'param5']) | ||
output = EDAm.run(output=True) | ||
|
||
print('BEST', output.best_cost_global) | ||
``` | ||
This is the most complex EDA implemented. Bayesian networks are used to represent an abstraction of the search space of each iteration, where new individuals are sampled. As a graph is a representation with nodes and arcs, some arcs can be forbidden by the black list (pandas dataframe with the forbidden arcs). | ||
|
||
In this case the cost function is a simple sum of the parameters. The evidences are variables that have fixed values and are not optimized. In this problem, the output would be the optimum value of the parameters which are not in the evidences list. | ||
Because of the evidences, a clustering by similar values is implemented to help the structure learning algorithm find the arcs. Thus, the number of clusters is an input, as well as the variables considered in the clustering. | ||
|
||
In this case, the output is the EDA object itself, which can be saved as a pickle in order to explore its attributes. One of the attributes is the structure learned in the optimum generation, which can be plotted to observe the dependencies among the variables. The function to plot the structure is the following: | ||
```python | ||
from EDAspy.optimization.multivariate import print_structure | ||
print_structure(structure=structure, var2optimize=['param2', 'param3', 'param4'], evidences=['param1', 'param5']) | ||
``` | ||
|
||
![Structure graph plot](/structure.PNG "Structure of the optimum generation found by the EDA") | ||
|
||
#### Another Continuous multivariate EDA approach | ||
|
||
In this EDA approach, new individuals are sampled from a multivariate normal distribution. Evidences are not allowed in the optimizer. If desired, the previous approach should be used. | ||
The EDA is initialized, as in the univariate continuous EDA, with univariate mus and sigma for the variables. In the execution, a multivariate gaussian is built to sample from it. As it is multivariate, correlation among variables is considered. | ||
|
||
```python | ||
import pandas as pd | ||
from EDAspy.optimization.multivariate import EDA_multivariate_gaussian as EDAmg | ||
|
||
|
||
def cost_function(dictionary): | ||
suma = dictionary['param1'] + dictionary['param2'] | ||
if suma < 0: | ||
return 999999999 | ||
return suma | ||
|
||
mus = pd.DataFrame(columns=['param1', 'param2']) | ||
mus.loc[0] = [10, 8] | ||
|
||
sigma = pd.DataFrame(columns=['param1', 'param2']) | ||
sigma.loc[0] = 5 | ||
|
||
EDAmulti = EDAmg(SIZE_GEN=40, MAX_ITER=1000, DEAD_ITER=50, ALPHA=0.6, aim='minimize', | ||
cost_function=cost_function, mus=mus, sigma=sigma) | ||
|
||
bestcost, params, history = EDAmulti.run(output=True) | ||
print(bestcost) | ||
print(params) | ||
print(history) | ||
``` | ||
|
||
The cost function to minimize is the sum of two parameters. Both parameters are continuous, and two pandas dataframes are needed to initialize the EDA: one with the mus and another with the sigmas (the diagonal of the covariance matrix). | ||
|
||
The EDA returns the best cost, the best combination of parameters and the history of costs, which can be plotted if desired. | ||
|
||
## Getting started | ||
|
||
#### Prerequisites | ||
R must be installed to use the multivariate EDA with Bayesian networks, with the following installed libraries: c("bnlearn", "dbnR", "data.table") | ||
To manage R from python, rpy2 package must also be installed. | ||
|
||
#### Installing | ||
``` | ||
pip install EDAspy | ||
``` | ||
|
||
Keywords: EDA,estimation,bayesian,evolutionary,algorithm,optimization | ||
Platform: UNKNOWN | ||
License: bsd-3-clause | ||
Keywords: EDA,estimation,bayesian,evolutionary,algorithm,optimization,time_series,feature,selection,semiparametric,Gaussian | ||
Classifier: Development Status :: 5 - Production/Stable | ||
Classifier: Programming Language :: Python :: 3.6 | ||
Classifier: License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+) | ||
Classifier: Programming Language :: Python :: 3 | ||
Classifier: License :: OSI Approved :: BSD 3-Clause License | ||
Classifier: Operating System :: OS Independent | ||
Requires-Python: >=3.6 | ||
Requires-Python: >=3.0 | ||
Description-Content-Type: text/markdown | ||
License-File: LICENSE | ||
|
||
[![PyPI](https://img.shields.io/pypi/v/edaspy)](https://pypi.python.org/pypi/EDAspy/) | ||
[![PyPI license](https://img.shields.io/pypi/l/EDAspy.svg)](https://pypi.python.org/pypi/EDAspy/) | ||
[![Downloads](https://static.pepy.tech/personalized-badge/edaspy?period=total&units=none&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/edaspy) | ||
[![Documentation Status](https://readthedocs.org/projects/edaspy/badge/?version=latest)](https://edaspy.readthedocs.io/en/latest/?badge=latest) | ||
|
||
# EDAspy | ||
|
||
## Introduction | ||
|
||
EDAspy presents some implementations of Estimation of Distribution Algorithms (EDAs). EDAs are a type of | ||
evolutionary algorithm. Depending on the type of probabilistic model embedded in the EDA, and the type of | ||
variables considered, a different EDA implementation is used. | ||
|
||
The pseudocode of EDAs is the following: | ||
|
||
1. Random initialization of the population. | ||
|
||
2. Evaluate each individual of the population. | ||
|
||
3. Select the best individuals according to the cost function evaluation. | ||
|
||
4. Learn a probabilistic model from the best individuals selected. | ||
|
||
5. Sample a new population from the learned model. | ||
|
||
6. If the stopping criterion is met, finish; else, go to 2. | ||
|
||
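The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration only, assuming independent Gaussian variables and a fixed iteration budget as the stopping criterion; `simple_continuous_eda` and all of its parameters are hypothetical names invented for this sketch and are not part of the EDAspy API.

```python
import numpy as np


def simple_continuous_eda(cost, n_vars, pop_size=50, alpha=0.5, max_iter=100, seed=0):
    """Toy univariate Gaussian EDA (hypothetical helper, not EDAspy code)."""
    rng = np.random.default_rng(seed)
    # 1. Random initialization of the population (assumed search box [-5, 5]).
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, n_vars))
    best_x, best_cost = None, np.inf
    for _ in range(max_iter):
        # 2. Evaluate each individual of the population.
        costs = np.apply_along_axis(cost, 1, pop)
        i = int(np.argmin(costs))
        if costs[i] < best_cost:
            best_cost, best_x = float(costs[i]), pop[i].copy()
        # 3. Select the best individuals (a fraction alpha of the population).
        n_sel = max(2, int(alpha * pop_size))
        elite = pop[np.argsort(costs)[:n_sel]]
        # 4. Learn a probabilistic model: one mean and one std per variable.
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-12  # small floor keeps the scale positive
        # 5. Sample a new population from the learned model.
        pop = rng.normal(mu, sigma, size=(pop_size, n_vars))
        # 6. Stopping criterion: here simply the max_iter budget of the loop.
    return best_x, best_cost


# Minimize the sphere function sum(x_i^2), whose optimum is the origin.
best_x, best_cost = simple_continuous_eda(lambda x: float(np.sum(x ** 2)), n_vars=3)
print(best_x, best_cost)
```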
EDAspy also allows you to build a custom EDA: the modular probabilistic models and initializers can be embedded into the EDA baseline and used for different purposes. If this fits your needs, see the EDACustom example in the examples section. | ||
|
||
EDAspy also includes a set of benchmark cost functions, so that the algorithms can be compared by how well they minimize them. | ||
|
||
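To make the idea of a benchmark cost function concrete, here is a minimal sketch of two textbook continuous test functions, the sphere and Rastrigin functions. They are chosen purely for illustration: whether EDAspy's benchmarks module ships these exact functions, and under which names, is an assumption that this README does not confirm.

```python
import numpy as np


def sphere(x):
    """Sphere function: unimodal, global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))


def rastrigin(x):
    """Rastrigin function: highly multimodal, global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))


# Both functions take one candidate solution and return a scalar cost to minimize.
origin = np.zeros(5)
print(sphere(origin), rastrigin(origin))
```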
The following implementations are available in EDAspy: | ||
|
||
* UMDAd: Univariate Marginal Distribution Algorithm binary. It can be used as a simple example of an EDA in which the variables are binary and no dependencies between variables are considered. Some usages include feature selection, for example. | ||
|
||
|
||
* UMDAc: Univariate Marginal Distribution Algorithm continuous. In this EDA all the variables are assumed to follow a Gaussian distribution and no dependencies between the variables are considered. Some usages include hyperparameter optimization, for example. | ||
|
||
|
||
* EGNA: Estimation of Gaussian Networks Algorithm. This is a more complex implementation in which dependencies between the variables are considered during the optimization. In each iteration, a Gaussian Bayesian network is learned and sampled. The variables in the model, and also the dependencies between them, are assumed to be Gaussian. This implementation is focused on continuous optimization. | ||
|
||
|
||
* EMNA: Estimation of Multivariate Normal Algorithm. This implementation is similar to EGNA, but instead of a Gaussian Bayesian network, a multivariate Gaussian distribution is iteratively learned and sampled. As in EGNA, the dependencies between variables are considered and assumed to be linear Gaussian. This implementation is focused on continuous optimization. | ||
|
||
|
||
* Categorical EDA. In this implementation we consider some independent categorical variables. Some usages include portfolio optimization, for example. | ||
|
||
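The difference between the univariate and multivariate models listed above lies in the "learn and sample" step. The sketch below contrasts a UMDAd-style step (independent Bernoulli probabilities) with an EMNA-style step (a full multivariate Gaussian). It uses only NumPy; the function names are hypothetical and this is not how EDAspy implements these models internally.

```python
import numpy as np

rng = np.random.default_rng(42)


def umda_binary_step(elite, n_samples, rng):
    """UMDAd-style step: one independent Bernoulli probability per variable."""
    p = elite.mean(axis=0)  # fraction of ones in each column of the elite set
    return (rng.random((n_samples, elite.shape[1])) < p).astype(int)


def emna_step(elite, n_samples, rng):
    """EMNA-style step: fit mean and full covariance, then sample jointly."""
    mean = elite.mean(axis=0)
    cov = np.cov(elite, rowvar=False)  # off-diagonal terms capture dependencies
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Binary case: the first variable is 1 in every elite individual.
binary_elite = np.array([[1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 0, 1]])
new_binary = umda_binary_step(binary_elite, 10, rng)

# Continuous case: the second variable is strongly correlated with the first.
continuous_elite = rng.normal(size=(30, 2))
continuous_elite[:, 1] = 0.9 * continuous_elite[:, 0] + 0.1 * continuous_elite[:, 1]
new_continuous = emna_step(continuous_elite, 200, rng)

print(new_binary[:3])
print(np.corrcoef(new_continuous, rowvar=False)[0, 1])
```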
## Examples | ||
|
||
Some examples are available in https://github.com/VicentePerezSoloviev/EDAspy/tree/master/notebooks | ||
|
||
## Getting started | ||
|
||
To install EDAspy from PyPI, run the following command using pip: | ||
|
||
```bash | ||
pip install EDAspy | ||
``` | ||
|
||
## Build from Source | ||
|
||
### Prerequisites | ||
|
||
- Python 3.6, 3.7, 3.8 or 3.9. | ||
- PyBNesian, numpy, pandas. | ||
|
||
### Building | ||
|
||
Clone the repository: | ||
|
||
```bash | ||
git clone https://github.com/VicentePerezSoloviev/EDAspy.git | ||
cd EDAspy | ||
git checkout v1.0.0 # You can checkout a specific version if you want | ||
python setup.py install | ||
``` | ||
## Testing | ||
|
||
The library contains tests that can be executed using [pytest](https://docs.pytest.org/). Install it using | ||
pip: | ||
|
||
```bash | ||
pip install pytest | ||
``` | ||
|
||
Run the tests with: | ||
|
||
```bash | ||
pytest | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,44 @@ | ||
LICENSE | ||
README.md | ||
setup.py | ||
EDAspy/__init__.py | ||
EDAspy.egg-info/PKG-INFO | ||
EDAspy.egg-info/SOURCES.txt | ||
EDAspy.egg-info/dependency_links.txt | ||
EDAspy.egg-info/requires.txt | ||
EDAspy.egg-info/top_level.txt | ||
EDAspy/benchmarks/__init__.py | ||
EDAspy/benchmarks/binary.py | ||
EDAspy/benchmarks/continuous.py | ||
EDAspy/optimization/__init__.py | ||
EDAspy/optimization/multivariate/EDA_multivariate.py | ||
EDAspy/optimization/multivariate/EDA_multivariate_gaussian.py | ||
EDAspy/optimization/multivariate/__BayesianNetwork.py | ||
EDAspy/optimization/multivariate/__clustering.py | ||
EDAspy/optimization/eda.py | ||
EDAspy/optimization/eda_result.py | ||
EDAspy/optimization/custom/__init__.py | ||
EDAspy/optimization/custom/eda_custom.py | ||
EDAspy/optimization/custom/initialization_models/__init__.py | ||
EDAspy/optimization/custom/initialization_models/_generation_init.py | ||
EDAspy/optimization/custom/initialization_models/multi_gauss_gininit.py | ||
EDAspy/optimization/custom/initialization_models/uni_bin_geninit.py | ||
EDAspy/optimization/custom/initialization_models/uni_gauss_geninit.py | ||
EDAspy/optimization/custom/initialization_models/uniform_geninit.py | ||
EDAspy/optimization/custom/probabilistic_models/__init__.py | ||
EDAspy/optimization/custom/probabilistic_models/_probabilistic_model.py | ||
EDAspy/optimization/custom/probabilistic_models/gaussian_bayesian_network.py | ||
EDAspy/optimization/custom/probabilistic_models/multivariate_gaussian.py | ||
EDAspy/optimization/custom/probabilistic_models/univariate_binary.py | ||
EDAspy/optimization/custom/probabilistic_models/univariate_gaussian.py | ||
EDAspy/optimization/multivariate/__init__.py | ||
EDAspy/optimization/multivariate/__matrix.py | ||
EDAspy/optimization/multivariate/egna.py | ||
EDAspy/optimization/multivariate/emna.py | ||
EDAspy/optimization/univariate/__init__.py | ||
EDAspy/optimization/univariate/continuous.py | ||
EDAspy/optimization/univariate/discrete.py | ||
EDAspy/tests/__init__.py | ||
EDAspy/tests/test_EDA_multivariate_gaussian.py | ||
EDAspy/tests/test___matrix.py | ||
EDAspy/tests/test_continuous.py | ||
EDAspy/tests/test_discrete.py | ||
EDAspy/optimization/univariate/umda_binary.py | ||
EDAspy/optimization/univariate/umda_continuous.py | ||
EDAspy/timeseries/TS_transformations.py | ||
EDAspy/timeseries/TransformationsFeatureSelection.py | ||
EDAspy/timeseries/__init__.py | ||
tests/__init__.py | ||
tests/test_egna.py | ||
tests/test_emna.py | ||
tests/test_geninit.py | ||
tests/test_umdac.py | ||
tests/test_umdad.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
pandas>=1.2.0 | ||
numpy>1.15.0 | ||
pybnesian>=0.3.4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
EDAspy | ||
tests |