Doc Update - whats new v1.0.2 + TimeSeriesByCategory (#1190)
* Added RST pages

* updated doc for ensemble

* changed version + addressed comments
mail4umar authored Mar 8, 2024
1 parent 8b9bd81 commit 8236e90
Showing 5 changed files with 267 additions and 9 deletions.
54 changes: 54 additions & 0 deletions docs/source/machine_learning_vertica_time_series.rst
@@ -4,6 +4,60 @@
Time Series
===============


Multi-Timeseries Model (Beta)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: verticapy.machine_learning.vertica.tsa

.. autosummary::
:toctree: api/

ensemble.TimeSeriesByCategory

.. currentmodule:: verticapy.machine_learning.vertica.tsa.ensemble

**Methods:**

.. autosummary::
:toctree: api/

TimeSeriesByCategory.contour
TimeSeriesByCategory.deploySQL
TimeSeriesByCategory.does_model_exists
TimeSeriesByCategory.drop
TimeSeriesByCategory.export_models
TimeSeriesByCategory.features_importance
TimeSeriesByCategory.fit
TimeSeriesByCategory.get_attributes
TimeSeriesByCategory.get_match_index
TimeSeriesByCategory.get_params
TimeSeriesByCategory.get_plotting_lib
TimeSeriesByCategory.get_vertica_attributes
TimeSeriesByCategory.import_models
TimeSeriesByCategory.plot
TimeSeriesByCategory.predict
TimeSeriesByCategory.register
TimeSeriesByCategory.regression_report
TimeSeriesByCategory.report
TimeSeriesByCategory.score
TimeSeriesByCategory.set_params
TimeSeriesByCategory.summarize
TimeSeriesByCategory.to_binary
TimeSeriesByCategory.to_pmml
TimeSeriesByCategory.to_python
TimeSeriesByCategory.to_sql
TimeSeriesByCategory.to_tf


**Attributes:**

.. autosummary::
:toctree: api/

TimeSeriesByCategory.object_type


______

ARIMA
55 changes: 55 additions & 0 deletions docs/source/whats_new_v1_0_2.rst
@@ -0,0 +1,55 @@
.. _whats_new_v1_0_2:

===============
Version 1.0.2
===============

This minor release adds several significant features along with other changes. The most notable ones are listed below:

Pipelines (Beta)
-----------------

VerticaPy now has **Pipelines**!

- ``Pipelines`` is a YAML-based configuration for defining machine learning workflows, simplifying the process of setting up and managing machine learning pipelines.
- For beginners, it provides an easy-to-learn alternative to Python and SQL, reducing the initial barriers to entry for creating models.
- For more experienced users, it offers templating features to enhance modularity, minimize errors, and promote efficient code reuse in machine learning projects.


Performance
------------

- We have enhanced :py:func:`~verticapy.performance.vertica.QueryProfiler` to improve its robustness.
- We are introducing a completely new **Query Profiler Interface** that lets users navigate through various queries and access them without re-entering any code, using only the mouse within Jupyter Notebook environments. For more information, see :py:func:`~verticapy.performance.vertica.QueryProfilerInterface`.

These updates make your queries easier to access, debug, and improve.

OAuth Refresh Tokens
---------------------

- We have updated the connector to accept OAuth refresh tokens.
- Additionally, we have added a ``prompt`` option for :py:func:`~verticapy.connection.new_connection`. This allows the user to enter secrets discreetly with a masked display.
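
The idea behind a discreet prompt can be sketched with the standard library alone. The snippet below is an illustration, not VerticaPy's implementation: ``collect_secrets``, ``mask``, and the field names ``client_secret`` and ``refresh_token`` are hypothetical, and ``getpass.getpass`` hides typed input entirely rather than echoing mask characters.

```python
import getpass
from typing import Callable, Dict, Iterable


def collect_secrets(
    fields: Iterable[str],
    prompt_fn: Callable[[str], str] = getpass.getpass,
) -> Dict[str, str]:
    """Ask for each field without echoing the typed value (hypothetical helper)."""
    secrets = {}
    for field in fields:
        # getpass.getpass suppresses terminal echo; prompt_fn can be swapped
        # for a stub when no interactive terminal is available.
        secrets[field] = prompt_fn(f"{field}: ")
    return secrets


def mask(value: str, visible: int = 4) -> str:
    """Return a display-safe string showing only the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

In an interactive session, ``collect_secrets(["client_secret", "refresh_token"])`` would prompt once per field, and ``mask`` could be used if a value ever needs to be shown in logs.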

Multi-TimeSeries (Beta)
-----------------------

We added a new Time Series class: ``TimeSeriesByCategory``. It lets users build multiple models based on a category column: one model is created for each distinct category.
This saves users from having to create each model separately. For more information, please see :py:func:`~verticapy.machine_learning.vertica.tsa.ensemble.TimeSeriesByCategory`.
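
The one-model-per-category idea can be sketched in plain Python. This is a conceptual illustration only, not how VerticaPy trains models in the database: ``MeanForecaster`` and ``fit_by_category`` are hypothetical names, the toy model simply predicts the training mean, and the sample rows are invented for the example.

```python
from collections import defaultdict
from statistics import mean


class MeanForecaster:
    """Toy stand-in for a time series model: predicts the training mean."""

    def fit(self, values):
        self.mean_ = mean(values)
        return self

    def predict(self, npredictions):
        return [self.mean_] * npredictions


def fit_by_category(rows, by, ts, y):
    """Fit one MeanForecaster per distinct value of the `by` column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    # The position of a category in this sorted list plays the role of `idx`.
    distinct = sorted(groups)
    models = []
    for category in distinct:
        # Order each group's rows by timestamp before fitting.
        ordered = sorted(groups[category], key=lambda r: r[ts])
        models.append(MeanForecaster().fit([r[y] for r in ordered]))
    return distinct, models


# Invented sample rows, for illustration only.
rows = [
    {"state": "SERGIPE", "date": "1998-01-01", "number": 4},
    {"state": "PERNAMBUCO", "date": "1998-01-01", "number": 10},
    {"state": "SERGIPE", "date": "1998-02-01", "number": 6},
    {"state": "PERNAMBUCO", "date": "1998-02-01", "number": 20},
]
distinct, models = fit_by_category(rows, by="state", ts="date", y="number")
```

Here ``distinct`` lists the categories in order and ``models`` holds one fitted model per entry, so ``models[0]`` belongs to ``distinct[0]``.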

Plots
------

- Two new plots have been added for plotly that were previously missing:

- :py:func:`~verticapy.machine_learning.vertica.decomposition.plot_scree`
- :py:func:`~verticapy.machine_learning.vertica.decomposition.plot_var`

Unit Tests
-----------

- We continue to migrate our old tests to the new, more robust format.

Examples
---------

- Most of the `examples <https://github.com/vertica/VerticaPy/tree/master/examples>`_ have been updated with the latest verticapy format.
2 changes: 1 addition & 1 deletion setup.py
@@ -23,7 +23,7 @@

setuptools.setup(
name="verticapy",
- version="1.0.1",
+ version="1.0.2",
author="Badr Ouali",
author_email="[email protected]",
url="https://github.com/vertica/VerticaPy",
2 changes: 1 addition & 1 deletion verticapy/__init__.py
@@ -36,7 +36,7 @@
)
__url__: str = "https://github.com/vertica/verticapy/"
__license__: str = "Apache License, Version 2.0"
- __version__: str = "1.0.1"
+ __version__: str = "1.0.2"
__iteration__: int = 1
__date__: str = "03082024"
__last_commit__: str = "7def2745ffa5bbca9c30b3b08f52dd85c7d9675f"
163 changes: 156 additions & 7 deletions verticapy/machine_learning/vertica/tsa/ensemble.py
@@ -14,6 +14,7 @@
See the License for the specific language governing
permissions and limitations under the License.
"""

import copy
from typing import Literal, Optional, Union

@@ -42,10 +43,156 @@ class TimeSeriesByCategory(TimeSeriesModelBase):
You should look at the source models to see entire
examples.
:py:class:`~verticapy.machine_learning.vertica.tsa.ARIMA`;
:py:class:`~verticapy.machine_learning.vertica.tsa.ARMA`;
:py:class:`~verticapy.machine_learning.vertica.tsa.AR`;
:py:class:`~verticapy.machine_learning.vertica.tsa.MA`;
.. important:: This is still Beta.
Parameters
----------
name: str, optional
Name of the model. The model is stored in the
database.
overwrite_model: bool, optional
If set to ``True``, training a
model with the same name as an
existing model overwrites the
existing model.
base_model: TimeSeriesModelBase
The user should provide a base model which will
be used for each category. It could be
- :py:class:`~verticapy.machine_learning.vertica.tsa.ARIMA`
- :py:class:`~verticapy.machine_learning.vertica.tsa.ARMA`
- :py:class:`~verticapy.machine_learning.vertica.tsa.AR`
- :py:class:`~verticapy.machine_learning.vertica.tsa.MA`
Attributes
----------
Many attributes are created
during the fitting phase.
distinct: list
This provides a sequential list of the categories
used to build the different models.
ts: str
The column name for time stamp.
y: str
The column name used for building the model.
_is_already_stored: bool
This tells us whether a model is stored in the Vertica
database.
_get_model_names: list
This returns the list of names of the models created.
Examples
--------
The following examples provide a
basic understanding of usage.
Initialization
^^^^^^^^^^^^^^
For this example, we will use
a subset of the amazon dataset.
.. code-block:: python
import verticapy.datasets as vpd
amazon_full = vpd.load_amazon()
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/datasets_loaders_load_amazon.html
.. ipython:: python
:suppress:
from verticapy.datasets import load_amazon
amazon_full = load_amazon()
We can reduce the number of states for the sake
of ease in this example:
.. ipython:: python
amazon = amazon_full[(amazon_full["state"] == "PERNAMBUCO") | (amazon_full["state"] == "SERGIPE")]
Now we can set up a base model that will be
created for each unique state inside the dataset.
For this example, we use ARIMA.
.. ipython:: python
from verticapy.machine_learning.vertica.tsa import ARIMA
base_model = ARIMA(order = (2, 1, 2))
Finally, we can initialize all of the models
in one go:
.. ipython:: python
from verticapy.machine_learning.vertica.tsa.ensemble import TimeSeriesByCategory
model = TimeSeriesByCategory(base_model = base_model)
Model Fitting
^^^^^^^^^^^^^^^
We can now fit the model:
.. ipython:: python
:okwarning:
model.fit(amazon, ts = "date", y = "number", by = "state")
.. important::
To train a model, you can directly use the
:py:class:`~vDataFrame` or the name of the
relation stored in the database. The test
set is optional and is only used to compute
the test metrics. In :py:mod:`verticapy`, we
don't work using ``X`` matrices and ``y``
vectors. Instead, we work directly with lists
of predictors and the response name.
Plots
^^^^^^
We can conveniently plot the
predictions on a line plot to
observe the efficacy of our
model. We need to provide the
``idx`` which represents the model number.
.. code-block:: python
model.plot(idx = 0, npredictions = 5)
.. ipython:: python
:suppress:
:okwarning:
vp.set_option("plotting_lib", "plotly")
fig = model.plot(idx = 0, npredictions = 5)
fig.write_html("/project/data/VerticaPy/docs/figures/machine_learning_vertica_tsa_ensemble_timeseriesbycategory_1.html")
.. raw:: html
:file: /project/data/VerticaPy/docs/figures/machine_learning_vertica_tsa_ensemble_timeseriesbycategory_1.html
.. note::
You can find the name of each category in
the ``distinct`` attribute (``model.distinct``).
The sequential list of categories corresponds
to ``idx = 0, 1, ...``.
"""

# Properties.
@@ -394,9 +541,11 @@ def deploySQL(
all_predictions += [
extract_subquery(
model.predict(
- vdf=None
- if isinstance(vdf, NoneType)
- else vdf.search(f"{self.by} = '{category}'"),
+ vdf=(
+ None
+ if isinstance(vdf, NoneType)
+ else vdf.search(f"{self.by} = '{category}'")
+ ),
ts=ts,
y=y,
start=start,