diff --git a/docs/source/examples_business_base_station.rst b/docs/source/examples_business_base_station.rst index daa9e732b..926d9df4e 100644 --- a/docs/source/examples_business_base_station.rst +++ b/docs/source/examples_business_base_station.rst @@ -4,8 +4,8 @@ Base Station Positions ================================== This example uses the Telecom Dataset, provided by Shanghai Telecom, to predict the optimal positions for base radio stations. -This dataset contains more than 7.2 million records about people's -Internet access through 3,233 base stations from 9,481 mobile phones +This dataset contains more than ``7.2`` million records about people's +Internet access through ``3,233`` base stations from ``9,481`` mobile phones over period of six months. The dataset can be found `here `_. It consists of: diff --git a/docs/source/examples_business_churn.rst b/docs/source/examples_business_churn.rst index f5816d081..15546a536 100644 --- a/docs/source/examples_business_churn.rst +++ b/docs/source/examples_business_churn.rst @@ -23,7 +23,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. diff --git a/docs/source/examples_business_insurance.rst b/docs/source/examples_business_insurance.rst index 3f8d17d77..2333b6a59 100644 --- a/docs/source/examples_business_insurance.rst +++ b/docs/source/examples_business_insurance.rst @@ -30,7 +30,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. diff --git a/docs/source/examples_business_spam.rst b/docs/source/examples_business_spam.rst index f9c6fddb0..e07ee482a 100644 --- a/docs/source/examples_business_spam.rst +++ b/docs/source/examples_business_spam.rst @@ -21,7 +21,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. diff --git a/docs/source/examples_learn_iris.rst b/docs/source/examples_learn_iris.rst index 43e152904..67b8c657e 100644 --- a/docs/source/examples_learn_iris.rst +++ b/docs/source/examples_learn_iris.rst @@ -24,7 +24,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. 
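The connection step repeated across the example pages above can be sketched as follows. This is a minimal illustration, assuming a saved connection named ``VerticaDSN``; the host, database, and credentials are placeholders, not values taken from these docs.

.. code-block:: python

    import verticapy as vp

    # One-time setup: save a reusable connection under the name "VerticaDSN"
    # (placeholder credentials -- replace with your own).
    vp.new_connection(
        {
            "host": "127.0.0.1",
            "port": "5433",
            "database": "demo",
            "user": "dbadmin",
            "password": "*****",
        },
        name = "VerticaDSN",
    )

    # Each example page then simply reuses the saved connection by name.
    vp.connect("VerticaDSN")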
diff --git a/docs/source/examples_learn_pokemon.rst b/docs/source/examples_learn_pokemon.rst index ac7b2ffbc..c4cc46150 100644 --- a/docs/source/examples_learn_pokemon.rst +++ b/docs/source/examples_learn_pokemon.rst @@ -38,7 +38,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. diff --git a/docs/source/examples_learn_titanic.rst b/docs/source/examples_learn_titanic.rst index 06c37ce9e..268847de2 100644 --- a/docs/source/examples_learn_titanic.rst +++ b/docs/source/examples_learn_titanic.rst @@ -18,7 +18,7 @@ This example uses the following version of VerticaPy: vp.__version__ -Connect to Vertica. This example uses an existing connection called "VerticaDSN". +Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. You can skip the below cell if you already have an established connection. @@ -69,9 +69,9 @@ Let's explore the data by displaying descriptive statistics of all the columns. .. raw:: html :file: SPHINX_DIRECTORY/figures/examples_titanic_table_describe.html -The columns "body" (passenger ID), "home.dest" (passenger origin/destination), "embarked" (origin port) and "ticket" (ticket ID) shouldn't influence survival, so we can ignore these. +The columns ``body`` (passenger ID), ``home.dest`` (passenger origin/destination), ``embarked`` (origin port) and ``ticket`` (ticket ID) shouldn't influence survival, so we can ignore these. -Let's focus our analysis on the columns "name" and "cabin". We'll begin with the passengers' names. +Let's focus our analysis on the columns ``name`` and ``cabin``. We'll begin with the passengers' names. .. code-block:: python @@ -217,7 +217,7 @@ The "sibsp" column represents the number of siblings for each passenger, while t titanic["family_size"] = titanic["parch"] + titanic["sibsp"] + 1 -Let's move on to outliers. We have several tools for locating outliers (:py:mod:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, :py:mod:`~verticapy.machine_learning.vertica.DBSCAN`, :py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans`...), but we'll just use winsorization in this example. Again, "fare" has many outliers, so we'll start there. +Let's move on to outliers. We have several tools for locating outliers (:py:mod:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, :py:mod:`~verticapy.machine_learning.vertica.cluster.DBSCAN`, :py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans`...), but we'll just use winsorization in this example. Again, "fare" has many outliers, so we'll start there. .. code-block:: python diff --git a/docs/source/examples_understand_africa_education.rst b/docs/source/examples_understand_africa_education.rst index 3fd5ac61a..70924ed4d 100644 --- a/docs/source/examples_understand_africa_education.rst +++ b/docs/source/examples_understand_africa_education.rst @@ -628,7 +628,7 @@ The same applies to the regions. Let's look at student age. .. code-block:: python - africa["PAGE"].bar( + africa["PAGE"].barh( method = "50%", of = "pred_zmalocp", max_cardinality = 50, @@ -639,7 +639,7 @@ The same applies to the regions. Let's look at student age. 
:okwarning: :okexcept: - fig = africa["PAGE"].bar( + fig = africa["PAGE"].barh( method = "50%", of = "pred_zmalocp", max_cardinality = 50, diff --git a/docs/source/performance_vertica.rst b/docs/source/performance_vertica.rst index dfd5bb808..1c7359f75 100644 --- a/docs/source/performance_vertica.rst +++ b/docs/source/performance_vertica.rst @@ -44,6 +44,8 @@ Query Profiler QueryProfiler.previous QueryProfiler.step QueryProfiler.to_html + QueryProfiler.get_activity_time + QueryProfiler.get_qplan_explain Query Profiler Interface ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -84,6 +86,21 @@ Query Profiler Interface QueryProfilerInterface.set_position QueryProfilerInterface.step QueryProfilerInterface.to_html + QueryProfilerInterface.client_data_test + QueryProfilerInterface.clock_exec_time_test + QueryProfilerInterface.exec_time_test + QueryProfilerInterface.get_activity_time + QueryProfilerInterface.get_qplan_explain + QueryProfilerInterface.get_qsteps_ + QueryProfilerInterface.get_resource_acquisition + QueryProfilerInterface.import_profile + QueryProfilerInterface.pool_queue_wait_time_test + QueryProfilerInterface.qsteps_clicked + QueryProfilerInterface.query_events_test + QueryProfilerInterface.segmentation_test + QueryProfilerInterface.update_cpu_time + QueryProfilerInterface.update_qsteps + QueryProfilerInterface.update_step Query Profiler Comparison ^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -102,4 +119,6 @@ Query Profiler Comparison .. autosummary:: :toctree: api/ - QueryProfilerComparison.get_qplan_tree \ No newline at end of file + QueryProfilerComparison.get_qplan_tree + QueryProfilerComparison.sync_all_checkboxes + QueryProfilerComparison.unsync_all_checkboxes \ No newline at end of file diff --git a/docs/source/user_guide_data_preparation_outliers.rst b/docs/source/user_guide_data_preparation_outliers.rst index 1168aef5b..229926cab 100644 --- a/docs/source/user_guide_data_preparation_outliers.rst +++ b/docs/source/user_guide_data_preparation_outliers.rst @@ -140,7 +140,7 @@ Generally, you can identify global outliers with the ``Z-Score``. We'll consider .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_dp_plot_outliers_5.html -Other techniques like :py:mod:`~verticapy.machine_learning.vertica.DBSCAN` or local outlier factor (``LOF``) can be to used to check other data points for outliers. +Other techniques like :py:mod:`~verticapy.machine_learning.vertica.cluster.DBSCAN` or local outlier factor (``LOF``) can be used to check other data points for outliers. .. code-block:: python diff --git a/docs/source/user_guide_full_stack_complex_data_vmap.rst b/docs/source/user_guide_full_stack_complex_data_vmap.rst index f0ab99cd8..6e9eb13e5 100644 --- a/docs/source/user_guide_full_stack_complex_data_vmap.rst +++ b/docs/source/user_guide_full_stack_complex_data_vmap.rst @@ -15,7 +15,7 @@ In order to work with complex data types in VerticaPy, you'll need to complete t import verticapy as vp -- Connect to Vertica. This example uses an existing connection called "VerticaDSN". For details on how to create a connection, see the :ref:`connection` tutorial. +- Connect to Vertica. This example uses an existing connection called ``VerticaDSN`` . For details on how to create a connection, see the :ref:`connection` tutorial. .. note:: You can skip the below cell if you already have an established connection. 
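Several hunks above repoint ``DBSCAN`` references to the ``verticapy.machine_learning.vertica.cluster`` module. As a rough sketch of how that corrected import is typically used for outlier checks (the relation, columns, and the ``dbscan_cluster`` label name are illustrative assumptions, to be checked against the current API):

.. code-block:: python

    from verticapy.machine_learning.vertica.cluster import DBSCAN

    # Fit DBSCAN on a couple of numeric columns; rows outside every dense
    # region end up in small "noise" groups and are candidate outliers.
    model = DBSCAN(eps = 0.5, min_samples = 5)
    model.fit("public.sample_data", X = ["age", "fare"])

    # predict() returns a vDataFrame with one cluster label per row;
    # inspecting the label counts helps spot the isolated points.
    clusters = model.predict()
    clusters["dbscan_cluster"].topk()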
diff --git a/docs/source/user_guide_full_stack_linear_regression.rst b/docs/source/user_guide_full_stack_linear_regression.rst index 82a9ae04d..2b5b3fd08 100644 --- a/docs/source/user_guide_full_stack_linear_regression.rst +++ b/docs/source/user_guide_full_stack_linear_regression.rst @@ -292,7 +292,7 @@ We can use a cross-validation to test our model. .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_fs_table_lr_9.html -The model isn't bad. We're just using a few variables to get a median absolute error of 47; that is, our score has a distance of 47 from the true value. This seems high, but if we keep in mind that the final score is over 1000, our predictions are quite good. +The model isn't bad. We're just using a few variables to get a median absolute error of ``47``; that is, our score has a distance of ``47`` from the true value. This seems high, but if we keep in mind that the final score is over ``1000``, our predictions are quite good. Let's compare the importance of our features. @@ -388,7 +388,7 @@ We see a high heteroscedasticity, indicating that we can't trust the ``p-value`` model.coef_ -Let's look at the model's analysis of variance (ANOVA) table. +Let's look at the model's analysis of variance (``ANOVA``) table. .. code-block:: ipython diff --git a/docs/source/user_guide_full_stack_to_json.rst b/docs/source/user_guide_full_stack_to_json.rst index f872732e2..0b5be40a5 100644 --- a/docs/source/user_guide_full_stack_to_json.rst +++ b/docs/source/user_guide_full_stack_to_json.rst @@ -1,8 +1,8 @@ .. _user_guide.full_stack.to_json: -========================= -Example: XGBoost.to_json -========================= +================ +XGBoost.to_json +================ Connect to Vertica -------------------- @@ -160,7 +160,7 @@ Evaluate the model with :py:func:`~verticapy.machine_learning.vertica.ensemble.X .. raw:: html :file: SPHINX_DIRECTORY/figures/ug_fs_to_json_report.html -Use to_json() to export the model to a JSON file. If you omit a filename, VerticaPy prints the model: +Use :py:func:`~verticapy.machine_learning.vertica.ensemble.XGBClassifier.to_json` to export the model to a JSON file. If you omit a filename, VerticaPy prints the model: .. ipython:: python @@ -194,7 +194,7 @@ This exported model can be used with the Python XGBoost API right away, and expo result = result.sum() / len(result); assert result == pytest.approx(0.0, abs = 1.0E-14) -For multiclass classifiers, the probabilities returned by the VerticaPy and the exported model may differ slightly because of normalization; while Vertica uses multinomial logistic regression, XGBoost Python uses Softmax. Again, this difference does not affect the model's final predictions. Categorical predictors must be encoded. +For multiclass classifiers, the probabilities returned by the VerticaPy and the exported model may differ slightly because of normalization; while Vertica uses multinomial logistic regression, ``XGBoost`` Python uses Softmax. Again, this difference does not affect the model's final predictions. Categorical predictors must be encoded. Clean the Example Environment @@ -211,8 +211,8 @@ Drop the ``xgb_to_json`` schema, using CASCADE to drop any database objects stor Conclusion ----------- -VerticaPy lets you to create, train, evaluate, and export Vertica machine learning models. 
There are some notable nuances when importing a Vertica XGBoost model into Python XGBoost, but these do not affect the accuracy of the model or its predictions: +VerticaPy lets you create, train, evaluate, and export Vertica machine learning models. There are some notable nuances when importing a Vertica ``XGBoost`` model into Python ``XGBoost``, but these do not affect the accuracy of the model or its predictions: Some information computed during the training phase may not be stored (e.g. ``sum_hessian`` and ``loss_changes``). -The exact probabilities of multiclass classifiers in a Vertica model may differ from those in Python, but bot ``h`` will make the same predictions. Python XGBoost does not support categorical predictors, so you must encode them before training the model in VerticaPy. \ No newline at end of file +The exact probabilities of multiclass classifiers in a Vertica model may differ from those in Python, but both will make the same predictions. Python ``XGBoost`` does not support categorical predictors, so you must encode them before training the model in VerticaPy. \ No newline at end of file diff --git a/docs/source/user_guide_machine_learning_clustering.rst b/docs/source/user_guide_machine_learning_clustering.rst index 76c60e881..ec1842612 100644 --- a/docs/source/user_guide_machine_learning_clustering.rst +++ b/docs/source/user_guide_machine_learning_clustering.rst @@ -57,7 +57,7 @@ While there aren't any real metrics for evaluating unsupervised models, metrics print(model.get_vertica_attributes("metrics")["metrics"][0]) -You can add the prediction to your vDataFrame. +You can add the prediction to your :py:mod:`~verticapy.vDataFrame`. .. code-block:: diff --git a/docs/source/user_guide_machine_learning_introduction.rst b/docs/source/user_guide_machine_learning_introduction.rst index c1413c4bc..5a0d06a48 100644 --- a/docs/source/user_guide_machine_learning_introduction.rst +++ b/docs/source/user_guide_machine_learning_introduction.rst @@ -100,7 +100,7 @@ When we have more than two categories, we use the expression ``Multiclass Classi Unsupervised Learning ---------------------- -These algorithms are to used to segment the data (:py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans`, :py:mod:`~verticapy.machine_learning.vertica.DBSCAN`, etc.) or to detect anomalies (:py:mod:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, ``Z-Score`` Techniques...). In particular, they're useful for finding patterns in data without labels. For example, let's use a :py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans` algorithm to create different clusters on the Iris dataset. Each cluster will represent a flower's species. +These algorithms are used to segment the data (:py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans`, :py:mod:`~verticapy.machine_learning.vertica.cluster.DBSCAN`, etc.) or to detect anomalies (:py:mod:`~verticapy.machine_learning.vertica.LocalOutlierFactor`, ``Z-Score`` Techniques...). In particular, they're useful for finding patterns in data without labels. For example, let's use a :py:mod:`~verticapy.machine_learning.vertica.cluster.KMeans` algorithm to create different clusters on the Iris dataset. Each cluster will represent a flower's species. ..
code-block:: python diff --git a/docs/source/user_guide_machine_learning_model_tracking.rst b/docs/source/user_guide_machine_learning_model_tracking.rst index 4161b039d..0afa9b15d 100644 --- a/docs/source/user_guide_machine_learning_model_tracking.rst +++ b/docs/source/user_guide_machine_learning_model_tracking.rst @@ -103,16 +103,14 @@ So far we have only added three models to the experiment, but we could add many top_model = my_experiment_1.load_best_model(metric = "auc") -The experiment object facilitates not only model tracking but also makes cleanup super easy, especially in real-world -scenarios where there is often a large number of leftover models. The :py:func:`~verticapy.machine_learning.vertica.LogisticRegression.drop` method drops from the database the info of the experiment and all associated models other than those specified in the keeping_models list. +The experiment object facilitates not only model tracking but also makes cleanup super easy, especially in real-world scenarios where there is often a large number of leftover models. The :py:func:`~verticapy.machine_learning.vertica.LogisticRegression.drop` method drops from the database the info of the experiment and all associated models other than those specified in the keeping_models list. .. ipython:: python :okwarning: - my_experiment_1.drop(keeping_models=[top_model.model_name]) + my_experiment_1.drop(keeping_models = [top_model.model_name]) -Experiments are also helpful for performing grid search on hyper-parameters. The following example shows how they can -be used to study the impact of the max_iter parameter on the prediction performance of :py:mod:`~verticapy.machine_learning.vertica.linear_model.LogisticRegression` models. +Experiments are also helpful for performing grid search on hyper-parameters. The following example shows how they can be used to study the impact of the ``max_iter`` parameter on the prediction performance of :py:mod:`~verticapy.machine_learning.vertica.linear_model.LogisticRegression` models. .. ipython:: python :suppress: diff --git a/docs/source/user_guide_performance_qprof.rst b/docs/source/user_guide_performance_qprof.rst index 8c741ce53..933e93555 100644 --- a/docs/source/user_guide_performance_qprof.rst +++ b/docs/source/user_guide_performance_qprof.rst @@ -210,7 +210,7 @@ Once the :py:mod:`~verticapy.performance.vertica.qprof.QueryProfiler` object is .. raw:: html :file: SPHINX_DIRECTORY/figures/user_guides_performance_qprof_get_queries.html -To visualize the query plan, run :py:func:`verticapy.QueryProfilerInterface.get_qplan_tree`, +To visualize the query plan, run :py:func:`~verticapy.performance.vertica.qprof.QueryProfilerInterface.get_qplan_tree`, which is customizable, allowing you to specify certain metrics or focus on a specified tree path: .. image:: /_static/website/user_guides/performance/user_guide_performance_qprof_get_qplan_tree.png @@ -277,7 +277,7 @@ You can export and import :py:mod:`~verticapy.performance.vertica.qprof.QueryPro Export +++++++ -To export a :py:mod:`~verticapy.performance.vertica.qprof.QueryProfiler` object, use the :py:func:`~verticapy.performance.vertica.QueryProfiler.export_profile` method: +To export a :py:mod:`~verticapy.performance.vertica.qprof.QueryProfiler` object, use the :py:func:`~verticapy.performance.vertica.qprof.QueryProfiler.export_profile` method: .. code-block:: python
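As a closing illustration of the profiling references fixed in the last hunk, a :py:mod:`~verticapy.performance.vertica.qprof.QueryProfiler` session might look roughly like this; the transaction and statement identifiers, the file name, and the keyword argument names are assumptions for illustration, not values from this changeset.

.. code-block:: python

    from verticapy.performance.vertica import QueryProfiler

    # Profile an already-executed statement (placeholder transaction/statement ids).
    qprof = QueryProfiler((45035996273705982, 1))

    # Customizable query plan tree, as described in the user guide hunk above.
    qprof.get_qplan_tree()

    # Archive the collected profiling data; the counterpart ``import_profile``
    # (added to the API reference above) reloads it on another system.
    qprof.export_profile(filename = "query_profile.tar")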
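Relatedly, the ``XGBoost.to_json`` hunks earlier in this patch describe exporting a Vertica model and reading it back with the Python XGBoost API. A hedged sketch of that round trip, assuming ``model`` is the VerticaPy ``XGBClassifier`` trained in that guide and that the file name is arbitrary:

.. code-block:: python

    import xgboost as xgb

    # Export the in-database model to a JSON file.
    model.to_json("xgb_model.json")

    # Load the exported file with the Python XGBoost scikit-learn wrapper.
    py_model = xgb.XGBClassifier()
    py_model.load_model("xgb_model.json")

    # py_model.predict(...) should now match the in-database predictions,
    # up to the multiclass probability-normalization nuance noted earlier.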