docs: Improved docs on Transforms #2655
base: main
|
@@ -8,7 +8,7 @@ There are two ways to aggregate data within Altair: within the encoding itself, | |||||||||||||||||||
or using a top level aggregate transform. | ||||||||||||||||||||
|
||||||||||||||||||||
The aggregate property of a field definition can be used to compute aggregate | ||||||||||||||||||||
summary statistics (e.g., median, min, max) over groups of data. | ||||||||||||||||||||
summary statistics (e.g., :code:`median`, :code:`min`, :code:`max`) over groups of data. | ||||||||||||||||||||
|
||||||||||||||||||||
If at least one field in the specified encoding channels contains an aggregate,
dsmedia marked this conversation as resolved.
|
||||||||||||||||||||
the resulting visualization will show aggregate data. In this case, all | ||||||||||||||||||||
|
@@ -43,9 +43,9 @@ is made available for convenience, and is equivalent to the longer form:: | |||||||||||||||||||
# ... | ||||||||||||||||||||
|
||||||||||||||||||||
For more information on shorthand encoding specifications, see
:ref:`encoding-aggregates`. | ||||||||||||||||||||
:ref:`shorthand-description`. | ||||||||||||||||||||
dangotbanned marked this conversation as resolved.
|
||||||||||||||||||||
|
||||||||||||||||||||
The same plot can be shown using an explicitly computed aggregation, using the | ||||||||||||||||||||
The same plot can be shown via an explicitly computed aggregation, using the | ||||||||||||||||||||
:meth:`~Chart.transform_aggregate` method: | ||||||||||||||||||||
|
||||||||||||||||||||
.. altair-plot:: | ||||||||||||||||||||
|
@@ -58,7 +58,96 @@ The same plot can be shown using an explicitly computed aggregation, using the | |||||||||||||||||||
groupby=["Cylinders"] | ||||||||||||||||||||
) | ||||||||||||||||||||
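
As a rough illustration of what the aggregate transform computes, the following plain-Python sketch groups a handful of made-up records by ``Cylinders`` and averages ``Acceleration`` within each group. The records and the ``mean_by_group`` helper are hypothetical and only mimic the semantics; Altair hands the actual computation to Vega-Lite.

```python
from collections import defaultdict
from statistics import mean

# Made-up records standing in for rows of the cars dataset.
rows = [
    {"Cylinders": 4, "Acceleration": 16.0},
    {"Cylinders": 4, "Acceleration": 18.0},
    {"Cylinders": 8, "Acceleration": 11.0},
    {"Cylinders": 8, "Acceleration": 13.0},
]


def mean_by_group(records, groupby, field, output):
    """Hypothetical helper: return one output record per distinct `groupby` value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[groupby]].append(record[field])
    return [{groupby: key, output: mean(values)} for key, values in groups.items()]


aggregated = mean_by_group(rows, "Cylinders", "Acceleration", "mean_acc")
print(aggregated)
```

Each output record carries the group-by value plus the new aggregated field, which is why the encoding can then refer to ``mean_acc`` directly.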
|
||||||||||||||||||||
For a list of available aggregates, see :ref:`encoding-aggregates`. | ||||||||||||||||||||
The alternative to using aggregate functions is to preprocess the data with | ||||||||||||||||||||
Pandas, and then plot the resulting DataFrame: | ||||||||||||||||||||
|
||||||||||||||||||||
.. altair-plot:: | ||||||||||||||||||||
|
||||||||||||||||||||
    cars_df = data.cars()
    source = (
        cars_df.groupby('Cylinders')
        .Acceleration
        .mean()
        .reset_index()
        .rename(columns={'Acceleration': 'mean_acc'})
    )

    alt.Chart(source).mark_bar().encode(
        y='Cylinders:O',
        x='mean_acc:Q'
    )
|
||||||||||||||||||||
**Note:** As mentioned in :doc:`../data`, this approach of transforming the | ||||||||||||||||||||
data with Pandas is preferable if we already have the DataFrame at hand. | ||||||||||||||||||||
Comment on lines +80 to +81
Consider 1) being more explicit about what exactly is meant by the term "at hand" and 2) being upfront in this sentence about the reason or reasons for Pandas transformations being preferable when the DataFrame is "at hand" (automatic type inference? something else also?). Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

I think it should be referencing data-transformations
||||||||||||||||||||
|
||||||||||||||||||||
Because :code:`Cylinders` is of type :code:`int64` in the :code:`source`
DataFrame, Altair would have treated it as :code:`quantitative` instead of
:code:`ordinal` had we not specified the type. Making the data type
explicit is important since it affects the resulting plot; see
:ref:`type-legend-scale` and :ref:`type-axis-scale` for two illustrated
examples. As a rule of thumb, it is better to state the data type explicitly
than to rely on implicit type inference.
|
||||||||||||||||||||
Functions Without Arguments | ||||||||||||||||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||||||||||||
|
||||||||||||||||||||
Some aggregate functions, such as :code:`count`, do not take an argument.
In this case, the aggregation is performed over the rows of each group, where
the groups are defined by the non-aggregated fields in the other encoding channels.
dangotbanned marked this conversation as resolved.
|
||||||||||||||||||||
|
||||||||||||||||||||
The following chart demonstrates this by counting the number of cars from
each country of origin.
|
||||||||||||||||||||
.. altair-plot:: | ||||||||||||||||||||
|
||||||||||||||||||||
    alt.Chart(cars).mark_bar().encode(
        y='Origin:N',
        # shorthand form of alt.X(aggregate='count')
        x='count()'
    )
Comment on lines +103 to +107
The comment seems like it meant
Suggested change
|
||||||||||||||||||||
|
||||||||||||||||||||
**Note:** The :code:`count` aggregate function is of type
:code:`quantitative` by default, regardless of whether the source data is a
DataFrame, URL pointer, CSV file, or JSON file.
Comment on lines +109 to +111
Suggested change
|
||||||||||||||||||||
|
||||||||||||||||||||
Functions that handle categorical data (such as :code:`count`, | ||||||||||||||||||||
:code:`missing`, :code:`distinct` and :code:`valid`) are the ones that get | ||||||||||||||||||||
the most out of this feature. | ||||||||||||||||||||
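
To make the grouping behind an argument-free ``count`` concrete, here is a small plain-Python sketch. It uses made-up records and ``collections.Counter`` to count rows per ``Origin``; it only imitates the semantics and is not how Altair or Vega-Lite implement the aggregate.

```python
from collections import Counter

# Made-up records; only the grouping field matters for a row count.
rows = [
    {"Origin": "USA", "Horsepower": 130},
    {"Origin": "USA", "Horsepower": None},  # a row with a missing value is still a row
    {"Origin": "Japan", "Horsepower": 95},
    {"Origin": "Europe", "Horsepower": 115},
]

# count() needs no field: it counts the rows that fall into each group.
counts = Counter(row["Origin"] for row in rows)
print(dict(counts))
```

Because nothing is read from the rows except their group membership, the result does not depend on any particular column being present or non-null.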
|
||||||||||||||||||||
Argmin / Argmax | ||||||||||||||||||||
dangotbanned marked this conversation as resolved.
|
||||||||||||||||||||
^^^^^^^^^^^^^^^ | ||||||||||||||||||||
The :code:`argmin` and :code:`argmax` aggregate functions can only be used
with the :meth:`~Chart.transform_aggregate` method; trying to use their
shorthand notations will result in an error. This is because :code:`argmin`
and :code:`argmax` return an object rather than a single value: the input
data object (row) at which the field reaches its minimum or maximum. The
values of other columns can then be selected from this object when encoding.
One can think of the returned object as a dictionary in which each column
name is a key that maps to that row's value.
|
||||||||||||||||||||
These functions are most useful when we want to pick the most extreme sample
of each group, as measured by one field, and then compare those samples in
terms of another field.
|
||||||||||||||||||||
As an example, suppose we want to compare the weight of the most powerful car
from each country/region of origin. This can be done using
:code:`argmax`:
|
||||||||||||||||||||
.. altair-plot:: | ||||||||||||||||||||
|
||||||||||||||||||||
    alt.Chart(cars).mark_bar().encode(
        x='greatest_hp[Weight_in_lbs]:Q',
        y='Origin:N'
    ).transform_aggregate(
        greatest_hp='argmax(Horsepower)',
        groupby=['Origin']
    )
|
||||||||||||||||||||
The chart shows that Japan's most powerful car is the lightest of the group,
while that of the USA is the heaviest.
dangotbanned marked this conversation as resolved.
|
||||||||||||||||||||
|
||||||||||||||||||||
See :ref:`gallery_line_chart_with_custom_legend` for another example that uses
:code:`argmax`. The :code:`argmin` function works analogously.
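
The dictionary analogy can be sketched in plain Python. The records below are made up (the numbers are not taken from the cars dataset), and the code only mimics what ``argmax(Horsepower)`` grouped by ``Origin`` conceptually returns; it is not Altair's or Vega-Lite's implementation.

```python
from itertools import groupby
from operator import itemgetter

# Made-up records; the numbers are illustrative only.
rows = [
    {"Origin": "USA", "Horsepower": 150, "Weight_in_lbs": 3400},
    {"Origin": "USA", "Horsepower": 200, "Weight_in_lbs": 4300},
    {"Origin": "Japan", "Horsepower": 90, "Weight_in_lbs": 2100},
    {"Origin": "Japan", "Horsepower": 120, "Weight_in_lbs": 2900},
]

by_origin = itemgetter("Origin")

# For each group, keep the whole row that has the largest Horsepower.
greatest_hp = {
    origin: max(group, key=itemgetter("Horsepower"))
    for origin, group in groupby(sorted(rows, key=by_origin), key=by_origin)
}

# Analogous to encoding 'greatest_hp[Weight_in_lbs]:Q': look up another column in that row.
weights = {origin: row["Weight_in_lbs"] for origin, row in greatest_hp.items()}
print(weights)
```

Replacing ``max`` with ``min`` gives the corresponding sketch for ``argmin``.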
|
||||||||||||||||||||
Transform Options | ||||||||||||||||||||
^^^^^^^^^^^^^^^^^ | ||||||||||||||||||||
|
@@ -70,3 +159,39 @@ class, which has the following options: | |||||||||||||||||||
The :class:`~AggregatedFieldDef` objects have the following options: | ||||||||||||||||||||
|
||||||||||||||||||||
.. altair-object-table:: altair.AggregatedFieldDef | ||||||||||||||||||||
|
||||||||||||||||||||
.. _agg-func-table: | ||||||||||||||||||||
|
||||||||||||||||||||
List of Aggregation Functions | ||||||||||||||||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||||||||||||
|
||||||||||||||||||||
In addition to ``count`` and ``average``, there are a large number of available | ||||||||||||||||||||
aggregation functions built into Altair; they are listed in the following table: | ||||||||||||||||||||
|
||||||||||||||||||||
========= =========================================================================== =====================================
Aggregate Description                                                                 Example
========= =========================================================================== =====================================
Comment on lines +170 to +172
The vega-lite docs appear to list these in a more logical (if implicit) order, starting with count-related functions (including

I agree on changing the order. I'd probably need to see the end result of adding categories though.
||||||||||||||||||||
argmin    An input data object containing the minimum field value.                    N/A
argmax    An input data object containing the maximum field value.                    :ref:`gallery_line_chart_with_custom_legend`
average   The mean (average) field value. Identical to mean.                          :ref:`gallery_layer_line_color_rule`
count     The total count of data objects in the group.                               :ref:`gallery_simple_heatmap`
Vega-Lite docs also state
Just mentioning in case it's worth adding here as well?

Maybe that phrasing could replace
|
||||||||||||||||||||
distinct  The count of distinct field values.                                         N/A
max       The maximum field value.                                                    :ref:`gallery_boxplot`
mean      The mean (average) field value.                                             :ref:`gallery_scatter_with_layered_histogram`
median    The median field value.                                                     :ref:`gallery_boxplot`
min       The minimum field value.                                                    :ref:`gallery_boxplot`
missing   The count of null or undefined field values.                                N/A
q1        The lower quartile boundary of values.                                      :ref:`gallery_boxplot`
q3        The upper quartile boundary of values.                                      :ref:`gallery_boxplot`
ci0       The lower boundary of the bootstrapped 95% confidence interval of the mean. :ref:`gallery_sorted_error_bars_with_ci`
ci1       The upper boundary of the bootstrapped 95% confidence interval of the mean. :ref:`gallery_sorted_error_bars_with_ci`
stderr    The standard error of the field values.                                     N/A
stdev     The sample standard deviation of field values.                              N/A
stdevp    The population standard deviation of field values.                          N/A
sum       The sum of field values.                                                    :ref:`gallery_streamgraph`
product   The product of field values.                                                N/A
valid     The count of field values that are not null or undefined.                   N/A
values    ??                                                                          N/A
dangotbanned marked this conversation as resolved.
|
||||||||||||||||||||
variance  The sample variance of field values.                                        N/A
variancep The population variance of field values.                                    N/A
========= =========================================================================== =====================================
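
The difference between the sample aggregates (``stdev``, ``variance``) and their population counterparts (``stdevp``, ``variancep``) can be illustrated with Python's standard ``statistics`` module. This is a sketch on made-up values that assumes the usual textbook definitions (dividing by ``n - 1`` for the sample form and by ``n`` for the population form); it does not call Altair or Vega-Lite.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up field values

sample_var = variance(values)       # divides by n - 1, analogous to ``variance``
population_var = pvariance(values)  # divides by n, analogous to ``variancep``
sample_sd = stdev(values)           # square root of the sample variance (``stdev``)
population_sd = pstdev(values)      # square root of the population variance (``stdevp``)

print(sample_var, population_var, sample_sd, population_sd)
```

For the same data, the sample forms are always at least as large as the population forms, and the gap shrinks as the number of values grows.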
I do think these should have some markup, but since they aren't functions - median etc seems like the wrong choice. Something like "median(...)" would link more closely to how you'd use it