-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Improved docs on Transforms #2655
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR! No problem of messing up the commits. Me or @joelostblom will do a review somewhere in coming days. |
@mattijn, @joelostblom I'm combing through old issues and came across this PR that apparently closes #2645 Obviously we'd need to get this branch up-to-date, but I wanted to check-in to see if this had actually resolved #2645? UpdateI think I've got this conflict-free now with |
Previous merge was super messy, due to 2 year old PR
@dsmedia don't feel obligated to review this, just curious if you had any thoughts - since you've done a few doc PRs before? |
Sure. Will have a look this evening. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great doc additions! I've made some recommendations / edits here for consideration.
**Note:** As mentioned in :doc:`../data`, this approach of transforming the | ||
data with Pandas is preferable if we already have the DataFrame at hand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider 1) being more explicit about what exactly is meant by the term "at hand" and 2) being upfront in this sentence about the reason or reasons for Pandas transformations being preferable when the DataFrame is "at hand" (automatic type inference? something else also?)
Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.
I think it should be referencing data-transformations
argmin An input data object containing the minimum field value. N/A | ||
argmax An input data object containing the maximum field value. :ref:`gallery_line_chart_with_custom_legend` | ||
average The mean (average) field value. Identical to mean. :ref:`gallery_layer_line_color_rule` | ||
count The total count of data objects in the group. :ref:`gallery_simple_heatmap` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vega-Lite docs also state
Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.
Just mentioning in case it's worth adding here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vega-Lite docs also state
Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.
Just mentioning in case it's worth adding here as well?
Maybe that phrasing could replace
"... in the other axis" (#2655 (comment))
========= =========================================================================== ===================================== | ||
Aggregate Description Example | ||
========= =========================================================================== ===================================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vega-lite docs appear to list these in a more logical (if implicit) order, starting with count-related functions (including count
, valid
, values
, missing
, and distinct
), moving to basic mathematical operations (sum
, product
), then to central tendency measures (mean
/average
, variance
/variancep
, stdev
/stdevp
, stderr
, median
), followed by distribution statistics (q1
, q3
, ci0
, ci1
), and finally ending with range functions (min
/argmin
, max
/argmax
). The ordering here appears to be in alphabetial order, though it's not strictly so (e.g. ci01
). I would have a slight preference for the vega-lite-style functional organization scheme (and with explicit headings for the categories).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on changing the order.
I'd probably need to see the end result of adding categories though.
The naive approach of just adding a category field would add a lot of repetition
@dsmedia thanks so much for reviewing so quickly! |
@dsmedia Co-authored-by: Daniel Sorid <[email protected]>
Co-authored-by: Daniel Sorid <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for doing the heavy lifting on this review @dsmedia
I think I've responded to/applied all of your suggestions and added a few I spotted
@@ -8,7 +8,7 @@ There are two ways to aggregate data within Altair: within the encoding itself, | |||
or using a top level aggregate transform. | |||
|
|||
The aggregate property of a field definition can be used to compute aggregate | |||
summary statistics (e.g., median, min, max) over groups of data. | |||
summary statistics (e.g., :code:`median`, :code:`min`, :code:`max`) over groups of data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think these should have some markup, but since they aren't functions - median
etc seems like the wrong choice.
Something like "median(...)"
would link more closely to how you'd use it
**Note:** As mentioned in :doc:`../data`, this approach of transforming the | ||
data with Pandas is preferable if we already have the DataFrame at hand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.
I think it should be referencing data-transformations
alt.Chart(cars).mark_bar().encode( | ||
y='Origin:N', | ||
# shorthand form of alt.Y(aggregate='count') | ||
x='count()' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment seems like it meant alt.X(aggregate='count')
; but I think we can do without
alt.Chart(cars).mark_bar().encode( | |
y='Origin:N', | |
# shorthand form of alt.Y(aggregate='count') | |
x='count()' | |
) | |
alt.Chart(cars).mark_bar().encode( | |
x='count()', | |
y='Origin:N' | |
) |
**Note:** The :code:`count` aggregate function is of type | ||
:code:`quantitative` by default, it does not matter if the source data is a | ||
DataFrame, URL pointer, CSV file or JSON file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**Note:** The :code:`count` aggregate function is of type | |
:code:`quantitative` by default, it does not matter if the source data is a | |
DataFrame, URL pointer, CSV file or JSON file. | |
.. note:: | |
The :code:`count` aggregate function is of type :code:`quantitative` by default, | |
it does not matter if the source data is a DataFrame, URL pointer, CSV file or JSON file. |
========= =========================================================================== ===================================== | ||
Aggregate Description Example | ||
========= =========================================================================== ===================================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on changing the order.
I'd probably need to see the end result of adding categories though.
The naive approach of just adding a category field would add a lot of repetition
argmin An input data object containing the minimum field value. N/A | ||
argmax An input data object containing the maximum field value. :ref:`gallery_line_chart_with_custom_legend` | ||
average The mean (average) field value. Identical to mean. :ref:`gallery_layer_line_color_rule` | ||
count The total count of data objects in the group. :ref:`gallery_simple_heatmap` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vega-Lite docs also state
Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.
Just mentioning in case it's worth adding here as well?
Maybe that phrasing could replace
"... in the other axis" (#2655 (comment))
Notes:
Sorry for messing up the other pull request (#2654), but I finally fixed my commits and branches.