-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SITCOM-1786: Add CountRows command #10
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of implementing the AggregateQuery
. It's something that people have requested that doesn't exist yet and I think that it will be useful in a number of ways in the future.
But I'd like to see the PR (and the one for rubintv_visualization) make more use of it. For example, instead of having CountRowsCommand
I think that it would be better to have something like AggregateQueryCommand
that works for a general query. I don't think too much needs to change since you already designed AggregateQuery
to handle more general commands.
Also, please implement tests for new features on the python side. I would test at least two commands, maybe count
and sum
(or max
or min
, or something like that). There is a mock database that is already setup during the tests so that should be doable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for the one comment that I made. Fix that (and the test) and it should be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're going to hate me for this, but I'm wondering yet again if there's a better way. We might just be able to update LoadColumnsCommand
by giving it an option aggregator : str | None = None
attribute (so add that line after https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/commands/db.py#L40-L75). Then update the database.query
method to take the same optional argument and if it is non-null, update https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/database.py#L477 to
aggregate_func = getattr(sqlalchemy.func, aggregateor.lower())
query_model = sqlalchemy.select(*[aggregate_func(column) for column in table_columns]).select_from(select_from).where(query_model)
The nice thing about this is that you won't have to add any new commands or query options, on either the server side or client side, and it minimizes the duplication of code. It also put loading column data from queries in a single command so that if we make changes to the API later we only have one palce to do it.
# Initialize a dictionary to store aggregate results | ||
result = {} | ||
|
||
for column in self.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can modify this to run the same function on multiple columns (see my comments below)
base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from( | ||
table | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can actually run this on multiple columns to make this more efficient
base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from( | |
table | |
) | |
base_query = sqlalchemy.select( | |
*[aggregate_func(column).label(self.aggregate.lower())for column in columns] | |
).select_from(table) |
Remove AggregatedQueryCommand
There's a small amount of snail trail code in there from passing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes Guy! I left a minor comment about docstrings and a request to updates the tests and creation of the select_from
object. I'm worried that as is it won't quite work with queries using columns that are not in the columns
parameter list.
) -> tuple[set[sqlalchemy.Column], set[str], list[sqlalchemy.Column]]: | ||
"""Return the sqlalchemy models for a list of columns. | ||
|
||
Parameters | ||
---------- | ||
columns : | ||
The names of the columns in the database. | ||
using_aggregator : bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this and other docstring updates in this module: there's no need to include the the type in the docstring since we're using type hints. The documenteer build system should be able to build the docs from the type hints,
query_model = sqlalchemy.and_(*[col.isnot(None) for col in table_columns]) | ||
select_from = self.joins.build_join(table_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this select clause can be here. It's possible that a query might use columns that were not in the original list of columns, so I'm not sure if the joins can be properly built in that case (for example imagine the query was over ra/dec but the column being searched is something else. That's why if query is not None we have to add the table names before querying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how this might be a problem for the aggregator. Maybe cache the table names before the query and data_id table names are optionally added and use that in your select clause and return dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the table joining to below the custom query logic so that all tables and columns are now included in the FROM clause.
@@ -37,6 +37,60 @@ def execute_command(self, command: dict, response_type: str) -> dict: | |||
return result["content"] | |||
|
|||
|
|||
class TestLoadColumnsWithAggregatorCommand(TestCommand): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a test. Since all of the tests use the same columns and command, maybe you can include a setUp
method to create them there once. Also, maybe run once without a query and once with, with a query that uses columns that you are not aggregating over, that way we can verify that the query
method is working as expected.
3104bff
to
3ccdabd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your patience and work!
Checklist
doc/changes