Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Estimator error message unclear #120

Open
hangqianjun opened this issue Jun 17, 2024 · 9 comments
Open

Estimator error message unclear #120

hangqianjun opened this issue Jun 17, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@hangqianjun
Copy link
Contributor

I encountered the following erorr message when using KNN estimator:

File /opt/desc/py/lib/python3.10/site-packages/rail/estimation/algos/k_nearneigh.py:189, in KNearNeighEstimator._process_chunk(self, start, end, data, first)
    187         knn_df.loc[np.isnan(knn_df[col]), col] = self.config.mag_limits[col]
    188     else:
--> 189         knn_df.loc[np.isclose(knn_df[col], self.config.nondetect_val), col] = self.config.mag_limits[col]
    191 testcolordata = _computecolordata(knn_df, self.config.ref_band, self.config.bands)
    192 dists, idxs = self.kdtree.query(testcolordata, k=self.numneigh)

File /opt/desc/py/lib/python3.10/site-packages/numpy/core/numeric.py:2348, in isclose(a, b, rtol, atol, equal_nan)
   2345     dt = multiarray.result_type(y, 1.)
   2346     y = asanyarray(y, dtype=dt)
-> 2348 xfin = isfinite(x)
   2349 yfin = isfinite(y)
   2350 if all(xfin) and all(yfin):

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The error is actually due to the fact that the column names in the informer is different from the test data. However this error message is not helpful in identifying the issue. It would be good to add some checks and give more informative error messages.

@aimalz's student has encountered similar issues.

@hangqianjun hangqianjun added the enhancement New feature or request label Jun 17, 2024
@sschmidt23
Copy link
Collaborator

Ok, I think we just need to add checks that the bands specified are all present in the data, and check that the ref_band and nondetect_val bands dict keys are also present (I bet it's the latter actually causing the error). That is simple to add, I can do it. KNN is actually in rail_sklearn, though, so I'll probably create an issue there and close this one.

@hangqianjun
Copy link
Contributor Author

Hi @sschmidt23! Thanks for addressing this. I was wondering if this is a general issue with estimators with a similar structure (i.e. they need to check if the input columns etc. match up with the model). Do you think there could be some more general function to check those things? Or is it easier to implement in a per-estimator base?

@aimalz
Copy link
Collaborator

aimalz commented Jun 17, 2024

I agree with @hangqianjun that it would be ideal to add this into the estimator base class.

@sschmidt23
Copy link
Collaborator

I've also thought about this, and I think that there are sometimes some slight differences in how the inputs are handled, so implementation may be slightly awkward. But, maybe not. I'm going to add a fix for KNearNeigh and maybe we can discuss further on the Slack channel about any difficulties in implementation at the estimation level (e.g. maybe a function call that we add to all informers and estimators that has a default that can be overridden for subclasses that do something non-standard with the data).

@eacharles
Copy link
Collaborator

eacharles commented Jun 17, 2024

The tricking thing about doing this with a function call is that you don't actually know what is in the file until you open it... When you are running in a parallel you need to be careful about how to handle this. I could imagine only calling the function if you are in the first chunk.

@sschmidt23
Copy link
Collaborator

sschmidt23 commented Jun 17, 2024

I was just about to bring that up here: I added checks in rail_sklearn to the informer stage for KNearNeighInformer, but did not add the equivalent checks to the estimator stage because I was wondering what the best way to do that was given the process_chunk call there, I wasn't sure of the best way to do it without checking for every chunk.

@hangqianjun
Copy link
Contributor Author

@eacharles Agree with calling it in the first chunk only.

@sschmidt23
Copy link
Collaborator

So looking at how we do things now, in the informer parent class we open the input data files in the run method, and because each estimator class has its own custom run that overrides the base class run, we would need to move the data read in and the checks that we want to add so that they are called before run. This seems pretty straightforward for inform, I'm less sure how to do it for estimate generically (other than opening the input file twice, once just for the checks and then leaving the current estimate as-is). If anyone has more clever and efficient ideas for how to modify estimate posting them here for discussion would be great.

And it does seem like this would touch pretty much all of the estimators and informers, so someone should probably put together one of the design docs and present it to everyone and such so that we're keeping everyone in the loop and making sure that we gather input from everyone.

@hangqianjun
Copy link
Contributor Author

Drafting a design doc here: link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants