Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCP max_cost and max_cummulative_cost parameters do nothing #7604

Open
helvince opened this issue Nov 10, 2024 · 13 comments
Open

MCP max_cost and max_cummulative_cost parameters do nothing #7604

helvince opened this issue Nov 10, 2024 · 13 comments

Comments

@helvince
Copy link

Description:

I was hoping to speed up MCP by confining the paths using max_cost and max_cumulative_cost. Neither parameter displays the expected effects (output is not effected by either).

I expect the max_cost to either a) clip the cost to some maximum value or b) do not allow paths through nodes with a cost value above max_cost. The latter ofcourse being the most logical, as I could clip the input myself.
I expect the max_cumulative_cost to restrict paths up to some total cost. I can simply apply the threshold afterwards, but I expect the algorithm to converge quicker if this is taken into account.

Way to reproduce:

# original code from Juan: https://stackoverflow.com/questions/62135639/mcp-geometrics-for-calculating-marketsheds
import numpy as np
from skimage import graph
from skimage.graph import _mcp
from scipy import sparse


image = np.array(
    [[1, 1, 2, 2],
     [2, 1, 1, 3],
     [3, 2, 1, 2],
     [2, 2, 2, 1]]
)
destinations = [[0, 0], [3, 3]]

mcp = graph.MCP(2*image)
costs, traceback = mcp.find_costs(destinations, max_cumulative_cost=3., find_all_ends=False)
print(costs)
costs, traceback = mcp.find_costs(destinations, max_cost=3., find_all_ends=False)
print(costs)
# both return: 
#[[ 2.  4.  8. 10.]
# [ 6.  4.  6. 10.]
# [10.  8.  4.  6.]
# [12.  8.  6.  2.]]

# Finding the corresponding endpoints:
offsets = _mcp.make_offsets(2, True)
offsets.append(np.array([0, 0]))
offsets_arr = np.array(offsets)

indices = np.indices(traceback.shape)
offset_to_neighbor = offsets_arr[traceback]
neighbor_index = indices - offset_to_neighbor.transpose((2, 0, 1))
ids = np.arange(traceback.size).reshape(image.shape)
neighbor_ids = np.ravel_multi_index(
    tuple(neighbor_index), traceback.shape
)

g = sparse.coo_matrix((
    np.ones(traceback.size),
    (ids.flat, neighbor_ids.flat)),
    shape=(ids.size, ids.size),
).tocsr()

n, components = sparse.csgraph.connected_components(g)
basins = components.reshape(image.shape)

print(basins)
# both (or without max_cost/max_cumulative_cost) return
#[[0 0 0 0]
# [0 0 0 1]
# [0 0 1 1]
# [1 1 1 1]]

### Version information:

```Shell
3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
Linux-6.1.85+-x86_64-with-glibc2.35
scikit-image version: 0.24.0
numpy version: 1.26.4
@michaelbratsch
Copy link
Contributor

Looking at the code, it seems to me that the parameters max_cumulative_cost and max_cost of find_costs are not used at all. Furthermore, the parameter max_coverage is missing documentation.

https://github.com/scikit-image/scikit-image/blob/707d937c43041737b562c3fb8d1fa88a5d7e13af/skimage/graph/_mcp.pyx#L395L396

The parameters have been introduced with the pull request #854. The pull request was quite big so it might just have been forgotten.

@stefanv Do you still remember anything from this pull request and whether there was any intention behind it? I know it was already 10 years ago 😲

@stefanv
Copy link
Member

stefanv commented Nov 17, 2024

Hrm, wow, no but maybe @almarklein does 😅

@almarklein
Copy link
Contributor

Looking at the history of _mcp.pyx, it looks like I added the args to find_cost() here, but never implemented them.

From what I remember, I had an implementation of the MCP algorithm that I had used locally in my research. I refactored it a bit as I added the code to skimage. I think that these args represent a feature that I had in my own version, but these were not really necessary because the new class had hooks to implement things like that.

As to their intended meaning, I think max_cost would be a cost above which the front would refuse to move to that voxel. It allows you to basically create walls. The max_cumulative_cost was likely a way to stop the front from growing when it failed to find an end-point. This can be achieved by overloading goal_reached().

Funny how this was found only now. I'm sorry if I confused users by having introduced arguments that do nothing.

@lagru
Copy link
Member

lagru commented Nov 21, 2024

@helvince everyone else, thanks for catching and diagnosing this. It sounds unlikely that anyone is planning to fill the unused parameters with life anytime soon. So I suggest to just deprecate them.

[...] because the new class had hooks to implement things like that.

@almarklein, that sounds intriguing. In case anyone is interested in things like that, how would they go about using these class hooks? A few pointers could be helpful in documenting this or even implementing this if someone wants to add that feature. :)

@almarklein
Copy link
Contributor

There's goal_reached, examine_neighbor, update_node, travel_cost, create_connection. They are in principal documented: https://scikit-image.org/docs/stable/api/skimage.graph.html, just search for "overload". But perhaps this can be made more explicit, e.g. the class docstring can list the methods that can be overloaded.

@michaelbratsch
Copy link
Contributor

As to their intended meaning, I think max_cost would be a cost above which the front would refuse to move to that voxel. It allows you to basically create walls.

It seems to me that this condition can be easily checked right here to skip that voxel:

if new_cost < 0 or new_cost == inf:

@almarklein Do you agree?

I can open a merge request to implement max_cost and to deprecate max_cumulative_cost because it is not needed.

@almarklein
Copy link
Contributor

Yeah, probably just replace new_cost == inf with new_cost > max_cost.

@michaelbratsch
Copy link
Contributor

I opened a merge request for the discussed changes

cumulative_costs = np.asarray(flat_cumulative_costs)

What still irritates me is that find_costs returns the internal state of cumulative_costs and traceback and not a copy. I think we should change np.asarray to np.array in:

cumulative_costs = np.asarray(flat_cumulative_costs)

@almarklein Is there a special reasoning behind not copying the cumulative_costs and traceback? I would assume to always get a new object.

@lagru
Copy link
Member

lagru commented Dec 8, 2024

What still irritates me is that find_costs returns the internal state of cumulative_costs and traceback and not a copy.

That does seem weird. It makes sense that the method modifies the attribute in place but might be a different story for arrays returned by the method. Doesn't seem like a case that should be especially memory sensitive.

@almarklein
Copy link
Contributor

@almarklein Is there a special reasoning behind not copying the cumulative_costs and traceback?

I don't remember, maybe performance. If returning a copy (or a readonly view?) is more skimage-like, go ahead! 👍

@michaelbratsch
Copy link
Contributor

@almarklein Thanks for the info.

Another question, the parameter max_coverage is missing documentation

max_coverage=1.0, max_cumulative_cost=None, max_cost=None):

I wanted to add some but it seems like it was only useful while developing:

cdef INDEX_T maxiter = int(max_coverage * flat_costs.size)

@almarklein Do you agree that we can deprecate that one, too? Or might it be useful for some users?

@almarklein
Copy link
Contributor

I guess it could still be useful, e.g. if you know that the explored area should never make up more than 20% of the total image, it allows you to stop failure-cases 5x faster. Definitely a bit of a niche, but you never know if someone is using it, and the cost for maintaining is is near-zero 🤷

@almarklein
Copy link
Contributor

near-zero

Well, except for writing a docstring 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants