Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does ERXStringUtilities.distance() return a double? #312

Closed
paulhoadley opened this issue Nov 7, 2012 · 13 comments
Closed

Why does ERXStringUtilities.distance() return a double? #312

paulhoadley opened this issue Nov 7, 2012 · 13 comments
Assignees

Comments

@paulhoadley
Copy link
Contributor

As far as I can see, this method is calculating the Levenshtein distance between its arguments, an algorithm which returns an integer. Eyeballing it, I assume the reason is for convenience of fuzzyMatch(), a method in which distance() is called six times, where it seems to be used among some other double values, though it's not even clear whether all of those should be or need to be double values.

If distance() is to be a serious method in its own right, and it is public, then it should return an integer. At the very least, fuzzyMatch() could then cast the result to a double, though I don't immediately see a good reason for even doing that.

Of course, there will be backwards compatibility implications, but I'm not sure that's a good enough reason to let this stand. The algorithm returns an integer, and so should this method.

@darkv
Copy link
Member

darkv commented Nov 7, 2012

Could be a change for Wonder v6 and one point of a migration guide, though I doubt that many are using this method anyway so most of them won't notice.

paulhoadley added a commit to paulhoadley/wonder that referenced this issue Nov 7, 2012
darkv added a commit that referenced this issue Nov 7, 2012
Adds tests for ERXStringUtilities.distance(). #312
@nullterminated
Copy link
Member

I would object to changing the return type on principal. One does not change return types of established public methods. One deprecates the old method, replaces it with a new method that has the desired behavior, and documents the change in the deprecation javadoc.

http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc/deprecation/deprecation.html

@paulhoadley
Copy link
Contributor Author

Doing this via deprecation sounds just fine—for one thing, the method should probably be called levenshteinDistance() anyway, so as to distinguish it from, say, hammingDistance(). In that case, we should still talk about a timeline for removal of distance(), though. As we're discussing elsewhere, the contract for backwards compatibility should be made explicit. I don't think we should mark it deprecated and then leave it indefinitely.

(Having said all of that, it's not like I have much invested in this. I've never used the method, I just noticed it last night. That it returns a double just stuck out like a sore thumb, and offended the mathematician in me.)

@ghost ghost assigned paulhoadley Nov 7, 2012
@hprange
Copy link
Contributor

hprange commented Nov 7, 2012

We could remove every deprecated method on major releases (based on the major.minor.bug_fix version schema).

@darkv
Copy link
Member

darkv commented Nov 8, 2012

Calling the new method levenshteinDistance() and deprecating the current one seems reasonably, where is the pull request? ;)

Removing every deprecated method on major releases is perhaps a little too much, but those we catch and think should go away...

@paulhoadley
Copy link
Contributor Author

Working on it right now... :)

paulhoadley added a commit to paulhoadley/wonder that referenced this issue Nov 8, 2012
This is really just a cut and paste of the implementation of 'double
distance(String, String)' into 'int levenshteinDistance(String,
String)', and a replacement of the body of the former method with a
call to the latter. (We don't even need a cast to go from int to
double.) We also take the opportunity to deprecate distance(), and
mark it for potential future removal. Note that distance() still
passes all the tests in ERXStringUtilitiesTest.testDistance().
paulhoadley added a commit to paulhoadley/wonder that referenced this issue Nov 8, 2012
darkv added a commit that referenced this issue Nov 8, 2012
Minor refactoring of ERXStringUtilities.distance(). #312
@paulhoadley
Copy link
Contributor Author

No need to merge the pull request straight away. I am more than happy to leave it open for discussion for now. Ramsey, are you happy with this?

@paulhoadley
Copy link
Contributor Author

Ha! Johann, you're too fast! Happy to roll it back if Ramsey or anyone else doesn't like it.

@darkv
Copy link
Member

darkv commented Nov 8, 2012

Ups sorry, I was too fast ;) But keeping the old method with the exact API preserves compatibility with old code.

@paulhoadley
Copy link
Contributor Author

Yeah, I believe this makes absolutely no functional change.

@ishimoto
Copy link
Contributor

ishimoto commented Nov 8, 2012

Yes you are to fast. http://youtu.be/839DO8-0cY8

@nullterminated
Copy link
Member

On Nov 7, 2012, at 2:40 PM, Paul Hoadley wrote:

Doing this via deprecation sounds just fine—for one thing, the method should probably be called levenshteinDistance() anyway, so as to distinguish it from, say, hammingDistance(). In that case, we should still talk about a timeline for removal of distance(), though. As we're discussing elsewhere, the contract for backwards compatibility should be made explicit. I don't think we should mark it deprecated and then leave it indefinitely.

(Having said all of that, it's not like I have much invested in this. I've never used the method, I just noticed it last night. That it returns a double just stuck out like a sore thumb, and offended the mathematician in me.)

Sounds good to me

@darkv
Copy link
Member

darkv commented Nov 8, 2012

So everyone agrees and as the patch is already committed I close this issue.

@darkv darkv closed this as completed Nov 8, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants