Remove document similarity alternative ranking script #427

Open
marekhorst opened this issue Jul 19, 2017 · 0 comments

marekhorst commented Jul 19, 2017

Some time ago, an alternative approach to the ranking operation was introduced:

https://github.com/CeON/CoAnSys/blob/298863befc2f0e3a96b25a9ee53f6b53b41090a6/document-similarity/document-similarity-logic/src/main/pig/document-similarity-s1-ship-rank_filter.pig

It relies on a custom rank operation implemented in the rank.py script, introduced in commit 318d88c.

This alternative Oozie execution path can be selected by enabling the load_filterTerms_calcTfidf_filter_ship_ranked flag.

It was introduced as a workaround for memory-related issues in Pig's built-in rank operator. Those issues may in fact share a root cause with #425.
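For context, a rank operation assigns each record a rank according to a sort key. Below is a minimal Python sketch of those semantics, assuming they follow the documented behaviour of Pig's `RANK alias BY field` as we understand it (records with equal keys share a rank and the following rank is skipped; with `DENSE` there are no gaps). The function `rank_by` and its parameters are hypothetical illustrations: this is neither the content of rank.py nor Pig's implementation, and it ranks an in-memory list rather than a distributed relation.

```python
from typing import Any, Callable, Iterable, List, Tuple


def rank_by(
    records: Iterable[Any],
    key: Callable[[Any], Any],
    descending: bool = False,
    dense: bool = False,
) -> List[Tuple[int, Any]]:
    """Hypothetical illustration of rank-by-key semantics (not rank.py).

    Records with equal keys share a rank. Standard ranking leaves gaps
    after ties (1, 1, 3); dense ranking does not (1, 1, 2).
    """
    # sorted() is stable, also with reverse=True, so ties keep input order.
    ordered = sorted(records, key=key, reverse=descending)
    ranked: List[Tuple[int, Any]] = []
    prev_key: Any = object()  # sentinel that never equals a real key
    rank = 0
    distinct_keys = 0
    for position, record in enumerate(ordered, start=1):
        k = key(record)
        if k != prev_key:
            # New key value: dense rank counts distinct keys,
            # standard rank jumps to the current position.
            distinct_keys += 1
            rank = distinct_keys if dense else position
            prev_key = k
        ranked.append((rank, record))
    return ranked


scores = [("d1", 0.9), ("d2", 0.7), ("d3", 0.9), ("d4", 0.5)]
print(rank_by(scores, key=lambda r: r[1], descending=True))
print(rank_by(scores, key=lambda r: r[1], descending=True, dense=True))
```

With the sample scores, the standard variant yields ranks 1, 1, 3, 4 and the dense variant 1, 1, 2, 3. Doing this in a single pass over a fully sorted list is exactly the kind of step that needs the whole ordering in one place, which hints at why a ranking step can run into memory pressure at scale.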

Once #425 is fixed and Pig's built-in rank operator works properly, we can get rid of this alternative path.

The alternative path is of no use anyway, because it causes a failure at a later docsim stage. The two ranking-related Pig scripts probably diverged at some point, so the alternative one is no longer fully consistent with the main one.
