Bad pub_date_sort for some really old objects from SDR #1319
Comments
@cbeer you added a comment here indicating that searchworks_traject_indexer/lib/traject/config/sdr_config.rb Lines 170 to 172 in cdad30e
It looks like Can we copy this code https://github.com/sul-dlss/searchworks_traject_indexer/blob/cdad30e01b4e44880af5c68923464adad8a74e2e/lib/traject/config/folio_config.rb#L831C11-L831C31 to a |
Well, we're very much still using it: My vague memory is |
This is happening because the stanford-mods method at https://github.com/sul-dlss/stanford-mods/blob/19af5eb4e1be528f3ee71a18c19a3a99fd435d8c/lib/stanford-mods/concerns/origin_info.rb#L47-L49 returns a Stanford::Mods::Imprint::DateRange, and apparently sorting that class returns 6000. Super odd, but okay. The quickest fix would be to update the
pub_year_int; I'm not sure if there are implications of this, but it looks like it is the exact same method, except that it makes sure there is actually a date, and if there is a date range it will pull the data out of it: https://github.com/sul-dlss/stanford-mods/blob/19af5eb4e1be528f3ee71a18c19a3a99fd435d8c/lib/stanford-mods/concerns/origin_info.rb#L19-L38.
|
I think the problem with |
@cbeer are you saying we want pub_date_sort to return a string, and pub_year_int is always going to return an int and it won't allow for something like 195x? I am not sure I understand what specifically in the code for pub_year_int is going to cause a problem. From what I can tell, it checks to see if it is a date range; if it is, it will get the start or stop date. From there it will get the date value of the date from earliest_preferred_date (which is what pub_year_sort_str is getting). From there it checks that the date is a value, then it checks whether the date is an interval: if it is, it will get the from.year value, otherwise it will get the year value. So I guess I am wondering what part of that will cause problems? Because at the very least we need to integrate some of that into the pub_year_sort_str function. |
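To make the steps described above concrete, here is a minimal Ruby sketch of that flow. It is only an illustration: `DateRange`, `DateValue`, `Interval`, and `year_from` are hypothetical stand-ins defined here, not the actual stanford-mods classes or methods, and the real `pub_year_int` may differ in its details.

```ruby
require 'date'

# Hypothetical stand-ins (assumptions), NOT the stanford-mods classes.
Interval  = Struct.new(:from, :to)     # an interval with a from/to date
DateValue = Struct.new(:date)          # wraps a Date or an Interval
DateRange = Struct.new(:start, :stop)  # plays the role of a date range

# Returns an Integer year, or nil when there is no usable date.
def year_from(value)
  # If we were handed a range, use its start, falling back to its stop.
  value = value.start || value.stop if value.is_a?(DateRange)

  # Make sure there is actually a date before asking for a year.
  date = value&.date
  return nil if date.nil?

  # Intervals contribute their starting year; plain dates their own year.
  date.is_a?(Interval) ? date.from.year : date.year
end
```

With these stand-ins, a range whose start is in 1850 yields 1850, an interval covering the 1950s yields 1950, and a missing date yields nil instead of a range object leaking into the sort field.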
Check out the searchworks_traject_indexer/lib/traject/config/folio_config.rb Lines 726 to 735 in 2720faa
I'm not sure how we handle that kind of requirement with a |
from what I can tell we aren't doing that at all with |
Sure, but the MARC and MODS data are both indexing into the same field so we can give a consistent sort across all records. I'm not sure we can sort by |
Do you have time for a huddle on this later? I get that we can't use pub_year_int for this, but I would like some context around which part we can't use so I can update the pub_date_sort method so it works correctly. |
Okay, I thought about it. The only other way I could think of was to allow ints but then fix what we are getting from the MARC records so that they are numbers, i.e. if we got 19xx we would convert it to 1900, but MARC is so variable that I think that would be way more messy and not provide us with a better result. I have updated the BCE conversion to -10000, and I changed *-1 to abs just because I think it's neater. |
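For reference, the alternative that was considered and set aside above (turning a placeholder year such as 19xx into 1900) could look roughly like the sketch below. `placeholder_year_to_int` is a hypothetical helper, not code from searchworks_traject_indexer, and the set of placeholder characters it accepts (x, X, u, -) is an assumption about the MARC data.

```ruby
# Hypothetical helper (assumption), not part of searchworks_traject_indexer.
# Accepts a four-character year whose unknown digits are marked with
# x, X, u, or - and replaces each unknown digit with 0.
def placeholder_year_to_int(str)
  # Anything that is not a leading digit plus three digits/placeholders
  # is rejected rather than guessed at.
  return nil unless str.is_a?(String) && str.match?(/\A\d[\dxXu-]{3}\z/)

  str.gsub(/\D/, '0').to_i
end
```

This only covers four-character CE years; BCE dates and the many other shapes MARC dates can take are not handled, which is part of why the approach gets messy.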
E.g. https://searchworks.stanford.edu/view/qb122dq4313.json has a pub_date_sort of 6000 and ends up sorting wrong.