Change regular expression to allow both upper and lower case letters in language code #780
…for the part after the dash.
Change looks good to me. We can deploy this to Dev or Stg to test.
Jing
@sfisher Hi Scott, there are two language fields in the Advanced DOI registration form:
@adambuttrick Hi Adam, do we need to apply this change to both fields or only one (and if so, which one)? Also, should we make the check case-insensitive, or only allow some combinations such as:
Jing
DataCite staff confirmed that mixed case in language tags should not cause validation errors. The change should be applied to every field that uses language tags, since either case is valid.
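If every language-tag check should tolerate mixed case, one option raised above is a single case-insensitive pattern. The sketch below assumes a hypothetical, simplified letters-only tag shape and a hypothetical `is_valid_language_tag` helper; it is neither the project's actual constant nor a full BCP 47 validator.

```python
import re

# Hypothetical, simplified tag shape: a 2-3 letter primary subtag followed by
# optional dash-separated letter subtags (e.g. "en", "en-US", "zh-Hant").
# This is NOT a complete BCP 47 grammar.
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(?:-[a-z]{1,8})*", re.IGNORECASE)


def is_valid_language_tag(value: str) -> bool:
    """Return True if the whole string matches the pattern, ignoring case."""
    return LANGUAGE_TAG.fullmatch(value) is not None


for tag in ("en-US", "en-us", "EN-us", "en_US"):
    print(tag, is_valid_language_tag(tag))
```

Note that `re.IGNORECASE` also relaxes the primary subtag (`EN-us` passes); limiting the relaxation to the part after the dash would need an explicit `[A-Za-z]` class on that part instead.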
I believe language validation uses the correct constraint everywhere we allow it. There is one constant for this, and it is reused in the places that there is an
When you look at DataCite 4.5, section 9, which covers the other Language property you mention, it is a recommended value, and as far as I can see in the code we currently do not enforce any value for it at all (whereas in the other places the enforcement is clear). As I understand it, DataCite doesn't require values to match this controlled format and neither does our code, so there is no need to change this unless we start enforcing it where DataCite doesn't.
From the spec for DataCite 4.5, these are the places that the
I also wasn't sure whether this was somehow required, even though it is marked "recommended." ChatGPT also seemed to agree that the value doesn't have to fit one of these code schemes (though it may be wrong). As I understand it, if we enforce that this language value fits the schemes, we would be stricter than DataCite. If that's a feature we want, we can add it, but it would be a new feature rather than a fix for the problem this ticket is about.
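The reasoning in the two comments above (one shared constant for the language attributes, no format check on the Language property) could look roughly like this. The constant, the field names, and `validate_record` are all hypothetical stand-ins, not the project's code.

```python
import re

# Hypothetical shared constant: defined once and reused by every field that
# carries a language attribute, so one edit here updates all of them.
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(?:-[A-Za-z]+)?")

# Hypothetical names of form fields validated against LANGUAGE_TAG.
LANG_ATTRIBUTE_FIELDS = ("title_lang", "description_lang")


def validate_record(record: dict) -> list:
    """Return the names of language-attribute fields whose value is invalid.

    Empty or missing values are skipped. The free-text "language" field
    (the Language property) is deliberately not checked, mirroring DataCite,
    which recommends but does not enforce the code lists.
    """
    errors = []
    for field in LANG_ATTRIBUTE_FIELDS:
        value = record.get(field)
        if value and LANGUAGE_TAG.fullmatch(value) is None:
            errors.append(field)
    return errors


print(validate_record({
    "title_lang": "en-us",
    "description_lang": "en_US",
    "language": "English",
}))
```

Enforcing a code list on the Language property would mean adding it to the checked fields, which is the stricter-than-DataCite feature discussed above.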
@sfisher Thanks for this! The scope of the ticket is simply to align with DataCite, where language is recommended but not required. DataCite defines a typology of obligation levels, each with a specific meaning:
More details here: https://datacite-metadata-schema.readthedocs.io/en/4.5/properties/overview/
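The linked overview uses three obligation levels: Mandatory (M), Recommended (R), and Optional (O). Below is a minimal sketch of a form that lets only the Mandatory level block submission; the `Obligation` enum, `missing_mandatory`, and the field mapping are hypothetical, with the mapping chosen only for illustration.

```python
from enum import Enum


class Obligation(Enum):
    """DataCite's levels of obligation, keyed by their one-letter codes."""
    MANDATORY = "M"
    RECOMMENDED = "R"
    OPTIONAL = "O"


def missing_mandatory(record: dict, obligations: dict) -> list:
    """Return properties that are Mandatory but absent or empty in record.

    Recommended and Optional properties never block a submission here,
    even when they are missing.
    """
    return [
        name
        for name, level in obligations.items()
        if level is Obligation.MANDATORY and not record.get(name)
    ]


# Illustrative mapping for a few properties (hypothetical field names).
OBLIGATIONS = {
    "title": Obligation.MANDATORY,
    "subject": Obligation.RECOMMENDED,
    "version": Obligation.OPTIONAL,
}

print(missing_mandatory({"subject": "Ecology"}, OBLIGATIONS))
```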
After the dash, as in en-US, it used to allow only uppercase letters. Now either case is allowed.
I looked at the DataCite docs, which mention IETF BCP 47 and ISO 639-1 language codes. I looked over these codes, and it appears this regex covers them (I don't see any non-alphabetic characters in either part).
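The change described above amounts to widening the character class for the part after the dash. The two patterns below are hypothetical reconstructions for illustration, not the project's actual constant; like the comment above, they accept letters only.

```python
import re

# Hypothetical "before": the part after the dash must be uppercase.
OLD_PATTERN = re.compile(r"[a-z]{2,3}(?:-[A-Z]+)?")
# Hypothetical "after": the part after the dash may use either case.
NEW_PATTERN = re.compile(r"[a-z]{2,3}(?:-[A-Za-z]+)?")

for tag in ("en", "en-US", "en-us", "en-Us"):
    old_ok = OLD_PATTERN.fullmatch(tag) is not None
    new_ok = NEW_PATTERN.fullmatch(tag) is not None
    print(f"{tag:6} old={old_ok} new={new_ok}")
```

Widening only this class leaves the primary subtag lowercase-only, which a blanket `re.IGNORECASE` flag would not.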
It looks like the error message is ok:
ERR_LANGUAGE = _("Must be a valid language code (IETF BCP 47 or ISO 639-1)")
I didn't see it come up in the documentation elsewhere when I searched.