TransformerCLI fails for two records #84
Comments
I am not able to reproduce the errors above. The second one looks similar to the previously fixed issue #81.
OK, I got the latest version and tried it with the files I sent, and it didn't fail. I tried again with the large source zip files (ipg121030.zip and ipg120417.zip), and it did fail. I made small XML files containing the previous patent numbers (US8299092B2 and USPP022671P2) and their respective next patents in the large source zip files, and they failed again. The new XML files are attached.
Here's one more place where the latest transformer code fails; file attached.
Fixed the current issue: an Index Out Of Bounds error on small document numbers (early patent numbers) with length below 3. I still need to look at the trailing document issue you noted above; I believe it may be due to enclosed XML tags.
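For readers following along, the kind of guard that avoids this class of failure can be sketched as below. This is only an illustration, not the actual change made in the repository: the class `DocIdPrefix` and the method `safePrefix` are hypothetical names, and the sketch assumes the failing call was a fixed-width `substring(0, 2)` on a document number shorter than two characters, as the "begin 0, end 2, length 1" message suggests.

```java
public final class DocIdPrefix {
    private DocIdPrefix() {}

    /**
     * Hypothetical helper: returns at most the first n characters of s.
     * Unlike s.substring(0, n), it never throws when s is shorter than n.
     * A null input or a non-positive n yields an empty string.
     */
    public static String safePrefix(String s, int n) {
        if (s == null || n <= 0) {
            return "";
        }
        // Clamp the end index to the actual string length.
        return s.substring(0, Math.min(n, s.length()));
    }

    public static void main(String[] args) {
        System.out.println(safePrefix("US8299092", 2)); // normal-length document number
        System.out.println(safePrefix("7", 2));         // short document number, no exception
    }
}
```

Clamping the end index keeps the caller's logic unchanged for normal-length numbers and lets it decide separately how to treat a prefix that comes back shorter than expected.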
Hello,
I found a couple more bugs: TransformerCLI failed for these patents and dropped out to the command prompt. The two source XML files are attached.
patents.zip
2019-03-15 17:49:05,394 INFO [main] TransformerCli - Record: 'US8299092B2' from D:\patents\ipg121030.zip:2659
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end 2, length 1
    at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
    at java.base/java.lang.String.substring(Unknown Source)
    at gov.uspto.patent.doc.xml.items.DocumentIdNode.read(DocumentIdNode.java:60)
    at gov.uspto.patent.doc.xml.fragments.CitationNode.readPatCitations(CitationNode.java:144)
    at gov.uspto.patent.doc.xml.fragments.CitationNode.read(CitationNode.java:63)
    at gov.uspto.patent.doc.xml.GrantParser.parse(GrantParser.java:113)
    at gov.uspto.parser.dom4j.Dom4JParser.parse(Dom4JParser.java:90)
    at gov.uspto.patent.PatentReader.read(PatentReader.java:82)
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:187)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:129)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:307)
and
2019-03-16 09:34:18,090 INFO [main] TransformerCli - Record: 'USPP022671P2' from D:\patents\ipg120417.zip:435
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.base/java.lang.StringLatin1.charAt(Unknown Source)
    at java.base/java.lang.String.charAt(Unknown Source)
    at gov.uspto.common.text.StringCaseUtil.toTitleCase(StringCaseUtil.java:102)
    at gov.uspto.patent.doc.xml.GrantParser.parse(GrantParser.java:69)
    at gov.uspto.parser.dom4j.Dom4JParser.parse(Dom4JParser.java:90)
    at gov.uspto.patent.PatentReader.read(PatentReader.java:82)
    at gov.uspto.patent.TransformerCli.processDumpFile(TransformerCli.java:187)
    at gov.uspto.patent.TransformerCli.process(TransformerCli.java:129)
    at gov.uspto.patent.TransformerCli.main(TransformerCli.java:307)
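The second trace shows `charAt` being called with index 0 on a string of length 0 inside `toTitleCase`, which suggests the input was empty. A minimal sketch of a title-casing routine that tolerates empty input follows. It is an assumption-laden illustration, not the project's `StringCaseUtil` code: the class name `TitleCaseGuard` is hypothetical, and the word-splitting rule (capitalize the first character after whitespace, lowercase the rest) is chosen only for the example.

```java
public final class TitleCaseGuard {
    private TitleCaseGuard() {}

    /**
     * Hypothetical stand-in for a title-case helper.
     * Returns "" for null or empty input instead of indexing into the string.
     */
    public static String toTitleCase(String text) {
        if (text == null || text.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length());
        boolean startOfWord = true;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace is copied through and marks the next character as a word start.
                startOfWord = true;
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }
}
```

Checking for empty input at the top means a record with a blank field is transformed with an empty value rather than aborting the whole run.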