Creating TextGrid object fails for large TextGrid files #6
Comments
Hi, I'm also a user.
But there are many limitations to the things I can do to try and bypass the issue:
That’s curious. That line only defines a function; I could have used the longer form with However, I could understand if calling the function caused a crash unless the string supplied as a parameter is of the form
Part of the difficulty is that the Praat developers didn’t think it worthwhile to have distinct headers for the long and short-form textgrid files; both have file type
Dear Tommi,
This is very unlikely, because the package works, ceteris paribus, for small textgrids.
I've tried to see how "grab" works, and it seems simple enough, but I don't understand why the error is reported at line 372, where the grab tool is defined, and not where it is used. But again, I am not a pro at debugging packages or functions.
I myself don't really understand why there should be a need for two types of files, but anyway. I don't have the history. Cheers anyway :)
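A small sketch of why the traceback points at the line where `grab` is defined: the body of a lambda only runs when the function is called, so a failure inside it is reported at the lambda's own source line, in a frame named `<lambda>`. The one-liner is copied from the reported traceback; the helper `innermost_frame_name` and the sample strings are made up for illustration.

```python
import traceback

# Same one-liner as in the reported traceback.
grab = lambda s: s.split(' = ')[1]

def innermost_frame_name(line):
    """Hypothetical helper: name of the innermost frame if grab(line) fails, else None."""
    try:
        grab(line)
    except IndexError as exc:
        # The last entry is the frame in which the exception was actually raised.
        return traceback.extract_tb(exc.__traceback__)[-1].name
    return None

print(grab('xmin = 0'))                   # well-formed "key = value" line -> 0
print(innermost_frame_name('item [1]:'))  # no ' = ' separator -> <lambda>
```

So the line number of the definition shows up only because that is where the failing expression lives; the call site is the frame just above it.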
That might be helpful… I think there might be a third option too. As the value is converted to I didn’t actually expect anyone to come up with a huge textgrid file as they tend to be significantly smaller than the sound files. The script basically just reads them in full instead of carefully reading line-by-line or chunk-by-chunk. In my setting (using a physical computer instead of, say, a virtual environment) it could gobble whatever I tried to feed it.
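For comparison, reading line by line could look roughly like the sketch below. This is not how the package works; `iter_key_values` is a hypothetical helper, UTF-8 is an assumed encoding, and the sample content is a made-up fragment in the style of a long-form file.

```python
import os
import tempfile

def iter_key_values(path, encoding="utf-8"):
    """Yield (line_number, key, value) for each 'key = value' line in a file."""
    with open(path, encoding=encoding) as handle:
        # Iterating over the file object reads one line at a time.
        for number, raw in enumerate(handle, start=1):
            key, sep, value = raw.strip().partition(" = ")
            if sep:  # skip lines without the separator, e.g. 'item []:'
                yield number, key, value

# Made-up fragment, written to a temporary file for the demonstration.
sample = "xmin = 0\nxmax = 2.5\nitem []:\n"
with tempfile.NamedTemporaryFile("w", suffix=".TextGrid", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)

pairs = list(iter_key_values(tmp.name))
os.remove(tmp.name)
print(pairs)  # [(1, 'xmin', '0'), (2, 'xmax', '2.5')]
```

A real parser would still have to track tiers and intervals, and deal with label text that spans several lines, which this sketch ignores.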
I quite agree. In fact, both text file formats are poorly designed. The short form was probably meant for conserving some space but retaining legibility for humans: it’s actually quite close to the binary format in how it’s structured. The long form, on the other hand, is very readable for humans but a pain in the a*se to parse programmatically. No doubt the creators of Praat cannot change the format to JSON or XML or whatever any longer because everyone already has so many text-form textgrids lying around.
I'm running my script on bare metal too. My TextGrid files are transcriptions of sociolinguistic interviews of about an hour in length. With four transcription tiers I end up with files between 1 and 3+MB. For the time being, I'll try to split the files into smaller parts.
I guess only the Praat developers know that.
I looked at it over the weekend and it’s a tough call.
So yeah, it’s not optimized in any way, but then again, I’m not a real programmer myself :) Huge files might need an altogether different kind of implementation.
I agree. It probably isn't a matter of overall file size, but of a temporary variable.
I'm creating a TextGrid object from a file like so:
try:
    grid = textgrids.TextGrid(arg)
This works well for smaller files (~300KB), but fails with the following error message for larger files (~1.7MB-3MB):
Traceback (most recent call last):
  File "***", line 9, in <module>
    grid = textgrids.TextGrid(arg)
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 151, in __init__
    self.read(self.filename)
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 402, in read
    self.parse(data)
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 288, in parse
    self._parse_long(buff)
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 359, in _parse_long
    x0, x1 = [float(grab(s)) for s in data[p:p + 2]]
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 359, in <listcomp>
    x0, x1 = [float(grab(s)) for s in data[p:p + 2]]
  File "***/.virtualenvs/test/lib/python3.9/site-packages/textgrids/__init__.py", line 339, in <lambda>
    grab = lambda s: s.split(' = ')[1]
IndexError: list index out of range
Might that be a bug in the library, or some Python limitation?
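One way to narrow this down, as a sketch only: the `IndexError` itself just says that some string handed to `grab` did not contain `' = '`, not which one. A checked variant that names the offending text would show what the parser was looking at when it failed. `grab_checked` is a hypothetical stand-in, not part of the library, and the two sample strings are made up.

```python
def grab_checked(s):
    """Hypothetical replacement for the `grab` lambda.

    Returns the text after the first ' = ' (identical to the lambda for
    lines with a single separator) and raises a descriptive error otherwise.
    """
    key, sep, value = s.partition(' = ')
    if not sep:
        raise ValueError(f"expected 'key = value', got {s!r}")
    return value

for line in ['xmax = 3.2', '3.2']:
    try:
        print(grab_checked(line))
    except ValueError as err:
        print('problem:', err)
```

The second string is a bare value with no key, the way a short-form line looks; whether that, or something else in the large files, is what trips the parser here is not something this sketch can tell.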