praat-textgrids -- Praat TextGrid manipulation in Python

Description

textgrids is a module for handling Praat TextGrid files in any format (short text, long text, or binary). The module implements five classes, from largest to smallest:

TextGrid -- a dict with tier names as keys and Tiers as values
Tier -- a list of either Interval or Point objects
Interval -- an object representing Praat intervals
Point -- a namedtuple representing Praat points
Transcript -- a str with special methods for transcription handling

All Praat text objects are represented as Transcript objects.

The module also exports the following variables:

diacritics -- a dict of all diacritics with their Unicode counterparts
inline_diacritics -- a dict of inline (symbol-like) diacritics
index_diacritics -- a dict of over/understrike diacritics
symbols -- a dict of special Praat symbols with their Unicode counterparts
version -- module version as string
vowels -- a list of all vowels in either Praat or Unicode notation

Version

This file documents praat-textgrids version 1.2.0.

Copyright

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Module contents

0. Module properties

Besides textgrids.version, which contains the module version number as string, the module exports the following properties:

0.1. symbols

symbols is a dict that contains all the Praat special notation symbols (as keys) and their Unicode counterparts (as values).

0.2. vowels

vowels is a list of all vowel symbols in either Praat notation (e.g., "\as") or in Unicode. It is used by Interval methods containsvowel() and startswithvowel(), so changing it, for example, adding new symbols to it or removing symbols used for other purposes in a specific case, will change how those methods function.

0.3. diacritics, inline_diacritics, and index_diacritics

diacritics is a dict of all diacritics in Praat notation (as keys) and their Unicode counterparts (as values).

inline_diacritics and index_diacritics are subsets of diacritics. The former are semantically diacritics but appear as inline symbols, the latter are the “true” diacritics (i.e., under- or overstrikes) that need special handling when transcoding.

1. TextGrid

TextGrid is a dict whose keys are tier names (strings) and values are Tier objects. The constructor takes an optional filename argument for easy loading and parsing textgrid files.

1.1. Properties

All the properties of dicts plus:

filename holds the textgrid filename, if any. read() and write() methods both set or update it.

1.2. Methods

All the methods of dicts plus:

parse() -- parse string data into a TextGrid
read() -- read a TextGrid file name
write() -- write a TextGrid file name
tier_from_csv() -- read a textgrid tier from a CSV file
tier_to_csv() -- write a textgrid tier into a CSV file

parse() takes an obligatory string argument (Praat-format textgrid data) and an optional argument binary=BOOLEAN. If passed binary data, the argument has to be given.

read() and write() each take an obligatory filename argument. read() can take an optional argument binary=BOOLEAN. Opening a binary file the argument has to be given.

tier_from_csv() and tier_to_csv() both take two obligatory arguments, the tier name and the filename, in that order.

2. Tier

Tier is a list of either Interval or Point objects.

NOTE: Tier only allows adding Interval or Point objects. Adding anything else or mixing Intervals and Points will trigger an exception.

2.2. Properties

All the properties of lists plus:

is_point_tier -- Boolean value: True for point tier, False for interval tier.

2.3. Methods

All the methods of lists plus:

concat() -- concatenate intervals
to_csv() -- convert tier data into a CSV-like list

concat() returns a TypeError if used with a point tier. It takes two optional arguments, first= and last=, both of which are integer indexes with the usual Python semantics: 0 stands for the first element, -1 for the last element, these being also the defaults.

to_csv() returns a CSV-like list. It’s mainly intended to be used from the TextGrid level method tier_to_csv() but can be called directly if writing to a file is not desired.

3. Interval

Interval is an object class.

3.1. Properties

dur -- interval duration (float)
mid -- interval midpoint (float)
text -- text label (Transcript)
xmax -- interval end time (float)
xmin -- interval start time (float)

3.3. Methods

containsvowel() -- Boolean: does the interval contain a vowel?
startswithvowel() -- Boolean: does the interval start with a vowel?
timegrid() -- create a grid of even time slices

containsvowel() and startswithvowel() check for possible vowels in both Praat notation and Unicode but can of course make an error if symbols are used in an unexpected way. They don’t take arguments.

NOTE: At the moment there is no endswithvowel() as might perhaps be expected. This is a result of an early implementation bug and might get corrected in the future.

timegrid() returns a list of timepoints (in float) evenly distributed from xmin to xmax. It takes an optional integer argument specifying the number of timepoints desired; the default is 3. It raises a ValueError if the argument is not an integer or is less than 1.

4. Point

Point is a namedtuple with two properties: text and xpos.

4.1. Properties

text -- text label (Transcript)
xpos -- temporal position (float)

5. Transcript

Transcript is a str-derived class with one special method: transcode().

5.1. Properties

All the properties of strs.

5.2. Methods

All the methods of strs plus:

transcode() -- convert Praat notation to Unicode or vice versa.

Without arguments, transcode() assumes its input to be in Praat notation and converts it to Unicode; no check is made as to whether the input really is in Praat notation but nothing should happen if it isn’t. User should take care and handle any exceptions.

Optional to_unicode=False argument inverts the direction of the transcoding from Unicode to Praat. Again, it is not checked whether input is in Unicode.

With optional retain_diacritics=True argument the transcoding does not remove over- and understrike diacritics from the result.

Example code

import sys
import textgrids

for arg in sys.argv[1:]:
    # Try to open the file as textgrid
    try:
        grid = textgrids.TextGrid(arg)
    # Discard and try the next one
    except:
        continue

    # Assume "syllables" is the name of the tier
    # containing syllable information
    for syll in grid['syllables']:
        # Convert Praat to Unicode in the label
        label = syll.text.transcode()
        # Print label and syllable duration, CSV-like
        print('"{}";{}'.format(label, syll.dur))

Plans for the future

TextGrid.read() and TextGrid.parse() should analyze the file or data and automatically select either text or binary handling as needed. (After all, when parsing text files, short or long format is automatically recognized.)
TextGrid.__str()__ will continue to produce long text format in the future too, but TextGrid.write() should be able to produce any of the three formats.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
textgrids		textgrids
.gitignore		.gitignore
HISTORY.txt		HISTORY.txt
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

praat-textgrids -- Praat TextGrid manipulation in Python

Description

Version

Copyright

Module contents

0. Module properties

0.1. symbols

0.2. vowels

0.3. diacritics, inline_diacritics, and index_diacritics

1. TextGrid

1.1. Properties

1.2. Methods

2. Tier

2.2. Properties

2.3. Methods

3. Interval

3.1. Properties

3.3. Methods

4. Point

4.1. Properties

5. Transcript

5.1. Properties

5.2. Methods

Example code

Plans for the future

About

Releases

Packages

Languages

License

cmmario94/Praat-textgrids

Folders and files

Latest commit

History

Repository files navigation

praat-textgrids -- Praat TextGrid manipulation in Python

Description

Version

Copyright

Module contents

0. Module properties

0.1. symbols

0.2. vowels

0.3. diacritics, inline_diacritics, and index_diacritics

1. TextGrid

1.1. Properties

1.2. Methods

2. Tier

2.2. Properties

2.3. Methods

3. Interval

3.1. Properties

3.3. Methods

4. Point

4.1. Properties

5. Transcript

5.1. Properties

5.2. Methods

Example code

Plans for the future

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages