Since we want a consistent abstraction over different formats and datasets, it is important that all data and information is mapped correctly into the Python classes.
The issuer holds information about the source of the audio content. Depending on the type of audio content, different attributes are relevant; therefore, different types of issuers can be used.
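As a rough illustration of this idea, the sketch below models a base issuer with content-specific subclasses. The class names and attributes (`Issuer`, `Speaker`, `Artist`, `gender`) are assumptions for illustration, not the library's actual API.

```python
from dataclasses import dataclass


@dataclass
class Issuer:
    """Base class: holds information about the source of audio content."""
    idx: str  # unique identifier of the issuer


@dataclass
class Speaker(Issuer):
    """Hypothetical issuer for speech, where speaker attributes matter."""
    gender: str = 'unknown'
    age_group: str = 'unknown'


@dataclass
class Artist(Issuer):
    """Hypothetical issuer for music, where the performing artist matters."""
    name: str = 'unknown'


speaker = Speaker(idx='spk-001', gender='female')
print(speaker.idx, speaker.gender)
```

The point of the split is that speech cares about speaker attributes (gender, age group), while music cares about the artist; a single flat issuer class would force irrelevant attributes onto every dataset.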
In the corpus data structures an utterance can have multiple label-lists. In order to access a label-list a key is used.
```python
utterance = ...
label_list = utterance.label_lists['word-transcription']
```
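To make the keyed access concrete, here is a self-contained sketch; `Utterance` and `LabelList` below are simplified stand-ins for illustration, not the library's real classes.

```python
class LabelList:
    """Simplified stand-in: just wraps a list of label values."""
    def __init__(self, labels):
        self.labels = labels


class Utterance:
    """Simplified stand-in: maps keys to label-lists."""
    def __init__(self):
        self.label_lists = {}


utterance = Utterance()
# One utterance can carry multiple label-lists, each under its own key.
utterance.label_lists['word-transcription'] = LabelList(['hello', 'world'])
utterance.label_lists['phone-transcription'] = LabelList(['hh', 'ah', 'l', 'ow'])

# Access a specific label-list via its key.
label_list = utterance.label_lists['word-transcription']
print(label_list.labels)
```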
The key used should be consistent across all datasets. Therefore, the identifiers/keys should be selected from the list below
if possible. For these predefined keys, constants are defined in
A high-level category for a given audio excerpt. Should be one of the following values:
- Non-aligned transcription of speech.
- Non-aligned transcription of speech. Used for unprocessed transcriptions (e.g. containing punctuation, …).
- Aligned transcription of speech. The start and end of each word are defined, and every word is a single label in the label-list.
- Non-aligned transcription of phones.
- Aligned transcription of phones. The start and end of each phone are defined.
- The genre of the music.
- Labels defining any sound event, acoustic scene, or environmental noise, e.g. siren, dog_bark, train, car, snoring, …
This list isn’t complete. Please open an issue for any additional domains/classes that may be needed.
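One common pattern for such predefined keys is to expose them as module-level constants, so datasets share the exact same strings. The constant and key names below are hypothetical examples; the real names are defined by the library and may differ.

```python
# Hypothetical constants for predefined label-list keys (illustrative only).
LL_WORD_TRANSCRIPT = 'word-transcript'
LL_WORD_TRANSCRIPT_RAW = 'word-transcript-raw'
LL_PHONE_TRANSCRIPT = 'phone-transcript'
LL_GENRE = 'genre'
LL_SOUND_CLASS = 'sound-class'

# Using a constant instead of a raw string avoids typos and keeps the
# keys consistent across all datasets.
label_lists = {LL_WORD_TRANSCRIPT: ['hello', 'world']}
print(label_lists[LL_WORD_TRANSCRIPT])
```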