Data Mapping

Because we want a consistent abstraction over different formats and datasets, it is important that all data and information is mapped correctly onto the Python classes.

Issuer

The issuer holds information about the source of the audio content. Which attributes are important depends on the kind of audio content, so different types of issuers can be used.

Speech
For audio content that mainly consists of speech, the audiomate.issuers.Speaker has to be used. This is most common for datasets for speech recognition, speech synthesis, and similar tasks.
Music
For audio content that contains music, the audiomate.issuers.Artist has to be used.
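
As a rough illustration of the distinction above, the following sketch picks an issuer type from the kind of audio content. The Speaker and Artist classes here are simplified stand-ins written for this example, not the real audiomate.issuers classes. Their attributes (gender, name) and the helper make_issuer are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issuer:
    """Stand-in base class: the source of some audio content."""
    idx: str


@dataclass
class Speaker(Issuer):
    """Stand-in for a speech issuer; `gender` is an assumed attribute."""
    gender: Optional[str] = None


@dataclass
class Artist(Issuer):
    """Stand-in for a music issuer; `name` is an assumed attribute."""
    name: Optional[str] = None


def make_issuer(idx: str, content_type: str, **attributes) -> Issuer:
    """Hypothetical helper: choose the issuer type for the given content."""
    if content_type == 'speech':
        return Speaker(idx, **attributes)
    if content_type == 'music':
        return Artist(idx, **attributes)
    # No specialised type applies, so fall back to the generic issuer.
    return Issuer(idx, **attributes)


speaker = make_issuer('spk-1', 'speech', gender='female')
artist = make_issuer('art-1', 'music', name='Some Band')
```

When writing a reader for a new dataset, the same decision is usually made once per dataset rather than per utterance, since most datasets contain only one kind of content.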

Labels

In the corpus data structures, an utterance can have multiple label-lists. Each label-list is accessed by a key.

utterance = ...
label_list = utterance.label_lists['word-transcript']

The key used should be consistent across all datasets. Therefore, the identifiers/keys should be selected from the list below where possible. For these predefined keys, constants are defined in audiomate.corpus.
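
To show why shared keys matter, here is a minimal sketch that checks which keys of an utterance fall outside the predefined set. It does not import audiomate: the constant names (LL_WORD_TRANSCRIPT and so on) are assumed for this example and should be checked against audiomate.corpus, the plain dict stands in for utterance.label_lists, and non_standard_keys is a hypothetical helper.

```python
# Assumed constant names; the values are the keys listed in this document.
LL_DOMAIN = 'domain'
LL_WORD_TRANSCRIPT = 'word-transcript'
LL_WORD_TRANSCRIPT_RAW = 'word-transcript-raw'
LL_WORD_TRANSCRIPT_ALIGNED = 'word-transcript-aligned'
LL_PHONE_TRANSCRIPT = 'phone-transcript'
LL_PHONE_TRANSCRIPT_ALIGNED = 'phone-transcript-aligned'
LL_GENRE = 'genre'
LL_SOUND_CLASS = 'sound-class'

PREDEFINED_KEYS = {
    LL_DOMAIN,
    LL_WORD_TRANSCRIPT,
    LL_WORD_TRANSCRIPT_RAW,
    LL_WORD_TRANSCRIPT_ALIGNED,
    LL_PHONE_TRANSCRIPT,
    LL_PHONE_TRANSCRIPT_ALIGNED,
    LL_GENRE,
    LL_SOUND_CLASS,
}

# A plain dict standing in for the label-lists of one utterance.
label_lists = {
    LL_WORD_TRANSCRIPT: ['hello world'],
    'my-custom-key': ['something dataset specific'],
}


def non_standard_keys(label_lists):
    """Hypothetical helper: return the keys that are not predefined."""
    return sorted(key for key in label_lists if key not in PREDEFINED_KEYS)


unknown = non_standard_keys(label_lists)
```

Using a constant instead of a string literal means a misspelled key fails with a NameError at the point of use rather than silently creating a second, incompatible label-list.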

general

domain

A high-level category for a given audio excerpt. Should be one of the following values:

  • speech
  • music
  • noise
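
Since the domain is restricted to a small set of values, a reader for a new dataset could validate it before storing it. The sketch below assumes nothing about audiomate itself; check_domain is a hypothetical helper and the matching is deliberately case-sensitive.

```python
VALID_DOMAINS = ('speech', 'music', 'noise')


def check_domain(value: str) -> str:
    """Hypothetical helper: return `value` if it is a valid domain."""
    if value not in VALID_DOMAINS:
        raise ValueError(
            'Unknown domain {!r}, expected one of {}'.format(value, VALID_DOMAINS)
        )
    return value


domain = check_domain('music')
```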

speech

word-transcript
Non-aligned transcription of speech.
word-transcript-raw
Non-aligned transcription of speech. Used for unprocessed transcriptions (e.g. containing punctuation, …).
word-transcript-aligned
Aligned transcription of speech. The start and end of each word are defined. Every word is a single label in the label-list.
phone-transcript
Non-aligned transcription of phones.
phone-transcript-aligned
Aligned transcription of phones. The start and end of each phone are defined.
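
The difference between aligned and non-aligned transcriptions can be sketched as follows. The Label class is a stand-in written for this example, not the audiomate class; its default start and end values, the example timings, and the choice to store a non-aligned transcription as a single label are all assumptions. The helper to_non_aligned is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    """Stand-in label: a value plus optional timing in seconds."""
    value: str
    start: float = 0.0
    end: float = float('inf')  # assumed convention for "until the end"


# word-transcript: here a single label with the whole text and no timing.
word_transcript = [Label('hello world')]

# word-transcript-aligned: one label per word, each with start and end.
word_transcript_aligned = [
    Label('hello', start=0.0, end=0.4),
    Label('world', start=0.5, end=0.9),
]


def to_non_aligned(aligned: List[Label]) -> List[Label]:
    """Hypothetical helper: join aligned labels into one untimed label."""
    ordered = sorted(aligned, key=lambda label: label.start)
    return [Label(' '.join(label.value for label in ordered))]


joined = to_non_aligned(word_transcript_aligned)
```

The conversion only works in this direction: an aligned label-list can be reduced to a non-aligned one, but the timings cannot be recovered from plain text.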

music

genre
The genre of the music.

noise

sound-class
Labels defining any sound event, acoustic scene, environmental noise, …, e.g. siren, dog_bark, train, car, snoring, …

This list isn’t complete. Please open an issue for any additional domains/classes that may be needed.