audiomate.corpus

This module contains everything needed to work with a corpus. Besides the main corpus class audiomate.Corpus, there are different loaders in audiomate.corpus.io and the assets used in a corpus in audiomate.corpus.assets.

CorpusView

class audiomate.corpus.CorpusView

This class defines the basic interface of a corpus. It is not meant to be instantiated directly. It only describes the methods for accessing data of the corpus.

Notes

All paths to files should be held as absolute paths in memory.

all_label_values(label_list_ids=None)

Return a set of all label-values occurring in this corpus.

Parameters:	label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:	A set of distinct label-values.
Return type:	set

feature_containers

Return the feature-containers in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.FeatureContainer` objects with the feature-idx as key.
Return type:	dict

files

Return the files in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.File` objects with the file-idx as key.
Return type:	dict

issuers

Return the issuers in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.Issuer` objects with the issuer-idx as key.
Return type:	dict

label_count(label_list_ids=None)

Return a dictionary containing the number of times each label-value occurs in this corpus.

Parameters:	label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:	A dictionary containing the number of occurrences with the label-value as key.
Return type:	dict

label_durations(label_list_ids=None)

Return a dictionary containing the total duration of each label-value in this corpus.

Parameters:	label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:	A dictionary containing the total duration with the label-value as key.
Return type:	dict
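
The three label methods above (all_label_values, label_count and label_durations) share the same filtering rule, which can be sketched in plain Python. This is a minimal illustration and not audiomate's implementation: the nested dictionary layout, the (value, start, end) tuples in seconds and the helper `iter_labels` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in data: utterance-idx -> label-list-idx -> list of
# (label-value, start, end) tuples in seconds. Not audiomate's real classes.
utterances = {
    "utt-1": {
        "words": [("hello", 0.0, 0.5), ("world", 0.5, 1.2)],
        "phones": [("h", 0.0, 0.1)],
    },
    "utt-2": {
        "words": [("hello", 0.0, 0.4)],
    },
}


def iter_labels(utterances, label_list_ids=None):
    """Yield every label, optionally restricted to the given label-list ids."""
    for label_lists in utterances.values():
        for list_idx, labels in label_lists.items():
            if label_list_ids is None or list_idx in label_list_ids:
                yield from labels


def all_label_values(utterances, label_list_ids=None):
    """Return the set of distinct label-values."""
    return {value for value, _, _ in iter_labels(utterances, label_list_ids)}


def label_count(utterances, label_list_ids=None):
    """Return a dict mapping each label-value to its number of occurrences."""
    values = (value for value, _, _ in iter_labels(utterances, label_list_ids))
    return dict(Counter(values))


def label_durations(utterances, label_list_ids=None):
    """Return a dict mapping each label-value to its summed duration."""
    durations = defaultdict(float)
    for value, start, end in iter_labels(utterances, label_list_ids):
        durations[value] += end - start
    return dict(durations)
```

Passing `label_list_ids=["words"]` restricts all three functions to the word labels, while the default `None` considers every label-list.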

name: Return the name of the dataset (equals the basename of the path, if the path is not None).

num_feature_containers: Return the number of feature-containers in the corpus.

num_files: Return the number of files in the corpus.

num_issuers: Return the number of issuers in the corpus.

num_subviews: Return the number of subviews in the corpus.

num_utterances: Return the number of utterances in the corpus.

stats()

Return statistics calculated over all samples of all utterances in the corpus.

Returns:	A DataStats object containing statistics over all samples in the corpus.
Return type:	DataStats

stats_per_utterance()

Return statistics calculated for all samples of each utterance in the corpus.

Returns:	A dictionary containing a DataStats object for each utterance.
Return type:	dict
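
The difference between stats() and stats_per_utterance() can be illustrated with a small sketch. It does not use the DataStats class: the `Stats` tuple and its fields (mean, var, min, max, num) are assumptions for the example, and the sample values are made up.

```python
from collections import namedtuple
from statistics import fmean, pvariance

# Hypothetical stand-in for DataStats; the real class's fields may differ.
Stats = namedtuple("Stats", ["mean", "var", "min", "max", "num"])


def compute_stats(samples):
    """Compute basic statistics over a non-empty list of samples."""
    return Stats(
        mean=fmean(samples),
        var=pvariance(samples),
        min=min(samples),
        max=max(samples),
        num=len(samples),
    )


# Made-up samples, keyed by utterance-idx.
samples_per_utterance = {
    "utt-1": [0.0, 0.5, 1.0],
    "utt-2": [-1.0, 1.0],
}

# One Stats object per utterance (analogous to stats_per_utterance()).
per_utterance = {
    idx: compute_stats(samples)
    for idx, samples in samples_per_utterance.items()
}

# One Stats object over the pooled samples (analogous to stats()).
pooled = [x for samples in samples_per_utterance.values() for x in samples]
overall = compute_stats(pooled)
```

Statistics over all samples are generally not the average of the per-utterance statistics, because utterances differ in length. In this data the per-utterance means are 0.5 and 0.0, while the mean over all five samples is 0.3.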

subviews

Return the subviews of the corpus.

Returns:	A dictionary containing `audiomate.corpus.Subview` objects with the subview-idx as key.
Return type:	dict

total_duration: Return the total duration of the audio in the corpus, in seconds, summed over all utterances.

utterances

Return the utterances in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.Utterance` objects with the utterance-idx as key.
Return type:	dict

Corpus

class audiomate.corpus.Corpus(path=None)

The Corpus class represents a single corpus. It extends audiomate.corpus.CorpusView with the functionality for loading and saving. Furthermore, it provides the functionality for adding and modifying assets of the corpus, such as files and utterances.

Parameters:	path (str) – Path where the corpus is stored. (Optional)

feature_containers

Return the feature-containers in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.FeatureContainer` objects with the feature-idx as key.
Return type:	dict

files

Return the files in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.File` objects with the file-idx as key.
Return type:	dict

classmethod from_corpus(corpus)

Create a new modifiable corpus from any other CorpusView. For example, this can be used to create an independent, modifiable corpus from a subview.

Parameters:	corpus (CorpusView) – The corpus to create a copy from.
Returns:	A new corpus with the same data as the given one.
Return type:	Corpus

import_files(files)

Add the given file or files to the corpus. If any of the given file-ids already exists, a suffix is appended to make it unique.

Parameters:	files (list) – Either a single `audiomate.corpus.assets.File` or a list of them.
Returns:	A dictionary containing file idx mappings (old-file-idx/file-instance). If an imported file has an id that already exists, this mapping can be used to look up its new id.
Return type:	dict

import_issuers(issuers)

Add the given issuer or issuers to the corpus. If any of the given issuer-ids already exists, a suffix is appended to make it unique.

Parameters:	issuers (list) – Either a single `audiomate.corpus.assets.Issuer` or a list of them.
Returns:	A dictionary containing issuer idx mappings (old-issuer-idx/issuer-instance). If an imported issuer has an id that already exists, this mapping can be used to look up its new id.
Return type:	dict

import_subview(idx, subview)

Add the given subview to the corpus.

Parameters:	idx (str) – An idx that is unique in the corpus for identifying the subview. If a subview with the given idx already exists, it is overwritten.
	subview (Subview) – The subview to add.

import_utterances(utterances)

Add the given utterance or utterances to the corpus. If any of the given utterance-ids already exists, a suffix is appended to make it unique.

Parameters:	utterances (list) – Either a single `audiomate.corpus.assets.Utterance` or a list of them.
Returns:	A dictionary containing utterance idx mappings (old-utterance-idx/utterance-instance). If an imported utterance has an id that already exists, this mapping can be used to look up its new id.
Return type:	dict

issuers

Return the issuers in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.Issuer` objects with the issuer-idx as key.
Return type:	dict

classmethod load(path, reader=None)

Load the corpus from the given path, using the given reader. If no reader is given, the audiomate.corpus.io.DefaultReader is used.

Parameters:	path (str) – Path to load the corpus from.
	reader (str, CorpusReader) – The reader or the name of the reader to use.
Returns:	The loaded corpus.
Return type:	Corpus

classmethod merge_corpora(corpora)

Merge a list of corpora into one.

Parameters:	corpora (Iterable) – An iterable of `audiomate.corpus.CorpusView`.
Returns:	A corpus with the data from all given corpora merged into one.
Return type:	Corpus

merge_corpus(corpus)

Merge the given corpus into this corpus. All assets (files, utterances, issuers, …) are copied into this corpus. If any ids (utt-idx, file-idx, issuer-idx, subview-idx, …) occur in both corpora, the ids from the corpus being merged in are suffixed with a number (starting from 1 and counting up until the id is unique).

Parameters:	corpus (CorpusView) – The corpus to merge.
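
The id-suffixing rule described above can be sketched as follows. This is an illustration of the rule only, not audiomate's code: the helper names `unique_id` and `merge_ids` and the underscore separator are assumptions. Unlike the import_* methods, which map the old idx to the imported instance, this sketch maps the old idx to the new idx to keep the example small.

```python
def unique_id(idx, taken, separator="_"):
    """Return idx if it is free, else idx plus the first free number from 1.

    The separator is an assumption; the text only states that a number
    is appended.
    """
    if idx not in taken:
        return idx
    number = 1
    while f"{idx}{separator}{number}" in taken:
        number += 1
    return f"{idx}{separator}{number}"


def merge_ids(target_ids, incoming_ids):
    """Add incoming ids to target_ids (a set, modified in place).

    Returns a mapping from each incoming id to the id it received.
    """
    mapping = {}
    for idx in incoming_ids:
        new_idx = unique_id(idx, target_ids)
        target_ids.add(new_idx)
        mapping[idx] = new_idx
    return mapping


existing = {"utt", "utt_1", "other"}
mapping = merge_ids(existing, ["utt", "new"])
# "utt" and "utt_1" are taken, so the incoming "utt" becomes "utt_2";
# "new" is free and keeps its id.
```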

name: Return the name of the dataset (equals the basename of the path, if the path is not None).

new_feature_container(idx, path=None)

Add a new feature container with the given data.

Parameters:	idx (str) – A unique identifier within the dataset.
	path (str) – The path to store the feature file. If None, a default path is used.
Returns:	The newly added feature-container.
Return type:	FeatureContainer

new_file(path, file_idx, copy_file=False)

Add a new file to the corpus with the given data.

Parameters:	path (str) – Path of the file to add.
	file_idx (str) – The id to associate the file with.
	copy_file (bool) – If True, the file is copied to the data set folder; otherwise the given path is used directly.
Returns:	The newly added File.
Return type:	File

new_issuer(issuer_idx, info=None)

Add a new issuer to the dataset with the given data.

Parameters:	issuer_idx (str) – The id to associate the issuer with. If it is None or already exists, one is generated.
	info (dict, list) – Additional info of the issuer.
Returns:	The newly added issuer.
Return type:	Issuer

new_utterance(utterance_idx, file_idx, issuer_idx=None, start=0, end=-1)

Add a new utterance to the corpus with the given data.

Parameters:	utterance_idx (str) – The id to associate with the utterance. If it is None or already exists, one is generated.
	file_idx (str) – The id of the file the utterance is in.
	issuer_idx (str) – The issuer id to associate with the utterance.
	start (float) – Start of the utterance within the file [seconds].
	end (float) – End of the utterance within the file [seconds]. -1 equals the end of the file.
Returns:	The newly added utterance.
Return type:	Utterance
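
How the start and end values translate into a duration, including the special value -1, can be sketched in a few lines. The function `utterance_duration` and the tuple layout are hypothetical and only illustrate the convention described above; the same summation underlies a total duration over all utterances.

```python
def utterance_duration(start, end, file_duration):
    """Return the duration in seconds; end == -1 means the end of the file."""
    resolved_end = file_duration if end == -1 else end
    return resolved_end - start


# Hypothetical data: (start, end, duration of the underlying file) in seconds.
utterances = {
    "utt-1": (0.0, -1, 12.5),   # the whole file
    "utt-2": (2.0, 5.0, 30.0),  # an explicit segment
    "utt-3": (10.0, -1, 30.0),  # from 10 s to the end of the file
}

# Sum of the individual durations: 12.5 + 3.0 + 20.0 seconds.
total = sum(utterance_duration(*values) for values in utterances.values())
```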

save(writer=None): If self.path is defined, try to save the corpus at that path.

save_at(path, writer=None)

Save this corpus at the given path. If the path differs from the currently set path, the path is updated.

Parameters:	path (str) – Path to save the data set to.
	writer (str, CorpusWriter) – The writer or the name of the writer to use.
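
The relationship between save and save_at can be sketched with a small stand-in class that models only the path handling. `CorpusSketch` is hypothetical, and raising a ValueError when no path is defined is an assumption; the text does not state what save does in that case.

```python
class CorpusSketch:
    """Hypothetical stand-in modelling only the path handling of saving."""

    def __init__(self, path=None):
        self.path = path
        self.saved_to = []  # records the paths data would be written to

    def save_at(self, path, writer=None):
        # A real implementation would hand the corpus to the writer here.
        self.saved_to.append(path)
        if path != self.path:
            self.path = path

    def save(self, writer=None):
        # Assumption: saving without a defined path is reported as an error.
        if self.path is None:
            raise ValueError("No path defined; use save_at(path) instead.")
        self.save_at(self.path, writer=writer)


corpus = CorpusSketch()
corpus.save_at("/data/corpus-a")  # also sets corpus.path
corpus.save()                     # saves to /data/corpus-a again
```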

subviews

Return the subviews of the corpus.

Returns:	A dictionary containing `audiomate.corpus.Subview` objects with the subview-idx as key.
Return type:	dict

utterances

Return the utterances in the corpus.

Returns:	A dictionary containing `audiomate.corpus.assets.Utterance` objects with the utterance-idx as key.
Return type:	dict