audiomate.corpus.assets

The assets module contains the data structures that make up a corpus.

File

class audiomate.corpus.assets.File(idx, path)[source]

The file object is used to hold any data/info about a file contained in a corpus.

Parameters:
  • idx (str) – A unique identifier within a corpus for the file.
  • path (str) – The path to the file.
duration

Return the duration in seconds.

num_channels

Return the number of channels.

num_samples

Return the total number of samples.

read_samples(sr=None, offset=0, duration=None)[source]

Return the samples from the file. Uses librosa for loading (see http://librosa.github.io/librosa/generated/librosa.core.load.html).

Parameters:
  • sr (int) – If None, uses the sampling rate of the file; otherwise resamples to the given sampling rate.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate

Return the sampling rate.
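
Example

A minimal usage sketch; the identifier 'file-1' and the path '/path/to/recording.wav' are placeholders.

>>> file = File('file-1', '/path/to/recording.wav')
>>> samples = file.read_samples(sr=16000)
>>> samples.dtype
dtype('float32')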

Utterance

class audiomate.corpus.assets.Utterance(idx, file, issuer=None, start=0, end=-1, label_lists=None)[source]

An utterance defines a sample of audio. It is part of a file and can span either the whole file or only a portion of it.

Parameters:
  • idx (str) – A unique identifier for the utterance within a dataset.
  • file (File) – The file this utterance belongs to.
  • issuer (Issuer) – The issuer this utterance was created from.
  • start (float) – The start of the utterance within the audio file in seconds. (default 0)
  • end (float) – The end of the utterance within the audio file in seconds. -1 indicates that the utterance ends at the end of the file. (default -1)
  • label_lists (LabelList, list) – A single or multiple label-lists.
Variables:

label_lists (dict) – A dictionary containing label-lists with the label-list-idx as key.

all_label_values(label_list_ids=None)[source]

Return a set of all label-values occurring in this utterance.

Parameters:label_list_ids (list) – If not None, only label-values from label-lists with an id contained in this list are considered.
Returns:A set of distinct label-values.
Return type:set
duration

Return the absolute duration in seconds.

end_abs

Return the absolute end of the utterance relative to the signal.

label_count(label_list_ids=None)[source]

Return a dictionary containing the number of occurrences of every label-value in this utterance.

Parameters:label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:A dictionary containing the number of occurrences with the label-value as key.
Return type:dict
label_total_duration(label_list_ids=None)[source]

Return a dictionary containing, for every label-value, the total number of seconds it occurs within this utterance.

Parameters:label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:A dictionary containing the number of seconds with the label-value as key.
Return type:dict
num_samples(sr=None)[source]

Return the number of samples.

Parameters:sr (int) – Calculate the number of samples with the given sampling-rate. If None, the native sampling-rate is used.
Returns:Number of samples
Return type:int
read_samples(sr=None, offset=0, duration=None)[source]

Read the samples of the utterance.

Parameters:
  • sr (int) – If None, uses the sampling rate of the file; otherwise resamples to the given sampling rate.
  • offset (float) – Offset in seconds to read samples from.
  • duration (float) – If not None, read at most this number of seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate

Return the sampling rate.

set_label_list(label_lists)[source]

Set the given label-list for this utterance. If the label-list-idx is not set, 'default' is used. If there is already a label-list with the given idx, it will be overridden.

Parameters:label_lists (LabelList, list) – A single or multiple label-lists to add.
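
Example

A minimal sketch of creating an utterance, attaching a label-list and counting its label-values; the identifiers and the path are placeholders.

>>> file = File('file-1', '/path/to/recording.wav')
>>> utterance = Utterance('utt-1', file, start=0, end=10)
>>> utterance.set_label_list(LabelList(idx='words', labels=[
>>>     Label('hello', 0, 5),
>>>     Label('world', 5, 10)
>>> ]))
>>> utterance.label_count()
{'hello': 1, 'world': 1}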

Issuer

class audiomate.corpus.assets.Issuer(idx, info={})[source]

The issuer represents a person, object or something that produced an utterance. Technically the issuer can be used to group utterances which came from the same source.

Parameters:
  • idx (str) – A unique identifier for this issuer within a dataset.
  • info (dict) – Any additional info for this issuer as dict.
Variables:

utterances (list) – List of utterances that this issuer owns.
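
Example

A minimal sketch of assigning an issuer to an utterance; the identifiers, path and info dict are placeholders.

>>> file = File('file-1', '/path/to/recording.wav')
>>> issuer = Issuer('issuer-1', info={'microphone': 'headset'})
>>> utterance = Utterance('utt-1', file, issuer=issuer)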

Speaker

class audiomate.corpus.assets.Speaker(idx, gender=<Gender.UNKNOWN: 'unknown'>, age_group=<AgeGroup.UNKNOWN: 'unknown'>, native_language=None, info={})[source]

The speaker is the person who spoke in an utterance.

Parameters:
  • idx (str) – A unique identifier for this speaker within a dataset.
  • gender (Gender) – The gender of the speaker.
  • age_group (AgeGroup) – The age-group of the speaker (child, adult, …)
  • native_language (str) – The native language of the speaker. (ISO 639-3)
  • info (dict) – Any additional info for this speaker as dict.
Variables:

utterances (list) – List of utterances that this speaker owns.
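
Example

A minimal sketch; gender and age-group fall back to their UNKNOWN defaults, and the identifier and info dict are placeholders.

>>> speaker = Speaker('speaker-1', native_language='deu', info={'name': 'Jane Doe'})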

Artist

class audiomate.corpus.assets.Artist(idx, name, info={})[source]

The artist is the person or group who produced a musical segment in an utterance.

Parameters:
  • idx (str) – A unique identifier for this artist within a dataset.
  • name (str) – The name of the artist/band/…
  • info (dict) – Any additional info for this artist as dict.
Variables:

utterances (list) – List of utterances that this artist owns.
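
Example

A minimal sketch; the identifier, name and info dict are placeholders.

>>> artist = Artist('artist-1', 'The Placeholders', info={'genre': 'rock'})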

Label

class audiomate.corpus.assets.Label(value, start=0, end=-1, meta=None)[source]

Represents a label that describes some part of an utterance.

Parameters:
  • value (str) – The text of the label.
  • start (float) – Start of the label within the utterance in seconds. (default: 0)
  • end (float) – End of the label within the utterance in seconds. -1 indicates that the label ends at the end of the utterance. (default: -1)
  • meta (dict) – A dictionary containing additional information for the label.
Variables:

label_list (LabelList) – The label-list this label belongs to.

duration

Return the duration of the label in seconds.

end_abs

Return the absolute end of the label in seconds relative to the signal. If the label isn’t linked to any utterance via label-list, it is assumed self.end is relative to the start of the signal, hence self.end == self.end_abs.

read_samples(sr=None)[source]

Read the samples covered by this label.

Parameters:sr (int) – If None, uses the sampling rate of the file; otherwise resamples to the given sampling rate.
Returns:A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type:np.ndarray
start_abs

Return the absolute start of the label in seconds relative to the signal. If the label isn’t linked to any utterance via label-list, it is assumed self.start is relative to the start of the signal, hence self.start == self.start_abs.
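
Example

A minimal sketch showing the duration of a label; the value and the meta dict are placeholders.

>>> label = Label('speech', start=2.0, end=5.5, meta={'confidence': 0.9})
>>> label.duration
3.5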

LabelList

class audiomate.corpus.assets.LabelList(idx='default', labels=[])[source]

Represents a list of labels which describe an utterance. An utterance can have multiple label-lists.

Parameters:
  • idx (str) – A unique identifier for the label-list within a corpus for one utterance.
  • labels (list) – The list containing the audiomate.corpus.assets.Label objects.
Variables:

utterance (Utterance) – The utterance this label-list belongs to.

Example:

>>> label_list = LabelList(idx='transcription', labels=[
>>>     Label('this', 0, 2),
>>>     Label('is', 2, 4),
>>>     Label('timmy', 4, 8)
>>> ])
append(label)[source]

Add a label to the end of the list.

Parameters:label (Label) – The label to add.
apply(fn)[source]

Apply the given function fn to every label in this label-list. fn is a function of one argument that receives the current label, which can then be edited in place.

Parameters:fn (func) – Function to apply to every label

Example

>>> ll = LabelList(labels=[
...     Label('a_label', 1.0, 2.0),
...     Label('another_label', 2.0, 3.0)
... ])
>>> def shift_labels(label):
...     label.start += 1.0
...     label.end += 1.0
...
>>> ll.apply(shift_labels)
>>> ll.labels
[Label(a_label, 2.0, 3.0), Label(another_label, 3.0, 4.0)]
extend(labels)[source]

Add a list of labels to the end of the list.

Parameters:labels (list) – Labels to add.
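
Example

A short sketch of building up a label-list with append and extend.

>>> ll = LabelList(idx='default')
>>> ll.append(Label('a', 0, 2))
>>> ll.extend([Label('b', 2, 4), Label('c', 4, 6)])
>>> ll.label_values()
['a', 'b', 'c']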
label_count()[source]

Return for each label the number of occurrences within the list.

Returns:A dictionary containing for every label-value (key) the number of occurrences (value).
Return type:dict

Example

>>> ll = LabelList(labels=[
>>>     Label('a', 3.2, 4.5),
>>>     Label('b', 5.1, 8.9),
>>>     Label('a', 7.2, 10.5),
>>>     Label('b', 10.5, 14),
>>>     Label('a', 15, 18)
>>> ])
>>> ll.label_count()
{'a': 3, 'b': 2}
label_total_duration()[source]

Return for each distinct label value the total duration of all occurrences.

Returns:A dictionary containing for every label-value (key) the total duration in seconds (value).
Return type:dict

Example

>>> ll = LabelList(labels=[
>>>     Label('a', 3, 5),
>>>     Label('b', 5, 8),
>>>     Label('a', 8, 10),
>>>     Label('b', 10, 14),
>>>     Label('a', 15, 18.5)
>>> ])
>>> ll.label_total_duration()
{'a': 7.5, 'b': 7.0}
label_values()[source]

Return a list of all occurring label values.

Returns:Lexicographically sorted list (str) of label values.
Return type:list

Example

>>> ll = LabelList(labels=[
>>>     Label('a', 3.2, 4.5),
>>>     Label('b', 5.1, 8.9),
>>>     Label('c', 7.2, 10.5),
>>>     Label('d', 10.5, 14),
>>>     Label('d', 15, 18)
>>> ])
>>> ll.label_values()
['a', 'b', 'c', 'd']
ranges(yield_ranges_without_labels=False, include_labels=None)[source]

Generate all ranges of the label-list. A range is defined as a part of the label-list for which the same labels are defined.

Parameters:
  • yield_ranges_without_labels (bool) – If True also yields ranges for which no labels are defined.
  • include_labels (list) – If not empty, only the label values in the list will be considered.
Returns:

A generator which yields one range at a time, as a tuple of (start, end, list of labels).

Return type:

generator

Example

>>> ll = LabelList(labels=[
>>>     Label('a', 3.2, 4.5),
>>>     Label('b', 5.1, 8.9),
>>>     Label('c', 7.2, 10.5),
>>>     Label('d', 10.5, 14)
>>> ])
>>> ranges = ll.ranges()
>>> next(ranges)
(3.2, 4.5, [<audiomate.corpus.assets.label.Label at 0x1090527c8>])
>>> next(ranges)
(4.5, 5.1, [])
>>> next(ranges)
(5.1, 7.2, [<audiomate.corpus.assets.label.Label at 0x1090484c8>])

FeatureContainer

class audiomate.corpus.assets.FeatureContainer(path)[source]

A feature-container holds matrix-like data. The data is stored in an HDF5 file. The feature-container provides functionality to access this data. For each utterance, an HDF5 data set is created within the file, if there is feature-data for the given utterance.

Parameters:path (str) – Path to where the HDF5 file is stored. If the file doesn’t exist, one is created.
Example:

>>> fc = FeatureContainer('/path/to/hdf5file')
>>> with fc:
>>>     fc.set('utt-1', np.array([1,2,3,4]))
>>>     fc.get('utt-1')
array([1, 2, 3, 4])
close()[source]

Close the feature container file if it is open.

frame_size

The number of samples used per frame.

get(utterance_idx, mem_map=True)[source]

Read and return the features stored for the given utterance-id.

Parameters:
  • utterance_idx (str) – The ID of the utterance to get the feature-matrix from.
  • mem_map (bool) – If True, returns the features as a memory-mapped array; otherwise a copy is returned.

Note

The feature container has to be opened in advance.

Returns:The stored data.
Return type:numpy.ndarray
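
Example

A sketch assuming the features of 'utt-1' were stored as in the class example above; the container has to be opened before reading.

>>> fc = FeatureContainer('/path/to/hdf5file')
>>> fc.open()
>>> fc.get('utt-1', mem_map=False)
array([1, 2, 3, 4])
>>> fc.close()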
hop_size

The number of samples between two frames.

keys()[source]

Return all keys available in the feature-container.

Returns:List of identifiers available in the feature-container.
Return type:keys (list)

Note

The feature container has to be opened in advance.

open()[source]

Open the feature container file in order to read/write to it.

remove(utterance_idx)[source]

Remove the features stored for the given utterance-id.

Parameters:utterance_idx (str) – ID of the utterance.

Note

The feature container has to be opened in advance.

sampling_rate

The sampling-rate of the signal these frames are based on.

set(utterance_idx, features)[source]

Add the given feature matrix to the feature container for the utterance with the given id. Any existing features of the utterance in this container are discarded/overwritten.

Parameters:
  • utterance_idx (str) – The ID of the utterance to store the features for.
  • features (numpy.ndarray) – A np.ndarray with the features.

Note

The feature container has to be opened in advance.

stats()[source]

Return statistics calculated over all features in the container.

Note

The feature container has to be opened in advance.

Returns:Statistics over all data points of all features.
Return type:DataStats
stats_per_utterance()[source]

Return statistics calculated for each utterance in the container.

Note

The feature container has to be opened in advance.

Returns:A dictionary containing a DataStats object for each utterance.
Return type:dict
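
Example

A sketch of computing statistics over a container like the one created in the class example above; the returned DataStats objects are not expanded here.

>>> fc = FeatureContainer('/path/to/hdf5file')
>>> with fc:
>>>     overall = fc.stats()
>>>     per_utterance = fc.stats_per_utterance()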

PartitioningFeatureIterator

class audiomate.corpus.assets.PartitioningFeatureIterator(hdf5file, partition_size, shuffle=True, seed=None, includes=None, excludes=None)[source]

Iterates over all features in the given HDF5 file.

Before iterating over the features, the iterator slices the file into one or more partitions and loads the data into memory. This leads to significant speed-ups even with moderate partition sizes, regardless of the type of disk (spinning or flash). Pseudo random access is supported with a negligible impact on performance and randomness: The data is randomly sampled (without replacement) within each partition and the partitions are loaded in random order, too.

The features are emitted as triplets in the form of (utterance name, index of the feature within the utterance, feature).

When calculating the partition sizes, only the size of the features themselves is factored in; the overhead of data storage is ignored. This overhead is usually negligible even with partition sizes of multiple gigabytes, because the data is stored as numpy ndarrays in memory (one per utterance). The overhead of a single ndarray is 96 bytes, regardless of its size. Nonetheless, the partition size should be chosen to be lower than the total available memory.

Parameters:
  • hdf5file (h5py.File) – HDF5 file containing the features
  • partition_size (str) – Size of the partitions in bytes. The units k (kibibytes), m (mebibytes) and g (gibibytes) are supported, e.g. a partition_size of 1g equals \(2^{30}\) bytes.
  • shuffle (bool) – Indicates whether the features should be returned in random order (True) or not (False).
  • seed (int) – Seed to be used for the random number generator.
  • includes (iterable) – Iterable of names of data sets that should be included when iterating over the feature container. Mutually exclusive with excludes. If both are specified, only includes will be considered.
  • excludes (iterable) – Iterable of names of data sets to skip when iterating over the feature container. Mutually exclusive with includes. If both are specified, only includes will be considered.

Example

>>> import h5py
>>> from audiomate.corpus.assets import PartitioningFeatureIterator
>>> hdf5 = h5py.File('features.h5', 'r')
>>> iterator = PartitioningFeatureIterator(hdf5, '12g', shuffle=True)
>>> next(iterator)
('music-fma-0100', 227, array([-0.15004082, -0.30246958, -0.38708138, ..., -0.93471956,
       -0.94194776, -0.90878332], dtype=float32))
>>> next(iterator)
('music-fma-0081', 2196, array([-0.00207647, -0.00101351, -0.00058832, ..., -0.00207647,
       -0.00292684, -0.00292684], dtype=float32))
>>> next(iterator)
('music-hd-0050', 1026, array([-0.57352495, -0.63049972, -0.63049972, ...,  0.82490814,
        0.84680521,  0.75517786], dtype=float32))