audiomate.feeding

The audiomate.feeding module provides tools for simple access to data stored in different audiomate.corpus.assets.Container objects.

Datasets

class audiomate.feeding.Dataset(corpus_or_utt_ids, feature_containers)

An abstract class representing a dataset. A dataset provides indexable access to data. A concrete dataset implementation should override the methods __len__ and __getitem__.

A sample returned from a dataset is a tuple containing the data for this sample from every container. The entries of the tuple follow the order in which the containers were passed to the Dataset.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
containers (list, Container) – A single container or a list of containers.
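The contract described above (override __len__ and __getitem__, return one entry per container) can be illustrated with a small standalone sketch. It is not the audiomate implementation and does not subclass audiomate.feeding.Dataset: the class ToyUtteranceDataset is hypothetical, plain dicts mapping utterance-ids to numpy arrays stand in for containers, and, for simplicity, each sample is a whole utterance rather than a frame.

```python
import numpy as np


class ToyUtteranceDataset:
    """Hypothetical dataset: one sample per utterance, one array per container."""

    def __init__(self, utt_ids, containers):
        # Accept a single container or a list of containers, as Dataset does.
        if isinstance(containers, dict):
            containers = [containers]
        self.utt_ids = sorted(utt_ids)
        self.containers = list(containers)

    def __len__(self):
        return len(self.utt_ids)

    def __getitem__(self, index):
        utt_id = self.utt_ids[index]
        # The tuple follows the order in which the containers were passed.
        return tuple(container[utt_id] for container in self.containers)


# Plain dicts (utterance-id -> array) stand in for real containers.
inputs = {'utt-1': np.zeros((3, 4)), 'utt-2': np.ones((2, 4))}
targets = {'utt-1': np.zeros((3, 2)), 'utt-2': np.ones((2, 2))}
ds = ToyUtteranceDataset(['utt-2', 'utt-1'], [inputs, targets])
```

Here len(ds) is 2, and ds[0] is the (inputs, targets) pair of 'utt-1', because the sketch sorts the utterance-ids.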

class audiomate.feeding.FrameDataset(corpus_or_utt_ids, container)

A dataset wrapping frames of a corpus. A single sample represents a single frame.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
container (list, Container) – A single container or a list of containers.

Note

For a frame dataset, every container is expected to contain exactly one value/vector for every frame, so the first dimension of every array in every container must match.
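The requirement in this note can be checked before building a dataset. The helper check_frame_alignment below is hypothetical and not part of audiomate; it assumes dicts mapping utterance-ids to numpy arrays as stand-ins for containers.

```python
import numpy as np


def check_frame_alignment(utt_ids, containers):
    """Hypothetical helper: raise if the containers disagree on frame counts."""
    for utt_id in utt_ids:
        # The first (outermost) dimension is the number of frames.
        frame_counts = {container[utt_id].shape[0] for container in containers}
        if len(frame_counts) != 1:
            raise ValueError(f'{utt_id}: frame counts differ: {sorted(frame_counts)}')
    return True


features = {'utt-1': np.zeros((100, 13))}
targets = {'utt-1': np.zeros((100, 2))}
check_frame_alignment(['utt-1'], [features, targets])  # passes, returns True
```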

Example

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/features.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> ds = FrameDataset(corpus, [container_inputs, container_outputs])
>>> len(ds) # Number of frames in the dataset
2938
>>> ds[293] # Frame (inputs, outputs) with index 293
(
    array([0.58843831, 0.18128443, 0.19718328, 0.25284105]),
    array([0.0, 1.0])
)

get_utt_regions()

Return the regions of all utterances, assuming all utterances are concatenated. It is assumed that the utterances are sorted in ascending order for concatenation.

A region is defined by offset (in chunks), length (number of chunks) and a list of references to the utterance datasets in the containers.

Returns:	A list with a tuple for every utterance, containing the region info.
Return type:	list
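As a rough sketch of how such regions can be derived, assuming only the number of frames per utterance is known: the hypothetical helper utt_regions concatenates utterances in ascending id order and records an offset and a length in chunks for each one. It omits the container references of the real method. With frames_per_chunk=1 a chunk is a single frame; for larger values the count is rounded up, because the last chunk of an utterance may be shorter.

```python
import math


def utt_regions(utt_lengths, frames_per_chunk=1):
    """Hypothetical sketch returning (utt_id, offset, num_chunks) per utterance.

    utt_lengths maps utterance-ids to their number of frames.
    Utterances are concatenated in ascending order of their ids.
    """
    regions = []
    offset = 0
    for utt_id in sorted(utt_lengths):
        # A partial last chunk still counts as a chunk.
        num_chunks = math.ceil(utt_lengths[utt_id] / frames_per_chunk)
        regions.append((utt_id, offset, num_chunks))
        offset += num_chunks
    return regions
```

For example, utt_regions({'utt-b': 3, 'utt-a': 5}) returns [('utt-a', 0, 5), ('utt-b', 5, 3)], and with frames_per_chunk=2 it returns [('utt-a', 0, 3), ('utt-b', 3, 2)].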

partitioned_iterator(partition_size, shuffle=True, seed=None)

Return a partitioning audiomate.feeding.FrameIterator for the dataset.

Parameters:

partition_size (str) – Size of the partitions in bytes. The units `k` (kibibytes), `m` (mebibytes) and `g` (gibibytes) are supported, e.g. a `partition_size` of `1g` equals \(2^{30}\) bytes.
shuffle (bool) – Indicates whether the data should be returned in random order (`True`) or not (`False`).
seed (int) – Seed to be used for the random number generator.

Returns:	A partition iterator over the dataset.
Return type:	FrameIterator
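The size notation can be illustrated with a small parser. parse_partition_size is a hypothetical helper, not audiomate's own parsing code; accepting upper-case units (as in the '1G' used in later examples) and fractional values is an assumption of this sketch.

```python
# Binary units as described for partition_size (assumed case-insensitive here).
UNITS = {'k': 2 ** 10, 'm': 2 ** 20, 'g': 2 ** 30}


def parse_partition_size(size):
    """Hypothetical parser for strings such as '512k', '100m' or '1g'."""
    size = size.strip().lower()
    if size and size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)  # no unit: plain number of bytes
```

With this sketch, parse_partition_size('1g') returns 1073741824 and parse_partition_size('512k') returns 524288.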

class audiomate.feeding.MultiFrameDataset(corpus_or_utt_ids, container, frames_per_chunk, return_length=False, pad=False)

A dataset wrapping chunks of frames of a corpus. A single sample represents a chunk of frames.

A chunk never crosses an utterance boundary. So if the utterance length is not divisible by the chunk length, the last chunk of an utterance may be shorter than the chunk size.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
container (list, Container) – A single container or a list of containers.
frames_per_chunk (int) – Number of subsequent frames in a single sample.
return_length (bool) – If True, the length of the chunk is returned as well (default False). The length is appended to the tuple as the last element, e.g. (container1-data, container2-data, length).
pad (bool) – If True, samples shorter than frames_per_chunk are padded with zeros to match frames_per_chunk. If padding is enabled, the lengths are always returned, as if return_length = True.
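The chunking rule and the effect of pad and return_length can be sketched for a single container as follows. chunk_utterance is a hypothetical helper written for illustration; the real class handles several containers and its internals may differ.

```python
import numpy as np


def chunk_utterance(frames, frames_per_chunk, pad=False, return_length=False):
    """Hypothetical sketch: split one utterance into non-overlapping chunks."""
    samples = []
    for start in range(0, frames.shape[0], frames_per_chunk):
        chunk = frames[start:start + frames_per_chunk]
        length = chunk.shape[0]  # only the last chunk can be shorter
        if pad and length < frames_per_chunk:
            # Zero-pad along the frame axis only.
            pad_width = [(0, frames_per_chunk - length)] + [(0, 0)] * (chunk.ndim - 1)
            chunk = np.pad(chunk, pad_width, mode='constant')
        if pad or return_length:  # padding implies returning the length
            samples.append((chunk, length))
        else:
            samples.append((chunk,))
    return samples
```

For a 7-frame utterance and frames_per_chunk=3, this yields chunks of 3, 3 and 1 frames; with pad=True the last chunk is zero-padded to 3 frames and reported with length 1.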

Note

For a multi-frame dataset, every container is expected to contain exactly one value/vector for every frame, so the first dimension of every array in every container must match.

Examples

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/features.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> ds = MultiFrameDataset(corpus, [container_inputs, container_outputs], 5)
>>> len(ds) # Number of chunks in the dataset
355
>>> ds[20] # Chunk (inputs, outputs) with index 20
(
    array([[0.72991909, 0.20258683, 0.30574747, 0.53783217],
           [0.38875413, 0.83611128, 0.49054591, 0.15710017],
           [0.35153358, 0.40051009, 0.93647765, 0.29589257],
           [0.97465772, 0.80160451, 0.81871436, 0.4892925 ],
           [0.59310933, 0.8565602 , 0.95468696, 0.07933512]]),
    array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
)

If the length should be returned as well, pass return_length=True. Except for the last chunk of an utterance, the length equals frames_per_chunk.

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/features.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> ds = MultiFrameDataset(corpus, [container_inputs, container_outputs], 5, return_length=True)
>>> len(ds) # Number of chunks in the dataset
355
>>> ds[20] # Chunk (inputs, outputs) with index 20
(
    array([[0.72991909, 0.20258683, 0.30574747, 0.53783217],
           [0.38875413, 0.83611128, 0.49054591, 0.15710017],
           [0.35153358, 0.40051009, 0.93647765, 0.29589257],
           [0.97465772, 0.80160451, 0.81871436, 0.4892925 ],
           [0.59310933, 0.8565602 , 0.95468696, 0.07933512]]),
    array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]),
    5
)

get_utt_regions()

Return the regions of all utterances, assuming all utterances are concatenated. It is assumed that the utterances are sorted in ascending order for concatenation.

A region is defined by offset (in chunks), length (number of chunks) and a list of references to the utterance datasets in the containers.

Returns:	A list with a tuple for every utterance, containing the region info.
Return type:	list

partitioned_iterator(partition_size, shuffle=True, seed=None)

Return a partitioning audiomate.feeding.MultiFrameIterator for the dataset.

Parameters:

partition_size (str) – Size of the partitions in bytes. The units `k` (kibibytes), `m` (mebibytes) and `g` (gibibytes) are supported, e.g. a `partition_size` of `1g` equals \(2^{30}\) bytes.
shuffle (bool) – Indicates whether the data should be returned in random order (`True`) or not (`False`).
seed (int) – Seed to be used for the random number generator.

Returns:	A partition iterator over the dataset.
Return type:	MultiFrameIterator

Iterator

class audiomate.feeding.DataIterator(corpus_or_utt_ids, feature_containers, shuffle=True, seed=None)

An abstract class representing a data-iterator. A data-iterator provides sequential access to data. An implementation of a concrete data-iterator should override the methods __iter__ and __next__.

A sample returned from a data-iterator is a tuple containing the data for this sample from every container. The entries of the tuple follow the order in which the containers were passed to the DataIterator.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
containers (list, Container) – A single container or a list of containers.
shuffle (bool) – Indicates whether the data should be returned in random order (True) or not (False).
seed (int) – Seed to be used for the random number generator.
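A minimal data-iterator honoring shuffle and seed might look like the sketch below. ToyFrameIterator is hypothetical and standalone (it does not subclass audiomate.feeding.DataIterator and loads nothing from disk); dicts of numpy arrays stand in for containers, and every sample is a single frame.

```python
import random

import numpy as np


class ToyFrameIterator:
    """Hypothetical data-iterator yielding one tuple per frame."""

    def __init__(self, utt_ids, containers, shuffle=True, seed=None):
        self.containers = containers
        # Every (utterance-id, frame index) pair, in ascending utterance order.
        self._order = [(utt_id, i) for utt_id in sorted(utt_ids)
                       for i in range(containers[0][utt_id].shape[0])]
        if shuffle:
            # A seeded generator makes the random order reproducible.
            random.Random(seed).shuffle(self._order)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._order):
            raise StopIteration
        utt_id, index = self._order[self._pos]
        self._pos += 1
        # One entry per container, in the order the containers were passed.
        return tuple(container[utt_id][index] for container in self.containers)


inputs = {'utt-1': np.arange(6).reshape(3, 2)}
targets = {'utt-1': np.array([0, 1, 0])}
it = ToyFrameIterator(['utt-1'], [inputs, targets], shuffle=False)
```

Without shuffling, next(it) returns the first frame of 'utt-1' together with its target; two iterators created with the same seed produce the same order.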

class audiomate.feeding.FrameIterator(corpus_or_utt_ids, container, partition_size, shuffle=True, seed=None)

A data-iterator wrapping frames of a corpus. A single sample represents a single frame.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
container (list, Container) – A single container or a list of containers.
partition_size (str) – Size of the partitions in bytes. The units k (kibibytes), m (mebibytes) and g (gibibytes) are supported, e.g. a partition_size of 1g equals \(2^{30}\) bytes.
shuffle (bool) – Indicates whether the data should be returned in random order (True) or not (False).
seed (int) – Seed to be used for the random number generator.

Note

For a FrameIterator, every container is expected to contain exactly one value/vector for every frame, so the first dimension of every array in every container must match.

Example

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/features.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> ds = FrameIterator(corpus, [container_inputs, container_outputs], '1G', shuffle=True, seed=23)
>>> next(ds) # Next Frame (inputs, outputs)
(
    array([0.58843831, 0.18128443, 0.19718328, 0.25284105]),
    array([0.0, 1.0])
)

class audiomate.feeding.MultiFrameIterator(corpus_or_utt_ids, container, partition_size, frames_per_chunk, return_length=False, pad=False, shuffle=True, seed=None)

A data-iterator wrapping chunks of subsequent frames of a corpus. A single sample represents a chunk of frames.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for iterating.
container (list, Container) – A single container or a list of containers.
partition_size (str) – Size of the partitions in bytes. The units k (kibibytes), m (mebibytes) and g (gibibytes) are supported, e.g. a partition_size of 1g equals \(2^{30}\) bytes.
frames_per_chunk (int) – Number of subsequent frames in a single sample.
return_length (bool) – If True, the length of the chunk is returned as well (default False). The length is appended to the tuple as the last element, e.g. (container1-data, container2-data, length).
pad (bool) – If True, samples shorter than frames_per_chunk are padded with zeros to match frames_per_chunk. If padding is enabled, the lengths are always returned, as if return_length = True.
shuffle (bool) – Indicates whether the data should be returned in random order (True) or not (False).
seed (int) – Seed to be used for the random number generator.

Note

For a MultiFrameIterator, every container is expected to contain exactly one value/vector for every frame, so the first (outermost) dimension of every array in every container must match.

Example

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/features.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> ds = MultiFrameIterator(corpus, [container_inputs, container_outputs], '1G', 5, shuffle=True, seed=23)
>>> next(ds) # Next Chunk (inputs, outputs)
(
    array([[0.72991909, 0.20258683, 0.30574747, 0.53783217],
           [0.38875413, 0.83611128, 0.49054591, 0.15710017],
           [0.35153358, 0.40051009, 0.93647765, 0.29589257],
           [0.97465772, 0.80160451, 0.81871436, 0.4892925 ],
           [0.59310933, 0.8565602 , 0.95468696, 0.07933512]]),
    array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
)

Partitioning

class audiomate.feeding.PartitioningContainerLoader(corpus_or_utt_ids, feature_containers, partition_size, shuffle=True, seed=None)

Load data from one or more containers in partitions. It computes a scheme to load the data of as many utterances as possible in one partition.

A scheme is computed when the loader is created. To compute a new one, call the reload() method. This only has an effect if shuffle == True; otherwise, the utterances are always loaded in the same order.

With a given scheme, data of a partition can be retrieved via load_partition_data(). It loads all data of the partition with the given index into memory.

Parameters:

corpus_or_utt_ids (Corpus, list) – Either a corpus or a list of utterances. This defines which utterances are considered for loading.
containers (container.Container, list) – Either a single or a list of Container objects. From the given containers data is loaded.
partition_size (str) – Size of the partitions in bytes. The units k (kibibytes), m (mebibytes) and g (gibibytes) are supported, e.g. a partition_size of 1g equals \(2^{30}\) bytes.
shuffle (bool) – Indicates whether the utterances should be returned in random order (True) or not (False).
seed (int) – Seed to be used for the random number generator.
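One simple way to build such a scheme is greedy packing: visit the utterances (shuffled if requested) and start a new partition whenever the next utterance would exceed the size limit. The sketch below is an assumption about the general idea, not audiomate's actual algorithm; compute_partitions is hypothetical, expects precomputed byte sizes per utterance, and raises an error for an utterance larger than a partition, which the library may handle differently.

```python
import random


def compute_partitions(utt_sizes, partition_size, shuffle=True, seed=None):
    """Hypothetical greedy scheme returning (utt_ids, size_in_bytes) per partition.

    utt_sizes maps utterance-ids to the bytes their data occupies in all containers.
    """
    utt_ids = sorted(utt_sizes)
    if shuffle:
        random.Random(seed).shuffle(utt_ids)

    partitions = []
    current, current_size = [], 0
    for utt_id in utt_ids:
        size = utt_sizes[utt_id]
        if size > partition_size:
            # Assumption of this sketch: oversized utterances are an error.
            raise ValueError(f'{utt_id} ({size} bytes) does not fit into a partition')
        if current and current_size + size > partition_size:
            # Close the current partition before it would overflow.
            partitions.append((current, current_size))
            current, current_size = [], 0
        current.append(utt_id)
        current_size += size
    if current:
        partitions.append((current, current_size))
    return partitions
```

With sizes {'utt-1': 400, 'utt-2': 300, 'utt-3': 500, 'utt-4': 200}, a limit of 800 bytes and shuffle=False, this yields [(['utt-1', 'utt-2'], 700), (['utt-3', 'utt-4'], 700)]. Calling it again with a different seed produces a different scheme, which resembles what reload() is described to do when shuffle is enabled.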

Example

>>> corpus = audiomate.Corpus.load('/path/to/corpus')
>>> container_inputs = containers.FeatureContainer('/path/to/feat.hdf5')
>>> container_outputs = containers.Container('/path/to/targets.hdf5')
>>>
>>> lo = PartitioningContainerLoader(
...     corpus,
...     [container_inputs, container_outputs],
...     '1G',
...     shuffle=True,
...     seed=23
... )
>>> len(lo.partitions) # Number of partitions
5
>>> lo.partitions[0].utt_ids # Utterances in the partition with index 0
['utt-1', 'utt-2', ...]
>>> p0 = lo.load_partition_data(0) # Load partition 0 into memory
>>> p0.info.utt_ids[0] # First utterance in the partition
'utt-1'
>>> p0.utt_data[0] # Data of the first utterance
(
    array([[0.58843831, 0.18128443, 0.19718328, 0.25284105], ...]),
    array([[0.0, 1.0], ...])
)

load_partition_data(index)

Load and return the partition with the given index.

Parameters:	index (int) – The index of the partition, i.e. its position in `self.partitions`.
Returns:	A PartitionData object containing the data for the partition with the given index.
Return type:	PartitionData

reload()

Create a new partition scheme. A scheme defines which utterances are in which partition. The scheme only changes between calls if self.shuffle == True.

Returns:	List of PartitionInfo objects, defining the new partitions (same as `self.partitions`).
Return type:	list

class audiomate.feeding.PartitionInfo

Class for holding the info of a partition.

Variables:

utt_ids (list) – A list of utterance-ids in the partition.
utt_lengths (list) – List with the lengths of the utterances (the outermost dimension of the utterance's dataset in the container). Since there may be multiple containers, every item is a tuple of lengths, one for every container, in the order of the containers passed to the PartitioningContainerLoader.
size (int) – The number of bytes the partition will allocate, when loaded.

total_lengths()

Return the total length of all utterances for every container.
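To illustrate how total_lengths() relates to utt_lengths, here is a hypothetical stand-in dataclass, not the audiomate class itself: each entry of utt_lengths is a tuple with one length per container, and the totals are summed per container. Returning a tuple is a choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartitionInfo:
    """Hypothetical stand-in for PartitionInfo."""
    utt_ids: list = field(default_factory=list)
    # One tuple per utterance: its length in every container.
    utt_lengths: list = field(default_factory=list)
    size: int = 0  # bytes allocated when the partition is loaded

    def total_lengths(self):
        # zip(*...) regroups the lengths per container before summing.
        return tuple(sum(per_container) for per_container in zip(*self.utt_lengths))
```

For two utterances with lengths (100, 100) and (40, 40) in two containers, total_lengths() returns (140, 140).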

class audiomate.feeding.PartitionData(info)

Class for holding the loaded data of a partition.

Parameters:	info (PartitionInfo) – The info about the partition.
Variables:	utt_data (list) – A list holding the data-objects for every utterance in the order of `info.utt_ids`. The entries are also lists or tuples containing the array for every container.

class audiomate.feeding.PartitioningFeatureIterator(hdf5file, partition_size, shuffle=True, seed=None, includes=None, excludes=None)

Iterates over all features in the given HDF5 file.

Before iterating over the features, the iterator slices the file into one or more partitions and loads the data into memory. This leads to significant speed-ups even with moderate partition sizes, regardless of the type of disk (spinning or flash). Pseudo-random access is supported with a negligible impact on performance and randomness: the data is randomly sampled (without replacement) within each partition, and the partitions are loaded in random order, too.

The features are emitted as triplets in the form of (utterance name, index of the feature within the utterance, feature).
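The sampling scheme described above (partitions in random order, frames sampled without replacement within each partition) can be sketched independently of HDF5. iterate_partitions is hypothetical; it assumes the partitions have already been loaded as dicts mapping utterance names to 2-D feature arrays, and it yields the same (utterance name, index, feature) triplets.

```python
import random

import numpy as np


def iterate_partitions(partitions, shuffle=True, seed=None):
    """Hypothetical sketch yielding (utterance name, index, feature) triplets."""
    rand = random.Random(seed)
    order = list(range(len(partitions)))
    if shuffle:
        rand.shuffle(order)  # partitions are visited in random order
    for p in order:
        # Every (utterance, frame index) pair of one in-memory partition.
        samples = [(utt, i) for utt, feats in sorted(partitions[p].items())
                   for i in range(feats.shape[0])]
        if shuffle:
            rand.shuffle(samples)  # sampling without replacement within the partition
        for utt, i in samples:
            yield utt, i, partitions[p][utt][i]


parts = [{'utt-a': np.zeros((2, 3))}, {'utt-b': np.ones((3, 3))}]
triplets = list(iterate_partitions(parts, seed=5))
```

Every feature is emitted exactly once; only the order depends on shuffle and seed.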

When calculating the partition sizes, only the size of the features themselves is factored in; the overhead of data storage is ignored. This overhead is usually negligible even with partition sizes of multiple gigabytes, because the data is stored in memory as numpy ndarrays (one per utterance). The overhead of a single ndarray is 96 bytes regardless of its size. Nonetheless, the partition size should be chosen lower than the total available memory.

Parameters:

hdf5file (h5py.File) – HDF5 file containing the features
partition_size (str) – Size of the partitions in bytes. The units k (kibibytes), m (mebibytes) and g (gibibytes) are supported, e.g. a partition_size of 1g equals \(2^{30}\) bytes.
shuffle (bool) – Indicates whether the features should be returned in random order (True) or not (False).
seed (int) – Seed to be used for the random number generator.
includes (iterable) – Iterable of names of data sets that should be included when iterating over the feature container. Mutually exclusive with excludes. If both are specified, only includes will be considered.
excludes (iterable) – Iterable of names of data sets to skip when iterating over the feature container. Mutually exclusive with includes. If both are specified, only includes will be considered.

Example

>>> import h5py
>>> from audiomate.feeding import PartitioningFeatureIterator
>>> hdf5 = h5py.File('features.h5', 'r')
>>> iterator = PartitioningFeatureIterator(hdf5, '12g', shuffle=True)
>>> next(iterator)
('music-fma-0100', 227, array([
    -0.15004082, -0.30246958, -0.38708138, ...,
    -0.93471956, -0.94194776, -0.90878332 ], dtype=float32))
>>> next(iterator)
('music-fma-0081', 2196, array([
    -0.00207647, -0.00101351, -0.00058832, ...,
    -0.00207647, -0.00292684, -0.00292684], dtype=float32))
>>> next(iterator)
('music-hd-0050', 1026, array([
    -0.57352495, -0.63049972, -0.63049972, ...,
    0.82490814, 0.84680521,  0.75517786], dtype=float32))