audiomate.containers

This module contains the different implementations of containers. A container is normally used to store data of a specific type for all instances of a corpus (e.g. mfcc-features of all utterances).

All container implementations are based on audiomate.containers.Container, which provides the basic functionality to access an HDF5 file using h5py.

Container

class audiomate.containers.Container(path, mode='a')[source]

A container is a wrapper around an HDF5 file and is used to store array-like data. Every array is associated with an index/key. Every array (a dataset in h5py terms) may have additional attributes.

Parameters:
  • path (str) – Path where the HDF5 file is stored. If the file doesn’t exist, one is created.
  • mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. (default: ‘a’).

Example

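>>> import numpy as np
>>> from audiomate.containers import Container
>>> ct = Container('/path/to/hdf5file')
>>> with ct:
...     ct.set('utt-1', np.array([1, 2, 3, 4]))
...     data = ct.get('utt-1', mem_map=False)
...
>>> data
array([1, 2, 3, 4])

append(key, data)[source]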

Append the given data to the data that already exists in the container for the given key. Only data with equal dimensions (except the first) are allowed, since they are concatenated/stacked along the first dimension.

Parameters:
  • key (str) – Key to store data for.
  • data (numpy.ndarray) – Array-like data. All dimensions after the first have to match those of the existing data.

Note

The container has to be opened in advance. To append to existing data, the HDF5 dataset has to be chunked. Data that was first added via set therefore cannot be appended to.
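The chunking requirement comes from HDF5 itself: only a chunked dataset can be resized after it has been created. The following is a minimal sketch of that mechanism written directly against h5py. It is not audiomate's implementation; the helper name `append_rows`, the file name and the in-memory driver are assumptions made for illustration.

```python
import h5py
import numpy as np


def append_rows(h5_file, key, data):
    """Hypothetical helper: append `data` to dataset `key` along axis 0."""
    data = np.asarray(data)

    if key not in h5_file:
        # maxshape=(None, ...) leaves the first axis unlimited, which is
        # only possible for a chunked dataset.
        h5_file.create_dataset(
            key, data=data, chunks=True, maxshape=(None,) + data.shape[1:]
        )
        return

    dset = h5_file[key]
    if dset.shape[1:] != data.shape[1:]:
        raise ValueError('Dimensions after the first one have to match.')

    # Grow the dataset along the first axis and write the new rows.
    old_len = dset.shape[0]
    dset.resize(old_len + data.shape[0], axis=0)
    dset[old_len:] = data


# In-memory HDF5 file; nothing is written to disk.
with h5py.File('sketch.h5', 'w', driver='core', backing_store=False) as f:
    append_rows(f, 'utt-1', np.ones((2, 3)))
    append_rows(f, 'utt-1', np.zeros((4, 3)))
    result_shape = f['utt-1'].shape
    result = f['utt-1'][()]

print(result_shape)
```

A dataset created without `chunks`/`maxshape` has a fixed shape, which is presumably why data written with set cannot be extended later.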

close()[source]

Close the container file if it is open.

get(key, mem_map=True)[source]

Read and return the data stored for the given key.

Parameters:
  • key (str) – The key to read the data from.
  • mem_map (bool) – If True, the data is returned as a memory-mapped array; otherwise a copy is returned.

Note

The container has to be opened in advance.

Returns: The stored data.
Return type: numpy.ndarray

is_open(mode=None)[source]

Return True if the container is already open, False otherwise.

keys()[source]

Return a list of keys for which an array is stored in the container.

Returns: List of identifiers available in the container.
Return type: list

Note

The container has to be opened in advance.

open(mode=None)[source]

Open the container file.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.

open_if_needed(mode=None)[source]

Convenience context manager for use in a with statement. Opens the container if it is not already open, and closes it on exit only if it was opened within this context.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.
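The open-only-if-needed behaviour can be illustrated with a small, self-contained sketch. `FakeContainer` is a hypothetical in-memory stand-in that only tracks the open state; it ignores the mode argument and does not touch any file, so it shows the pattern rather than the library's code.

```python
from contextlib import contextmanager


class FakeContainer:
    """Hypothetical stand-in that only tracks whether it is open."""

    def __init__(self):
        self._open = False

    def is_open(self):
        return self._open

    def open(self):
        self._open = True

    def close(self):
        self._open = False

    @contextmanager
    def open_if_needed(self):
        was_open = self.is_open()
        if not was_open:
            self.open()
        try:
            yield self
        finally:
            # Close only if this context did the opening.
            if not was_open:
                self.close()


ct = FakeContainer()

# Case 1: container is closed, so the context opens and closes it.
with ct.open_if_needed():
    opened_inside = ct.is_open()
closed_after = not ct.is_open()

# Case 2: container is already open, so the context leaves it open.
ct.open()
with ct.open_if_needed():
    pass
still_open = ct.is_open()
```

This makes the method convenient for helper functions that should work both with containers the caller has already opened and with closed ones.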

raise_error_if_not_open()[source]

Check whether the container is open and raise an error if it is not.

remove(key)[source]

Remove the data stored for the given key.

Parameters: key (str) – Key of the data to remove.

Note

The container has to be opened in advance.

set(key, data)[source]

Store the given data in the container under the given key. Any existing data for the key is overwritten.

Parameters:
  • key (str) – A key to store the data for.
  • data (numpy.ndarray) – Array-like data.

Note

The container has to be opened in advance.

FeatureContainer

class audiomate.containers.FeatureContainer(path, mode='a')[source]

The FeatureContainer is a container for storing features extracted from audio data. Features are array-like data, where every feature represents the properties of a given segment of audio.

Parameters:
  • path (str) – Path to where the HDF5 file is stored. If the file doesn’t exist, one is created.
  • mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. (default: ‘a’).

Example

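>>> import numpy as np
>>> from audiomate.containers import FeatureContainer
>>> fc = FeatureContainer('/path/to/hdf5file')
>>> with fc:
...     fc.set('utt-1', np.array([1, 2, 3, 4]))
...     data = fc.get('utt-1', mem_map=False)
...
>>> data
array([1, 2, 3, 4])

append(key, data)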

Append the given data to the data that already exists in the container for the given key. Only data with equal dimensions (except the first) are allowed, since they are concatenated/stacked along the first dimension.

Parameters:
  • key (str) – Key to store data for.
  • data (numpy.ndarray) – Array-like data. All dimensions after the first have to match those of the existing data.

Note

The container has to be opened in advance. To append to existing data, the HDF5 dataset has to be chunked. Data that was first added via set therefore cannot be appended to.
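For feature data, the dimension rule means that blocks of frames can be appended as long as the feature dimension stays the same. The sketch below uses plain NumPy to show the rule, since `np.concatenate` along the first axis follows the same constraint. The shapes (13 coefficients per frame) are assumptions for illustration, and the error type raised by the container itself may differ.

```python
import numpy as np

existing = np.zeros((10, 13))   # 10 frames with 13 coefficients each
block = np.ones((5, 13))        # 5 more frames, same feature dimension

# Stacking along the first axis works because the trailing shape matches.
combined = np.concatenate([existing, block], axis=0)
print(combined.shape)  # (15, 13)

# A block with a different feature dimension cannot be stacked.
mismatched = np.ones((5, 20))
try:
    np.concatenate([existing, mismatched], axis=0)
    rejected = False
except ValueError:
    rejected = True
```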

close()

Close the container file if it is open.

frame_size

The number of samples used per frame.

get(key, mem_map=True)

Read and return the data stored for the given key.

Parameters:
  • key (str) – The key to read the data from.
  • mem_map (bool) – If True, the data is returned as a memory-mapped array; otherwise a copy is returned.

Note

The container has to be opened in advance.

Returns: The stored data.
Return type: numpy.ndarray

hop_size

The number of samples between two frames.

is_open(mode=None)

Return True if the container is already open, False otherwise.

keys()

Return a list of keys for which an array is stored in the container.

Returns: List of identifiers available in the container.
Return type: list

Note

The container has to be opened in advance.

open(mode=None)

Open the container file.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.

open_if_needed(mode=None)

Convenience context manager for use in a with statement. Opens the container if it is not already open, and closes it on exit only if it was opened within this context.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.

raise_error_if_not_open()

Check whether the container is open and raise an error if it is not.

remove(key)

Remove the data stored for the given key.

Parameters: key (str) – Key of the data to remove.

Note

The container has to be opened in advance.

sampling_rate

The sampling-rate of the signal these frames are based on.
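Taken together, frame_size, hop_size and sampling_rate allow a frame index to be mapped back to a time span in the original signal. The sketch below assumes that hop_size is measured between the starts of consecutive frames, that the first frame starts at sample 0, and that no padding was applied; the function name and the numeric values are hypothetical.

```python
def frame_bounds(index, frame_size, hop_size, sampling_rate):
    """Return (start, end) of frame `index` in seconds."""
    start_sample = index * hop_size
    end_sample = start_sample + frame_size
    return start_sample / sampling_rate, end_sample / sampling_rate


# Hypothetical setup: 16 kHz audio, 400-sample frames, 160-sample hop.
start, end = frame_bounds(10, frame_size=400, hop_size=160, sampling_rate=16000)
print(start, end)  # 0.1 0.125
```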

set(key, data)

Store the given data in the container under the given key. Any existing data for the key is overwritten.

Parameters:
  • key (str) – A key to store the data for.
  • data (numpy.ndarray) – Array-like data.

Note

The container has to be opened in advance.

stats()[source]

Return statistics calculated over all features in the container.

Note

The feature container has to be opened in advance.

Returns: Statistics over all data points of all features.
Return type: DataStats

stats_per_key()[source]

Return statistics calculated for each key in the container.

Note

The feature container has to be opened in advance.

Returns: A dictionary containing a DataStats object for each key.
Return type: dict
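The difference between stats and stats_per_key can be sketched with plain NumPy. The `Stats` tuple below is a hypothetical stand-in for DataStats, and its fields (mean, var, min, max, num) are an assumption about what such statistics typically contain, not a description of the real class.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for DataStats; the real class may differ.
Stats = namedtuple('Stats', ['mean', 'var', 'min', 'max', 'num'])


def describe(values):
    """Compute statistics over every scalar data point in `values`."""
    values = np.asarray(values, dtype=np.float64)
    return Stats(
        float(values.mean()),
        float(values.var()),
        float(values.min()),
        float(values.max()),
        int(values.size),
    )


features = {
    'utt-1': np.array([[1.0, 2.0], [3.0, 4.0]]),
    'utt-2': np.array([[5.0, 6.0]]),
}

# One statistics object per key.
per_key = {key: describe(data) for key, data in features.items()}

# One statistics object over all data points of all keys.
overall = describe(np.concatenate([d.ravel() for d in features.values()]))

print(per_key['utt-1'].mean, overall.mean)  # 2.5 3.5
```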

AudioContainer

class audiomate.containers.AudioContainer(path, mode='a')[source]

Container to store raw audio samples.

Notes

The samples are stored as 16-bit integers, but all methods expect and return the samples as 32-bit floats in the range of -1.0 to 1.0.
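The conversion between the two representations can be sketched as follows. The scaling factor (the maximum int16 value) and the helper names are assumptions; the library's exact scaling and rounding may differ. The point of the sketch is that the round trip loses at most about one quantization step.

```python
import numpy as np

# Assumed scaling factor: the largest int16 value (32767).
INT16_MAX = np.float32(np.iinfo(np.int16).max)


def to_int16(samples):
    """Scale float samples in [-1.0, 1.0] to 16-bit integers (truncating)."""
    return (samples * INT16_MAX).astype(np.int16)


def to_float32(stored):
    """Scale 16-bit integer samples back to floats in [-1.0, 1.0]."""
    return stored.astype(np.float32) / INT16_MAX


samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
stored = to_int16(samples)
restored = to_float32(stored)

# The round trip is lossy, but the error stays within one int16 step.
max_error = float(np.max(np.abs(restored - samples)))
```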

append(key, samples, sampling_rate)[source]

Append the given samples to the data that already exists in the container for the given key.

Parameters:
  • key (str) – A key to store the data for.
  • samples (numpy.ndarray) – 1-D array of audio samples (np.float32).
  • sampling_rate (int) – The sampling-rate of the audio samples.

Note

The container has to be opened in advance. To append to existing data, the HDF5 dataset has to be chunked. Data that was first added via set therefore cannot be appended to.

close()

Close the container file if it is open.

get(key, mem_map=True)[source]

Return the samples for the given key and the sampling-rate.

Parameters:
  • key (str) – The key to read the data from.
  • mem_map (bool) – If True, the data is returned as a memory-mapped array; otherwise a copy is returned.

Note

The container has to be opened in advance.

Returns: A tuple containing the samples as a numpy array of type np.float32 in the range [-1.0, 1.0] and the sampling rate.
Return type: tuple

is_open(mode=None)

Return True if the container is already open, False otherwise.

keys()

Return a list of keys for which an array is stored in the container.

Returns: List of identifiers available in the container.
Return type: list

Note

The container has to be opened in advance.

open(mode=None)

Open the container file.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.

open_if_needed(mode=None)

Convenience context manager for use in a with statement. Opens the container if it is not already open, and closes it on exit only if it was opened within this context.

Parameters: mode (str) – Either ‘r’ for read-only, ‘w’ for truncate and write or ‘a’ for append. If None, self.mode is used.

raise_error_if_not_open()

Check whether the container is open and raise an error if it is not.

remove(key)

Remove the data stored for the given key.

Parameters: key (str) – Key of the data to remove.

Note

The container has to be opened in advance.

set(key, samples, sampling_rate)[source]

Set the samples and sampling rate for the given key. Existing data will be overwritten. The samples must be of type np.float32 with values in the range of -1.0 to 1.0.

Parameters:
  • key (str) – A key to store the data for.
  • samples (numpy.ndarray) – 1-D array of audio samples (np.float32).
  • sampling_rate (int) – The sampling-rate of the audio samples.

Note

The container has to be opened in advance.