audiomate.tracks

This module contains the different implementations of a track. A track is an abstract representation of an audio signal.

A concrete implementation provides the functionality for reading the audio samples from a specific source.

Track

class audiomate.tracks.Track(idx)

Track is the abstract base class for an audio track.

Parameters:	idx (str) – An identifier that uniquely identifies the track.

duration: Return the duration in seconds.

num_channels: Return the number of channels.

num_samples: Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)

Generator that reads and returns the samples of the track in frames.

Parameters:

frame_size (int) – The number of samples per frame.
hop_size (int) – The number of samples between the starts of two consecutive frames.
offset (float) – The time in seconds from which to start reading the samples (relative to the track start).
duration (float) – The length of the samples to read in seconds.
Returns:	A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type:	Generator
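
The interplay of frame_size and hop_size can be illustrated without the library. The following is a minimal sketch, not audiomate's implementation: `iter_frames` is a hypothetical helper that works on a plain list, leaves the sampling rate out of the yielded tuple, and drops a trailing partial frame (how the library treats that case is not stated here, so this is an assumption).

```python
def iter_frames(samples, frame_size, hop_size):
    """Yield (frame, is_last) pairs from a sequence of samples.

    Hypothetical helper for illustration only. A trailing partial
    frame is dropped, which is an assumption of this sketch.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")

    # Start indices of all complete frames.
    starts = list(range(0, len(samples) - frame_size + 1, hop_size))

    for i, start in enumerate(starts):
        is_last = i == len(starts) - 1
        yield samples[start:start + frame_size], is_last


# Ten samples, frames of four samples, a new frame every two samples.
frames = list(iter_frames(list(range(10)), frame_size=4, hop_size=2))
```

With a hop size smaller than the frame size, consecutive frames overlap; with equal values they tile the signal without overlap.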

read_samples(sr=None, offset=0, duration=None)

Return the samples of the track.

Parameters:

sr (int) – If `None`, uses the native sampling-rate, otherwise resamples to the given sampling rate.
offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
duration (float) – The length of the samples to read in seconds.
Returns:	A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type:	np.ndarray
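
Because offset and duration are given in seconds, they have to be mapped to sample indices at some point. The sketch below shows one way to do that mapping; `sample_range` is a hypothetical helper, and truncating to whole samples and clamping to the track length are assumptions rather than documented behaviour.

```python
def sample_range(sampling_rate, num_samples, offset=0.0, duration=None):
    """Return (start, end) sample indices for a window given in seconds.

    Hypothetical helper. Truncation to whole samples and clamping to
    the track length are assumptions of this sketch.
    """
    start = min(int(offset * sampling_rate), num_samples)

    if duration is None:
        # No duration given: read until the end of the track.
        end = num_samples
    else:
        end = min(start + int(duration * sampling_rate), num_samples)

    return start, end


# Three seconds of audio at 16 kHz, reading one second starting at 0.5 s.
window = sample_range(16000, 48000, offset=0.5, duration=1.0)
```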

sampling_rate: Return the sampling rate.

FileTrack

class audiomate.tracks.FileTrack(idx, path)

A track that is stored in a file.

Parameters:

idx (str) – An identifier that uniquely identifies the track.
path (str) – The path to the file.

duration: Return the duration in seconds.

num_channels: Return the number of channels.

num_samples: Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)

Generator that reads and returns the samples of the track in frames.

Parameters:

frame_size (int) – The number of samples per frame.
hop_size (int) – The number of samples between the starts of two consecutive frames.
offset (float) – The time in seconds from which to start reading the samples (relative to the track start).
duration (float) – The length of the samples to read in seconds.
Returns:	A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type:	Generator

read_samples(sr=None, offset=0, duration=None)

Return the samples from the file. Uses librosa for loading (see http://librosa.github.io/librosa/generated/librosa.core.load.html).

Parameters:

sr (int) – If `None`, uses the sampling rate given by the file, otherwise resamples to the given sampling rate.
offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
duration (float) – The length of the samples to read in seconds.
Returns:	A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type:	np.ndarray

sampling_rate: Return the sampling rate.

ContainerTrack

class audiomate.tracks.ContainerTrack(idx, container, key=None)

A track that is stored in an audiomate.containers.AudioContainer.

Parameters:

idx (str) – An identifier that uniquely identifies the track.
container (AudioContainer) – The audio container with the samples.
key (str) – The key of the samples in the container. If `None`, the key is assumed to be the same as `idx`.

duration: Return the duration in seconds.

num_channels: Return the number of channels.

num_samples: Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, block_size=None)

Generator that reads and returns the samples of the track in frames.

Parameters:

frame_size (int) – The number of samples per frame.
hop_size (int) – The number of samples between the starts of two consecutive frames.
offset (float) – The time in seconds from which to start reading the samples (relative to the track start).
duration (float) – The length of the samples to read in seconds.
Returns:	A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type:	Generator

read_samples(sr=None, offset=0, duration=None)

Return the samples from the track in the container. Uses librosa for resampling, if needed.

Parameters:

sr (int) – If `None`, uses the sampling rate given by the file, otherwise resamples to the given sampling rate.
offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
duration (float) – The length of the samples to read in seconds.
Returns:	A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type:	np.ndarray

sampling_rate: Return the sampling rate.

Utterance

class audiomate.tracks.Utterance(idx, track, issuer=None, start=0, end=inf, label_lists=None)

An utterance defines a sample of audio. It is part of a track or can span over the whole track.

Parameters:

idx (str) – A unique identifier for the utterance within a dataset.
track (Track) – The track this utterance belongs to.
issuer (Issuer) – The issuer this utterance was created from.
start (float) – The start of the utterance within the audio track in seconds. (default 0)
end (float) – The end of the utterance within the audio track in seconds. inf indicates that the utterance ends at the end of the track. (default inf)
label_lists (LabelList, list) – A single or multiple label-lists.

Variables:

label_lists (dict) – A dictionary containing label-lists with the label-list-idx as key.

all_label_values(label_list_ids=None)

Return a set of all label-values occurring in this utterance.

Parameters:	label_list_ids (list) – If not None, only label-values from label-lists with an id contained in this list are considered.
Returns:	A set of distinct label-values.
Return type:	`set`
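
The filtering by label_list_ids can be sketched with plain dictionaries. This is an illustration under assumptions, not the library's code: `collect_label_values` is a hypothetical helper, and each label-list is represented as a simple list of label values keyed by its idx.

```python
def collect_label_values(label_lists, label_list_ids=None):
    """Return the distinct label values of the selected label-lists.

    Hypothetical stand-in: `label_lists` maps a label-list idx to a
    plain list of label values.
    """
    values = set()

    for idx, labels in label_lists.items():
        # With no filter, every label-list is considered.
        if label_list_ids is None or idx in label_list_ids:
            values.update(labels)

    return values


label_lists = {
    'words': ['hello', 'world'],
    'domain': ['speech', 'music', 'speech'],
}

values = collect_label_values(label_lists)
only_domain = collect_label_values(label_lists, label_list_ids=['domain'])
```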

all_tokens(delimiter=' ', label_list_ids=None)

Return a set of all tokens occurring in one of the labels in the label-lists.

Parameters:

delimiter (str) – The delimiter used to split labels into tokens (see `audiomate.annotations.Label.tokenized()`).
label_list_ids (list) – If not None, only labels from label-lists with an idx contained in this list are considered.
Returns:	A set of distinct tokens.
Return type:	`set`
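
Tokenization by a delimiter can be sketched as follows. `collect_tokens` is a hypothetical helper that operates on plain strings; dropping the empty strings produced by repeated delimiters is an assumption of this sketch and may differ from what `Label.tokenized()` does.

```python
def collect_tokens(label_values, delimiter=' '):
    """Return the distinct tokens found in a list of label values.

    Hypothetical helper. Empty tokens caused by repeated delimiters
    are dropped, which is an assumption of this sketch.
    """
    tokens = set()

    for value in label_values:
        tokens.update(t for t in value.split(delimiter) if t)

    return tokens


tokens = collect_tokens(['a b c', 'b  d'])
csv_tokens = collect_tokens(['x,y', 'y,z'], delimiter=',')
```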

duration: Return the absolute duration in seconds.

end_abs: Return the absolute end of the utterance relative to the signal.
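
Since an end of inf means that the utterance runs to the end of the track, the absolute end and the duration depend on the track length. A minimal sketch of that relationship, using hypothetical helper functions and plain floats instead of Utterance and Track objects:

```python
import math


def absolute_end(end, track_duration):
    """Resolve an utterance end, treating inf as the end of the track."""
    return track_duration if math.isinf(end) else end


def utterance_duration(start, end, track_duration):
    """Return the length of the utterance in seconds."""
    return absolute_end(end, track_duration) - start


# An open-ended utterance starting at 5 s in a 30 s track.
open_ended = utterance_duration(5.0, math.inf, 30.0)

# An utterance with an explicit end.
bounded = utterance_duration(5.0, 12.5, 30.0)
```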

label_count(label_list_ids=None)

Return a dictionary containing the number of times each label-value occurs in this utterance.

Parameters:	label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:	A dictionary containing the number of occurrences with the label-value as key.
Return type:	dict
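
Counting occurrences per label value maps naturally onto `collections.Counter`. The following sketch assumes, for illustration only, that each label-list is a plain list of label values keyed by its idx; `count_label_values` is a hypothetical helper.

```python
from collections import Counter


def count_label_values(label_lists, label_list_ids=None):
    """Count how often each label value occurs in the selected label-lists.

    Hypothetical stand-in working on a dict of plain lists.
    """
    counts = Counter()

    for idx, labels in label_lists.items():
        if label_list_ids is None or idx in label_list_ids:
            counts.update(labels)

    return dict(counts)


counts = count_label_values({
    'domain': ['speech', 'music', 'speech'],
    'words': ['hello'],
})
```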

label_total_duration(label_list_ids=None)

Return a dictionary containing the total number of seconds each label-value occurs in this utterance.

Parameters:	label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:	A dictionary containing the number of seconds with the label-value as key.
Return type:	dict
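
Summing the duration per label value can be sketched like this. The representation of a label as a `(value, start, end)` tuple with times in seconds is an assumption made for the example, and `total_label_durations` is a hypothetical helper.

```python
from collections import defaultdict


def total_label_durations(labels):
    """Sum the duration in seconds per label value.

    Hypothetical helper: each label is a (value, start, end) tuple.
    """
    totals = defaultdict(float)

    for value, start, end in labels:
        totals[value] += end - start

    return dict(totals)


totals = total_label_durations([
    ('music', 0.0, 4.0),
    ('speech', 4.0, 9.5),
    ('music', 9.5, 12.0),
])
```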

num_samples(sr=None)

Return the number of samples.

Parameters:	sr (int) – Calculate the number of samples with the given sampling-rate. If None, use the native sampling-rate.
Returns:	Number of samples
Return type:	int
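
The number of samples for a given sampling rate follows from the duration. A minimal sketch, assuming the count is truncated to a whole number (the library's exact rounding is not stated here); `samples_at_rate` is a hypothetical helper.

```python
def samples_at_rate(duration, sampling_rate):
    """Return the number of samples for a duration in seconds.

    Hypothetical helper. Truncating to an integer is an assumption.
    """
    return int(duration * sampling_rate)


# The same 2.5 s utterance at two different sampling rates.
at_16k = samples_at_rate(2.5, 16000)
at_8k = samples_at_rate(2.5, 8000)
```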

read_samples(sr=None, offset=0, duration=None)

Read the samples of the utterance.

Parameters:

sr (int) – If None, uses the sampling rate given by the track, otherwise resamples to the given sampling rate.
offset (float) – Offset in seconds to read samples from.
duration (float) – If not None, read at most this number of seconds.

Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate: Return the sampling rate.

set_label_list(label_lists)

Set the given label-list for this utterance. If the idx of the label-list is not set, the default is used. If there is already a label-list with the given idx, it will be overridden.

Parameters:	label_lists (LabelList, list) – A single label-list or a list of label-lists to add.

split(cutting_points, track_relative=False, overlap=0.0)

Split the utterance into x parts (sub-utterances) and return them as new utterances. x is defined by cutting_points (x = len(cutting_points) + 1).

By default cutting-points are relative to the start of the utterance. For example if an utterance starts at 50s, a cutting-point of 10.0 will split the utterance at 60s relative to the track.

Parameters:

cutting_points (list) – List of floats defining the times in seconds where to split the utterance.
track_relative (bool) – If `True`, cutting-points are relative to the start of the track. Otherwise they are relative to the start of the utterance.
overlap (float) – Amount of overlap in seconds. This amount is subtracted from a start-cutting-point and added to an end-cutting-point.
Returns:	A list of `Utterance` objects.
Return type:	list

Example

>>> utt = Utterance('utt-1', 'file-x', start=0.0, end=30.0)
>>> sub_utts = utt.split([10.0, 20.0])
>>> len(sub_utts)
3
>>> sub_utts[0].start
0.0
>>> sub_utts[0].end
10.0
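
How cutting points, track_relative and overlap interact can be sketched with plain numbers. This is an illustration under assumptions, not the library's implementation: `split_bounds` is a hypothetical helper that returns `(start, end)` pairs in seconds relative to the track, sorts the cutting points, and applies the overlap only at cutting points, not at the outer start and end of the utterance.

```python
def split_bounds(start, end, cutting_points, track_relative=False, overlap=0.0):
    """Return (start, end) pairs of the sub-segments, relative to the track.

    Hypothetical helper. Sorting the cutting points and reporting
    track-relative times are assumptions of this sketch.
    """
    if not track_relative:
        # Utterance-relative points are shifted by the utterance start.
        cutting_points = [start + point for point in cutting_points]

    bounds = [start] + sorted(cutting_points) + [end]
    segments = []

    for i in range(len(bounds) - 1):
        seg_start, seg_end = bounds[i], bounds[i + 1]

        # Overlap is subtracted from a start that is a cutting point ...
        if i > 0:
            seg_start -= overlap
        # ... and added to an end that is a cutting point.
        if i < len(bounds) - 2:
            seg_end += overlap

        segments.append((seg_start, seg_end))

    return segments


# An utterance from 50 s to 80 s, cut 10 s and 20 s after its start.
plain = split_bounds(50.0, 80.0, [10.0, 20.0])
overlapping = split_bounds(50.0, 80.0, [10.0, 20.0], overlap=1.0)
```

As in the description above, a cutting point of 10.0 on an utterance starting at 50 s ends the first segment at 60 s relative to the track.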