audiomate.tracks

This module contains the different implementations of a track. A track is an abstract representation of an audio signal.

A concrete implementation provides the functionality for reading the audio samples from a specific source.

Track

class audiomate.tracks.Track(idx)[source]

Track is the abstract base class for an audio track.

Parameters:idx (str) – An identifier that uniquely identifies the track.
duration

Return the duration in seconds.

num_channels

Return the number of channels.

num_samples

Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)[source]

Generator that reads and returns the samples of the track in frames.

Parameters:
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.

Return type:

Generator
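
The interplay of frame_size and hop_size can be illustrated with a small, library-independent sketch. The helper iter_frames below is hypothetical and works on a plain Python list; it is not audiomate's implementation. It omits the sampling rate from the yielded tuple, and dropping trailing samples that do not fill a whole frame is an assumption made to keep the sketch short.

```python
def iter_frames(samples, frame_size, hop_size):
    """Yield (frame, is_last) tuples over a list of samples.

    Hypothetical helper: trailing samples that do not fill a whole
    frame are dropped (an assumption of this sketch).
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    num_frames = 0
    if len(samples) >= frame_size:
        num_frames = 1 + (len(samples) - frame_size) // hop_size
    for i in range(num_frames):
        start = i * hop_size
        # The flag is True only for the final frame.
        yield samples[start:start + frame_size], i == num_frames - 1


# Ten samples, frames of four samples, a new frame every two samples.
frames = list(iter_frames(list(range(10)), frame_size=4, hop_size=2))
```

With hop_size smaller than frame_size, consecutive frames overlap by frame_size - hop_size samples.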

read_samples(sr=None, offset=0, duration=None)[source]

Return the samples of the track.

Parameters:
  • sr (int) – If None, uses the native sampling-rate, otherwise resamples to the given sampling rate.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray
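
Since offset and duration are given in seconds, they have to be mapped to sample positions at some point. A minimal sketch of that mapping, assuming a hypothetical helper sample_range and truncation to whole samples (neither is taken from audiomate):

```python
def sample_range(sr, num_samples, offset=0.0, duration=None):
    """Return (start, end) sample indices for a window given in seconds.

    Hypothetical helper; truncating to whole samples with int() is an
    assumption of this sketch.
    """
    start = min(int(offset * sr), num_samples)
    if duration is None:
        # No duration given: read until the end of the track.
        end = num_samples
    else:
        end = min(start + int(duration * sr), num_samples)
    return start, end


# One second of audio at 16 kHz, reading 0.5 s starting at 0.25 s.
window = sample_range(16000, 16000, offset=0.25, duration=0.5)
```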

sampling_rate

Return the sampling rate.

FileTrack

class audiomate.tracks.FileTrack(idx, path)[source]

A track that is stored in a file.

Parameters:
  • idx (str) – An identifier that uniquely identifies the track.
  • path (str) – The path to the file.
duration

Return the duration in seconds.

num_channels

Return the number of channels.

num_samples

Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)[source]

Generator that reads and returns the samples of the track in frames.

Parameters:
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.

Return type:

Generator

read_samples(sr=None, offset=0, duration=None)[source]

Return the samples from the file. Uses librosa for loading (see http://librosa.github.io/librosa/generated/librosa.core.load.html).

Parameters:
  • sr (int) – If None, uses the sampling rate given by the file, otherwise resamples to the given sampling rate.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate

Return the sampling rate.

ContainerTrack

class audiomate.tracks.ContainerTrack(idx, container, key=None)[source]

A track that is stored in an audiomate.containers.AudioContainer.

Parameters:
  • idx (str) – An identifier that uniquely identifies the track.
  • container (AudioContainer) – The audio container with the samples.
  • key (str) – The key of the samples in the container. If None, it is assumed it’s the same as idx.
duration

Return the duration in seconds.

num_channels

Return the number of channels.

num_samples

Return the total number of samples.

read_frames(frame_size, hop_size, offset=0, duration=None, block_size=None)[source]

Generator that reads and returns the samples of the track in frames.

Parameters:
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.

Return type:

Generator

read_samples(sr=None, offset=0, duration=None)[source]

Return the samples from the track in the container. Uses librosa for resampling, if needed.

Parameters:
  • sr (int) – If None, uses the native sampling rate of the track, otherwise resamples to the given sampling rate.
  • offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
  • duration (float) – The length of the samples to read in seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate

Return the sampling rate.

Utterance

class audiomate.tracks.Utterance(idx, track, issuer=None, start=0, end=inf, label_lists=None)[source]

An utterance defines a sample of audio. It is part of a track or can span over the whole track.

Parameters:
  • idx (str) – A unique identifier for the utterance within a dataset.
  • track (Track) – The track this utterance belongs to.
  • issuer (Issuer) – The issuer this utterance was created from.
  • start (float) – The start of the utterance within the audio track in seconds. (default 0)
  • end (float) – The end of the utterance within the audio track in seconds. inf indicates that the utterance ends at the end of the track. (default inf)
  • label_lists (LabelList, list) – A single or multiple label-lists.
Variables:

label_lists (dict) – A dictionary containing label-lists with the label-list-idx as key.

all_label_values(label_list_ids=None)[source]

Return a set of all label-values occurring in this utterance.

Parameters:label_list_ids (list) – If not None, only label-values from label-lists with an id contained in this list are considered.
Returns:A set of distinct label-values.
Return type:set
all_tokens(delimiter=' ', label_list_ids=None)[source]

Return a set of all tokens occurring in one of the labels in the label-lists.

Parameters:
  • delimiter (str) – The delimiter used to split labels into tokens (see audiomate.annotations.Label.tokenized()).
  • label_list_ids (list) – If not None, only labels from label-lists with an idx contained in this list are considered.
Returns:

A set of distinct tokens.

Return type:

set
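
The token collection described above can be sketched without the library. In this sketch the label-lists are simplified to a plain dict that maps a label-list idx to a list of label values; the function is a hypothetical stand-in, not the audiomate method, and skipping empty tokens is an assumption.

```python
def all_tokens(label_lists, delimiter=" ", label_list_ids=None):
    """Collect the distinct tokens of all label values.

    label_lists is a plain dict mapping a label-list idx to a list of
    label values (a simplification made for this sketch).
    """
    tokens = set()
    for idx, values in label_lists.items():
        if label_list_ids is not None and idx not in label_list_ids:
            continue
        for value in values:
            # Drop empty strings produced by repeated delimiters.
            tokens.update(t for t in value.split(delimiter) if t)
    return tokens


label_lists = {
    "words": ["a b", "b c"],
    "domain": ["speech"],
}
tokens = all_tokens(label_lists, label_list_ids=["words"])
```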

duration

Return the absolute duration in seconds.

end_abs

Return the absolute end of the utterance relative to the signal.
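
Because end may be inf, the absolute end depends on the duration of the track. A minimal sketch of that resolution, using hypothetical free functions and a plain track_duration value instead of a Track object:

```python
import math


def end_abs(end, track_duration):
    """Resolve an utterance end; inf means the end of the track."""
    return track_duration if math.isinf(end) else end


def utterance_duration(start, end, track_duration):
    """Duration in seconds between start and the resolved end."""
    return end_abs(end, track_duration) - start


# An utterance starting at 10 s that runs to the end of a 42 s track.
length = utterance_duration(10.0, float("inf"), 42.0)
```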

label_count(label_list_ids=None)[source]

Return a dictionary containing the number of times each label-value occurs in this utterance.

Parameters:label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:
A dictionary containing the number of occurrences with the label-value as key.
Return type:dict
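
The counting can be sketched with collections.Counter. As a simplification, the label-lists are represented as a plain dict from label-list idx to a list of label values; the function below is a hypothetical illustration, not the library code.

```python
from collections import Counter


def label_count(label_lists, label_list_ids=None):
    """Count how often every label value occurs.

    label_lists is a plain dict mapping a label-list idx to a list of
    label values (a simplification made for this sketch).
    """
    count = Counter()
    for idx, values in label_lists.items():
        if label_list_ids is None or idx in label_list_ids:
            count.update(values)
    return dict(count)


label_lists = {"words": ["yes", "no", "yes"], "noise": ["hum"]}
counts = label_count(label_lists, label_list_ids=["words"])
```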
label_total_duration(label_list_ids=None)[source]

Return a dictionary containing the total number of seconds each label-value occurs in this utterance.

Parameters:label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns:
A dictionary containing the number of seconds with the label-value as key.
Return type:dict
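
A minimal sketch of the accumulation, assuming each label is reduced to a hypothetical (value, start, end) tuple with times in seconds:

```python
from collections import defaultdict


def label_total_duration(labels):
    """Sum the duration per label value.

    labels is an iterable of (value, start, end) tuples in seconds,
    a simplification made for this sketch.
    """
    totals = defaultdict(float)
    for value, start, end in labels:
        totals[value] += end - start
    return dict(totals)


labels = [
    ("music", 0.0, 4.0),
    ("speech", 4.0, 10.0),
    ("music", 10.0, 12.5),
]
totals = label_total_duration(labels)
```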
num_samples(sr=None)[source]

Return the number of samples.

Parameters:sr (int) – Calculate the number of samples with the given sampling-rate. If None, use the native sampling-rate.
Returns:Number of samples
Return type:int
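
The number of samples follows from the duration and the sampling rate. The sketch below uses a hypothetical free function; rounding up to a whole sample is an assumption and may differ from what the library does.

```python
import math


def num_samples(duration, sr):
    """Samples covering `duration` seconds at `sr` Hz.

    Rounding up is an assumption of this sketch.
    """
    return int(math.ceil(duration * sr))


# A 2.5 s utterance at its native 16 kHz and at a target rate of 8 kHz.
native = num_samples(2.5, 16000)
resampled = num_samples(2.5, 8000)
```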
read_samples(sr=None, offset=0, duration=None)[source]

Read the samples of the utterance.

Parameters:
  • sr (int) – If None, uses the sampling rate given by the track, otherwise resamples to the given sampling rate.
  • offset (float) – Offset in seconds to read samples from.
  • duration (float) – If not None, read at most this number of seconds.
Returns:

A numpy array containing the samples as a floating point (numpy.float32) time series.

Return type:

np.ndarray

sampling_rate

Return the sampling rate.

set_label_list(label_lists)[source]

Set the given label-list for this utterance. If the label-list-idx is not set, default is used. If there is already a label-list with the given idx, it will be overridden.

Parameters:label_lists (LabelList, list) – A single label-list or multiple label-lists to add.
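
The override behaviour can be sketched with a plain dict keyed by idx. The LabelList namedtuple and the free function below are hypothetical stand-ins for illustration only; the fallback to a default idx is left out.

```python
from collections import namedtuple

# Hypothetical stand-in for a label-list; only the idx matters here.
LabelList = namedtuple("LabelList", ["idx", "labels"])


def set_label_list(store, label_lists):
    """Add one label-list or a list of them to store, keyed by idx."""
    if isinstance(label_lists, LabelList):
        label_lists = [label_lists]
    for label_list in label_lists:
        # An existing entry with the same idx is replaced.
        store[label_list.idx] = label_list
    return store


store = {}
set_label_list(store, LabelList("words", ["hello"]))
set_label_list(store, [LabelList("words", ["bye"]), LabelList("phones", ["b"])])
```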
split(cutting_points, track_relative=False, overlap=0.0)[source]

Split the utterance into x parts (sub-utterances) and return them as new utterances. x is defined by cutting_points (x = len(cutting_points) + 1).

By default, cutting-points are relative to the start of the utterance. For example, if an utterance starts at 50s, a cutting-point of 10.0 will split the utterance at 60s relative to the track.

Parameters:
  • cutting_points (list) – List of floats defining the times in seconds where to split the utterance.
  • track_relative (bool) – If True, cutting-points are relative to the start of the track. Otherwise they are relative to the start of the utterance.
  • overlap (float) – Amount of overlap in seconds. This amount is subtracted from a start-cutting-point, and added to an end-cutting-point.
Returns:

List of Utterance objects.

Return type:

list

Example

>>> utt = Utterance('utt-1', 'file-x', start=0.0, end=30.0)
>>> sub_utts = utt.split([10.0, 20.0])
>>> len(sub_utts)
3
>>> sub_utts[0].start
0.0
>>> sub_utts[0].end
10.0
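
How track_relative and overlap affect the resulting boundaries can be sketched independently of the library. The helper split_bounds is hypothetical and only computes (start, end) pairs relative to the track; clamping to the bounds of the original utterance is an assumption of this sketch.

```python
def split_bounds(start, end, cutting_points, track_relative=False, overlap=0.0):
    """Return (start, end) pairs, relative to the track, for each part.

    Hypothetical helper; clamping the overlap to the original
    utterance bounds is an assumption of this sketch.
    """
    if not track_relative:
        # Shift utterance-relative points to track-relative times.
        cutting_points = [start + p for p in cutting_points]
    edges = [start] + sorted(cutting_points) + [end]
    bounds = []
    for left, right in zip(edges[:-1], edges[1:]):
        bounds.append((max(start, left - overlap), min(end, right + overlap)))
    return bounds


# An utterance from 50 s to 80 s, cut 10 s and 20 s after its start.
bounds = split_bounds(50.0, 80.0, [10.0, 20.0])
with_overlap = split_bounds(50.0, 80.0, [10.0, 20.0], overlap=1.0)
```

Passing track_relative=True with the points [60.0, 70.0] yields the same boundaries as the first call.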