audiomate.tracks¶
This module contains the different implementations of a track. A track is an abstract representation of an audio signal.
A concrete implementation provides the functionality for reading the audio samples from a specific source.
Track¶
-
class
audiomate.tracks.
Track
(idx)[source]¶ Track is the abstract base class for an audio track.
Parameters: idx (str) – An identifier that uniquely identifies a track.
-
duration
¶ Return the duration in seconds.
-
num_channels
¶ Return the number of channels.
-
num_samples
¶ Return the total number of samples.
-
read_frames
(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)[source]¶ Generator that reads and returns the samples of the track in frames.
Parameters: - frame_size (int) – The number of samples per frame.
- hop_size (int) – The number of samples between two frames.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
- duration (float) – The length of the samples to read in seconds.
Returns: A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type: Generator
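The interplay of frame_size and hop_size can be illustrated with a minimal sketch. It is not audiomate's implementation: the helper name iter_frames is hypothetical, it works on a plain list instead of a track, and it assumes that trailing samples which do not fill a complete frame are dropped (the documentation above does not say how the library treats them).

```python
def iter_frames(samples, sampling_rate, frame_size, hop_size):
    """Yield (frame, sampling_rate, is_last) tuples over a list of samples.

    Hypothetical helper for illustration only.
    Assumption: only complete frames are yielded; leftover samples at the
    end that do not fill a whole frame are dropped.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError('frame_size and hop_size must be positive')

    # Number of complete frames that fit into the signal.
    num_frames = 0
    if len(samples) >= frame_size:
        num_frames = 1 + (len(samples) - frame_size) // hop_size

    for i in range(num_frames):
        start = i * hop_size
        frame = samples[start:start + frame_size]
        yield frame, sampling_rate, i == num_frames - 1


frames = list(iter_frames(list(range(10)), 16000, frame_size=4, hop_size=2))
```

With 10 samples, a frame size of 4 and a hop size of 2, consecutive frames overlap by two samples, and only the final tuple carries True as its third item.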
-
read_samples
(sr=None, offset=0, duration=None)[source]¶ Return the samples of the track.
Parameters: - sr (int) – If None, uses the native sampling rate; otherwise resamples to the given sampling rate.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
- duration (float) – The length of the samples to read in seconds.
Returns: A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type: np.ndarray
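Because offset and duration are given in seconds, they have to be mapped to sample indices before reading. The following sketch shows one way to do that. The function sample_range is hypothetical and not part of audiomate; truncating to whole samples and clamping to the end of the track are assumptions.

```python
def sample_range(num_samples, sampling_rate, offset=0.0, duration=None):
    """Return (start, end) sample indices for a time window in seconds.

    Hypothetical helper for illustration only.
    Assumptions: fractional samples are truncated, and the window is
    clamped to the end of the track.
    """
    start = min(int(offset * sampling_rate), num_samples)

    if duration is None:
        # No duration given: read up to the end of the track.
        end = num_samples
    else:
        end = min(start + int(duration * sampling_rate), num_samples)

    return start, end


window = sample_range(48000, 16000, offset=1.0, duration=1.5)
```

For a three-second track at 16 kHz, an offset of 1.0 and a duration of 1.5 select the samples from index 16000 up to (but not including) index 40000.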
-
sampling_rate
¶ Return the sampling rate.
-
FileTrack¶
-
class
audiomate.tracks.
FileTrack
(idx, path)[source]¶ A track that is stored in a file.
Parameters: - idx (str) – An identifier that uniquely identifies a track.
- path (str) – The path to the file.
-
duration
¶ Return the duration in seconds.
-
num_channels
¶ Return the number of channels.
-
num_samples
¶ Return the total number of samples.
-
read_frames
(frame_size, hop_size, offset=0, duration=None, buffer_size=5760000)[source]¶ Generator that reads and returns the samples of the track in frames.
Parameters: - frame_size (int) – The number of samples per frame.
- hop_size (int) – The number of samples between two frames.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
- duration (float) – The length of the samples to read in seconds.
Returns: A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type: Generator
-
read_samples
(sr=None, offset=0, duration=None)[source]¶ Return the samples from the file. Uses librosa for loading (see http://librosa.github.io/librosa/generated/librosa.core.load.html).
Parameters: - sr (int) – If None, uses the sampling rate given by the file; otherwise resamples to the given sampling rate.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
- duration (float) – The length of the samples to read in seconds.
Returns: A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type: np.ndarray
-
sampling_rate
¶ Return the sampling rate.
ContainerTrack¶
-
class
audiomate.tracks.
ContainerTrack
(idx, container, key=None)[source]¶ A track that is stored in an audiomate.containers.AudioContainer.
Parameters: - idx (str) – An identifier that uniquely identifies a track.
- container (AudioContainer) – The audio container with the samples.
- key (str) – The key of the samples in the container. If None, it is assumed to be the same as idx.
-
duration
¶ Return the duration in seconds.
-
num_channels
¶ Return the number of channels.
-
num_samples
¶ Return the total number of samples.
-
read_frames
(frame_size, hop_size, offset=0, duration=None, block_size=None)[source]¶ Generator that reads and returns the samples of the track in frames.
Parameters: - frame_size (int) – The number of samples per frame.
- hop_size (int) – The number of samples between two frames.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the track start).
- duration (float) – The length of the samples to read in seconds.
Returns: A generator yielding a tuple for every frame. The first item is the frame, the second the sampling-rate and the third a boolean indicating if it is the last frame.
Return type: Generator
-
read_samples
(sr=None, offset=0, duration=None)[source]¶ Return the samples from the track in the container. Uses librosa for resampling, if needed.
Parameters: - sr (int) – If None, uses the sampling rate given by the file; otherwise resamples to the given sampling rate.
- offset (float) – The time in seconds, from where to start reading the samples (rel. to the file start).
- duration (float) – The length of the samples to read in seconds.
Returns: A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type: np.ndarray
-
sampling_rate
¶ Return the sampling rate.
Utterance¶
-
class
audiomate.tracks.
Utterance
(idx, track, issuer=None, start=0, end=inf, label_lists=None)[source]¶ An utterance defines a sample of audio. It is part of a track or can span over the whole track.
Parameters: - idx (str) – A unique identifier for the utterance within a dataset.
- track (Track) – The track this utterance belongs to.
- issuer (Issuer) – The issuer this utterance was created from.
- start (float) – The start of the utterance within the audio track in seconds. (default 0)
- end (float) – The end of the utterance within the audio track in seconds. inf indicates that the utterance ends at the end of the track. (default inf)
- label_lists (LabelList, list) – A single or multiple label-lists.
Variables: label_lists (dict) – A dictionary containing label-lists with the label-list-idx as key.
-
all_label_values
(label_list_ids=None)[source]¶ Return a set of all label-values occurring in this utterance.
Parameters: label_list_ids (list) – If not None, only label-values from label-lists with an id contained in this list are considered.
Returns: A set of distinct label-values.
Return type: set
-
all_tokens
(delimiter=' ', label_list_ids=None)[source]¶ Return a set of all tokens occurring in one of the labels in the label-lists.
Parameters: - delimiter (str) – The delimiter used to split labels into tokens (see audiomate.annotations.Label.tokenized()).
- label_list_ids (list) – If not None, only labels from label-lists with an idx contained in this list are considered.
Returns: A set of distinct tokens.
Return type: set
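As a rough illustration of how labels are broken into distinct tokens, consider the sketch below. The helper collect_tokens is hypothetical and operates on plain strings rather than Label objects; stripping whitespace and discarding empty tokens are assumptions about the tokenization, not documented behaviour.

```python
def collect_tokens(label_values, delimiter=' '):
    """Return the set of distinct tokens found in the given label values.

    Hypothetical helper for illustration only.
    Assumption: tokens are stripped of surrounding whitespace and empty
    tokens are discarded.
    """
    tokens = set()

    for value in label_values:
        for token in value.split(delimiter):
            token = token.strip()
            if token:
                tokens.add(token)

    return tokens


tokens = collect_tokens(['hello world', 'world  of audio'])
```

Since the result is a set, a token that occurs in several labels appears only once.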
-
duration
¶ Return the absolute duration in seconds.
-
end_abs
¶ Return the absolute end of the utterance relative to the signal.
-
label_count
(label_list_ids=None)[source]¶ Return a dictionary containing the number of times each label-value occurs in this utterance.
Parameters: label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns: A dictionary containing the number of occurrences with the label-value as key.
Return type: dict
-
label_total_duration
(label_list_ids=None)[source]¶ Return a dictionary containing the number of seconds each label-value occurs in this utterance.
Parameters: label_list_ids (list) – If not None, only labels from label-lists with an id contained in this list are considered.
Returns: A dictionary containing the number of seconds with the label-value as key.
Return type: dict
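The two label statistics (occurrence counts and total seconds per label-value) can be sketched on plain data. This is not audiomate's code: the function names are hypothetical, and label-lists are modelled here as a dictionary that maps a label-list idx to a list of (value, start, end) tuples, which is an assumption made for the example.

```python
from collections import Counter, defaultdict


def count_label_values(label_lists, label_list_ids=None):
    """Count how often every label-value occurs.

    Hypothetical helper; label_lists maps a label-list idx to a list of
    (value, start, end) tuples (an assumed, simplified data model).
    """
    counts = Counter()
    for idx, labels in label_lists.items():
        if label_list_ids is None or idx in label_list_ids:
            counts.update(value for value, _, _ in labels)
    return dict(counts)


def total_label_durations(label_lists, label_list_ids=None):
    """Sum the seconds covered by every label-value (same data model)."""
    durations = defaultdict(float)
    for idx, labels in label_lists.items():
        if label_list_ids is None or idx in label_list_ids:
            for value, start, end in labels:
                durations[value] += end - start
    return dict(durations)


example_lists = {
    'words': [('a', 0.0, 1.0), ('b', 1.0, 3.0), ('a', 3.0, 3.5)],
    'noise': [('music', 0.0, 4.0)],
}
counts = count_label_values(example_lists)
seconds = total_label_durations(example_lists, label_list_ids=['words'])
```

Passing label_list_ids restricts both statistics to the named label-lists, mirroring the parameter described above.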
-
num_samples
(sr=None)[source]¶ Return the number of samples.
Parameters: sr (int) – Calculate the number of samples with the given sampling-rate. If None, use the native sampling-rate.
Returns: Number of samples.
Return type: int
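The relation between end, end_abs, duration and num_samples can be made concrete with a small sketch. All function names here are hypothetical. The sketch assumes that an end of inf is replaced by the track duration and that fractional samples are truncated; neither detail is stated in the documentation above.

```python
import math


def resolve_end(end, track_duration):
    """Return the absolute end in seconds.

    Assumption: an end of inf means "until the end of the track".
    """
    return track_duration if math.isinf(end) else end


def utterance_duration(start, end, track_duration):
    """Return the utterance duration in seconds."""
    return resolve_end(end, track_duration) - start


def utterance_num_samples(start, end, track_duration, sampling_rate):
    """Return the number of samples at the given sampling rate.

    Assumption: a fractional sample at the end is truncated.
    """
    return int(utterance_duration(start, end, track_duration) * sampling_rate)


open_ended = utterance_duration(5.0, float('inf'), track_duration=30.0)
num = utterance_num_samples(5.0, 12.5, track_duration=30.0, sampling_rate=16000)
```

An utterance that starts at 5 s in a 30 s track and has no explicit end therefore lasts 25 s under these assumptions.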
-
read_samples
(sr=None, offset=0, duration=None)[source]¶ Read the samples of the utterance.
Parameters: - sr (int) – If None uses the sampling rate given by the track, otherwise resamples to the given sampling rate.
- offset (float) – Offset in seconds to read samples from.
- duration (float) – If not None, read at most this number of seconds.
Returns: A numpy array containing the samples as a floating point (numpy.float32) time series.
Return type: np.ndarray
-
sampling_rate
¶ Return the sampling rate.
-
set_label_list
(label_lists)[source]¶ Set the given label-list for this utterance. If the label-list-idx is not set, default is used. If there is already a label-list with the given idx, it will be overridden.
Parameters: label_lists (LabelList, list) – A single label-list or multiple label-lists to add.
-
split
(cutting_points, track_relative=False, overlap=0.0)[source]¶ Split the utterance into x parts (sub-utterances) and return them as new utterances. x is defined by cutting_points (x = len(cutting_points) + 1).
By default, cutting-points are relative to the start of the utterance. For example, if an utterance starts at 50s, a cutting-point of 10.0 will split the utterance at 60s relative to the track.
Parameters: - cutting_points (list) – List of floats defining the times in seconds where to split the utterance.
- track_relative (bool) – If True, cutting-points are relative to the start of the track. Otherwise they are relative to the start of the utterance.
- overlap (float) – Amount of overlap in seconds. This amount is subtracted from a start-cutting-point and added to an end-cutting-point.
Returns: List of Utterance objects.
Return type: list
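How the cutting-points, track_relative and overlap parameters combine can be sketched by computing only the (start, end) boundaries of the sub-utterances. The function split_bounds is hypothetical and is not the library's implementation. Sorting the cutting-points and clamping overlapped boundaries to the original utterance are assumptions.

```python
def split_bounds(start, end, cutting_points, track_relative=False, overlap=0.0):
    """Return (start, end) pairs in track time for every sub-utterance.

    Hypothetical helper for illustration only.
    Assumptions: cutting-points are sorted before use, and boundaries
    widened by the overlap are clamped to the original utterance.
    """
    if track_relative:
        cuts = sorted(cutting_points)
    else:
        # Utterance-relative points are shifted by the utterance start.
        cuts = sorted(start + point for point in cutting_points)

    edges = [start] + cuts + [end]
    parts = []

    for i in range(len(edges) - 1):
        part_start, part_end = edges[i], edges[i + 1]
        if i > 0:
            # Starts at a cutting-point: move earlier by the overlap.
            part_start = max(start, part_start - overlap)
        if i < len(edges) - 2:
            # Ends at a cutting-point: move later by the overlap.
            part_end = min(end, part_end + overlap)
        parts.append((part_start, part_end))

    return parts


plain = split_bounds(50.0, 80.0, [10.0, 20.0])
overlapped = split_bounds(50.0, 80.0, [10.0, 20.0], overlap=0.5)
```

Two cutting-points yield three parts. Only boundaries that lie on a cutting-point are widened by the overlap; the outer start and end of the utterance stay unchanged.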
Example
>>> utt = Utterance('utt-1', 'file-x', start=0.0, end=30.0)
>>> sub_utts = utt.split([10.0, 20.0])
>>> len(sub_utts)
3
>>> sub_utts[0].start
0.0
>>> sub_utts[0].end
10.0