audiomate.corpus.preprocessing

This module provides building blocks for preprocessing pipelines. A pipeline is built out of processors, which process the samples of an utterance.

There are different levels of abstraction for a processor. At the top there is audiomate.corpus.preprocessing.Processor, which provides the basic structure for processing the utterances one after another.

Based on that there are two subclasses, audiomate.corpus.preprocessing.OnlineProcessor and audiomate.corpus.preprocessing.OfflineProcessor. The former is a streaming processor that processes samples frame by frame, without the need to load the full utterance.

The latter processes all samples of an utterance at once, which in many cases is easier to implement.

Processors

class audiomate.corpus.preprocessing.Processor[source]

This class is the base class for all kinds of feature extraction. The processor produces features from a given corpus and stores them in a feature-container.

To implement a specific processor, the process_utterance method has to be implemented:

  • This method is called for every utterance in the corpus.
  • In the method any feature extraction / pre-processing can be done.
  • The result has to be saved in the feature-container that is passed along with the utterance, using the utterance's id (also passed as an argument) as key; see the sketch below.
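
A minimal sketch of such a processor is shown below. It is purely illustrative and assumes that an utterance exposes its samples via read_samples() and its id via idx, and that the feature-container provides a set() method for storing an array under that id; these names are assumptions, not guaranteed by this reference.

import numpy as np
from audiomate.corpus import preprocessing

class EnergyProcessor(preprocessing.Processor):
    # Hypothetical processor: stores one energy value per frame of every utterance.

    def process_utterance(self, utterance, feature_container, corpus=None,
                          frame_size=400, hop_size=160, sr=None):
        # Assumed API: read the utterance samples at the requested sampling rate.
        samples = utterance.read_samples(sr=sr)

        # Naive framing without padding, just enough for the illustration.
        n_frames = max((samples.size - frame_size) // hop_size + 1, 0)
        frames = [samples[i * hop_size:i * hop_size + frame_size] for i in range(n_frames)]
        energy = np.array([np.sum(frame ** 2) for frame in frames]).reshape(-1, 1)

        # Assumed API: store the result under the id of the utterance.
        feature_container.set(utterance.idx, energy)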

Example

>>> import audiomate
>>> from audiomate.corpus.preprocessing.pipeline import offline
>>>
>>> ds = audiomate.Corpus.load('some/corpus/path')
>>> mfcc_processor = offline.MFCC(n_mfcc=13, n_mels=128)
>>> norm_processor = offline.MeanVarianceNorm(mean=5.4, variance=2.3, parent=mfcc_processor)
>>>
>>> fc = norm_processor.process_corpus(ds, output_path='path/mfcc_features.h5', frame_size=400, hop_size=160)
>>> fc
<audiomate.corpus.assets.features.FeatureContainer at 0x10d451a20>
>>> fc.open()
>>> fc.get('existing-utterance-id')[()]
array([[-6.18418212,  3.93379946,  2.51237535,  3.62199459, -6.77845303,
         3.28746939,  1.36316432, -0.7814685 , -2.36003147,  3.27370797,
        -3.24373709, -2.42513017, -1.55695699],
       ...
process_corpus(corpus, output_path, frame_size=400, hop_size=160, sr=None)[source]

Process the given corpus and save the processed features in a feature-container at the given path.

Parameters:
  • corpus (Corpus) – The corpus to process the utterances from.
  • output_path (str) – A path to save the feature-container to.
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • sr (int) – Use the given sampling rate. If None, the native sampling rate of the file is used.
Returns:

The feature-container containing the processed features.

Return type:

FeatureContainer

process_corpus_from_feature_container(corpus, input_features, output_path)[source]

Process the given corpus and save the processed features in a feature-container at the given path. Instead of the framed signal, the features from the given feature-container are used as input.

Parameters:
  • corpus (Corpus) – The corpus to process the utterances from.
  • input_features (FeatureContainer) – The feature-container to process the frames from.
  • output_path (str) – A path to save the feature-container to.
Returns:

The feature-container containing the processed features.

Return type:

FeatureContainer
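
For illustration, a possible use is to run a second processing step over features that were already extracted. This sketch reuses ds, mfcc_processor and offline from the example at the top of this page; the paths are placeholders.

>>> mfcc_fc = mfcc_processor.process_corpus(ds, output_path='path/mfcc_features.h5')
>>> norm_only = offline.MeanVarianceNorm(mean=5.4, variance=2.3)
>>> norm_fc = norm_only.process_corpus_from_feature_container(
...     ds, mfcc_fc, output_path='path/normalized_features.h5')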

process_utterance(utterance, feature_container, corpus=None, frame_size=400, hop_size=160, sr=None)[source]

Extract features of the given utterance and put them in the given feature-container.

Parameters:
  • utterance (Utterance) – The utterance to process.
  • feature_container (FeatureContainer) – The feature-container to store the output.
  • corpus (Corpus) – The corpus where the utterance is from, if available.
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • sr (int) – Use the given sampling rate. If None, the native sampling rate of the file is used.
process_utterance_from_feature_container(utterance, in_feat_container, out_feat_container, corpus=None)[source]

Process the features of the given utterance from the given input feature-container and put them into the given output feature-container.

Parameters:
  • utterance (Utterance) – The utterance to process.
  • in_feat_container (FeatureContainer) – The feature-container to read the input frames.
  • out_feat_container (FeatureContainer) – The feature-container to store the output.
  • corpus (Corpus) – The corpus where the utterance is from, if available.
class audiomate.corpus.preprocessing.OfflineProcessor[source]

This class should be used for feature extraction in batch mode (one full utterance per step).

To implement a specific offline-processor, the process_sequence method has to be implemented:

  • As input the method receives a 2-dimensional array of frames (n-frames x n-samples-per-frame).
  • It must return an array whose first dimension has the same size as the input (a minimal sketch follows the note below).

Note

The samples are padded with zeros so that the number of frames equals math.ceil((num_samples - self.frame_size) / self.hop_size + 1).
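
For illustration, a hypothetical offline-processor that computes the energy of every frame could look like the following sketch; it relies only on the process_sequence interface described above.

import numpy as np
from audiomate.corpus import preprocessing

class FrameEnergy(preprocessing.OfflineProcessor):
    # Hypothetical offline-processor: returns one energy value per frame.

    def process_sequence(self, frames, sampling_rate, utterance=None, corpus=None):
        # frames has shape (n-frames x n-samples-per-frame); the returned array
        # keeps the first dimension, as required.
        return np.sum(frames ** 2, axis=1, keepdims=True)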

process_sequence(frames, sampling_rate, utterance=None, corpus=None)[source]

Process the given frames, which represent an utterance.

Parameters:
  • frames (numpy.ndarray) – (n-frames x n-samples-per-frame) Frames.
  • sampling_rate (int) – The sampling rate of the underlying signal.
  • corpus (Corpus) – The corpus where the data is from, if available.
  • utterance (Utterance) – The utterance the data is from, if available.
Returns:

(n-frames x …) The features extracted from the given samples.

Return type:

numpy.ndarray

process_utterance(utterance, feature_container, corpus=None, frame_size=400, hop_size=160, sr=None)[source]

Extract features of the given utterance and put them in the given feature-container.

Parameters:
  • utterance (Utterance) – The utterance to process.
  • feature_container (FeatureContainer) – The feature-container to store the output.
  • corpus (Corpus) – The corpus where the utterance is from, if available.
  • frame_size (int) – The number of samples per frame.
  • hop_size (int) – The number of samples between two frames.
  • sr (int) – Use the given sampling rate. If None, the native sampling rate of the file is used.
process_utterance_from_feature_container(utterance, in_feat_container, out_feat_container, corpus=None)[source]

Process the features of the given utterance from the given input feature-container and put them into the given output feature-container.

Parameters:
  • utterance (Utterance) – The utterance to process.
  • in_feat_container (FeatureContainer) – The feature-container to read the input frames.
  • out_feat_container (FeatureContainer) – The feature-container to store the output.
  • corpus (Corpus) – The corpus where the utterance is from, if available.

Pipeline

This module contains classes for creating a preprocessing/feature-extraction pipeline.

There are different classes for offline and online processing, subclassing either audiomate.corpus.preprocessing.OfflineProcessor or audiomate.corpus.preprocessing.OnlineProcessor.

A pipeline consists of two types of steps. A computation step takes data from a previous step or from the input and processes it. A reduction step merges the outputs of multiple previous steps: it takes the outputs of all incoming steps and produces a single data block.

The steps are managed as a directed graph, which is built by passing the parent steps to the __init__ method of a step. Every step that is created has its own graph, but inherits all nodes and edges of the graphs of its parent steps.
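
For example, two computation steps can feed a single reduction step; the resulting step then owns a graph containing all three nodes. The reduction class in the following sketch is hypothetical (audiomate does not ship it), and since the exact form in which the parents' outputs arrive there is an assumption, it simply passes the data through.

from audiomate.corpus.preprocessing.pipeline import offline

class StackFeatures(offline.OfflineReduction):
    # Hypothetical reduction step, shown only to illustrate the wiring.
    def process_sequence(self, frames, sampling_rate, utterance=None, corpus=None):
        # A real reduction would merge the outputs of its parent steps here.
        return frames

mfcc = offline.MFCC(n_mfcc=13, n_mels=128, name='mfcc')
mel = offline.MelSpectrogram(n_mels=128, name='mel')
stack = StackFeatures(parents=[mfcc, mel], name='stack')
# stack inherits the nodes and edges of both parent graphs; processing it
# computes mfcc and mel first and feeds their outputs into the reduction.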

class audiomate.corpus.preprocessing.pipeline.base.Step(name=None)[source]

This class is the base class for a step in a processing pipeline.

It handles the procedure of executing the pipeline. It makes sure the steps are computed in the correct order. It also provides the correct inputs to every step.

Every step has to provide a compute method which performs the actual processing.

Parameters: name (str, optional) – A name for identifying the step.
process(data, sampling_rate, **kwargs)[source]

Execute the processing of this step and all dependent parent steps.

class audiomate.corpus.preprocessing.pipeline.base.Computation(parent=None, name=None)[source]

Base class for a computation step.

Parameters:
  • parent (Step, optional) – The parent step this step depends on.
  • name (str, optional) – A name for identifying the step.
class audiomate.corpus.preprocessing.pipeline.base.Reduction(parents, name=None)[source]

Base class for a reduction step.

Parameters:
  • parents (list) – List of parent steps this step depends on.
  • name (str, optional) – A name for identifying the step.

As with the processors, there are different subclasses for offline and online pipelines.

class audiomate.corpus.preprocessing.pipeline.offline.OfflineComputation(parent=None, name=None)[source]

Base class for a computation step in an offline processing pipeline.

process(frames, sampling_rate, corpus=None, utterance=None)[source]

Execute the processing of this step and all dependent parent steps.

process_sequence(frames, sampling_rate, utterance=None, corpus=None)[source]

Process the given frames, which represent an utterance.

Parameters:
  • frames (numpy.ndarray) – (n-frames x n-samples-per-frame) Frames.
  • sampling_rate (int) – The sampling rate of the underlying signal.
  • corpus (Corpus) – The corpus where the data is from, if available.
  • utterance (Utterance) – The utterance the data is from, if available.
Returns:

(n-frames x …) The features extracted from the given samples.

Return type:

numpy.ndarray

class audiomate.corpus.preprocessing.pipeline.offline.OfflineReduction(parents, name=None)[source]

Base class for a reduction step in an offline processing pipeline.

process(frames, sampling_rate, corpus=None, utterance=None)[source]

Execute the processing of this step and all dependent parent steps.

process_sequence(frames, sampling_rate, utterance=None, corpus=None)[source]

Process the given frames, which represent an utterance.

Parameters:
  • frames (numpy.ndarray) – (n-frames x n-samples-per-frame) Frames.
  • sampling_rate (int) – The sampling rate of the underlying signal.
  • corpus (Corpus) – The corpus where the data is from, if available.
  • utterance (Utterance) – The utterance the data is from, if available.
Returns:

(n-frames x …) The features extracted from the given samples.

Return type:

numpy.ndarray

Implementations

Some preprocessing steps are already implemented.

Offline

Implementations of offline preprocessing steps.
Name – Description
MeanVarianceNorm – Normalizes features with given mean and variance.
MelSpectrogram – Extracts mel-spectrogram features.
MFCC – Extracts MFCC features.
class audiomate.corpus.preprocessing.pipeline.offline.MeanVarianceNorm(mean, variance, parent=None, name=None)[source]

Pre-processing step to normalize mean and variance.

frame = (frame - mean) / sqrt(variance)

Parameters:
  • mean (float) – The mean to use for normalization.
  • variance (float) – The variance to use for normalization.
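
As a quick numerical check of the formula above (plain numpy, independent of audiomate):

>>> import numpy as np
>>> mean, variance = 5.4, 2.3
>>> frames = np.array([[7.7, 5.4, 3.1]])
>>> (frames - mean) / np.sqrt(variance)
array([[ 1.51657509,  0.        , -1.51657509]])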
class audiomate.corpus.preprocessing.pipeline.offline.MelSpectrogram(n_mels=128, parent=None, name=None)[source]

Computation step that extracts mel-spectrogram features from the given frames.

Based on http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#

Parameters: n_mels (int) – Number of mel bands to generate.
class audiomate.corpus.preprocessing.pipeline.offline.MFCC(n_mfcc=13, n_mels=128, parent=None, name=None)[source]

Computation step that extracts MFCC features from the given frames.

Based on http://librosa.github.io/librosa/generated/librosa.feature.mfcc.html

Parameters:
  • n_mels (int) – Number of mel bands to generate.
  • n_mfcc (int) – Number of MFCCs to return.
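
Like MFCC in the example at the top of this page, MelSpectrogram can be used as a stand-alone offline processor. The following sketch assumes that it inherits process_corpus from OfflineProcessor; corpus path and output path are placeholders.

>>> import audiomate
>>> from audiomate.corpus.preprocessing.pipeline import offline
>>>
>>> ds = audiomate.Corpus.load('some/corpus/path')
>>> mel_step = offline.MelSpectrogram(n_mels=64)
>>> fc = mel_step.process_corpus(ds, output_path='path/mel_features.h5',
...                              frame_size=400, hop_size=160)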