audiomate.corpus.validation

This module contains functionality for validating a corpus on different properties, e.g. whether the length of an utterance is too short for its corresponding transcription.

audiomate.corpus.validation.Validator is the base class for performing validations. It can be extended to implement validators for specific tests/validations. The result of every validator has to be an audiomate.corpus.validation.ValidationResult or a subclass of it.

Base

class audiomate.corpus.validation.Validator[source]

A validator is a class that tests a specific behaviour/state of a corpus.

name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:The result containing at least the pass/fail indication.
Return type:ValidationResult
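
As a rough illustration of how a validator can be extended, the sketch below defines a hypothetical NonEmptyCorpusValidator. To keep it runnable on its own, it uses small stand-in classes instead of importing audiomate; the stand-ins only mirror the documented signatures. FakeCorpus and the assumption that a corpus exposes a dict-like utterances attribute are for illustration only.

```python
class ValidationResult:
    """Stand-in mirroring the documented constructor (passed, name, info)."""

    def __init__(self, passed, name='Validation', info=None):
        self.passed = passed
        self.name = name
        self.info = info or {}


class Validator:
    """Stand-in for the documented base class."""

    def name(self):
        raise NotImplementedError

    def validate(self, corpus):
        raise NotImplementedError


class NonEmptyCorpusValidator(Validator):
    """Hypothetical validator: passes if the corpus has at least one utterance."""

    def name(self):
        return 'Non-Empty-Corpus'

    def validate(self, corpus):
        # Assumption: the corpus exposes its utterances as a dict-like
        # attribute named `utterances`.
        count = len(corpus.utterances)
        return ValidationResult(count > 0, name=self.name(),
                                info={'Utterance count': str(count)})


class FakeCorpus:
    """Minimal object used only to exercise the sketch."""

    def __init__(self, utterances):
        self.utterances = utterances


result = NonEmptyCorpusValidator().validate(FakeCorpus({'utt-1': object()}))
print(result.passed, result.info)
```

With the real library, the subclass would derive from audiomate.corpus.validation.Validator and return the library's own ValidationResult.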
class audiomate.corpus.validation.ValidationResult(passed, name='Validation', info=None)[source]

Representation of the result of a validation. The basic result just indicates a pass or fail. Depending on the validator, it can be extended to hold more information (e.g. the utterance-ids that caused the task to fail).

Parameters:
  • passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
  • name (str) – The name of the validator that produced the result.
  • info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
get_report()[source]

Return a string containing a report of the result. This can be used to print or save to a text file.

Returns:String containing information about the result
Return type:str
class audiomate.corpus.validation.InvalidItemsResult(passed, invalid_items, name='Validation', item_name='Utterances', info=None)[source]

A generic result class for validators that return a list of items (utterances, tracks) that were classified invalid. Besides the id of the item, a reason may be appended.

Parameters:
  • passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
  • invalid_items (dict) – A dictionary keyed by the ids of the invalid items. The values are the reasons why the items are invalid.
  • name (str) – The name of the validator that produced the result.
  • info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
get_report()[source]

Return a string containing a report of the result. This can be used to print or save to a text file.

Returns:String containing information about the result
Return type:str

Combination

class audiomate.corpus.validation.CombinedValidator(validators=None)[source]

The CombinedValidator is used to execute multiple validators at once.

Parameters:validators (list) – A list of validators that are executed.
name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
class audiomate.corpus.validation.CombinedValidationResult(passed, results=None, info=None)[source]

Result of running multiple validation tasks with the CombinedValidator.

Parameters:
  • passed (bool) – A boolean indicating whether all tasks have passed (True) or at least one has failed (False).
  • results (dict) – A dictionary containing the results of all validators, with the task name as key.
  • info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
get_report()[source]

Return a string containing a report of the result. This can be used to print or save to a text file.

Returns:String containing information about the result
Return type:str
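
The combination logic described above (overall pass only if every task passes, results keyed by task name) can be sketched as follows. This is a simplified stand-in, not the library's implementation: run_validators is a hypothetical helper, the checks are plain callables returning booleans, and the corpus is reduced to a dict of transcriptions.

```python
def run_validators(validators, corpus):
    """Run every (name, check) pair and combine the outcomes.

    Assumption: each check is a plain callable returning True (pass) or
    False (fail); real validators return result objects instead.
    """
    results = {name: check(corpus) for name, check in validators}
    # The combined result passes only if no single task failed.
    passed = all(results.values())
    return passed, results


# Hypothetical corpus stand-in: utterance id -> transcription.
corpus = {'utt-1': 'hello world', 'utt-2': ''}

validators = [
    ('Non-Empty-Corpus', lambda c: len(c) > 0),
    ('No-Empty-Transcription', lambda c: all(text for text in c.values())),
]

passed, results = run_validators(validators, corpus)
print(passed, results)
```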

Label-List

class audiomate.corpus.validation.UtteranceTranscriptionRatioValidator(max_characters_per_second=10, label_list_idx='word-transcript')[source]

Checks if the ratio between transcription-length and utterance-duration (characters per second) is below a given maximum. This is used to find utterances where the speech transcription is too long for the given utterance, meaning too many characters per second.

Parameters:
  • max_characters_per_second (int) – If the char/sec of an utterance is higher than this value, the utterance is reported as invalid.
  • label_list_idx (str) – The label-list to use for validation.
name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:Validation result.
Return type:InvalidItemsResult
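
The characters-per-second check can be illustrated with a small standalone sketch. It does not use audiomate objects: find_fast_utterances is a hypothetical helper, utterances are given as (duration in seconds, transcription) pairs, and the handling of non-positive durations is an assumption made here, not documented behaviour.

```python
def find_fast_utterances(utterances, max_characters_per_second=10):
    """Return a dict of utterance id -> reason for every invalid utterance.

    `utterances` maps an id to (duration_in_seconds, transcription).
    """
    invalid = {}
    for utt_id, (duration, text) in utterances.items():
        if duration <= 0:
            # Assumption: a ratio cannot be computed, so flag the utterance.
            invalid[utt_id] = 'non-positive duration'
            continue
        ratio = len(text) / duration
        if ratio > max_characters_per_second:
            invalid[utt_id] = '{:.2f} chars/sec'.format(ratio)
    return invalid


utterances = {
    'utt-1': (2.0, 'hello world'),   # 11 chars / 2.0 s = 5.5 chars/sec
    'utt-2': (0.5, 'hello world'),   # 11 chars / 0.5 s = 22 chars/sec
}
invalid = find_fast_utterances(utterances)
print(invalid)
```

The returned dictionary has the same shape as the invalid_items argument of InvalidItemsResult: item ids as keys, reasons as values.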
class audiomate.corpus.validation.LabelCountValidator(min_number_of_labels=1, label_list_idx='word-transcript')[source]

Checks if every utterance contains a label-list with the given id and if that label-list has at least min_number_of_labels labels.

Parameters:
  • min_number_of_labels (int) – Minimum number of expected labels.
  • label_list_idx (str) – The label-list to use for validation.
name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:Validation result.
Return type:InvalidItemsResult
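
A minimal sketch of the label-count check, under simplifying assumptions: find_utterances_with_too_few_labels is a hypothetical helper, each utterance is modelled as a dict mapping label-list ids to plain lists of labels, and the reason strings are invented for illustration.

```python
def find_utterances_with_too_few_labels(utterances,
                                        min_number_of_labels=1,
                                        label_list_idx='word-transcript'):
    """Return a dict of utterance id -> reason for every invalid utterance.

    `utterances` maps an id to a dict of label-list id -> list of labels.
    """
    invalid = {}
    for utt_id, label_lists in utterances.items():
        if label_list_idx not in label_lists:
            invalid[utt_id] = 'no label-list "{}"'.format(label_list_idx)
            continue
        count = len(label_lists[label_list_idx])
        if count < min_number_of_labels:
            invalid[utt_id] = 'only {} label(s)'.format(count)
    return invalid


utterances = {
    'utt-1': {'word-transcript': ['hello world']},
    'utt-2': {'word-transcript': []},
    'utt-3': {'phones': ['h', 'e']},
}
invalid = find_utterances_with_too_few_labels(utterances)
print(invalid)
```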
class audiomate.corpus.validation.LabelCoverageValidator(label_list_idx, threshold=0.01)[source]

Check if every portion of the utterance is covered with at least one label. The validator returns segments (start, end) of an utterance, where no label is defined within the given label-list.

Parameters:
  • label_list_idx (str) – The idx of the label-list to check.
  • threshold (float) – A threshold for the length of a segment to be considered as uncovered.
name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:Validation result.
Return type:LabelCoverageValidationResult
validate_utterance(utterance)[source]

Validate the given utterance and return a list of uncovered segments (start, end).
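
One way to find uncovered segments is to sweep over the labels in order of start time and record every gap longer than the threshold. The sketch below follows that idea; it is not the library's implementation. The function uncovered_segments is hypothetical, labels are plain (start, end) tuples in seconds relative to the start of the utterance, and all times are assumed to be finite.

```python
def uncovered_segments(duration, labels, threshold=0.01):
    """Return a list of (start, end) segments not covered by any label.

    `labels` is a list of (start, end) tuples; gaps that are not longer
    than `threshold` are ignored.
    """
    segments = []
    cursor = 0.0  # End of the region covered so far.
    for start, end in sorted(labels):
        if start - cursor > threshold:
            segments.append((cursor, start))
        # Labels may overlap, so never move the cursor backwards.
        cursor = max(cursor, end)
    if duration - cursor > threshold:
        segments.append((cursor, duration))
    return segments


labels = [(0.0, 2.0), (3.5, 6.0), (6.005, 9.0)]
gaps = uncovered_segments(10.0, labels)
print(gaps)
```

The gap between 6.0 and 6.005 is shorter than the threshold and is therefore not reported.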

class audiomate.corpus.validation.LabelCoverageValidationResult(passed, uncovered_segments, name, info=None)[source]

Result of the LabelCoverageValidator.

Parameters:
  • passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
  • uncovered_segments (dict) – A dictionary containing a list of uncovered segments for every utterance.
  • name (str) – The name of the validator that produced the result.
  • info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
get_report()[source]

Return a string containing a report of the result. This can be used to print or save to a text file.

Returns:String containing information about the result
Return type:str
class audiomate.corpus.validation.LabelOverflowValidator(label_list_idx, threshold=0.01)[source]

Check if all labels are within the boundaries of an utterance. Finds all segments of labels that lie outside of an utterance.

Parameters:
  • label_list_idx (str) – The idx of the label-list to check.
  • threshold (float) – A threshold for the time distance beyond the utterance boundary to be considered an overflow.
name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:Validation result.
Return type:LabelOverflowValidationResult
validate_utterance(utterance)[source]

Validate the given utterance and return a list of segments (start, end, label-value), that are outside of the utterance.
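
The overflow check can be sketched in a similar standalone way. The function overflow_segments is hypothetical, labels are plain (start, end, value) tuples in seconds relative to the start of the utterance, and all times are assumed to be finite; how the library treats open-ended labels is not covered here.

```python
def overflow_segments(duration, labels, threshold=0.01):
    """Return (start, end, value) segments lying outside [0, duration].

    `labels` is a list of (start, end, value) tuples; an overhang that is
    not larger than `threshold` is ignored.
    """
    segments = []
    for start, end, value in labels:
        if start < -threshold:
            # Part of the label lies before the start of the utterance.
            segments.append((start, min(end, 0.0), value))
        if end > duration + threshold:
            # Part of the label lies after the end of the utterance.
            segments.append((max(start, duration), end, value))
    return segments


labels = [(-0.5, 1.0, 'a'), (4.0, 5.005, 'b'), (4.5, 6.0, 'c')]
overflows = overflow_segments(5.0, labels)
print(overflows)
```

Label 'b' exceeds the utterance by only 0.005 seconds, which is within the threshold, so it is not reported.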

class audiomate.corpus.validation.LabelOverflowValidationResult(passed, overflow_segments, name, info=None)[source]

Result of the LabelOverflowValidator.

Parameters:
  • passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
  • overflow_segments (dict) – A dictionary containing a list of overflowing segments for every utterance.
  • name (str) – The name of the validator that produced the result.
  • info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
get_report()[source]

Return a string containing a report of the result. This can be used to print or save to a text file.

Returns:String containing information about the result
Return type:str

Track

class audiomate.corpus.validation.TrackReadValidator[source]

Check if the track can be opened and read.

name()[source]

Return a name, identifying the task.

validate(corpus)[source]

Perform the validation on the given corpus.

Parameters:corpus (Corpus) – The corpus to test/validate.
Returns:Validation result.
Return type:InvalidItemsResult
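
The general idea of a read check (try to open each track, collect the ones that fail together with a reason) can be sketched with the standard library. This is only an analogy: find_unreadable_tracks is a hypothetical helper, it handles WAV files only via Python's wave module, and it says nothing about how audiomate itself reads tracks.

```python
import os
import tempfile
import wave


def find_unreadable_tracks(track_paths):
    """Return a dict of track id -> reason for every track that cannot be read.

    `track_paths` maps a track id to the path of a WAV file.
    """
    invalid = {}
    for track_id, path in track_paths.items():
        try:
            with wave.open(path, 'rb') as handle:
                handle.readframes(1)
        except (OSError, EOFError, wave.Error) as error:
            invalid[track_id] = str(error) or type(error).__name__
    return invalid


with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, 'good.wav')
    bad_path = os.path.join(folder, 'bad.wav')
    missing_path = os.path.join(folder, 'missing.wav')

    # A tiny valid mono file with 16-bit samples at 8 kHz.
    with wave.open(good_path, 'wb') as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(8000)
        writer.writeframes(b'\x00\x00' * 8)

    # A file with a .wav extension that does not contain audio data.
    with open(bad_path, 'wb') as handle:
        handle.write(b'this is not audio data')

    invalid = find_unreadable_tracks({
        'good': good_path,
        'bad': bad_path,
        'missing': missing_path,
    })

print(sorted(invalid))
```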