audiomate.corpus.validation¶
This module contains classes for validating different properties of a corpus, e.g. whether an utterance is too short for its corresponding transcription.
audiomate.corpus.validation.Validator
is the base class for performing validations.
It can be extended to implement validators for specific tests/validations.
The result of every validator has to be an audiomate.corpus.validation.ValidationResult
or a subclass of it.
Base¶
-
class
audiomate.corpus.validation.
Validator
[source]¶ A validator is a class that tests a specific behaviour/state of a corpus.
-
validate
(corpus)[source]¶ Perform the validation on the given corpus.
Parameters: corpus (Corpus) – The corpus to test/validate. Returns: The result containing at least the pass/fail indication. Return type: ValidationResult
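As a rough illustration of the extension pattern, the following sketch defines a custom validator. To stay self-contained it does not import audiomate: `Validator` and `ValidationResult` are minimal stand-ins that only mirror the signatures documented on this page, and `NonEmptyCorpusValidator` and the dict-based toy corpus are hypothetical.

```python
class ValidationResult:
    """Stand-in mirroring the documented (passed, name, info) arguments."""

    def __init__(self, passed, name='Validation', info=None):
        self.passed = passed
        self.name = name
        self.info = info or {}


class Validator:
    """Stand-in for the documented base class."""

    def validate(self, corpus):
        raise NotImplementedError


class NonEmptyCorpusValidator(Validator):
    """Hypothetical validator: passes if the corpus holds at least one utterance."""

    def validate(self, corpus):
        # Assumption: the toy corpus is a dict with an 'utterances' mapping.
        num_utterances = len(corpus['utterances'])
        return ValidationResult(
            passed=num_utterances > 0,
            name='Non-Empty-Corpus',
            info={'Number of utterances': str(num_utterances)},
        )


toy_corpus = {'utterances': {'utt-1': 2.5, 'utt-2': 4.0}}
result = NonEmptyCorpusValidator().validate(toy_corpus)
print(result.passed, result.info)
```

The point of the pattern is that every validator, whatever it checks, hands back an object that carries at least the pass/fail flag.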
-
-
class
audiomate.corpus.validation.
ValidationResult
(passed, name='Validation', info=None)[source]¶ Representation of the result of a validation. The basic result just indicates a pass or fail. Depending on the validator, it can be extended to hold more information (e.g. the utterance-ids that caused the validation to fail).
Parameters: - passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
- name (str) – The name of the validator that produced the result.
- info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
-
class
audiomate.corpus.validation.
InvalidItemsResult
(passed, invalid_items, name='Validation', item_name='Utterances', info=None)[source]¶ A generic result class for validators that return a list of items (utterances, tracks) that were classified as invalid. Besides the id of the item, a reason may be appended.
Parameters: - passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
- invalid_items (dict) – A dictionary with the ids of the invalid items as keys and the reasons why they are invalid as values.
- name (str) – The name of the validator that produced the result.
- info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
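The shape of such a result can be sketched as follows. The dataclass below is a stand-in that copies the documented constructor arguments; it is not the audiomate class. `format_result` is a hypothetical helper, and deriving `passed` from an empty `invalid_items` dict is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class InvalidItemsResult:
    """Stand-in with the documented constructor arguments."""
    passed: bool
    invalid_items: dict
    name: str = 'Validation'
    item_name: str = 'Utterances'
    info: dict = None


def format_result(result):
    """Hypothetical helper: render a result as plain text."""
    status = 'passed' if result.passed else 'failed'
    lines = ['{}: {}'.format(result.name, status)]
    if result.invalid_items:
        lines.append('Invalid {}:'.format(result.item_name))
        for item_id, reason in sorted(result.invalid_items.items()):
            lines.append('  {} ({})'.format(item_id, reason))
    return '\n'.join(lines)


# Item-ids map to the reason why the item is invalid.
invalid = {'utt-7': 'no transcription', 'utt-3': 'zero duration'}
result = InvalidItemsResult(passed=not invalid, invalid_items=invalid, name='Toy-Check')
report = format_result(result)
print(report)
```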
Combination¶
-
class
audiomate.corpus.validation.
CombinedValidator
(validators=None)[source]¶ The CombinedValidator is used to execute multiple validators at once.
Parameters: validators (list) – A list of validators that are executed.
-
class
audiomate.corpus.validation.
CombinedValidationResult
(passed, results=None, info=None)[source]¶ Result of running multiple validation tasks with the CombinedValidator.
Parameters: - passed (bool) – A boolean indicating whether all tasks have passed (True) or at least one has failed (False).
- results (dict) – A dictionary containing the results of all validators, with the task name as key.
- info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
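The combination logic described above can be sketched without the library: run every validator, store each result under its name, and pass only if all of them passed. `ValidationResult` is again a stand-in, while `FixedOutcomeValidator` and `run_validators` are hypothetical names. Using the result's `name` as the dictionary key is an assumption about what "task name" means.

```python
class ValidationResult:
    """Stand-in mirroring the documented (passed, name, info) arguments."""

    def __init__(self, passed, name='Validation', info=None):
        self.passed = passed
        self.name = name
        self.info = info or {}


class FixedOutcomeValidator:
    """Hypothetical validator that always returns the outcome it was created with."""

    def __init__(self, name, passed):
        self.name = name
        self.passed = passed

    def validate(self, corpus):
        return ValidationResult(self.passed, name=self.name)


def run_validators(validators, corpus):
    """Run all validators and combine their results."""
    results = {}
    for validator in validators:
        result = validator.validate(corpus)
        results[result.name] = result
    # The combined run passes only if every single validator passed.
    passed = all(result.passed for result in results.values())
    return passed, results


validators = [
    FixedOutcomeValidator('Check-A', True),
    FixedOutcomeValidator('Check-B', False),
]
passed, results = run_validators(validators, corpus=None)
print(passed, sorted(results))
```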
Label-List¶
-
class
audiomate.corpus.validation.
UtteranceTranscriptionRatioValidator
(max_characters_per_second=10, label_list_idx='word-transcript')[source]¶ Checks whether the ratio of transcription-length to utterance-duration (characters per second) is below a given maximum. This is used to find utterances where the transcription is too long for the utterance, meaning too many characters per second.
Parameters: - max_characters_per_second (int) – Utterances with more characters per second than this value are returned as invalid.
- label_list_idx (str) – The label-list to use for validation.
-
validate
(corpus)[source]¶ Perform the validation on the given corpus.
Parameters: corpus (Corpus) – The corpus to test/validate. Returns: Validation result. Return type: InvalidItemsResult
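The underlying check is a simple division. The sketch below works on a plain dict instead of a corpus. `find_dense_utterances` is a hypothetical helper. Counting every character of the transcription (spaces included) and requiring positive durations are assumptions, not documented behaviour.

```python
def find_dense_utterances(utterances, max_characters_per_second=10):
    """Return utterance-ids whose transcription has too many characters per second.

    utterances: dict mapping utterance-id -> (duration in seconds, transcription).
    Assumes every duration is greater than zero.
    """
    invalid = {}
    for utt_id, (duration, transcription) in utterances.items():
        chars_per_second = len(transcription) / duration
        if chars_per_second > max_characters_per_second:
            invalid[utt_id] = chars_per_second
    return invalid


toy_utterances = {
    'utt-1': (2.0, 'hello world'),
    'utt-2': (1.0, 'this transcript is far too long'),
}
invalid = find_dense_utterances(toy_utterances)
print(invalid)
```

Here `utt-1` stays below the default limit, while `utt-2` packs far more than 10 characters into a single second and is flagged.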
-
class
audiomate.corpus.validation.
LabelCountValidator
(min_number_of_labels=1, label_list_idx='word-transcript')[source]¶ Checks whether every utterance contains a label-list with the given id and whether that label-list holds at least min_number_of_labels labels.
Parameters: - min_number_of_labels (int) – Minimum number of expected labels.
- label_list_idx (str) – The label-list to use for validation.
-
validate
(corpus)[source]¶ Perform the validation on the given corpus.
Parameters: corpus (Corpus) – The corpus to test/validate. Returns: Validation result. Return type: InvalidItemsResult
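A minimal sketch of this check on plain Python data, assuming each utterance is represented as a dict of label-lists. `find_underlabelled_utterances` and the wording of the reasons are hypothetical.

```python
def find_underlabelled_utterances(utterances, label_list_idx='word-transcript',
                                  min_number_of_labels=1):
    """Return utterance-ids that lack the label-list or have too few labels in it.

    utterances: dict mapping utterance-id -> dict of label-list-idx -> list of labels.
    """
    invalid = {}
    for utt_id, label_lists in utterances.items():
        if label_list_idx not in label_lists:
            invalid[utt_id] = 'label-list missing'
        elif len(label_lists[label_list_idx]) < min_number_of_labels:
            invalid[utt_id] = 'only {} label(s)'.format(len(label_lists[label_list_idx]))
    return invalid


toy_utterances = {
    'utt-1': {'word-transcript': ['hello world']},
    'utt-2': {'word-transcript': []},
    'utt-3': {'phones': ['h', 'e']},
}
invalid = find_underlabelled_utterances(toy_utterances)
print(invalid)
```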
-
class
audiomate.corpus.validation.
LabelCoverageValidator
(label_list_idx, threshold=0.01)[source]¶ Check if every portion of the utterance is covered with at least one label. The validator returns segments (start, end) of an utterance, where no label is defined within the given label-list.
Parameters: - label_list_idx (str) – The idx of the label-list to check.
- threshold (float) – A threshold for the length of a segment to be considered as uncovered.
-
validate
(corpus)[source]¶ Perform the validation on the given corpus.
Parameters: corpus (Corpus) – The corpus to test/validate. Returns: Validation result. Return type: LabelCoverageValidationResult
-
class
audiomate.corpus.validation.
LabelCoverageValidationResult
(passed, uncovered_segments, name, info=None)[source]¶ Result of the LabelCoverageValidator.
Parameters: - passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
- uncovered_segments (dict) – A dictionary containing a list of uncovered segments for every utterance.
- name (str) – The name of the validator that produced the result.
- info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
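Finding uncovered segments amounts to sweeping over the labels in time order and recording the gaps. The following sketch is one possible way to do this, not the library's implementation. `find_uncovered_segments` is a hypothetical helper. It assumes that label times are given in seconds relative to the utterance start, and that a gap counts only if it is strictly longer than the threshold.

```python
def find_uncovered_segments(duration, labels, threshold=0.01):
    """Return (start, end) segments of an utterance that no label covers.

    duration: length of the utterance in seconds.
    labels: list of (start, end) tuples, relative to the utterance start.
    """
    gaps = []
    cursor = 0.0  # end of the region covered so far
    for start, end in sorted(labels):
        if start - cursor > threshold:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Check the tail between the last label and the end of the utterance.
    if duration - cursor > threshold:
        gaps.append((cursor, duration))
    return gaps


segments = find_uncovered_segments(10.0, [(5.0, 8.0), (0.0, 3.0)])
print(segments)
```

Collecting the returned list per utterance-id gives a dictionary of the shape described for `uncovered_segments`.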
-
class
audiomate.corpus.validation.
LabelOverflowValidator
(label_list_idx, threshold=0.01)[source]¶ Check if all labels are within the boundaries of an utterance. Finds all segments of labels that lie outside of an utterance.
Parameters: - label_list_idx (str) – The idx of the label-list to check.
- threshold (float) – A threshold for the time distance beyond the utterance boundary that is considered an overflow.
-
validate
(corpus)[source]¶ Perform the validation on the given corpus.
Parameters: corpus (Corpus) – The corpus to test/validate. Returns: Validation result. Return type: LabelOverflowValidationResult
-
class
audiomate.corpus.validation.
LabelOverflowValidationResult
(passed, overflow_segments, name, info=None)[source]¶ Result of the LabelOverflowValidator.
Parameters: - passed (bool) – A boolean indicating whether the validation has passed (True) or failed (False).
- overflow_segments (dict) – A dictionary containing a list of overflowing segments for every utterance.
- name (str) – The name of the validator that produced the result.
- info (dict) – Dictionary containing key/value string-pairs with detailed information about the validation, for example the id of the label-list that was validated.
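The overflow check can be sketched in the same spirit: compare each label against the utterance boundaries and keep the part that sticks out. `find_overflow_segments` is a hypothetical helper. The sketch assumes that labels and utterance boundaries share one time base in seconds, and that an overflow counts only if it exceeds the threshold.

```python
def find_overflow_segments(utt_start, utt_end, labels, threshold=0.01):
    """Return (start, end, value) parts of labels that lie outside the utterance.

    labels: list of (start, end, value) tuples in the same time base as the utterance.
    """
    overflows = []
    for start, end, value in labels:
        # Part of the label before the utterance begins.
        if utt_start - start > threshold:
            overflows.append((start, min(end, utt_start), value))
        # Part of the label after the utterance ends.
        if end - utt_end > threshold:
            overflows.append((max(start, utt_end), end, value))
    return overflows


toy_labels = [(0.5, 2.0, 'a'), (2.0, 4.0, 'b'), (4.0, 6.0, 'c')]
segments = find_overflow_segments(1.0, 5.0, toy_labels)
print(segments)
```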