src.data package

Submodules

src.data.balanced_image_data_reader module

This file implements an image data reader which balances the classes.

class src.data.balanced_image_data_reader.BalancedImageDataReader(folder: Optional[str] = None)

Bases: ImageDataReader

Class that reads images from folders in a balanced way: every class contributes an approximately equal number of images. As a result, some images from underrepresented classes might appear twice, and some images from overrepresented classes might not appear at all. Note: This reader has higher memory requirements than the other data readers.
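A minimal sketch of class-balanced sampling, assuming the labels are available as an integer numpy array. The helper name balanced_indices and the samples_per_class parameter are hypothetical and not part of this package. The sketch draws exactly the same number of indices per class, whereas the reader itself only promises approximately equal counts.

```python
import numpy as np


def balanced_indices(labels: np.ndarray, samples_per_class: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: draw the same number of sample indices from every class.

    Classes smaller than samples_per_class are drawn with replacement (so some
    samples repeat); larger classes are drawn without replacement (so some
    samples are never used).
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        replace = len(cls_idx) < samples_per_class
        chosen.append(rng.choice(cls_idx, size=samples_per_class, replace=replace))
    indices = np.concatenate(chosen)
    rng.shuffle(indices)  # mix the classes so batches are not sorted by label
    return indices
```

Sampling with replacement only for the small classes reproduces both effects described above: duplicates from underrepresented classes and unused images from overrepresented ones.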

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified image dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the image folders into a class-balanced dataset with labels in the seven emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the image folders into a class-balanced dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.balanced_plant_exp_reader module

This data reader reads the PlantSpikerBox data from the experiments.

class src.data.balanced_plant_exp_reader.BalancedPlantExperimentDataReader(folder: str = 'data/plant', default_label_mode: str = 'expected')

Bases: ExperimentDataReader

This data reader reads the PlantSpikerBox files from the experiments and balances the classes exactly.
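One way to balance the classes exactly is to randomly undersample every class to the size of the smallest one. The sketch below assumes that approach and uses a hypothetical helper name, undersample_exactly; the documentation does not say how this reader actually achieves exact balance.

```python
from typing import Tuple

import numpy as np


def undersample_exactly(data: np.ndarray, labels: np.ndarray, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: keep exactly n_min samples per class, n_min being the smallest class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == cls), size=n_min, replace=False)
        for cls in classes
    ])
    keep.sort()  # keep the surviving samples in their original (temporal) order
    return data[keep], labels[keep]
```

Unlike the approximate balancing of BalancedImageDataReader, this never duplicates a sample, at the cost of discarding data from the larger classes.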

cleanup(parameters: Optional[Dict] = None) None

Function that frees the large data arrays to reduce memory usage.

Parameters:

parameters – Parameter dictionary

get_input_shape(parameters: Dict) Tuple[int]

Returns the shape of a preprocessed sample.

Parameters:

parameters – Parameter dictionary

Returns:

Tuple that is the shape of the sample.

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

This function returns labels for the dataset

Parameters:
  • which_set – Which set to get the labels for.

  • parameters – Additional parameters.

Returns:

Label numpy array

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the plant data into a class-balanced dataset with labels in the seven emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the plant data into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.balanced_watch_exp_reader module

This data reader reads the watch data from the experiments.

class src.data.balanced_watch_exp_reader.BalancedWatchExperimentDataReader(folder: str = 'data/watch', default_label_mode: str = 'expected')

Bases: ExperimentDataReader

This data reader reads the watch data files from the experiments and balances the classes exactly.

get_input_shape(parameters: Dict) tuple

Returns the shape of a preprocessed sample.

Parameters:

parameters – Parameter dictionary

Returns:

Tuple that is the shape of the sample.

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

This function returns labels for the dataset

Parameters:
  • which_set – Which set to get the labels for.

  • parameters – Additional parameters.

Returns:

Label numpy array

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the watch data into a class-balanced dataset with labels in the seven emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the watch data into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.classwise_speech_data_reader module

This file implements classwise data reading for speech data.

class src.data.classwise_speech_data_reader.ClasswiseSpeechDataReader(name: str = 'classwise_speech', folder: Optional[str] = None)

Bases: DataReader

Class that reads the speech datasets per class, meaning that the data extraction methods return one array per class. This is required for the HMM and GMM classifiers, which need all data of one class at the same time and, unlike neural networks, do not support batching.
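The per-class access pattern can be sketched with plain numpy, assuming the whole dataset is already in memory as a data array and an integer label array. The function iterate_classes and the class_names mapping are hypothetical; the real reader loads the audio from disk or from tensorflow_datasets.

```python
from typing import Dict, Generator, Tuple

import numpy as np


def iterate_classes(
    data: np.ndarray, labels: np.ndarray, class_names: Dict[int, str]
) -> Generator[Tuple[np.ndarray, str], None, None]:
    """Yield (all samples of one class, class name), one class at a time."""
    for cls in np.unique(labels):
        # Boolean indexing selects every sample of this class in one array.
        yield data[labels == cls], class_names[int(cls)]
```

Each yielded array can then be fitted by its own model, for example one GMM per emotion class.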

get_crema_samples(crema_d: DatasetV2, class_name: str) ndarray

Gets the samples of a specified class from the CREMA-D dataset

Parameters:
  • crema_d – The entire CREMA-D dataset instance

  • class_name – The class to extract from crema_d

Returns:

A numpy array with the extracted data

get_file_samples(emotion_class: str, data_dir: str) ndarray

Extract the data of a specific class from disk

Parameters:
  • emotion_class – The class to load from disk

  • data_dir – The directory on disk that contains the data

Returns:

Numpy array with the data

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified speech dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) Generator[Tuple[ndarray, str], None, None]

Main data reading function which reads the audio files and then returns them one class at a time.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

Generator that yields (array, class name)

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) Generator[Tuple[ndarray, str], None, None]

Main data reading function which reads the audio data from disk, converts the emotion labels to the three emotion space and returns the data one class at a time.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

Generator that yields (array, class name)

static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor]

Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.

Parameters:

file_path – The path of one audio file to read.

Returns:

Audio tensor and label tensor in a tuple
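Padding or truncating to a fixed length can be sketched in numpy as follows. The target_length parameter is hypothetical, since the documentation does not state the length the readers use, and the real function operates on TensorFlow tensors decoded from the audio file rather than on numpy arrays.

```python
import numpy as np


def pad_or_truncate(waveform: np.ndarray, target_length: int) -> np.ndarray:
    """Return a copy of a 1-D waveform with exactly target_length samples."""
    if len(waveform) >= target_length:
        # Too long (or exact): cut off the tail.
        return waveform[:target_length].copy()
    # Too short: append zeros (silence) at the end.
    padding = target_length - len(waveform)
    return np.pad(waveform, (0, padding), mode="constant")
```

A fixed length lets all clips be stacked into batches of identical shape.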

static map_emotions(data: ndarray, labels: ndarray)

Conversion function that is applied when three emotion labels are required.

Parameters:
  • data – The emotional data.

  • labels – The labels that are to be converted to three emotions.

static process_crema(x: ndarray, y: int) Tuple[Tensor, Tensor]

Preprocessing function for the CREMA-D dataset read from the tensorflow_datasets package.

Parameters:
  • x – The audio data

  • y – The label data

Returns:

Processed audio and label data

src.data.comparison_image_data_reader module

This file implements the data reading functionality for the image data from the comparison dataset.

class src.data.comparison_image_data_reader.ComparisonImageDataReader(name: str = 'comparison_image', folder: Optional[str] = None)

Bases: DataReader

Class that reads the comparison dataset image data

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels for the image dataset in an array

Parameters:
  • which_set – Train, val or test set - only test allowed here

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the images into a dataset

Parameters:
  • which_set – Which dataset to use - only test is allowed here

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the image folders into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - test only

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.comparison_speech_data_reader module

This file implements the data reading functionality for the speech data from the comparison dataset.

class src.data.comparison_speech_data_reader.ComparisonSpeechDataReader(name: str = 'comparison_speech', folder: Optional[str] = None)

Bases: DataReader

Class that reads the comparison speech dataset

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified speech dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the audio files into a dataset

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the audio data from disk into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor]

Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.

Parameters:

file_path – The path of one audio file to read.

Returns:

Audio tensor and label tensor in a tuple

static map_emotions(data: ndarray, labels: ndarray)

Conversion function that is applied when three emotion labels are required.

Parameters:
  • data – The emotional data.

  • labels – The labels that need to be converted to three emotions.

static set_tensor_shapes(x: Tensor, y: Tensor) Tuple[Tensor, Tensor]

Function that sets the tensor shapes in the dataset manually. This fixes an issue where using Dataset.map and numpy_function causes the tensor shape to be unknown. See the issue here: https://github.com/tensorflow/tensorflow/issues/47032

Parameters:
  • x – The speech tensor

  • y – The labels tensor

Returns:

Tuple with speech and labels tensor

src.data.comparison_text_data_reader module

This file implements the data reading functionality for text data from the comparison dataset.

class src.data.comparison_text_data_reader.ComparisonTextDataReader(folder: Optional[str] = None)

Bases: DataReader

Class that reads the CSV datasets from the data/train/text folder

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified text dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dict (unused)

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the CSV file into a dataset

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the CSV file into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.data_factory module

This module implements a factory for easy access to data readers and datasets.

class src.data.data_factory.DataFactory

Bases: object

The data factory, which returns data readers or datasets

static get_data_reader(data_type: str, data_folder=None) DataReader

This factory method returns a data reader instance

Parameters:
  • data_type – The type of data to return the reader for

  • data_folder – Override data folder for the data reader

Raises:

ValueError – If the data_type does not exist

Returns:

A DataReader for the specified data type
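A minimal sketch of the factory pattern described above, assuming a string-keyed registry. The reader classes are simplified stand-ins, and the keys "image" and "text" are assumptions; the real mapping lives in src.data.data_factory.

```python
from typing import Dict, Optional, Type


class DataReader:
    """Minimal stand-in for src.data.data_reader.DataReader."""

    def __init__(self, name: str, folder: Optional[str]):
        self.name = name
        self.folder = folder


class ImageDataReader(DataReader):
    def __init__(self, folder: Optional[str] = None):
        super().__init__("image", folder)


class TextDataReader(DataReader):
    def __init__(self, folder: str = "data/train/text"):
        super().__init__("text", folder)


# Hypothetical registry: the data_type keys of the real DataFactory are not listed here.
READERS: Dict[str, Type[DataReader]] = {
    "image": ImageDataReader,
    "text": TextDataReader,
}


def get_data_reader(data_type: str, data_folder: Optional[str] = None) -> DataReader:
    """Return a reader instance for data_type, optionally overriding its data folder."""
    if data_type not in READERS:
        raise ValueError(f'The data type "{data_type}" does not exist!')
    reader_cls = READERS[data_type]
    # Only pass the folder when it is overridden, so each reader keeps its own default.
    if data_folder is None:
        return reader_cls()
    return reader_cls(folder=data_folder)
```

Keeping the mapping in one dictionary means a new reader only has to be registered in one place.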

static get_dataset(data_type: str, which_set: Set, emotions: str = 'neutral_ekman', batch_size: int = 64, data_folder: Optional[str] = None, parameters: Optional[Dict] = None) DatasetV2

Get a specific dataset from a data reader

Parameters:
  • data_type – The data type to consider

  • which_set – Which dataset to return: train, val or test

  • emotions – Which emotion set to use: neutral_ekman or three

  • batch_size – The batch size for the returned dataset

  • data_folder – The folder where data is stored

  • parameters – Additional parameters for creating data

Raises:

ValueError – If the emotion type is not available

Returns:

Dataset instance that was requested

src.data.data_reader module

This file implements the basic functions for data reading

class src.data.data_reader.DataReader(name: str, folder: str)

Bases: ABC

The DataReader class is responsible for creating a TensorFlow Dataset which is used for training and evaluating the emotion detection models.

cleanup(parameters: Optional[Dict] = None) None

Optional cleanup method that deletes unnecessary memory elements.

Parameters:

parameters – Parameters that might be required

static convert_to_numpy(dataset: DatasetV2) Tuple[ndarray, ndarray]

Converts a given TensorFlow dataset into numpy arrays of data and labels

Parameters:

dataset – The dataset to convert to numpy

Returns:

Tuple containing two arrays: a numpy array with the data from all batches and a numpy array with the labels from all batches
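The conversion amounts to concatenating all batches along the first axis. This sketch accepts any iterable of (data, labels) numpy batches; for a real tf.data dataset, dataset.as_numpy_iterator() provides such an iterable. It illustrates the idea and is not the package's implementation.

```python
from typing import Iterable, Tuple

import numpy as np


def convert_to_numpy(batches: Iterable[Tuple[np.ndarray, np.ndarray]]) -> Tuple[np.ndarray, np.ndarray]:
    """Concatenate (data, labels) batches into one data array and one label array."""
    data_parts, label_parts = [], []
    for x, y in batches:
        data_parts.append(x)
        label_parts.append(y)
    # Batches differ only in their first dimension, so join along axis 0.
    return np.concatenate(data_parts, axis=0), np.concatenate(label_parts, axis=0)
```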

static convert_to_three_emotions(labels: ndarray) ndarray

Convert the NeutralEkmanEmotion labels to the ThreeEmotionSet

Parameters:

labels – The integer labels from 0-6 in NeutralEkman format

Returns:

The integer labels from 0-2 in ThreeEmotion format

static convert_to_three_emotions_onehot(labels: ndarray) ndarray

Convert one-hot encoded NeutralEkmanEmotion labels to the ThreeEmotionSet

Parameters:

labels – The labels 0-6 in one-hot encoding, with shape (n, 7)

Returns:

The labels 0-2 in ThreeEmotion format in one-hot encoding, with shape (n, 3)
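Both conversions can be expressed with one lookup table. The table below is hypothetical: the label orders of the NeutralEkman and ThreeEmotion sets are not given in this documentation, so the assumed orders are spelled out in the comments. The one-hot variant multiplies by a (7, 3) indicator matrix, which also works for soft probability vectors, because the probabilities of all emotions that map to the same group are summed.

```python
import numpy as np

# Hypothetical lookup table: index = NeutralEkman label (0-6), value = ThreeEmotion label (0-2).
# Assumed orders (NOT confirmed by this documentation):
#   NeutralEkman: 0 anger, 1 surprise, 2 disgust, 3 joy, 4 fear, 5 sadness, 6 neutral
#   ThreeEmotion: 0 positive, 1 neutral, 2 negative
SEVEN_TO_THREE = np.array([2, 0, 2, 0, 2, 2, 1])


def convert_to_three_emotions(labels: np.ndarray) -> np.ndarray:
    """Map integer labels 0-6 to integer labels 0-2 by table lookup."""
    return SEVEN_TO_THREE[labels]


def convert_to_three_emotions_onehot(labels: np.ndarray) -> np.ndarray:
    """Map one-hot labels of shape (n, 7) to one-hot labels of shape (n, 3)."""
    # Row i of this (7, 3) matrix is the one-hot ThreeEmotion class of label i.
    mapping = np.eye(3)[SEVEN_TO_THREE]
    return labels @ mapping  # (n, 7) @ (7, 3) -> (n, 3)
```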

get_emotion_data(emotions: str = 'neutral_ekman', which_set: Set = Set.TRAIN, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Method that returns a dataset depending on the emotion set.

Parameters:
  • emotions – The emotion set to use: neutral_ekman or three

  • which_set – train, test or val set

  • batch_size – The batch size for the dataset

  • parameters – Additional arguments

Returns:

The obtained dataset
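The dispatch in get_emotion_data can be sketched as below, together with the Set enum documented later in this module. SketchReader is a hypothetical stand-in that returns tuples instead of datasets, and raising ValueError for an unknown emotion set is an assumption borrowed from DataFactory.get_dataset.

```python
from enum import IntEnum
from typing import Dict, Optional


class Set(IntEnum):
    TRAIN = 0
    VAL = 1
    TEST = 2
    ALL = 3


class SketchReader:
    """Hypothetical stand-in; the real readers return tf.data datasets."""

    def get_seven_emotion_data(self, which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None):
        return ("seven", which_set, batch_size)

    def get_three_emotion_data(self, which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None):
        return ("three", which_set, batch_size)

    def get_emotion_data(self, emotions: str = "neutral_ekman", which_set: Set = Set.TRAIN,
                         batch_size: int = 64, parameters: Optional[Dict] = None):
        # Choose the loader by emotion set name.
        if emotions == "neutral_ekman":
            return self.get_seven_emotion_data(which_set, batch_size, parameters)
        if emotions == "three":
            return self.get_three_emotion_data(which_set, batch_size, parameters)
        raise ValueError(f'The emotion set "{emotions}" is not available.')
```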

abstract get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Method that gets only the labels for the dataset that is specified

Parameters:
  • which_set – Which set to use, train, val or test

  • parameters – Parameter dictionary

Returns:

An array of labels in shape (num_samples,)

abstract get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main method which loads the data from disk into a Dataset instance

Parameters:
  • which_set – Which set to use, can be either train, val or test

  • batch_size – The batch size for the requested dataset

  • parameters – Additional parameters

Returns:

The Dataset instance to use in the emotion classifiers

abstract get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Method that loads the dataset from disk and stores the labels in the ThreeEmotionSet instead of the NeutralEkmanEmotionSet

Parameters:
  • which_set – train, val or test set distinguisher

  • batch_size – the batch size for the dataset

  • parameters – Additional arguments

Returns:

The Dataset that contains data and labels

static map_emotions(data, labels)

Conversion function that is applied when three emotion labels are required.

Parameters:
  • data – The emotional data.

  • labels – The labels that need to be converted to three emotions.

class src.data.data_reader.Set(value)

Bases: IntEnum

Define the different set types that are available

ALL = 3
TEST = 2
TRAIN = 0
VAL = 1

src.data.experiment_data_reader module

This file contains a base class for data readers that read experiment related data and implements common functionality.

class src.data.experiment_data_reader.ExperimentDataReader(name: str, folder: str)

Bases: DataReader

This is the base class for all experiment related data readers.

static get_complete_data_indices() List[int]

Static method that returns all experiment indices that have complete data and are supposed to be used in the evaluation.

Returns:

List of experiment indices.

get_emotion_times() Dict[str, Dict[str, float]]

This function returns start and end times for every emotion in the experiments.

Returns:

The start and end time for every emotion.

abstract get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Return the labels for the unsorted data in the dataset.

Parameters:
  • which_set – Which set to get labels for

  • parameters – Additional parameters

Returns:

Numpy array of labels.

abstract get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

The abstract method for getting the dataset to train on.

Parameters:
  • which_set – Training, Validation or Test Set

  • batch_size – Batch Size for the dataset

  • parameters – Additional parameters.

Returns:

A tensorflow Dataset instance.

abstract get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

The abstract method for getting the dataset to train on. This method should return only three emotions.

Parameters:
  • which_set – Training, Validation or Test Set

  • batch_size – Batch Size for the dataset

  • parameters – Additional parameters.

Returns:

A tensorflow Dataset instance.

src.data.fusion_data_reader module

This data reader reads the fusion data from the experiments.

class src.data.fusion_data_reader.FusionProbDataReader(folder: Optional[str] = None)

Bases: ExperimentDataReader

This data reader reads fusion data from the experiments

get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None]

Generator that generates the data

Parameters:
  • which_set – Train, val or test set

  • parameters – Additional parameters, including window: the length of the window in seconds

Returns:

Generator that yields data and label.

get_input_shape(parameters: Dict) Tuple[int]

Returns the shape of a concatenated input sample.

Parameters:

parameters – Parameter dictionary

Returns:

Tuple that is the shape of the sample.

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

This function returns labels for the dataset

Parameters:
  • which_set – Which set to get the labels for.

  • parameters – Additional parameters.

Returns:

Label numpy array

get_raw_data(parameters: Dict) tuple[numpy.ndarray, numpy.ndarray]

Function that reads all experiment emotion probabilities from the data/continuous folder.

Parameters:

parameters – Parameters for the data reading process

Returns:

Tuple with samples, labels

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Method that returns a dataset of fusion probabilities.

Parameters:
  • which_set – Which set to use.

  • batch_size – Batch size for the dataset.

  • parameters – Additional parameters.

Returns:

Dataset instance.

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Create a dataset that uses only three emotions.

Parameters:
  • which_set – Which set: Train, val or test

  • batch_size – Batch size

  • parameters – Additional parameters

Returns:

Dataset with three emotion labels.

split_set(all_data: ndarray, all_labels: ndarray, which_set: Set) tuple[numpy.ndarray, numpy.ndarray]

Split all data and labels into train, val and test sets.

Parameters:
  • all_data – All data, an array of shape (n_exp * 613, n_modalities * 7)

  • all_labels – All corresponding labels, an array of shape (n_exp * 613,)

  • which_set – Train, Val or Test set

Returns:

Training, validation or test set as specified
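Because the rows of all_data are consecutive seconds of n_exp experiments with 613 rows each, a split along whole experiments keeps strongly correlated adjacent seconds of one recording out of different sets. The sketch below assumes such a split, with hypothetical n_val_exp and n_test_exp parameters; the documentation does not state the ratio or ordering that split_set actually uses.

```python
from typing import Tuple

import numpy as np

ROWS_PER_EXPERIMENT = 613  # from the documented shape (n_exp * 613, ...)


def split_by_experiment(all_data: np.ndarray, all_labels: np.ndarray, which_set: int,
                        n_val_exp: int = 1, n_test_exp: int = 1) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical split in which the last experiments become the val and test sets.

    which_set follows the Set enum values: 0 = TRAIN, 1 = VAL, 2 = TEST, 3 = ALL.
    """
    n_exp = all_data.shape[0] // ROWS_PER_EXPERIMENT
    n_train = n_exp - n_val_exp - n_test_exp
    experiment_ranges = {
        0: (0, n_train),
        1: (n_train, n_train + n_val_exp),
        2: (n_train + n_val_exp, n_exp),
        3: (0, n_exp),
    }
    first, last = experiment_ranges[int(which_set)]
    rows = slice(first * ROWS_PER_EXPERIMENT, last * ROWS_PER_EXPERIMENT)
    return all_data[rows], all_labels[rows]
```

Since Set is an IntEnum, a member such as Set.TEST can be passed directly as which_set.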

src.data.image_data_reader module

This file implements the data reading functionality for image data.

class src.data.image_data_reader.ImageDataReader(name: str = 'image', folder: Optional[str] = None)

Bases: DataReader

Class that reads the image dataset from the data/train/image folder

add_augmentations(dataset: DatasetV2, use_augmentations: bool = True)

Function that adds augmentation to the dataset. This helps reduce overfitting of the model.

Parameters:
  • dataset – The dataset containing images

  • use_augmentations – Boolean flag to enable augmentation

Returns:

The dataset with augmented images

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels for the image dataset that is specified in an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the images into a dataset

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the image folders into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.plant_exp_reader module

This data reader reads the PlantSpikerBox data from the experiments.

class src.data.plant_exp_reader.PlantExperimentDataReader(folder: str = 'data/plant', default_label_mode: str = 'expected')

Bases: ExperimentDataReader

This data reader reads the PlantSpikerBox files from the experiments

cleanup(parameters: Optional[Dict] = None) None

Cleanup method that frees RAM which, due to a bug in garbage collection, is not released automatically.

Parameters:

parameters – Parameters.

get_cross_validation_indices(which_set: Set, parameters: Dict) List[int]

Generate a list of indices for cross-validation.

Parameters:
  • which_set – Which set to use.

  • parameters – Additional parameters, including cv_portions (the number of cross-validation splits) and cv_index (which split to use).

Returns:

List of indices for the requested set under the chosen cross-validation split.
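A common way to derive such indices is k-fold splitting into cv_portions contiguous folds, where fold cv_index is held out. The sketch below assumes that scheme; the function name and the held_out flag are hypothetical, and the documentation does not say how the real method assigns the validation set.

```python
from typing import List


def cv_fold_indices(experiment_indices: List[int], cv_portions: int, cv_index: int,
                    held_out: bool) -> List[int]:
    """Split indices into cv_portions contiguous folds.

    held_out=True returns fold cv_index; held_out=False returns all other folds.
    """
    if not 0 <= cv_index < cv_portions:
        raise ValueError("cv_index must be in [0, cv_portions)")
    n = len(experiment_indices)
    # Integer arithmetic spreads any remainder evenly across the folds.
    start = n * cv_index // cv_portions
    end = n * (cv_index + 1) // cv_portions
    if held_out:
        return experiment_indices[start:end]
    return experiment_indices[:start] + experiment_indices[end:]
```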

get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None]

Generator that generates the data

Parameters:
  • which_set – Train, val or test set

  • parameters – Additional parameters, including window: the length of the window in seconds

Returns:

Generator that yields data and label.
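A sliding-window generator over one recording can be sketched as follows. It assumes a sampling rate of 10000 samples per second (as implied by the (window_size * 10000,) sample shape documented for preprocess_sample), a stride of one second, and that each window is labeled with the label of its last second; all three are assumptions, and window_generator is a hypothetical name.

```python
from typing import Generator, Tuple

import numpy as np

SAMPLE_RATE = 10_000  # assumed samples per second


def window_generator(signal: np.ndarray, labels_per_second: np.ndarray,
                     window: int) -> Generator[Tuple[np.ndarray, int], None, None]:
    """Yield (window of raw samples, label) pairs with a stride of one second."""
    n_seconds = min(len(signal) // SAMPLE_RATE, len(labels_per_second))
    for end_second in range(window, n_seconds + 1):
        start = (end_second - window) * SAMPLE_RATE
        stop = end_second * SAMPLE_RATE
        # The window covers seconds [end_second - window, end_second).
        yield signal[start:stop], int(labels_per_second[end_second - 1])
```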

get_input_shape(parameters: Dict) Tuple[int]

Returns the shape of a preprocessed sample.

Parameters:

parameters – Parameter dictionary

Returns:

Tuple that is the shape of the sample.

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

This function returns labels for the dataset

Parameters:
  • which_set – Which set to get the labels for.

  • parameters – Additional parameters.

Returns:

Label numpy array

get_raw_data(parameters: Dict) None

Load the raw plant data from the wave files and split it into windows according to the parameters.

Parameters:

parameters – Additional parameters

get_raw_expected_labels() ndarray

Load the raw emotions from the expected emotions during the video. The expected emotion means that while the participant is watching a happy video, we expect them to be happy, thus the label is happy.

Returns:

Labels that are expected from the user.
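Turning emotion time spans, such as those returned by get_emotion_times, into per-second expected labels can be sketched like this. The inner keys "start" and "end", the emotion_ids mapping and the default label for seconds outside every clip are assumptions; the real method builds one such row per experiment.

```python
from typing import Dict

import numpy as np


def expected_labels(emotion_times: Dict[str, Dict[str, float]], emotion_ids: Dict[str, int],
                    n_seconds: int, default_label: int) -> np.ndarray:
    """Label each second with the emotion of the video clip playing at that time."""
    labels = np.full(n_seconds, default_label, dtype=int)
    for emotion, times in emotion_times.items():
        start = int(times["start"])
        # Round the end up so a partially covered last second still gets the label.
        end = min(int(np.ceil(times["end"])), n_seconds)
        labels[start:end] = emotion_ids[emotion]
    return labels
```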

get_raw_faceapi_labels() ndarray

Load the raw labels from the faceapi output files.

Returns:

Labels that are collected from the user’s face expression.

get_raw_labels(label_mode: str) ndarray

Get the raw labels per experiment and time and populate the raw_labels member of this class. The two axes are [experiment_index, time_in_seconds].

Parameters:

label_mode – Whether to use expected or faceapi labels

Returns:

Array of all labels in shape (file, second)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Method that returns a dataset of plant data.

Parameters:
  • which_set – Which set to use.

  • batch_size – Batch size for the dataset.

  • parameters – Additional parameters.

Returns:

Dataset instance.

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Create a dataset that uses only three emotions.

Parameters:
  • which_set – Which set: Train, val or test

  • batch_size – Batch size

  • parameters – Additional parameters

Returns:

Dataset with three emotion labels.

static prepare_faceapi_labels() None

This function prepares the faceapi labels if they are not computed yet.

static preprocess_sample(sample: ndarray, parameters: Optional[Dict] = None) ndarray

Gets a sample with shape (window_size * 10000,) and then preprocesses it before using it in the classifier.

Parameters:
  • sample – The data sample to preprocess.

  • parameters – Additional parameters for preprocessing.

Returns:

The preprocessed sample.

src.data.speech_data_reader module

This file implements the data reading functionality for speech data.

class src.data.speech_data_reader.SpeechDataReader(name: str = 'speech', folder: Optional[str] = None)

Bases: DataReader

Class that reads the speech datasets

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified speech dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dictionary

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the audio files into a dataset

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the audio data from disk into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor]

Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.

Parameters:

file_path – The path of one audio file to read.

Returns:

Audio tensor and label tensor in a tuple

static map_emotions(data: ndarray, labels: ndarray)

Conversion function that is applied when three emotion labels are required.

Parameters:
  • data – The emotional data.

  • labels – The labels that need to be converted to three emotions.

static process_crema(x: ndarray, y: int) Tuple[Tensor, Tensor]

Preprocessing function for the CREMA-D dataset read from the tensorflow_datasets package.

Parameters:
  • x – The audio data

  • y – The label data

Returns:

Processed audio and label data

static set_tensor_shapes(x: Tensor, y: Tensor) Tuple[Tensor, Tensor]

Function that sets the tensor shapes in the dataset manually. This fixes an issue where using Dataset.map and numpy_function causes the tensor shape to be unknown. See the issue here: https://github.com/tensorflow/tensorflow/issues/47032

Parameters:
  • x – The speech tensor

  • y – The labels tensor

Returns:

Tuple with speech and labels tensor

src.data.text_data_reader module

This file implements the data reading functionality for text data.

class src.data.text_data_reader.TextDataReader(folder: str = 'data/train/text')

Bases: DataReader

Class that reads the CSV datasets from the data/train/text folder

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

Get the labels of the specified text dataset as an array

Parameters:
  • which_set – Train, val or test set

  • parameters – Parameter dict (unused)

Returns:

The labels in an array of shape (num_samples,)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the CSV file into a dataset

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional parameters

Returns:

The tensorflow Dataset instance

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Main data reading function which reads the CSV file into a dataset and also converts the emotion labels to the three emotion space.

Parameters:
  • which_set – Which dataset to use - train, val or test

  • batch_size – The batch size for the resulting dataset

  • parameters – Additional arguments

Returns:

The tensorflow Dataset instance

src.data.watch_exp_reader module

This data reader reads the Happimeter data from the experiments.

class src.data.watch_exp_reader.WatchExperimentDataReader(folder: str = 'data/watch', default_label_mode: str = 'expected')

Bases: ExperimentDataReader

This data reader reads the watch CSV files from the experiments

get_cross_validation_indices(which_set: Set, parameters: Dict) List[int]

Generate a list of indices for cross-validation.

Parameters:
  • which_set – Which set to use.

  • parameters – Additional parameters, including cv_portions (the number of cross-validation splits) and cv_index (which split to use).

Returns:

List of indices for the requested set under the chosen cross-validation split.

get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None]

Generator that generates the data

Parameters:
  • which_set – Train, val or test set

  • parameters – Additional parameters, including window: the length of the window in seconds

Returns:

Generator that yields data and label.

static get_input_shape(parameters: Dict) tuple

Returns the shape of a preprocessed sample.

Parameters:

parameters – Parameter dictionary

Returns:

Tuple that is the shape of the sample.

get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray

This function returns labels for the dataset

Parameters:
  • which_set – Which set to get the labels for.

  • parameters – Additional parameters.

Returns:

Label numpy array

get_raw_data(parameters: Dict) None

Load the raw watch data from the csv files and split it into windows according to the parameters.

Parameters:

parameters – Additional parameters

get_raw_expected_labels() ndarray

Load the raw emotions from the expected emotions during the video. The expected emotion means that while the participant is watching a happy video, we expect them to be happy, thus the label is happy.

Returns:

Labels that are expected from the user.

get_raw_faceapi_labels() ndarray

Load the raw labels from the faceapi output files.

Returns:

Labels that are collected from the user’s face expression.

get_raw_labels(label_mode: str) ndarray

Get the raw labels per experiment and time and populate the raw_labels member of this class. The two axes are [experiment_index, time_in_seconds].

Parameters:

label_mode – Whether to use expected or faceapi labels

Returns:

Array of all labels in shape (file, second)

get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Method that returns a dataset of watch data.

Parameters:
  • which_set – Which set to use.

  • batch_size – Batch size for the dataset.

  • parameters – Additional parameters.

Returns:

Dataset instance.

get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2

Create a dataset that uses only three emotions.

Parameters:
  • which_set – Which set: Train, val or test

  • batch_size – Batch size

  • parameters – Additional parameters

Returns:

Dataset with three emotion labels.

static prepare_faceapi_labels() None

This function prepares the faceapi labels if they are not computed yet.

Module contents

Package responsible for data reading and processing