src.data package¶
Submodules¶
src.data.balanced_image_data_reader module¶
This file implements an image data reader which balances data.
- class src.data.balanced_image_data_reader.BalancedImageDataReader(folder: Optional[str] = None)¶
Bases:
ImageDataReader
Class that reads images from folders in a balanced way: every class contributes approximately the same number of images. As a result, some images from underrepresented classes might appear twice and some images from overrepresented classes might not appear at all. Note: Has higher memory requirements than other data readers.
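The balancing described above can be pictured with a minimal sketch. It is not the reader's actual implementation: the function name `balanced_indices`, the per-class target (total samples divided by the number of classes) and the fixed seed are assumptions made for illustration, and integer labels stand in for the image data.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def balanced_indices(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Pick sample indices so that every class appears equally often.

    Hypothetical helper: the target count per class is an assumption.
    """
    if not labels:
        return []
    rng = random.Random(seed)
    # Group the sample indices by their class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    per_class = len(labels) // len(by_class)
    selected: List[int] = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= per_class:
            # Overrepresented class: some samples are left out.
            selected.extend(rng.sample(members, per_class))
        else:
            # Underrepresented class: keep all samples, then repeat some.
            repeats = rng.choices(members, k=per_class - len(members))
            selected.extend(members + repeats)
    rng.shuffle(selected)
    return selected


labels = [0] * 8 + [1] * 2 + [2] * 2
picked = balanced_indices(labels)
print(Counter(labels[i] for i in picked))
```

With eight, two and two samples in the three classes, each class ends up with four selected indices. Holding all indices in memory at once is in line with the higher memory requirements noted above.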
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified image dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the images into a dataset with seven emotion labels.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the image folders into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.balanced_plant_exp_reader module¶
This data reader reads the PlantSpikerBox data from the experiments.
- class src.data.balanced_plant_exp_reader.BalancedPlantExperimentDataReader(folder: str = 'data/plant', default_label_mode: str = 'expected')¶
Bases:
ExperimentDataReader
This data reader reads the plant spiker box files from the experiments and balances the classes exactly.
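The source does not state how exact balance is reached. One common approach, sketched here purely as an assumption, trims every class to the size of the smallest class. `balance_exactly` and its arguments are hypothetical names.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def balance_exactly(
    samples: Sequence[object], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[object], List[Hashable]]:
    """Downsample so that all classes have exactly the same size.

    Assumed strategy for illustration; not taken from the package.
    """
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    if not by_class:
        return [], []
    smallest = min(len(indices) for indices in by_class.values())
    kept: List[int] = []
    for indices in by_class.values():
        # Randomly keep `smallest` samples of every class.
        kept.extend(rng.sample(indices, smallest))
    kept.sort()  # keep the samples in their original order
    return [samples[i] for i in kept], [labels[i] for i in kept]
```

Downsampling discards data, so a reader that must keep every sample would need a different strategy such as repeating samples of the smaller classes.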
- cleanup(parameters: Optional[Dict] = None) None ¶
Function that cleans up the big data arrays for memory optimization.
- Parameters:
parameters – Parameter Dictionary
- get_input_shape(parameters: Dict) Tuple[int] ¶
Returns the shape of a preprocessed sample.
- Parameters:
parameters – Parameter dictionary
- Returns:
Tuple that is the shape of the sample.
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
This function returns labels for the dataset
- Parameters:
which_set – Which set to get the labels for.
parameters – Additional parameters.
- Returns:
Label numpy array
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the plant data into a dataset with seven emotion labels.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the plant data into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.balanced_watch_exp_reader module¶
This data reader reads the watch data from the experiments.
- class src.data.balanced_watch_exp_reader.BalancedWatchExperimentDataReader(folder: str = 'data/watch', default_label_mode: str = 'expected')¶
Bases:
ExperimentDataReader
This data reader reads the watch data files from the experiments and balances the classes exactly.
- get_input_shape(parameters: Dict) tuple ¶
Returns the shape of a preprocessed sample.
- Parameters:
parameters – Parameter dictionary
- Returns:
Tuple that is the shape of the sample.
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
This function returns labels for the dataset
- Parameters:
which_set – Which set to get the labels for.
parameters – Additional parameters.
- Returns:
Label numpy array
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the watch data into a dataset with seven emotion labels.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the watch data into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.classwise_speech_data_reader module¶
This file implements classwise data reading for speech data.
- class src.data.classwise_speech_data_reader.ClasswiseSpeechDataReader(name: str = 'classwise_speech', folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the speech datasets per class. This means that the data extraction methods return one array per class. This is required for HMM and GMM classifiers which need all data for one class at the same time and do not support batching like NNs.
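To illustrate what "one array per class" means, here is a minimal sketch. It uses plain lists instead of numpy arrays, and `iterate_classwise` is a hypothetical helper, not part of the package.

```python
from typing import Dict, Generator, List, Sequence, Tuple

Sample = Sequence[float]


def iterate_classwise(
    samples: Sequence[Sample], labels: Sequence[str]
) -> Generator[Tuple[List[Sample], str], None, None]:
    """Yield (all samples of one class, class name), one class at a time."""
    grouped: Dict[str, List[Sample]] = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    for class_name, class_samples in grouped.items():
        yield class_samples, class_name


samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = ["anger", "joy", "anger"]
for class_samples, class_name in iterate_classwise(samples, labels):
    print(class_name, len(class_samples))
```

A classifier that fits one model per class can consume the generator directly, training on each yielded group before moving to the next.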
- get_crema_samples(crema_d: DatasetV2, class_name: str) ndarray ¶
Gets the samples from a specified class from the crema dataset
- Parameters:
crema_d – The entire crema dataset instance
class_name – The class to extract from crema_d
- Returns:
A numpy array with the extracted data
- get_file_samples(emotion_class: str, data_dir: str) ndarray ¶
Extract the data from a specific class from disk
- Parameters:
emotion_class – The class to load from disk
data_dir – The directory on disk that contains the data
- Returns:
Numpy array with the data
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified speech dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) Generator[Tuple[ndarray, str], None, None] ¶
Main data reading function which reads the audio files and then returns them one class at a time.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
Generator that yields (array, class name)
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) Generator[Tuple[ndarray, str], None, None] ¶
Main data reading function which reads the audio data from disk.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
Generator that yields (array, class name)
- static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor] ¶
Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.
- Parameters:
file_path – The path of one audio file to read.
- Returns:
Audio tensor and label tensor in a tuple
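The "padded or truncated" step brings every waveform to a fixed length. A minimal sketch on a plain Python list follows. The helper name, the zero padding value and padding at the end are assumptions, since the source does not specify them.

```python
from typing import List, Sequence


def pad_or_truncate(
    waveform: Sequence[float], target_length: int, pad_value: float = 0.0
) -> List[float]:
    """Return a copy of `waveform` with exactly `target_length` samples."""
    if target_length < 0:
        raise ValueError("target_length must be non-negative")
    clipped = list(waveform[:target_length])  # truncate if too long
    padding = [pad_value] * (target_length - len(clipped))  # pad if too short
    return clipped + padding
```

A fixed length matters because batching requires all samples in a batch to share the same shape.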
- static map_emotions(data: ndarray, labels: ndarray)¶
Conversion function that is applied when three emotion labels are required.
- Parameters:
data – The emotions data.
labels – The labels that are to be converted to three emotions.
- static process_crema(x: ndarray, y: int) Tuple[Tensor, Tensor] ¶
Preprocessing function for the crema dataset read from tensorflow_datasets package.
- Parameters:
x – The audio data
y – The label data
- Returns:
Processed audio and label data
src.data.comparison_image_data_reader module¶
This file implements the data reading functionality for the image data from the comparison dataset.
- class src.data.comparison_image_data_reader.ComparisonImageDataReader(name: str = 'comparison_image', folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the comparison dataset image data
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels for the image dataset in an array
- Parameters:
which_set – Train, val or test set - only test allowed here
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the images into a dataset
- Parameters:
which_set – Which dataset to use - only test is allowed here
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the image folders into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - test only
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.comparison_speech_data_reader module¶
This file implements the data reading functionality for the speech data from the comparison dataset.
- class src.data.comparison_speech_data_reader.ComparisonSpeechDataReader(name: str = 'comparison_speech', folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the comparison speech dataset
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified speech dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the audio files into a dataset
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the audio data from disk.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
- static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor] ¶
Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.
- Parameters:
file_path – The path of one audio file to read.
- Returns:
Audio tensor and label tensor in a tuple
- static map_emotions(data: ndarray, labels: ndarray)¶
Conversion function that is applied when three emotion labels are required.
- Parameters:
data – The emotional data.
labels – The labels that need to be converted to three emotions.
- static set_tensor_shapes(x: Tensor, y: Tensor) Tuple[Tensor, Tensor] ¶
Function that sets the tensor shapes in the dataset manually. This fixes an issue where using Dataset.map and numpy_function causes the tensor shape to be unknown. See the issue here: https://github.com/tensorflow/tensorflow/issues/47032
- Parameters:
x – The speech tensor
y – The labels tensor
- Returns:
Tuple with speech and labels tensor
src.data.comparison_text_data_reader module¶
This file implements the data reading functionality for text data from the comparison dataset.
- class src.data.comparison_text_data_reader.ComparisonTextDataReader(folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the CSV datasets from the data/train/text folder
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified text dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dict (unused)
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the CSV file into a dataset
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the CSV file into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.data_factory module¶
This file implements a factory for easy access to data readers and datasets.
- class src.data.data_factory.DataFactory¶
Bases:
object
The Data Factory returning data readers or data sets
- static get_data_reader(data_type: str, data_folder=None) DataReader ¶
This factory method returns a data reader instance
- Parameters:
data_type – The type of data to return the reader for
data_folder – Override data folder for the data reader
- Raises:
ValueError – If the data_type does not exist
- Returns:
A DataReader for the specified data type
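A factory of this kind is often built around a lookup table from type names to reader classes. The sketch below shows that pattern under assumptions: the stand-in reader classes, the registry and the key strings "text" and "image" are hypothetical and do not reflect the package's real registry.

```python
from typing import Dict, Optional, Type


class SketchReader:
    """Stand-in for a data reader; it only stores its folder."""

    def __init__(self, folder: Optional[str] = None) -> None:
        self.folder = folder


class SketchTextReader(SketchReader):
    pass


class SketchImageReader(SketchReader):
    pass


# Hypothetical registry; the real data type names are not listed in the source.
READERS: Dict[str, Type[SketchReader]] = {
    "text": SketchTextReader,
    "image": SketchImageReader,
}


def get_data_reader(data_type: str, data_folder: Optional[str] = None) -> SketchReader:
    """Return a reader for `data_type` or raise ValueError if it is unknown."""
    if data_type not in READERS:
        raise ValueError(f"The data type {data_type!r} does not exist.")
    return READERS[data_type](folder=data_folder)
```

Keeping the registry in one dictionary means a new reader only needs one additional entry, and unknown names fail early with a clear error.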
- static get_dataset(data_type: str, which_set: Set, emotions: str = 'neutral_ekman', batch_size: int = 64, data_folder: Optional[str] = None, parameters: Optional[Dict] = None) DatasetV2 ¶
Get a specific dataset from a data reader
- Parameters:
data_type – The data type to consider
which_set – Which dataset to return: train, val or test
emotions – Which emotion set to use: neutral_ekman or three
batch_size – The batch size for the returned dataset
data_folder – The folder where data is stored
parameters – Additional parameters for creating data
- Raises:
ValueError – If the emotion type is not available
- Returns:
Dataset instance that was requested
src.data.data_reader module¶
This file implements the basic functions for data reading
- class src.data.data_reader.DataReader(name: str, folder: str)¶
Bases:
ABC
The DataReader class is responsible for creating a tensorflow Dataset which is used for training and evaluating the emotion detection models.
- cleanup(parameters: Optional[Dict] = None) None ¶
Optional cleanup method that deletes unnecessary elements from memory.
- Parameters:
parameters – Parameters that might be required
- static convert_to_numpy(dataset: DatasetV2) Tuple[ndarray, ndarray] ¶
Converts a given tensorflow dataset into two numpy arrays, one for the data and one for the labels
- Parameters:
dataset – The dataset to convert to numpy
- Returns:
Tuple containing two arrays: a numpy array with the data from all batches and a numpy array with the labels from all batches
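Conceptually, the conversion walks over all batches and concatenates them. The sketch below shows that idea with plain Python lists standing in for a batched tensorflow dataset and for numpy arrays. `batches_to_arrays` is a hypothetical name.

```python
from typing import Iterable, List, Sequence, Tuple


def batches_to_arrays(
    batches: Iterable[Tuple[Sequence[object], Sequence[int]]]
) -> Tuple[List[object], List[int]]:
    """Concatenate (data, labels) batches into one data list and one label list."""
    all_data: List[object] = []
    all_labels: List[int] = []
    for batch_data, batch_labels in batches:
        all_data.extend(batch_data)
        all_labels.extend(batch_labels)
    return all_data, all_labels
```

The result holds the whole dataset in memory at once, which is what classifiers without batching support need.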
- static convert_to_three_emotions(labels: ndarray) ndarray ¶
Convert the NeutralEkmanEmotion labels to the ThreeEmotionSet
- Parameters:
labels – The integer labels from 0-6 in NeutralEkman format
- Returns:
The integer labels from 0-2 in ThreeEmotion format
- static convert_to_three_emotions_onehot(labels: ndarray) ndarray ¶
Convert one-hot encoded NeutralEkmanEmotion labels to the one-hot encoded ThreeEmotionSet
- Parameters:
labels – The integer labels from 0-6 in a one-hot encoding -> shape (n, 7)
- Returns:
The integer labels from 0-2 in ThreeEmotion format in one-hot encoding: shape (n,3)
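Both label conversions amount to a lookup from seven classes to three. The sketch below is illustrative only: the table `EXAMPLE_MAPPING` is a placeholder, because the source does not state which of the seven labels belongs to which of the three classes, and plain lists replace numpy arrays.

```python
from typing import List, Sequence

# Placeholder mapping from label index 0-6 to label index 0-2.
# NOT the package's real mapping, which is not given in the source.
EXAMPLE_MAPPING = [0, 1, 1, 1, 2, 1, 2]


def to_three(
    labels: Sequence[int], mapping: Sequence[int] = EXAMPLE_MAPPING
) -> List[int]:
    """Map integer labels in 0-6 to integer labels in 0-2."""
    return [mapping[label] for label in labels]


def to_three_onehot(
    onehot: Sequence[Sequence[int]], mapping: Sequence[int] = EXAMPLE_MAPPING
) -> List[List[int]]:
    """Map one-hot rows of length 7 to one-hot rows of length 3."""
    converted = []
    for row in onehot:
        new_row = [0, 0, 0]
        for old_index, value in enumerate(row):
            # Add each old class entry to the new class it maps to.
            new_row[mapping[old_index]] += value
        converted.append(new_row)
    return converted
```

Because a one-hot row contains a single 1, summing the entries per target class again yields a valid one-hot row.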
- get_emotion_data(emotions: str = 'neutral_ekman', which_set: Set = Set.TRAIN, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Method that returns a dataset depending on the emotion set.
- Parameters:
emotions – The emotion set to use: neutral_ekman or three
which_set – train, test or val set
batch_size – The batch size for the dataset
parameters – Additional arguments
- Returns:
The obtained dataset
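The dispatch between the two emotion sets can be sketched with an abstract base class. Everything in the block is a simplified stand-in: `SketchDataReader` and `EchoReader` are hypothetical, strings replace the `Set` enum and the datasets, and raising `ValueError` for an unknown emotion set is an assumption borrowed from the `DataFactory.get_dataset` description.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class SketchDataReader(ABC):
    """Simplified stand-in for a reader base class."""

    @abstractmethod
    def get_seven_emotion_data(
        self, which_set: str, batch_size: int = 64, parameters: Optional[Dict] = None
    ) -> str:
        """Return the dataset with seven emotion labels."""

    @abstractmethod
    def get_three_emotion_data(
        self, which_set: str, batch_size: int = 64, parameters: Optional[Dict] = None
    ) -> str:
        """Return the dataset with three emotion labels."""

    def get_emotion_data(
        self,
        emotions: str = "neutral_ekman",
        which_set: str = "train",
        batch_size: int = 64,
        parameters: Optional[Dict] = None,
    ) -> str:
        """Select the loading method that matches the emotion set."""
        if emotions == "neutral_ekman":
            return self.get_seven_emotion_data(which_set, batch_size, parameters)
        if emotions == "three":
            return self.get_three_emotion_data(which_set, batch_size, parameters)
        raise ValueError(f"Unknown emotion set: {emotions!r}")


class EchoReader(SketchDataReader):
    """Toy subclass that returns a description instead of a dataset."""

    def get_seven_emotion_data(self, which_set, batch_size=64, parameters=None):
        return f"seven:{which_set}:{batch_size}"

    def get_three_emotion_data(self, which_set, batch_size=64, parameters=None):
        return f"three:{which_set}:{batch_size}"
```

Subclasses only implement the two abstract loading methods, while callers use the single entry point with an emotion set name.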
- abstract get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Method that gets only the labels for the dataset that is specified
- Parameters:
which_set – Which set to use, train, val or test
parameters – Parameter dictionary
- Returns:
An array of labels in shape (num_samples,)
- abstract get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main method which loads the data from disk into a Dataset instance
- Parameters:
which_set – Which set to use, can be either train, val or test
batch_size – The batch size for the requested dataset
parameters – Additional parameters
- Returns:
The Dataset instance to use in the emotion classifiers
- abstract get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Method that loads the dataset from disk and stores the labels in the ThreeEmotionSet instead of the NeutralEkmanEmotionSet
- Parameters:
which_set – train, val or test set distinguisher
batch_size – the batch size for the dataset
parameters – Additional arguments
- Returns:
The Dataset that contains data and labels
- static map_emotions(data, labels)¶
Conversion function that is applied when three emotion labels are required.
- Parameters:
data – The emotional data.
labels – The labels that need to be converted to three emotions.
src.data.experiment_data_reader module¶
This file contains a base class for data readers that read experiment related data and implements common functionality.
- class src.data.experiment_data_reader.ExperimentDataReader(name: str, folder: str)¶
Bases:
DataReader
This is the base class for all experiment related data readers.
- static get_complete_data_indices() List[int] ¶
Static method that returns all experiment indices that have complete data and are supposed to be used in the evaluation.
- Returns:
List of experiment indices.
- get_emotion_times() Dict[str, Dict[str, float]] ¶
This function returns start and end times for every emotion in the experiments.
- Returns:
The start and end time for every emotion.
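Start and end times of this kind can be turned into one label per second. The sketch below assumes the inner dictionaries use the keys "start" and "end" (in seconds) and uses made-up times. Neither is confirmed by the source, and `labels_per_second` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def labels_per_second(
    emotion_times: Dict[str, Dict[str, float]], total_seconds: int
) -> List[Optional[str]]:
    """Build one label per second; seconds outside every interval stay None."""
    labels: List[Optional[str]] = [None] * total_seconds
    for emotion, span in emotion_times.items():
        start = max(int(span["start"]), 0)
        end = min(int(span["end"]), total_seconds)
        for second in range(start, end):
            labels[second] = emotion
    return labels


# Made-up example times, for illustration only.
example_times = {
    "happy": {"start": 0.0, "end": 3.0},
    "sad": {"start": 3.0, "end": 5.0},
}
print(labels_per_second(example_times, 6))
```

Intervals are treated as half-open, so a second that ends one emotion and starts the next is assigned to the later one.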
- abstract get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Return the labels for the unsorted data in the dataset.
- Parameters:
which_set – Which set to get labels for
parameters – Additional parameters
- Returns:
Numpy array of labels.
- abstract get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
The abstract method for getting the dataset to train on.
- Parameters:
which_set – Training, Validation or Test Set
batch_size – Batch Size for the dataset
parameters – Additional parameters.
- Returns:
A tensorflow Dataset instance.
- abstract get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
The abstract method for getting the dataset to train on. This method should return only three emotions.
- Parameters:
which_set – Training, Validation or Test Set
batch_size – Batch Size for the dataset
parameters – Additional parameters.
- Returns:
A tensorflow Dataset instance.
src.data.fusion_data_reader module¶
This data reader reads the fusion data from the experiments.
- class src.data.fusion_data_reader.FusionProbDataReader(folder: Optional[str] = None)¶
Bases:
ExperimentDataReader
This data reader reads fusion data from the experiments
- get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None] ¶
Generator that generates the data
- Parameters:
which_set – Train, val or test set
parameters – Additional parameters, including window (the length of the window to use in seconds)
- Returns:
Generator that yields data and label.
- get_input_shape(parameters: Dict) Tuple[int] ¶
Returns the shape of a concatenated input sample.
- Parameters:
parameters – Parameter dictionary
- Returns:
Tuple that is the shape of the sample.
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
This function returns labels for the dataset
- Parameters:
which_set – Which set to get the labels for.
parameters – Additional parameters.
- Returns:
Label numpy array
- get_raw_data(parameters: Dict) tuple[numpy.ndarray, numpy.ndarray] ¶
Function that reads all experiment emotion probabilities from the data/continuous folder.
- Parameters:
parameters – Parameters for the data reading process
- Returns:
Tuple with samples, labels
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Method that returns a dataset of fusion probabilities.
- Parameters:
which_set – Which set to use.
batch_size – Batch size for the dataset.
parameters – Additional parameters.
- Returns:
Dataset instance.
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Create a dataset that uses only three emotions.
- Parameters:
which_set – Which set: Train, val or test
batch_size – Batch size
parameters – Additional parameters
- Returns:
Dataset with three emotion labels.
- split_set(all_data: ndarray, all_labels: ndarray, which_set: Set) tuple[numpy.ndarray, numpy.ndarray] ¶
Split all data and labels into train, val and test sets and return the requested one.
- Parameters:
all_data – All data array shape (n_exp * 613, n_modalities * 7)
all_labels – All corresponding labels (n_exp * 613,)
which_set – Train, Val or Test set
- Returns:
Training, validation or test set as specified
src.data.image_data_reader module¶
This file implements the data reading functionality for image data.
- class src.data.image_data_reader.ImageDataReader(name: str = 'image', folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the image dataset from the data/train/image folder
- add_augmentations(dataset: DatasetV2, use_augmentations: bool = True)¶
Function that adds augmentation to the dataset. This helps reduce overfitting of the model.
- Parameters:
dataset – The dataset containing images
use_augmentations – Boolean flag to enable augmentation
- Returns:
The dataset with augmented images
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified image dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the images into a dataset
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the image folders into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.plant_exp_reader module¶
This data reader reads the PlantSpikerBox data from the experiments.
- class src.data.plant_exp_reader.PlantExperimentDataReader(folder: str = 'data/plant', default_label_mode: str = 'expected')¶
Bases:
ExperimentDataReader
This data reader reads the plant spiker box files from the experiments
- cleanup(parameters: Optional[Dict] = None) None ¶
Cleanup method that frees RAM which, due to a bug in garbage collection, is not released automatically.
- Parameters:
parameters – Parameters.
- get_cross_validation_indices(which_set: Set, parameters: Dict) List[int] ¶
Generate a list of indices according to CrossValidation.
- Parameters:
which_set – Which set to use.
parameters – Additional parameters, including cv_portions (the number of cv splits to do) and cv_index (which split to use)
- Returns:
List of indices for the requested cross-validation split.
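One plausible reading of `cv_portions` and `cv_index` is a k-fold split in which one portion is held out. The sketch is an assumption, not the reader's code: the helper name, the `held_out` flag and the rule for assigning items to folds are all hypothetical.

```python
from typing import List


def cross_validation_indices(
    n_items: int, cv_portions: int, cv_index: int, held_out: bool
) -> List[int]:
    """Return the held-out fold (held_out=True) or the remaining indices."""
    if cv_portions <= 0 or not 0 <= cv_index < cv_portions:
        raise ValueError("cv_index must lie in [0, cv_portions)")
    # Assign item i to fold i % cv_portions so fold sizes differ by at most one.
    if held_out:
        return [i for i in range(n_items) if i % cv_portions == cv_index]
    return [i for i in range(n_items) if i % cv_portions != cv_index]
```

Running the same call for every `cv_index` from 0 to `cv_portions - 1` holds out each item exactly once.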
- get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None] ¶
Generator that generates the data
- Parameters:
which_set – Train, val or test set
parameters – Additional parameters, including window (the length of the window to use in seconds)
- Returns:
Generator that yields data and label.
- get_input_shape(parameters: Dict) Tuple[int] ¶
Returns the shape of a preprocessed sample.
- Parameters:
parameters – Parameter dictionary
- Returns:
Tuple that is the shape of the sample.
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
This function returns labels for the dataset
- Parameters:
which_set – Which set to get the labels for.
parameters – Additional parameters.
- Returns:
Label numpy array
- get_raw_data(parameters: Dict) None ¶
Load the raw plant data from the wave files and split it into windows according to the parameters.
- Parameters:
parameters – Additional parameters
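Splitting a recording into windows can be sketched as follows. The default of 10000 samples per second is inferred from the `(window_size * 10000,)` shape given for `preprocess_sample`. Non-overlapping windows and dropping the incomplete tail are assumptions, and `split_into_windows` is a hypothetical helper.

```python
from typing import List, Sequence


def split_into_windows(
    signal: Sequence[float], window_seconds: int, sample_rate: int = 10_000
) -> List[Sequence[float]]:
    """Cut `signal` into consecutive, non-overlapping windows."""
    size = window_seconds * sample_rate
    if size <= 0:
        raise ValueError("window size must be positive")
    # Stop early so that every returned window is complete.
    last_start = len(signal) - size
    return [signal[start:start + size] for start in range(0, last_start + 1, size)]
```

A small sample rate makes the behaviour easy to check by hand: ten samples at 2 samples per second with a 2 second window give two windows of four samples, and the last two samples are dropped.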
- get_raw_expected_labels() ndarray ¶
Load the raw emotions from the expected emotions during the video. The expected emotion means that while the participant is watching a happy video, we expect them to be happy, thus the label is happy.
- Returns:
Labels that are expected from the user.
- get_raw_faceapi_labels() ndarray ¶
Load the raw labels from the faceapi output files.
- Returns:
Labels that are collected from the user’s facial expression.
- get_raw_labels(label_mode: str) ndarray ¶
Get the raw labels per experiment and time. Populates the raw_labels member of this class. The two axes are [experiment_index, time_in_seconds].
- Parameters:
label_mode – Whether to use expected or faceapi labels
- Returns:
Array of all labels in shape (file, second)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Method that returns a dataset of plant data.
- Parameters:
which_set – Which set to use.
batch_size – Batch size for the dataset.
parameters – Additional parameters.
- Returns:
Dataset instance.
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Create a dataset that uses only three emotions.
- Parameters:
which_set – Which set: Train, val or test
batch_size – Batch size
parameters – Additional parameters
- Returns:
Dataset with three emotion labels.
- static prepare_faceapi_labels() None ¶
This function prepares the faceapi labels if they are not computed yet.
- static preprocess_sample(sample: ndarray, parameters: Optional[Dict] = None) ndarray ¶
Gets a sample with shape (window_size * 10000,) and then preprocesses it before using it in the classifier.
- Parameters:
sample – The data sample to preprocess.
parameters – Additional parameters for preprocessing.
- Returns:
The preprocessed sample.
src.data.speech_data_reader module¶
This file implements the data reading functionality for speech data.
- class src.data.speech_data_reader.SpeechDataReader(name: str = 'speech', folder: Optional[str] = None)¶
Bases:
DataReader
Class that reads the speech datasets
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified speech dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dictionary
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the audio files into a dataset
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the audio data from disk.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
- static get_waveform_and_label(file_path: bytes) Tuple[Tensor, Tensor] ¶
Preprocessing function for the audio files that are read from the data folder. Files are read, decoded and padded or truncated.
- Parameters:
file_path – The path of one audio file to read.
- Returns:
Audio tensor and label tensor in a tuple
- static map_emotions(data: ndarray, labels: ndarray)¶
Conversion function that is applied when three emotion labels are required.
- Parameters:
data – The emotional data.
labels – The labels that need to be converted to three emotions.
- static process_crema(x: ndarray, y: int) Tuple[Tensor, Tensor] ¶
Preprocessing function for the crema dataset read from tensorflow_datasets package.
- Parameters:
x – The audio data
y – The label data
- Returns:
Processed audio and label data
- static set_tensor_shapes(x: Tensor, y: Tensor) Tuple[Tensor, Tensor] ¶
Function that sets the tensor shapes in the dataset manually. This fixes an issue where using Dataset.map and numpy_function causes the tensor shape to be unknown. See the issue here: https://github.com/tensorflow/tensorflow/issues/47032
- Parameters:
x – The speech tensor
y – The labels tensor
- Returns:
Tuple with speech and labels tensor
src.data.text_data_reader module¶
This file implements the data reading functionality for text data.
- class src.data.text_data_reader.TextDataReader(folder: str = 'data/train/text')¶
Bases:
DataReader
Class that reads the CSV datasets from the data/train/text folder
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
Get the labels of the specified text dataset as an array
- Parameters:
which_set – Train, val or test set
parameters – Parameter dict (unused)
- Returns:
The labels in an array of shape (num_samples,)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the CSV file into a dataset
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional parameters
- Returns:
The tensorflow Dataset instance
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Main data reading function which reads the CSV file into a dataset and also converts the emotion labels to the three emotion space.
- Parameters:
which_set – Which dataset to use - train, val or test
batch_size – The batch size for the resulting dataset
parameters – Additional arguments
- Returns:
The tensorflow Dataset instance
src.data.watch_exp_reader module¶
This data reader reads the Happimeter data from the experiments.
- class src.data.watch_exp_reader.WatchExperimentDataReader(folder: str = 'data/watch', default_label_mode: str = 'expected')¶
Bases:
ExperimentDataReader
This data reader reads the watch csv files from the experiments
- get_cross_validation_indices(which_set: Set, parameters: Dict) List[int] ¶
Generate a list of indices according to CrossValidation.
- Parameters:
which_set – Which set to use.
parameters – Additional parameters, including cv_portions (the number of cv splits to do) and cv_index (which split to use)
- Returns:
List of indices for the requested cross-validation split.
- get_data_generator(which_set: Set, parameters: Dict) Generator[Tuple[ndarray, ndarray], None, None] ¶
Generator that generates the data
- Parameters:
which_set – Train, val or test set
parameters – Additional parameters, including window (the length of the window to use in seconds)
- Returns:
Generator that yields data and label.
- static get_input_shape(parameters: Dict) tuple ¶
Returns the shape of a preprocessed sample.
- Parameters:
parameters – Parameter dictionary
- Returns:
Tuple that is the shape of the sample.
- get_labels(which_set: Set = Set.TRAIN, parameters: Optional[Dict] = None) ndarray ¶
This function returns labels for the dataset
- Parameters:
which_set – Which set to get the labels for.
parameters – Additional parameters.
- Returns:
Label numpy array
- get_raw_data(parameters: Dict) None ¶
Load the raw watch data from the csv files and split it into windows according to the parameters.
- Parameters:
parameters – Additional parameters
- get_raw_expected_labels() ndarray ¶
Load the raw emotions from the expected emotions during the video. The expected emotion means that while the participant is watching a happy video, we expect them to be happy, thus the label is happy.
- Returns:
Labels that are expected from the user.
- get_raw_faceapi_labels() ndarray ¶
Load the raw labels from the faceapi output files.
- Returns:
Labels that are collected from the user’s facial expression.
- get_raw_labels(label_mode: str) ndarray ¶
Get the raw labels per experiment and time. Populates the raw_labels member of this class. The two axes are [experiment_index, time_in_seconds].
- Parameters:
label_mode – Whether to use expected or faceapi labels
- Returns:
Array of all labels in shape (file, second)
- get_seven_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Method that returns a dataset of watch data.
- Parameters:
which_set – Which set to use.
batch_size – Batch size for the dataset.
parameters – Additional parameters.
- Returns:
Dataset instance.
- get_three_emotion_data(which_set: Set, batch_size: int = 64, parameters: Optional[Dict] = None) DatasetV2 ¶
Create a dataset that uses only three emotions.
- Parameters:
which_set – Which set: Train, val or test
batch_size – Batch size
parameters – Additional parameters
- Returns:
Dataset with three emotion labels.
- static prepare_faceapi_labels() None ¶
This function prepares the faceapi labels if they are not computed yet.
Module contents¶
Package responsible for data reading and processing