Feature Extraction

General Explanation

This modular pipeline processes movement data from accelerometers to extract features that can be used for emotion recognition and other analyses. It consists of predefined pipeline components, each represented by a class. These components include data import, filtering, scaling, and feature extraction. The modular structure allows users to set "checkpoints" at various stages, making it easy to reuse intermediate outputs (e.g., scaled data or extracted features) for different configurations or experiments.

Figure: Overview of pipeline

Use Cases

Extracting meaningful features from raw accelerometer data for model training.

Preparing data for supervised learning.

Experimenting with different preprocessing steps (e.g., scaling methods, time windows or cutoff frequencies).

Input and Configuration Data

Input/Configuration	Description
Combined DataFrame	Pre-combined dataset of accelerometer and self-report data.
Raw Accelerometer Data	CSV file with columns for x, y, z axes, timestamps (timeOfNotification), and participantId.
`accel_path`	Path to the raw accelerometer data file.
`reports_path`	Path to the self-report data file (optional).
`combined_data_path`	Path to the pre-combined dataset (optional).
`features_data_path`	Path to the pre-extracted features file (optional).
`cutoff_frequency`	Cutoff frequency for the low-pass filter (default: 5 Hz).
`data_frequency`	Sampling rate of the accelerometer data (default: 25 Hz).
`order`	Filter order (default: 4).
`scaler_type`	Type of scaling to apply to the accelerometer data:
	- 'standard': StandardScaler (zero mean, unit variance).
	- 'minmax': MinMaxScaler (scales to a [0,1] range).
	- 'none': No scaling applied.
`window_length`	Length of the sliding window (in seconds, e.g., 60 seconds).
`window_step_size`	Step size for the sliding window (in seconds, e.g., 30 seconds).
`selected_domains`	Domains to extract features from:
	- Options: 'time_domain', 'spatial', 'frequency', 'statistical', 'wavelet'.
	- Default: None (extract all domains).
`include_magnitude`	Whether to include magnitude-based features.
`label_columns`	Emotion label columns to include in the output (e.g., arousal, valence).
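
As a rough illustration of how `scaler_type`, `window_length`, `window_step_size`, and `data_frequency` might interact, the sketch below scales one axis and computes window boundaries. It uses only the Python standard library. The function names `scale_values` and `sliding_windows` are hypothetical, and dropping a trailing partial window is an assumption rather than documented pipeline behavior.

```python
import statistics


def scale_values(values, scaler_type="standard"):
    """Scale one axis according to the 'standard' / 'minmax' / 'none' options."""
    if scaler_type == "none":
        return list(values)
    if scaler_type == "standard":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population std, as StandardScaler uses
        return [(v - mean) / std if std else 0.0 for v in values]
    if scaler_type == "minmax":
        low, high = min(values), max(values)
        span = high - low
        return [(v - low) / span if span else 0.0 for v in values]
    raise ValueError(f"unknown scaler_type: {scaler_type!r}")


def sliding_windows(n_samples, window_length, window_step_size, data_frequency=25):
    """Return (start, stop) sample indices for every full window.

    window_length and window_step_size are in seconds, data_frequency in Hz.
    Assumption: a trailing window shorter than window_length is dropped.
    """
    size = int(window_length * data_frequency)
    step = int(window_step_size * data_frequency)
    if size <= 0 or step <= 0:
        raise ValueError("window length and step must cover at least one sample")
    return [(start, start + size) for start in range(0, n_samples - size + 1, step)]


# Two minutes of 25 Hz data, 60 s windows with a 30 s step (50% overlap).
windows = sliding_windows(3000, window_length=60, window_step_size=30)
print(windows)
```

With these example values each window holds 1500 samples and consecutive windows share half of them.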

Outputs

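The pipeline outputs a dataset of extracted features. The feature columns are grouped into the domains selected via `selected_domains`: 'time_domain', 'spatial', 'frequency', 'statistical', and 'wavelet'. The features in each domain are listed below.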

Time Domain Features

Mean (x, y, z, magnitude)
Standard Deviation (x, y, z, magnitude)
Variance (x, y, z, magnitude)
Root Mean Square (RMS) (x, y, z, magnitude)
Maximum (x, y, z, magnitude)
Minimum (x, y, z, magnitude)
Peak-to-Peak Amplitude (x, y, z, magnitude)
Skewness (x, y, z, magnitude)
Kurtosis (x, y, z, magnitude)
Zero-Crossing Rate (x, y, z, magnitude)
Signal Magnitude Area (SMA)
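
The time-domain features listed above can be sketched for a single window as follows. This is a minimal standard-library illustration, not the pipeline's own code: the function names are hypothetical, and the exact definitions (population variance, excess kurtosis, zero-crossing rate as sign changes divided by the number of sample pairs, SMA averaged over the window) are assumptions.

```python
import math
import statistics


def time_domain_features(signal):
    """Compute per-axis time-domain features for one window of samples."""
    n = len(signal)
    mean = statistics.fmean(signal)
    variance = statistics.pvariance(signal, mu=mean)  # population variance
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in signal) / n)

    # Third and fourth central moments for skewness and (excess) kurtosis.
    m3 = sum((v - mean) ** 3 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    skewness = m3 / std ** 3 if std else 0.0
    kurtosis = m4 / std ** 4 - 3.0 if std else 0.0

    # A zero crossing is a sign change between consecutive samples.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (n - 1) if n > 1 else 0.0

    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "rms": rms,
        "max": max(signal),
        "min": min(signal),
        "peak_to_peak": max(signal) - min(signal),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "zero_crossing_rate": zero_crossing_rate,
    }


def signal_magnitude_area(x, y, z):
    """Mean of |x| + |y| + |z| over the window (one common SMA definition)."""
    return sum(abs(a) + abs(b) + abs(c) for a, b, c in zip(x, y, z)) / len(x)


features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features["rms"], features["zero_crossing_rate"])
```

The same function would be applied to each of x, y, z and, when `include_magnitude` is set, to the magnitude signal.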

Frequency Domain Features

Dominant Frequency (x, y, z, magnitude)
Spectral Entropy (x, y, z, magnitude)
Power Spectral Density (PSD) Mean (x, y, z, magnitude)
Energy (x, y, z, magnitude)
Bandwidth (x, y, z, magnitude)
Spectral Centroid (x, y, z, magnitude)
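
To make the frequency-domain features above more concrete, the sketch below derives them from a one-sided power spectrum. It is an illustration under stated assumptions, not the pipeline's implementation: it uses a naive DFT from the standard library (a real pipeline would more likely use an FFT routine), removes the window mean first, and the normalizations for energy, PSD mean, and bandwidth are one plausible choice among several. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, data_frequency=25):
    """One-sided power spectrum of a mean-removed window via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        freqs.append(k * data_frequency / n)  # bin centre in Hz
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def frequency_domain_features(signal, data_frequency=25):
    """Per-axis frequency-domain features for one window."""
    freqs, power = power_spectrum(signal, data_frequency)
    total = sum(power)
    names = ("dominant_frequency", "spectral_entropy", "psd_mean",
             "energy", "bandwidth", "spectral_centroid")
    if total == 0:  # constant signal: no spectral content
        return dict.fromkeys(names, 0.0)

    peak = max(range(len(power)), key=power.__getitem__)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    spread = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)

    return {
        "dominant_frequency": freqs[peak],
        "spectral_entropy": entropy,
        "psd_mean": total / len(power),
        "energy": total,
        "bandwidth": math.sqrt(spread),
        "spectral_centroid": centroid,
    }


# One second of a 2 Hz sine sampled at 25 Hz.
sine = [math.sin(2 * math.pi * 2 * t / 25) for t in range(25)]
print(frequency_domain_features(sine)["dominant_frequency"])
```

Because the naive DFT is quadratic in the window size, this form is only practical for short windows. It is meant to show the definitions, not to be fast.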

Statistical Features

25th Percentile (x, y, z, magnitude)
75th Percentile (x, y, z, magnitude)
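
The two percentile features can be sketched with the standard library as shown below. The function name `quartile_features` and the output key names are hypothetical, and the choice of the "inclusive" method (linear interpolation between the window's own minimum and maximum) is an assumption about how the pipeline interpolates.

```python
import statistics


def quartile_features(signal):
    """25th and 75th percentiles of one window (needs at least two samples)."""
    q1, _median, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    return {"percentile_25": q1, "percentile_75": q3}


print(quartile_features([1, 2, 3, 4, 5]))
```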

Wavelet Domain Features

Wavelet Energy (Approximation) (x, y, z, magnitude)
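
The source does not state which wavelet or decomposition level is used, so the following is only a sketch of the idea, assuming a single-level Haar transform. The approximation coefficients are scaled pairwise sums of neighbouring samples, and the feature is the sum of their squares. The function name is hypothetical.

```python
import math


def haar_approximation_energy(signal):
    """Energy of the level-1 Haar approximation coefficients of one window.

    Assumption: Haar wavelet, one level; an odd trailing sample is ignored.
    """
    usable = len(signal) - len(signal) % 2
    approx = [
        (signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, usable, 2)
    ]
    return sum(c * c for c in approx)


# A constant signal keeps all of its energy in the approximation band,
# while a sample-to-sample alternation leaves none there.
print(haar_approximation_energy([1.0, 1.0, 1.0, 1.0]))
print(haar_approximation_energy([1.0, -1.0, 1.0, -1.0]))
```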

Spatial Features

Euclidean Norm (Magnitude)
Mean Tilt Angle (Pitch, Roll)
Correlation between Axes (x-y, x-z, y-z)
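
A minimal sketch of the spatial features for one window is given below, again using only the standard library. The tilt-angle convention (pitch from `atan2(-x, sqrt(y² + z²))`, roll from `atan2(y, z)`, both in degrees) is one common choice and is an assumption here, as are the function names, the output key names, and the use of the window mean of the magnitude as the norm feature.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def spatial_features(x, y, z):
    """Magnitude, mean tilt angles, and inter-axis correlations for one window."""
    samples = list(zip(x, y, z))
    magnitude = [math.sqrt(a * a + b * b + c * c) for a, b, c in samples]
    # Assumed convention: angles relative to gravity along +z, in degrees.
    pitch = [math.degrees(math.atan2(-a, math.sqrt(b * b + c * c))) for a, b, c in samples]
    roll = [math.degrees(math.atan2(b, c)) for a, b, c in samples]
    return {
        "magnitude_mean": statistics.fmean(magnitude),
        "pitch_mean": statistics.fmean(pitch),
        "roll_mean": statistics.fmean(roll),
        "corr_xy": pearson(x, y),
        "corr_xz": pearson(x, z),
        "corr_yz": pearson(y, z),
    }


# Device lying flat and still: gravity along z, no tilt.
print(spatial_features([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```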