Principal Component Analysis (PCA)
This class handles Principal Component Analysis (PCA), a mathematical method used to simplify large datasets. PCA reduces the number of variables (columns) in the data while keeping as much important information as possible. It’s especially helpful when working with complex data, like features extracted from accelerometer readings.
Step-by-Step Explanation:
- What is PCA?
- PCA transforms the data into a smaller set of "principal components." These are new variables that summarize the original data.
- For example, if you have 100 columns, PCA can reduce it to just a few components that still capture most of the data's variation.
- Input Data:
- The input is a table (DataFrame) with many columns (features).
- When PCA is Applied:
- The class applies PCA only if
apply_pca
is set toTrue
. If not, the data remains unchanged. - Variance: You can specify how much of the original data’s information (variance) you want to keep. For example:
variance=0.95
means PCA will retain 95% of the data's important information.
- The class applies PCA only if
- How It Works:
- During fitting (
fit
method):- The class calculates the PCA transformation based on the input data.
- This identifies which components (combinations of columns) best summarize the data.
- During transformation (
transform
method):- The data is transformed into its principal components, reducing the number of columns.
- During fitting (
- Output:
- If PCA is applied, the output is a simplified table (DataFrame) with fewer columns (principal components).
- If PCA is not applied, the data remains the same.
- Key Features:
- Automatic Adjustment: PCA automatically determines how many components to use based on the
variance
value. - Flexibility: You can turn PCA on or off using
apply_pca
.
- Automatic Adjustment: PCA automatically determines how many components to use based on the
- Error Handling:
- If PCA is enabled but hasn't been "fitted" to the data, the class ensures that an error doesn’t occur by skipping PCA.