Implementation

Principal component analysis¶

Principal component analysis (PCA) is a technique used for dimensionality reduction in data analysis and machine learning. The technique is based on the idea of finding a new set of linearly uncorrelated variables, called principal components, that can explain the most variance in the original data.

Here's an example of how to perform PCA in Python using the scikit-learn library:

from sklearn.decomposition import PCA
import numpy as np

# Generate sample data
data = np.array([[1, 2], [1, 4], [1, 0],
                [4, 2], [4, 4], [4, 0]])

# Define the number of principal components
num_components = 2

# Create the PCA model
pca = PCA(n_components=num_components)

# Fit the PCA model to the data
pca.fit(data)

# Transform the data
transformed_data = pca.transform(data)

# Get the principal components
principal_components = pca.components_

# Get the explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

print(transformed_data)
# Output: [[-1.56669890e+00 -5.04134475e-01]
#          [-5.04134475e-01  1.566698

In this example, we first generate some sample data in the form of a 2D array. Then we define the number of principal components we want to use, in this case 2. Next, we create a PCA model using the PCA() function from the scikit-learn library with the desired number of components. Then we fit the PCA model to the data using the fit() method.

We use the transform() method on the data to get the transformed data, which is represented in the new coordinate system formed by the principal components. We can also get the principal components themselves by accessing the components_ attribute of the fitted PCA model. Also, the explained variance ratio of each principal component can be obtained by accessing the explained_variance_ratio_ attribute of the fitted PCA model.

It's important to mention that, PCA is a technique that can be used to reduce the number of features in a dataset while preserving as much of the original information as possible. This can be useful when working with high-dimensional datasets where the number of features is very large.

Singular value decomposition¶

Singular Value Decomposition (SVD) is a technique used for matrix factorization and dimensionality reduction similar to PCA. It is used for a wide range of applications such as image and video compression, linear algebra, and more.

SVD decomposes a matrix into three matrices: U, S and V. where U and V are orthogonal matrices and S is a diagonal matrix. The diagonal elements of S, called singular values, are nonnegative and represent the strength of each principal component in descending order.

Here's an example of how to perform SVD in Python using the numpy library:

import numpy as np

# Generate sample data
data = np.array([[1, 2], [1, 4], [1, 0],
                [4, 2], [4, 4], [4, 0]])

# Perform SVD
U, S, V = np.linalg.svd(data, full_matrices=True)

# Print the results
print('U:', U)
print('S:', S)
print('V:', V)

In this example, we first generate some sample data in the form of a 2D array. Then we perform SVD on the data using the svd() function from the numpy library. The function returns three matrices: U, S, and V. U and V are orthogonal matrices and S is a diagonal matrix with the singular values on the diagonal.

It's important to mention that, SVD is a technique that can be used to reduce the number of features in a dataset while preserving as much of the original information as possible. Also, SVD can be used for data compression, denoising, and other applications.