
Unsupervised


What is Unsupervised Learning?

This type of machine learning algorithm is used to analyse and cluster unlabelled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Their ability to discover similarities and differences in information makes them ideal for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.

How does it work?

Unsupervised learning is a type of machine learning where the model is not provided with labelled data. Instead, it is given a dataset and left to discover patterns and structure on its own.

The general process of unsupervised learning is:

    1. Collect and preprocess the data
    2. Define the model and its parameters
    3. Train the model on the data
    4. Evaluate the model's performance
    5. Use the model to make predictions or extract insights from the data
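The five steps above can be sketched with a tiny k-means clusterer implemented in NumPy. The synthetic two-blob dataset, the choice of k = 2, and the iteration count are illustrative assumptions, not part of the original text:

```python
import numpy as np

# 1. Collect and preprocess the data: two synthetic 2-D blobs, standardised
#    to zero mean and unit variance per feature.
rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
mu, sd = X_raw.mean(axis=0), X_raw.std(axis=0)
X = (X_raw - mu) / sd

# 2. Define the model and its parameters: k-means with k = 2, centroids
#    initialised at randomly chosen data points.
k = 2
centroids = X[rng.choice(len(X), size=k, replace=False)]

# 3. Train the model on the data (Lloyd's algorithm).
for _ in range(20):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Keep a centroid in place if its cluster happens to be empty.
    centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(k)])

# 4. Evaluate the model: mean distance of each point to its centroid.
inertia = np.linalg.norm(X - centroids[labels], axis=1).mean()

# 5. Use the model: assign a new observation (standardised the same way).
new_point = (np.array([[4.5, 5.5]]) - mu) / sd
new_label = np.linalg.norm(new_point - centroids, axis=1).argmin()
```

A real project would typically reach for a library implementation such as scikit-learn's `KMeans`, which adds smarter initialisation and convergence checks; the loop above just makes the training step visible.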

Unsupervised learning is often used as a preprocessing step before supervised learning, where the goal is to extract useful features from the data so that a supervised model trained on those features performs better.
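As a sketch of this preprocessing idea, the snippet below (the synthetic dataset and all sizes are illustrative assumptions) uses PCA, computed via NumPy's SVD, as the unsupervised step, then trains a simple nearest-centroid classifier on the extracted features:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two labelled classes in 10 dimensions; only the first 2 dims are informative.
X0 = rng.normal(0, 1, (100, 10)); X0[:, :2] += 4
X1 = rng.normal(0, 1, (100, 10)); X1[:, :2] -= 4
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Unsupervised step: PCA via SVD, keeping the top 2 principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T          # 200 x 2 matrix of learned features

# Supervised step: nearest-centroid classifier trained on the PCA features.
c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```

Because the class structure lies in a low-dimensional subspace, the unlabelled PCA step discards the 8 uninformative dimensions before any labels are used.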

Types

There are several types of unsupervised learning algorithms such as:

  1. Clustering: Clustering algorithms group similar data points together into clusters. The most common clustering algorithms are k-means, hierarchical clustering and density-based clustering.
  2. Dimensionality reduction: Dimensionality reduction algorithms reduce the number of features in a dataset while preserving as much of the original information as possible. The most common dimensionality reduction algorithms are principal component analysis (PCA) and singular value decomposition (SVD).
  3. Association rule learning: Association rule learning algorithms identify relationships between variables in a dataset. The most common association rule learning algorithm is the Apriori algorithm.
  4. Autoencoder: An autoencoder is a type of neural network architecture used for unsupervised learning, dimensionality reduction and feature learning. It is composed of two main parts: an encoder and a decoder. The encoder compresses the input data into a lower-dimensional representation, called the bottleneck or latent representation, and the decoder reconstructs the input data from that bottleneck representation.
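To make the encoder/decoder split concrete, here is a minimal linear autoencoder in NumPy, trained by plain gradient descent on synthetic data lying near a one-dimensional subspace. The layer sizes, learning rate, and step count are illustrative assumptions; practical autoencoders use nonlinear layers and a deep-learning framework:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 3-D points that lie close to a 1-D line, plus small noise.
t = rng.normal(0, 1, (200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + rng.normal(0, 0.05, (200, 3))

# Encoder and decoder are single linear layers; the bottleneck has size 1.
W_enc = rng.normal(0, 0.1, (3, 1))
W_dec = rng.normal(0, 0.1, (1, 3))

lr = 0.01
for _ in range(500):
    Z = X @ W_enc              # encode: 3-D input -> 1-D bottleneck
    X_hat = Z @ W_dec          # decode: 1-D bottleneck -> 3-D reconstruction
    err = X_hat - X
    # Gradients of the mean squared reconstruction error w.r.t. each layer.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

With linear layers and a squared-error loss, the trained bottleneck spans the same subspace PCA would find, which is why autoencoders are often described as a nonlinear generalisation of PCA.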