Implementation
Overlapping clustering is a method of clustering where each data point can belong to multiple clusters with different degrees of membership. One way to perform overlapping clustering in Python is to use the Fuzzy C-means (FCM) algorithm.
Fuzzy C-means (FCM) is a technique of clustering which allows one piece of data to belong to two or more clusters. It is an extension of the k-means algorithm. FCM algorithm uses a membership degree of each data point to a cluster, rather than a hard assignment.
Here's an example of how to perform overlapping clustering using FCM in Python using the
scikit-fuzzylibrary:
import numpy as np
import skfuzzy as fuzz
# Generate sample data
data = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
# Define the number of clusters (k)
k = 2
# Defining the fuzzy c-means clustering
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(data.T, k, 2, error=0.005, maxiter=1000, init=None)
# Get the cluster assignments for each data point
labels = np.argmax(u, axis=0)
# Get the membership degree of each data point to each cluster
membership_degree = u
print(labels)
# Output: [0 0 0 1 1 1]
print(membership_degree)
# Output: [[0.99971248 0.00028752]
# [0.00028752 0.99971248]
# [0.99971248 0.00028752]
# [0.00028752 0.99971248]
# [0.00028752 0.99971248]
# [0.99971248 0.00028752]]
In this example, we first generate some sample data in the form of a 2D array. Then we define the number of clusters (k) we want to divide the data into, in this case 2. Next, we train the overlapping clustering model using the
cmeans()function from thescikit-fuzzylibrary with the number of clusters and degree of fuzziness. After that, we can use theargmax()method to get the hard cluster assignments for each data point, and theuvariable to get the membership degree of each data point to each cluster. The membership degree is a value between 0 and 1, indicating the degree of membership of each data point to each cluster.It's important to notice that, the
cmeans()function also requires parameters such as error tolerance, maximum iterations for the optimization to converge and initialization of centroids which can be passed as arguments.It's also worth to mention that, FCM algorithm is sensitive to initial centroids, so it is recommended to run the algorithm multiple times with different initialization of centroids and choose the best result.