InfinityCodeX


Unsupervised Learning in Machine Learning.

As we all know, there are basically 3 types of Machine Learning:

(i) Supervised Learning

(ii) Unsupervised Learning

(iii) Reinforcement Learning

In our previous articles we covered Supervised Learning & the algorithms which come under it.

So our Agenda for this post is :

(i) What is Unsupervised Learning?

-Unsupervised Learning is a type of Machine Learning that looks for previously undetected patterns in a dataset with no pre-existing labels & a minimum of human supervision.

Unsupervised learning algorithms are not given any “answers” to learn from; they must make sense of the data by observation alone.

(ii) Example of Unsupervised Learning

Let’s say there is a little kid who has just joined playschool, & his teacher gives him a task: all the balls on the floor should be put in the red bucket & all the cars on the floor should be put in the blue bucket.

The kid puts all the balls in the red bucket & puts all the toys such as trains, trucks, motorcycles, etc. in the blue bucket. In doing so, the kid recognizes by observation that toys with wheels, mirrors, a plastic body, or a roughly quadrilateral shape come under the category of cars, & that things which can bounce & have a circular shape come under the ball category.

(iii) Why use Unsupervised Learning?

* Unsupervised Learning can find previously unknown patterns in data.

* It works on unlabeled data, which makes our work easier. Unlabeled data is easier to collect than labeled data, which needs manual intervention.

* Unsupervised Learning methods help us find features that can be useful for categorization or for finding associations.

* It can take place in real time, so all the input data is analyzed & labeled in the presence of the learner.

* It can detect outliers & defects in data.

(iv) Types of Unsupervised Learning

More precisely, Unsupervised Learning is broken down into 2 parts.

(I) Non-Parametric Unsupervised Learning (Clustering)

(II) Parametric Unsupervised Learning (Association)

Let’s understand both of them one by one.

(I) Non-Parametric Unsupervised Learning (Clustering)

In Non-Parametric Unsupervised Learning the data is grouped into clusters. Clustering is a type of unsupervised learning where you find patterns in the data you are working on, such as shape, size, color, etc., which can be used to group data items or create clusters. This method is commonly used on small datasets to model & analyze the data. Non-Parametric models do not require the modeler to make any assumptions about the distribution of the population, & so are sometimes referred to as distribution-free methods.

Basically, there are 2 types of clustering :

(a) Hard Clustering: In hard Clustering, each data point either belongs to a cluster completely or not.

(b) Soft Clustering: In soft clustering, the object may belong to several clusters with a fractional degree of membership in each.
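To make the difference between the two concrete, here is a minimal Python sketch. The 1-D centers, the sample point & the distance-based weighting used in soft_assignment are assumptions made only for illustration; real soft clustering methods define membership in their own ways.

```python
import math

def hard_assignment(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))

def soft_assignment(point, centers, sharpness=0.5):
    """Soft clustering: return one membership weight per center (sums to 1).

    The weights are a softmax over negative squared distances. This is an
    illustrative choice, not the only way to define fractional membership.
    """
    exponents = [-sharpness * (point - c) ** 2 for c in centers]
    largest = max(exponents)  # subtracted for numerical stability
    scores = [math.exp(e - largest) for e in exponents]
    total = sum(scores)
    return [s / total for s in scores]

centers = [0.0, 4.0]  # two hypothetical 1-D cluster centers
point = 1.5

print("hard:", hard_assignment(point, centers))
print("soft:", [round(w, 3) for w in soft_assignment(point, centers)])
```

The hard version commits the point to exactly one cluster, while the soft version says how strongly it belongs to each one.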

Clustering algorithms are based on 4 different types of models. The most popular ones are :

(1) Connectivity Models: These models are based on the notion that data points closer together in data space exhibit more similarity to each other than data points lying farther away. Connectivity models can follow either a bottom-up approach (start with every point in its own cluster & keep merging) or a top-down approach (start with one big cluster & keep splitting).

[Figures: Top-Down Approach, Bottom-Up Approach]

(2) Centroid models: These are iterative clustering algorithms in which the notion of similarity is derived from the closeness of a data point to the centroid of each cluster. K-Means is a popular clustering algorithm that falls into this category.

(3) Distribution Models: These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution (for example, the Normal, i.e. Gaussian, distribution).

(4) Density models: These models search the data space for areas of varied density of data points. They isolate the different density regions & assign the data points within each region to the same cluster.
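As an illustration of the centroid model in (2), here is a minimal sketch of the K-Means idea in plain Python. The sample points & the choice to pass the starting centroids in by hand are assumptions made to keep the example reproducible; practical implementations usually pick the starting centroids randomly or with a scheme such as k-means++.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means (Lloyd's algorithm) on tuples of equal length.

    `centroids` holds the starting centroids; returns (centroids, labels).
    """
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical 2-D data: three points near (1, 1) and three near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

# Starting from the first two points is a deliberately poor but reproducible choice.
centroids, labels = kmeans(points, centroids=points[:2])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Even though both starting centroids sit in the same group, the repeated assign & update steps pull one of them over to the second group.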

Q.) What is Hierarchical Clustering?

-Hierarchical Clustering, as the name suggests, is an algorithm that builds a hierarchy of clusters. In its bottom-up form, the algorithm starts with every data point assigned to a cluster of its own. Then the 2 nearest clusters are merged into one cluster, & this step is repeated. The algorithm terminates when there is only a single cluster left.
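The merging procedure described above can be sketched in a few lines of Python. This is only a rough illustration: the 1-D sample values are made up, & measuring the distance between two clusters by their closest members (known as single linkage) is an assumption, since other linkage rules are equally valid.

```python
def agglomerative(points, target_clusters=1):
    """Bottom-up clustering of 1-D points using single linkage.

    Returns (merges, clusters): the merges in the order they happened
    and the clusters that remain at the end.
    """
    clusters = [[p] for p in points]  # every point starts in its own cluster
    merges = []
    while len(clusters) > target_clusters:
        best = None
        # Find the two clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges, clusters

merges, clusters = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0])
for left, right in merges:
    print(left, "+", right)
```

The printed merges, read from first to last, are the hierarchy: the closest pair joins first & the final merge leaves a single cluster. Passing a larger target_clusters stops the merging earlier.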

Q.) How does clustering work?

So let’s say we have a basket full of different types of fruits.

Now we lay all the fruits out on the table & let an algorithm work on them. By checking the color, the size, the shape & so on, it comes to the conclusion that there are 2 kinds of fruit, i.e. apples & oranges. We then group the same fruits together, i.e. all the apples in one group & all the oranges in another.

(II) Parametric Unsupervised Learning (Association)

In Parametric Unsupervised Learning we assume a parametric distribution of the data: the sample data is assumed to come from a population that follows a probability distribution based on a fixed set of parameters. Association is the kind of Unsupervised Learning where you find the dependencies of one data item on another & map them so that they can be put to profitable use. Algorithms such as the Apriori Algorithm & the FP-Growth Algorithm are usually used for this. This case is much more difficult than standard supervised learning because there are no answer labels available, & hence there is no direct measure of accuracy to check the result against.
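To give a feel for what "dependencies between items" means, here is a small Python sketch that counts how often pairs of items occur together. It is not the Apriori or FP-Growth algorithm, only the support & confidence measures that such algorithms are built around, restricted to pairs. The baskets & the min_support value are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence of "x -> y": of the baskets with x, how many also have y.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

baskets = [  # hypothetical shopping baskets
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
    ["bread", "butter", "jam"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "jam -> bread" with high confidence says that baskets containing jam usually contain bread as well, which is the kind of dependency association mining looks for.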

Q.) Example of Association.

So this is basically how association works under unsupervised learning.

(vi) Applications of Unsupervised Learning

* On the basis of similarities, clustering automatically splits the dataset into a suitable number of groups.

* Association mining identifies sets of items which often occur together in our dataset.

* It helps to detect defects in the dataset which were not detected initially.

* It maps data items to the other items they depend on.

* It cleans the dataset by removing features which are not really required for the machine to learn from.

It is used in well-known applications such as :

(a) Marketing

Amazon learns from customers' purchases & then recommends products which are frequently bought together.

(b) Banks

Banks use it for credit card fraud detection. The algorithm studies the various patterns in how users use their credit cards. If a card is used in ways that do not match its usual behavior, an alarm is generated, possibly indicating fraud.

(c) Biology

Classification of different types of plants & animals based on their features.
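The fraud detection idea from (b) can be illustrated with a very small Python sketch: learn what "normal" looks like from past amounts, without any fraud labels, & raise an alarm for amounts far away from it. The transaction amounts & the 3-standard-deviation threshold are assumptions chosen for illustration; real fraud systems look at many more signals than the amount.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard
    deviations away from the mean of the historical amounts."""
    mu = mean(history)
    sigma = stdev(history)
    return [x for x in new_amounts if abs(x - mu) > threshold * sigma]

# Hypothetical past card payments of one user, with no fraud labels attached.
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8, 39.9, 44.1]

print(flag_unusual(history, [43.0, 49.5, 920.0]))  # [920.0]
```

Only the payment that does not match the user's usual behavior is flagged; whether it really is fraud still has to be checked by other means.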

Unsupervised Learning also has some disadvantages :

* Unsupervised Learning is more difficult than Supervised Learning.

* The results are less accurate because the input data is not known & not labeled by people in advance.

* The user needs to spend time interpreting & labeling the classes which result from the classification.

* The spectral classes do not always correspond to informational classes.