Lab 1 - Unsupervised Learning

Relates to the unsupervised learning lecture.

Q1) Given the following three attributes:

Choose any 2 patients and calculate:

I chose 3 and 4.

a) What is the Euclidean distance between each of them?

ed = $\sqrt{(32 - 18)^{2} + (110 - 85)^{2} + (23 - 27)^{2}} =$

= $\sqrt{196 + 625 + 16}$

= $\sqrt{837} \approx 28.93$
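
As a quick check, the calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `p`, `q` and `euclidean` are made up for illustration, and which vector belongs to patient 3 and which to patient 4 is an assumption (the distance is the same either way).

```python
import math

# Attribute values taken from the calculation above. Which vector is
# patient 3 and which is patient 4 is an assumption; the result is symmetric.
p = (32, 110, 23)
q = (18, 85, 27)

def euclidean(a, b):
    """Square root of the sum of squared per-attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

squared_sum = sum((x - y) ** 2 for x, y in zip(p, q))  # 196 + 625 + 16
ed = euclidean(p, q)
print(squared_sum, round(ed, 2))
```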

b) What is the Manhattan distance between each of them?

$|32 - 18| + |110 - 85| + |23 - 27| =$

= $14 + 25 + 4 = 43$
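
The same sum of absolute differences can be checked in Python. Again a minimal sketch: `p`, `q` and `manhattan` are illustrative names, and the assignment of the two vectors to patients 3 and 4 is assumed.

```python
# Attribute values taken from the calculation above; the mapping of each
# vector to patient 3 or patient 4 is an assumption.
p = (32, 110, 23)
q = (18, 85, 27)

def manhattan(a, b):
    """Sum of absolute per-attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

md = manhattan(p, q)  # 14 + 25 + 4
print(md)
```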

Q2) Briefly describe three different forms of hierarchical clustering methods.

Q3) Describe two clustering methods and their advantages/disadvantages.

Hierarchical: it is a way of grouping things together based on how similar or different they are from each other. For example, if we have different animals, we start by putting each animal in its own group. Then we compare the groups and merge the two most similar ones into a new group. We keep doing this until we are left with a few big groups that represent different categories of animals. Because close clusters are merged step by step, the result can be drawn as a dendrogram.
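
The merging procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses single linkage (distance between the closest pair of members) with Euclidean distance, the function name `single_linkage` and the sample points are made up, and it stops at a chosen number of clusters instead of drawing the dendrogram.

```python
import math

def single_linkage(points, target_clusters=1):
    """Start with one cluster per point and repeatedly merge the two closest."""
    clusters = [[i] for i in range(len(points))]  # clusters hold point indices
    merges = []  # record of (cluster_a, cluster_b, distance) for each merge
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges

# Hypothetical 2-D sample data: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(points, target_clusters=3)
print(clusters)
```

The `merges` list holds the order and distance of each merge, which is the information a dendrogram displays.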

Pros:

Cons:

Can't reallocate an object that has been ‘incorrectly’ grouped at an early stage.
Different distance metrics might generate different results.

K-Means: we first decide how many groups we want, for example 3. We then randomly pick 3 “items” to be the centre of each group. Next, we look at each item, see which centre it is closest to, and put it in that centre's group. Once every item has been assigned, we find a new centre for each group by taking the average of all the items in it. We keep repeating the assignment and averaging steps until the centres no longer change much.
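
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `k_means`, the `tol`, `max_iter` and `seed` parameters, the sample points, and the choice to keep an empty group's old centre are all assumptions made for the example.

```python
import math
import random

def k_means(points, k, max_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # randomly pick k items as initial centres
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each item with its closest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # Update step: the new centre is the average of the items in the group.
        new_centres = []
        for c, group in enumerate(groups):
            if group:
                new_centres.append(tuple(sum(v) / len(group) for v in zip(*group)))
            else:
                new_centres.append(centres[c])  # assumption: keep old centre if empty
        # Stop once no centre has moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, groups

# Hypothetical 2-D sample data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, groups = k_means(points, k=2)
print(sorted(sorted(g) for g in groups))
```

Because the starting centres are random, a different `seed` may label the groups in a different order, and on less clearly separated data it may also change the final grouping.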

Pros:

Cons:
