Relates to the supervised learning lecture.

Q1) Name one advantage and one disadvantage of using the KNN classifier:

  • Advantage: K-Nearest Neighbours (KNN) is easy to interpret, making it a suitable choice when transparency and understanding the model’s decisions are important.
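  • Disadvantage: KNN builds no explicit model of the data. It classifies each point from its nearest stored training examples, so it is sensitive to noisy points, degrades on high-dimensional datasets (where distances become less informative) and tends to favour the majority class on imbalanced data.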
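
The noise sensitivity mentioned above can be illustrated with a minimal sketch of KNN in plain Python. The toy data, the function name `knn_predict` and the choice of Euclidean distance are assumptions made for illustration; they are not taken from the lecture.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are tuples of floats.
    """
    # No model is fitted: every prediction scans the whole training set.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters plus one mislabeled (noisy) point.
train = [
    ((1.0, 1.0), "T"), ((1.2, 0.9), "T"), ((0.8, 1.1), "T"),
    ((3.0, 3.0), "F"), ((3.1, 2.9), "F"), ((2.9, 3.2), "F"),
    ((1.1, 1.0), "F"),  # noisy label inside the "T" cluster
]

query = (1.1, 1.02)
pred_k1 = knn_predict(train, query, k=1)  # decided by the single nearest point
pred_k3 = knn_predict(train, query, k=3)  # decided by a vote of three points
```

With k = 1 the prediction follows the single noisy neighbour, while with k = 3 the two correctly labeled neighbours outvote it. A larger k is one common way to reduce the effect of noise.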

Q2) Given two classifiers learned from a training set and tested on unseen data to give the following two confusion matrices:

Classifier 1:

        T    F
   T   12    8
   F   15    5

Classifier 2:

        T    F
   T   17    3
   F    4   16

a) Which classifier is the most accurate?

To determine accuracy, we calculate the ratio of correctly classified instances to the total instances:

Classifier 1: Accuracy = (12 + 5) / (12 + 8 + 15 + 5) = 17 / 40 = 0.425

Classifier 2: Accuracy = (17 + 16) / (17 + 3 + 4 + 16) = 33 / 40 = 0.825

Classifier 2 is more accurate (0.825 vs 0.425).
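
The accuracy calculation can be checked with a short Python sketch. The nested-list layout `[[12, 8], [15, 5]]` (one inner list per row of the matrix) and the function name `accuracy` are assumptions made for illustration.

```python
def accuracy(matrix):
    """Accuracy of a 2x2 confusion matrix: diagonal total over grand total."""
    (a, b), (c, d) = matrix
    # The diagonal cells (a and d) hold the correctly classified instances.
    return (a + d) / (a + b + c + d)

classifier_1 = [[12, 8], [15, 5]]
classifier_2 = [[17, 3], [4, 16]]

acc_1 = accuracy(classifier_1)  # (12 + 5) / 40
acc_2 = accuracy(classifier_2)  # (17 + 16) / 40
more_accurate = "Classifier 2" if acc_2 > acc_1 else "Classifier 1"
```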

b) Calculate the Sensitivity and Specificity of the two classifiers:

For Sensitivity (True Positive Rate) and Specificity (True Negative Rate), we use the following formulas:

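Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Here “T” is taken as the positive class and each row of a confusion matrix is read as the actual class, so TP and FN come from the T row and FP and TN from the F row.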

Classifier 1:

  • Sensitivity = 12 / (12 + 8) = 12 / 20 = 0.6
  • Specificity = 5 / (5 + 15) = 5 / 20 = 0.25

Classifier 2:

  • Sensitivity = 17 / (17 + 3) = 17 / 20 = 0.85

  • Specificity = 16 / (16 + 4) = 16 / 20 = 0.8

  • For Classifier 1, Sensitivity is 0.6, and Specificity is 0.25.

  • For Classifier 2, Sensitivity is 0.85, and Specificity is 0.8.
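
These values can be reproduced with a small Python sketch. It assumes, as the calculations above do, that “T” is the positive class and that each matrix row holds one actual class, so the layout is `[[TP, FN], [FP, TN]]`. The function names are illustrative.

```python
def sensitivity(matrix):
    """True Positive Rate: TP / (TP + FN), read from the actual-T row."""
    (tp, fn), _ = matrix
    return tp / (tp + fn)

def specificity(matrix):
    """True Negative Rate: TN / (TN + FP), read from the actual-F row."""
    _, (fp, tn) = matrix
    return tn / (tn + fp)

# Assumed layout: [[TP, FN], [FP, TN]]
classifier_1 = [[12, 8], [15, 5]]
classifier_2 = [[17, 3], [4, 16]]

results = {
    name: (sensitivity(m), specificity(m))
    for name, m in [("Classifier 1", classifier_1), ("Classifier 2", classifier_2)]
}
```

Here `results` maps each classifier name to a (Sensitivity, Specificity) pair, which makes the two classifiers easy to compare side by side.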


Q3) Briefly compare two classifiers stating one advantage that each has over the other:

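  • Classifier 1 Advantage: On these confusion matrices Classifier 1 shows no advantage. Its Specificity (True Negative Rate) is 0.25 against 0.8 for Classifier 2, and its Accuracy and Sensitivity are lower as well. A higher Specificity would make a classifier better at correctly identifying the “F” class, which is valuable when avoiding false alarms (Type I errors) is crucial.

    • Where high Specificity matters: email spam filtering (keeping legitimate mail out of the spam folder) and quality control in manufacturing (not rejecting good parts).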
  • Classifier 2 Advantage: Classifier 2 is more accurate (0.825 vs 0.425) and has a higher Sensitivity (0.85 vs 0.6). This means it is better at correctly identifying the “T” class, which is essential when actual positive cases must not be missed.

    • Where to use it: medical diagnosis (identifying rare diseases), airport security (identifying threats) and credit fraud detection, where a missed positive is costly.