Relates to the supervised learning lecture.
Q1) Name one advantage and one disadvantage of using the KNN classifier:
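- Advantage: K-Nearest Neighbours (KNN) is simple and easy to interpret, because each prediction can be traced back to the k training examples nearest to the query. This makes it a suitable choice when transparency and an understanding of the model’s decisions are important.
- Disadvantage: KNN builds no explicit model of the data. It stores the training set and compares every new point against it, so it is sensitive to noisy points and performs poorly on high-dimensional datasets, where distances between points become less informative. It also struggles with imbalanced data, where the majority class tends to dominate the neighbour vote.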
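The lack of an explicit model is easiest to see in code. Below is a minimal sketch, assuming Euclidean distance and an unweighted majority vote (library implementations may offer other options). The function `knn_predict` and the toy points are hypothetical. Every prediction scans the stored training set, so a single noisy neighbour can swing the vote, and a majority class can outvote a minority one simply by being more numerous.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by a majority vote of its k nearest training points.

    Nothing is fitted in advance: the stored training set is searched at
    prediction time, which is why KNN is called a lazy learner.
    """
    # Euclidean distance from the query to every stored training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points, nearest first
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; on a tie, Counter.most_common returns the label
    # encountered first among the nearest points
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per point, labels "T"/"F"
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["F", "F", "T", "T", "T"]

print(knn_predict(train_X, train_y, (1.2, 1.4), k=3))  # F
print(knn_predict(train_X, train_y, (5.2, 5.1), k=3))  # T
```

Choosing an odd k avoids ties when there are only two classes.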
Q2) Two classifiers were learned from a training set and tested on unseen data, giving the following two confusion matrices:
Classifier 1:
Actual \ Predicted | T | F |
---|---|---|
T | 12 | 8 |
F | 15 | 5 |
Classifier 2:
Actual \ Predicted | T | F |
---|---|---|
T | 17 | 3 |
F | 4 | 16 |
a) Which classifier is the most accurate?
To determine accuracy, we calculate the ratio of correctly classified instances to the total instances:
Classifier 1: Accuracy = (12 + 5) / (12 + 8 + 15 + 5) = 17 / 40 = 0.425
Classifier 2: Accuracy = (17 + 16) / (17 + 3 + 4 + 16) = 33 / 40 = 0.825
Classifier 2 is the more accurate of the two (0.825 vs 0.425).
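The same arithmetic as a small Python sketch. The helper `accuracy` is hypothetical, and the cells are read with rows as the actual class and columns as the predicted class, which is the reading the Sensitivity calculation in part b relies on.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all instances classified correctly: (TP + TN) / total."""
    return (tp + tn) / (tp + fn + fp + tn)

# Cells from the matrices above (rows = actual class, columns = predicted class)
acc1 = accuracy(tp=12, fn=8, fp=15, tn=5)   # Classifier 1
acc2 = accuracy(tp=17, fn=3, fp=4, tn=16)   # Classifier 2
print(acc1, acc2)  # 0.425 0.825
```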
b) Calculate the Sensitivity and Specificity of the two classifiers:
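For Sensitivity (True Positive Rate) and Specificity (True Negative Rate), we use the following formulas, where TP, FN, TN and FP are the counts of true positives, false negatives, true negatives and false positives (rows of each matrix are the actual class, columns the predicted class):
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)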
Classifier 1:
- Sensitivity = 12 / (12 + 8) = 12 / 20 = 0.6
- Specificity = 5 / (5 + 15) = 5 / 20 = 0.25
Classifier 2:
- Sensitivity = 17 / (17 + 3) = 17 / 20 = 0.85
- Specificity = 16 / (16 + 4) = 16 / 20 = 0.8

- For Classifier 1, Sensitivity is 0.6 and Specificity is 0.25.
- For Classifier 2, Sensitivity is 0.85 and Specificity is 0.8.
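A minimal Python sketch of the two formulas, applied to both matrices. The helpers `sensitivity` and `specificity` and the `cells` dictionary are hypothetical; the cells are read as rows = actual class, columns = predicted class.

```python
def sensitivity(tp, fn):
    """True Positive Rate: share of actual T instances predicted as T."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True Negative Rate: share of actual F instances predicted as F."""
    return tn / (tn + fp)

# (TP, FN, FP, TN) per classifier; rows = actual class, columns = predicted class
cells = {
    "Classifier 1": (12, 8, 15, 5),
    "Classifier 2": (17, 3, 4, 16),
}
results = {}
for name, (tp, fn, fp, tn) in cells.items():
    results[name] = (sensitivity(tp, fn), specificity(tn, fp))
    print(name, results[name])
```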
Q3) Briefly compare the two classifiers, stating one advantage that each has over the other:
- Classifier 1: on these confusion matrices, Classifier 1 has no advantage on any of the metrics computed above. Its Specificity (0.25) is lower than Classifier 2’s (0.8), as are its Accuracy (0.425 vs 0.825) and Sensitivity (0.6 vs 0.85). A higher Specificity (True Negative Rate) would mean being better at correctly identifying the “F” class, which is valuable when avoiding false alarms (Type I errors) is crucial.
- Where higher Specificity matters: email spam detection (avoiding false positives, i.e. legitimate mail flagged as spam) and search engine ranking (filtering out non-relevant or low-quality results).
- Classifier 2 Advantage: Classifier 2 is more accurate and has both a higher Sensitivity (True Positive Rate) and a higher Specificity. Its higher Sensitivity means it is better at correctly identifying the “T” class, which is essential when actual positive cases must not be missed (avoiding Type II errors).
- Where higher Sensitivity matters: medical diagnosis (identifying rare diseases), airport security (identifying threats), credit fraud detection and quality control in manufacturing (catching defective items).
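Because the question asks for an advantage on each side, it helps to line the metrics up. The sketch below is a minimal check under the same reading of the matrices (rows = actual class, columns = predicted class); the helper `metrics` is hypothetical. It reports, for each of the three metrics computed above, which classifier scores higher. On these two matrices it reports Classifier 2 for all three.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, Sensitivity and Specificity from the four confusion-matrix cells."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

m1 = metrics(tp=12, fn=8, fp=15, tn=5)   # Classifier 1
m2 = metrics(tp=17, fn=3, fp=4, tn=16)   # Classifier 2

# Report, metric by metric, which classifier scores higher
for name in m1:
    if m1[name] > m2[name]:
        winner = "Classifier 1"
    elif m2[name] > m1[name]:
        winner = "Classifier 2"
    else:
        winner = "tie"
    print(f"{name}: {m1[name]:.3f} vs {m2[name]:.3f} -> {winner}")
```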