%%

{"created":"2024-03-11T15:13:37.949Z","updated":"2024-03-11T15:13:37.949Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":46,"end":134},{"type":"TextQuoteSelector","exact":"Cluster analysis divides data into groups (clusters) that are meaningful, useful,or both","prefix":"sis:Basic Concepts andAlgorithms","suffix":". If meaningful groups are the g"}]}]}

%% Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. show annotation

^tuz4t9tflk

%%

{"created":"2024-03-11T15:13:49.850Z","updated":"2024-03-11T15:13:49.850Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":800,"end":915},{"type":"TextQuoteSelector","exact":"Clustering for Understanding Classes, or conceptually meaningful groupsof objects that share common characteristics","prefix":"ing is understanding or utility.","suffix":", play an important role in howp"}]}]}

%% Clustering for Understanding. Classes, or conceptually meaningful groups of objects that share common characteristics show annotation

^e0r3615bxgl

%%

{"created":"2024-03-11T15:13:54.825Z","updated":"2024-03-11T15:13:54.825Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":1516,"end":1523},{"type":"TextQuoteSelector","exact":"Biology","prefix":" Basic Concepts and Algorithms• ","suffix":". Biologists have spent many yea"}]}]}

%% Biology show annotation

^qnjc6vdzfw

%%

{"created":"2024-03-11T15:14:06.443Z","updated":"2024-03-11T15:14:06.443Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":2847,"end":2883},{"type":"TextQuoteSelector","exact":"patterns in the atmospheric pressure","prefix":"nalysis has beenapplied to find ","suffix":" of polar regions andareas of th"}]}]}

%% patterns in the atmospheric pressure of polar regions show annotation

^d4ihzmk64ro

%%

{"created":"2024-03-11T15:14:11.466Z","updated":"2024-03-11T15:14:11.466Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":3147,"end":3212},{"type":"TextQuoteSelector","exact":"clustering has been used to identifydifferent types of depression","prefix":"ent subcategories. For example, ","suffix":". Cluster analysis can also be u"}]}]}

%% clustering has been used to identify different types of depression show annotation

^171j3j0iv57

%%

{"created":"2024-03-11T15:14:21.538Z","updated":"2024-03-11T15:14:21.538Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":3535,"end":3557},{"type":"TextQuoteSelector","exact":"Clustering for Utility","prefix":"nalysis and marketingactivities.","suffix":" Cluster analysis provides an ab"}]}]}

%% Clustering for Utility show annotation

^70zma2zeim4

%%

{"created":"2024-03-11T15:15:31.596Z","updated":"2024-03-11T15:15:31.596Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":3558,"end":3679},{"type":"TextQuoteSelector","exact":"Cluster analysis provides an abstraction from in-dividual data objects to the clusters in which those data objects reside","prefix":"tivities.Clustering for Utility ","suffix":". Ad-ditionally, some clustering"}]}]}

%% Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. show annotation

^jzczniur7ho

%%

{"created":"2024-03-11T15:15:57.577Z","updated":"2024-03-11T15:15:57.577Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":4940,"end":5080},{"type":"TextQuoteSelector","exact":"Each object is represented by the indexof the prototype associated with its cluster. This type of compression isknown as vector quantization","prefix":"sposition (index) in the table. ","suffix":" and is often applied to image, "}]}]}

%% Each object is represented by the index of the prototype associated with its cluster. This type of compression is known as vector quantization. show annotation

^cbouunx7tve
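
A minimal sketch of vector quantization (my own illustration, not from the chapter): each object is replaced by the index of its nearest prototype, so the data compresses to a small codebook plus one index per object.

```python
# Vector quantization sketch: store only the codebook (prototypes)
# and, per object, the index of its closest prototype.

def quantize(points, prototypes):
    """Return, for each point, the index of the closest prototype."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(prototypes)), key=lambda i: dist2(p, prototypes[i]))
            for p in points]

# Toy codebook and data (made-up values, for illustration only).
prototypes = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.8, 10.1), (0.1, 0.4)]
codes = quantize(points, prototypes)   # one small index per 2-D point
```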

%%

{"created":"2024-03-11T15:16:07.017Z","text":"cluster prototype is basically a data point that represents the whole cluster","updated":"2024-03-11T15:16:07.017Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":3975,"end":4112},{"type":"TextQuoteSelector","exact":"Therefore, in the con-text of utility, cluster analysis is the study of techniques for finding the mostrepresentative cluster prototypes.","prefix":" or data processing techniques. ","suffix":"• Summarization. Many data analy"}]}]}

%% Therefore, in the context of utility, cluster analysis is the study of techniques for finding the most representative cluster prototypes. show annotation cluster prototype is basically a data point that represents the whole cluster

^vwa8vfoigkp

%%

{"created":"2024-03-11T15:18:10.700Z","updated":"2024-03-11T15:18:10.700Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":7470,"end":7748},{"type":"TextQuoteSelector","exact":"Cluster analysis groups data objects based only on information found in thedata that describes the objects and their relationships. The goal is that theobjects within a group be similar (or related) to one another and different from(or unrelated to) the objects in other groups.","prefix":".8.1.1 What Is Cluster Analysis?","suffix":" The greater the similarity (orh"}]}]}

%% Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. show annotation

^yyrr5af0am8

%%

{"created":"2024-03-11T15:19:03.108Z","updated":"2024-03-11T15:19:03.108Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":9324,"end":9395},{"type":"TextQuoteSelector","exact":"cluster analysis is sometimes referredto as unsupervised classification","prefix":" class labels. For this reason, ","suffix":". When the term classification i"}]}]}

%% cluster analysis is sometimes referred to as unsupervised classification show annotation

^da2v5ezi5ou

%%

{"created":"2024-03-11T15:19:10.304Z","updated":"2024-03-11T15:19:10.304Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":8779,"end":8953},{"type":"TextQuoteSelector","exact":"clustering can be regarded as a form ofclassification in that it creates a labeling of objects with class (cluster) labels.However, it derives these labels only from the data","prefix":"ects into groups. For instance, ","suffix":". In contrast, classification490"}]}]}

%% clustering can be regarded as a form of classification in that it creates a labeling of objects with class (cluster) labels. However, it derives these labels only from the data. show annotation

^y0pq2ga2np8

%%

{"created":"2024-03-11T15:20:21.504Z","updated":"2024-03-11T15:20:21.504Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":10809,"end":10971},{"type":"TextQuoteSelector","exact":"Apartitional clustering is simply a division of the set of data objects intonon-overlapping subsets (clusters) such that each data object is in exactly onesubset.","prefix":"y, hierarchical or partitional. ","suffix":" Taken individually, each collec"}]}]}

%% A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. show annotation

^hz23sbba01m

%%

{"created":"2024-03-11T15:20:54.397Z","updated":"2024-03-11T15:20:54.397Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":11068,"end":11384},{"type":"TextQuoteSelector","exact":"If we permit clusters to have subclusters, then we obtain a hierarchicalclustering, which is a set of nested clusters that are organized as a tree. Eachnode (cluster) in the tree (except for the leaf nodes) is the union of its children(subclusters), and the root of the tree is the cluster containing all the objects","prefix":"b–d) isa partitional clustering.","suffix":".Often, but not always, the leav"}]}]}

%% If we permit clusters to have subclusters, then we obtain a hierarchical clustering, which is a set of nested clusters that are organized as a tree. Each node (cluster) in the tree (except for the leaf nodes) is the union of its children (subclusters), and the root of the tree is the cluster containing all the objects. show annotation

^dtcirsa6no8

%%

{"created":"2024-03-11T15:21:30.869Z","updated":"2024-03-11T15:21:30.869Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":11844,"end":12076},{"type":"TextQuoteSelector","exact":"note that ahierarchical clustering can be viewed as a sequence of partitional clusteringsand a partitional clustering can be obtained by taking any member of thatsequence; i.e., by cutting the hierarchical tree at a particular level","prefix":"lusters on each level. Finally, ","suffix":".Exclusive versus Overlapping ve"}]}]}

%% note that a hierarchical clustering can be viewed as a sequence of partitional clusterings and a partitional clustering can be obtained by taking any member of that sequence; i.e., by cutting the hierarchical tree at a particular level. show annotation

^w2vbi1z2fr

%%

{"created":"2024-03-11T15:21:52.914Z","updated":"2024-03-11T15:21:52.914Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":12162,"end":12218},{"type":"TextQuoteSelector","exact":"exclusive, as they assign each object to a single cluste","prefix":"ings shown inFigure 8.1 are all ","suffix":"r.There are many situations in w"}]}]}

%% exclusive, as they assign each object to a single cluster. show annotation

^owoiy45rzb

%%

{"created":"2024-03-11T15:22:04.182Z","updated":"2024-03-11T15:22:04.182Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":12412,"end":12552},{"type":"TextQuoteSelector","exact":"an overlapping or non-exclusiveclustering is used to reflect the fact that an object can simultaneously belongto more than one group (class)","prefix":"ing. In the most general sense, ","suffix":". For instance, a person at a un"}]}]}

%% an overlapping or non-exclusive clustering is used to reflect the fact that an object can simultaneously belong to more than one group (class). show annotation

^cccbwroj23

%%

{"created":"2024-03-11T15:22:15.913Z","updated":"2024-03-11T15:22:15.913Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":13024,"end":13235},{"type":"TextQuoteSelector","exact":"In a fuzzy clustering, every object belongs to every cluster with a mem-bership weight that is between 0 (absolutely doesn’t belong) and 1 (absolutelybelongs). In other words, clusters are treated as fuzzy sets.","prefix":" of the “equally good” clusters.","suffix":" (Mathematically,a fuzzy set is "}]}]}

%% In a fuzzy clustering, every object belongs to every cluster with a membership weight that is between 0 (absolutely doesn’t belong) and 1 (absolutely belongs). In other words, clusters are treated as fuzzy sets. show annotation

^nldjt3rt1ft
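
A toy sketch of what membership weights look like (my own simplification: real fuzzy c-means uses a fuzzifier exponent; here I just normalize inverse squared distances so each point's weights sum to 1).

```python
# Fuzzy-style memberships: every point gets a weight in [0, 1] for
# every cluster, and the weights for one point sum to 1.

def memberships(point, centroids):
    """Inverse-squared-distance weights, normalized to sum to 1."""
    d2 = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    if 0.0 in d2:                         # point coincides with a centroid
        return [1.0 if x == 0.0 else 0.0 for x in d2]
    inv = [1.0 / x for x in d2]
    total = sum(inv)
    return [v / total for v in inv]

# A point equidistant from two centroids belongs equally to both.
w = memberships((1.0, 0.0), [(0.0, 0.0), (2.0, 0.0)])
```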

%%

{"created":"2024-03-11T15:22:44.292Z","updated":"2024-03-11T15:22:44.292Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":14248,"end":14455},{"type":"TextQuoteSelector","exact":"A complete clustering assigns every object toa cluster, whereas a partial clustering does not. The motivation for a partialclustering is that some objects in a data set may not belong to well-definedgroups. ","prefix":"highest.Complete versus Partial ","suffix":"Many times objects in the data s"}]}]}

%% A complete clustering assigns every object to a cluster, whereas a partial clustering does not. The motivation for a partial clustering is that some objects in a data set may not belong to well-defined groups. show annotation

^yvccdk18vek

%%

{"created":"2024-03-11T15:23:37.189Z","updated":"2024-03-11T15:23:37.189Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":16539,"end":17023},{"type":"TextQuoteSelector","exact":"For data with continuous attributes, the prototype of acluster is often a centroid, i.e., the average (mean) of all the points in the clus-ter. When a centroid is not meaningful, such as when the data has categoricalattributes, the prototype is often a medoid, i.e., the most representative pointof a cluster. For many types of data, the prototype can be regarded as themost central point, and in such instances, we commonly refer to prototype-based clusters as center-based clusters.","prefix":" prototypeof any other cluster. ","suffix":" Not surprisingly, such clusters"}]}]}

%% For data with continuous attributes, the prototype of a cluster is often a centroid, i.e., the average (mean) of all the points in the cluster. When a centroid is not meaningful, such as when the data has categorical attributes, the prototype is often a medoid, i.e., the most representative point of a cluster. For many types of data, the prototype can be regarded as the most central point, and in such instances, we commonly refer to prototype-based clusters as center-based clusters. show annotation

^jbg6q7vu02
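
Centroid vs. medoid, made concrete (my own sketch): the centroid need not be a member of the cluster, while the medoid always is.

```python
# Centroid: mean of the points. Medoid: the actual member with the
# smallest total distance to all other members.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def medoid(points, dist):
    """Member with the smallest total distance to the other members."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
c = centroid(pts)                                  # (1/3, 1/3): not a member
euclid2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
m = medoid(pts, euclid2)                           # always one of pts
```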

%%

{"created":"2024-03-11T15:24:59.568Z","updated":"2024-03-11T15:24:59.568Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":20502,"end":20819},{"type":"TextQuoteSelector","exact":"Agglomerative Hierarchical Clustering. This clustering approachrefers to a collection of closely related clustering techniques that producea hierarchical clustering by starting with each point as a singleton clusterand then repeatedly merging the two closest clusters until a single, all-encompassing cluster remains.","prefix":"epresented by their centroids.• ","suffix":" Some of these techniques have a"}]}]}

%% Agglomerative Hierarchical Clustering. This clustering approach refers to a collection of closely related clustering techniques that produce a hierarchical clustering by starting with each point as a singleton cluster and then repeatedly merging the two closest clusters until a single, all-encompassing cluster remains. show annotation

^inf3ttoknom
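
The merge loop described above, sketched in plain Python (my own version; I picked single-link as the "closest clusters" definition, which the quote leaves open). Stopping the loop early, with k clusters left, would give a partitional clustering.

```python
# Agglomerative clustering: start with singletons, repeatedly merge
# the two closest clusters until one all-encompassing cluster remains.

def agglomerate(points):
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    def single_link(c1, c2):                 # closest pair across clusters
        return min(d2(p, q) for p in c1 for q in c2)

    clusters = [[p] for p in points]         # start: singleton clusters
    merges = []                              # log of each merge (the tree)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i][:], clusters[j][:]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters[0], merges

# Integer toy data: two tight pairs far apart.
tree, merges = agglomerate([(0,), (1,), (10,), (11,)])
```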

%%

{"created":"2024-03-11T15:25:09.445Z","updated":"2024-03-11T15:25:09.445Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":20986,"end":21273},{"type":"TextQuoteSelector","exact":"DBSCAN. This is a density-based clustering algorithm that producesa partitional clustering, in which the number of clusters is automaticallydetermined by the algorithm. Points in low-density regions are classi-fied as noise and omitted; thus, DBSCAN does not produce a completeclustering","prefix":"of a prototype-based approach.• ","suffix":".495Chapter 8 Cluster Analysis: "}]}]}

%% DBSCAN. This is a density-based clustering algorithm that produces a partitional clustering, in which the number of clusters is automatically determined by the algorithm. Points in low-density regions are classified as noise and omitted; thus, DBSCAN does not produce a complete clustering. show annotation

^4tjxbpn9oot
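
A compact sketch of the DBSCAN idea (my own minimal version, quadratic-time; `eps` and `min_pts` are the usual parameter names, assumed here). Note the `-1` labels: those are the noise points that make the clustering incomplete.

```python
# Minimal DBSCAN: grow clusters from core points (points with at least
# min_pts neighbors within eps); low-density points are labeled noise (-1).

def dbscan(points, eps, min_pts):
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                   # low density: noise (for now)
            continue
        labels[i] = cid                      # i is a core point: new cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid              # noise reclaimed as border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:           # j is also core: keep expanding
                seeds.extend(jn)
        cid += 1
    return labels

# Three mutually close points form one cluster; the far point is noise.
labels = dbscan([(0, 0), (1, 0), (0, 1), (10, 10)], eps=1.5, min_pts=3)
```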

%%

{"created":"2024-03-11T15:25:37.563Z","updated":"2024-03-11T15:25:37.563Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":23031,"end":23479},{"type":"TextQuoteSelector","exact":"We first choose K initial centroids, where K is a user-specified parameter, namely, the number of clusters desired. Each point isthen assigned to the closest centroid, and each collection of points assigned toa centroid is a cluster. The centroid of each cluster is then updated based onthe points assigned to the cluster. We repeat the assignment and update stepsuntil no point changes clusters, or equivalently, until the centroids remain thesame","prefix":"criptionof the basic algorithm. ","suffix":".K-means is formally described b"}]}]}

%% We first choose K initial centroids, where K is a user-specified parameter, namely, the number of clusters desired. Each point is then assigned to the closest centroid, and each collection of points assigned to a centroid is a cluster. The centroid of each cluster is then updated based on the points assigned to the cluster. We repeat the assignment and update steps until no point changes clusters, or equivalently, until the centroids remain the same. show annotation

^1nqx2j10jnc
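
The assign/update loop just quoted, written out directly (my own minimal sketch: initial centroids are taken as given, ties go to the first centroid, and an empty cluster simply keeps its old centroid).

```python
# Basic K-means: assignment step, then update step, repeated until the
# centroids stop changing.

def kmeans(points, centroids, max_iter=100):
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[min(range(len(centroids)),
                         key=lambda i: d2(p, centroids[i]))].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                 # nothing moved: converged
            break
        centroids = new
    return centroids, clusters

# Two obvious groups; K = 2 with made-up initial centroids.
cents, groups = kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)],
                       [(0.0, 0.0), (10.0, 0.0)])
```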

%%

{"created":"2024-03-11T15:27:18.249Z","updated":"2024-03-11T15:27:18.249Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":28100,"end":28590},{"type":"TextQuoteSelector","exact":"sum of the squared error (SSE), which is also knownas scatter. In other words, we calculate the error of each data point, i.e., itsEuclidean distance to the closest centroid, and then compute the total sumof the squared errors. Given two different sets of clusters that are producedby two different runs of K-means, we prefer the one with the smallest squarederror since this means that the prototypes (centroids) of this clustering area better representation of the points in their cluster","prefix":"lity of aclustering, we use the ","suffix":". Using the notation inTable 8.1"}]}]}

%% sum of the squared error (SSE), which is also known as scatter. In other words, we calculate the error of each data point, i.e., its Euclidean distance to the closest centroid, and then compute the total sum of the squared errors. Given two different sets of clusters that are produced by two different runs of K-means, we prefer the one with the smallest squared error since this means that the prototypes (centroids) of this clustering are a better representation of the points in their cluster. show annotation

^u6koocg3f9r
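
SSE written out as code (my own one-liner, with made-up toy data): squared Euclidean distance from each point to its closest centroid, summed over all points; between two clusterings, the smaller SSE is preferred.

```python
# Sum of squared error (scatter) for a set of centroids.

def sse(points, centroids):
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return sum(min(d2(p, c) for c in centroids) for p in points)

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)]
good = sse(pts, [(0.0, 0.5), (10.0, 0.0)])   # centroids sit in the clusters
bad = sse(pts, [(5.0, 0.0), (5.0, 5.0)])     # centroids sit between them
```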

%%

{"created":"2024-03-11T15:28:02.942Z","updated":"2024-03-11T15:28:02.942Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":30760,"end":31001},{"type":"TextQuoteSelector","exact":"Proximity Function Centroid Objective FunctionManhattan (L1) median Minimize sum of the L1 distance of an ob-ject to its cluster centroidSquared Euclidean (L22) mean Minimize sum of the squared L2 distanceof an object to its cluster centroid","prefix":"troids, and objective functions.","suffix":"cosine mean Maximize sum of the "}]}]}

%% Proximity functions, centroids, and objective functions (reconstructed table):

| Proximity Function | Centroid | Objective Function |
| --- | --- | --- |
| Manhattan (L1) | median | Minimize sum of the L1 distance of an object to its cluster centroid |
| Squared Euclidean (L2²) | mean | Minimize sum of the squared L2 distance of an object to its cluster centroid |

show annotation

^b9hu8geh3l7
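
Why the table pairs Manhattan (L1) with the median and squared Euclidean with the mean: each centroid choice minimizes its own objective. A quick brute-force check on a 1-D cluster with an outlier (my own example values):

```python
# Brute-force the best 1-D "centroid" under each objective.

cluster = [1.0, 2.0, 3.0, 100.0]             # one outlier at 100

def l1(c):                                   # Manhattan objective
    return sum(abs(x - c) for x in cluster)

def l2sq(c):                                 # squared Euclidean objective
    return sum((x - c) ** 2 for x in cluster)

candidates = [i / 10 for i in range(0, 1001)]   # 0.0, 0.1, ..., 100.0
best_l1 = min(candidates, key=l1)            # lands in the median range [2, 3]
best_l2 = min(candidates, key=l2sq)          # lands at the mean, 26.5
```

The outlier drags the mean (and hence the squared-Euclidean centroid) far from the bulk of the cluster, while the median stays put, which is also why the squared-error objective is outlier-sensitive.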

%%

{"created":"2024-03-11T15:28:57.807Z","updated":"2024-03-11T15:28:57.807Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":36017,"end":36484},{"type":"TextQuoteSelector","exact":"One effective approach is to take a sample of pointsand cluster them using a hierarchical clustering technique. K clusters are ex-tracted from the hierarchical clustering, and the centroids of those clusters areused as the initial centroids. This approach often works well, but is practicalonly if (1) the sample is relatively small, e.g., a few hundred to a few thousand(hierarchical clustering is expensive), and (2) K is relatively small comparedto the sample size","prefix":"n em-ployed for initialization. ","suffix":".The following procedure is anot"}]}]}

%% One effective approach is to take a sample of points and cluster them using a hierarchical clustering technique. K clusters are extracted from the hierarchical clustering, and the centroids of those clusters are used as the initial centroids. This approach often works well, but is practical only if (1) the sample is relatively small, e.g., a few hundred to a few thousand (hierarchical clustering is expensive), and (2) K is relatively small compared to the sample size. show annotation

^qsl80f9vt87
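
The initialization recipe above, sketched end to end (my own version: random sample, then a simple single-link agglomerative pass down to K clusters, then those clusters' centroids as the starting centroids).

```python
# Hierarchical-clustering-on-a-sample initialization for K-means.
import random

def initial_centroids(points, k, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    clusters = [[p] for p in sample]
    while len(clusters) > k:                 # merge until K clusters remain
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(d2(p, q) for p in clusters[ij[0]]
                                      for q in clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]

    # Centroids of the K extracted clusters become the initial centroids.
    return [tuple(sum(c) / len(cl) for c in zip(*cl)) for cl in clusters]

cents = initial_centroids([(0.0, 0.0), (0.2, 0.0), (9.0, 9.0), (9.2, 9.0)],
                          k=2, sample_size=4)
```

The quadratic merge loop is exactly why the quote restricts this to small samples.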

%%

{"created":"2024-03-11T15:29:34.107Z","updated":"2024-03-11T15:29:34.107Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":38829,"end":38977},{"type":"TextQuoteSelector","exact":"K-means is linear in m, the number of points,and is efficient as well as simple provided that K, the number of clusters, issignificantly less than m","prefix":"irst few iterations. Therefore, ","suffix":".8.2.2 K-means: Additional Issue"}]}]}

%% K-means is linear in m, the number of points, and is efficient as well as simple provided that K, the number of clusters, is significantly less than m. show annotation

^h2ao945f8rh

%%

{"created":"2024-03-11T15:29:49.198Z","updated":"2024-03-11T15:29:49.198Z","document":{"title":"intro-to-data-mining-chap-8.pdf","link":[{"href":"urn:x-pdf:db935a11eda3ac8e7bc66b4bfcdbe8ed"},{"href":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf"}],"documentFingerprint":"db935a11eda3ac8e7bc66b4bfcdbe8ed"},"uri":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","target":[{"source":"vault:/assets/university/year3/books/intro-to-data-mining-chap-8.pdf","selector":[{"type":"TextPositionSelector","start":39846,"end":39903},{"type":"TextQuoteSelector","exact":"outliers can unduly influence theclusters that are found.","prefix":"quared error criterion is used, ","suffix":" In particular, when outliers ar"}]}]}

%% outliers can unduly influence the clusters that are found. show annotation

^ren6t5k2ctc