# Machine Learning (CS7641)

## Evaluating the output of clustering methods

Clustering is an unsupervised learning technique, so it is hard to evaluate the quality of the output of any given method [Kle02; LWG12]. If we use probabilistic models, we can always evaluate the likelihood of the data, but this has two drawbacks: first, it does not directly assess any clustering that is discovered by the model; and second, it does not apply to non-probabilistic methods. We therefore discuss some performance measures that are not based on likelihood.

Intuitively, the goal of clustering is to assign points that are similar to the same cluster, and to ensure that points that are dissimilar are in different clusters. There are several ways of measuring these quantities (see, e.g., [JD88; KR90]). However, these internal criteria may be of limited use. An alternative is to rely on some external form of data with which to validate the method. For example, if we have labels for each object, then we can assume that objects with the same label are similar. We can then use the metrics we discuss below to quantify the quality of the clusters. (If we do not have labels, but we have a reference clustering, we can derive labels from that clustering.)

Let $N_{ij}$ be the number of objects in cluster $i$ that belong to class $j$, let $N_i=\sum_{j=1}^C N_{ij}$ be the total number of objects in cluster $i$, and let $N=\sum_i N_i$ be the total number of objects. Define $p_{ij}=N_{ij} / N_i$; this is the empirical distribution over class labels for cluster $i$. We define the purity of a cluster as $p_i \triangleq \max_j p_{ij}$, and the overall purity of a clustering as
$$\text { purity } \triangleq \sum_i \frac{N_i}{N} p_i$$
For example, in Figure 21.1, the purity is
$$\frac{6}{17} \cdot \frac{5}{6}+\frac{6}{17} \cdot \frac{4}{6}+\frac{5}{17} \cdot \frac{3}{5}=\frac{5+4+3}{17}\approx 0.71$$
The purity ranges between 0 (bad) and 1 (good). However, we can trivially achieve a purity of 1 by putting each object into its own cluster, so this measure does not penalize the number of clusters.
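The purity computation can be sketched in a few lines of NumPy, assuming the cluster and class assignments are given as equal-length label arrays; `contingency_table` and `purity` are hypothetical helper names. Since $\frac{N_i}{N} p_i = \frac{\max_j N_{ij}}{N}$, purity reduces to summing the largest entry in each row of the table $N_{ij}$ and dividing by $N$. The example table is only one assignment consistent with the cluster sizes (6, 6, 5) and majority counts (5, 4, 3) in the worked example; how the minority objects split across classes is an assumption and does not affect purity.

```python
import numpy as np

def contingency_table(cluster_ids, class_ids):
    """Return N with N[i, j] = number of objects in cluster i that have class j."""
    _, ci = np.unique(np.asarray(cluster_ids), return_inverse=True)
    _, cj = np.unique(np.asarray(class_ids), return_inverse=True)
    table = np.zeros((ci.max() + 1, cj.max() + 1), dtype=int)
    np.add.at(table, (ci, cj), 1)  # count each (cluster, class) pair
    return table

def purity(table):
    """Overall purity: sum_i (N_i / N) * max_j p_ij = sum_i max_j N_ij / N."""
    table = np.asarray(table, dtype=float)
    return table.max(axis=1).sum() / table.sum()

# Hypothetical table matching the worked example: rows are clusters, columns classes.
example = [[5, 1, 0],
           [1, 4, 1],
           [2, 0, 3]]
print(round(purity(example), 2))  # 0.71

# One object per cluster always gives purity 1, whatever the class labels are.
print(purity(contingency_table([0, 1, 2, 3], [0, 0, 1, 1])))  # 1.0
```

The second call illustrates the weakness noted above: four singleton clusters for only two classes still score a perfect purity.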

## Mutual information

Another way to measure cluster quality is to compute the mutual information between two candidate partitions $U$ and $V$, as proposed in [VD99], where $U$ has $R$ clusters and $V$ has $C$ clusters. To do this, let $p_{UV}(i, j)=\frac{\left|u_i \cap v_j\right|}{N}$ be the probability that a randomly chosen object belongs to cluster $u_i$ in $U$ and to cluster $v_j$ in $V$. Also, let $p_U(i)=\left|u_i\right| / N$ be the probability that a randomly chosen object belongs to cluster $u_i$ in $U$, and define $p_V(j)=\left|v_j\right| / N$ similarly. Then we have
$$\mathbb{I}(U, V)=\sum_{i=1}^R \sum_{j=1}^C p_{U V}(i, j) \log \frac{p_{U V}(i, j)}{p_U(i) p_V(j)}$$
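A minimal NumPy sketch of $\mathbb{I}(U, V)$ computed directly from the double sum above, assuming both partitions are given as equal-length arrays of cluster labels. It uses the natural logarithm, so the result is in nats; the text does not fix a base, so that choice is an assumption. The name `mutual_information` is hypothetical.

```python
import numpy as np

def mutual_information(u, v):
    """I(U, V) in nats for two partitions given as equal-length label arrays."""
    _, ui = np.unique(np.asarray(u), return_inverse=True)
    _, vj = np.unique(np.asarray(v), return_inverse=True)
    joint = np.zeros((ui.max() + 1, vj.max() + 1))
    np.add.at(joint, (ui, vj), 1.0)   # |u_i ∩ v_j|
    joint /= len(ui)                  # p_UV(i, j)
    p_u = joint.sum(axis=1)           # p_U(i)
    p_v = joint.sum(axis=0)           # p_V(j)
    outer = np.outer(p_u, p_v)        # p_U(i) * p_V(j)
    nz = joint > 0                    # empty cells contribute 0 (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

print(mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

Because only the joint counts enter the formula, renaming the clusters of either partition leaves the value unchanged, and two independent partitions give a value of (numerically) zero.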
This lies between 0 and $\min\{\mathbb{H}(U), \mathbb{H}(V)\}$. Unfortunately, the maximum value can be achieved trivially by using many small clusters, each of which has low entropy over the labels of $V$: in the extreme of one object per cluster, $\mathbb{H}(V \mid U)=0$, so $\mathbb{I}(U, V)=\mathbb{H}(V)$. To compensate for this, we can use the normalized mutual information,
$$\mathrm{NMI}(U, V) \triangleq \frac{\mathbb{I}(U, V)}{(\mathbb{H}(U)+\mathbb{H}(V)) / 2}$$
This lies between 0 and 1. A version of this that is adjusted for chance (under a particular random data model) is described in [VEB09]. Another variant, called variation of information, is described in [Mei05].
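The normalized variant can be sketched the same way, again assuming NumPy, label-array inputs, and natural logs (the base cancels in the ratio). This version obtains $\mathbb{I}(U, V)$ through the equivalent identity $\mathbb{I}(U, V)=\mathbb{H}(U)+\mathbb{H}(V)-\mathbb{H}(U, V)$ instead of the double sum, and returns 1 when both partitions are a single cluster; that 0/0 convention is an assumption, not part of the definition. The function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector, with 0 log 0 = 0."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def normalized_mutual_information(u, v):
    """NMI(U, V) = I(U, V) / ((H(U) + H(V)) / 2) for two label arrays."""
    _, ui = np.unique(np.asarray(u), return_inverse=True)
    _, vj = np.unique(np.asarray(v), return_inverse=True)
    joint = np.zeros((ui.max() + 1, vj.max() + 1))
    np.add.at(joint, (ui, vj), 1.0)
    joint /= len(ui)                           # p_UV(i, j)
    h_u = entropy(joint.sum(axis=1))           # H(U) from p_U
    h_v = entropy(joint.sum(axis=0))           # H(V) from p_V
    mi = h_u + h_v - entropy(joint.ravel())    # I = H(U) + H(V) - H(U, V)
    denom = (h_u + h_v) / 2
    if denom == 0.0:
        return 1.0  # assumed convention: both partitions are a single cluster
    return mi / denom

# Singleton clusters against two classes: MI is maximal, but NMI is about 2/3.
print(normalized_mutual_information([0, 1, 2, 3], [0, 0, 1, 1]))
```

In the singleton example, $\mathbb{I}(U, V)=\log 2$ still reaches its upper bound $\mathbb{H}(V)$, but $\mathbb{H}(U)=\log 4$ inflates the denominator, so the score drops to $\log 2 / (1.5 \log 2) = 2/3$. For comparison, scikit-learn's `normalized_mutual_info_score` with `average_method="arithmetic"` uses the same arithmetic-mean normalization.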