Machine Learning Exam Help_CS446

In practice, the preferred method is average link clustering, which measures the average distance between all pairs:
$$d_{\text {avg }}(G, H)=\frac{1}{n_G n_H} \sum_{i \in G} \sum_{i^{\prime} \in H} d_{i, i^{\prime}}$$
where $n_G$ and $n_H$ are the number of elements in groups $G$ and $H$. See Figure 21.3(c).
Average link clustering represents a compromise between single and complete link clustering. It tends to produce relatively compact clusters that are relatively far apart (see Figure 21.4(c)). However, since it involves averaging the $d_{i, i^{\prime}}$ values, any change to the measurement scale can change the result. In contrast, single linkage and complete linkage are invariant to monotonic transformations of $d_{i, i^{\prime}}$, since they leave the relative ordering the same.
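To make the scale-sensitivity claim concrete, the following plain-Python sketch compares which of two groups is closer to a third under each linkage, first with $d_{i, i^{\prime}} = |a - b|$ and then with its square (a monotonic transformation of nonnegative distances). The 1-D points are hypothetical toy data, and `linkage` and `closer_group` are illustrative helpers, not a library API.

```python
from itertools import product


def linkage(G, H, d, kind):
    """Dissimilarity between groups G and H (lists of points) under
    single, complete, or average linkage; d is a pairwise dissimilarity."""
    pair_dists = [d(i, j) for i, j in product(G, H)]
    if kind == "single":
        return min(pair_dists)
    if kind == "complete":
        return max(pair_dists)
    if kind == "average":
        # d_avg(G, H) = (1 / (n_G * n_H)) * sum over all cross-group pairs
        return sum(pair_dists) / (len(G) * len(H))
    raise ValueError(f"unknown linkage: {kind}")


def closer_group(G, H1, H2, d, kind):
    """Return 'H1' or 'H2', whichever is closer to G under the given linkage."""
    return "H1" if linkage(G, H1, d, kind) < linkage(G, H2, d, kind) else "H2"


def abs_dist(a, b):
    return abs(a - b)


def sq_dist(a, b):
    # Monotonic transformation of abs_dist (squaring nonnegative values).
    return abs(a - b) ** 2


# Hypothetical 1-D toy groups, for illustration only.
G, H1, H2 = [0.0], [1.0, 5.0], [3.0, 4.0]

for kind in ("single", "complete", "average"):
    print(kind, closer_group(G, H1, H2, abs_dist, kind),
          closer_group(G, H1, H2, sq_dist, kind))
# single H1 H1
# complete H2 H2
# average H1 H2
```

Single and complete linkage pick the same group under both distances because a min or max depends only on the ordering of the $d_{i, i^{\prime}}$, while average linkage switches from H1 (average 3 vs 3.5) to H2 (average 13 vs 12.5) once the distances are squared.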

Suppose we have a set of time series measurements of the expression levels for $N=300$ genes at $T=7$ time points. Thus each data sample is a vector $\boldsymbol{x}_n \in \mathbb{R}^7$. See Figure 21.5 for a visualization of the data. We see that there are several kinds of genes, such as those whose expression level goes up monotonically over time (in response to a given stimulus), those whose expression level goes down monotonically, and those with more complex response patterns.

Suppose we use Euclidean distance to compute a pairwise dissimilarity matrix $\mathbf{D} \in \mathbb{R}^{300 \times 300}$ and apply HAC using average linkage. We get the dendrogram in Figure 21.6(a). If we cut the tree at a certain height, we get the 16 clusters shown in Figure 21.6(b). The time series assigned to each cluster do indeed “look like” each other.
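A naive version of this procedure can be sketched in plain Python. The six 7-point series below are made-up stand-ins for the gene data (a rising, a falling, and a flat pair), and `hac_average` is a hypothetical helper. Stopping once `n_clusters` groups remain plays the role of cutting the dendrogram at the height that yields that many clusters. Linkages are recomputed from scratch at every merge, which gives roughly $O(N^3)$ total work.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac_average(X, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts from singletons and repeatedly merges the pair of clusters with
    the smallest average pairwise distance until n_clusters remain."""
    D = [[euclidean(a, b) for b in X] for a in X]  # N x N dissimilarity matrix
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                G, H = clusters[p], clusters[q]
                # Average-link distance between clusters G and H.
                d = sum(D[i][j] for i in G for j in H) / (len(G) * len(H))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]                          # q > p, so index p is unaffected
    return clusters


# Hypothetical toy time series (T = 7), not the data from the figures.
series = [
    [0, 1, 2, 3, 4, 5, 6], [0, 1.1, 2, 3.2, 4, 5.1, 6],   # rising
    [6, 5, 4, 3, 2, 1, 0], [6, 5.2, 4, 2.9, 2, 1.1, 0],   # falling
    [3, 3, 3, 3, 3, 3, 3], [3, 3.1, 2.9, 3, 3, 3.1, 3],   # flat
]
print(sorted(sorted(c) for c in hac_average(series, 3)))  # [[0, 1], [2, 3], [4, 5]]
```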

Machine Learning Exam Help_K-means clustering

There are several problems with hierarchical agglomerative clustering (Section 21.2). First, it takes $O\left(N^3\right)$ time (for the average link method), making it hard to apply to big datasets. Second, it assumes that a dissimilarity matrix has already been computed, whereas the notion of “similarity” is often unclear and needs to be learned. Third, it is just an algorithm, not a model, and so it is hard to evaluate how good it is. That is, there is no clear objective that it is optimizing.

In this section, we discuss the K-means algorithm [Mac67; Llo82], which addresses these issues. First, it runs in $O(N K T)$ time, where $T$ is the number of iterations. Second, it computes similarity in terms of Euclidean distance to learned cluster centers $\boldsymbol{\mu}_k \in \mathbb{R}^D$, rather than requiring a dissimilarity matrix. Third, it optimizes a well-defined cost function, as we will see.

We assume there are $K$ cluster centers $\boldsymbol{\mu}_k \in \mathbb{R}^D$, so we can cluster the data by assigning each data point $\boldsymbol{x}_n \in \mathbb{R}^D$ to its closest center:
$$z_n^*=\arg \min _k\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_k\right\|_2^2$$
Of course, we don't know the cluster centers, but we can estimate them by computing the average value of all points assigned to them:
$$\boldsymbol{\mu}_k=\frac{1}{N_k} \sum_{n: z_n=k} \boldsymbol{x}_n$$
where $N_k$ is the number of points assigned to cluster $k$. We can then iterate these two steps until convergence. More formally, we can view this as finding a local minimum of the following cost function, known as the distortion:
$$J(\mathbf{M}, \mathbf{Z})=\sum_{n=1}^N\left\|\boldsymbol{x}_n-\boldsymbol{\mu}_{z_n}\right\|_2^2=\left\|\mathbf{X}-\mathbf{Z} \mathbf{M}^{\top}\right\|_F^2$$
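where $\mathbf{X} \in \mathbb{R}^{N \times D}$, $\mathbf{Z} \in \{0,1\}^{N \times K}$ (row $n$ is the one-hot encoding of $z_n$), and $\mathbf{M} \in \mathbb{R}^{D \times K}$ contains the cluster centers $\boldsymbol{\mu}_k$ in its columns. K-means optimizes this using alternating minimization: the assignment step minimizes $J$ over $\mathbf{Z}$ with $\mathbf{M}$ fixed, and the update step minimizes $J$ over $\mathbf{M}$ with $\mathbf{Z}$ fixed. (This is closely related to the EM algorithm for GMMs, as we discuss in Section 21.4.1.1.)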
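The alternating procedure can be sketched in plain Python as a minimal version of Lloyd's algorithm. It is illustrative, not a production implementation: the toy data, the first-$K$-points initialization, and the helper names (`kmeans`, `distortion`, `distortion_frobenius`) are assumptions made for this example. Each assignment step costs $O(NKD)$, consistent with the $O(NKT)$ total when $D$ is treated as fixed. The `distortion_frobenius` helper evaluates the same cost in the matrix form $\|\mathbf{X}-\mathbf{Z}\mathbf{M}^{\top}\|_F^2$ with a one-hot $\mathbf{Z}$.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(X, mu, z):
    """J = sum_n ||x_n - mu_{z_n}||^2."""
    return sum(sq_dist(x, mu[zn]) for x, zn in zip(X, z))


def distortion_frobenius(X, mu, z):
    """Same cost written as ||X - Z M^T||_F^2, with one-hot Z (N x K).
    M (D x K) holds the centers in its columns, so row k of M^T is mu_k."""
    K, D = len(mu), len(X[0])
    Z = [[1.0 if k == zn else 0.0 for k in range(K)] for zn in z]
    ZMt = [[sum(Z[n][k] * mu[k][d] for k in range(K)) for d in range(D)]
           for n in range(len(X))]
    return sum((X[n][d] - ZMt[n][d]) ** 2
               for n in range(len(X)) for d in range(D))


def kmeans(X, K, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps.
    Assumption: centers start at the first K points for reproducibility
    (random or k-means++ initialization is more common in practice)."""
    mu = [list(x) for x in X[:K]]
    z, history = None, []
    for _ in range(max_iter):
        # Assignment step: z_n = argmin_k ||x_n - mu_k||^2 (minimizes J over Z).
        new_z = [min(range(K), key=lambda k: sq_dist(x, mu[k])) for x in X]
        if new_z == z:  # assignments unchanged: converged
            break
        z = new_z
        # Update step: mu_k = mean of the points assigned to k (minimizes J over M).
        for k in range(K):
            members = [x for x, zn in zip(X, z) if zn == k]
            if members:  # an empty cluster keeps its previous center
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
        history.append(distortion(X, mu, z))
    return mu, z, history


# Hypothetical 2-D toy data with two well-separated groups.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
mu, z, history = kmeans(X, K=2)
print(z)        # [0, 0, 0, 1, 1, 1]
print(history)  # distortion after each update; never increases
```

Because each step minimizes $J$ over one block of variables with the other held fixed, the recorded distortion never increases. The procedure only guarantees a local minimum, which is why it is commonly restarted from several initializations.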
