Machine Learning: SimCLR

In this section, we discuss SimCLR, which stands for “Simple contrastive learning of visual representations” [Che+20b; Che+20c]. This method has shown state-of-the-art performance on transfer learning and semi-supervised learning. The basic idea is as follows. Each input $\boldsymbol{x} \in \mathbb{R}^D$ is converted to two augmented “views” $\boldsymbol{x}_1=t_1(\boldsymbol{x}), \boldsymbol{x}_2=t_2(\boldsymbol{x})$, which are “semantically equivalent” versions of the input generated by some transformations $t_1, t_2$. For example, if $\boldsymbol{x}$ is an image, these could be small perturbations to the image, such as random crops, as discussed in Section 19.1. In addition, we sample “negative” examples $\boldsymbol{x}_1^{-}, \ldots, \boldsymbol{x}_n^{-} \in N(\boldsymbol{x})$ from the dataset which represent “semantically different” images (in practice, these are the other examples in the minibatch). Next we define some feature mapping $F: \mathbb{R}^D \rightarrow \mathbb{R}^E$, where $D$ is the size of the input, and $E$ is the size of the embedding.
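As a minimal sketch of how two views of one input might be produced, the snippet below applies two independently sampled random crops to a toy image. The helper names `random_crop` and `two_views`, the list-of-rows image format, and the crop size are illustrative assumptions; real augmentation pipelines typically compose several transformations, and only cropping is shown here.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a random crop_h x crop_w window of a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop_h + 1)
    left = rng.randrange(width - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def two_views(image, crop_h, crop_w, rng):
    """Sample two transformations t1, t2 independently and apply both to x."""
    x1 = random_crop(image, crop_h, crop_w, rng)
    x2 = random_crop(image, crop_h, crop_w, rng)
    return x1, x2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 "image"
x1, x2 = two_views(x, 6, 6, rng)
```

Both `x1` and `x2` come from the same `x`, so they form a positive pair; crops taken from other images in the minibatch would serve as the negatives.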

We then try to maximize the similarity of the similar views, while minimizing the similarity of the different views, for each input $\boldsymbol{x}$ :
$$J=F\left(t_1(\boldsymbol{x})\right)^{\mathrm{T}} F\left(t_2(\boldsymbol{x})\right)-\log \sum_{\boldsymbol{x}_i^{-} \in N(\boldsymbol{x})} \exp \left[F\left(\boldsymbol{x}_i^{-}\right)^{\mathrm{T}} F\left(t_1(\boldsymbol{x})\right)\right]$$
In practice, we use cosine similarity, so we $\ell_2$-normalize the representations produced by $F$ before taking inner products, but this is omitted in the above equation. See Figure 19.5a for an illustration. (In this figure, we assume $F(\boldsymbol{x})=g(r(\boldsymbol{x}))$, where the intermediate representation $\boldsymbol{h}=r(\boldsymbol{x})$ is the one that will later be used for fine-tuning, and $g$ is an additional transformation applied during training.) Interestingly, we can interpret this as a form of conditional energy-based model of the form
$$p\left(\boldsymbol{x}_2 \mid \boldsymbol{x}_1\right)=\frac{\exp \left[-\mathcal{E}\left(\boldsymbol{x}_2 \mid \boldsymbol{x}_1\right)\right]}{Z\left(\boldsymbol{x}_1\right)}$$
where $\mathcal{E}\left(\boldsymbol{x}_2 \mid \boldsymbol{x}_1\right)=-F\left(\boldsymbol{x}_2\right)^{\mathrm{\top}} F\left(\boldsymbol{x}_1\right)$ is the energy, and
$$Z(\boldsymbol{x})=\int \exp \left[-\mathcal{E}\left(\boldsymbol{x}^{-} \mid \boldsymbol{x}\right)\right] d \boldsymbol{x}^{-}=\int \exp \left[F\left(\boldsymbol{x}^{-}\right)^{\mathrm{\top}} F(\boldsymbol{x})\right] d \boldsymbol{x}^{-}$$
is the normalization constant, known as the partition function. The conditional log likelihood under this model has the form
$$\log p\left(\boldsymbol{x}_2 \mid \boldsymbol{x}_1\right)=F\left(\boldsymbol{x}_2\right)^{\mathrm{T}} F\left(\boldsymbol{x}_1\right)-\log \int \exp \left[F\left(\boldsymbol{x}^{-}\right)^{\mathrm{\top}} F\left(\boldsymbol{x}_1\right)\right] d \boldsymbol{x}^{-}$$
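Comparing this log likelihood with the objective $J$, the sum over sampled negatives in $J$ plays the role of the integral in the log partition function. The following is a minimal sketch of $J$ for a single input, assuming the embeddings $F(\cdot)$ have already been computed and are passed in as plain Python lists. The function names are illustrative, and the $\ell_2$ normalization mentioned in the text is applied explicitly.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_objective(z1, z2, negatives):
    """J = z1.z2 - log sum_i exp(neg_i . z1), using cosine similarity.

    z1, z2: embeddings of the two views of the same input.
    negatives: embeddings of the negative examples.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    negatives = [l2_normalize(n) for n in negatives]
    positive_term = dot(z1, z2)
    negative_term = log_sum_exp([dot(n, z1) for n in negatives])
    return positive_term - negative_term


# Views that agree and negatives that point elsewhere give a higher J ...
j_good = contrastive_objective([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# ... than views that disagree while the negatives resemble the anchor.
j_bad = contrastive_objective([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [0.9, 0.1]])
```

Since $J$ is to be maximized, a training loop would usually minimize $-J$ averaged over the minibatch.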

Consider a problem in which we have inputs from different domains, such as a source domain $\mathcal{X}_s$ and target domain $\mathcal{X}_t$, but a common set of output labels, $\mathcal{Y}$. (This is the “dual” of transfer learning, since the input domains are different, but the output domains are the same.) For example, the domains might be images from a computer graphics system and real images, or product reviews and movie reviews. We assume we do not have labeled examples from the target domain. Our goal is to fit the model on the source domain, and then modify its parameters so it works on the target domain. This is called (unsupervised) domain adaptation (see e.g., [KL21] for a review).

A common approach to this problem is to train the source classifier in such a way that it cannot distinguish whether the input is coming from the source or target distribution; in this case, it will only be able to use features that are common to both domains. This is called domain adversarial learning [Gan+16]. More formally, let $d_n \in\{s, t\}$ be a label that specifies if the data example $n$ comes from domain $s$ or $t$. We want to optimize
$$\min_\phi \max_\theta \frac{1}{N_s+N_t} \sum_{n \in \mathcal{D}_s, \mathcal{D}_t} \ell\left(d_n, f_\theta\left(\boldsymbol{x}_n\right)\right)+\frac{1}{N_s} \sum_{m \in \mathcal{D}_s} \ell\left(y_m, g_\phi\left(f_\theta\left(\boldsymbol{x}_m\right)\right)\right)$$
where $N_s=\left|\mathcal{D}_s\right|, N_t=\left|\mathcal{D}_t\right|, f$ maps $\mathcal{X}_s \cup \mathcal{X}_t \rightarrow \mathcal{H}$, and $g$ maps $\mathcal{H} \rightarrow \mathcal{Y}_t$. The objective in Equation (19.15) minimizes the loss on the desired task of classifying $y$, but maximizes the loss on the auxiliary task of classifying the source domain $d$. This can be implemented by the gradient sign reversal trick, and is related to GANs (generative adversarial networks). See e.g., [Csu17; Wu+19] for some other approaches to domain adaptation.
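To make the gradient sign reversal trick concrete, here is a rough sketch on a scalar toy model with hand-written gradients. Several pieces are assumptions made for illustration and do not appear in the equation above: a separate domain head with weight `w`, the encoding of $d \in \{s, t\}$ as 0/1, binary labels $y$, the logistic loss, and the reversal strength `lam`. The reversal layer acts as the identity in the forward pass and multiplies the incoming gradient by `-lam` in the backward pass, so the feature extractor is pushed to increase the domain loss while the domain head still tries to decrease it.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_reverse_backward(upstream_grad, lam):
    """Backward pass of the reversal layer (its forward pass is the identity)."""
    return -lam * upstream_grad


def step(params, x, y, d, has_label, lr=0.1, lam=1.0):
    """One SGD step on a scalar toy model (all names are hypothetical).

    theta: feature extractor, h = theta * x
    phi:   label head,        p_y = sigmoid(phi * h)
    w:     domain head,       p_d = sigmoid(w * h)
    """
    theta, phi, w = params
    h = theta * x

    # Domain loss (logistic): gradient w.r.t. the logit is p_d - d.
    g_dom_logit = sigmoid(w * h) - d
    g_w = g_dom_logit * h
    # Gradient flowing back into h is reversed before it reaches theta.
    g_h_dom = grad_reverse_backward(g_dom_logit * w, lam)

    # Task loss is only available for labeled (source) examples.
    g_phi, g_h_task = 0.0, 0.0
    if has_label:
        g_task_logit = sigmoid(phi * h) - y
        g_phi = g_task_logit * h
        g_h_task = g_task_logit * phi

    g_theta = (g_h_task + g_h_dom) * x
    return (theta - lr * g_theta, phi - lr * g_phi, w - lr * g_w)


# An unlabeled target example (d = 1): the domain head w moves to classify it
# better, while theta moves in the opposite direction to confuse the head.
new_theta, new_phi, new_w = step((1.0, 0.0, 1.0), x=1.0, y=0, d=1, has_label=False)
```

In an automatic differentiation framework the same effect is usually obtained with a custom layer whose backward rule negates the gradient, which avoids writing the gradients by hand as done here.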

