# Machine Learning: COMP3670

## Word analogies

One of the most remarkable properties of word embeddings produced by word2vec, GloVe, and similar methods is that the learned vector space seems to capture relational semantics in terms of simple vector addition. For example, consider the word analogy problem “man is to woman as king is to queen”, often written as man:woman::king:queen. Suppose we are given the words $a=$ man, $b=$ woman, $c=$ king; how do we find $d=$ queen? Let $\boldsymbol{\delta}=\boldsymbol{v}_b-\boldsymbol{v}_a$ be the vector representing the concept of “converting the gender from male to female”. Intuitively, we can find word $d$ by computing $\boldsymbol{v}_d=\boldsymbol{v}_c+\boldsymbol{\delta}$, and then finding the closest word in the vocabulary to $\boldsymbol{v}_d$. See Figure 20.45 for an illustration of this process, and code.probml.ai/book1/word_analogies_torch for some code.
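The lookup described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the vocabulary and the three-dimensional embeddings are hand-made toy values (not learned by word2vec or GloVe), `solve_analogy` is a hypothetical helper name, "closest" is taken to mean highest cosine similarity, and the three query words are excluded from the candidates, which is a common convention rather than something the text prescribes.

```python
import numpy as np

# Hypothetical toy embeddings with hand-picked axes (royal, female, person).
# Real embeddings would come from a trained model such as word2vec or GloVe.
vocab = ["man", "woman", "king", "queen", "prince", "apple"]
E = np.array([
    [0.0, 0.0, 1.0],   # man
    [0.0, 1.0, 1.0],   # woman
    [1.0, 0.0, 1.0],   # king
    [1.0, 1.0, 1.0],   # queen
    [0.8, 0.0, 0.9],   # prince
    [0.0, 0.0, -1.0],  # apple
])


def solve_analogy(a, b, c, vocab, E):
    """Return the word d that best completes a : b :: c : d."""
    index = {w: i for i, w in enumerate(vocab)}
    delta = E[index[b]] - E[index[a]]      # delta = v_b - v_a
    target = E[index[c]] + delta           # v_d = v_c + delta
    # Cosine similarity between the target and every word vector.
    unit_rows = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = unit_rows @ (target / np.linalg.norm(target))
    # Exclude the query words themselves (an assumed convention).
    for w in (a, b, c):
        sims[index[w]] = -np.inf
    return vocab[int(np.argmax(sims))]


print(solve_analogy("man", "woman", "king", vocab, E))  # prints: queen
```

With learned embeddings the target vector rarely coincides exactly with a vocabulary vector, which is why the final step is a nearest-neighbour search rather than an equality test.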

In [PSM14a], they conjecture that $a: b:: c: d$ holds iff for every word $w$ in the vocabulary, we have
$$\frac{p(w \mid a)}{p(w \mid b)} \approx \frac{p(w \mid c)}{p(w \mid d)}$$
In [Aro+16], they show that this follows from the RAND-WALK modeling assumptions in Section 20.5.5. See also [AH19; EDH19] for other explanations of why word analogies work, based on different modeling assumptions.
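The link between this ratio condition and vector addition can be sketched as follows. This is an informal argument, not the proof in [Aro+16]. It assumes the RAND-WALK approximation $\log p(w \mid a) \approx \log p(w)+\boldsymbol{v}_w^{\top} \boldsymbol{v}_a / D$, which is the model's PMI result rewritten using $p(w \mid a)=p(w, a) / p(a)$. Taking logs of both sides of the condition and subtracting, the $\log p(w)$ terms cancel:
$$\log \frac{p(w \mid a)}{p(w \mid b)}-\log \frac{p(w \mid c)}{p(w \mid d)} \approx \frac{1}{D} \boldsymbol{v}_w^{\top}\left(\boldsymbol{v}_a-\boldsymbol{v}_b-\boldsymbol{v}_c+\boldsymbol{v}_d\right)$$
If the left-hand side is approximately zero for every word $w$, and the embeddings $\boldsymbol{v}_w$ span $\mathbb{R}^D$, then $\boldsymbol{v}_a-\boldsymbol{v}_b-\boldsymbol{v}_c+\boldsymbol{v}_d \approx \mathbf{0}$, that is, $\boldsymbol{v}_d \approx \boldsymbol{v}_c+\left(\boldsymbol{v}_b-\boldsymbol{v}_a\right)$.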

## RAND-WALK model of word embeddings

Word embeddings significantly improve the performance of various kinds of NLP models compared to using one-hot encodings for words. It is natural to wonder why the above word embeddings work so well. In this section, we give a simple generative model for text documents that explains this phenomenon, based on [Aro+16].

Consider a sequence of words $w_1, \ldots, w_T$. We assume each word is generated by a latent context or discourse vector $\boldsymbol{z}_t \in \mathbb{R}^D$ using the following log bilinear language model, similar to [MH07]:
$$p\left(w_t=w \mid \boldsymbol{z}_t\right)=\frac{\exp \left(\boldsymbol{z}_t^{\top} \boldsymbol{v}_w\right)}{\sum_{w^{\prime}} \exp \left(\boldsymbol{z}_t^{\top} \boldsymbol{v}_{w^{\prime}}\right)}=\frac{\exp \left(\boldsymbol{z}_t^{\top} \boldsymbol{v}_w\right)}{Z\left(\boldsymbol{z}_t\right)}$$
where $\boldsymbol{v}_w \in \mathbb{R}^D$ is the embedding for word $w$, and $Z\left(\boldsymbol{z}_t\right)$ is the partition function. We assume $D<M$, where $M$ is the number of words in the vocabulary.
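As a concrete reading of the log bilinear model, the sketch below evaluates $p(w_t=w \mid \boldsymbol{z}_t)$ for every word at once. The function name `word_distribution`, the sizes `M` and `D`, and the random embeddings are all illustrative assumptions; subtracting the maximum score is a standard numerical-stability step that leaves the distribution unchanged.

```python
import numpy as np


def word_distribution(z, V):
    """Softmax over the vocabulary: p(w | z) = exp(z^T v_w) / Z(z).

    z: context vector of shape (D,)
    V: embedding matrix of shape (M, D), one row per word
    """
    scores = V @ z                   # z^T v_w for every word w
    scores = scores - scores.max()   # stabilise exp; cancels in the ratio
    weights = np.exp(scores)
    return weights / weights.sum()   # divide by the partition function


# Illustrative sizes and random values (assumptions, not from the text).
rng = np.random.default_rng(0)
M, D = 1000, 20                      # vocabulary size and embedding dimension, D < M
V = rng.standard_normal((M, D))
z = rng.standard_normal(D)

p = word_distribution(z, V)
print(p.shape, p.sum())
```

The most probable word under this model is simply the one whose embedding has the largest inner product with the context vector.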

Let us further assume the prior for the word embeddings $\boldsymbol{v}_w$ is an isotropic Gaussian, and that the latent topic $\boldsymbol{z}_t$ undergoes a slow Gaussian random walk. (This is therefore called the RAND-WALK model.) Under this model, one can show that $Z\left(\boldsymbol{z}_t\right)$ is approximately equal to a fixed constant, $Z$, independent of the context. This is known as the self-normalization property of log-linear models [AK15]. Furthermore, one can show that the pointwise mutual information of predictions from the model is given by
$$\mathbb{PMI}\left(w, w^{\prime}\right)=\log \frac{p\left(w, w^{\prime}\right)}{p(w) p\left(w^{\prime}\right)} \approx \frac{\boldsymbol{v}_w^{\top} \boldsymbol{v}_{w^{\prime}}}{D}$$
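The self-normalization property can be checked numerically with a small simulation. The sketch below is an illustration under assumptions of its own, not the analysis in [Aro+16]: embeddings are drawn from a standard isotropic Gaussian, contexts are random unit-norm vectors, and the sizes are arbitrary. For a unit-norm context, each term $\exp(\boldsymbol{z}^{\top} \boldsymbol{v}_w)$ has expectation $e^{1/2}$ (the Gaussian moment generating function), so $Z(\boldsymbol{z})$ should concentrate near $M e^{1/2}$ whatever the direction of $\boldsymbol{z}$.

```python
import numpy as np


def partition_function(contexts, V):
    """Z(z) = sum_w exp(z^T v_w), evaluated for each row z of `contexts`."""
    return np.exp(contexts @ V.T).sum(axis=1)


# Illustrative sizes (assumptions): vocabulary, dimension, number of contexts.
rng = np.random.default_rng(0)
M, D, n_contexts = 20000, 50, 20

V = rng.standard_normal((M, D))                   # isotropic Gaussian embeddings
contexts = rng.standard_normal((n_contexts, D))
contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)  # unit-norm contexts

Z_values = partition_function(contexts, V)
relative_spread = Z_values.std() / Z_values.mean()

print("mean Z / (M * e^0.5):", Z_values.mean() / (M * np.exp(0.5)))
print("relative spread of Z across contexts:", relative_spread)
```

A small relative spread means the denominator of the softmax is nearly the same for every context, which is what allows $Z\left(\boldsymbol{z}_t\right)$ to be treated as a constant.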
We can therefore fit the RAND-WALK model by matching the model’s predicted values for PMI with the empirical values, i.e., we minimize
$$\mathcal{L}=\sum_{w, w^{\prime}} X_{w, w^{\prime}}\left(\mathbb{PMI}\left(w, w^{\prime}\right)-\boldsymbol{v}_w^{\top} \boldsymbol{v}_{w^{\prime}}\right)^2$$
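where $X_{w, w^{\prime}}$ is the number of times $w$ and $w^{\prime}$ occur next to each other. The constant factor $1 / D$ from the PMI approximation does not appear here; it can be absorbed into the scale of the embeddings. This objective can be seen as a frequency-weighted version of the SVD loss in Equation (20.138). (See [LG14] for more connections between word embeddings and SVD.)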
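A minimal sketch of fitting embeddings with this weighted objective is given below, using plain gradient descent. Everything specific here is an assumption made for illustration: the 4-word co-occurrence matrix `X` is invented, the helper names are hypothetical, the counts are assumed symmetric, the weights are normalised to sum to one so that the step size does not depend on corpus size, and the learning rate and step count are arbitrary. Pairs with zero counts receive zero weight, so their undefined PMI is simply replaced by zero.

```python
import numpy as np


def empirical_pmi(X):
    """Empirical PMI from a symmetric co-occurrence count matrix X."""
    total = X.sum()
    p_joint = X / total
    p_word = X.sum(axis=1) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint) - np.log(np.outer(p_word, p_word))
    # Zero-count pairs have zero weight in the loss, so any finite value works.
    return np.where(X > 0, pmi, 0.0)


def weighted_pmi_loss(V, W, pmi):
    """Sum over pairs of W[w, w'] * (PMI(w, w') - v_w^T v_w')^2."""
    residual = pmi - V @ V.T
    return float(np.sum(W * residual**2))


def fit_embeddings(X, dim=2, steps=2000, lr=0.1, seed=0):
    """Gradient descent on the weighted PMI loss; returns (V, loss history)."""
    rng = np.random.default_rng(seed)
    pmi = empirical_pmi(X)
    W = X / X.sum()                        # normalised weights (assumption)
    V = 0.1 * rng.standard_normal((X.shape[0], dim))
    losses = [weighted_pmi_loss(V, W, pmi)]
    for _ in range(steps):
        G = W * (pmi - V @ V.T)            # weighted residuals
        grad = -2.0 * (G + G.T) @ V        # gradient of the loss w.r.t. V
        V = V - lr * grad
        losses.append(weighted_pmi_loss(V, W, pmi))
    return V, losses


# Invented toy counts: words 0 and 1 co-occur often, as do words 2 and 3.
X = np.array([[0, 8, 1, 1],
              [8, 0, 1, 1],
              [1, 1, 0, 6],
              [1, 1, 6, 0]], dtype=float)

V, losses = fit_embeddings(X)
print("initial loss:", losses[0])
print("final loss:", losses[-1])
```

After fitting, words that co-occur frequently should end up with a positive inner product and rarely co-occurring words with a negative one, mirroring the sign of their empirical PMI.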

Furthermore, some additional approximations can be used to show that the NLL for the RAND-WALK model is equivalent to the CBOW and SGNS word2vec objectives. We can also derive the objective for GloVe from this approach.
