# Parallel Computing | CSE179

## Hardware Realisation Flavours

There are two fundamentally different ways to realise SIMD in hardware: We can work with large registers that host $N$ values at once. When we add two of these massive registers, they effectively perform $N$ additions in one rush. Alternatively, we can work with $2 N$ normal registers where the $N$ pairs of registers all perform the same operation.

The former variant is what we find in standard processors today. Though there are numerous early adoptions of the SIMD concept, today’s architectural blueprint dates back to around 1999, when Intel introduced a technique they called SSE. The main processor is complemented by an FPU (floating point unit, as opposed to the $\mathrm{ALU}$) hosting additional registers. These registers are called $\mathrm{xmm}k$ with $k \in \{0,1, \ldots, 7\}$. They are larger than their ALU counterparts: in the original SSE, they had 128 bits. The original SSE can only be used for single precision arithmetic (its primary market had been computer games and graphics), which means each xmm register can host up to four single precision values with 32 bits each. If four entries of a vector $x \in \mathbb{R}^{4}$ are held in $\mathrm{xmm}0$ and four entries of $y \in \mathbb{R}^{4}$ in $\mathrm{xmm}1$, then the addition of $\mathrm{xmm}0$ and $\mathrm{xmm}1$ computes four additions in one rush. The hardware ensures that the individual additions do not spoil each other, i.e. the sum of the first entries does not spill over into the sum of the second entries, and so forth. The $\mathrm{xmm}k$ registers are the $\mathrm{RegA}, \mathrm{RegB}$, … registers from our introductory example, i.e. the RegA1, RegA2, and so forth are physically stored in one large RegA register.
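The four-way addition described above can be sketched with compiler intrinsics. This is a minimal illustration, not a definitive implementation: the function name `add4` is hypothetical, the compiler (not the programmer) decides which xmm registers are actually used, and the code falls back to a scalar loop on targets where SSE is not available.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, only present on x86 targets
#endif

// Adds two vectors with four single precision entries each.
// add4 is a hypothetical helper name chosen for this illustration.
void add4(const float* x, const float* y, float* result) {
  assert(x != nullptr && y != nullptr && result != nullptr);
#if defined(__SSE__)
  __m128 rx   = _mm_loadu_ps(x);       // four floats into one 128-bit register
  __m128 ry   = _mm_loadu_ps(y);       // four floats into a second register
  __m128 rsum = _mm_add_ps(rx, ry);    // four additions with one instruction
  _mm_storeu_ps(result, rsum);         // write the four sums back to memory
#else
  // Scalar fallback (assumption: no SSE available): four separate additions.
  for (int i = 0; i < 4; ++i) {
    result[i] = x[i] + y[i];
  }
#endif
}
```

Calling `add4` with two arrays of four floats fills `result` with the entry-wise sums; each entry is computed independently of its neighbours.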

## Vertical versus horizontal vector operations

A vertical operation combines corresponding entries of two vector registers, whereas a horizontal operation combines the entries within a single register. We compute $f=\sum_{i=1}^{2} x_{i} y_{i}$, i.e. a small scalar product of two vectors of length two. Our code loads $\left(x_{1}, x_{2}\right)$ into the first register, $\left(y_{1}, y_{2}\right)$ into the second, and then multiplies them component-wise via one vertical operation. Thus, there will be one vector register holding $\left(x_{1} y_{1}, x_{2} y_{2}\right)$. Without horizontal vector operations, we next have to decompose (split) this vector register into two registers, which is another step, before we eventually add up the partial results.
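The sequence of steps for the length-two product (load, vertical multiply, split, add) might look as follows with SSE2 intrinsics for double precision. This is a sketch under assumptions: `dot2` is a hypothetical name, and a scalar fallback is used where SSE2 is not available.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (double precision), x86 only
#endif

// Computes f = x1*y1 + x2*y2 for two vectors of length two.
// dot2 is a hypothetical helper name chosen for this illustration.
double dot2(const double* x, const double* y) {
  assert(x != nullptr && y != nullptr);
#if defined(__SSE2__)
  __m128d rx    = _mm_loadu_pd(x);              // (x1, x2)
  __m128d ry    = _mm_loadu_pd(y);              // (y1, y2)
  __m128d prod  = _mm_mul_pd(rx, ry);           // vertical: (x1*y1, x2*y2)
  __m128d upper = _mm_unpackhi_pd(prod, prod);  // split: (x2*y2, x2*y2)
  __m128d sum   = _mm_add_sd(prod, upper);      // lower entry: x1*y1 + x2*y2
  return _mm_cvtsd_f64(sum);                    // extract the lower entry
#else
  // Scalar fallback (assumption: no SSE2 available).
  return x[0] * y[0] + x[1] * y[1];
#endif
}
```

The `_mm_unpackhi_pd` call is the extra split step. Later instruction set extensions add horizontal instructions (SSE3, for instance, offers `_mm_hadd_pd`) that add the two entries of one register directly.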

Further improvement of vector computing capabilities results from the fact that modern vector units offer fused multiply-add (FMA): they compute $f=x+(y \cdot z)$ in one step, i.e. two arithmetic operations (a multiplication plus an addition) are issued as one instruction rather than two. The operations are fused.
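In portable C++, the fused operation is exposed through `std::fma`. The sketch below applies it entry-wise; `fused_update` is a hypothetical name. Whether the compiler maps `std::fma` onto a hardware FMA instruction and vectorises the loop depends on the target and the compiler flags, so this only illustrates the arithmetic pattern.

```cpp
#include <cassert>
#include <cmath>

// Computes f[i] = x[i] + y[i] * z[i] for i = 0, ..., n-1.
// fused_update is a hypothetical helper name chosen for this illustration.
void fused_update(const double* x, const double* y, const double* z,
                  double* f, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    // std::fma(a, b, c) evaluates a*b + c with a single rounding step.
    f[i] = std::fma(y[i], z[i], x[i]);
  }
}
```

With `x = (1, 2)`, `y = (2, 3)` and `z = (3, 4)`, the routine yields `f = (7, 14)`.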

Beyond the extensions of the vector instruction set, the biggest improvement upon SSE is its successor, Advanced Vector Extensions (AVX), which widens the individual register from 128 bits to 256. Later, we got the AVX-512 extension, which widens the registers once more to 512 bits. Eight double precision values of eight bytes each now fit into one register.
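The number of values per register follows from dividing the register width by the size of one value. A small compile-time sketch of this arithmetic is given below; `lanes` is a hypothetical name, and the code assumes 8-bit bytes, a 4-byte `float` and an 8-byte `double`, as on mainstream platforms.

```cpp
#include <cstddef>

// Number of values of type T that fit into a register of the given width.
// lanes is a hypothetical helper; it assumes 8 bits per byte.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

// Assumes sizeof(float) == 4 and sizeof(double) == 8.
static_assert(lanes<float>(128) == 4, "SSE: four single precision values");
static_assert(lanes<double>(256) == 4, "AVX: four double precision values");
static_assert(lanes<double>(512) == 8, "AVX-512: eight double precision values");
```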

Statements that quote the payoff of vector operations as a factor of two or four ignore two details. On the one hand, vector operations typically have a much higher latency than their scalar counterparts. That means loading data into vector registers is expensive, and we have to amortise this speed penalty by vector efficiency. On the other hand, vector units are independent of the CPU. Vendors thus drive them with a slightly different clock speed: they reduce the frequency for AVX-heavy code, as the chip would otherwise become too hot. We conclude that optimal code, from a vector point of view, relies on sequences of $f=x+(y \cdot z)$ operations, but the impact on the time-to-solution has to be analysed carefully and experimentally.
