# Computer Systems Architecture | CMSC411

## Active Wait and GNU OpenMP Policy

Synchronizing threads means that a thread may have to wait until the other threads reach a state in which the synchronization can be established. This waiting phase can be implemented in two ways: (1) active wait and (2) passive wait.

Active wait consists of polling (repeatedly reading) a waiting flag until it reaches an expected value. This kind of waiting is very efficient in terms of resumption latency: as soon as the flag changes state, the thread can resume its normal execution flow. However, the repeated reads waste computing time and power during long waiting periods.

Passive wait puts the waiting thread to sleep. The thread that changes the waiting flag from the waiting state to the release state is then also responsible for waking at least one sleeping thread.
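The two waiting styles can be illustrated with a minimal sketch. It is a model only: the `Flag` class and its method names are hypothetical, and Python threads stand in for the native threads and atomic flag reads that a real runtime would use.

```python
import threading
import time


class Flag:
    """One-shot release flag supporting both waiting styles (illustrative only)."""

    def __init__(self):
        self.released = False            # value polled by active waiters
        self._event = threading.Event()  # object on which passive waiters sleep

    def release(self):
        self.released = True   # active waiters observe this on their next read
        self._event.set()      # the releasing thread also wakes the sleepers

    def active_wait(self):
        """Poll the flag until it is released; return the number of wasted polls."""
        polls = 0
        while not self.released:
            polls += 1
        return polls

    def passive_wait(self):
        """Sleep until the releasing thread wakes us up."""
        self._event.wait()


def demo():
    flag = Flag()
    results = {}

    def spinner():
        results["polls"] = flag.active_wait()

    def sleeper():
        flag.passive_wait()
        results["woken"] = True

    threads = [threading.Thread(target=spinner), threading.Thread(target=sleeper)]
    for t in threads:
        t.start()
    time.sleep(0.05)   # both threads are now waiting
    flag.release()
    for t in threads:
        t.join()
    return results
```

Calling `demo()` returns a dictionary in which `polls` counts the reads the active waiter spent while the flag was still unset, which is the computing time that a long active wait wastes. The passive waiter performs no such reads but depends on `release()` to wake it.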

The default waiting policy of the GNU OpenMP library is as follows: a waiting thread performs an active wait until a predefined amount of time has elapsed. Once this time has elapsed, the policy switches to passive wait and the thread sleeps until the barrier completes. Waiting modes can also be forced using explicit directives.
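A spin-then-sleep policy of this kind can be sketched as follows. This is not the libgomp implementation: the class name `SpinThenSleepFlag` and the parameter `spin_limit` are hypothetical, and the active phase is bounded by a number of polls rather than by a measured time, which is an assumption made to keep the example deterministic.

```python
import threading


class SpinThenSleepFlag:
    """Hybrid wait: poll for a bounded budget, then fall back to sleeping."""

    def __init__(self, spin_limit):
        self.spin_limit = spin_limit     # hypothetical active-wait budget, in polls
        self.released = False
        self._event = threading.Event()

    def release(self):
        self.released = True             # seen by threads still polling
        self._event.set()                # wakes threads that already went to sleep

    def wait(self):
        """Return "active" if released during polling, otherwise "passive"."""
        for _ in range(self.spin_limit):
            if self.released:
                return "active"          # fast path: resume immediately
        self._event.wait()               # budget exhausted: sleep until released
        return "passive"
```

A short wait is resolved during the polling phase, while a long one falls through to the sleeping phase. With `spin_limit=0` the flag behaves as a purely passive wait, and a very large `spin_limit` approximates a purely active one. The libgomp documentation describes the `OMP_WAIT_POLICY` and `GOMP_SPINCOUNT` environment variables for tuning the real policy.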

Active wait is used when highly reactive applications are expected, since it reduces synchronization delays as much as possible. In this study, we focus on such applications, for which reducing the total computation time is the main challenge.

## Evaluation Platform

On the hardware side, our evaluation platform is based on a Veloce2 Quattro emulator. The emulation platform lets us quickly emulate a full Register Transfer Level (RTL) system with cycle-accurate precision. Indeed, to run timing measurement campaigns on operating system primitives (e.g., synchronization mechanisms), we have to collect information from software execution (operating system boot plus application run) over a very large number of clock cycles. With the limitations of "classical" simulation, accurately simulating these mechanisms takes an extremely long time. For example, booting a Linux kernel on top of a cycle-accurate SystemCass [7] simulation can take several days for a 16-core TSAR platform. A full measurement campaign is hardly practical with such a long kernel boot. A common workaround is to degrade the accuracy of the simulated model in order to improve the running time. We instead use hardware support to speed up simulation without losing accuracy.

On the software side, we use a port of the Linux kernel 4.6 and the µClibc library to boot the TSAR platform in our measurement campaign.

The applications for this platform are compiled with GCC 4.8.2. Note that the GNU OpenMP library is part of GCC, so its version is tied directly to the GCC release. This GCC version is quite old, but the synchronization barrier management of the GNU OpenMP library has not changed in more recent GCC releases: the slowdown issues in release 4.8.2 are the same as in the latest GCC release at the time of writing, 7.2.

To prevent the scheduling policy from interfering with our measurements, we bind each thread to a different core by setting the appropriate OpenMP directive.
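The idea of one thread per core can be sketched outside OpenMP as follows. The text does not name the OpenMP setting that was used, so this example is only an analogue: the helper names `pin_plan` and `run_pinned` are hypothetical, and the pinning itself relies on `os.sched_setaffinity`, which is available on Linux only.

```python
import os
import threading


def pin_plan(num_threads, cores):
    """Map each thread index to one distinct core; fail if there are too few cores."""
    cores = sorted(cores)
    if num_threads > len(cores):
        raise ValueError("not enough cores for one thread per core")
    return {i: cores[i] for i in range(num_threads)}


def run_pinned(num_threads, work):
    """Run work(i) in num_threads threads, each bound to its own core when possible."""
    can_pin = hasattr(os, "sched_setaffinity")   # absent on non-Linux systems
    cores = os.sched_getaffinity(0) if can_pin else range(num_threads)
    plan = pin_plan(num_threads, cores)
    results = [None] * num_threads

    def body(i):
        if can_pin:
            # On Linux, pid 0 designates the calling thread.
            os.sched_setaffinity(0, {plan[i]})
        results[i] = work(i)

    threads = [threading.Thread(target=body, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Once each thread is pinned, the scheduler can no longer migrate it between cores or co-locate two measured threads on the same core, which removes one source of timing noise. In OpenMP programs, the `OMP_PROC_BIND` environment variable is one standard way to request a comparable binding.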

