# Game Theory (ECON3503)

## Approaches to Learning in Game Theory

The relative-payoff-sum (RPS) learning rule proposed by Harley (1981) and Maynard Smith (1982) appears to have been developed with the aim of making players approach an ESS. It makes use of discounted sums of the payoffs obtained from different actions $u$. For an individual who obtains a reward $R_t$ in round $t$, we can write the total discounted payoff sum as $S_0=r$ and
$$S_t=r+\sum_{\tau=0}^{t-1} \gamma^\tau R_{t-\tau}$$
for $t \geq 1$, where $\gamma \leq 1$ is a discount factor and $r>0$ is referred to as a (total) residual payoff. With actions $u_k$, for instance $u_1$ and $u_2$, we can split the total $S_t$ into components, $S_t=S_{1 t}+S_{2 t}$, where only those rounds where action $u_k$ is used contribute to $S_{k t}$. In doing this, the residual should also be split in some way, as $r=r_1+r_2$, so that
$$S_{k t}=r_k+\sum_{\substack{\tau=0 \\ u_{t-\tau}=u_k}}^{t-1} \gamma^\tau R_{t-\tau} .$$
The RPS learning rule is then to use action $u_k$ in round $t$ with probability $p_{k t}=S_{k, t-1} / S_{t-1}$. For this rule to be well defined in every round, the rewards and residuals must be positive. A consequence of the rule is that actions that have yielded higher rewards, but also actions that have been used more often, have a higher probability of being chosen by the learner.
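One round of the RPS rule can be sketched as follows. The recursion $S_{k,t} = r_k + \gamma\,(S_{k,t-1} - r_k) + R_t$ (reward term only for the action just used) follows from the discounted sums above; the `rewards` callback standing in for the game payoff, and all function names, are our own illustrative assumptions, not part of Harley's formulation.

```python
import random

def rps_step(S, residuals, gamma, rewards, rng=random):
    """One round of the relative-payoff-sum (RPS) learning rule.

    S[k] holds the current payoff sum S_{k,t-1}. An action is chosen
    with probability S[k] / sum(S); then every sum is discounted back
    towards its residual r_k, and the new reward is credited to the
    action that was used:
        S_{k,t} = r_k + gamma * (S_{k,t-1} - r_k) + R_t.
    """
    # choose action k with probability proportional to S[k]
    x, k = rng.random() * sum(S), 0
    while x > S[k]:
        x -= S[k]
        k += 1
    R = rewards(k)  # the game supplies the reward for playing action k
    # discount all payoff sums towards their residuals, then add the reward
    S = [r + gamma * (s - r) for s, r in zip(S, residuals)]
    S[k] += R
    return S, k
```

Because the rewards and residuals are positive, every $S_k$ stays positive and the choice probabilities remain well defined.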

Although the RPS rule is not directly based on ideas about learning in animal psychology, it is broadly speaking a kind of reinforcement learning. In general, beyond the actor-critic approach described by Sutton and Barto (2018) and which we have used in this chapter, reinforcement learning can refer to any process where individuals learn from rewards that are somehow reinforcing, in a way that does not involve foresight or detailed understanding of the game situation. This broader interpretation has for instance been used by Roth and Erev (1995), and Erev and Roth (1998, 2014) in influential work on human behaviour in experimental games. Their basic reinforcement-learning model corresponds to the RPS learning rule.

## Convergence towards an Endpoint of a Game

In the examples in this chapter on the Hawk-Dove, investment, and dominance games (Sections 5.2, 5.3, and 5.4), we found that low rates of learning produced learning outcomes near an ESS of a one-shot game, after many rounds of learning. The general question of whether learning will converge to a Nash equilibrium is much studied in economic game theory. The results are mixed, in the sense that there are special classes of games for which there is convergence, but also examples of non-convergence, such as cycling or otherwise fluctuating learning dynamics. Pangallo et al. (2019) randomly generated a large number of two-player games with two or more (discrete) actions per player, and examined the convergence properties for several classes of learning dynamics, including Bush-Mosteller reinforcement learning. They found that for competitive games, where gains by one player tend to come at a loss for the other, and with many actions, non-convergence was the typical outcome. In biology we are mainly interested in games that represent widespread and important interactions, and non-convergence of learning dynamics need not be common for these games. In any case, the relation between learning outcomes and game equilibria should always be examined, because game-theoretic analyses add valuable understanding about evolved learning rules and learning outcomes.
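As a point of reference, the Bush-Mosteller update for a two-action learner can be sketched as below. The aspiration level and the normalisation by the payoff range are our assumptions; the exact formulation varies across the literature.

```python
def bush_mosteller(p, action, reward, aspiration, alpha, payoff_range):
    """One Bush-Mosteller update of p, the probability of action 0.

    The stimulus s compares the obtained reward with an aspiration
    level; a positive s reinforces the action just used, a negative s
    weakens it, with learning rate alpha.
    """
    s = (reward - aspiration) / payoff_range  # stimulus, scaled into [-1, 1]
    if action == 0:
        p = p + alpha * s * (1 - p) if s >= 0 else p + alpha * s * p
    else:
        p = p - alpha * s * p if s >= 0 else p - alpha * s * (1 - p)
    return min(1.0, max(0.0, p))
```

Iterating this update for two players in a competitive game is one way to reproduce the kind of non-convergent, fluctuating dynamics reported by Pangallo et al. (2019).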

The distinction between large and small worlds comes from general theories of rational human decision-making and learning under ignorance (Savage, 1972; Binmore, 2009; Huttegger, 2017). The large-worlds approach amounts to a criticism of the realism of the Bayesian small-worlds approach (Binmore, 2009). For game theory in biology, the decision-making processes of other individuals are among the most complex aspects of an individual's environment. So, for instance, in fictitious play individuals are assumed to understand the game they are playing, but they do not have an accurate representation of how other individuals make decisions. For this reason, fictitious play is a large-worlds approach, although it goes beyond basic reinforcement learning.
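To make the fictitious-play idea concrete, here is a minimal sketch for a symmetric game with two actions: each player tracks the opponent's empirical action frequencies and plays a best response to that empirical mixture. The Hawk-Dove payoffs used below ($V=2$, $C=4$, so the mixed equilibrium has Hawk played with probability $V/C = 1/2$) are illustrative assumptions.

```python
def best_response(payoff, q):
    """Best response when the opponent plays action 0 with frequency q."""
    expected = [payoff[a][0] * q + payoff[a][1] * (1 - q) for a in (0, 1)]
    return 0 if expected[0] >= expected[1] else 1

def fictitious_play(payoff, rounds):
    """Two players best-respond to each other's empirical action counts;
    returns each player's empirical frequency of action 0."""
    counts = [[1, 1], [1, 1]]  # counts[i][a]: times player i chose action a
    for _ in range(rounds):
        beliefs = [counts[1][0] / sum(counts[1]),  # player 0's belief about 1
                   counts[0][0] / sum(counts[0])]  # player 1's belief about 0
        actions = [best_response(payoff, b) for b in beliefs]
        for i, a in enumerate(actions):
            counts[i][a] += 1
    return [c[0] / sum(c) for c in counts]

# Hawk-Dove with V = 2, C = 4 (action 0 = Hawk); payoffs are to the row action
hawk_dove = [[-1, 2], [0, 1]]
```

Running `fictitious_play(hawk_dove, 1000)` yields empirical Hawk frequencies close to one half for both players, illustrating convergence of beliefs to the mixed equilibrium even though the players have no model of how the opponent actually decides.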

There are in fact rather few examples where game-theory models of social interactions have achieved a thoroughgoing small-worlds approach when there is uncertainty about the characteristics of other individuals. We present some of these examples later in the book (Chapter 8). We argue that learning, including actor-critic reinforcement learning, will be especially helpful for studying social interactions where individuals respond to each other's characteristics, as we develop in Section 8.6. It could well be that learning is also a realistic description of how animals deal with social interactions. Further, for situations that animals encounter frequently and that are important in their lives, it could be realistic to assume, as we have illustrated in this chapter, that evolution will tune certain aspects of the learning process, including learning rates.
