Paying and Persuading


Abstract

I study dynamic contracting where a principal (Sender) privately observes a Markovian state and seeks to motivate an agent (Receiver) who takes actions. Sender can use payments to augment ex-post payoffs or persuasion to alter the informational environment as ways to provide incentives. For any stage-game payoffs, cost of transfers, rate of future discounting, and Markov transition rule, optimal transfers are backloaded: payments occur only when Sender commits to reveal the state at all continuation histories. In a rideshare example, the optimal contract is a loyalty program: drivers receive the static optimal information structure until a random promotion time, after which the state is fully revealed and only payments are used to motivate the driver.

1 Introduction↩︎

Consider the problem faced by a principal who must convince an agent to take an ex-ante undesirable action. Standard economic theory suggests two ways to motivate such an agent: directly compensate them for taking such an action by means of a monetary transfer (paying), or manipulate their information environment so that the action becomes ex-ante desirable (persuading). In settings where only one such tool is available to motivate the agent,2 much is known about the optimal way to pay or persuade the agent, and economists have derived various solutions in increasingly sophisticated models of both, separately.

Yet little is known about the optimal way to pay and persuade an agent—to jointly compensate them for taking certain actions while also changing the expected value of those actions by manipulating their informational environment. That such situations are understudied is, in part, historical; for example, in models of moral hazard, the agent is the one traditionally endowed with the informational advantage, not the principal, and so there is no room for information design. Likewise, in models of information design, such as the celebrated judge-jury example of [1] or drug trials by the FDA, it is unnatural to suppose Sender can also pay Receiver. Consequently, the intersection of these two tools is not very well understood.

Despite this prima facie dichotomy between paying and persuading, there are new economic settings where principals jointly control information and monetary transfers. For example, Uber strategically varies the amount of information drivers can see about recommended trips in order to motivate their drivers to accept rides that drivers may find ex-post undesirable.3 Advertisers may both run advertisements to consumers and also email private offers that discount the good they are advertising in order to incentivize purchases. Managers who privately observe the value or difficulty of a project may both strategically provide feedback to the worker and directly vary their compensation scheme in order to convince them to work when they otherwise might prefer to shirk.

Taking seriously the interaction between paying and persuading leads to a whole host of new questions. What is the optimal way to combine information and transfers in a one-shot or a repeated interaction? Because information generation is free, is it ever profitable to strategically withhold information? When the principal and agent interact repeatedly, what can be said about the trade-off between using future information and future payments to motivate the agent? What is the value of repeated interaction when the principal has access to both tools, and how does it compare to the one-shot game?

This paper studies dynamic contracting when Sender can both flexibly manipulate information and offer payments to Receiver and answers the above questions. Formally, I consider a model where a principal (Sender) privately observes a Markovian state and then provides information and contingent transfers to an agent (Receiver) who chooses a perfectly monitored action in each period. Sender can commit to the entire path of transfers and information, conditioning payments and persuasion today based on any information in the past history. Information and transfers can thus both be used to directly incentivize an agent today, but can also be rebalanced intertemporally to provide incentives tomorrow: actions today may lead to valuable information tomorrow as a form of dynamic compensation. Thus, Sender must balance the use of information and transfers both within each period and across time.

My main result is an abstract backloading theorem that sharply characterizes the tradeoff between optimal information and transfers in Sender’s dynamic contracting problem. Theorem 1 constructs an optimal contract where, if transfers are ever used, Sender has already committed to fully revealing the state at all continuation histories, i.e. committed to relinquishing their informational advantage at every point in the future. This characterization applies without putting any restrictions on Sender or Receiver stage-game payoffs, the cost of transfers, the discount rate, or the law governing the Markov chain, and thus speaks generally to the way information and transfers trade off against one another in dynamic contracts. Theorem 1 can be interpreted as a sequencing argument: giving Receiver information tomorrow must always be weakly as efficient as simply paying Receiver today, and hence payments will only occur on the optimal path once the former incentive instrument is exhausted.

The key step of the proof is a pullback lemma (Lemma 4), which establishes that the full-information experiment lies on the Pareto frontier (i.e. is an efficient way to provide some level of incentives). Providing more information has two effects: it increases the total efficiency of the decision (increasing total surplus), but it also increases Receiver’s relative share of the total surplus. Because Receiver ultimately takes the action, the second force can sometimes dominate the first, rendering information an inefficient way to provide incentives. Thus, without transfers, the slope of the Pareto frontier at full information can be quite steep. Lemma 4 shows that, so long as Sender has access to transfers, there is a way to counterbalance the surplus-redistribution force so that providing more information in conjunction with transfers is never more inefficient than simply paying Receiver. Thus, optimally paying and persuading tomorrow must be a weakly more efficient method of providing incentives than paying today, implying payments can always be backloaded. Since Lemma 4 requires Sender to be able to jointly pay and persuade in the future, it highlights how transfers augment the cost of information revelation in dynamic persuasion.

Following Theorem 1, I derive the value of paying and persuading in several specialized contexts. First, I consider the case with myopic players (\(\delta = 0\)). Here, I show the value of joint information and transfers takes the form of an augmented concavification (which I call the \(k\)-cavification)—Sender computes their optimal transfer at each of finitely many extremal beliefs (those where Receiver is maximally indifferent between actions), and then linearly interpolates between the value at each of these beliefs. This characterization dramatically simplifies the complexity of finding the value of static joint information and transfers.

Second, I bound the value of dynamically paying and persuading as \(\delta \to 1\). When Sender and Receiver become arbitrarily patient and the state transition process is ergodic, I show Sender’s value from the optimal dynamic contract must be bounded from below by their best stationary coupling satisfying a standard Receiver incentive compatibility constraint evaluated at the ergodic distribution of the Markov chain. Such a characterization implies simple conditions under which Sender’s dynamic value strictly outperforms their one-shot persuasion value in the static game, and thus conditions under which dynamic incentivization benefits Sender. It also allows me to comment on the literature on Markovian persuasion, and show that when Receivers are not myopic, the canonical upper bound established in the literature instead becomes a lower bound. Moreover, this lower bound is the exact value of dynamic persuasion whenever Sender has an optimal stationary strategy.

Finally, I consider an example inspired by optimal rideshare design where Uber (Sender) must repeatedly incentivize a driver (Receiver) to accept rides of varying value to Uber. I show that when the value of rides is independently drawn and additive to players, the optimal contract is a tiered loyalty program—Uber provides their static optimum amount of information and withholds transfers until the driver has accepted enough “bad” rides (i.e. “Uber Blue”), after which they turn over all information about the ride beforehand and both (1) pay the driver to incentivize them to take rides, and (2) allow the driver to reject bad rides whenever payments are not valuable (i.e. “Uber Gold”). My simple example thus fits with the publicly available stylized description of Uber’s driver loyalty program.

The remainder of this paper is organized as follows. I next review the related literature. Section 2 introduces the general model. Section 3 states and proves my main result. Section 4 gives related results characterizing the value of persuasion with different discount rates. Section 5 concludes. Appendix A contains relevant proofs, while Appendix B contains technical details about the topology on strategies.

1.1 Related Literature↩︎

I relate to several distinct strands of literature. First is the literature on dynamic contracting, as in [3] and [4], who study the time structure of optimal transfers with private information about income and outside options. Most closely related to my paper is [5], who characterizes in a general, abstract model the time structure of optimal transfers; in contrast, I allow the principal to control information about the unknown state and characterize the time structure of optimal transfers and persuasion. Augmenting the model with persuasion is substantial; while the motivation and models undergirding our papers are similar, the resulting economic forces that characterize optimal contracts are quite distinct.4 In the absence of transfers, [6] and [7] characterize optimal dynamic contracts with agent private information in binary state-binary action models; similarly, [8] study dynamic mechanisms with Markovian information on the agent’s side and transfers. I relate to these models by studying the asymptotic features of contracts when the principal has the informational advantage.

As Sender in my model has superior information to Receiver and slowly “unwinds” their informational advantage, my model also has some qualitative features reminiscent of apprenticeship models (see [9] and [10]). Though the specific ways information is modeled are distinct, information is inefficiently withheld at the beginning of both our models and then gradually provided to Receiver as a motivating tool. Eventually, this advantage is relinquished and the principal begins to pay the agent. Thus I provide another parsimonious way to model apprenticeship dynamics when information is not literally a stock or irreversible but simply provided via communication.

Importantly, (and relative to the above literature), I suppose the principal can flexibly design the information structure, in the spirit of the Bayesian persuasion literature started by [11] and [1]. A rich literature has since pushed the limits of static persuasion; see [12] for a survey. Yet work on dynamic persuasion is much sparser. [13], [14] and [15] assume a myopic Receiver and characterize conditions where greedy policies are optimal, while [16] solves the full dynamic persuasion problem but only for a linear-quadratic game with a Brownian state. In the absence of commitment, [17] and [18] study repeated cheap talk games and characterize when full revelation is possible and the payoff set of the patient game, respectively. Further afield, [19] and [20] study the joint design of information and transfers in auctions, assuming the individual who persuades (Seller) is distinct from the one paying (buyer). To the best of my knowledge, this paper is the first to establish qualitative properties of the optimal mechanism for general Markovian states and arbitrary payoffs, albeit with transfers. I also provide a novel characterization of the value of persuasion, which nests the value of dynamic persuasion (by taking the cost of transfers to \(\infty\)).

Also related is the literature on information design in optimal stopping problems ([21], [22], [23] and [24]), which characterizes the optimal mechanism when Receiver only takes an action once and the state is perfectly persistent. Importantly, they do not allow for transfers, and hence the economic forces driving our models are quite different. Particularly related is [21], who study optimal dynamic information design in a continuous time, work-or-shirk model when the transfer scheme is fixed and paid out only at the end; I show that when transfers can be endogenously combined with information in any flexible fashion, payments are paid out only after the initial period of pure information design.

I also draw from the literature on information design in moral hazard ([25] and [26]). These papers allow for transfers, but must assume a perfectly persistent binary state as they seek to characterize the optimal mechanism explicitly. In the case without moral hazard but where the principal has access to both instruments, I show the optimal contract has a loyalty-program structure, in contrast to their results.

My geometric characterization of static persuasion with transfers (Proposition 6) relates to the literature on the geometry of persuasion. In particular, I build on work that characterize the value of persuasion geometrically in Bayesian persuasion ([1]), cheap talk with transparent motives ([27]), constrained persuasion ([28]), and communication with flexible information acquisition ([29]); I identify the relevant object to be concavified in the presence of both transfers and persuasion.

Finally, I relate to a newer literature seeking to understand the impact of persuasion in (repeated) games, often with some sort of monetary transfer. [30] and [31] studied mediated games where a designer wishes to implement some optimal outcome among players who repeatedly play a pricing game (i.e. a first price auction or a Bertrand competition market game). [32] study a mean cheap-talk game and show Sender attains their best monotone contract when transfers are voluntary (i.e. relational). In contrast, I analyze a model where the person who persuades is also the one who pays, and give a backloading result when the principal can commit to the entire path of transfers.

2 Model↩︎

2.1 The Stage Game↩︎

There are two players, Sender (S) and Receiver (R). Sender privately observes a state \(\theta\) drawn from some distribution \(\mu_0 \in \Delta(\Theta)\); Receiver takes an action \(a \in A\). For convenience, assume \(A\) and \(\Theta\) are finite.

Sender has two instruments with which to incentivize Receiver. First, they can transmit information, formally a signal structure \(\mathcal{S}: \Theta \to \Delta(M)\) where \(M \subset \mathcal{M}\) for some standard Borel \(\mathcal{M}\). Second, they can commit to a transfer rule, \(t: \Theta \times M \times A \to \mathbb{R}_+\), which specifies a nonnegative5 payment for each triple of state, message, and action. Let \(\mathcal{T}\) denote the set of all transfer rules. Having seen the joint information structure and transfer rule \((\mathcal{S}, t)\), and an observed signal \(m\), the agent forms beliefs and takes an action \(a\) maximizing their transfer-augmented utility \[a \in \mathop{\mathrm{\arg\!\max}}_{\tilde{a} \in A}\mathbb{E}_{\theta | m}[u(\tilde{a}, \theta) + t(\theta, m, \tilde{a})]\] where \(u: A \times \Theta \to \mathbb{R}\) is the agent’s payoff. Say the function \(\bar a: M \times \mathcal{T} \to \Delta(A)\) is a best response to \((\mathcal{S}, t)\) if \(\bar a(\cdot)\) always maximizes Receiver’s expected transfer-augmented utility.

Analogously, let \(v: \Theta \times A \to \mathbb{R}\) be Sender’s payoff. Following the approach of [1] (modified for transfers), I will equivalently represent a tuple \((\mathcal{S}, t, \bar a(\cdot))\) where \(\bar a(\cdot)\) best responds to \(\mathcal{S}\) as a tuple of objects \((\tau, t, \bar a(\cdot))\) where \(\tau \in \Delta_{\mu_0}(\Delta(\Theta))\) is a Bayes plausible distribution of posterior beliefs and \[\bar a(\mu, t) \in \mathop{\mathrm{\arg\!\max}}_{a \in \mathcal{A}} \mathbb{E}_\mu[u(a, \theta) + t(\theta, m^{-1}(\mu, \bar a), a)]\] where \(m^{-1}(\mu, \bar a)\) is the message that induces belief \(\mu\) and action \(\bar a(\cdot)\). Finally, I will also sometimes follow the approach of [33] and represent, when convenient, \((\mathcal{S}, t, \bar a)\) by a joint distribution \(\gamma \in \Delta(\Theta \times M \times A)\) and transfer rule \(t: \Theta \times M \times A \to \mathbb{R}_+\) such that for each fixed \(m\), the induced distribution \(\gamma_m \in \Delta(\Theta \times A)\) is obedient, i.e. \[\mathbb{E}_{\gamma_{m, \theta | a}}[u(a, \theta) + t(\theta, m, a) - (u(a', \theta) + t(\theta, m, a'))] \geq 0 \quad \text{ for all } a, a' \in A.\]

For any transfer rule \(t\), suppose Sender incurs a per-unit cost \(k\) to pay Receiver \(1\) unit of payoff, so that for a realized state, signal, and action \((\theta, m, a)\), Sender’s payoff is given by \[v(a, \theta) - k t(\theta, m, a).\] If \(k = 1\), this is the familiar transferable utility case. As \(k \to 0\), Sender can attain first best. As \(k \to \infty\), transfers become arbitrarily expensive and Sender reverts to their no-transfers persuasion baseline. Consequently, my model can interpolate between (1) the persuasion benchmark, (2) Sender first best, and (3) the transferable utility case. Moreover, rather than complicating the exposition, parametrizing \(k\) explicitly clarifies the incentive role of transfers in motivating the agent relative to the payoffs \((u, v)\).

The timing of the stage game formally is as follows.

  1. Sender and Receiver realize a common prior \(\mu_0\) over today’s state.

  2. Sender commits to a signal structure \(\mathcal{S}\) and transfer rule \(t\).

  3. Having seen \((\mathcal{S}, t)\) and a specific realization \(m\), the agent takes an action \(a\).

  4. Interim payoffs \(u(a, \theta) + t(\theta, m, a)\) and \(v(a, \theta) - k t(\theta, m, a)\) are realized.

An immediate simplification allows the transfer rule to be written only as a function of \((m, a)\), integrating out the state. This is because incentive compatibility of any recommended action depends only on the expected transfer given any message, not the specific state-dependent ex-post payment.

Lemma 1. Fix any \((\mathcal{S}, t, \bar a)\). There is an alternative tuple \((\mathcal{S}, t', \bar a)\), inducing the same joint distribution over beliefs and actions, where \(t'\) is constant in \(\theta\).

Proof. For each message \(m\) and action \(a\), let \(t'(\theta, m, a) = \mathbb{E}_{\theta | m}[t(\theta, m, a)]\) for every \(\theta\). Clearly this induces the same expected transfers and joint distribution of beliefs and actions. ◻
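The content of Lemma 1 is easy to verify numerically. Below is a minimal sketch, with hypothetical binary-state payoffs, signal structure, and transfers (none of these numbers appear in the paper): replacing a state-dependent transfer with its expectation conditional on the message leaves Receiver's transfer-augmented expected utility, and hence their best response, unchanged message by message.

```python
import numpy as np

# Hypothetical primitives (illustrative only): binary state, two actions, two messages.
prior = np.array([0.5, 0.5])                      # mu_0 over (theta_0, theta_1)
u = np.array([[1.0, -1.0],                        # u[theta, a]: Receiver payoffs
              [-1.0, 1.0]])
S = np.array([[0.8, 0.2],                         # S[theta, m] = Pr(m | theta)
              [0.3, 0.7]])
t = np.random.default_rng(0).uniform(0, 0.5, (2, 2, 2))   # t[theta, m, a]

for m in range(2):
    joint = prior * S[:, m]
    post = joint / joint.sum()                    # posterior belief given message m
    original = post @ (u + t[:, m, :])            # E[u + t | m] for each action
    t_avg = post @ t[:, m, :]                     # t'(m, a) = E[t(theta, m, a) | m]
    averaged = post @ u + t_avg                   # same expected payoffs by linearity
    assert np.allclose(original, averaged)
    print(f"m = {m}: best response a = {original.argmax()} under either transfer rule")
```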

A straightforward modification extends the lemma to the dynamic game. I omit the dynamic generalization for expositional succinctness. In principle, a revelation principle also holds (see [1]) under which it is without loss of generality to take \(M = A\). However, it turns out that it will be useful in the dynamic case for the message space to be \(M = \Delta(\Theta) \times A\), i.e. to recommend an action at each belief (possibly many actions at the same belief).

2.2 Dynamics↩︎

Suppose this game is played over many periods, \(t = 0, 1, 2, \dots\) between Sender and Receiver, both of whom discount the future at a common rate \(\delta \in (0, 1)\). Importantly, Sender can commit to an entire history-contingent plan of action at the start of the game.

The state, which in period \(0\) is drawn according to some prior \(\mu_0\), thereafter is drawn according to some arbitrary Markov chain represented by the linear operator \(M: \Delta(\Theta) \to \Delta(\Theta)\). In the case where \(M\) is irreducible and aperiodic, let \(\mu^\infty\) be the resulting unique ergodic distribution of this process.
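As a concrete illustration, the ergodic distribution \(\mu^\infty\) is the left Perron eigenvector of the transition matrix. A short sketch with a hypothetical two-state chain (the matrix below is illustrative, not from the paper):

```python
import numpy as np

# Hypothetical transition matrix; row i gives the distribution of tomorrow's state given state i.
M = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# mu_inf solves mu M = mu: take the eigenvector of M^T for eigenvalue 1 and normalize it.
vals, vecs = np.linalg.eig(M.T)
mu_inf = np.real(vecs[:, np.argmax(np.real(vals))])
mu_inf = mu_inf / mu_inf.sum()
print(mu_inf)   # approximately [0.8, 0.2] for this chain
```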

Receiver’s past actions are perfectly observed by Sender, but past states are never observed6. Thus, a time-\(t\) history can be summarized as a sequence \(h^t = \{m_s, t_s, a_s\}_{s = 0}^{t - 1}\) of past messages, transfers, and actions7. However, because transfers at time \(t\) are measurable in the \(s < t\) history and time-\(t\) messages and actions, it is without loss of generality to set histories to be sequences \(\{m_s, a_s\}\) of messages and actions only. Let \(H^t = (M \times A)^t\) be the set of all time-\(t\) histories and \(H = \bigcup_t H^t\) the set of all histories, which we identify with \(H^\infty = (M \times A)^\infty\), the set of infinite histories in the standard way (see [34]). Endow \(H^\infty\) with the strong topology (see Appendix B).

Strategies are tuples \(\sigma = (\sigma^S, \sigma^R)\), given by functions \(\sigma^S: H \to \Delta(M)^\Theta \times \mathcal{T}\) and \(\sigma^R: M \times \mathcal{T}\to \Delta(A)\) specifying stage-game strategies for Sender and Receiver at each history. Let \(\Sigma = (\Sigma^S, \Sigma^R)\) be the set of all strategies. For any strategy \(\sigma\), let \(\mathbb{P}^\sigma \in \Delta(H)\) be the distribution over histories it induces and \(\mathbb{Q}^\sigma \in (\Delta(\Theta \times A))^\infty\) be the joint distribution over outcomes it induces. Let \(\mu(h^t; \sigma)\) be Receiver’s prior belief about the state at history \(h^t\) given strategy \(\sigma\), before any information is disclosed at time \(t\). In particular, this implies the distribution of beliefs induced by \(\sigma^S\) at \(h^t\) must be \(\mu(h^t; \sigma)\)-Bayes plausible.

Let \(\sigma(\cdot | h^t) = (\sigma^S(\cdot | h^t), \sigma^R(\cdot | h^t))\) denote the continuation strategies at \(h^t\) and \(\sigma(h^t) = (\sigma^S(h^t), \sigma^R(h^t))\) the strategy profile played at \(h^t\). Given some \(\sigma\) and \(\mathbb{P}^\sigma\), say \(\sigma\) is obedient if at every \(h^t \in \text{supp}(\mathbb{P}^\sigma)\), Receiver’s continuation strategy \(\sigma^R(\cdot | h^t)\) is optimal among the set of all possible continuation strategies: \[\sigma^R(\cdot | h^t) \in \mathop{\mathrm{\arg\!\max}}_{\sigma^{R'}(\cdot | h^t) \in \Sigma^R(\cdot | h^t)} \mathbb{E}_{\mathbb{P}^{(\sigma^S(\cdot | h^t), \sigma^{R'})}}\left[ \mathcal{U}(\sigma^S(\cdot | h^t), \sigma^{R'}) \right]\] where \(\mathcal{U}(\sigma^S(h^t), \sigma^R(h^t))\) is Receiver’s stage-game payoff given strategies at \(h^t\). A strategy profile \(\sigma\) is optimal8 if it is obedient and attains the maximum of Sender’s payoff among the set of all obedient strategy profiles. Let \(\mathcal{V}^*(\mu_0, \delta)\) be Sender’s payoff at some optimal strategy when the prior is \(\mu_0\) and players discount the future at \(\delta\).

The following lemma, which provides a dynamic version of the standard revelation/obfuscation principle for this setting, will be useful.

Lemma 2. Let \(\sigma\) be obedient. There exists obedient \(\sigma^*\) with \(M = \Delta(\Theta) \times A\) such that for each \(m\) reached on the path of play, \(\mathbb{E}_{\theta | (\mu, a)} = \mu\), \(\sigma^R(h^t)(\mu, a) = a\), and \(t((\mu, a), a') \neq 0 \implies a = a'\) and moreover \(\mathbb{P}^\sigma = \mathbb{P}^{\sigma^*}\).

The proof is straightforward but deferred to Appendix A. Lemma 2 implies that for each obedient strategy there is an outcome-equivalent (and hence payoff-equivalent) strategy \(\sigma^*\) which is direct in beliefs and actions: it tells Receiver (directly) what belief to hold, and then recommends an incentive compatible action. Moreover, it further simplifies the space of transfers: Sender need only pay Receiver if Receiver follows the recommended action. This implies the set of transfers that must be specified shrinks from one for each pair of recommended and taken actions to one for each action recommendation, a much lower-dimensional space.

Why is it that Lemma 2 does not restrict to direct action recommendations, which are sufficient in the static case? While it is also sufficient in the dynamic case to restrict directly to action recommendations only (randomizing the belief implicitly), this obfuscates policies where Sender induces multiple actions at the same posterior belief (or multiple beliefs at the same action). At the optimum, this is without loss of generality, though it is with loss when one considers instead the space of deviations that a designer could take (which I use in the proof of Theorem 1).

Note Lemma 2 implies it is equivalent to recast the problem into finding Bayes-plausible distributions of beliefs \(\tau \in \Delta(\Delta(\Theta))\), transfer rules \(t: M \to \mathbb{R}_+\), and recommended (random) actions \(\alpha: \Delta(\Theta) \to \Delta(A)\) such that (1) Receiver finds it incentive compatible to take all actions \(a \in \text{supp}(\alpha)\), (2) \(\text{supp}(\alpha(\mu)) = \{a : (\mu, a) \text{ sent on path}\}\), and (3) beliefs are Bayes plausible, \(\mathbb{E}_\tau[\mu] = \mu_0\). Paired with a future utility continuation promise \(u': M \to \mathbb{R}\), this implies the following recursive formulation of the optimal strategy: \[\begin{align} V(\bar u, \mu_0, \delta) = \max_{\{\tau, t, u', \alpha\}} \; & \mathbb{E}_\tau\Big[\mathbb{E}_{\mu, \alpha}\big[(1 - \delta)\,(v(a, \theta) - k t(\mu, a)) + \delta V(u'(\mu, a), M\mu, \delta)\big]\Big] \tag{FE}\\ \text{s.t.}\quad & \mathbb{E}_{\mu, \alpha}[(1 - \delta) u(a, \theta) + \delta u'(\mu, a)] \geq \mathbb{E}_\mu[(1 - \delta) u(a', \theta) + \delta \underline U(M\mu)] \notag\\ & \qquad\qquad \forall \mu \in \text{supp}(\tau),\; a \in \text{supp}(\alpha(\mu)),\; a' \in A \notag\\ & \mathbb{E}_\tau\big[\mathbb{E}_{\mu, \alpha}[(1 - \delta) u(a, \theta) + \delta u'(\mu, a)]\big] \geq \bar u \notag\\ & \mathbb{E}_\tau[\mu] = \mu_0 \notag\\ & \max\{(1 - \delta)\, t(\mu, a),\, u'(\mu, a)\} \in [-C, C] \quad \forall \mu, a \notag \end{align}\]

where \[\underline U(\mu) = (1 - \delta)\sum_{t = 0}^\infty \delta^t\, \mathbb{E}_{M^t\mu}\big[u(a^*(M^t\mu, \boldsymbol{0}), \theta)\big] \text{ with } a^*(\tilde{\mu}, \boldsymbol{0}) \in \mathop{\mathrm{\arg\!\max}}_{a \in A} \mathbb{E}_{\tilde{\mu}} [u(a, \theta)]\] is Receiver’s lowest possible payoff, i.e. the value of their outside option. The first two constraints (incentive compatibility and promise keeping) are standard in dynamic contracting, noting the Markovian evolution of the prior given by the belief (and the choice of \(\tau\)). The third constraint, Bayes plausibility, is unique to the persuasion problem. The final constraint, boundedness, is a technical assumption that ensures solutions to the value function satisfy a baseline transversality condition that the agent is not “strung along forever,” i.e. promised infinite payoff at infinity but never at any finite time. In practice, \(C\) can be quite large (e.g. if \(|u|\) is bounded by \(1\), \(C\) can be \(1000^{100000}\)), and the specific choice of \(C\) will not affect the characterization or main results.
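For intuition, the outside option \(\underline U(\mu)\) is straightforward to approximate numerically by iterating the Markov operator and truncating the discounted sum. The sketch below uses hypothetical payoffs and a hypothetical transition matrix, chosen only for illustration:

```python
import numpy as np

def outside_option(mu, M, u, delta, horizon=2000):
    """Approximate U(mu) = (1 - delta) * sum_s delta^s * E_{M^s mu}[ u(a*(M^s mu, 0), theta) ]
    by truncating the discounted sum at a long finite horizon."""
    total, belief = 0.0, np.asarray(mu, dtype=float)
    for s in range(horizon):
        exp_u = belief @ u                              # expected payoff of each action
        total += (1 - delta) * delta**s * exp_u.max()   # myopic best reply a*(belief, 0)
        belief = belief @ M                             # prior evolves by the Markov operator
    return total

# Hypothetical primitives: u[theta, a], transition matrix M, discount delta.
u = np.array([[1.0, 0.0],
              [0.0, 1.0]])
M = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(outside_option([0.5, 0.5], M, u, delta=0.9))
```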

Proposition 1 formalizes the way in which solving the functional equation above implies finding an optimal \(\sigma\). Throughout, represent \(\sigma^S(h^t)\) as a tuple \(\{\tau(h^t), t(h^t)\}\) and Receiver action \(\sigma^R(h^t)\) as a random action \(\{\alpha(h^t)\}\), and let \(\mathcal{U}(h^t; \sigma)\) be Receiver’s continuation utility at history \(h^t\). When there is any ambiguity about which strategy \(\{\tau(\cdot), t(\cdot), \alpha(\cdot)\}\) represents, I will add an additional argument to specify explicitly, e.g. \(\{\tau(\cdot; \sigma), t(\cdot; \sigma), \alpha(\cdot; \sigma)\}\), though I suppress the extra notation in the absence of such ambiguity.

Proposition 1. \(V(0, \mu_0, \delta) = \mathcal{V}^*(\mu_0, \delta)\). Moreover, there exists an optimal \(\sigma\) where for any \(h^t \in \text{supp}(\mathbb{P}^\sigma)\), \[\{\tau(h^t), t(h^t), \mathcal{U}(\{h^t, \mu, a\}; \sigma), \alpha(h^t)\}\] solves the functional equation at \(\mathcal{U}(h^t; \sigma)\).

Proposition 1 follows from basic principles in dynamic programming and is proven in Appendix A. Importantly, it implies that it is sufficient to characterize (1) the evolution of solutions to the functional equation at different prior beliefs and utility promises and (2) the evolution of the path of utility promises at the optimum in order to obtain a full characterization of the optimal strategy \(\sigma^*\). Towards this end, let \(\mathcal{F}(u, \mu)\) be the set of solutions to the functional equation (for an implicit fixed discount rate) at utility promise \(u\) and belief \(\mu\). I conclude the description of the model by stating some basic facts about the value function.

Proposition 2. There exists a unique \(V\) solving (FE), and moreover \(V\) is continuous in \(u\) over \([-C, C]\) and in \(\mu_0\) on \(\Delta(\Theta)\). Finally, \(V\) is concave and nonincreasing in \(u\), and its right derivative satisfies \(V_+'(u, \mu) \geq -k\) for any \(u, \mu\).

The proof follows standard arguments in dynamic programming (see Chapter 4 of [35]) and is found in Appendix A.

3 Time Structure of Payments↩︎

3.1 Transfers as a Last Resort↩︎

Relative to standard dynamic contracting models, Sender in this model need not only resort to payments to incentivize the agent—they now can also change the relative expected value of any action by strategically providing information. Information provision has two effects. First, a static persuasion effect: it changes the value of any action taken today because Receiver expects different state-dependent utilities at the realized posterior belief. Second, a dynamic incentivization effect: because Receiver values more accurate information, Sender can leverage their ability to provide future information as a way of incentivizing effort today without changing today’s information structure. This tradeoff leads to several novel questions inherent to the model. How does this incentivization effect compare to transfers in effectiveness? How does Sender balance static persuasion with dynamic incentivization? Does Sender ever want to link Receiver’s dynamic incentives—i.e. promise more information tomorrow (even when doing so might be beneficial to Receiver at the cost of Sender’s continuation utility) as a reward for taking less favorable actions today?

A natural and simple class of mechanisms that cleanly resolves the above questions are backloaded mechanisms—those where payments occur only after there is no room for information provision as a dynamic tool. Intuitively, these mechanisms say that whenever Sender wants to link Receiver’s dynamic incentives, doing so via information—giving up static persuasion power in favor of dynamic incentivization—is (weakly) more efficient than just paying Receiver directly. Formally, I define these mechanisms as follows (recall an information structure \(\tau\) reveals the state if it only supports degenerate beliefs):

Definition 1. Transfers are a last resort at \(\sigma\) if, for any \(h^t \in \text{supp}(\mathbb{P}^\sigma)\), \(t(h^t)(\mu, a) > 0\) implies \(\tau(h^s)\) reveals the state at every on-path successor history \(h^s \succsim \{h^t, \mu, a\}\).

Perhaps surprisingly, there is always an optimal mechanism where transfers are a last resort.

Theorem 1. There is an optimal \(\sigma^*\) where transfers are a last resort.

Theorem 1 is the main conceptual contribution of this paper, and gives a full characterization of when payments should be used as a substitute for information. If Sender ever wants to motivate an agent via transfers, they should first turn to dynamic informational incentives. Only after Sender has completely “drawn down” on their stock of information (by promising future information at all continuation histories) do they start using transfers to motivate the agent.

The intuition behind the result is as follows. Suppose Sender at any point wishes to motivate the agent to take an action, for a fixed induced belief today. They can do this in two ways: by paying the agent, or by giving them a little bit more information. Since more information makes Receiver’s decision more efficient, Sender can always give a little more information to increase Receiver’s utility (and possibly Sender’s at some beliefs). From here, at any of the newly induced posterior beliefs where Receiver’s new action adversely affects Sender’s utility, they can pay Receiver to take an action equal to their original action. The amount they have to pay is exactly equal to the change in Receiver’s payoff at this new posterior relative to the new action, and thus comes at an efficiency cost of \(k\) per unit. Such a future joint pay-and-persuade scheme cannot be worse than simply paying the agent today (and does strictly better so long as more information benefits Sender too), and hence, so long as such a scheme is feasible at some future continuation history, it can be chosen in lieu of payments today along the optimal path. This observation is the substance of the pullback and squeezing lemmas (Lemmas 4 and 5). The remainder of the proof handles the technicalities that arise given the generality of the model and the infinite set of histories.

It is worth explicitly flagging that the above informal argument is distinct from the intuition that transfers must be backloaded because information provision is “always free” (since there are no costs to information acquisition). In fact, in the absence of transfers, the cost of giving the agent more information can be arbitrarily high, because more information allows Receiver to take actions which are very damaging to the principal but of minimal benefit to Receiver. However, the pullback lemma shows that by combining payments with persuasion, Sender can bound the marginal cost of transferring utility to Receiver through additional information by \(k\), which is the marginal cost of paying the agent to fulfill a utility promise.

I caution here that Theorem 1 is conceptually distinct from familiar backloading results in the dynamic contracting literature ([5], [6]). First, it only shows payments are backloaded insofar as information is used first, instead of characterizing the optimal asymptotic contract with only payments. Second, unlike [5], which requires discount factors which are not too high, my characterization holds for all \(\delta\) (and in fact gives nontrivial payoff predictions as \(\delta \to 1\); see Proposition 7). Finally, even with quasilinear transfers, the asymptotic contract need neither (1) immiserate Receiver or fully compensate them, nor (2) settle into the contract which maximizes Receiver payoffs. There are two important differences with canonical problems. First, when Sender can persuade, their choice of information structure endogenously changes Receiver’s expected utility of each action and their outside option, leading to additional subtleties which are absent in standard principal-agent interactions. These interactions are first-order: Lemma 5 critically shows that the cost of mollifying Receiver’s outside option by providing information is bounded from above by the cost of directly paying Receiver. Second, my limited liability constraint holds pathwise, and thus can be more complicated. This is because an optimal experiment may induce beliefs both where the limited liability constraint binds (i.e. incentives are aligned, so Sender would like to be paid by Receiver) and ones where Sender is paying Receiver to make them exactly indifferent between their favorite action and a different one. This asymmetry again shows up in the argument behind Lemma 5: Sender can exactly increase the probability of generating the “more profitable belief” in a way that makes Receiver better off as a way to compensate Receiver in lieu of giving them money.

Several assumptions can be relaxed without affecting Theorem 1. For example, it need not be that players’ payoffs are the same in every period—payoffs can evolve in any Markovian fashion as well, so long as Receiver’s payoff function is known to them at the start of each period.9 Second, Sender and Receiver can have differing discount rates \((\delta_S, \delta_R) \in (0, 1)^2\). Finally, the state can be (partially, or noisily) revealed at the end of each period, so long as the revelation or monitoring structure is independent of Sender’s action. The proof will make clear why none of these assumptions are necessary, but also why it would be notationally cumbersome to directly accommodate them.

I conclude this section with the proof of Theorem 1.

Proof. The first step is to characterize the slope of \(V(\cdot)\) when transfers are used.

Lemma 3. For any \((u, \mu_0)\) and \(\{\tau, t, u', \alpha\} \in \mathcal{F}(u, \mu_0)\):

  1. If \(V_+'(u, \mu_0) = -k\), \(V_+'(u'(\mu, a), \mu) = -k\) for all \(\mu \in \text{supp}(\tau)\), \(a \in \text{supp}(\alpha(\mu))\).

  2. If \(t(\mu, a) > 0\), then \(V_+'(u'(\mu, a), \mu) = -k\) for all \(\mu \in \text{supp}(\tau), a \in \text{supp}(\alpha(\mu))\).

The proof can be found in Appendix A. Lemma 3 first shows that once Sender and Receiver reach the steepest part of the Pareto frontier, they must remain there at all continuation histories of the relationship. Moreover, it shows that if payments ever occur, the continuation relationship must be at this steepest part. These observations now allow us to characterize a lower bound on the value of providing more information to Receiver at any utility promise.

The second step: once we are at the steepest part of the Pareto frontier, it is without loss of optimality to provide incentives by giving Receiver full information with some probability. To state this step we will need a bit more notation. At any time \(t\), let \(\mathbb{Q}_t^\sigma \in \Delta(\Theta \times A)\) be the unconditional probability distribution over outcomes (averaging over histories) induced by strategy \(\sigma\) at time \(t\), with \(\mathbb{Q}_t^\sigma(\theta) \in \Delta(A)\) the distribution over actions fixing some state. Moreover, define \[u^{RFI}(\mu) = (1 - \delta)\sum_{t = 0}^\infty \delta^t \mathbb{E}_{M^t \mu} [\max_{a \in A} u(a, \theta)]\] to be Receiver’s expected full-information payoff, starting at belief \(\mu\). Finally, say \(\sigma\) is \(u\)-constrained optimal if it is optimal among all obedient \(\sigma\) which guarantee Receiver a payoff of at least \(u\). Note all continuation strategies \(\sigma(\cdot | h^t)\) are \(\mathcal{U}(h^t; \sigma)\)-constrained optimal. We can now state the following lemma, proven in Appendix A.

Lemma 4 (Pullback Lemma). Fix \((u, \mu_0)\) such that \(V_+'(u, \mu_0) = -k\), and let \(\sigma\) be \(u\)-constrained optimal. Then there exists obedient \(\sigma^*\) such that

  1. \(\tau(h^s; \sigma^*)\) reveals the state for all \(h^s \succ h^t\), \(h^s \in \text{supp}(\mathbb{P}^{\sigma^*})\).

  2. At every time \(t\), \(\mathbb{Q}_s^\sigma(\cdot | h^t) = \mathbb{Q}_s^{\sigma^*}(\cdot | h^t)\).

  3. \(\mathcal{V}(h^t | \sigma) - \mathcal{V}(h^t | \sigma^*) \leq k(u^{RFI}(\mu_0) - \mathcal{U}(\sigma))\).

  4. For every \(\tilde{u} \in [u, u^{RFI}(\mu_0)]\), there exists \(\alpha \in [0, 1]\) such that \(\alpha \sigma^* + (1 - \alpha)\sigma\) is \(\tilde{u}\)-constrained optimal.

The pullback lemma combined with Lemma 3 implies that at any history where transfers are used, it is possible to provide further utility by simply revealing the state with positive probability. This comes at a cost of \(k\), and hence is an efficient way to transfer utility once \(V_+'(u, \mu_0) = -k\).

The third step. From here, fix an optimal \(\sigma^*\) and define the set of histories where transfers are not a last resort, \(\mathcal{H}^t(\sigma^*)\), to be: \[\begin{align} \mathcal{H}^t(\sigma^*) = \{h^t \in \text{supp}(\mathbb{P}^{\sigma^*}) \cap H^t: \text{ } & t(h^t)(\mu, a) > 0, \mu \in \text{supp}(\tau(h^t; \sigma^*)), a \in \text{supp}(\alpha(h^t; \sigma^*)(\mu)) \\ & \text{ but } \exists h^s \succ (h^t, \mu), h^s \in \text{supp}(\mathbb{P}^{\sigma^*}) \text{ s.t. } \tau(h^s; \sigma^*) \neq \tau^{FI}\} \end{align}\] where \(\tau^{FI}\) is the experiment that reveals the state.

Lemma 5 (Squeezing Lemma). Fix optimal \(\sigma^*\) and \(h^s \in \mathcal{H}^s(\sigma^*)\). Then there exists an optimal \(\bar \sigma\) with \(\mathbb{P}_r^{\sigma^*} = \mathbb{P}_r^{\bar \sigma}\) for all \(r \leq s\) such that \(h^s \not\in \mathcal{H}^s(\bar \sigma)\).

See Appendix A for the proof. The pullback and squeezing lemmas give an iterative procedure we can use to obtain an optimal strategy where transfers are a last resort. Start with some optimum \(\sigma_0\). Define the function \[\varphi(\sigma) = \inf\{t : \mathcal{H}^t(\sigma) \neq \varnothing\}.\] If \(\varphi(\sigma_0) < \infty\), start from \(\varphi(\sigma_0)\) and apply the squeezing lemma to each history \(h^{\varphi(\sigma_0)} \in \mathcal{H}^{\varphi(\sigma_0)}(\sigma_0)\). This will induce some optimal \(\sigma_1\) where \(\varphi(\sigma_1) > \varphi(\sigma_0)\). If now \(\varphi(\sigma_1) < \infty\), again apply the squeezing lemma to all histories \(h^{\varphi(\sigma_1)} \in \mathcal{H}^{\varphi(\sigma_1)}(\sigma_1)\). Continuing in this way, one of two things must happen.

  1. For some finite \(N\), \(\varphi(\sigma_N) = \infty\). Then \(\sigma_N\) is an optimal strategy where transfers are a last resort, and we are done.

  2. There is a strictly increasing sequence \(\{\varphi(\sigma_k)\}_{k = 1}^\infty\) where \(\{\sigma_k\}_{k = 1}^\infty\) are optimal.

In the latter case, a compactness argument gives a convergent subsequence \(\{\sigma_{k_n}\} \subset \{\sigma_k\}\) in the strong topology. Let \(\sigma_\infty\) be this subsequential limit. Since the set of optimal strategies is closed (because \(\mathbb{P}^\sigma\) is continuous in \(\sigma\)), \(\sigma_\infty\) is also optimal. But \(\varphi(\sigma_\infty) = \infty\) must hold, and so \(\sigma_\infty\) is an optimal strategy where transfers are a last resort, and we are done. ◻

3.2 Effectiveness Ratios↩︎

Theorem 1 is an existence result: it states that there exists an optimum where transfers are a last resort. However, this need not be the only optimum—there may also be ones where Sender pays but does not give information (because the cost of information provision and transfers are perfectly substitutable at the steepest part of the Pareto frontier). The goal of this section is to give a (partial) answer to the following strengthening of Theorem 1: if transfers are used, are there any beliefs which cannot be supported at any time by any optimal strategy once payments have started? To do so, I will make a mild assumption on payoffs and specialize to the i.i.d. case.

First, for the effectiveness ratio to be well-defined, it must be that the denominator is always nonzero; I assume this directly.

Definition 2. Full-information is strictly optimal for Receiver if \[\mathbb{E}_{\tau^{FI}}\left[ \mathbb{E}_\mu[u(a^*(\mu, \mathbf{0}), \theta)] \right] > \mathbb{E}_{\tau}\left[ \mathbb{E}_\mu[u(a^*(\mu, \mathbf{0}), \theta)] \right]\] for any \(\tau \neq \tau^{FI}\).

This is satisfied if, for example, Receiver’s full-information best-response correspondence \(a^*(\theta)\) is single-valued and injective in \(\theta\). Given this definition, I can define the effectiveness ratio of a belief, which will be the relevant object to rule out beliefs once transfers are used.

Definition 3. For any interior belief \(\mu\), define the effectiveness ratio to be \[E(\mu) = \min_{a' \in A}\left\{\frac{\mathbb{E}_\mu[v(a^*(\delta_\theta, \mathbf{0}), \theta) - v(a', \theta)]}{\mathbb{E}_\mu[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a^*(\mu, \mathbf{0}), \theta)]}\right\}.\]

Let \(\bar E = \sup_{\mu \in \Delta(\Theta)^o} E(\mu)\), and let \(\mathcal{E}(k) = \{\mu \in \Delta(\Theta)^o : E(\mu) > -k\}\) be the set of all interior beliefs whose effectiveness ratio is strictly greater than \(-k\). Here, \(a^*(\mu, \mathbf{0}) \in \mathop{\mathrm{\arg\!\max}}_{a \in A} \mathbb{E}_\mu[u(a, \theta)]\), abusing notation to refer to the correspondence as a function when there is no ambiguity.
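The effectiveness ratio is simple to compute. The sketch below evaluates \(E(\mu)\) on a grid of interior beliefs for a hypothetical binary-state, three-action specification (the payoff matrices and \(k = 1\) are illustrative assumptions) and reports which beliefs fall in \(\mathcal{E}(k)\):

```python
import numpy as np

v = np.array([[0.0, 2.5, -0.5],
              [0.0, 2.5, -0.5]])     # hypothetical Sender payoffs v[theta, a]
u = np.array([[1.0, -2.0, 0.0],
              [-2.0, 1.0, 0.0]])     # hypothetical Receiver payoffs u[theta, a]
k = 1.0

def effectiveness(mu1):
    """E(mu) from Definition 3 for a binary state, with mu1 = Pr(theta_1)."""
    mu = np.array([1 - mu1, mu1])
    a_fi = u.argmax(axis=1)                               # a*(delta_theta, 0), state by state
    v_fi = (mu * v[np.arange(2), a_fi]).sum()             # Sender's full-information payoff
    u_fi = (mu * u[np.arange(2), a_fi]).sum()             # Receiver's full-information payoff
    num = min(v_fi - mu @ v[:, ap] for ap in range(v.shape[1]))
    den = u_fi - (mu @ u).max()                           # gain over a*(mu, 0); positive by Definition 2
    return num / den

grid = np.linspace(0.01, 0.99, 99)
ruled_out = [m for m in grid if effectiveness(m) > -k]    # grid points in E(k)
print(min(ruled_out), max(ruled_out))                     # an interval, in line with Proposition 4
```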

Proposition 3. Suppose the state is drawn i.i.d. from \(\mu_0 \in \Delta(\Theta)^o\) and full-information is strictly optimal for Receiver. Then for any optimal \(\sigma\) and \((h^t, \mu, a) \in \text{supp}(\mathbb{P}^\sigma)\), if \(t(h^t)(\mu, a) > 0\), then \(\text{supp}(\tau(h^s; \sigma)) \subset \mathcal{E}(k)^c\) for all \(h^s \succ (h^t, \mu)\).

Proof. The proof relies on the following lemma. I defer the (technical) proof to Appendix A.

Lemma 6. Fix interior belief \(\tilde{\mu}\) with \(E(\tilde{\mu}) > -k\) and optimal \(\sigma\). If \(\tilde{\mu} \in \text{supp}(\tau(h^s; \sigma))\) for on-path \(h^s\), then \(t(h^t; \sigma)(\mu, a) = 0\) for all \(h^t\) where \(h^s \succ h^t\) and \(\mu \in \text{supp}(\tau(h^t; \sigma))\).

Suppose the result is false. Then there exists an optimal \(\sigma\) under which some interior \(\tilde{\mu} \in \mathcal{E}(k)\) is induced at a history following a payment. That is, there exist \(h^t\) and \(h^s \succ h^t\) and beliefs \((\tilde{\mu}, \mu)\) where \(\tilde{\mu} \in \text{supp}(\tau(h^s; \sigma))\), \(h^s \succ (h^t, \mu, a)\), and \(t(h^t; \sigma)(\mu, a) > 0\). But \(E(\tilde{\mu}) > -k\). This contradicts Lemma 6. ◻

Proposition 3 shows that the effectiveness ratio of a belief is sufficient to rule out the belief being induced at any optimum once transfers are used. Note the only exceptions are degenerate beliefs: these are cases where Sender would like to give more information, but cannot because of the law of total probability. In this case the effectiveness ratio can still be high at degenerate beliefs, even though such beliefs are never ruled out. What does the set of beliefs that are ruled out look like? Proposition 4 gives a weak characterization of these sets.

Proposition 4. \(\mathcal{E}(k)\) is convex, and \(\mathcal{E}(k) \subset \mathcal{E}(k')\) whenever \(k < k'\).

The proof follows from some straightforward computations (these can be found in Appendix A). The result should be viewed as a comparative statics result: as transfers become relatively more expensive, the set of beliefs that are ruled out becomes larger in a convex way, so that “more extremal” beliefs are the only ones that remain consistent with payments. Note as \(k \to \infty\), only the degenerate beliefs (i.e. the extreme points of the probability simplex) can be induced.

Characterizing which beliefs are inconsistent with optima also begets a related question: which actions are inconsistent with optimal behavior? As in the case with beliefs, a key sufficient condition is pinned down by an ex-post version of the effectiveness ratio:

Definition 4. Define the feasibly optimal set \(\mathcal{K}\) to be \[\left\{a : \not\exists a' \neq a, \min_{\theta \in \Theta} \left\{ v(a', \theta) - v(a, \theta)\right\} > k \max_{\theta} \left\{u(a, \theta) - u(a', \theta)\right\} \right\}.\]

The feasibly optimal set is the set of actions for which no alternative action gives Sender a guaranteed gain, across all possible states, exceeding \(k\) times Receiver’s worst-case loss from the switch. Rearranging and taking expectations, an action \(a\) is feasibly optimal if and only if it is not true for any \(\mu \in \Delta(\Theta)\) and \(a' \neq a \in A\) that \[\mathbb{E}_\mu\left[ \frac{v(a', \theta) - v(a, \theta)}{u(a', \theta) - u(a, \theta)} \right] > -k\] which highlights the relationship between feasibly optimal actions and belief effectiveness ratios. Feasibly optimal actions are thus named because of the following proposition.

Proposition 5. Let \(\sigma\) be optimal and \(h^t \in \text{supp}(\mathbb{P}^\sigma)\). Then for all \(a \in \text{supp}(\alpha(h^t)(\mu))\), \(\mu \in \text{supp}(\tau(h^t))\), \(a \in \mathcal{K}\).

The formal proof is relegated to Appendix A. The basic idea is that, so long as an action is not in the feasibly optimal set, Sender can always benefit by inducing instead the action which keeps it out of that set, at the cost of paying Receiver to take that action. Proposition 5 is easy to check in the case with state-independent Sender payoffs, where it gives a simple characterization of when transfers are used on the path of play.
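Membership in the feasibly optimal set can be checked directly from Definition 4 with a pair of nested loops. A short sketch with hypothetical payoff matrices (illustrative, not from the paper):

```python
import numpy as np

v = np.array([[0.0, 2.5, -0.5],
              [0.0, 2.5, -0.5]])     # hypothetical Sender payoffs v[theta, a]
u = np.array([[1.0, -2.0, 0.0],
              [-2.0, 1.0, 0.0]])     # hypothetical Receiver payoffs u[theta, a]
k = 1.0

def feasibly_optimal(v, u, k):
    """Definition 4: drop action a if some a' gains Sender more, state by state,
    than k times Receiver's worst-case loss from switching to a'."""
    A = v.shape[1]
    return [a for a in range(A)
            if not any(np.min(v[:, ap] - v[:, a]) > k * np.max(u[:, a] - u[:, ap])
                       for ap in range(A) if ap != a)]

print(feasibly_optimal(v, u, k))     # the actions that can be recommended on the optimal path
```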

Corollary 1. Let \(a_1 = \mathop{\mathrm{\arg\!\max}}_{a \in A} v(a)\) be Sender’s favorite action. Then at every history \(h^t\), either \(\mathcal{V}(h^t | \sigma) = v(a_1)\) or \(\mathbb{E}^{\mathbb{P}^\sigma}[t(h^t)(\mu, a) | h^t] > 0\).

Proof. Suppose the continuation value is not Sender’s first best. Clearly \(\mathcal{K} = \{a_1\}\). By Proposition 5, \(a_1\) must be the only action supported at every history. But this requires either promising the agent future information (making supporting \(a_1\) in the future more expensive) or simply paying them today. Thus, payments must eventually realize on path. ◻

Corollary 1, while straightforward to state, can be applied in several ways. First, it implies that so long as transfers are sufficiently cheap, they will always be used on the path of play whenever there is room to motivate Receiver. This implies, for example, that transfers will be used at all histories in the rideshare example. Second, it implies that so long as there is an action worth motivating, transfers will be used at some point in the future.

4 Valuing Persuasion↩︎

4.1 The Myopic Case↩︎

Consider first the case where \(\delta = 0\) so Sender and Receiver are fully myopic. In this case, it is without loss to focus on only the first period, and so Sender’s strategy is simply a transfer function \(t: \Delta(\Theta) \times \mathcal{A} \to \mathbb{R}_+\) and an experiment \(\tau \in \Delta(\Delta(\Theta))\) which is \(\mu_0\)-Bayes plausible. For any transfer rule \(t\), define Receiver’s best response (breaking ties in favor of Sender) to be \[a^*(\mu, t) = \mathop{\mathrm{\arg\!\max}}_{a \in a^\dagger(\mu, t)} \left\{ \mathbb{E}_\mu[v(a, \theta) - k t(\mu, a)] \right\} \text{ where } a^\dagger(\mu, t) = \mathop{\mathrm{\arg\!\max}}_{a \in \mathcal{A}} \left\{ \mathbb{E}_\mu[u(a, \theta) + t(\mu, a)] \right\}\] where we appeal to Lemma 1 to simplify the space of transfers. As before, let \(a^*(\mu, \mathbf{0})\) be Receiver’s optimal action in the absence of transfers.

Sender’s problem is now \[\max_{\tau \in \Delta_{\mu_0}(\Delta(\Theta)),\, t} \left\{ \mathbb{E}_\tau\big[\mathbb{E}_\mu[v(a^*(\mu, t), \theta) - k t(\mu, a^*(\mu, t))]\big] \right\}.\] Maximizing first over transfers and then information policies, and applying the concavification theorem of [1], implies Sender’s value from persuasion can be written as \[V^*(\mu_0) = \text{cav}|_{\mu_0}\left( \max_t \mathbb{E}_\mu[v(a^*(\mu, t), \theta) - k t(\mu, a^*(\mu, t))] \right)\] where transfers are maximized belief-by-belief and the concavification is evaluated at \(\mu_0\). This however can still be a potentially complicated object, because the optimal transfer rule must be found belief-by-belief, and the resulting indirect utility function must then be concavified (which is itself a potentially difficult procedure). It turns out, however, that the linearity of transfers gives a simple characterization of \(V^*(\mu_0)\) as the linear interpolation of finitely many points. To state it, I require a few more definitions.

Definition 5. Define the transfer augmented value function to be \[V^t(\mu) = \max_{a \in \mathcal{A}} \left\{ \underbrace{\mathbb{E}_\mu[v(a, \theta) + k u(a, \theta)]}_{\text{Augmented Total Surplus}} \right\} - \underbrace{k\, \mathbb{E}_\mu[u(a^*(\mu, \mathbf{0}), \theta)]}_{\text{Receiver Outside Option}}.\]

Implicitly, \(V^t(\mu)\) specifies realized payments on path as exactly those which make Receiver indifferent between the surplus-maximizing action and their own payoff-maximizing action. It will turn out that this class of transfers is exactly the one used at the optimum (see the proof of Proposition 6). Modulo transfers, I next need to characterize the beliefs supported by any optimal information policy.

Definition 6. Let \(\mathcal{O}_a = \{\mu : a \in a^\dagger(\mu, \mathbf{0})\}\). A belief \(\mu\) is extremal if it is an extreme point of \(\mathcal{O}_a\) for some \(a \in A\).

Extremal beliefs are those which make the agent maximally indifferent, and have been shown to be useful in a variety of other distinct persuasion problems (see [36] and [37]). In this setting, they are useful because they are the “supporting” points of the concavified value function:

Definition 7. Fix a set \(\mathcal{K} \subset \Delta(\Theta)\). The \(\mathcal{K}\)-cavification of a function \(f: \Delta(\Theta) \to \mathbb{R}\), denoted \(\text{cav}^{\mathcal{K}}(f)\), is the smallest concave function such that \(\text{cav}^{\mathcal{K}}(f)(\mu) \geq f(\mu)\) for all \(\mu \in \mathcal{K}\).

Proposition 6. There exists a finite set of extremal beliefs, \(\mathcal{K}\), such that \(\text{cav}^{\mathcal{K}}(V^t)(\mu) = V^*(\mu)\) for all \(\mu \in \Delta(\Theta)\).

Proposition 6 is proven in Appendix A. Note that even though the transfer-augmented value function has Receiver take surplus-maximizing actions at any belief, Sender need not fully reveal the state, in contrast to standard intuition. This is because of the limited liability constraint: there may exist beliefs where Sender would prefer to charge Receiver for taking the surplus-maximizing action, but cannot. Consequently, Sender may do better by pooling that belief with one where Receiver is given less surplus (i.e. made maximally indifferent) to persuade Receiver to take actions which are more beneficial to Sender. Hence the ability to persuade can be strictly useful even when full information is ex-post efficient, so long as Sender faces a limited liability constraint.

Proposition 6 simplifies the process of finding the transfer-augmented concavification in two ways. First, it shows that it is without loss to focus on a simple class of transfers, and gives an economic intuition for those transfers. Second, it shows that one need only focus on extremal beliefs when computing the optimal persuasion value. As \(k \to \infty\), so that transfers are never used, this implies the following simple fact about finite persuasion models10.

Corollary 2. For any finite \(A\), \(\text{cav}(\mathbb{E}_\mu[v(a^*(\mu, \mathbf{0}), \theta)])\) is piecewise affine and continuous.

4.1.0.1 A \(\mathcal{K}\)-Cavification Example.

I give an example to highlight why the \(\mathcal{K}\)-cavification substantially simplifies the problem of finding the concavified transfer-augmented value function.

Consider the interaction between a manager and an employee who must choose between two differentiated projects to pursue. The two projects are differentiated horizontally by the employee’s fit for the project (\(\theta \in \{\theta_0, \theta_1\}\))11, which the manager privately observes. The manager has a (known) preferred project \(a_1\) that they would always like the employee to work on, regardless of the characteristics of the other project \(a_0\)12. However, the employee would like to work on whichever project is a better fit for them (e.g. \(a_0\) at \(\theta_0\) and \(a_1\) at \(\theta_1\)), though ex-ante they cannot observe the project’s fit (for example, because the manager can conceal information about each project before they decide which one to join). The employee also always has an outside option, \(a_2\) (e.g. transferring to a different project), which guarantees a safe payoff and is the worst option for the manager. An example of payoffs capturing the above scenario is given below.

Table 1: Stage-game payoffs (Sender, Receiver)
S/R \(a_0\) \(a_1\) \(a_2\)
\(\theta_0\) \(0,1\) \(2.5,-2\) \(-0.5, 0\)
\(\theta_1\) \(0,-2\) \(2.5,1\) \(-0.5, 0\)

Before the interaction, the worker is biased against the manager—they believe the state is likely to be \(\theta_0\), with prior belief \(\mu_0 = \mu_0(\theta = \theta_1) = \frac{1}{6}\). The manager’s no-transfer indirect value function takes the form \[V(\mu) = \begin{cases} 0 \text{ for } \mu \in [0, \frac{1}{3}) \\ - \frac{1}{2} \text{ for } \mu \in [\frac{1}{3}, \frac{2}{3}) \\ \frac{5}{2} \text{ for } \mu \in [\frac{2}{3}, 1] \end{cases}\] The extremal beliefs are then simply \(\{0, \frac{1}{3}, \frac{2}{3}, 1\}\). At \(0\), it is optimal to pay nothing, as is the case at \(1\), and let the worker act optimally. A computation shows it is optimal to pay to induce action \(a_1\) at \(\frac{1}{3}\), but again pay nothing at \(\frac{2}{3}\). These four simple computations are enough to globally compute the value of transfers and persuasion, both at \(\mu_0 = \frac{1}{6}\) and at any arbitrary prior belief. The result is graphed below.

Figure 1: Value with Transfers

Given the \(\mathcal{K}\)-cavification procedure, it is now easy to see that the manager values persuasion (and transfers) jointly on \([0, \frac{2}{3})\), and that their joint value outperforms the value from persuasion alone on that same interval. Absent computing the extremal beliefs, this would have required first computing \(V^t(\mu)\) belief-by-belief and then concavifying the resulting (potentially complicated) function.
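To illustrate how little computation the \(\mathcal{K}\)-cavification requires, the following minimal Python sketch evaluates the transfer-augmented value at the four extremal beliefs and concavifies over them. The payoffs are those of Table 1, while the transfer cost \(k = 1\) is a hypothetical choice (the text does not pin down \(k\)), so the numbers are illustrative only.

```python
import numpy as np

# Minimal numerical sketch of the K-cavification for the Table 1 payoffs.
# The transfer cost k is not pinned down in the text; k = 1 is a hypothetical choice.
v = np.array([[0.0, 2.5, -0.5], [0.0, 2.5, -0.5]])   # Sender: rows theta0, theta1; cols a0, a1, a2
u = np.array([[1.0, -2.0, 0.0], [-2.0, 1.0, 0.0]])   # Receiver
k = 1.0

def V_t(mu):
    """Transfer-augmented value at belief mu = P(theta1), using canonical transfers."""
    exp_u = (1 - mu) * u[0] + mu * u[1]
    exp_v = (1 - mu) * v[0] + mu * v[1]
    u_star = exp_u.max()                              # Receiver's no-transfer payoff
    return max(exp_v[a] - k * (u_star - exp_u[a]) for a in range(3))

extremal = [0.0, 1/3, 2/3, 1.0]                       # extreme points of the obedience regions
vals = [V_t(m) for m in extremal]

def cav_at(mu0):
    """Concavify over the finite set of extremal beliefs and evaluate at mu0 (pairs suffice in 1-D)."""
    best = -np.inf
    for i in range(len(extremal)):
        for j in range(i, len(extremal)):
            lo, hi = extremal[i], extremal[j]
            if lo <= mu0 <= hi:
                w = 1.0 if lo == hi else (hi - mu0) / (hi - lo)
                best = max(best, w * vals[i] + (1 - w) * vals[j])
    return best

print(dict(zip(extremal, vals)))   # with k = 1: values 0, 1.5, 2.5, 2.5 at the four extremal beliefs
print(cav_at(1/6))                 # with k = 1: 0.75 at the prior 1/6
```

Under this hypothetical \(k\), the extremal values reproduce the pattern described above: nothing is paid at \(0\), \(\frac{2}{3}\), and \(1\), while payments are used at \(\frac{1}{3}\).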

Suppose there is an equilibrium where payments and persuasion are both used (as at \(\mu_0 = \frac{1}{6}\) in the example). As \(\delta \to 0\), upper hemi-continuity of the best response then implies that there cannot be optima, for arbitrarily small \(\delta\), in which transfers are used only at histories where the state is fully revealed. Why does this not contradict Theorem 1? Note that in the statement of Theorem 1 there is possibly one period of indeterminacy in which transfers are used but the state is not yet fully revealed: full revelation is required only at histories succeeding one with positive transfers. In the static case, that period of indeterminacy is exactly the first period.

4.2 The Patient Ergodic Case↩︎

In general, optimal contracts can be difficult to explicitly characterize because the interplay between the (Markovian) dynamics and intermediate discount rates makes the self-generating set intractable to compute outside special cases.13 However, in some simple settings, significant progress can be made in characterizing the value of the optimal policy, in line with the [1] approach of finding the value of persuasion instead of the optimal policy. In this subsection, I give some general bounds on the value of persuasion in the patient case.

Towards doing so, define the set \[\Gamma(u, \mu) = \left\{(\gamma, m) : \mathbb{E}_\gamma[u(a, \theta)] + m \geq u \text{ and } \gamma|_{\Theta} = \mu \right\}\] of all couplings \(\gamma\) over states and actions and total payments \(m\) at which Receiver secures a payoff of at least \(u\) and which satisfy \(\mu\)-Bayes plausibility. Suppose moreover that \(M\) is irreducible and aperiodic with ergodic distribution \(\mu^\infty\). Recall \[\underline U(\mu^\infty) = (1 - \delta)\sum_{t = 0}^\infty \delta^t\, \mathbb{E}_{\mu^\infty}[u(a^*(\mu^\infty, \mathbf{0}), \theta)]\] is the value of Receiver’s outside option at the ergodic distribution \(\mu^\infty\). \(\Gamma(\underline U(\mu^\infty), \mu^\infty)\) plays an important role in bounding the value of persuasion in Proposition 7. To operationalize the bound, I need one more definition.

Definition 8. \(\sigma\) is stationary if, for all \(h^t, h^s \in \text{supp}(\mathbb{P}^\sigma)\), \(\sigma(h^t) = \sigma(h^s)\).

Proposition 7. Let \(M\) be irreducible and aperiodic with ergodic distribution \(\mu^\infty\). Then \[\lim\limits_{\delta \to 1} \mathcal{V}^*(\mu_0, \delta) \geq \sup_{(\gamma, m) \in \Gamma(\underline U(\mu^\infty), \mu^\infty)} \left\{ \mathbb{E}_\gamma[v(a, \theta) - km] \right\}\] Moreover if there is an optimal stationary \(\sigma^*\), then the inequality holds with equality.

The formal proof can be found in the appendix. The intuition uses the exponential concentration of the state distribution around its steady state, regardless of the initial distribution, to show that as \(\delta \to 1\), “essentially” only Receiver incentive compatibility at the ergodic steady state matters. Thus, if (and only if) Receiver prefers the induced joint distribution to their no-information payoff, then some stationary strategy inducing \(\gamma\) is implementable. The upper bound can be seen as implying that the individual rationality constraint characterizes the value of optimal stationary contracts14 exactly. However, outside of special cases, knowing when there is an optimal stationary contract can be difficult.
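Since \(A\) and \(\Theta\) are finite, the right-hand side of Proposition 7 is a linear program over couplings \(\gamma\) and payments \(m\). The sketch below, which is illustrative only, solves it with scipy for the Table 1 payoffs, a hypothetical two-state transition matrix \(M\), and a hypothetical transfer cost \(k\); all of these primitives are assumptions chosen for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical primitives for illustration: Table 1 payoffs, a two-state chain, transfer cost k.
v = np.array([[0.0, 2.5, -0.5], [0.0, 2.5, -0.5]])   # Sender, rows = states
u = np.array([[1.0, -2.0, 0.0], [-2.0, 1.0, 0.0]])   # Receiver
M = np.array([[0.9, 0.1], [0.2, 0.8]])                # hypothetical transition matrix (rows sum to 1)
k = 1.0

# Ergodic distribution mu_inf solves mu M = mu.
eigval, eigvec = np.linalg.eig(M.T)
mu_inf = np.real(eigvec[:, np.argmin(np.abs(eigval - 1))])
mu_inf = mu_inf / mu_inf.sum()

# Receiver's per-period no-information payoff at mu_inf.
U_bar = np.max(mu_inf @ u)

# LP variables: gamma(theta, a) flattened (6 entries) and m (1 entry); maximize E_gamma[v] - k m.
n_s, n_a = v.shape
c = np.concatenate([-v.flatten(), [k]])               # linprog minimizes, so negate the objective
A_eq = np.zeros((n_s, n_s * n_a + 1))                 # marginal of gamma on states equals mu_inf
for s in range(n_s):
    A_eq[s, s * n_a:(s + 1) * n_a] = 1.0
b_eq = mu_inf
A_ub = np.concatenate([-u.flatten(), [-1.0]])[None, :]  # E_gamma[u] + m >= U_bar
b_ub = np.array([-U_bar])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n_s * n_a + 1), method="highs")
print("lower bound on the limiting Sender value:", -res.fun)
```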

This result is related to the Receiver individual rationality constraint of [18], who characterize the equilibrium payoff set of repeated communication games. There are two important differences. First, my Sender has commitment and hence the set of feasible payoffs is larger, leading to a simpler and cleaner characterization of the best stationary upper bound (they require additional conditions to characterize the payoff set exactly). Second, the addition of transfers modifies Receiver constraints to allow for all joint couplings and transfer profiles at which Receiver individual rationality holds. Perhaps surprisingly, this lower bound is independent of how transfers affect the details of Receiver incentive compatibility—transfers augment the payoff set exactly by allowing additional “lump sum” payments that relax Receiver’s individual rationality constraint at the ergodic distribution.

Bounding Sender’s optimal payoff can also shed qualitative light on the nature of dynamic persuasion away from Receiver’s individual rationality constraint. First, it implies a sufficient condition for Sender to strictly benefit from dynamics is that Receiver’s incentive compatibility constraint is not binding.

Corollary 3. Let \(\gamma\) be the coupling concavifying \(V^t(\mu_0)\). If \(\mathbb{E}_\gamma[u(a, \theta)] > 0\), then \(\mathcal{V}^*(\mu_0, \delta) > \text{cav}(V^t(\mu_0))\) unless \(\text{cav}(V^t(\mu_0)) = \mathbb{E}_{\mu_0}[\max_{a \in A}[v(a, \theta)]]\).

Corollary 3 contrasts with Theorem 1 of [13] and highlights the value of the obedience-based characterization in Proposition 7 over the recursive concavification approach. In particular, unlike [13], I give easy-to-verify conditions under which Sender benefits from the ability to dynamically persuade; moreover, these conditions hold for all \(\delta\) and are independent of the underlying Markov chain.

Second, recall that in a Markovian persuasion problem providing information to Receiver can have two effects: a backwards-looking dynamic incentivization effect, where information today is provided as a reward for taking Sender-favorable actions in past periods, and a forward-looking future informativeness effect, where information today is “sticky” and will affect the efficacy of persuasion tomorrow by shifting where Receiver’s posterior belief will be. [15] shut down the dynamic incentivization effect (by considering only myopic Receivers) and claim the static persuasion payoff is an upper bound on Sender’s value from dynamic persuasion, even as the discount factor tends to \(1\). In contrast, the following holds in the general model. Here, \(\text{cav}(V^t)(\mu)\) is the static \(\mathcal{K}\)-cavification from Proposition 6.

Corollary 4. Let \(M\) be irreducible and aperiodic with prior equal to the ergodic distribution, \(\mu_0 = \mu^\infty\). Then \[\lim\limits_{\delta \to 1} \mathcal{V}^*(\mu_0, \delta) \geq \text{cav}(V^t)(\mu_0).\]

Proof. Let \((\hat{\tau}, \hat{t}, \hat{\alpha})\) be the induced profile at the \(\mathcal{K}\)-cavification. By Lemma 7, \(\mathbb{E}_{\hat{\tau}}[u(a^*(\mu, \mathbf{0}), \theta)] = \mathbb{E}_{\hat{\tau}}[u(\alpha(\mu), \theta) + t(\mu)]\). But also \(\mathbb{E}_{\hat{\tau}}[u(a^*(\mu, \mathbf{0}), \theta)] \geq \mathbb{E}_{\mu_0}[u(a^*(\mu_0, \mathbf{0}), \theta)]\) by Blackwell’s theorem. Thus the \((\gamma, m)\) induced by \((\hat{\tau}, \hat{t}, \hat{\alpha})\) is in \(\Gamma(\underline U(\mu^\infty), \mu^\infty)\). Proposition 7 finishes the proof. ◻

In the case where \(k \to \infty\), so that the \(\mathcal{K}\)-cavification and concavification coincide, Corollary 4 implies that dropping the myopia assumption on Receiver turns static persuasion from an upper bound into a lower bound. Economically, Corollary 4 implies that the dynamic incentivization effect completely swamps the future informativeness effect under exactly their assumptions. I see this result as shedding some light on the role that myopia plays in disciplining incentive effects in dynamic persuasion and as a cautionary tale against the myopia assumption when dynamics are likely to be first order.

5 Optimal Loyalty Contracts↩︎

Suppose Sender faces multiple decision problems similar to that studied by the introductory example of [1] in each period. Formally, suppose Sender’s problem can be written as \[\left\{\prod_{i = 1}^n \Theta_i, \prod_{i = 1}^n A_i, \sum_{i = 1}^n u_i, \sum_{i = 1}^n v_i, \prod_{i = 1}^n \mu_0^i \right\}\] where \(\Theta_i = A_i = \{0, 1\}\), \(\mu_0^i \in (0, \frac{1}{2})\) is drawn independently across both dimensions \(i\) and is drawn i.i.d. across time, and \(u_i = \mathbf{1}\left\{\theta_i = a_i\right\}\), \(v_i = c_i \mathbf{1}\{a_i = 1\}\). Here, \(\{c_i\}\) is some collection of nonnegative real numbers. Throughout, we suppose there exists some \(c_i > k\) so that the incentive problem is nontrivial (otherwise, the static persuasion solution is optimal).

The specific model formulation is inspired by the rideshare example undergirding the introduction. In each period, Uber (Sender) must assign one of many rides of heterogeneous value to a driver, who cares only about the fundamental of the ride itself (rejecting a bad ride and accepting a good one). In this case, Uber is both able to provide information within each period to affect the ex-ante probability of a driver accepting a bad ride, and also to leverage dynamic information (in addition to payments) in order to incentivize the driver to accept rides the driver knows ex-post are bad.

While Theorem 1 gives a partial answer to the optimal way to trade off transfers against information, it does not characterize the optimal form of dynamic information (in addition to transfers). In this simple case, however, the optimal contract can be explicitly derived and takes a simple form. Throughout the remainder of this section, let \(\tau_i^{BP}\) be the Bayesian persuasion optimal solution in each dimension, splitting beliefs into \(\{0, \frac{1}{2}\}\) with the appropriate Bayes plausible probabilities, and let \(\tau_i^{FI}\) reveal the state.
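For a single dimension, both benchmark experiments are easy to compute explicitly. The following sketch uses a hypothetical prior \(\mu_0^i = 0.3\) and value \(c_i = 2\), chosen purely for illustration, and reports the acceptance probabilities and per-period payoffs under \(\tau_i^{BP}\) and \(\tau_i^{FI}\); it confirms that under \(\tau_i^{BP}\) Receiver is held exactly to their no-information payoff \(1 - \mu_0^i\), while full information gives them \(1\).

```python
# Single dimension of the rideshare model: theta, a in {0,1}, u = 1{a = theta}, v = c * 1{a = 1}.
mu0 = 0.3   # hypothetical prior P(theta = 1) < 1/2
c = 2.0     # hypothetical Sender value of acceptance

# tau^BP splits the prior into posteriors {0, 1/2}; Bayes plausibility pins down the weights.
p_half = 2 * mu0            # probability of posterior 1/2 (where Receiver accepts, a = 1)
p_zero = 1 - p_half         # probability of posterior 0 (Receiver rejects, a = 0)

bp = {
    "P(a=1)": p_half,
    "P(a=1|theta=0)": p_half * 0.5 / (1 - mu0),   # share of bad states pooled into the accept signal
    "Sender": c * p_half,
    "Receiver": p_zero * 1 + p_half * 0.5,        # equals 1 - mu0, the no-information payoff
}
fi = {
    "P(a=1)": mu0,
    "P(a=1|theta=0)": 0.0,
    "Sender": c * mu0,
    "Receiver": 1.0,
}
print("Bayesian persuasion:", bp)
print("Full information:  ", fi)
```

The gap between the two Receiver payoffs is exactly the per-period informational surplus that the tiered contract defined next can promise as a future reward.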

Definition 9. A tiered loyalty contract is a strategy \(\sigma\) and a sequence of increasing stopping times \(\{T_i^*\}_{i = 1}^n\), \(T_i^* \leq T_{i + 1}^*\), such that for \(\mathbb{P}^\sigma\)-almost all histories, \[\tau_i(h^t; \sigma) = \begin{cases} \tau_i^{FI} \text{ if } t > T_i^*(h) \\ \tau_i^{BP} \text{ if } t < T_i^*(h) \end{cases}.\]

Note there may be one period of indeterminacy, as there is with transfers. This is because of the integer programming problem when providing incentives. A tiered loyalty contract gradually unwinds Sender’s informational advantage as a function of the history in a stark way: as the relationship matures, there is a discrete transition from the static Bayesian persuasion optimum to full information. This transition happens “dimension-by-dimension”: Sender slowly unwinds their advantage by transitioning from the static optimum directly to full information in each dimension separately; consequently, each transition \(T_i^*\) can be thought of as a distinct “tier” in the relationship.

The main result of this section is that tiered loyalty contracts are exactly optimal.

Proposition 8. There is an optimal strategy which features transfers as a last resort and is a tiered loyalty contract.

The proof can be found in the appendix. The approximate intuition is as follows. First, note that the Pareto frontier is governed by the summary statistic \(\mathbb{P}(a_i = 1 | \theta_i = 0)\) at each time in each dimension, the rate at which persuasion is “successful.” For Sender to provide incentives via information, they must decrease this probability, which benefits Receiver at a marginal rate of \(1\) while costing Sender \(c_i\). So long as this probability is interior (between the static optimum and full information), the Pareto frontier is linear, and hence dynamic incentives can be “backloaded” using a similar squeezing procedure as Lemma 5 to provide information in dimension \(i\) tomorrow instead of today. Hence incentives today are provided by promising full information in some dimension tomorrow, until eventually the informational advantage is unwound and transfers come in as a last resort. The assumption that there is a dimension where \(c_i > k\) guarantees that in each period, Sender wishes to provide intertemporal incentives to motivate Receiver to take action \(a_i = 1\) even when the state is bad.

Proposition 8 mirrors the loyalty contracts employed by Uber, who run a driver loyalty program with four tiers (“Blue,” “Gold,” “Platinum,” and “Diamond”) each of which have different benefits and differential access to information. I thus see my model as providing a potential explanation for Uber’s driver loyalty program. Alternatively, with two projects (like the project selection example succeeding Proposition 6 without a safe outside option), the time \(T_i^*\) can be thought of as promotion: once the worker has chosen the “bad-fit” project enough times, the manager gives them information (and thus discretion) to choose projects which the employee would prefer, even if doing so runs against the manager’s own interests.

6 Discussion↩︎

This paper studied a model of dynamic contracting where Sender can access both payments and persuasion as incentive tools to motivate Receiver. Theorem 1 characterizes the time structure of optimal transfers—payments occur on path only if accompanied by full information at every history afterwards. Consequently, the ability to dynamically intertwine transfers and information endows Sender with a “stock” of informational incentives that they can draw down before turning to transfers—which, given the ability to jointly use both, are weakly less efficient at transferring utility than giving Receiver information. In simple settings—such as the example—this intuition translated into a driver loyalty program, where static persuasion remains optimal until the informational stock is exhausted, after which Sender fully reveals the state. Finally, I characterized the value of persuasion in several asymptotic regimes, and used this to comment on the existing literature on static and dynamic persuasion.

There are several natural extensions under which the result continues to hold. First, a modified Theorem 1 extends to any restricted set of experiments with a unique Blackwell-maximal element. Second, my backloading result should also extend to the case where transfers are nonlinear in cost but differentiable, with bounded derivative. Third, the result extends to cases where Sender’s discount rate is distinct from Receiver’s, payoffs evolve in every period, and the state is noisily observed, as discussed after Theorem 1.

Several questions about the model remain open and are fruitful areas for future research. First, information was free to generate in my model; I conjecture that for suitably defined costs of information (such that full information is not infinitely costly), an adapted version of the result should hold. Second, understanding the model under moral hazard seems like a natural enrichment. Importantly, Lemma 4 may no longer hold since the action is not perfectly monitored, and hence it is not immediately clear what the right way to “pull back” utility would be at the exact optimum. Despite this, tools from the quota mechanisms literature could prove effective in characterizing approximately optimal contracts where payments are a last resort. Third, it is natural to consider the case with multiple Receivers who are playing some type of coordination game against Sender (i.e. a dynamic version of [39] and [40] with [41] and [42]). Finally, explicitly characterizing dynamic contracts outside of the simple cases studied in the example remains an open question. Indeed, [15] call the simpler version of this question, without transfers, the “grand question” of dynamic information design.

Appendix A: Omitted Proofs↩︎

PROOF OF LEMMA 2↩︎

Proof. Fix any obedient strategy. At each on-path history, replace the message \(m\) with a distribution of random messages \(\{(\theta | m, a)\}_{a \in \text{supp}(\alpha)}\) where the probability of each message sent is exactly \(\alpha(a)\). Clearly this does not affect Receiver’s information or their continuation incentives and hence the realized belief is \(\theta | m\) and the recommended action is obeyed. Second, note decreasing transfers as much as possible at actions not induced on path can only weaken incentive compatibility constraints and help support a conjectured equilibrium profile. ◻

PROOF OF PROPOSITION 1↩︎

Proof. The first part is standard, noting that the maximizing set of the static problem is nonempty and that payoffs satisfy a transversality condition (by (BD)). The second follows by noting that if \(u'(\mu, a) = \mathcal{U}(\{h^t, \mu, a\})\), then the choice of future utility promise and payoffs today must be part of an optimum to the sequential formulation, as otherwise a profitable one-shot deviation would exist in the sequential problem. ◻

PROOF OF PROPOSITION 2↩︎

Proof. We appeal to Blackwell’s sufficient conditions for a contraction mapping. Define an operator \(\mathcal{O}: \mathcal{B}([-C, C] \times \Delta(\Theta)) \to \mathcal{B}([-C, C] \times \Delta(\Theta))\), where \(\mathcal{B}(X)\) is the set of bounded functionals from \(X \to \mathbb{R}\). Let \(\mathcal{O}\) be defined by \[\begin{align} \mathcal{O}(V)(\bar u, \mu_0, \delta) \notag \\ = \max_{\{\tau, t, u', \alpha\}} \Big[ \mathbb{E}_\tau\Big[(1 - \delta)\,\mathbb{E}_{\mu, \alpha}&[v(a, \theta) - k t(\mu, a)] + \delta V(u'(\mu, a), M\mu, \delta)\Big] \Big] \notag \\[1em] \text{s.t.}\quad \mathbb{E}_{\mu, \alpha}[(1 - \delta) u(a, \theta) + \delta u'(\mu, a)] &\geq \mathbb{E}_\mu[(1 - \delta) u(a', \theta) + \delta \underline U(\mu)] \notag \\[-0.25em] &\forall \mu \in \text{supp}(\tau),\; a \in \text{supp}(\alpha(\mu)),\; a' \in A \notag \\[1em] \mathbb{E}_\tau[(1 - \delta)\,\mathbb{E}_{\mu, \alpha}[u(a, \theta)] + \delta u'(\mu, a)] &\geq \bar u \notag \\[1em] \mathbb{E}_\tau[\mu] &= \mu_0 \notag \\[1em] \max\{(1 - \delta) t(\mu, a),\, u'(\mu, a)\} &\in [-C, C] \quad \forall a, \mu \end{align}\] Take any \(W \geq V\); clearly monotonicity is satisfied. Similarly, any constant \(\beta < 1\) can be pulled out of the maximum without affecting the constraints and hence discounting is also satisfied. Blackwell’s sufficient conditions for a contraction mapping then imply that there is a unique fixed point of \(\mathcal{O}\), which is exactly the value function \(V\). Moreover, \(V\) is continuous, as the mapping \(\mathcal{O}\) restricted to the (closed) space of continuous functions maps into itself, because the objective and constraints vary continuously in \(\bar u, \mu_0\).

That \(V\) is nondecreasing in \(\bar u\) follows immediately from the fact that the set of feasible solutions is weakly decreasing in \(\bar u\) (it can only strictly tighten the promise keeping constraint). Concavity follows by noting that \(\mathcal{O}(V)(u, \mu_0, \delta)\) is concave in \(u\) whenever \(V\) is concave since the convex combination of a feasible solution at \(u\) and \(u'\) is feasible at \(\alpha u + (1 - \alpha)u'\) (as the constraints are linear in the relevant arguments). Hence, \(\mathcal{O}\) maps the set of concave functions to itself, and because the space of concave functions is a closed subset (in the supremum norm) of all functions, we can again apply Banach’s fixed point theorem as before.

Finally, the bound on its right derivative. Fix any \(\mu\), \(u\), and \(\varepsilon > 0\), and let \(\{\tau, t, u', \alpha\} \in \mathcal{F}(u, \mu)\). Note \(\{\tau, t + \frac{\varepsilon}{1 - \delta}, u', \alpha\}\) is feasible at \((u + \varepsilon, \mu)\) and gives Sender a payoff of exactly \(V(u, \mu) - k\varepsilon\). Thus, \(V(u + \varepsilon, \mu) \geq V(u, \mu) - k\varepsilon\). Then \[\lim\limits_{\varepsilon \to 0_+} \frac{V(u + \varepsilon, \mu) - V(u, \mu)}{\varepsilon} \geq \lim\limits_{\varepsilon \to 0_+} \frac{V(u, \mu) - k\varepsilon - V(u, \mu)}{\varepsilon} = \lim\limits_{\varepsilon \to 0_+} -\frac{k\varepsilon}{\varepsilon} = -k\] as desired. This finishes the proof. ◻

PROOF OF LEMMA 3↩︎

Proof. The first statement. For any \((u, \mu_0)\) and \(\{\tau, t, u', \alpha\} \in \mathcal{F}(u, \mu_0)\), and \(\varepsilon > 0\), define \(\tilde{V}(u + \varepsilon, \mu_0)\) to be the value for Sender of choosing, at inputs \((u + \varepsilon, \mu_0)\), the (feasible) tuple \(\{\tau, t, u' + \frac{\varepsilon}{\delta}, \alpha\}\). This tuple increases Receiver utility to meet the higher utility promise without otherwise affecting any IC constraints, so in particular it must do weakly worse than the optimum: \(V(u + \varepsilon, \mu_0) \geq \tilde{V}(u + \varepsilon, \mu_0)\).

Suppose now the lemma is false, so \(V_+'(u'(\mu, a), \mu) > -k\) for some \(u'(\mu, a)\) where \(\mu \in \text{supp}(\tau)\), \(a \in \text{supp}(\alpha)\). By construction, we have \[\begin{align} V_+'(u, \mu_0) = \lim\limits_{\varepsilon \to 0} \frac{V(u + \varepsilon, \mu_0) - V(u, \mu_0)}{\varepsilon} \geq \lim\limits_{\varepsilon \to 0} \frac{\tilde{V}(u + \varepsilon, \mu_0) - V(u, \mu_0)}{\varepsilon}. \end{align}\] Moreover, by construction, one has for any \(\varepsilon > 0\) that \[\tilde{V}(u + \varepsilon, \mu_0) - V(u, \mu_0) = \delta \mathbb{E}_{\tau, \alpha} \left[ V\left(u'(\mu, a) + \frac{\varepsilon}{\delta}, \mu\right) - V(u'(\mu, a), \mu) \right].\] Putting these two expressions together and utilizing the fact \(V_+'(u'(\mu)) > -k\) implies \[\begin{align} V_+'(u, \mu_0) & \geq \lim\limits_{\varepsilon \to 0} \frac{\tilde{V}(u + \varepsilon, \mu_0) - V(u, \mu_0)}{\varepsilon} \\ & = \lim\limits_{\varepsilon \to 0} \frac{\delta}{\varepsilon} \mathbb{E}_{\tau, \alpha} \left[ V\left(u'(\mu, a) + \frac{\varepsilon}{\delta}, \mu\right) - V(u'(\mu, a), \mu) \right] \\ & = \mathbb{E}_{\tau, \alpha} \left[ V_+'(u'(\mu, a), \mu) \right]> -k \end{align}\] where the penultimate inequality interchanges the (finite) expectation and evaluates the right derivative explicitly. The final inequality uses the fact that \(V_+'(\cdot, \mu) \geq -k\) everywhere with strict inequality (by assumption) for at least one \((\mu, a) \in \text{supp}(\tau, \alpha)\). This contradicts our assumption about \(V_+'(u, \mu_0)\).

The second statement. Suppose not. Then we can find a tuple \(\{\tau, t, u', \alpha\} \in \mathcal{F}(u, \mu_0)\) at some \((u, \mu_0)\) where \(t(\mu, a) > 0\), \(\mu \in \text{supp}(\tau)\), \(a \in \text{supp}(\alpha)\), but \(V_+'(u'(\mu, a), \mu) > -k\). Recall Receiver’s incentive compatibility constraint when recommended \(a\) at \(\mu\) takes the form \[(1 - \delta) \mathbb{E}_\mu[u(a, \theta) + t(\mu, a)] + \delta u'(\mu, a) \geq (1 - \delta)\mathbb{E}_\mu[u(a', \theta)] + \delta \underline U(\mu).\] Consider now the alternative compensation scheme \(\{\tau, \tilde{t}, \tilde{u}', \alpha\}\) which leaves the information and action recommendations unchanged and changes the compensation scheme only at \(\mu\): for some \(\varepsilon > 0\) (restricted to be small enough that \(\tilde{t}(\mu, a)\) remains positive), set \(\tilde{t}(\mu, a) = t(\mu, a) - \varepsilon\) and \(\tilde{u}'(\mu, a) = u'(\mu, a) + \frac{1 - \delta}{\delta} \varepsilon\), where otherwise \(\tilde{t} = t\) and \(\tilde{u}' = u'\). Then by construction \[(1 - \delta)\tilde{t}(\mu, a) + \delta \tilde{u}'(\mu, a) = (1 - \delta)t(\mu, a) + \delta u'(\mu, a),\] which implies Receiver’s incentive compatibility constraints are unperturbed under \(\{\tau, \tilde{t}, \tilde{u}', \alpha\}\) (as is their promise-keeping constraint). Hence, \(\{\tau, \tilde{t}, \tilde{u}', \alpha\}\) is feasible at \((u, \mu_0)\). Whenever belief \(\mu\) is realized and \(a\) is recommended, however, Sender’s payoff differs by a value of \[\begin{align} -k(1 - \delta)(\tilde{t}(\mu, a) - t(\mu, a)) + \delta(V(\tilde{u}'(\mu, a), \mu) - V(u'(\mu, a), \mu)) \\ = (1 - \delta)k\varepsilon + \left[ \delta V\left( u'(\mu, a) + \frac{1 - \delta}{\delta} \varepsilon, \mu \right) - \delta V(u'(\mu, a), \mu)\right]. \end{align}\] Dividing through by \(\varepsilon\) and taking the limit as \(\varepsilon\to 0\) then yields that the change in Sender’s payoff is given by \[\lim\limits_{\varepsilon \searrow 0} \left( (1 - \delta) k + \delta \frac{V\left( u'(\mu, a) + \frac{1 - \delta}{\delta} \varepsilon, \mu \right) - V(u'(\mu, a), \mu)}{\varepsilon} \right) = (1 - \delta) k + (1 - \delta) V_+'(u'(\mu, a), \mu).\] Since \(V_+'(u'(\mu, a), \mu) > -k\) by assumption, there exists some \(\varepsilon > 0\) at which this change in utility is strictly positive, and hence using \((\tilde{t}, \tilde{u}')\) instead of \((t, u')\) as compensation at this history is a profitable deviation. This contradicts the assumption \(\{\tau, t, u', \alpha\} \in \mathcal{F}(u, \mu_0)\). ◻

PROOF OF LEMMA 4↩︎

Proof. Start from a \(u\)-constrained optimal \(\sigma\) and define \(\bar \sigma\) as follows. Let \(\tau^{FI}\) be the experiment that reveals the state, and suppose \(\bar \tau(h^t; \bar \sigma) = \tau^{FI}\) always. Suppose Sender recommends some action for sure, \(\bar \alpha(\delta_\theta; \bar \sigma) = \delta_{a(\theta)}\), where \(a(\theta) \in \mathop{\mathrm{\arg\!\max}}_{a \in A} u(a, \theta)\) is some action that maximizes Receiver’s utility at that state, and promises Receiver continuation utility \(u'(\delta_\theta, a) = u^{RFI}(\delta_\theta)\) at all recommended actions (and \(\underline U(\delta_\theta)\) otherwise). Suppose moreover that payments never occur on the path of play, \(\bar t \equiv 0\). Clearly \(\{\bar \tau, \bar t, \bar \alpha\}\) is a feasible strategy at any \(u \leq u^{RFI}(\mu_0)\) and gives Receiver a payoff of exactly \(u^{RFI}(\mu_0)\).

Now define a strategy \(\sigma'\) as follows. Still always reveal the state at each history, and let \(\alpha'\) be defined as follows: for each state \(\theta\) and on-path \(h^t\), set \[\alpha'(h^t; \sigma')(\delta_\theta) = \mathbb{Q}_t^{\sigma}(\theta)\] and let transfers \(t'\) be such that for each \(a' \in \text{supp}(\alpha'(h^t)(\delta_\theta; \sigma'))\), \[t'(h^t; \sigma')(\delta_\theta, a') = \max_{a \in A} u(a, \theta) - u(a', \theta)\] is the difference in payoffs between following the recommendation at \(\bar \sigma\) and under \(a'\). Note that at every history, Receiver’s stage game payoff under \(\{\bar \tau, t', \alpha'\}\) is given by \[u(a', \theta) + t'(h^t; \sigma')(\delta_\theta, a') = \max_{a \in A} u(a, \theta)\] given our choice of transfers. But this is their full-information payoff and hence they obtain \(u^{RFI}(\mu_0)\); obedience of \(\sigma'\) then follows from obedience of \(\bar \sigma\), which follows since \[(1 - \delta) \max_{a \in A} u(a, \theta) + \delta u^{RFI}(\mu_0) \geq (1 - \delta) u(\tilde{a}, \theta) + \delta \underline U(\delta_\theta)\] for any \(\tilde{a}\), as the terms multiplying \((1 - \delta)\) on the left are greater by definition and the term multiplying \(\delta\) on the left is greater by Blackwell’s theorem.

We thus now have a profile \(\{\bar \tau, t', \alpha'\}\) that, by construction, satisfies \(\mathbb{Q}_t^\sigma = \mathbb{Q}_t^{\sigma'}\) at every possible history and where the state is fully revealed, i.e. properties (1) and (2). Moreover, it gives (by construction) Sender the same expected payoff absent transfers. How much must Sender pay Receiver in expectation under \(t'\)? Note by construction payments are exactly equal to \[(1 - \delta)\sum_{t = 0}^\infty \delta^t \mathbb{E}_{Q_t^{\sigma}} \left[ \max_{a \in A} u(a, \theta) - u(a', \theta) \right] = u^{RFI}(\mu_0) - \mathcal{U}(\sigma),\] where we use the fact \[(1 - \delta)\sum_{t = 0}^\infty \delta^t \mathbb{E}_{Q^\sigma}[\max_{a \in A} u(a, \theta)] = (1 - \delta)\sum_{t = 0}^\infty \delta^t \mathbb{E}_{\theta | t}[\max_{a \in A} u(a, \theta)] = (1 - \delta)\sum_{t = 0}^\infty \delta^t \mathbb{E}_{M^t\mu_0} \max_{a \in A} u(a, \theta)\] because beliefs are a martingale so the ex-ante expected distribution of states at time \(t\) is exactly \(M^t\mu_0\). This implies that Sender’s value \[\mathcal{V}(\sigma') = \mathcal{V}(\sigma) - k\left(u^{RFI}(\mu_0) - \mathcal{U}(\sigma) \right)\] and thus implies \(\sigma'\) satisfies the third property.

Finally, the fourth property follows immediately by noting that because \(V_+'(u, \mu_0)\) is in the steepest (and hence linear from the right) part of the Pareto frontier by assumption, any contract which transfers utility at rate at least \(k\) (which is the third property) must be optimal. Taking \(\sigma' = \sigma^*\) thus completes the proof. ◻

PROOF OF LEMMA 5↩︎

Proof. Let \(\sigma^*\) be optimal and fix any \(h^s \in \mathcal{H}^s(\sigma^*)\). Modify \(\sigma^*\) in the following way: at any history \(h^r\) where \(h^s \succ h^r\) or \(h^r\) and \(h^s\) are incomparable, do not change \(\sigma^*\). If \(t(h^s)(\mu, a) = 0\) or \(t(h^s)(\mu, a) > 0\) but the state is revealed always afterwards, do not change \(\sigma^*\). Finally, fix the \((\mu, a)\) such that \(t(h^s)(\mu, a) > 0\) but there exists an experiment in the future which is not full information. Here we want to change \(\sigma^*\): there are two cases.

First, if \(\mathcal{U}(\{h^s, \mu, a\} | \sigma^*) \geq u^{RFI}(\mu)\), then the utility promise already exceeds the maximum possible payoff from full information. In this case, because \(V_+'(\mathcal{U}(\{h^s, \mu, a\} | \sigma^*)) = -k\) by Lemma 3, the logic of the pullback lemma implies there is a \(\mathcal{U}(\{h^s, \mu, a\} | \sigma^*)\)-constrained optimum, \(\tilde{\sigma}\), which gives full information and pays Receiver an additional amount (over the potential payoff from the pullback) to meet the utility promise constraint, and which is also optimal. Setting \(\bar \sigma = \sigma^*\) at any history that does not succeed \(h^s\) on the path of play, and replacing the continuation strategy \(\sigma^*(\cdot | h^s)\) with \(\bar \sigma(\cdot | h^s) = \tilde{\sigma}\), delivers the desired continuation strategy \(\bar \sigma\) with \(h^s \not\in \mathcal{H}^s(\bar \sigma)\).

Second, suppose \(\eta = u^{RFI}(\mu) -\mathcal{U}(\{h^s, \mu, a\} | \sigma^*) > 0\). Take \(\tilde{u} = \frac{\delta}{1 - \delta} t(h^s; \sigma)(\mu, a)\). If \(\eta > \tilde{u}\), then replace \(t(h^s)(\mu, a)\) with \(0\), but replace \(\mathcal{U}(\{h^s, \mu, a\})\) with \(\tilde{u}\) exactly. Note Sender is indifferent to this change since \(V_+'(\mathcal{U}(\{h^s, \mu, a\})) = -k\). The replacement procedure in the first case and the fourth property of the pullback lemma then imply there exists a sequentially optimal continuation strategy starting from \(h^s\) at which payments are \(0\) under \((\mu, a)\). If instead \(\eta < \tilde{u}\), set \(t(h^s; \bar \sigma)(\mu, a) = t(h^s; \sigma)(\mu, a) - \frac{\delta}{1 - \delta}(\tilde{u} - \eta)\) and set continuation strategies to the obedient \(\sigma^*\) in the pullback lemma. Because \(\tilde{u} > \eta\), \(\alpha = 1\) in the fourth property of the pullback lemma applies, i.e. there is full revelation at all future continuation histories. As before, this maintains optimality of the continuation strategy without affecting the joint distribution of actions at histories before \(h^s\) (or incomparable to \(h^s\)). In either case, transfers are now a last resort at \(h^s\) conditional on \((\mu, a)\). Repeating this procedure for all \((\mu, a)\) that appear on path completes the argument. ◻

PROOF OF LEMMA 6↩︎

Proof. Suppose not, so \(E(\tilde{\mu}) > -k\) but there exists an optimal \(\sigma\) where \(\tilde{\mu} \in \text{supp}(\tau(h^s; \sigma))\) but payments occur prior to \(h^s\). This implies there exists \(h^t\) where \(t(h^t; \sigma)(\mu, a) > 0\) for \(a \in \text{supp}(\alpha(h^t; \sigma)(\mu))\), \(\mu \in \text{supp}(\tau(h^t; \sigma))\), and \(h^s \succ (h^t, \mu, a)\). Recall \(\tilde{\mu}\) is nondegenerate; consider an experiment \(\tilde{\tau}\) which further reveals the state when \(\tilde{\mu}\) would have been realized under \(\tau(h^s; \sigma)\), but otherwise is the same as \(\tau(h^s; \sigma)\); that is, \[\tilde{\tau}(\mu) = \begin{cases} \tau(h^s; \sigma)(\mu) \text{ if } \mu \in \text{supp}(\tau(h^s; \sigma)) \setminus \{\tilde{\mu}\} \\ \tilde{\mu}(\theta) \text{ if } \mu = \delta_\theta, \theta \in \text{supp}(\tilde{\mu}) \\ 0 \text{ otherwise } \end{cases}.\] Here, we abuse notation to allow for \(\tilde{\tau}(\mu)\) to “duplicate” two beliefs (if \(\delta_\theta\) was, for example, already supported by \(\tau(h^s; \sigma)\)). This is without loss of generality since Sender can recommend a mixed action at \(\delta_\theta\), though the proof is clearer by allowing for these duplicate beliefs. Suppose the action recommendation allows Receiver to take their favorite action at the split belief and otherwise recommends the same action: \[\tilde{\alpha} = \begin{cases} \alpha(h^s; \sigma)(\mu) \text{ if } \mu \in \text{supp}(\tau(h^s; \sigma)) \setminus \{\tilde{\mu}\} \\ \delta_{a^*(\delta_\theta, \mathbf{0})} \text{ if } \mu = \delta_\theta, \theta \in \text{supp}(\tilde{\mu}) \end{cases}.\] That is, \(\tilde{\alpha}\) lets Receiver take their favorite action when the state is revealed under \(\tilde{\tau}\) but not \(\tau(h^s; \sigma)\) and otherwise does not change the action recommendation.

Moreover, consider payments \(\tilde{t}\) and utility promises \(\tilde{u}\) defined by \[\begin{align} \tilde{t}(\mu, a) = \begin{cases} t(h^s; \sigma)(\mu, a) \text{ if } \mu \in \text{supp}(\tau(h^s; \sigma)) \setminus \{\tilde{\mu}\}, a \in \text{supp}(\tilde{\alpha}(h^s; \sigma)(\mu)) \\ t(\tilde{\mu}, a) \text{ if } \mu = \delta_\theta, \theta \in \text{supp}(\tilde{\mu}), a \in \text{supp}(\tilde{\alpha}(h^s; \sigma)(\mu))\end{cases} \\ \text{and } \tilde{u}'(\mu, a) = \begin{cases} u'(h^s; \sigma)(\mu, a) \text{ if } \mu \in \text{supp}(\tau(h^s; \sigma)) \setminus \{\tilde{\mu}\}, a \in \text{supp}(\tilde{\alpha}(h^s; \sigma)(\mu)) \\ u'(\tilde{\mu}) \text{ if } \mu = \delta_\theta, \theta \in \text{supp}(\tilde{\mu}), a \in \text{supp}(\tilde{\alpha}(h^s; \sigma)(\mu)) \end{cases}. \end{align}\] Note that \((\tilde{\tau}, \tilde{t}, \tilde{u}', \tilde{\alpha})\) maintain both the same expected transfers and the same path of utility promises as \((\tau(h^s; \sigma), t(h^s; \sigma), u'(h^s; \sigma), \alpha(h^s; \sigma))\) and thus have no effect on payments or Sender’s continuation value after \(h^s\).

Together, this gives a tuple \(\{\tilde{\tau}, \tilde{t}, \tilde{u}', \tilde{\alpha}\}\) which increases Receiver’s utility by \[\eta = \mathbb{E}_{\tilde{\mu}}[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a(h^s; \sigma)(\tilde{\mu}), \theta)] \geq \mathbb{E}_{\tilde{\mu}}[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a^*(\tilde{\mu}, \mathbf{0}), \theta)] > 0.\] The first inequality follows from the definition of \(a^*(\tilde{\mu}, \mathbf{0})\), and the second from the fact that full information is strictly optimal for Receiver. Moreover, this move changes Sender’s utility at history \(h^s\) by \[\varepsilon = \mathbb{E}_{\tilde{\mu}}[v(a^*(\delta_\theta, \mathbf{0}), \theta) - v(a(h^s; \sigma)(\tilde{\mu}), \theta)] \geq \min_{a' \in A} \{ \mathbb{E}_{\tilde{\mu}}[v(a^*(\delta_\theta, \mathbf{0}), \theta) - v(a', \theta)] \}.\] Together, this implies \(\frac{\varepsilon}{\eta} > -k\). If \(\varepsilon\geq 0\) then this is obvious. Otherwise, if \(\varepsilon< 0\), then \[\begin{align} \frac{\varepsilon}{\eta} & \geq \frac{\varepsilon}{\mathbb{E}_{\tilde{\mu}}[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a^*(\tilde{\mu}, \mathbf{0}), \theta)]} \\ & \geq \min_{a' \in A} \frac{\mathbb{E}_{\tilde{\mu}}[v(a^*(\delta_\theta, \mathbf{0}), \theta) - v(a', \theta)]}{\mathbb{E}_{\tilde{\mu}}[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a^*(\tilde{\mu}, \mathbf{0}), \theta)]} = E(\tilde{\mu}) > -k. \end{align}\] The first inequality decreases the denominator of a (negative) fraction; the second decreases the numerator; the third step holds by the definition of \(E(\tilde{\mu})\), and the final inequality by assumption.

From here, fix some \(\beta \in (0, 1)\) chosen so that \(\beta \delta^{s - t} \eta\, \mathbb{P}^\sigma(h^s | (h^t, \mu)) \leq t(h^t; \sigma)(\mu, a)\). Consider the alternative strategy \(\tilde{\sigma}\), which modifies \(\sigma\) in the following way:

  1. For any history \(h^r\), \(h^r \not\succsim h^t\), \(\tilde{\sigma} = \sigma\).

  2. At \(h^t\), \(t(h^t; \tilde{\sigma})(\mu, a) = t(h^t; \sigma)(\mu, a) - \beta \delta^{s - t} \eta \mathbb{P}^\sigma(h^s | (h^t, \mu))\). At any continuation history \(h^r \succ h^t\) where \(h^r \not\succsim h^s\), do not change the strategy from \(\sigma\).

  3. At \(h^s\), with probability \(\beta\), set \(\sigma^S(h^s) = \{\tilde{\tau}, \tilde{t}\}\) with continuation payoffs \(\tilde{u}'\), and set \(\sigma^R(h^s) = \tilde{\alpha}(h^s)\). With complementary probability \(1 - \beta\), do not change the continuation strategy; that is, \(\sigma^S(\cdot | h^s) = \tilde{\sigma}^S(\cdot | h^s)\).

  4. When the probability \(\beta\) event in Step (3) occurs, at every belief \(\mu \in \text{supp}(\tilde{\tau})\), choose \(\tilde{\sigma}(\cdot | (h^s, \mu))\) to be some \(\tilde{u}'\)-constrained optimal continuation strategy.

Note that by choice of \(\eta\) and construction of \(\tilde{\sigma}\), Receiver incentive compatibility constraints remain unchanged at \(h^t\) and thus their actions only change at \(h^s\). Thus \(\tilde{\sigma}\) is feasible.

This observation then implies Sender’s payoff changes only at two histories: \(h^t\) and \(h^s\). At \(h^t\), they pay \(k \beta \delta^{s - t} \eta \mathbb{P}^\sigma(h^s | (h^t, \mu))\) less in transfers, while at \(h^s\) their payoff changes by \(\beta \varepsilon\) conditional on reaching \(h^s\). This changes their total expected payoff by \[\delta^t \mathbb{P}^{\tilde{\sigma}}(h^t) \left(k \beta \delta^{s - t} \eta \mathbb{P}^\sigma(h^s | (h^t, \mu)) + \beta \delta^{s - t} \varepsilon \mathbb{P}^{\tilde{\sigma}}(h^s | (h^t, \mu))\right),\] where we note \(\mathbb{P}^\sigma(h^s | (h^t, \mu)) = \mathbb{P}^{\tilde{\sigma}}(h^s | (h^t, \mu))\). This expression is strictly positive if and only if \(k > \frac{-\varepsilon}{\eta}\), which is true by the way we have constructed \((\varepsilon, \eta)\). Hence \(\tilde{\sigma}\) must give Sender a strictly higher payoff than \(\sigma\), contradicting our assumption that \(\sigma\) was optimal. ◻

PROOF OF PROPOSITION 4↩︎

Proof. The second statement is obvious: if \(k < k'\), then \(E(\mu) > -k\) implies \(E(\mu) > -k'\), so \(\mathcal{E}(k) \subseteq \mathcal{E}(k')\). The first statement. Fix \(\mu, \mu' \in \mathcal{E}(k)\). Note that \[\underline v(\mu) = \min_{a' \in A} \mathbb{E}_\mu[v(a^*(\delta_\theta, \mathbf{0}), \theta) - v(a', \theta)]\] is the minimum of linear functions in \(\mu\) and hence concave. Moreover, we know that \[\underline u(\mu) = \min_{a \in A} \mathbb{E}_\mu[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a, \theta)] = \mathbb{E}_\mu[u(a^*(\delta_\theta, \mathbf{0}), \theta) - u(a^*(\mu, \mathbf{0}), \theta)]\] is also concave in \(\mu\). Writing \(\mu \beta \mu'\) for \(\beta \mu + (1 - \beta)\mu'\), we have \[\frac{\underline v(\mu \beta \mu')}{\underline u(\mu \beta \mu')} > -k \iff \underline v(\mu \beta \mu') > -k \underline u(\mu \beta \mu').\] Because \(k > 0\) and \(\underline u(\cdot)\) is concave, we have that \[-k \beta \underline u(\mu) - k(1 - \beta)\underline u(\mu') \geq -k \underline u(\mu \beta \mu').\] But we also know \(\underline v(\mu \beta \mu') \geq \beta \underline v(\mu) + (1 - \beta) \underline v(\mu')\) by concavity of \(\underline v\). Finally, since \(\mu, \mu' \in \mathcal{E}(k)\), \(\underline v(\mu) > -k \underline u(\mu)\) and \(\underline v(\mu') > -k \underline u(\mu')\). Putting everything together, we have that \[\underline v(\mu \beta \mu') \geq \beta \underline v(\mu) + (1 - \beta) \underline v(\mu') > -k \beta \underline u(\mu) - k (1 - \beta) \underline u(\mu') \geq -k \underline u(\mu \beta \mu')\] which gives the desired result. ◻
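As a numerical sanity check of the convexity claim, the following sketch evaluates \(\underline v\) and \(\underline u\) on a grid for the Table 1 payoffs and a hypothetical transfer cost \(k = 1\), and verifies that the beliefs satisfying \(\underline v(\mu) > -k \underline u(\mu)\) (i.e. \(E(\mu) > -k\)) form an interval; the particular payoffs and \(k\) are assumptions for illustration.

```python
import numpy as np

# Proposition 4 check: the set of beliefs with lower_v(mu) > -k * lower_u(mu) should be convex.
v = np.array([[0.0, 2.5, -0.5], [0.0, 2.5, -0.5]])   # Sender payoffs, rows = theta0, theta1
u = np.array([[1.0, -2.0, 0.0], [-2.0, 1.0, 0.0]])   # Receiver payoffs
k = 1.0                                               # hypothetical transfer cost

def lower_v(mu):
    # E_mu[v(a*(delta_theta, 0), theta)] - max_{a'} E_mu[v(a', theta)]
    v_full = (1 - mu) * v[0, np.argmax(u[0])] + mu * v[1, np.argmax(u[1])]
    return v_full - np.max((1 - mu) * v[0] + mu * v[1])

def lower_u(mu):
    # E_mu[u(a*(delta_theta, 0), theta)] - max_a E_mu[u(a, theta)]
    u_full = (1 - mu) * u[0, np.argmax(u[0])] + mu * u[1, np.argmax(u[1])]
    return u_full - np.max((1 - mu) * u[0] + mu * u[1])

grid = np.linspace(0.0, 1.0, 1001)
member = np.array([lower_v(m) > -k * lower_u(m) for m in grid])
idx = np.flatnonzero(member)
if idx.size:
    # Membership with no gaps on the grid is consistent with convexity of E(k).
    print(grid[idx[0]], grid[idx[-1]], "no gaps:", idx.size == idx[-1] - idx[0] + 1)
```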

PROOF OF PROPOSITION 5↩︎

Proof. Suppose not, so there exists \(a \in \text{supp}(\alpha(h^t)(\mu))\) occurring on path such that the condition fails. Define \(a'\) to be one such action for which it fails, so that \[\mathbb{E}_\mu[v(a', \theta) - v(a, \theta)] \geq \min_{\theta \in \Theta} \left[v(a', \theta) - v(a, \theta)\right] \geq k \max_{\theta \in \Theta} \left[u(a, \theta) - u(a', \theta)\right] \geq k\, \mathbb{E}_\mu[u(a, \theta) - u(a', \theta)].\] Let \(\tilde{t}(\mu) = \mathbb{E}_\mu[u(a, \theta) - u(a', \theta)]\) be the additional payment necessary to induce \(a'\) instead of \(a\) at belief \(\mu\). Note \(\tilde{t}(\mu) \geq 0\) since we assumed we were at an optimum.

But now consider the alternative one-shot deviation to \(\sigma\), occurring at \(h^t\), where \(a'\) is recommended (and payments are augmented by \(\tilde{t}(\mu)\) at belief \(\mu\)), changing nothing else. By choice of payments, this is obedient for Receiver. This changes Sender’s utility at \(h^t\) by \[\mathbb{E}_\mu[v(a', \theta) - v(a, \theta)] - k\mathbb{E}_\mu[u(a, \theta) - u(a', \theta)] > 0\] contradicting our assumption we were at optimum. ◻

PROOF OF PROPOSITION 6↩︎

Proof. The proof follows two steps. First, we show that it is sufficient to focus on canonical transfers, those defined by \[t^I(a, \mu) = \mathbb{E}_\mu[u(a^*(\mu, \mathbf{0}), \theta) - u(a, \theta)] \mathbf{1}\left\{a \in a^S(\mu)\right\}\] where \[a^S(\mu) = \mathop{\mathrm{\arg\!\max}}_{a \in \mathcal{A}} \mathbb{E}_\mu[v(a, \theta) - k(u(a^*(\mu, \mathbf{0}), \theta) - u(a, \theta))].\]

Lemma 7. For any prior \(\mu_0\) and optimal \((\tau^*, t^*)\), the tuple \((\tau^*, t^I)\) is optimal as well.

Proof. Fix an optimal \((\tau^*, t^*)\), and suppose action \(a^*(\mu, t^*)\) is induced at some belief \(\mu \in \text{supp}(\tau^*)\). First suppose \(a^*(\mu, t^*) \in a^S(\mu)\). Since Receiver takes action \(a^*(\mu, t^*)\), this implies \[\mathbb{E}_\mu[u(a^*(\mu, t^*), \theta)] + t^*(a^*(\mu, t^*), \mu) \geq \mathbb{E}_\mu[u(a, \theta)] + t^*(a, \mu) \text{ for all } a \in \mathcal{A}.\] Taking differences implies \[\mathbb{E}_\mu[u(a^*(\mu, t^*), \theta) - u(a, \theta)] \geq t^*(a, \mu) - t^*(a^*(\mu, t^*), \mu) \text{ for all } a \in \mathcal{A}.\] Since \(a^*(\mu, \mathbf{0})\) maximizes Receiver’s payoff without transfers, we have that for any \(a \not\in a^S(\mu)\) (so that \(t^I(a, \mu) = 0\)) \[\begin{align} \mathbb{E}_\mu[u(a^*(\mu, t^*), \theta) - u(a, \theta)] \geq \mathbb{E}_\mu[u(a^*(\mu, t^*), \theta) - u(a^*(\mu, \mathbf{0}), \theta)] = -t^I(a^*(\mu, t^*), \mu), \end{align}\] and so in particular under payments \(t^I\) Receiver will not want to take any \(a \not\in a^S(\mu)\). Suppose now that \(a^*(\mu, t^*) \not\in a^S(\mu)\). Then if at belief \(\mu\) Sender paid \(t^I(a, \mu)\) for some \(a \in a^S(\mu)\), they could attain a strictly higher payoff than under the pair induced by \((\tau^*, t^*)\) at \(\mu\), a contradiction to the optimality of the original tuple. This finishes the proof. ◻

Given that canonical transfers are sufficient, it is clear that \[\max_t \mathbb{E}_\mu[v(a^*(\mu, t), \theta) - k\, t(\mu, a)] = \max_{a \in \mathcal{A}} \mathbb{E}_\mu[v(a, \theta) - k(u(a^*(\mu, \mathbf{0}), \theta) - u(a, \theta))]\] and hence \(V^*(\mu_0)= \text{cav}|_{\mu_0}(V^t(\mu))\) always. The next step shows that this concavification is equal to the \(\mathcal{K}\)-cavification on extremal beliefs.

To do so, we need to prove a quick geometric lemma.

Lemma 8. \(\mathcal{O}_a\) is a convex polytope.

Proof. \(\mathcal{O}_a\) is compact and convex by Lemma A.1 of [37]; that \(\mathcal{O}_a\) is a polytope follows by noting \[\mathcal{O}_a = \Delta(\Theta) \cap \left( \bigcap_{a' \in \mathcal{A} \setminus \{a\}} \left\{ m: \int u(a, \theta) - u(a', \theta)\, dm \geq 0 \right\} \right)\] where \(m\) is any measure (not necessarily a probability measure). This is a finite intersection of half-spaces and thus each \(\mathcal{O}_a\) has finitely many extreme points by Theorem 19.1 of [43]. ◻

We can now prove the final step. For any fixed \(\tilde{a}\), recall that on \(\mathcal{O}_{\tilde{a}}\), \[V^t(\mu) = \max_{a \in \mathcal{A}} \left\{ \mathbb{E}_\mu[v(a, \theta) - k (u(\tilde{a}, \theta) - u(a, \theta))] \right\},\] which is the maximum of linear functions over a finite index. This implies \(V^t(\mu)\) is convex over the interior of each \(O_{\tilde{a}}\). Moreover, \(V^t(\mu)\) is globally upper semi-continuous over all of \(\mu\) since it is the finite upper envelope of continuous functions. Thus, we have that \(\lim\limits_{\mu \to \bar \mu} V^t(\mu) \leq V^t(\bar \mu)\) for any \(\bar \mu \in \Delta(\Theta)\), with strict inequality only potentially possible on the interior of \(O_a\).

Now let \(\mathcal{K} = \bigcup_a \text{ext}(\mathcal{O}_a)\) be the set of extremal beliefs; this is finite. By definition, \(\text{cav}^{\mathcal{K}}(V^t)(\mu)\) is a concave function satisfying \(\text{cav}^{\mathcal{K}}(V^t)(\mu) \geq V^t(\mu)\) for all \(\mu \in \mathcal{K}\), while \(V^t\) itself is piecewise convex and globally upper semi-continuous (with jumps at most at points in \(\mathcal{K}\)). But then because \(\text{cav}^{\mathcal{K}}(V^t)\) dominates \(V^t\) at the extreme points of each \(\mathcal{O}_a\), \(V^t\) is convex on each \(\mathcal{O}_a\), and \(\text{cav}^{\mathcal{K}}(V^t)\) is concave, it must be that \(\text{cav}^{\mathcal{K}}(V^t)(\mu) \geq V^t(\mu)\) for each belief \(\mu \in \Delta(\Theta)\), and hence \(\text{cav}^{\mathcal{K}}(V^t)(\mu) \geq \text{cav}(V^t)(\mu)\). Since we also know \(\text{cav}^{\mathcal{K}}(V^t)(\mu) \leq \text{cav}(V^t)(\mu) = V^*(\mu)\) for all \(\mu \in \Delta(\Theta)\), it must be that \(\text{cav}^{\mathcal{K}}(V^t)(\mu) = \text{cav}(V^t)(\mu)\). This finishes the proof. ◻

PROOF OF PROPOSITION 7↩︎

Proof. The lower bound. Define \(\Gamma^{int}(\mu) = \{(\gamma, m) : \mathbb{E}_\gamma[u(a, \theta)] + m > \underline U(\mu) \text{ and } \gamma|_{\Theta} = \mu\}\). We show for any \(\varepsilon > 0\) there exist \((\gamma, m) \in \Gamma^{int}(\mu^\infty)\) and \(\bar \delta < 1\) such that there is some obedient \(\sigma^*\) which gives Sender a payoff of at least \(\mathbb{E}_\gamma[v(a, \theta) - km] - \varepsilon\) for all \(\delta > \bar \delta\). Since payoffs are continuous over the (closed) set \(\Gamma(\underline U(\mu^\infty), \mu^\infty)\), this then implies the result.

Fix some \((\gamma, m) \in \Gamma^{int}(\mu^\infty)\). Let \(\sigma^*\) be the stationary strategy where \(\tau\) reveals the state and recommends the mixed action \(\gamma_\theta = \gamma(\cdot \mid \theta) \in \Delta(A)\) at \(\delta_\theta\) at every history on-path. If Receiver deviates from any recommendation, then give no information forevermore and recommend Receiver’s no-information optimum. At any history, any state \(\theta\), and any action \(a\), Receiver’s deviation gain today is bounded by \[B = \max_{a', \theta'} u(a', \theta') - \min_{a, \theta} u(a, \theta).\] Receiver’s incentive compatibility requirement can be written at \(\mu = \delta_\theta\) and recommendation \(a\) as \[(1 - \delta) u(a, \theta) + \delta \left( \mathbb{E}_\gamma[u(a, \theta)] + m \right) \geq (1 - \delta) u(a^*(\delta_\theta, \mathbf{0}), \theta) + (1 - \delta)\sum_{t = 1}^\infty \delta^t\, \mathbb{E}[u(a^*(M^t\delta_\theta, \mathbf{0}), \theta_t)].\] Recall now the following theorem about the mixing times of Markov chains:

Proposition 9 (Levin and Peres, Theorem 4.9). Let \(M\) be irreducible and aperiodic. Then there exist \(C > 0\) and \(\alpha \in (0, 1)\) such that \(\max_{\theta \in \Theta} ||M^n \delta_\theta - \mu^\infty|| \leq C \alpha^n\) for all \(n\).

Proposition 9 implies that for all \(t\) and any \(\eta > 0\), there exists \(\bar T\) such that (1) for any \(s \geq \bar T\), \(||\mathbb{Q}_t^{\sigma^*}(h^s | h^t) - \gamma||_\infty < \eta\) and (2) \(|u(a^*(M^t \delta_\theta, \mathbf{0}), \theta) - u(a^*(\mu^\infty, \mathbf{0}), \theta)| < \eta\). Hence, given any history, both players believe that play will be close to \(\gamma\) and that the distribution of the state will be close to \(\mu^\infty\). This implies the value of deviating and receiving no information can be bounded above by \[(1 - \delta) \sum_{t = 0}^{\bar T} \delta^t B + \delta^{\bar T} \left( \underline U(\mu^\infty) - \left(\mathbb{E}_\gamma[u(a, \theta)] + m\right) - \frac{1}{2} \varepsilon(\eta) \right)\] where \(\varepsilon(\eta)\) is a function vanishing as \(\eta \to 0\). This holds regardless of the realized state. This then implies we can find \(\bar \delta(\eta) < 1\) such that whenever \(\delta > \bar \delta(\eta)\), the value of deviating is no more than \(\varepsilon(\eta)\). Setting payments \(m = \varepsilon(\eta)\) and taking \(\eta \to 0\) then implies \(\sigma^*\) is obedient.

Finally, since \(||\mathbb{Q}_t^{\sigma^*} - \gamma||_\infty\) vanishes uniformly as \(t \to \infty\), Sender attains a per-period payoff arbitrarily close to \(\mathbb{E}_\gamma[v(a, \theta)] - km\) as \(t \to \infty\). This implies the lower bound (taking \(\bar \delta\) large enough).

The upper bound. Suppose there is an optimal stationary \(\sigma^*\), so that there is a fixed \(\{\tau, t, \alpha\}\) realized after every on-path history. By Proposition 9, there exists \(\bar T\) sufficiently large such that for all \(t \geq \bar T\), \(||Q_t^{\sigma^*} - \gamma||_\infty < \varepsilon\), where \(\gamma\) is the coupling induced by \(\sigma^*\) at the ergodic distribution. Let \(m\) be the induced total transfers at \(\gamma\). The argument for the lower bound implies \(\mathbb{E}_\gamma[u(a, \theta)] + m > \underline U(\mu^\infty)\), as otherwise Receiver can deviate and secure at least their no-information payoff essentially always as \(\delta \to 1\). This also implies Sender cannot do better in this stationary equilibrium: in particular, for every \(\varepsilon > 0\), there exists \(\hat{T}\), independent of \(\delta\), such that their payoff is bounded from above by \[(1 - \delta) \sum_{t = 0}^{\hat{T}} \delta^t \max_{a, \theta} v(a, \theta) + \delta^{\hat{T}} \left( \mathbb{E}_\gamma[v(a, \theta)] - km + \varepsilon\right).\] Taking \(\delta \to 1\) implies the upper bound and hence the proof. ◻
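To make the mixing bound from Proposition 9 used in this proof concrete, the short sketch below computes \(\max_\theta ||M^n \delta_\theta - \mu^\infty||_{TV}\) for a hypothetical two-state chain (rows of the matrix are transition probabilities); the distance decays geometrically at the rate of the second-largest eigenvalue, here \(0.7\). The chain is an assumption chosen only for illustration.

```python
import numpy as np

# Numerical check of the geometric mixing bound in Proposition 9 for a hypothetical 2-state chain.
M = np.array([[0.9, 0.1], [0.2, 0.8]])
eigval, eigvec = np.linalg.eig(M.T)
mu_inf = np.real(eigvec[:, np.argmin(np.abs(eigval - 1))])
mu_inf /= mu_inf.sum()

for n in [1, 5, 10, 20, 40]:
    Mn = np.linalg.matrix_power(M, n)
    # max over initial degenerate beliefs of the total-variation distance to the ergodic distribution
    dist = max(0.5 * np.abs(Mn[s] - mu_inf).sum() for s in range(2))
    print(n, dist)   # decays like alpha^n with alpha = |second eigenvalue| = 0.7
```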

PROOF OF PROPOSITION 8↩︎

Proof. The proof proceeds in four steps. As a preamble, we note that Sender never benefits from coordinating information across dimensions within a period, because payoffs are additively separable and states are drawn independently across dimensions. The first step is to characterize the way incentives are provided along the Pareto frontier.

Lemma 9. Let \(\sigma\) be optimal. Then for any \(h^t \in \text{supp}(\mathbb{P}^\sigma)\), \(\mathbb{P}(a_i = 0 | \theta_i = 1, h^t, \sigma) = 0\).

Proof. Suppose not. Then there exists an on-path history \(h^t\) where \(\mathbb{P}(a_i = 0 | \theta_i = 1, h^t, \sigma) > 0\). This implies there is a belief \(\tilde{\mu}_i \in \text{supp}(\tau_i(h^t; \sigma))\) such that \(a_i(\tilde{\mu}_i) = 0\) but \(\tilde{\mu}_i \neq \delta_0\).

Consider the experiment \(\tilde{\tau}_i\) which splits beliefs along \(\tilde{\mu}_i\) in the following way. First, for any \(\mu_i \neq \tilde{\mu}_i\), let \(\tilde{\tau}_i(\mu_i) = \tau_i(\mu_i)\). Along \(\tilde{\mu}_i\), split beliefs so that \(\tilde{\tau}_i(\tilde{\mu}_i) = 0\), \(\tilde{\tau}_i(\delta_0) = \tau_i(\tilde{\mu}_i)\tilde{\mu}_i(0)\), and \(\tilde{\tau}_i(\delta_1) = \tau_i(\tilde{\mu}_i)\tilde{\mu}_i(1)\). This is clearly Bayes plausible. In a similar way, modify payments and utility promises so that they are the same at beliefs where \(\tau_i = \tilde{\tau}_i\), and equal to \(t_i(\tilde{\mu}_i)\) and \(u_i'(\tilde{\mu}_i)\) at the newly induced beliefs. Finally, allow Receiver to take her favorite action at the newly induced beliefs, and otherwise the same action as before.

Such an experiment must be incentive compatible because Receiver is myopically maximizing at the new beliefs, and at the old ones (i.e. those supported by \(\tau_i\)), the recommendations are unchanged. Do not change any \(\tau_j\), \(j \neq i\). This alternative strategy strictly increases \(\mathbb{P}(a_i = 1 | \theta_i = 1, h^t, \sigma)\) while decreasing \(\mathbb{P}(a_i = 0 | \theta_i = 1, h^t, \sigma)\) without affecting decisions in any other dimension. Moreover, it does not change Sender’s expected payments or continuation value (since the distributions of \(\tilde{t}\) and \(\tilde{u}'\) are the same as those of \(t\) and \(u'\)). Thus this deviation is strictly profitable, contradicting the supposed optimality of \(\sigma\). ◻

Lemma 9 implies that every optimal experiment can only induce action \(a_i = 0\) at belief \(\delta_0\); pooling all beliefs at which \(a_i = 1\) is recommended then implies that the optimal information takes the form of \(\text{supp}(\tau_i) = \{0, \mu_i\}\), with \(a_i = 1\) at \(\mu_i\). The next step is to bound the cost of providing incentives by increasing \(\mu_i\) in some dimension \(i\). For the next lemma, let \(V_i(u)\) be the value function of the problem in dimension \(i\) only, and let \(u_i^{FI}\) be the value of full information to Receiver in only dimension \(i\).

Lemma 10. We have \[V_{i+}'(u) = \begin{cases} - c_i \text{ if } u \leq u_i^{FI}, c_i < k \\ -k \text{ else. } \end{cases}\]

Proof. Note that by Lemma 9, the only way to provide incentives via information is by increasing \(\mu_i\). This linearly decreases \(\mathbb{P}(a_i = 1 | \theta_i = 0)\), benefiting Receiver at a marginal rate of \(1\) while costing Sender \(c_i\). ◻

Lemma 10 and separability across dimensions then imply that the total Pareto frontier has slopes \(V_+'(u) = \{\max\{-c_i, -k\}\}_{i = 1}^n\), each (except for \(-k\)) over a region of length \(u_i^{FI}\). At the slope of \(-k\), either tool can be used. This is illustrated below.

Figure 2: Pareto Frontier

Lemma 10 also implies the stopping times \(T_i^*(h)\), if they exist, are nested in the sense that \(T_i^*(h) \leq T_j^*(h)\) if \(c_i, c_j < k\) and \(c_i < c_j\) (since we provide incentives along cheaper dimensions first).

The third step is to show that only \(\tau^{BP}\), \(\tau^{FI}\) are induced on the path of play, except possibly in a single period in each dimension. The proof strategy is similar to the squeezing lemma.

Lemma 11 (Squeezing Lemma, Redux). There exists an optimal strategy \(\sigma\) where for \(\mathbb{P}^\sigma\) almost all histories \(h^t\), \[\#\{t : \tau_i(h^t) \not\in \{\tau^{BP}, \tau^{FI}\}\} \leq 1.\]

Proof. Suppose not. Let \[\mathcal{G}_i(h, \sigma) = \#\{t : \tau_i(h^t; \sigma) \not\in \{\tau^{BP}, \tau^{FI}\}\} \text{ and } \mathcal{G}_i(\sigma) = \{h: \mathcal{G}_i(h, \sigma) \geq 2\}.\] Moreover, define \[\tilde{T}_i^*(h; \sigma) = \inf\{t: \tau_i(h^t; \sigma) \not\in \{\tau^{BP}, \tau^{FI}\}\}\] to be the first time at which \(\tau_i(h^t; \sigma)\) is neither full information nor Bayesian persuasion.

Now fix some on-path history \(h \in \mathcal{G}_i(\sigma) \cap \mathbb{P}^\sigma\), so that for \(t = \tilde{T}_i^*(h; \sigma)\), one has \(\tau_i(h^r; \sigma) \not\in \{\tau^{BP}, \tau^{FI}\}\) for \(r = t, s\) where \(s \geq t\), \(h^s \succ h^t\). Suppose \(\text{supp}(\tau(h^t; \sigma)) = \{0, \mu_i(h^t)\}\) and \(\text{supp}(\tau(h^s; \sigma)) = \{0, \mu_i(h^s)\}\). Consider now small \(\varepsilon < \mu_i(h^t) - \frac{1}{2}\) and experiments \(\text{supp}(\tilde{\tau}(h^t; \sigma))) = \{0, \mu_i(h^t) - \varepsilon\}\) and \(\text{supp}(\tilde{\tau}_i(h^s; \sigma)) = \{0, \mu_i(h^s) + \eta(\varepsilon)\}\) for some \(\eta(\varepsilon)\) vanishing as \(\varepsilon \to 0\). By Lemma 9, both of these changes are experiments that remain on the Pareto frontier. By Lemma 10, there is moreover some \(\eta(\varepsilon)\) such that by changing \(\tau_i(h^r; \sigma) \to \tilde{\tau}_i(h^r; \sigma)\) for \(r \in \{t, s\}\) and identifying the utility promises after \(\mu_i\) to the newly induced beliefs (i.e. changing nothing else), one remains on the Pareto frontier and in particular promises the same amount of utility to the agent at all histories.

Thus, by logic similar to that in Lemma 5 (exploiting the fact that the Pareto frontier is linear with slope equal to \(\max\{-c_i, -k\}\) at histories \(h^{\tilde{T}_i^*(h; \sigma)}\)), it is possible to modify \(\sigma\) to some optimum \(\tilde{\sigma}\) such that either \(\tau_i(h^t; \tilde{\sigma}) = \tau^{BP}\) or \(h \not\in \mathcal{G}_i(\tilde{\sigma})\). ◻

Now take an optimum satisfying Lemma 11 and apply the process in Theorem 1 to ensure transfers are a last resort; call this optimum \(\sigma^*\), at which transfers are a last resort and \(\mathcal{G}_i(\sigma^*)\) is empty. Define the time \[T_i^*(h) = \inf\{t : \tau_i(h; \sigma^*) = \tau^{FI}\} - 1.\] Then clearly this stopping time satisfies the condition; we are done if we can show that \(T_i^*(h) < \infty\) for \(\mathbb{P}^{\sigma^*}\) almost all histories.

Suppose not. This implies that \(\tau_i = \tau^{BP}\) always. If \(c_i < k\) for all \(i\), then this is clearly optimal. Otherwise, if there exists \(c_i > k\), then by Proposition 5 it must be that \(\alpha_i(h^t)(\mu) = 1\) for all \(h^t\) and \(\mu\). But for this to be obedient at \(\mu_i = \delta_0\), either the agent’s continuation utility must be greater than \(0\) following this realization or transfers must occur. Since transfers are a last resort and \(\tau_i = \tau^{BP}\) always, transfers cannot motivate the agent. Since \(\tau_i = \tau^{BP}\) always, the agent’s payoff in each dimension is equal to their no-information outside option baseline. Hence it is impossible for this to be obedient, a contradiction. Thus \(T_i^*(h) < \infty\) for \(\mathbb{P}^{\sigma^*}\)-almost every history, completing the proof. ◻

Appendix B: The Strong Topology on Strategies↩︎

In this appendix we formally define the strong topology for strategies \(\sigma: H^\infty \to \mathcal{X}^\infty\), where \(\mathcal{X}\) is a compact topological space, \(H\) is some compact, countable alphabet, and \(H^\infty\) and \(\mathcal{X}^\infty\) denote the product topological spaces over \(H\) and \(\mathcal{X}\), respectively. Say that \(\sigma\) has a basis decomposition if \[\sigma(h) = \{\sigma(h^t)\}_{t = 0}^\infty,\] where \(h^t\) is the projection of \(h \in H^\infty\) onto its time-\(t\) cylinder and \(\sigma(h^t): H^t \to \mathcal{X}\) maps the time-\(t\) projection to a single element of \(\mathcal{X}\). Throughout, we will assume \(\sigma\) has a basis decomposition.

To give an economic interpretation: \(H^\infty\) is the set of infinite histories, \(H^t\) is the set of time-\(t\) histories, \(H\) is the finite alphabet of public signals, and \(\mathcal{X}\) is the action set; \(\sigma\) is then a strategy in the corresponding repeated game. This is the context in which these objects are used in this paper, and a version of this topology appears in Lemma 3 of [44]. In this appendix, we formally prove and collect several relevant definitions and facts about this topological space. Throughout, let \(||\cdot||_\infty\) metrize the discrete topology on \(\mathcal{X}\).

Definition 10. Say that \(\sigma_n \to \sigma\) in the strong topology if and only if \(d(\sigma_n, \sigma) \to 0\), where \[d(\sigma_n, \sigma) = \sup_{h \in H^\infty} \sum_{t = 0}^\infty \frac{1}{2^t} ||\sigma_n(h^t) - \sigma(h^t)||_\infty.\]
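
To see what convergence in this metric requires, note the following bound; it assumes only that \(||\cdot||_\infty\) is bounded by \(1\), as is the case for the discrete metric. If \(\sigma_n\) and \(\sigma\) agree on every cylinder of length at most \(T - 1\), then \[d(\sigma_n, \sigma) \leq \sup_{h \in H^\infty} \sum_{t = T}^\infty \frac{1}{2^t} = \frac{1}{2^{T - 1}},\] so a sequence of strategies converges in the strong topology whenever the first time at which it can disagree with the limit grows without bound along the sequence.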

Proposition 10. Let \(\Sigma\) be the set of all strategies. \(\Sigma\) is compact in the strong topology.

Proof. The goal is to adapt a sufficiently general version of the functional Arzelà-Ascoli theorem to prove that \((\Sigma, d)\) is sequentially compact. First, the space \(\Sigma\) is equicontinuous; note that \[d(\sigma(h), \sigma(h')) = \sum_{t = 0}^\infty \frac{1}{2^t} ||\sigma(h^t) - \sigma(h^{'t})||_\infty,\] and so in particular if any two histories \(h, h'\) agree for the first \(T\) periods (in which case the distance \(d(h, h')\) is at most \(\frac{1}{2^{T - 1}}\)), then their images must be within distance \(\frac{1}{2^{T - 1}}\) of each other. Since this is true for any \(\sigma\) and at any history, the modulus of continuity for functions in \(\Sigma\) is independent both of the choice of \(h\) and of the choice of \(\sigma\), and hence \(\Sigma\) is equicontinuous. Moreover, the space of strategies is closed: for any sequence \(\{\sigma_n\}\) which converges to some \(\sigma\), \(\sigma\) is itself a function defined on the cylinders as \[\sigma(h^t) = \lim\limits_{n \to \infty} \sigma_n(h^t),\] where convergence of \(\sigma_n(h^t)\) is guaranteed as \(\mathcal{X}\) is compact. Finally, for any fixed \(h \in H^\infty\), the closure of the set \(\{\sigma(h)\}_{\sigma \in \Sigma}\) is a closed subset of a compact set (namely, \(\mathcal{X}^\infty\)) and hence compact. This implies the hypotheses of Theorem 47.1 of [45] are satisfied, and hence \(\Sigma\) is sequentially compact (the theorem is stated for the topology of compact convergence, which coincides with our topology because the domain is compact). ◻

Now suppose \(\mathcal{X} = \Delta(H)\), so that each time-\(t\) history is mapped to a (random) new letter of the alphabet. Then for any function \(\sigma\), note that there is a natural measure defined on \(H^\infty\) generated by the cylinders via \(\mathbb{P}_t^{\sigma}(h^t) = \mathbb{P}_{t - 1}^\sigma(h^{t - 1})\,\sigma(h^{t - 1})(x)\), where \(h^t = \{h^{t - 1}, x\}\) for \(x \in H\). Let the tail probability measure associated with this sequence be \(\mathbb{P}^\sigma\) (that is, \(\mathbb{P}^\sigma(h) = \lim\limits_{t \to \infty} \mathbb{P}_t^\sigma(h^t)\)). We conclude with the following result.
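
As a concrete (hypothetical) example of this construction, take the binary alphabet \(H = \{0, 1\}\) and the strategy that draws each new letter uniformly at every history, \(\sigma(h^{t - 1})(x) = 1/2\) for each \(x \in H\). The recursion then gives \[\mathbb{P}_t^\sigma(h^t) = \mathbb{P}_{t - 1}^\sigma(h^{t - 1}) \cdot \tfrac{1}{2} = \tfrac{1}{2^t}\] for every time-\(t\) history \(h^t\), so the tail measure \(\mathbb{P}^\sigma\) assigns probability zero to each individual infinite history, as one expects for i.i.d. fair coin flips.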

Proposition 11. If \(\sigma_n \to \sigma\) in the strong topology, \(\mathbb{P}^{\sigma_n} \to \mathbb{P}^\sigma\) pointwise.

Proof. We first show \(\mathbb{P}_t^{\sigma_n} \to \mathbb{P}_t^{\sigma}\) for every \(t\). This is clearly true when \(t = 0\), since the starting measure is trivial. Inductively, suppose \(\mathbb{P}_{t - 1}^{\sigma_n}(h^{t - 1}) \to \mathbb{P}_{t - 1}^\sigma(h^{t - 1})\) for every history \(h\). Then \[\mathbb{P}_{t}^{\sigma_n}(h^t) = \mathbb{P}_{t - 1}^{\sigma_n}(h^{t - 1})\,\sigma_n(h^{t - 1})(x) \to \mathbb{P}_{t - 1}^\sigma(h^{t - 1})\,\sigma(h^{t - 1})(x) = \mathbb{P}_t^\sigma(h^t),\] where \(\sigma_n(h^{t - 1})(x) \to \sigma(h^{t - 1})(x)\) because convergence in the strong topology implies convergence on every cylinder. Taking \(t \to \infty\) then gives the result. ◻

References↩︎

[1]
E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, 2011, doi: 10.1257/aer.101.6.2590.
[2]
Uber, “Privacy protection for riders.” https://www.uber.com/us/en/ride/overview/data-and-privacy/, 2025. Accessed September 12, 2025.
[3]
J. Thomas and T. Worrall, “Self-enforcing wage contracts,” The Review of Economic Studies, vol. 55, no. 4, pp. 541–553, 1988, doi: 10.2307/2297404.
[4]
J. Thomas and T. Worrall, “Income fluctuation and asymmetric information: An example of a repeated principal-agent problem,” Journal of Economic Theory, vol. 51, no. 2, pp. 367–390, 1990, doi: 10.1016/0022-0531(90)90023-D.
[5]
D. Ray, “The time structure of self-enforcing agreements,” Econometrica, vol. 71, no. 6, pp. 1877–1893, 2003, doi: 10.1111/1468-0262.00295.
[6]
Y. Guo and J. Hörner, “Dynamic allocation without money,” Toulouse School of Economics (TSE), TSE Working Papers 20-1133, 2020.
[7]
E. Lipnowski and J. Ramos, “Repeated delegation,” Journal of Economic Theory, vol. 188, p. 105040, 2020, doi: 10.1016/j.jet.2020.105040.
[8]
A. Pavan, I. Segal, and J. Toikka, “Dynamic mechanism design: A myersonian approach,” Econometrica, vol. 82, no. 2, pp. 601–653, 2014, doi: 10.3982/ECTA10269.
[9]
D. Fudenberg and L. Rayo, “Training and effort dynamics in apprenticeship,” American Economic Review, vol. 109, no. 11, pp. 3780–3812, 2019.
[10]
D. Fudenberg, G. Georgiadis, and L. Rayo, “Working to learn,” Journal of Economic Theory, vol. 197, p. 105347, 2021, doi: 10.1016/j.jet.2021.105347.
[11]
L. Rayo and I. Segal, “Optimal information disclosure,” Journal of Political Economy, vol. 118, no. 5, pp. 949–987, 2010, doi: 10.1086/657922.
[12]
D. Bergemann and S. Morris, “Information design: A unified perspective,” Journal of Economic Literature, vol. 57, no. 1, pp. 44–95, 2019, doi: 10.1257/jel.20181489.
[13]
J. C. Ely, “Beeps,” American Economic Review, vol. 107, no. 1, pp. 31–53, 2017, doi: 10.1257/aer.20150218.
[14]
J. Renault, E. Solan, and N. Vieille, “Optimal dynamic information provision,” Games and Economic Behavior, vol. 104, pp. 329–349, 2017.
[15]
E. Lehrer and D. Shaiderman, “Markovian persuasion,” Theoretical Economics, forthcoming, 2025.
[16]
I. Ball, “Dynamic information provision: Rewarding the past and guiding the future,” Econometrica, vol. 91, no. 4, pp. 1363–1391, 2023, doi: 10.3982/ECTA201294.
[17]
M. Golosov, V. Skreta, A. Tsyvinski, and A. Wilson, “Dynamic strategic information transmission,” Journal of Economic Theory, vol. 151, pp. 304–341, May 2014, doi: 10.1016/j.jet.2013.12.012.
[18]
J. Renault, E. Solan, and N. Vieille, “Dynamic sender-receiver games,” Journal of Economic Theory, 2013.
[19]
D. Ravid, A.-K. Roesler, and B. Szentes, “Learning before trading: On the inefficiency of ignoring free information,” Journal of Political Economy, vol. 130, no. 2, pp. 431–464, 2022, doi: 10.1086/717350.
[20]
S. Terstiege and C. Wasser, “Competitive information disclosure to an auctioneer,” American Economic Journal: Microeconomics, vol. 14, no. 3, pp. 622–664, 2022, doi: 10.1257/mic.20200027.
[21]
J. C. Ely and M. Szydlowski, “Moving the goalposts,” Journal of Political Economy, vol. 128, no. 2, pp. 468–506, 2020.
[22]
D. Orlov, A. Skrzypacz, and P. Zryumov, “Persuading the principal to wait,” Journal of Political Economy, vol. 128, no. 7, pp. 2542–2578, 2020, doi: 10.1086/706687.
[23]
A. Koh and S. Sanguanmoo, “Attention capture,” 2022.
[24]
A. Koh, S. Sanguanmoo, and W. Zhong, “Persuasion and optimal stopping,” 2024.
[25]
J. C. Ely, G. Georgiadis, S. Khorasani, and L. Rayo, “Optimal feedback in contests,” Review of Economic Studies, vol. 0, pp. 1–25, 2022, advance access publication 28 October 2022, doi: 10.1093/restud/rdac074.
[26]
J. Ely, L. Rayo, and G. Georgiadis, “Feedback design in dynamic moral hazard,” Econometrica, forthcoming, 2024.
[27]
E. Lipnowski and D. Ravid, “Cheap talk with transparent motives,” Econometrica, vol. 88, no. 4, pp. 1631–1660, 2020.
[28]
L. Doval and V. Skreta, “Constrained information design,” Mathematics of Operations Research, vol. 49, no. 1, 2023, doi: 10.1287/moor.2022.1346.
[29]
L. Barros, “Information acquisition in cheap talk,” arXiv preprint, 2025.
[30]
J. Ortner, T. Sugaya, and A. Wolitzky, “Mediated collusion,” Journal of Political Economy, vol. 132, no. 4, pp. 1247–1289, May 2024, doi: 10.1086/727710.
[31]
T. Sugaya and A. Wolitzky, “Collusion with optimal information disclosure,” working paper, 2025.
[32]
A. Kolotilin and H. Li, “Relational communication,” Theoretical Economics, vol. 16, no. 4, pp. 1391–1430, 2021, doi: 10.3982/TE3841.
[33]
D. Bergemann and S. Morris, “Bayes correlated equilibrium and the comparison of information structures in games,” Theoretical Economics, vol. 11, pp. 487–522, 2016, doi: 10.3982/TE1850.
[34]
J.-F. Mertens, S. Sorin, and S. Zamir, Repeated games, vol. 17. Cambridge: Cambridge University Press, 2015.
[35]
N. L. Stokey, R. E. Lucas, Jr., and E. C. Prescott, Recursive methods in economic dynamics. Cambridge, MA: Harvard University Press, 1989.
[36]
E. Lipnowski and L. Mathevet, “Simplifying Bayesian persuasion,” manuscript, University of Chicago and New York University, Mar. 2017.
[37]
E. Gao and D. Luo, “Prior-free predictions for persuasion,” arXiv preprint arXiv:2312.02465, 2025.
[38]
J. Levin, “Relational incentive contracts,” American Economic Review, vol. 93, no. 3, pp. 835–857, 2003, doi: 10.1257/000282803322157035.
[39]
L. Mathevet, J. Perego, and I. Taneva, “On information design in games,” Journal of Political Economy, vol. 128, no. 4, pp. 1346–1382, 2020, doi: 10.1086/705332.
[40]
A. Smolin and T. Yamashita, “Information design in smooth games,” 2025.
[41]
E. Winter, “Incentives and discrimination,” American Economic Review, vol. 94, no. 3, pp. 764–773, 2004.
[42]
M. Halac, E. Lipnowski, and D. Rappoport, “Rank uncertainty in organizations,” American Economic Review, vol. 111, no. 3, pp. 757–786, 2021.
[43]
R. Rockafellar, Convex analysis. Princeton University Press, 1996.
[44]
D. Luo and A. Wolitzky, “Marginal reputation,” Econometrica, 2025, arXiv:2411.15317 [econ.TH]. [Online]. Available: https://doi.org/10.48550/arXiv.2411.15317.
[45]
J. R. Munkres, Topology, 2nd ed. Prentice Hall, 2000.

  1. Massachusetts Institute of Technology, daniel57@mit.edu. I am especially grateful to Drew Fudenberg, Stephen Morris, and Alex Wolitzky for invaluable guidance. I am thankful to Zihao Li for many enlightening discussions in the early stages of this project, and to Ian Ball, Abhijit Banerjee, Alessandro Bonatti, Yifan Dai, Eric Gao, Andrew Koh, Anton Kolotilin, Andrew Komo, Ellen Muir, Harry Pei, Ryo Shirakawa, Eric Tang, Frank Yang, the Fall 2025 3B Conference, MIT Theory Lunch, MIT Organizational Economics Lunch, and Stony Brook 2025 for helpful suggestions and comments. I acknowledge financial support from the NSF Graduate Research Fellowship. All errors are mine alone.↩︎

  2. In moral hazard or incomplete contracting for the former, or information design for the latter.↩︎

  3. Uber writes “The type of approximate pickup and dropoff location shown to drivers varies based on address structures in your market, local regulations and driver loyalty programs.” ([2]).↩︎

  4. See the discussion following Theorem 1 for details.↩︎

  5. In the absence of the nonnegative limited liability constraint, Sender can obtain first best by “informationally” selling the firm to the agent—offering full information for a payment equal to their value of information, then paying them to take the surplus maximizing action.↩︎

  6. This assumption will not affect the conclusion of Theorem 1, though relaxing it substantially complicates the remainder of the analysis outside of the i.i.d. case. For simplicity I maintain this assumption throughout.↩︎

  7. I suppose players do not observe their payoffs, so that the state is not revealed at the end of each period. When the state is drawn i.i.d., as in the case of the leading example, this assumption is without loss of generality.↩︎

  8. The formulation here implicitly presupposes Sender commitment.↩︎

  9. For example, in the motivating rideshare example, the driver’s value for rides may vary with each draw.↩︎

  10. So, for example, the famous Figure 1 of Kamenica and Gentzkow could not have been generated by a finite persuasion problem.↩︎

  11. For example, how much the employee would enjoy being on that project, how complementary their skill sets are to the objective, the difficulty of the project, etc.↩︎

  12. This can be, for example, because one of the two projects matters for the manager’s promotion package while the other does not.↩︎

  13. Indeed, [15], who study a related question under the stronger assumption that Receiver is myopic, call characterizing the value of persuasion in the general case “The grand question...[We do] not offer a solution for the optimal disclosure of information by a patient sender when the disclosure has future payoff implications. This question remains open and it seems challenging to tackle.”↩︎

  14. With transfers, these are often those which trace out the efficient set when agents have relational incentives; see [38] and [32].↩︎