Risk-Aware Real-Time Task Allocation for

Stochastic Multi-Agent Systems under STL Specifications ^{1}

April 02, 2024

This paper addresses the control synthesis of heterogeneous stochastic linear multi-agent systems with real-time allocation of signal temporal logic (STL) specifications. Based on previous work, we decompose specifications into sub-specifications on the individual agent level. To leverage the efficiency of task allocation, a heuristic filter evaluates potential task allocation based on STL robustness. Subsequently, an auctioning algorithm determines the definite allocation of specifications. Finally, a control strategy is synthesized for each agent-specification pair using tube-based Model Predictive Control (MPC), ensuring provable probabilistic satisfaction. We demonstrate the efficacy of the proposed methods using a multi-bus scenario that highlights a promising extension to autonomous driving applications like crossing an intersection.

Multi-agent systems are frequently employed in complex tasks that require close collaborations. Formal specifications, such as signal temporal logic (STL), are increasingly utilized to define complex collaborative task objectives [1]. The collaborative control of multi-agent systems satisfying STL specifications has been studied using mixed-integer programming [2], funnel functions [3], and control barrier functions [4]. Nevertheless, a critical challenge for the practical implementation of these methods is their reliability in uncertain and dynamic environments. For systems subject to stochastic disturbances, it is important to evaluate the impact of these disturbances on the task accomplishment of individual agents, so that the associated risks are restricted [5]. When tasks are assigned at runtime [6], task allocation and control become challenging in the multi-agent framework.

Despite recent efforts, there exists a notable gap in the literature regarding the control of stochastic multi-agent systems under real-time allocated STL specifications with probabilistic guarantees. Such systems have significant practical value, exemplified by scenarios such as the two-vehicle intersection case in Figure 1. In such a scenario, each vehicle must navigate while considering its stochastic disturbances and avoiding collisions with its opponent. The complexity of the problem necessitates decomposing the overarching driving specification into two sub-specifications tailored to each vehicle. These sub-specifications, characterized by different time intervals, may require real-time assignment. Hence, this paper addresses the control synthesis of stochastic multi-agent systems under collaborative STL specifications assigned at runtime with probabilistic guarantees.

A crucial aspect of addressing this challenging problem involves task assignment or allocation (TA), which aims to determine optimal agent-task pairs in a multi-agent system [7]. This process yields a linear programming problem [8], efficiently solvable using auction algorithms [9]. Dynamically assigned tasks have been described using real-time STL specifications [10], [11]. TA necessitates decomposing specifications assigned to multiple collaborative agents into individual agent specifications, requiring the decomposition of STL specifications both among agents [12], [13] and across time [14]. With decomposed real-time specifications, control synthesis for stochastic multi-agent systems holds promise with the utilization of shrinking-horizon model predictive control (MPC) [15] to ensure bounded risks [16]. In previous work, probabilistic reachable tubes have been utilized to optimize risk bounds for both predefined specifications [17] and real-time specifications [6]. These methods underpin the risk-aware control synthesis of stochastic multi-agent systems with decomposed real-time specifications.

This paper proposes a novel approach to allocating STL specifications, involving subgroups of agents, and synthesizing control strategies for individual agents in real-time. By decomposing newly assigned specifications into agent-level tasks, we employ a heuristic approach based on STL robustness [18] to assess agent-specification pairs and attack task assignment via an auctioning scheme formulated as a mixed integer linear program. Subsequently, a tube-based model predictive control strategy is employed where an optimization is solved at every time step at agent level. The solution of this optimization ensures the satisfaction of running and newly allocated tasks with a desired risk level, while discarding tasks not adhering to the risk specification. We prove recursive feasibility of this MPC scheme, leading to an open-loop system with provable probabilistic guarantees and a closed-loop system for which guarantees are suggested.

This section introduces the preliminaries, class of systems, temporal logics, and basic assumptions, culminating in a problem statement and an approach.

For a given probability measure \(\mathbb{P}\) defined over Borel measurable space \((\mathbb{X},\mathcal{B}(\mathbb{X}))\), we denote the probability of an event \(\mathcal{A} \in \mathcal{B}(\mathbb{X})\) as \(\mathbb{P}(\mathcal{A})\). In this paper, we will work with Euclidean spaces and Borel measurability. Further details on measurability are omitted, and we refer the interested reader to [19]. Additionally, any half-space \(H \subset \mathbb{R}^n\) is defined as \(H:=\{x \in \mathbb{R}^n \mid g^Tx\geq b\}\) with \(g \in \mathbb{R}^n\) and \(b \in \mathbb{R}\). A polyhedron is the intersection of finitely many half-spaces, also denoted as \(H:=\{x \in \mathbb{R}^n \mid Gx\geq b\}\) with \(G \in \mathbb{R}^{q\times n}\) and \(b \in \mathbb{R}^q\). A polytope is a bounded polyhedron. The 2-norm is defined by \(||x||=\sqrt{x^Tx}\) with \(x \in \mathbb{R}^n\). The identity matrix is denoted by \(I_n \in \mathbb{R}^{n \times n}\). The vector of all elements one is denoted by \(\boldsymbol{1}_{n} \in \mathbb{R}^{n}\), but if the context is clear, it will instead be denoted by \(\boldsymbol{1}\). The zero vector is denoted by \(\boldsymbol{0}_n \in \mathbb{R}^n\), but if the context is clear, it will instead be denoted by \(\boldsymbol{0}\). For any finite set \(I\), the cardinality is denoted by \(|I|\). \(\mathrm{Proj}_X(Y)\) denotes the projection of elements in \(Y\) onto \(X\).

In this paper, we consider a finite ordered set \(\mathcal{I}\) of heterogeneous agents where each agent \(i \in \mathcal{I}\) has linear time-invariant dynamics with additive noise,
given by \[\label{Sys} x^i(k+1)=A^ix^i(k)+B^iu^i(k)+w^i(k),\tag{1}\] where matrix pair \((A^i,B^i)\) is stabilizable. We assume all agents have the
same state and input dimensions. For the \(i\)-th agent, \(x^i \in \mathbb{X}^i \subseteq \mathbb{R}^n\) is the state, \(x^i(0) \in \mathbb{X}^i\) is an
initial state, \(u^i \in \mathbb{U}^i \subseteq \mathbb{R}^m\) is the input and \(w^i \in \mathbb{R}^n\) is an independent, identically distributed (i.i.d.) noise disturbance with
distribution \(\mathcal{Q}^i_w\), i.e., \(w^i(k) \sim \mathcal{Q}^i_w\), which can have infinite support. We will assume that the distribution has at least a known mean and variance, with
the latter assumed to be strictly positive definite. Additionally, we assume that the distribution is central convex unimodal^{2}. Finally, we assume that any two disturbances
\(w^i(k)\) and \(w^j(l)\) are independent.

For each agent \(i \in \mathcal{I}\), we assume there is a local controller. We define the local controllers as a sequence of policies \[\boldsymbol{f}^i:= \{f^i_0, f^i_1,\dots\},\] such that \(f^i_k \!\!:\!\! \mathbb{H}^i_k\!\! \to \!\! \mathbb{U}^i\) maps the history of states and inputs to inputs. Here \(\mathbb{H}^i_k\!:=\!(\mathbb{X}^i \times \mathbb{U}^i)^k \times \mathbb{X}^i\) with elements \(\eta^i(k)\!:=\!(x^i(0),u^i(0), \cdots, u^i(k-1), x^i(k))\). By implement-ing controller \(\boldsymbol{f}^i\) upon the corresponding agent, we obtain the controlled form of 1 for which the control input satisfies the feedback law \(u^i(k)= f^i_k(\eta^i(k))\) with \(\eta^i(k) \in \mathbb{H}^i_k\). We indicate the input sequence of agent \(i\) by \(\boldsymbol{u}^i:=\{u^i(0),u^i(1),\dots\}\) and define its executions as sequences of states \(\boldsymbol{x}^i:=\{x^i(0),x^i(1),\dots\}\) referred to as signals.

We define the suffix, segment and signal element of any signal \(\boldsymbol{x}^i:=\{x^i(0),x^i(1), \ldots\}\), respectively, by \(\boldsymbol{x}^i_k=\{x^i(k), x^i(k+1), \dots\}\), \(\boldsymbol{x}^i_{[a,b]}:=\{x^i(a), \ldots, x^i(b)\}\) with \(a\leq b\), and \(\boldsymbol{x}^i(k):=\boldsymbol{x}^i_{[k,k]}\). We define the concatenation of \(\boldsymbol{x}^i\), for all agents \(i \in I \subseteq\mathcal{I}\), by a higher dimensional signal \(\boldsymbol{x}^I\), where \(\boldsymbol{x}^I(k):=[\boldsymbol{x}^{i_1}(k)^T, \cdots, \boldsymbol{x}^{i_l}(k)^T]^T\), \(i_j \in I\), and \(l=|I|\) is the number of elements in \(I\). Any signal \(\boldsymbol{x}^i\) can be interpreted as a realization of the probability distribution induced by implementing controller \(\boldsymbol{f}^i\), denoted by \(\boldsymbol{x}^i \sim \mathbb{P}_{\boldsymbol{f}^i}\). Similar realizations exist for segments, i.e., \(\boldsymbol{x}^i_k \sim \mathbb{P}_{\boldsymbol{f}^i,k}\) and concatenations, i.e., \(\boldsymbol{x}^I \sim \mathbb{P}_{\boldsymbol{f}^I}\), with \(\boldsymbol{f}^I:=\{\boldsymbol{f}^{i} \mid i \in I\}\).

To mathematically describe tasks, e.g., reaching a target within a given time frame, we consider specifications given by signal temporal logic. Here, we assume that all STL specifications adhere to the *negation normal form* (NNF), see [1]. This assumption does not restrict the overall framework since [21] proves that every STL specification can be rewritten into the negation normal form.

STL consists of predicates \(\mu^l\) for subgroups of \(l\) agents that are either true (\(\top\)) or false (\(\bot\)). Each predicate \(\mu^l\) is described by a function \(h:\mathbb{R}^{nl} \to \mathbb{R}^q\), \(l \leq |\mathcal{I}|\), as follows \[\mu^l:=\begin{cases} \top \text{ if } \forall i \in \{1, \cdots, q\}, \;h_i(x) \geq 0 \\ \bot \text{ if } \exists i \in \{1, \cdots, q\}, \;h_i(x) < 0. \end{cases}\] Here \(h_i: \mathbb{R}^{nl} \to \mathbb{R}\) indicates the \(i\)-th element of function \(h\). We will assume that all predicate functions are affine functions of the form \(h(x)=Gx - b\) with \(G \in \mathbb{R}^{q \times nl}\), \(b\in \mathbb{R}^q\) and \(||g_j||=1\) for all \(j \in \{1, \cdots, q\}\), where \(g_j\) is the \(j\)-th row of \(G\). Furthermore, we assume that \(H_{\mu^l}:=\{ x \in \mathbb{R}^{nl} \mid G x \geq b\}\) describes a polytope. Let the set of all predicates be given by \(\mathcal{P}\). The STL syntax will be given by \[\phi::= \top\mid \mu^l \mid \lnot \mu^l \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid \square_{[a,b]} \phi \mid \phi_1 \;U_{[a,b]} \;\phi_2,\] where \(\mu^l \in \mathcal{P}\), \(\phi\), \(\phi_1\) and \(\phi_2\) are STL formula, \(a,b \in \mathbb{N}\), and \(a \leq b\). The semantics are given next, where \(\boldsymbol{x}^{\mathcal{I}}_k \vDash \phi\) denotes the satisfaction of \(\phi\) verified over the suffix of signal \(\boldsymbol{x}^{\mathcal{I}}\).

**Definition 1**. *The STL semantics, for a specification jointly assigned to a set of agents, are recursively given by: \[\begin{align} & \boldsymbol{x}^{\mathcal{I}}_k \vDash \mu^l & \iff &
\exists I \subseteq \mathcal{I}: \;|I|=l, \;h({\boldsymbol{x}^I(k)})\geq \boldsymbol{0} \\ & \boldsymbol{x}^{\mathcal{I}}_k \vDash \lnot \mu^l & \iff & \nexists I \subseteq \mathcal{I}: \;|I|=l, \;h({\boldsymbol{x}^I(k)})\geq \boldsymbol{0} \\
& \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_1 \wedge \phi_2 & \iff & \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_1 \text{ and } \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_2 \\ & \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_1 \vee \phi_2 &
\iff & \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_1 \text{ or } \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_2 \\ & \boldsymbol{x}^{\mathcal{I}}_k \vDash \square_{[a,b]} \;\phi & \iff & \forall k' \in [k+a,k+b]:
\boldsymbol{x}^{\mathcal{I}}_{k'} \vDash \phi \\ & \boldsymbol{x}^{\mathcal{I}}_k \vDash \phi_1 \;U_{[a,b]} \;\phi_2 & \iff & \exists k_1 \in [k+a,k+b]: \boldsymbol{x}^{\mathcal{I}}_{k_1} \vDash \phi_2 \\ & & & \text{ and }
\forall k_2 \in [k,k_1], \;\boldsymbol{x}^{\mathcal{I}}_{k_2} \vDash \phi_1.
\end{align}\]*

Here, we introduced a slight abuse of the notation as \([a,b]\) is used to describe the set of all natural numbers within the real interval \([a,b]\). Additional operators can be derived such as the eventually-operator \(\lozenge_{[a,b]} \phi := \top U_{[a,b]} \phi\).

Since all agents behave stochastically, each STL specification can only be satisfied probabilistically. Should a specification be assigned at time \(k\), the probability can be determined based on the state measurement \(x^{\mathcal{I}}(k)\), the controller \(\boldsymbol{f}^{\mathcal{I}}\), and the system dynamics 1 . Accordingly, we consider the probability that suffix fragment \(\boldsymbol{x}^{\mathcal{I}}_k\) satisfies specification \(\phi\), given state \(x^{\mathcal{I}}(k)\). We denote this by \[\label{Eq:Multi95Prop} \mathbb{P}_{\boldsymbol{f}}(\phi,k):=\mathbb{P}_{\boldsymbol{f}^{\mathcal{I}},k}(\boldsymbol{x}^{\mathcal{I}}_k \vDash \phi \mid x^\mathcal{I}(k)).\tag{2}\] For each specification \(\phi\), we assume that given is the maximal allowable risk \(r_{\phi, \max}\) of not satisfying specification \(\phi\). In the remainder, we will refer to this as the maximal risk. Note that for \(|\mathcal{I}|=1\), the above definitions of STL and probability will become the standard ones as seen in [6].

We first consider an example (see Figure 2) to illustrate the problem and approach covered within this paper. Next, we give a problem description and a problem statement. Lastly, we elaborate upon the approach as seen in Figure 3.

**Example 1.** *We examine a multi-robotic scenario in which robots are assigned joint tasks. Upon receiving a new task, only its description and the required number of robots are known. A procedure decides which group of robots, if any,
will perform the task. Given the assumption that all robots’ behaviour is independent, new tasks are initially decomposed into sub-tasks at the individual level; that is, each robot performs part of the task without requiring information from other robots.
Subsequently, the procedure prompts potential robot sub-task combinations to investigate an update to the individual control strategy without disrupting ongoing sub-tasks. Among all combinations, the optimal assignment is selected and all control
strategies are updated accordingly.*

**Problem Description.** Assume each agent \(i \in \mathcal{I}\) has dynamics 1 with noise distribution \(\mathcal{Q}^i_w\) and state update \(x^i(k)\), at each time step \(k \in \mathbb{N}\). The objective is to update the suffix controller \(\boldsymbol{f}^{\mathcal{I}}_k\) at each time step \(k \in \mathbb{N}\). Here, any newly provided specification is either accepted or rejected based on the maximal risks of the newly provided specification and previously accepted specifications.

**Problem Statement.** Given a sequence of STL specifications \(\boldsymbol{\phi}:=\{\phi_1, \cdots, \phi_t\}\) assigned, respectively, at time instances \(\{k_1, \cdots,
k_t\}\) with \(k_i<k_j\) for \(i<j\), with maximal risk \(r_{\phi_i,\max}\), develop a method for *updating*, at each time step \(k\in \mathbb{N}\), suffix \(\boldsymbol{f}^{\mathcal{I}}_k\) such that \(\mathbb{P}_{\boldsymbol{f}}(\phi_i, k_i)\geq1-r_{\phi_i,\max}\) if \(k_i\leq k\), where \(\phi_i\) with \(k_i=k\) is allowed to be rejected.

**Approach.** First, a newly provided STL specification and maximal risk are decomposed onto the single-agent level (Section 4.1). Subsequently, a filter determines which agent-specification pairs are
favorable based on STL robustness and heuristics (Section 4.2). Afterwards, for each remaining agent-specification pair, a potential control strategy is determined together with the local risk value (Section 3). Based on the local risk value for each feasible agent-specification pair, auctioning determines the definitive assignment (Section 4.3). Should no valid assignment exist, the
specification will be rejected. Finally, the control strategy for each agent is updated.

Before the multi-agent case, this section considers the single-agent case. Consequently, every specification is on the single-agent level, and only a single agent-specification pair exists. Thus, decomposition of the specification, filtering of the pairs, and auctioning are omitted. Instead, this section will cover control synthesis and local risk value computation for any individual agent-specification pair.

The section will be structured as follows. First, we will consider decomposing the single-agent dynamics, define probabilistic reachable tubes and relate each specification to its maximal risk via (mostly) linear constraints. Second, we utilize the previously mentioned to formulate a tube-based MPC problem to synthesize a control strategy and determine the local risk value for any given agent-specification pair. This section may be regarded as a self-contained summary of our previous work on risk-aware MPC for single-agent systems [6]. We refer the reader to [6] for more details.

In this paper, we decompose dynamics 1 , given by \[x^i(k+1)=A^ix^i(k)+B^iu^i(k)+w^i(k),\] into a nominal and an error part [22]. The nominal dynamics, denoted as \(z^i\), contain no stochasticity, and the (stochastic) error dynamics, denoted as \(e^i\), are autonomous. This yields \[\tag{3} \begin{align} z^i(k+1) & = A^iz^i(k)+B^iv^i(k), \tag{4}\\ e^i(k+1) &= A^i_Ke^i(k)+w^i(k), \tag{5}\\ withx^i(k) &= z^i(k)+ e^i(k),\\ u^i(k) &= v^i(k) +K^ie^i(k). \end{align}\] Here \(A^i_K=A^i+B^iK^i\) and \(K^i\) is a stabilizing feedback gain meant to keep the error \(e^i\) small. Throughout this section, we omit superscripts indicating agent indices from our notation.

We define probabilistic reachable sets similar to [22].

**Definition 2** (Probabilistic Reachable Sets). *A probabili-stic reachable set (PRS) of dynamic probability level \(p\) for error dynamics 5 , denoted by \(\mathcal{R}^p\), is a set that satisfies \[\label{Eq:PRT} e(0) = \boldsymbol{0} \Rightarrow \mathbb{P}(e(k) \in \mathcal{R}^p) \geq p, \;\forall k \geq
0.\qquad{(1)}\] *

Definition 2 states that a PRS of probability level \(p\) will contain at any time instance \(k\) the accumulating error \(e(k)\) with at least probability \(p\), if the initial error satisfies \(e(0)=\boldsymbol{0}\). Multiple representations for probabilistic reachable sets exist, including the ellipsoidal representation. The ellipsoidal representation is obtained from the multivariable Chebyshev inequality; details are given in [22], [23]. For simplicity, we will make the following assumption.

**Assumption 1**. *Distribution \(\mathcal{Q}_w\) has zero mean.*

The above assumption is not necessary to obtain an ellipsoidal representation but will simplify most of the computations. Under Assumption 1, the ellipsoidal representation is
given by \[\label{EllipNot} \mathcal{R}^p= \{x \in \mathbb{R}^{n} \mid x^T \Sigma_{\infty}^{-1}x \leq \tilde{p}\},\tag{6}\] where \(\Sigma_{\infty}\) solves the Lyapunov equation \(A_K\Sigma_{\infty}A_K^T+\text{var}(\mathcal{Q}_w)=\Sigma_{\infty}\), \(\tilde{p}=\frac{n}{1-p}\), and \(p\) is the probability level. Since the variance of \(\mathcal{Q}_w\) is assumed to be strictly positive definite and \(A_K\) is stable, Lyapunov theory states
that \(\Sigma_{\infty}\) is strictly positive definite. In the sequel, we consider the transformed state and constraints for which \(\Sigma_{\infty}=I_n\). These can be obtained by a simple
coordinate transformation \(y=\Sigma_{\infty}^{-\frac{1}{2}}x\). We call the transformed dynamics \(y\) the *normalized dynamics*, and in the remainder, we maintain our notations as
if the dynamics are normalized.

By defining a PRS \(\mathcal{R}^{p_k}\) at each time instance \(k \in \{0, \cdots, N\}\), we obtain a finite sequence of PRS called a probabilistic reachable tube (PRT) \(\mathcal{R}:=\{\mathcal{R}^{p_0}, \cdots, \mathcal{R}^{p_N}\}\). We define the risk level \(r_k=1-p_k\) as an upper bound on the probability that \(\mathcal{R}^{p_k}\) does not contain the error \(e(k)\).

In this subsection, we aim to relate each STL specification \(\phi\) to its maximal risk \(r_{\phi, \max}\) via (mostly) linear constraints. To achieve this, we relate each STL specification \(\phi\) assigned at time \(k\), by means of a PRT, to the risk \(r_\phi^k\) of failing specification \(\phi\) assigned at time \(k\). Subsequently, the desired relation can be derived by imposing that \(r_{\phi}^k \leq r_{\phi,\max}\).

We first define the active horizon \(\mathcal{H}_{\phi}^k\) of an STL formula \(\phi\) as the set of all time instances needed to evaluate \(\boldsymbol{x}_k \vDash \phi\) (see [6]). Here, it should be noted that any active horizon consists of only finite elements. Notice that the suffix \(\boldsymbol{x}_{N+1}\) with \(N=\max{\mathcal{H}_\phi^k}\), does not influence the satisfaction of \(\boldsymbol{x}_k \vDash \phi\). Hence, in the remainder of this section, we consider only the prefix \(\boldsymbol{x}_{[k,N]}:=\{x(k), \cdots, x(N)\}\) and note that \(\mathbb{P}_{\boldsymbol{f}, k}({\boldsymbol{x}}_k \vDash \phi \mid x(k))=\mathbb{P}_{\boldsymbol{f}, k}({\boldsymbol{x}}_{[k,N]} \vDash \phi \mid x(k))\).

Given an STL specification \(\phi\) assigned at time \(k\), [1] explains that mixed integer linear constraints can represent any STL specification in negation normal form. As can be observed in the following equations, for each time instance \(\tau\in \mathcal{H}_{\phi}^k\), these STL specifications define a linear constraint that is parametrized with binary variable \(s\), as follows \[\tag{7} \begin{align} G(\tau)x(\tau)+MF(\tau)s &\geq b(M,\tau) + \rho(\tau)\boldsymbol{1},\\ Cs \geq d, \;\epsilon \leq \rho(\tau) &\leq \frac{M}{2}, \;\forall \tau \in \mathcal{H}_{\phi}^k, \tag{8} \end{align}\] where \(G(\tau)\), \(E(\tau)\), \(C\) and \(b(M,\tau)\), \(d\) are, respectively, matrices and vectors of appropriate sizes; \(M\) is a large positive number; \(\epsilon\) is a small positive number; and \(\rho(\tau)>0 \in \mathbb{R}\) is a positive real number.

We consider the following proposition to quantify risk \(r_{\phi}^k\). The proof follows from [6], which utilizes a PRT with risk levels \(r(\tau)=\frac{n}{\rho(\tau)^2}\).

**Proposition 1**. *Let specification \(\phi\) be assigned at time \(k\) to the system 3 with normalized dynamics. If the nominal trajectory
\(\{z(k), \cdots, z(N)\}\) with \(z(k)=x(k)\) and nominal controls \(\{v(k), \cdots, v(N-1)\}\) satisfies 7 with \(\{\rho(k), \cdots, \rho(N)\}\) strictly positive, then \[\mathbb{P}_{\boldsymbol{f}, k}({\boldsymbol{x}}_k \vDash \phi \mid x(k))\geq 1-r_{\phi}^k,\] where risk \(r_{\phi}^k\) is at most \[\label{Eq:Risk} r_{\phi}^k =\sum_{\tau\in \mathcal{H}_\phi^k \backslash \{k\} } \frac{n}{\rho(\tau)^2}.\qquad{(2)}\] *

A relation between specification \(\phi\) assigned at time \(k\) and risk \(r_{\phi}^k\) follows from constraints 7 and ?? .

In this subsection, we compute the local risk value and synthesize a controller for a given agent-specification pair. We assume a new specification \(\phi\) is provided at time \(k\) together with maximal risk \(r_{\phi, \max}\). For any specifications \(\psi\), we denote with \(k_{\psi}\) the time at which the specification was assigned. We denote with \(\mathcal{P}^k\) the set of all previously accepted specifications \(\psi\) with \(k_{\psi}< k\), together with the newly assigned specification \(\phi\), i.e., \(\phi \in \mathcal{P}^k\). The objective is to synthesize a controller \(\boldsymbol{f}\) such that \(\varphi=\wedge_{\psi \in \mathcal{P}^k}\lozenge_{[k_{\psi}, k_{\psi}]} \psi\) is satisfied in accordance to the maximal risk of each conjunction element, i.e., \(\mathbb{P}_{\boldsymbol{f}}(\psi,k_\psi)\geq 1-r_{\psi,\max}, \;\forall \psi \in \mathcal{P}^k\).

We are interested in implementing a tube-based MPC algorithm that at each time instance \(k\in[0,N]\) recomputes the optimal nominal trajectory \(\boldsymbol{z}_{[k, N]}\), the optimal nominal input \(\boldsymbol{v}_{[k,N-1]}\), and the optimal risk levels \(\boldsymbol{r}_{[k,N]}\). For this, we assume given \(x(k)\) together with \(\boldsymbol{z}_{[0,k]}\), \(\boldsymbol{v}_{[0,k-1]}\), \(\boldsymbol{\rho}_{[0,k]}\) and \(\boldsymbol{r}_{[0,k]}\) computed at time \(k-1\). Furthermore, for all \(\psi\in \mathcal{P}^k\), we assume \(\mathcal{H}_{\psi}^{k_{\psi}} \subseteq [0, N]\) without lose of generality.

Let us consider the following tube-based MPC problem \[\tag{9} \begin{align} \min_{\Xi} \; J(&\boldsymbol{z}_{[0,N]}, \boldsymbol{v}_{[0,N-1]}, \boldsymbol{r}_{[0,N]}), \tag{10}\\ \text{s.t. } \;z(k) &= x(k) \tag{11}\\ z(\tau+1) &= Az(\tau)+Bv(\tau), \forall \tau \in \{k, \cdots, N-1\},\!\! \tag{12} \\[0.4em] G(j)z(j)&+MF(j)s(k) \geq b(M,j) + \rho(j)\boldsymbol{1}, \tag{13}\\ Cs(k) &\geq d, \;\epsilon \leq \rho(j)\leq \textstyle \frac{M}{2},\;\forall j\in \{0, \cdots, N\}, \tag{14}\\[0.4em] n &= \rho(j)^2 r(j), \forall j\in \{0, \cdots, N\}, \tag{15}\\ r_{\psi,\max} &\geq \textstyle \sum_{j \in \mathcal{H}_\psi^{k_\psi}\setminus \{k_\psi\}} r(j), \forall \psi \in \mathcal{P}^k, \tag{16} \end{align}\] where \(\Xi:=\{\boldsymbol{z}_{[k, N]}, \boldsymbol{v}_{[k,N-1]}, \boldsymbol{r}_{[k+1,N]}, \boldsymbol{\rho}_{[k+1,N]}, s(k)\}\), \(J\) is either a linear or quadratic cost function, the constraints 13 -14 are obtained from \(\varphi=\wedge_{\psi \in \mathcal{P}^k}\lozenge_{[k_{\psi}, k_{\psi}]} \psi\) and the constraints 15 -16 are obtained from Prop. 1. From the above tube-based MPC, an (updated) control strategy is obtained from \(\boldsymbol{v}_{[k,N-1]}\), and a local risk value \(r_{\phi,mpc}\) is obtained from the right-hand side of 16 , i.e., \[\label{Eq:RiskVal} r_{\phi,mpc}=\sum_{j \in \mathcal{H}_\phi^{k}\setminus \{k\}} r(j).\tag{17}\]

A solution to 9 , by Prop. 1, ensures that for all \(\psi \in \mathcal{P}^k\), \(\mathbb{P}_{\boldsymbol{f}}(\psi,k_\psi)\geq 1-r_{\psi,\max}\). Should there exist no solution to problem 9 , the corresponding agent-specification pair is deemed infeasible. Consequently, in the single-agent case, we will then update the control strategy based only on measurement \(x(k)\), as per constraint 11 . This can be achieved by considering the MPC problem 9 with \(\mathcal{P}^k=\mathcal{P}^k\setminus\{\phi\}\). However, should also the measurement fail to produce a controller, the previously computed control strategy will be implemented instead.

In this section, we consider the multi-agent case. More specifically, this section will primarily cover real-time task allocation for multi-agent systems. We refer to section 3 regarding control synthesis and local risk value computation for individual agent-specification pairs.

The section will be structured as follows. First, we will consider specification decomposition and heuristic agent-specification filtering. Second, we utilize auctioning, based on local risk values, to obtain a definitive agent-specification assignment. The section concludes with an implementation algorithm and theoretical analysis of the overall method.

Let \(\phi\) be a newly assigned STL specification with maximal risk \(r_{\phi,\max}\), involving a distinct number of agents, \(\nu\leq |\mathcal{I}|\). The objective is to decompose the specification \(\phi\) into agent-level sub-specifications, \(\phi^j\), \(j\in\mathcal{I}\), and allocate individual maximal risks, \(r_{\phi^j,\max}\), in accordance to \(r_{\phi,\max}\). First, we define \(\mathrm{dec}(\phi,\nu)=\wedge_{j=1}^\nu\phi^j\) as the decomposition of \(\phi\) into a conjunctive form, with each conjunct representing a sub-specification, \(\phi^j\), the satisfaction of which depends on a single agent \(j\), and \(\nu\) being the number of agents needed to satisfy \(\phi\). Second, we assign a maximal risk to \(\phi^j\), following a union-bound argument, which requires \(\sum_{j=1}^\nu r_{\phi^j,\max}\geq r_{\phi,\max}\). In this paper, we select a uniform allocation of risks across sub-specifications, obtaining \[\label{Eq:RiskAve} r_{\phi^j,\max}=\frac{r_{\phi,\max}}{\nu},\tag{18}\] for all agents \(j\in \{1,\ldots,\nu\}\).

The decomposition \(\mathrm{dec}(\phi,\nu)\) acts as a transformation on \(\phi\) returning a conjunctive formula, the satisfaction of which implies satisfaction of \(\phi\). However, the converse is not necessarily true. There are a few methods [12], [13], [24] for splitting STL specifications into sub-specifications. In this paper, given a specification \(\phi\), we convert Boolean and temporal operators to conjunctive forms based on the heuristics in [24] and decompose predicates following a similar approach to [12]. Therein, predicates are defined on general convex sets, whereas negated forms are not studied. Here, we focus on polytopic settings and allow negations. For our setting, we assume the following.

**Assumption 2**. *Let \(\phi\) be an STL specification assigned at time \(k\). There exists decomposition \(\mathrm{dec}(\phi,\nu)=\wedge_{j=1}^{\nu}\phi^j\), where \(\phi^j\) indicates a task involving a single agent.*

In view of existing decomposition approaches, where sub-specifications are assigned to subteams of agents, Assumption 2 is restrictive. However, in the setting of polytopic predicates, we can produce agent-level sub-specifications by projecting the associated predicates onto the agents’ state space. Before showing this, we define the following.

**Definition 3**. *Let \(H:=\{x \mid h(x)\geq \boldsymbol{0}\}\) be a polytope in \(\mathbb{R}^n\). \(B_{\mathrm{in}}(H):=\{x\mid
\underline{b}_j^{\mathrm{in}}\leq x_j\leq \overline{b}_j^{\mathrm{in}},\; j=1,\ldots,n\}\) is an orthotope inscribed in \(H\), i.e., \(B_{\mathrm{in}}(H)\subseteq H\), and \(B_{\mathrm{out}}(H):=\{x\mid \underline{b}_j^{\mathrm{out}}\leq x_j\leq \overline{b}_j^{\mathrm{out}},\; j=1,\ldots,n\}\) is an orthotope circumscribing \(H\), i.e., \(B_{\mathrm{out}}(H)\supseteq H\).*

**Lemma 1**. *Let \(\mu:=(h(x)\geq \boldsymbol{0})\) be a predicate, \(H:=\{x \mid h(x)\geq \boldsymbol{0}\}\) be a polytope, with \(x=[x_1^\intercal\;\ldots\;x_\nu^\intercal]^\intercal\), \(x_i\in\mathbb{R}^{n}\), \(i\in \{1,\ldots,\nu\}\), and \(B_{\mathrm{in}}(H)\subseteq H\) and \(B_{\mathrm{out}}(H)\supseteq H\) be the associated orthotopes inscribed in and circumscribing \(H\), respectively. Then, it
holds that (i) \(\mathrm{dec}(\mu,\nu)=\wedge_{i=1}^\nu \mu_{\mathrm{in}}^i\) satisfies \(\mu\), with \(\mu_{\mathrm{in}}^i:=(x_i\in
\mathrm{Proj}_{x_i}(B_{\mathrm{in}}(H))\), and (ii) \(\mathrm{dec}(\neg\mu,\nu)=\wedge_{i=1}^\nu \neg\mu_{\mathrm{out}}^i\) satisfies \(\neg \mu\), with \(\mu_{\mathrm{out}}^i:=(x_i\in \mathrm{Proj}_{x_i}(B_{\mathrm{out}}(H))\).*

*Proof.* (i) Let \(\mu_{\mathrm{in}}:=(x\in B_{\mathrm{in}}(H))\). By construction, it holds that \(B_{\mathrm{in}}(H)\equiv \mathrm{Proj}_{x_1}(B_{\mathrm{in}}(H))\times \cdots\times
\mathrm{Proj}_{x_\nu}(B_{\mathrm{in}}(H))\), implying that \(\mu_{\mathrm{in}}\equiv \wedge_{i=1}^\nu\mu_{\mathrm{in}}^i\). Since, \(B_{\mathrm{in}}(H)\subseteq H\), \(\mu_{\mathrm{in}}\) satisfies \(\mu\), which leads to \(\wedge_{i=1}^\nu\mu_{\mathrm{in}}^i\) satisfying \(\mu\) as needed. (ii)
Let \(\mu_{\mathrm{out}}:=(x\in B_{\mathrm{out}}(H))\). Similarly to (i), it holds that \(\mu_{\mathrm{out}}\equiv \wedge_{i=1}^\nu\mu_{\mathrm{out}}^i\), since \(B_{\mathrm{out}}(H)\equiv \mathrm{Proj}_{x_1}(B_{\mathrm{out}}(H))\times \cdots\times \mathrm{Proj}_{x_\nu}(B_{\mathrm{out}}(H))\), by construction. Then, we have \(\neg
\mu_{\mathrm{out}}\equiv\lor_{i=1}^{\nu}\neg\mu_{\mathrm{out}}^i\). Since \(\wedge_{i=1}^{\nu}\neg\mu_{\mathrm{out}}^i\) satisfies \(\lor_{i=1}^{\nu}\neg\mu_{\mathrm{out}}^i\) and
\(\neg\mu_{\mathrm{out}}\) satisfies \(\neg \mu\), we have that \(\wedge_{i=1}^{\nu}\neg\mu_{\mathrm{out}}^i\) satisfies \(\neg
\mu\), completing the proof. ◻

**Example 2:** *Consider formulas \(\phi=\square_{[a,b]}\mu\) and \(\psi=\square_{[a,b]}\neg\mu\), where \(\mu:=(h(x)\geq
\boldsymbol{0})\) is a predicate, with \(h(x)=\begin{bmatrix} c_1 & c_2 \end{bmatrix}x+\boldsymbol{1}_4\), \(c_1=\begin{bmatrix} -\boldsymbol{1}_2^\intercal &
\boldsymbol{1}_2^\intercal \end{bmatrix}^\intercal\), \(c_2=\begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix}^\intercal\). We wish to convert \(\phi\) and \(\psi\) into conjunctive forms of two conjuncts (subformulas). First, we define the polytope \(H=\{x\mid h(x)\geq \boldsymbol{0}\}\), and construct orthotopes \(B_{\mathrm{in}}(H)=\{x\mid b_{\mathrm{in}}(x)\geq \boldsymbol{0}\}\) and \(B_{\mathrm{out}}(H)=\{x\mid b_{\mathrm{out}}(x)\geq \boldsymbol{0}\}\), with \(b_{\mathrm{in}}(x)=2\begin{bmatrix} \boldsymbol{I}_2 & -\boldsymbol{I}_2 \end{bmatrix}^\intercal x+\boldsymbol{1}_4\) and \(b_{\mathrm{out}}(x)= \begin{bmatrix} \boldsymbol{I}_2 &
-\boldsymbol{I}_2 \end{bmatrix}^\intercal x+\boldsymbol{1}_4\), respectively, as in Definition 3. Polytope \(H\) and orthotopes
\(B_{\mathrm{in}}(H)\), \(B_{\mathrm{out}}(H)\) are depicted in Figure 4. Then, using Lemma 1, we define predicates \(\mu_{\mathrm{in}}^i:=(x_i\in \mathrm{Proj}_{x_i}(B_{\mathrm{in}}(H))=(-0.5 \leq x_i \leq 0.5)\), \(\mu_{\mathrm{out}}^i:=(x_i\in
\mathrm{Proj}_{x_i}(B_{\mathrm{out}}(H))=(-1 \leq x_i \leq 1)\), with \(i\in \{1,2\}\), allowing us to decompose \(\mu\) and \(\neg\mu\) as \(\mathrm{dec}(\mu,2)=\mu_{\mathrm{in}}^1 \wedge \mu_{\mathrm{in}}^2\) and \(\mathrm{dec}(\neg\mu,2)=\neg\mu_{\mathrm{out}}^1 \wedge \neg \mu_{\mathrm{out}}^2\). Finally, \(\mathrm{dec}(\phi,2)=\phi^1\wedge\phi^2\) and \(\mathrm{dec}(\psi,2)=\psi^1\wedge\psi^2\), where \(\phi^i= \square_{[a,b]} \mu_{\mathrm{in}}^i\) and \(\psi^i= \square_{[a,b]} \neg\mu_{\mathrm{out}}^i\), with \(i\in \{1,2\}\), are valid decompositions of \(\phi\) and \(\psi\),
respectively.*

STL is equipped with robustness metrics for assessing the robust satisfaction of a formula [18]. A scalar-valued function \(\rho^\phi(\boldsymbol{x},k)\) of a signal \(\boldsymbol{x}\) and time \(k\) indicates how robustly a signal \(\boldsymbol{x}\) satisfies a formula \(\phi\) at time \(k\). The robustness function is defined recursively in [18].

Let \(\phi_k\) be a newly assigned STL specification at time \(k\), and assume that \(\mathrm{dec}(\phi_k,\nu_k)=\wedge_{j=1}^{\nu_k}\phi_k^j\). In the following, we assume that the total number of agents, denoted by \(|\mathcal{I}|\), satisfies \(|\mathcal{I}|>\nu_k\), otherwise, the following filtering procedure is ignored. Although, \(\phi_k\) is split into \(\nu_k\) agent-level sub-specifications, the assignment of \(\phi_k^j\) to a specific agent \(i\in\mathcal{I}\) is not determined by \(\mathrm{dec}(\phi_k,\nu_k)\). In fact, each sub-specification, \(\phi_k^j\), may be assigned to any agent \(i\in\mathcal{I}\) resulting in \(\nu_k|\mathcal{I}|\) agent-specification pairs. To reduce the number of possible pairs, we utilize the following heuristic approach: Based on the quantitative semantics of STL introduced earlier, we compute the robustness function of \(\phi_k^j\) over the \(i\)th agent’s trajectory. This metric indicates roughly the distance between the trajectory of agent \(i\) and the space of signals that satisfy \(\phi_k^j\) [25], thereby facilitating a preliminary agent-specification assessment. We approximate the \(i\)th agent’s trajectory by \(\boldsymbol{z}^{i,k-1}=\{z_0^{i,k-1},z_1^{i,k-1},\ldots\}\), \(i\in \mathcal{I}\), that is, the signal associated with the (deterministic) nominal dynamics of agent \(i\), 4 , computed at time \(k-1\), which is available at time \(k\). Specifically, for each \(j\in\{1,\ldots,\nu_k\}\), and \(i\in\mathcal{I}\), we efficiently calculate \(\rho^{\phi_k^j}(\boldsymbol{z}^{i,k-1},k)\), which requires negligible computational cost. Subsequently, for each \(\phi_k^j\), we collect the largest \(\nu_k\) values of \(\rho^{\phi_k^j}(\boldsymbol{z}^{i,k-1},k)\) among the \(|\mathcal{I}|\) options, resulting in \((\nu_k)^2\) agent-specification pairs. Should \(v_k<<|\mathcal{I}|\), the result is a significant reduction.

Let \(\phi\) be a specification which can be decomposed into \(\nu\) sub-specifications. After filtering agent-specification pairs and attempting optimization problem 9 with the remainder, a list \(\mathbb{L} \subset \mathcal{I} \times \{1, \cdots, \nu\}\) of feasible agent-specification pairs is generated. Each feasible agent-specification pair has a local risk value 17 denoted by \(r^i_{\phi_j,mpc}\), with \(i \in \mathcal{I}\) and \(j \in \{1, \cdots, \nu\}\). Our objective is to auction sub-specifications to agents based on the local risk values, obtaining a definitive assignment in the process. Here, each agent can only receive one specification, and all specifications must be allocated.

To determine an agent-specification assignment, we solve the following binary optimization problem, given by \[\tag{19} \begin{align} \min_{\boldsymbol{\lambda}} \;\sum_{(i,j) \in \mathbb{L}} &\lambda_{i,j} r^i_{\phi_j,mpc},\\ \text{s.t. } \sum_{i \in \mathcal{I}} \lambda_{i,j} &= 1, \;\forall j \in \{1, \cdots, \nu\} \tag{20}\\ \sum_{j \in \{1, \cdots, \nu\}} \lambda_{i,j} &\leq 1, \;\forall i \in \mathcal{I}, \tag{21}\\ \lambda_{i,j} &= 0, \;\;\;\;\;\; \forall (i,j) \notin \mathbb{L},\\ \lambda_{i,j} &\in \{0,1\}, \;\forall (i,j) \in \mathbb{L}, \end{align}\] where \(\boldsymbol{\lambda} =\{\lambda_{i,j} \mid (i,j) \in \mathbb{L}\}\). Here, constraint 20 entails that each specification must be assigned to exactly one agent; constraint 21 entails that each agent can at most receive one specification; and \(\lambda_{i,j}=1\) implies agent \(i\) and sub-specification \(j\) are chosen as an agent-specification pair. The cost minimizes the sum of the local risk values, thereby ensuring the best probability satisfaction of specification \(\phi\). Should no solution be found, specification \(\phi\) will be rejected.

To enact the proposed method, we consider Algorithm 5, which demonstrates multi-agent task allocation and control strategy updates for each agent at every time step. For a more detailed algorithm on single-agent control synthesis, we defer to [6] and the algorithm therein.

Let us consider the recursive feasibility of updating a control strategy for each agent at any time.

**Theorem 2** (Recursive feasibility). *If there exists a feasible control strategy \(\boldsymbol{f}^{\mathcal{I},k}\), at time \(k\), then there exists a feasible control
strategy \(\boldsymbol{f}^{\mathcal{I},k+1}\) at time \(k+1\).*

*Proof.* It is sufficient to prove that the existence of \(\boldsymbol{f}^{i,k}\) implies the existence of \(\boldsymbol{f}^{i,k+1}\) for each agent \(i
\in \mathcal{I}\). Notice that should an agent either accept a new sub-specification or measurement only, an updated control strategy is obtained, respectively, as \(\boldsymbol{f}^{i,k+1}=\boldsymbol{f}^{(i,j)}\) or
\(\boldsymbol{f}^{i,k+1}=\boldsymbol{f}^{i}_{meas}\). Should neither be available, the previous control strategy is again implemented, \(\boldsymbol{f}^{i,k+1}=\boldsymbol{f}^{i,k}\),
completing the proof. ◻

For each specifications \(\phi\) accepted at time \(k\), by subgroup \(I \subseteq \mathcal{I}\), we need that \(\mathbb{P}_{\boldsymbol{f}}(\phi, k) \geq 1-r_{\phi, \max}\). Here, it is sufficient to show that \(\forall i \in I\), \(\mathbb{P}_{\boldsymbol{f}}(\phi^i, k) \geq 1-r_{\phi^i, \max}\). The latter, in relation to optimization problem 9 , is discussed within open-loop and closed-loop implementation in our previous work [6]. In said paper, we established the validity of open-loop implementation and suggested that closed-loop implementation follows naturally from the convex unimodality of the disturbance.

**Remark 1**. *Note that the local risk value \(r_{\phi^j, mpc}\) compared to the average maximal risk \(r_{\phi^j,\max}\) is conservative, \(r_{\phi^j, mpc} \geq r_{\phi^j,\max}\), due to a union-bound argument used in proving Prop. 1. Also, the obtained task allocation may not always be
optimal or feasible due to our heuristic choice of filtering, even if an allocation exists.*

We investigate a multi-bus routing scenario in a tourism attraction point to validate the effectiveness of the proposed method. The scenario contains four bus terminals denoted as T-X, where X \(\in \{\)A, B, C, D\(\}\), four tourist gathering points denoted as GP-Z, where Z \(\in \{\)I, II, III, IV\(\}\), and a single unloading point denoted as ULP (see Figure 6). Four buses, A to D, are tasked with picking up tourists from the gathering points and transporting them to the unloading point. These buses initiate operations from their respective terminals and are expected to return when required. All buses are confined within the attraction point symbolized as ‘BOX’. The buses must avoid running into two buildings denoted as B-Y, with Y \(\in \{1, 2\}\). When the tourists at a gathering point reach a certain number, a bus should be available to transport them to the unloading point within a tolerable time limit. Therefore, this study needs to handle dynamically allocated routing tasks, implying that new routing tasks may be assigned at any time \(k \in \{0,\cdots,N-1\}\). We define a finite control horizon with \(N = 25\) and use predicates \(\mu_{\mathrm{R}} := (x_k \in \mathrm{R})\) to indicate that the state \(x_k\) of a bus \(X\) resides within a region \(\mathrm{R}\). Note that all regions \(\mathrm{R}\) are rectangular and can be represented as normalized polyhedral predicates.

In addition to global and local specifications investigated in this paper, the scenario also allows for specifications explicitly assigned to individual agents, such as routing a specific bus to a designated region. The overall dynamically allocated specifications in this scenario are listed as follows.

When initialized, each bus X starts from T-X and is assigned a safety specification \(\varphi_{(0,X)} \!:=\! \square_{[0, N]} (\mu_{\mathrm{BOX}} \wedge \lnot \mu_{\mathrm{B}\boldsymbol{-}1} \wedge \lnot \mu_{\mathrm{B}\boldsymbol{-}2})\).

At time \(k=2\), a global task is given to the buses, asking them to transport tourists from GP-I, II, IV to ULP within 15 time steps. We decompose the global specification into three local specifications given by:

\(\varphi_{(2,\mathrm{I})}\!:=\!\lozenge_{[2, 9]} \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{I}} \!\wedge\! \square_{[2, 9]} ( \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{I}} \!\rightarrow\! \lozenge_{[0, 7]} \mu_{\mathrm{ULP}})\),

\(\varphi_{(2,\mathrm{II})}\!:=\!\lozenge_{[3, 10]} \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{II}} \!\wedge\! \square_{[3, 10]} ( \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{II}} \!\rightarrow\! \lozenge_{[0, 7]} \mu_{\mathrm{ULP}})\),

\(\varphi_{(2,\mathrm{IV})}\!:=\!\lozenge_{[4, 11]} \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{IV}} \!\wedge\! \square_{[4, 11]} ( \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{IV}} \!\rightarrow\! \lozenge_{[0, 7]} \mu_{\mathrm{ULP}})\).At time \(k=8\), a global task is given to the buses, asking them to transport tourists from GP-II, III to ULP within 15 time steps. We decompose the global specification into two local specifications given by:

\(\varphi_{(8,\mathrm{III})}\!:=\!\lozenge_{[8, 15]} \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{III}} \!\wedge\! \square_{[8, 15]} ( \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{III}} \!\rightarrow\! \lozenge_{[0, 7]} \mu_{\mathrm{ULP}})\),

\(\varphi_{(8,\mathrm{II})}\!:=\!\lozenge_{[9, 16]} \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{II}} \!\wedge\! \square_{[9, 16]} ( \mu_{\mathrm{GP}\boldsymbol{-}\mathrm{II}} \!\rightarrow\! \lozenge_{[0, 7]} \mu_{\mathrm{ULP}})\).At time \(k=15\), each bus \(X\) is assigned with a ‘back to terminal’ specification \(\varphi_{(15,X)}\!:=\!\lozenge_{[15, 23]} \square_{[0, 2]} \mu_{\mathrm{T}\boldsymbol{-}\mathrm{X}}\).

All four buses have the same dynamic models described by system 1 with \(A\!=\!B\!=\!I_2\) and \(w^i(k) \sim \mathcal{N}(0,\sigma I_2)\) for all \(i \in \mathcal{I}\), with \(\sigma=0.001\). The states and the control inputs are their positions and velocities, respectively. To incorporate the heterogeneity of practical systems, we assume that the buses have different speed limits, i.e., \(4\,\)m/s, \(5\,\)m/s, \(6\,\)m/s, and \(7\,\)m/s for buses A to D, respectively. We consider a stabilizing feedback gain \(K\approx -0.618 I_2\) for Eq. 5 , a maximal risk \(r_{*, \max}=0.5\) for any global specification or explicit specification (\(k=0\), \(k=15\)), individual maximal risk based on Eq. 18 for any local specification (\(k=2\), \(k=8\)) and a cost \(J(\boldsymbol{z}, \boldsymbol{v}, \boldsymbol{r}) = u^TRu +\sum_{i=1}^N r(i)\) for Eq. 10 with \(R=0.001I_2\). A coordinate transformation ensures all PRS are spherical and the dynamics are normalized.

Similar to [6], instead of the non-linear constraints in Eq. 15 , we impose the following equality and inequality constraints, implying that the original constraints hold. These constraints are computationally preferable, as they are quadratic convex equality and linear inequality constraints. We replace the non-linear equality 15 with a quadratic equality \(r(k) = a \rho(k)^2 +b\) and a linear inequality \(0.01 \leq r(k) \leq 1\), with \(a=-0.005\) and \(b=1.01\). One can verify that \(a \rho(k)^2 +b \geq \frac{n}{\rho(k)^2}\) always holds for \(r(k) \in [\,0.01, 1\,]\). Therefore, the transformed optimization problem provides a sound solution to obtaining local risk values and controllers.

After decomposing, filtering, control synthesis, and auctioning, we obtain the results as seen in Figure 6.

The logging information of the experiment indicates that: 1) at time \(k=2\), buses A, B, and D are assigned with \(\varphi_{(2,\mathrm{II})}\), \(\varphi_{(2,\mathrm{IV})}\), and \(\varphi_{(2,\mathrm{I})}\), respectively; 2) at time \(k=8\), buses A and D are assigned with \(\varphi_{(8,\mathrm{II})}\) and \(\varphi_{(8,\mathrm{III})}\), respectively; 3) at time \(k=15\), buses B, C, and D have accepted the return specification, while bus A has rejected it. Bus C is not assigned any task throughout the experiment and, as a result, stays at its terminal. Since buses A and D accept additional tasks at \(k=8\), our method allows agents to adjust their behaviours according to newly assigned tasks. As mentioned, bus A has rejected the return task at time \(k=15\). This is due to the incapability to move long distances considering its small speed limit. Additionally, all buses are confined within the ‘BOX’ region and kept outside the buildings, implying the compatibility of our method to explicitly assigned specifications. Part of the trajectory of bus D intersects with B-2 due to naive linear interpolation. In practice, critical safety can be guaranteed using nonlinear interpolation which may generate curved trajectories navigating around the building.

**Remark 2**. *In this experiment, the computation of control inputs by solving problem 9 is distributed to each agent, leading to a computational complexity linear to the number of assigned specifications. In
this sense, our proposed method allows for a comparable computational load of each agent to a conventional single-agent case, allowing it to be implemented in practical scenarios.*

This paper considered real-time task allocation and control synthesis for stochastic heterogeneous linear multi-agent systems with probabilistic satisfaction of signal temporal logic specifications. The proposed method first decomposed the specification onto the single-agent level, to next use robustness of STL to filter the most promising agent-specification pairs. Subsequently, tube-based model predictive control was used to synthesize a control strategy and determine the local risk value for each remaining agent-specification pair. Finally, an auctioning algorithm based on the local risk values determined the optimal assignment. The proposed method was demonstrated through a multi-bus scenario.

In future work, we will consider resolving conflict due to the rejection of specifications and consider a fully centralized control synthesis. Additionally, improvements upon the heuristic approach and the conservativeness due to union bound argument will be considered.

[1]

V. Raman, A. Donzé, M. Maasoumy, R. M. Murray, A. Sangiovanni-Vincentelli, and S. A. Seshia, “Model predictive control with signal temporal logic
specifications,” in *53rd IEEE Conference on Decision and Control*.1em plus 0.5em minus 0.4emIEEE, 2014, pp. 81–87.

[2]

D. Sun, J. Chen, S. Mitra, and C. Fan, “Multi-agent motion planning from signal temporal logic specifications,” *IEEE Robotics and Automation Letters*, vol. 7, no. 2,
pp. 3451–3458, 2022.

[3]

S. Liu, A. Saoud, P. Jagtap, D. V. Dimarogonas, and M. Zamani, “Compositional synthesis of signal temporal logic tasks via assume-guarantee contracts,” in *2022 IEEE 61st
Conference on Decision and Control (CDC)*, 2022, pp. 2184–2189.

[4]

L. Lindemann and D. V. Dimarogonas, “Barrier function based collaborative control of multiple robots under signal temporal logic tasks,” *IEEE Transactions on Control of
Network Systems*, vol. 7, no. 4, pp. 1916–1928, 2020.

[5]

T. Yang, Y. Zou, S. Li, and Y. Yang, “Distributed model predictive control for probabilistic signal temporal logic specifications,” *IEEE Transactions on Automation
Science and Engineering*, pp. 1–11, 2023.

[6]

M. H. W. Engelaar, Z. Zhang, M. Lazar, and S. Haesaert, “Risk-aware mpc for stochastic systems with runtime temporal logics,” *arXiv preprint arXiv:2402.03165*,
2024.

[7]

E. Nunes, M. Manner, H. Mitiche, and M. Gini, “A taxonomy for task allocation problems with temporal and ordering constraints,” *Robotics and Autonomous Systems*,
vol. 90, pp. 55–70, 2017.

[8]

H. Liu, P. Zhang, B. Hu, and P. Moore, “A novel approach to task assignment in a cooperative multi-agent design system,” *Applied Intelligence*, vol. 43, pp. 162–175,
2015.

[9]

D. P. Bertsekas, “Auction algorithms for network flow problems: A tutorial introduction,” *Computational optimization and applications*, vol. 1, pp. 7–66, 1992.

[10]

A. Zhu and S. X. Yang, “A neural network approach to dynamic task assignment of multirobots,” *IEEE transactions on neural networks*, vol. 17, no. 5, pp. 1278–1287,
2006.

[11]

G. Bruno and D. Antonelli, “Dynamic task classification and assignment for the management of human-robot collaborative teams in workcells,” *The International Journal of
Advanced Manufacturing Technology*, vol. 98, pp. 2415–2427, 2018.

[12]

M. Charitidou and D. V. Dimarogonas, “Signal temporal logic task decomposition via convex optimization,” *IEEE Control Systems Letters*, vol. 6, pp. 1238–1243,
2021.

[13]

K. Leahy, A. Jones, and C.-I. Vasile, “Fast decomposition of temporal logic specifications for heterogeneous teams,” *IEEE Robotics and Automation Letters*, vol. 7,
no. 2, pp. 2297–2304, 2022.

[14]

Z. Zhang and S. Haesaert, “Modularized control synthesis for complex signal temporal logic specifications,” in *2023 62nd IEEE Conference on Decision and Control
(CDC)*.1em plus 0.5em minus 0.4emIEEE, 2023, pp. 7856–7861.

[15]

S. S. Farahani, R. Majumdar, V. S. Prabhu, and S. Soudjani, “Shrinking horizon model predictive control with signal temporal logic constraints under stochastic
disturbances,” *IEEE Transactions on Automatic Control*, vol. 64, no. 8, pp. 3324–3331, 2019.

[16]

K. A. Mustafa, O. de Groot, X. Wang, J. Kober, and J. Alonso-Mora, “Probabilistic risk assessment for chance-constrained collision avoidance in uncertain dynamic
environments,” *arXiv preprint arXiv:2302.10846*, 2023.

[17]

M. Engelaar, S. Haesaert, and M. Lazar, “Stochastic model predictive control with dynamic chance constraints,” in *27th International Conference on System Theory, Control
and Computing*, 2023, pp. 356–361.

[18]

A. Donzé and O. Maler, “Robust satisfaction of temporal logic over real-valued signals,” in *International Conference on Formal Modeling and Analysis of
Timed Systems*.1em plus 0.5em minus 0.4emSpringer, 2010, pp. 92–106.

[19]

D. Bertsekas and S. E. Shreve, *Stochastic optimal control: the discrete-time case*.1em plus 0.5em minus 0.4emAthena Scientific, 1996, vol. 5.

[20]

S. Dharmadhikari and K. Jogdeo, “Multivariate unimodality,” *The Annals of Statistics*, pp. 607–613, 1976.

[21]

S. Sadraddini and C. Belta, “Robust temporal logic model predictive control,” in *2015 53rd Annual Allerton Conference on Communication, Control, and Computing
(Allerton)*.1em plus 0.5em minus 0.4emIEEE, 2015, pp. 772–779.

[22]

L. Hewing and M. N. Zeilinger, “Stochastic model predictive control for linear systems using probabilistic reachable sets,” in *2018 IEEE Conference on Decision and
Control (CDC)*, 2018, pp. 5182–5188.

[23]

M. Farina, L. Giulioni, and R. Scattolini, “Stochastic linear model predictive control with chance constraints–a review,” *Journal of Process Control*, vol. 44, pp.
53–67, 2016.

[24]

K. Leahy, M. Mann, and C.-I. Vasile, “Rewrite-based decomposition of signal temporal logic specifications,” in *NASA Formal Methods Symposium*.1em plus 0.5em minus
0.4emSpringer, 2023, pp. 224–240.

[25]

G. E. Fainekos and G. J. Pappas, “Robustness of temporal logic specifications for continuous-time signals,” *Theoretical Computer Science*, vol. 410, no. 42, pp.
4262–4291, 2009.

This work is supported by the Dutch NWO Veni project CODEC under grant number 18244 and the European project SymAware under grant number 101070802. \(^1\)Department of Electrical Engineering (Control Systems Group), Eindhoven University of Technology, The Netherlands. Emails:{m.h.w.engelaar, z.zhang3, m.lazar, s.haesaert}

**tue.nl?**. \(^2\)Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Sweden. Email: vlahakis@kth.se↩︎\(\mathcal{Q}^i_w\) is in the closed convex hull of all uniform distributions on symmetric compact convex bodies in \(\mathbb{R}^n\) (c.f. [20]).↩︎