Tensorized NeuroEvolution of Augmenting Topologies

for GPU Acceleration

April 02, 2024

The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm’s computational efficiency remains an impediment, limiting its scalability potential. In response, this paper introduces a tensorization method for the NEAT algorithm, enabling the transformation of its diverse network topologies and associated operations into uniformly shaped tensors for computation. This advancement facilitates the execution of the NEAT algorithm in a parallelized manner across the entire population. Furthermore, we develop TensorNEAT, a library that implements the tensorized NEAT algorithm and its variants, such as CPPN and HyperNEAT. Building upon JAX, TensorNEAT promotes efficient parallel computations via automated function vectorization and hardware acceleration. Moreover, the TensorNEAT library supports various benchmark environments including Gym, Brax, and gymnax. Through evaluations across a spectrum of robotics control environments in Brax, TensorNEAT achieves up to 500x speedups compared to the existing implementations such as NEAT-Python. Source codes are available at: https://github.com/EMI-Group/tensorneat.

<ccs2012> <concept> <concept_id>10003752.10003809.10003716.10011136.10011797.10011799</concept_id> <concept_desc>Theory of computation Evolutionary algorithms</concept_desc> <concept_significance>500</concept_significance> </concept> <concept> <concept_id>10003752.10003809.10010170.10010173</concept_id> <concept_desc>Theory of computation Vector / streaming algorithms</concept_desc> <concept_significance>500</concept_significance> </concept> </ccs2012>

Neuroevolution has emerged as a distinct branch within the field of artificial intelligence (AI). Unlike the common approach in machine learning that uses stochastic gradient descent, neuroevolution employs evolutionary algorithms for network optimization. Beyond the traditional limit of parameter optimization, it also involves improving elements such as activation functions, hyperparameters, and the overall network structure. Moreover, compared to standard machine learning methods which often converge to a single solution, neuroevolution continually promotes a varied set of solutions throughout its exploration [1]. These defining attributes not only give neuroevolution a strong sense of novelty and diversity but also empower it for certain open-ended challenges [2], [3].

The NeuroEvolution of Augmenting Topologies (NEAT) [4] is well-recognized in the neuroevolution literature. Since its introduction in 2002, NEAT has been shown to be useful in various areas such as game AI [5], [6], robotics [7], [8] and self-driving systems [9]. The original NEAT algorithm provides a basic framework, and the subsequent works have kept exploiting its potential. For instance, offshoots like HyperNEAT [10] and ES-HyperNEAT [11] adopted indirect encoding for expansive networks; DeepNEAT and CoDeepNEAT [12] combined gradient descent methods to delve deeper into neural architecture. Recently, RankNEAT [13] used neuroevolution for preference learning tasks, effectively optimizing network architectures in subjectively labeled data environments. Such ongoing development and flexibility highlight NEAT’s continued importance and appeal in the field.

During the past years, GPU acceleration has been a driving force behind the rapid advancements in AI, especially in the field of deep learning. The advancement in such hardware acceleration has facilitated the expansion of deep learning, with modern large language models incorporating hundreds of billions of parameters [14]. Using GPUs for tasks like inference and back-propagation has led to significant speed improvements. Meanwhile, within the field of neuroevolution, there is also a growing interest in leveraging GPUs for hardware acceleration. Building upon the JAX framework [15], some pioneering works such as EvoJAX [16], evosax [17], and EvoX [18] represent this trend, seeking to tap into the robust computational capabilities of GPU to reduce the runtime of neuroevolution algorithms, especially when handling large population sizes or extensive problem scales.

However, despite the rapid emergence of these GPU-accelerated libraries, NEAT is left behind. Our analysis showed that though the NEAT algorithm has been implemented in various programming languages [19]–[21], few of them harness GPUs to boost their performance. On those rare
occasions when GPUs are used [22], the focus has primarily been on accelerating the network inference process of the NEAT algorithm, while other
crucial parts, such as network search processes, are often overlooked. Moreover, they do not efficiently utilize GPU parallel computing, as key operations like fitness evaluation and mutation are executed sequentially in their frameworks. This limitation
can be attributed to the unique nature of NEAT: It employs networks with *continuously evolving topologies* during the algorithm’s execution. This characteristic poses a challenge for efficient GPU implementation.

To bridge this gap, we develop TensorNEAT, a tensorized NEAT library optimized for GPU acceleration. TensorNEAT employs a new tensorization method that transforms networks of varying topologies into tensors with uniform shape, ensuring that operations in the NEAT algorithm can be executed in parallel across the entire population. Implemented within the JAX framework, TensorNEAT enables automatic GPU acceleration without any specific configuration. In comparison with the existing popular open-source implementation of the NEAT algorithm, TensorNEAT achieves up to 500x speedups. Overall, our contributions are summarized as follows.

We propose a tensorization method, which enables the transformation of networks with various topologies and their associated operations in the NEAT algorithm into uniformly structured tensors for tensor computation. This method allows operations within the NEAT algorithm to be executed in parallel across the entire population, thereby enhancing the efficiency of the process.

We develop TensorNEAT, a GPU-accelerated NEAT library based on JAX, characterized by high efficiency, flexible adaptability, and rich capabilities. TensorNEAT supports full GPU acceleration of representative NEAT algorithms, including the original NEAT algorithm, CPPN [23], and HyperNEAT [10]. It also provides seamless interfaces with advanced control benchmarks, including Brax [24] and Gymnax [25], featuring GPU-accelerated environments with various classical control and robotics control tasks.

We assessed TensorNEAT’s performance in a spectrum of complex robotics control tasks, benchmarked against the NEAT-Python library [19]. The results show that TensorNEAT significantly outperforms in terms of execution speed, especially under high computational demands and across various population sizes and network scales.

Introduced by Kenneth O. Stanley and Risto Miikkulainen in 2002, the NeuroEvolution of Augmenting Topologies (NEAT) algorithm [4] represents a novel approach in neuroevolution. The NEAT algorithm manages a range of neural networks and simultaneously optimizes their topologies and weights to identify the most effective networks tailored for designated tasks. Algorithm 7 in appendix 7 outlines NEAT’s core procedure. Starting with a population of simple neural networks, it iterates through evolutionary cycles of species formation, fitness evaluation, and genetic operations. This process dynamically refines the networks until it achieves desired fitness levels or reaches a generational cap, ultimately yielding the optimal network structure.

Setting itself apart from alternative neuroevolution algorithms, NEAT employs three distinctive techniques:

**Incremental Topological Expansion**: The NEAT algorithm initializes its evolutionary search with the most basic networks, consisting of just a single hidden node bridging inputs to outputs. As evolution advances, the algorithm incorporates new nodes and connections, progressively sophisticating the network topology. This strategy pragmatically narrows down the search space, facilitating the resolution of complex challenges using more streamlined network structures.**Historical Markers for Nodes**: Each node in a NEAT network is tagged with a unique historical marker. During the crossover process in the NEAT algorithm, only nodes with identical markers are combined. This method adeptly navigates the challenges of combining networks that possess different topological configurations.**Species-based Population Segmentation**: NEAT categorizes its entire population into individual species. Conventional genetic procedures, such as selection, mutation, and crossover, are executed independently within each species. This approach serves dual purposes: it not only shields fledgling and potentially advantageous network structures from premature extinction, but also amplifies the diversity of solutions within the evolutionary exploration.

Since its inception, the NEAT algorithm has undergone significant evolution, leading to the emergence of several innovative variants, each addressing unique challenges and applications. HyperNEAT [10] marked a pivotal advancement by employing an indirect encoding scheme. This approach allows efficient handling of large-scale neural networks, leveraging geometric regularities to create complex network topologies that can be applied to tasks requiring substantial representational capacity. Building upon this concept, ES-HyperNEAT [11] further refined the approach, introducing enhanced techniques for evolving network structures with even greater scalability and adaptability.

Then the development of DeepNEAT and CoDeepNEAT [12] represented a significant leap in integrating neuroevolution with deep learning principles. DeepNEAT extended the NEAT algorithm to the realm of deep neural networks. It leverages gradient descent to explore and optimize complex, layered neural architectures. CoDeepNEAT expanded this concept further, introducing a co-evolutionary approach that allowed for the simultaneous evolution of both the topology and components of deep neural networks. This fusion of gradient descent and evolutionary strategies enabled a more thorough and nuanced exploration of neural structures, opening new avenues in areas like feature learning and hierarchical network construction.

RankNEAT [13] is another variant of the NEAT algorithm. It overcomes the limitations of Stochastic Gradient Descent by using neuroevolution to optimize network architectures, effectively reducing overfitting. RankNEAT has proven to be effective in affective computing, outshining traditional methods like RankNet. It is particularly successful in analyzing player arousal from game footage, highlighting its potential in handling subjective data.

Together, these variants of NEAT demonstrate the superior algorithm’s versatility and its continuous adaptation to address the evolving complexities of neural network design and application.

In the past two decades, the research community has witnessed the emergence of various NEAT libraries, including NEAT-Python [19], MultiNEAT [20], and MonopolyNEAT [21]. Among these libraries, the NEAT-Python library [19] stands out as the foremost open-source implementation of NEAT at present. With over 1,200 GitHub stars, it has served as the foundational base for numerous academic research projects related to NEAT [13], [26], [27]. These NEAT implementations use the object-oriented programming paradigm, involving the representation of core NEAT components such as populations, species, genomes, and genes as objects. This programming paradigm provides remarkable transparency, enhancing code readability and thereby enabling those new to the domain to rapidly grasp the complexities of the NEAT algorithm. However, maintaining objects also brings extra computational overheads in the running process, especially when encountered with large-scale problems or substantial population sizes.

Most existing NEAT implementations do not utilize GPUs to accelerate computation, with a few exceptions such as PyTorchNEAT [22]. These libraries integrate tensor deep learning, such as PyTorch [28] and TensorFlow [29], enabling networks generated by NEAT to perform inference on GPUs. This approach also enables batch inference for multiple inputs, thereby enhancing network inference speed when dealing with a large number of input data. However, these libraries still rely on the object-oriented programming paradigm and have optimized only the inference aspect of the network. The acceleration of the neuroevolution process, which NEAT uses for network search, has not yet been achieved. Moreover, although individual network inferences can be batch-processed on GPUs, the inference process for the entire population in the NEAT algorithm remains sequential. Consequently, existing NEAT implementations cannot fully make use of the high parallel processing capabilities of GPUs.

To overcome the limitations of current NEAT implementations and fully leverage modern hardware to accelerate the NEAT algorithm’s efficiency, we introduce a new tensorization method. Figure 1 depicts the key principle of acceleration supporting this approach. Our method facilitates the transformation of various network topologies and their related operations into uniformly structured tensors, suitable for tensor computation. By implementing function vectorization within network operations, our approach enables parallel processing across the entire population, which effectively harnesses the parallel computing power of GPUs. In the following sections, we detail the specific tensorization techniques employed in network encoding and operations.

In NEAT, a network \(\mathcal{N}\) can be represented as: \[\mathcal{N} = \langle N, C \rangle,\] where \(N\), \(C\) are nodes and connections in \(\mathcal{N}\), respectively. They can be represented as: \[N = \{n_1, n_2, n_3, \ldots\} \quad \text{and} \quad C = \{c_1, c_2, c_3, \ldots\},\] with \(n_i\) denoting the \(i\)-th node in \(N\) and \(c_i\) denoting the \(i\)-th connection in \(C\).

In the NEAT algorithm, each node \(n\) can be represented as a tuple consisting of a historical marker and its attributes: \[n = (k, \text{attr}_1, \text{attr}_2, \ldots),\] where \(k \in \mathbb{N}\) is the historical marking, and \(\text{attr}_i\) is the \(i\)-th attribute of \(n\). Similarly, a connection \(c\) can be represented as: \[c = (k_i, k_o, e, \text{attr}_1, \text{attr}_2, \ldots),\] where \(k_i \in \mathbb{N}\) and \(k_o \in \mathbb{N}\) are the historical markings of the input and output nodes of \(c\), respectively, \(e \in \{\texttt{True}, \texttt{False}\}\) is the enabled flag. The attributes of a node \(n\) can be bias \(b\), aggregation function \(f_{\text{agg}}\) like sum, and activation function \(f_{\text{act}}\) like sigmoid and the attributes of a connection \(c\) can be weight \(w\). Attributes that can be represented numerically, such as \(b\) and \(w\), are directly stored in tensors. Those attributes that are not inherently numerical, such as \(f_{\text{agg}}\) and \(f_{\text{act}}\), are stored in tensors as specific integer values, for example, tanh is represented by 1, and sigmoid by 2.

We can use one-dimensional tensors to encode \(n\) and \(c\): \[\begin{align} \boldsymbol{n} & = [k, \text{attr}_1, \text{attr}_2, \ldots] \in \mathbb{R}^{1 + \texttt{noa}(n)},\\ \boldsymbol{c} & = [k_i, k_o, e, \text{attr}_1, \text{attr}_2, \ldots] \in \mathbb{R}^{3 + \texttt{noa}(c)}, \end{align}\] where \(\texttt{noa}(i)\) denotes the number of attributes of \(i\).

Then, the sets \(N\) and \(C\) can be represented as tensors \(\boldsymbol{N}\) and \(\boldsymbol{C}\), respectively: \[\begin{align} \boldsymbol{N} & = [\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots] \in \mathbb{R}^{|N|\times\texttt{len}(\boldsymbol{n})}, \\ \boldsymbol{C} & = [\boldsymbol{c}_1, \boldsymbol{c}_2, \ldots] \in \mathbb{R}^{|C|\times\texttt{len}(\boldsymbol{c})}, \end{align}\] where \(|\cdot|\) is the cardinality of a set, \(\texttt{len}(\cdot)\) is the length of a one-dimensional tensor, and \(\boldsymbol{n}_i\) and \(\boldsymbol{c}_j\) are the tensor representations of the \(i\)-th node and \(j\)-th connection, respectively.

To address the issue where each network in the population possesses a varying number of nodes and connections, we employ tensor padding using `NaN`

values for alignment. We set predefined maximum limits for the number of nodes and
connections, represented as \(|N|_\text{max}\) and \(|C|_\text{max}\), respectively. The resulting padded tensors, \(\hat{\boldsymbol{N}}\) and \(\hat{\boldsymbol{C}}\), are formulated as: \[\begin{align}
\hat{\boldsymbol{N}}& = [\boldsymbol{n}_1, \boldsymbol{n}_2, \ldots, \texttt{NaN}, \ldots] \in \mathbb{R}^{|N|_\text{max}\times\texttt{len}(\boldsymbol{n})}, \\
\hat{\boldsymbol{C}}& = [\boldsymbol{c}_1, \boldsymbol{c}_2, \ldots, \texttt{NaN}, \ldots] \in \mathbb{R}^{|C|_\text{max}\times\texttt{len}(\boldsymbol{c})}.
\end{align}\]

Upon alignment, the tensors \(\hat{\boldsymbol{N}}\) and \(\hat{\boldsymbol{C}}\) of each network in the population can be concatenated, allowing us to express the entire population using two tensors: \(\boldsymbol{P}_{N}\) and \(\boldsymbol{P}_{C}\). These tensors encompass all nodes and connections within the population, respectively. Formally, the concatenated tensors, \(\boldsymbol{P}_{N}\) and \(\boldsymbol{P}_{C}\), are defined as: \[\begin{align} \boldsymbol{P}_{N} & = [\hat{\boldsymbol{N}}_1, \hat{\boldsymbol{N}}_2, \ldots] \in \mathbb{R}^{P \times |N|_\text{max} \times\texttt{len}(\boldsymbol{n})}, \\ \boldsymbol{P}_{C} & = [\hat{\boldsymbol{C}}_1, \hat{\boldsymbol{C}}_2, \ldots] \in \mathbb{R}^{P \times |C|_\text{max} \times\texttt{len}(\boldsymbol{c})}, \end{align}\] where \(P\) denotes the population size, \(\hat{\boldsymbol{N}}_i\) and \(\hat{\boldsymbol{C}}_i\) represent the tensor representations of the node and connection sets of the \(i\)-th network, respectively. Fig. 2 provides a graphical representation of the tensorized encoding process.

Upon encoding networks as tensors, we subsequently express operations on NEAT networks as corresponding tensor operations. In this subsection, we detail the tensorized representations of three fundamental operations: node modification, connection modification, and attribute modification.

Given a network \(\mathcal{N} = \langle N, C \rangle\), node set modifications in \(N\) can involve either the addition of a new node \(n\) or the removal of an existing node \(n\): \[\begin{align} N' = N \cup \{n\} \quad \text{or} \quad N' = N \setminus \{n\}. \end{align}\]

In the tensorized representation \(\hat{\boldsymbol{N}} \in \mathbb{R}^{|N|_\text{max}\times\texttt{len}(\boldsymbol{n})}\) of the node set \(N\), the tensorized operation for node
addition can be represented as: \[\hat{\boldsymbol{N}}[r_i] \leftarrow \boldsymbol{n}_\text{new},\] where \(r_i\) denotes the index of the first `NaN`

row in \(\hat{\boldsymbol{N}}\), \(\boldsymbol{n}_\text{new}\) stands for the tensor representation of the node being added, \([\cdot]\) represents tensor slicing, and
\(\leftarrow\) indicates the assignment operation.

Conversely, the tensorized operation for node removal can be depicted as: \[\hat{\boldsymbol{N}}[r_j] \leftarrow \texttt{NaN},\] with \(r_j\) representing the index of the node for removal.

Given a network \(\mathcal{N} = \langle N, C \rangle\), modifications in the connection set \(C\) can entail either adding a new connection or eliminating an existing connection \(c\): \[\begin{align} C' = C \cup \{c\} \quad \text{or} \quad C' = C \setminus \{c\}. \end{align}\]

In the tensorized representation, \(\hat{\boldsymbol{C}}\), of \(C\), the operations of connection addition or removal can be depicted as: \[\hat{\boldsymbol{C}}[r_i] \leftarrow \boldsymbol{c}_\text{new} \quad \text{or} \quad \hat{\boldsymbol{C}}[r_j] \leftarrow \texttt{NaN},\] respectively. Here, \(r_i\) indicates the index of the
initial `NaN`

row in \(\hat{\boldsymbol{C}}\), \(\boldsymbol{c}_\text{new}\) represents the tensor form of the connection being introduced, and \(r_j\) signifies the index of the connection for removal.

NEAT is designed not only to modify the network structures but also the internal attributes, either in a node or a connection. Specifically, given a node \(n = (k, \text{attr}_1, \text{attr}_2, \ldots)\), where \(k \in \mathbb{N}\) is the historical marking, and \(\text{attr}_i\) is the \(i\)-th attribute of \(n\), when modifying the \(j\)-th attribute in a node, the transformation can be represented as: \[n' = (k, \text{attr}_1, \text{attr}_2, \ldots, \text{attr}'_j, \ldots),\] and the corresponding tensorized operation is: \[\boldsymbol{n}[1 + j] \leftarrow \text{attr}'_j.\] Similarly, for a connection \(c = (k_i, k_o, e, \text{attr}_1, \text{attr}_2, \ldots)\), where \(k_i \in \mathbb{N}\) and \(k_o \in \mathbb{N}\) are the historical markings of the input and output nodes of \(c\), respectively, \(e \in \{\text{True}, \text{False}\}\) is the enabled flag, when modifying the \(j\)-th attribute in a connection, the transformation can represented as: \[c' = (k_i, k_o, e, \text{attr}_1, \text{attr}_2, \ldots, \text{attr}'_j, \ldots),\] and the corresponding tensorized operation is: \[\boldsymbol{c}[3 + j] \leftarrow \text{attr}'_j.\]

By combining the aforementioned three operations, operations for searching networks in the NEAT algorithm including Mutation and Crossover can be transformed into operations on tensors. Fig. 3 provides a graphical representation of the tensorized network operations.

Another crucial component in NEAT is the inference process, where a network receives inputs and generates corresponding outputs based on its topologies and weights. For a network \(\mathcal{N}\), given its node tensor \(\hat{\boldsymbol{N}}\) and connection tensor \(\hat{\boldsymbol{C}}\), the inference process can be represented as: \[\boldsymbol{O} = \text{inference}_{\hat{\boldsymbol{N}}, \hat{\boldsymbol{C}}}(\boldsymbol{I}),\] where \(\boldsymbol{O}\) and \(\boldsymbol{I}\) denote the outputs and inputs in the inference process, respectively.

In our tensorization method, the inference process is divided into two stages: transformation and calculation. ‘Transformation’ involves converting the node tensor \(\hat{\boldsymbol{N}}\) and connection tensor \(\hat{\boldsymbol{C}}\) into formats more conducive to network inference. ‘Calculation’ refers to computing the output using the tensors produced in the transformation stage. When a network undergoes multiple inference operations, it only needs to be transformed once. Networks in the NEAT algorithm can be categorized as either feedforward or recurrent, based on the presence or absence of cycles in their topological structure. Here, we primarily focus on the transformation and calculation processes in feedforward networks.

In feedforward networks, the transformation process creates two new tensors: the topological order of nodes \(\boldsymbol{N}_{\text{order}} \in \mathbb{R}^{|N|_\text{max}}\) and the expanded connections \(\boldsymbol{C}_{\text{exp}} \in \mathbb{R}^{|N|_\text{max} \times |N|_\text{max} \times \texttt{noa}(c)}\), where \(|N|_\text{max}\) represents the predefined maximum limit for the number of nodes and \(\texttt{noa}(c)\) denoting the number of attributes of connections in the network.

Given the absence of cycles in the network, topological sorting is employed to obtain \(\boldsymbol{N}_{\text{order}}\). For \(\boldsymbol{C}_{\text{exp}}\), the generation rule can be represented as: \[\boldsymbol{C}_{\text{exp}}[\hat{\boldsymbol{C}}[i][0, 1]] \leftarrow \begin{cases} \texttt{NaN}, & \hat{\boldsymbol{C}}[i][2] = 0\\ \hat{\boldsymbol{C}}[i][2:], & \hat{\boldsymbol{C}}[i][2] = 1 \end{cases},\] where \(i = 0, 1, 2, \ldots, |C|_\text{max}\), and \(|C|_\text{max}\) is the predefined maximum limit for the number of connections. Recall that in each line \(c\) of \(\hat{\boldsymbol{C}}\), the values are \((k_i, k_o, e, \text{attr}1, \text{attr}2, \ldots)\), with \(c[0]=k_i\) and \(c[1]=k_o\) indicating the indices of the input and output nodes of the connection, respectively, and \(c[2]=e \in \{\texttt{True}, \texttt{False}\}\) representing the enabled flag. Locations not updated by this rule default to the value \(\texttt{NaN}\). The tensors \(\hat{\boldsymbol{N}}\), \(\boldsymbol{N}_{\text{order}}\), and \(\boldsymbol{C}_{\text{exp}}\) are then used as inputs for the calculation process.

In the forward process, we utilize the transformed tensors \(\hat{\boldsymbol{N}}\), \(\boldsymbol{N}_{\text{order}}\), \(\boldsymbol{C}_{\text{exp}}\), and the input \(\boldsymbol{I}\) to calculate the output \(\boldsymbol{O}\). We maintain a tensor \(\boldsymbol{v} \in \mathbb{R}^{|N|_\text{max}}\) to store the values of nodes in the network. Initially, \(\boldsymbol{v}\) is set to the default value \(\texttt{NaN}\) and then updated with: \[\boldsymbol{v}[k_{\text{input}}] \leftarrow \boldsymbol{I},\] where \(k_{\text{input}}\) denotes the indices of input nodes in the network. The value of nodes is iteratively calculated in the order specified by \(\boldsymbol{N}_{\text{order}}\). The rule to obtain the value \(\boldsymbol{v}[k]\) of node \(\boldsymbol{n}_k\) can be expressed as: \[\begin{align} \boldsymbol{v}[k] \leftarrow f_n(f_c(\boldsymbol{v} \;|\; \boldsymbol{C}_{\text{exp}}[:][k]) \;| \;\hat{\boldsymbol{N}}[k][1:]), \end{align}\] where \(\boldsymbol{C}_{\text{exp}}[:][k]\) indicates the attributes of all connections to \(\boldsymbol{n}_k\), and \(\hat{\boldsymbol{N}}[k][1:]\) denotes the attributes of \(\boldsymbol{n}_k\). \(f_c\) and \(f_n\) are the calculation functions for connections and nodes in the network, respectively. For a network with connection attribute weight \(w\), and node attributes bias \(b\), aggregation function \(f_{\text{agg}}\) and activation function \(f_{\text{act}}\), \(f_c\) and \(f_n\) can be defined as: \[\begin{align} f_c(\boldsymbol{I}_{\text{conn}}\;|\;w) &= w\boldsymbol{I}_{\text{conn}},\\ f_n(\boldsymbol{I}_{\text{node}}\;|\;b, f_{\text{agg}}, f_{\text{act}}) &= f_{\text{act}}(f_{\text{agg}}(\boldsymbol{I}_{\text{node}}) + b), \end{align}\] where \(\boldsymbol{I}_{\text{conn}}\) and \(\boldsymbol{I}_{\text{nodes}}\) denote the inputs for connections and nodes, respectively.

After computing the values of all nodes, the tensor \(\boldsymbol{v}[k_{\text{output}}]\) represents the network’s output, with \(k_{\text{output}}\) indicating the indices of output nodes in the network. Fig. 4 graphically depicts the tensorized network inference process.

JAX [15] is an open-source numerical computing library, offering APIs similar to NumPy [30] and enabling efficient execution across various hardware platforms (CPU/GPU/TPU). Leveraging the capabilities of XLA, JAX facilitates the transformation of numerical code into optimized machine instructions. The optimization techniques provided by JAX have supported the development of numerous projects in evolutionary computation, as evidenced by [16], [17], [31], and [18]. These advancements significantly contribute to JAX’s growing popularity in the scientific community.

TensorNEAT, integrating JAX, utilizes the functional programming paradigm to implement our proposed tensorization methods. This integration allows NEAT to effectively use hardware accelerators such as GPUs and TPUs. With uniform tensor shapes in network
encoding, several NEAT operations such as mutation, crossover, and network inference can be vectorized across the population dimension. This vectorization, leveraging `jax.vmap`

and `jax.pmap`

functions, is suitable for both single
and multi-device configurations.

Designed with user-friendly interfaces, TensorNEAT provides mechanisms for adjusting algorithms to specific requirements. It offers an extensive set of hyperparameters, allowing users to fine-tune various computational elements of the NEAT algorithm. Additionally, its modular problem templates facilitate the integration of specialized problems. TensorNEAT also offers several open interfaces. By implementing a few functions, users can define the behavior of networks within the NEAT algorithm. A notable feature is the interface supporting the evolution of advanced network architectures, including Spiking Neural Networks [32] and Binary Neural Networks [33]. Details on the hyperparameters and the interfaces are elaborated in Appendix 8 and Appendix 9.

Extending beyond the conventional NEAT paradigm, TensorNEAT includes notable algorithmic extensions such as Compositional Pattern Producing Networks (CPPN) [23] and HyperNEAT [10], tailored for parallel processing on hardware accelerators. For evaluation, TensorNEAT provides a suite of standard test benchmarks, covering areas from numerical optimization to function approximation. Moreover, it integrates seamlessly with leading reinforcement learning environments like Gym [34], and hardware-optimized platforms such as gymnax [25] and Brax [24], thus allowing users to evaluate the performance of the NEAT algorithm in various settings.

This section presents an empirical comparison between TensorNEAT and NEAT-Python, focusing on their performance in three robotics control tasks: Swimmer, Hopper, and Halfcheetah, within the Brax environment. The experiments were conducted with both TensorNEAT and NEAT-Python configured using uniform parameter settings. Data presented are the average outcomes from ten independent trials, complete with 95% confidence intervals to ensure statistical robustness. Details on the experiment settings are shown in Appendix 10.

Task | Framework | Hardware | Time (\(s\)) | Speedup |
---|---|---|---|---|

Swimmer | NEAT-Python | EPYC 7543 (CPU) | \(42279.63 \pm 3031.97\) | \(1.00\) |

2-5 | TensorNEAT | RTX 4090 | \(\boldsymbol{215.70 \pm 6.35}\) | \(\boldsymbol{196.01}\) |

RTX 3090 | \(292.22 \pm 20.19\) | \(144.68\) | ||

RTX 2080Ti | \(434.94 \pm 12.49\) | \(97.21\) | ||

EPYC 7543 (CPU) | \(14678.10 \pm 616.85\) | \(2.88\) | ||

Hopper | NEAT-Python | EPYC 7543 (CPU) | \(14438.13 \pm 900.68\) | \(1.00\) |

2-5 | TensorNEAT | RTX 4090 | \(\boldsymbol{241.30 \pm 9.41}\) | \(\boldsymbol{59.83}\) |

RTX 3090 | \(336.08 \pm 26.96\) | \(42.96\) | ||

RTX 2080Ti | \(518.40 \pm 7.22\) | \(27.85\) | ||

EPYC 7543 (CPU) | \(13473.01 \pm 544.07\) | \(1.07\) | ||

Halfcheetah | NEAT-Python | EPYC 7543 (CPU) | \(149516.00 \pm 4817.90\) | \(1.00\) |

2-5 | TensorNEAT | RTX 4090 | \(\boldsymbol{274.74 \pm 14.21}\) | \(\boldsymbol{544.21}\) |

RTX 3090 | \(487.82 \pm 19.05\) | \(306.50\) | ||

RTX 2080Ti | \(705.13 \pm 16.02\) | \(212.04\) | ||

EPYC 7543 (CPU) | \(15914.08 \pm 4005.08\) | \(9.40\) |

First, we examined the evolution of average population fitness and the cumulative runtime of the algorithms across generations, with a constant population size of 10,000. As depicted in Fig. 5, TensorNEAT exhibits a more rapid improvement in population fitness over the course of the algorithm’s execution. This performance disparity between the two frameworks stems from the modification in the NEAT algorithm after tensorization, including network encoding and the computation of distances between networks. Furthermore, the analysis of execution times, illustrated in the final panel of Fig. 5, reveals a significant decrease in runtime with TensorNEAT compared to NEAT-Python.

Given the iterative nature of NEAT’s process, there is an expected increase in per-iteration time as network structures become more complex. This phenomenon was investigated by comparing the per-generation runtimes of both algorithms. In the Swimmer and Hopper tasks, as shown in the left panels of Fig. 6, NEAT-Python exhibits a more marked increase in runtime, likely due to its less efficient object encoding mechanism, which becomes increasingly cumbersome with rising network complexity. By contrast, TensorNEAT, employing a tensorized encoding approach constrained by the pre-set maximum values of \(|N|_\text{max}\) and \(|C|_\text{max}\), achieves a consistent network encoding size, resulting in more stable iteration times.

Additionally, we investigated how runtime varies with changes in population size, ranging from 50 to 10,000. As illustrated in the right panels of Fig. 6, NEAT-Python’s runtime significantly increases with larger populations, whereas TensorNEAT only experiences a marginal increase in runtime. TensorNEAT demonstrates advantages across various population sizes, with its benefits being more pronounced in larger populations.

The adaptability of TensorNEAT was further validated by evaluating its performance across various hardware configurations, including the AMD EPYC 7543 CPU and a selection of mainstream GPU models. These evaluations were conducted in the Cart Pole environment, with a consistent population size of \(10,000\). The total wall-clock time was recorded over \(100\) generations of the algorithm, and the results are summarized in Table 1. In every GPU configuration, TensorNEAT demonstrated a significant performance advantage over NEAT-Python. Notably, with the RTX 4090, TensorNEAT achieved a speedup surpassing \(500\times\) in the Halfcheetah environment. It is also noteworthy that TensorNEAT realized a speedup on the same CPU device in comparison to NEAT-Python, especially in the more complex Halfcheetah environment.

Our experiments highlight TensorNEAT’s considerable superiority to NEAT-Python in terms of algorithmic execution speed. Moreover, as the computational demands increase, whether due to larger network structures or higher population sizes, TensorNEAT’s relative time consumption grows more slowly in contrast to NEAT-Python. These findings emphasize the significant acceleration and performance efficiency of the TensorNEAT algorithm, especially with GPU accelerations.

In this work, we address the scalability challenge of NeuroEvolution of Augmenting Topologies (NEAT) by introducing a tensorization method, which transforms the network topologies into uniformly shaped tensors for efficient parallel processing. Building upon the tensorization method, we develop TensorNEAT, which leverages JAX for automated function vectorization and hardware acceleration. Compatible with environments like Gym, Brax, and gymnax, TensorNEAT significantly outperforms traditional NEAT implementations, achieving over 500x speedups in various robotics control tasks.

Looking ahead, our roadmap for TensorNEAT includes expanding its reach to distributed computing environments, transcending the limitations of single-machine setups. Furthermore, we plan to augment TensorNEAT’s suite of functionalities by integrating advanced NEAT variants, such as DeepNEAT and CoDeepNEAT [12], to further enhance its potential in solving complex neuroevolution challenges.

The hyperparameters in TensorNEAT consists of those that influence the algorithm’s dynamics and those who affect network behaviors:

Algorithmic Controls:

`seed`

: Random seed (integer).`fitness_target`

: Target fitness value for termintaion (float).`generation_limit`

: Maximum number of generations for termintaion (float).`pop_size`

: Population size (integer).`network_type`

: Network type, either`feedforward`

(no cycles) or`recurrent`

(with cycles).`inputs`

: Number of network inputs (integer).`outputs`

: Number of network outputs (integer).`max_nodes`

: Max nodes allowed in a network (integer).`max_conns`

: Max connections allowed in a network (integer).`max_species`

: Max species in the population (integer).`compatibility_disjoint`

: Weight for disjoint genes in distance calculation between genomes (float).`compatibility_homologous`

: Weight for homologous genes in distance calculation between genomes (float).`node_add`

: Probability of a node addition during mutation (float).`node_delete`

: Probability of a node deletion during mutation (float).`conn_add`

: Probability of a connection addition during mutation (float).`conn_delete`

: Probability of a connection deletion during mutation (float).`compatibility_threshold`

: Distance threshold for genome speciation (float).`species_elitism`

: Minimum species count to prevent all species are stagnated (integer).`max_stagnation`

: Stagnation threshold for species (integer). If a species does not show improvement for`max_stagnation`

consecutive generations, then this species will be stagnated.`genome_elitism`

: Number of elite genomes preserved for next generation (integer).`survival_threshold`

: Percentage of species survival for crossover (float).`spawn_number_change_rate`

: Rate of change in species size over two consecutive generations (float).

Network Behavior Controls:

`bias_init_mean`

: Mean value for bias initialization (float).`bias_init_std`

: Standard deviation for bias initialization (float).`bias_mutate_power`

: Mutation strength for bias values (float).`bias_mutate_rate`

: Probability of bias value mutation (float).`bias_replace_rate`

: Probability to replace existing bias with a new value during mutation (float).`response_init_mean`

: Mean value for response initialization (float).`response_init_std`

: Standard deviation for response initialization (float).`response_mutate_power`

: Mutation strength for response values (float).`response_mutate_rate`

: Probability of response value mutation (float).`response_replace_rate`

: Probability to replace existing response with a new value during mutation (float).`weight_init_mean`

: Mean value for weight initialization (float).`weight_init_std`

: Standard deviation for weight initialization (float).`weight_mutate_power`

: Mutation strength for weight values (float).`weight_mutate_rate`

: Probability of weight value mutation (float).`weight_replace_rate`

: Probability to replace existing weight with a new value during mutation (float).`activation_default`

: Default value for activation function.`activation_options`

: Available activation functions.`activation_replace_rate`

: Probability to change the activation function during mutation.`aggregation_default`

: Default value for aggregation function.`aggregation_options`

: Available aggregation functions.`aggregation_replace_rate`

: Probability to change the aggregation function during mutation.

TensorNEAT offers users the flexibility to define a custom network that will be optimized by the NEAT algorithms by providing a network interface (as presented in Listing [lst:gene_interface] and Listing [lst:genome_interface]). Users are required to define the specific behaviors associated with their network, including initialization, mutation, distance computation, and inference.

By implementing these interfaces, users can fit their specific requirements and leverage TensorNEAT’s power for a wide range of neural network architectures.

```
[
language=Python,
caption=Gene interface. ,
label=lst:gene_interface
]
class BaseGene:
fixed_attrs = []
custom_attrs = []
def __init__(self):
pass
def new_custom_attrs(self):
raise NotImplementedError
def mutate(self, randkey, gene):
raise NotImplementedError
def distance(self, gene1, gene2):
raise NotImplementedError
def forward(self, attrs, inputs):
raise NotImplementedError
@property
def length(self):
return len(self.fixed_attrs) + len(self.custom_attrs)
```

```
[
language=Python,
caption=Genome interface. ,
label=lst:genome_interface
]
class BaseGenome:
network_type = None
def __init__(
self,
num_inputs: int,
num_outputs: int,
max_nodes: int,
max_conns: int,
node_gene: BaseNodeGene = DefaultNodeGene(),
conn_gene: BaseConnGene = DefaultConnGene(),
):
self.num_inputs = num_inputs
self.num_outputs = num_outputs
self.input_idx = jnp.arange(num_inputs)
self.output_idx = jnp.arange(num_inputs, num_inputs + num_outputs)
self.max_nodes = max_nodes
self.max_conns = max_conns
self.node_gene = node_gene
self.conn_gene = conn_gene
def transform(self, nodes, conns):
raise NotImplementedError
def forward(self, inputs, transformed):
raise NotImplementedError
```

In TensorNEAT, users also have the flexibility to define custom problems they wish to optimize using the NEAT algorithms. This can be done by implementing the `Problem`

interface as illustrated in Alg. [lst:problem_interface]. Within this interface, users are required to provide the evaluation process for the problem, specify the input and output dimensions, and optionally implement a
function to visualize the solution.

```
[
language=Python,
caption=Problem interface in TensorNEAT. ,
label=lst:problem_interface
]
from typing import Callable
from utils import State
class BaseProblem:
jitable = None
def setup(self, randkey, state: State = State()):
"""initialize the state of the problem"""
raise NotImplementedError
def evaluate(self, randkey, state: State, act_func: Callable, params):
"""evaluate one individual"""
raise NotImplementedError
@property
def input_shape(self):
raise NotImplementedError
@property
def output_shape(self):
raise NotImplementedError
def show(self, randkey, state: State, act_func: Callable, params, *args, **kwargs):
raise NotImplementedError
```

In this section, we detail the environments and hyperparameters in experiments.

We compare TensorNEAT to NEAT-Python in three reinforcement learning environments in Brax: Swimmer, Hopper and Halfcheetah. They are robotic control tasks and fig 8 are screenshots of them. The primary objective of the robots is to advance as far as possible in a forward direction. These three environments have different action dimension and observation dimension, which are shown in table 2.

Environments | Action Dimension | Observertion Dimension |
---|---|---|

Swimmer | 2 | 8 |

Hopper | 3 | 11 |

Halfcheetah | 6 | 18 |

Hyperparameters in TensorNEAT:

Algorithmic Controls:

`seed`

:`[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]`

;`fitness_target`

:`Inf (to terminate at the fixed generation)`

;`generation_limit`

:`100`

;`pop_size`

:`[50, 100, 200, 500, 1000, 2000, 5000, 10000]`

;`network_type`

:`feedforward`

;`inputs`

: the same as the observation dimension of the environment.`outputs`

: the same as the action dimension of the environment.`max_nodes`

:`50`

;`max_conns`

:`100`

;`max_species`

:`10`

;`compatibility_disjoint`

:`1.0`

;`compatibility_homologous`

:`0.5`

;`node_add`

:`0.2`

;`node_delete`

:`0`

;`conn_add`

:`0.4`

;`conn_delete`

:`0`

;`compatibility_threshold`

:`3.5`

;`species_elitism`

:`2`

;`max_stagnation`

:`15`

;`genome_elitism`

:`2`

;`survival_threshold`

:`0.2`

;`spawn_number_change_rate`

:`0.5`

;

Network Behavior Controls:

`bias_init_mean`

:`0`

;`bias_init_std`

:`1.0`

;`bias_mutate_power`

:`0.5`

;`bias_mutate_rate`

:`0.7`

;`bias_replace_rate`

:`0.1`

;`response_init_mean`

:`1.0`

;`response_init_std`

:`0`

;`response_mutate_power`

:`0`

;`response_mutate_rate`

:`0`

;`response_replace_rate`

:`0`

;`weight_init_mean`

:`0`

;`weight_init_std`

:`1`

;`weight_mutate_power`

:`0.5`

;`weight_mutate_rate`

:`0.8`

;`weight_replace_rate`

:`0.1`

;`activation_default`

:`tanh`

;`activation_options`

:`[tanh]`

;`activation_replace_rate`

:`0`

;`aggregation_default`

:`sum`

;`aggregation_options`

:`[sum]`

;`aggregation_replace_rate`

:`0`

;

Hyperparameters in NEAT-Python:

`fitness_criterion`

:`max`

`fitness_threshold`

:`999`

`pop_size`

:`10000`

`reset_on_extinction`

:`False`

**[DefaultGenome]**`activation_default`

:`tanh`

`activation_mutate_rate`

:`0`

`activation_options`

:`tanh`

`aggregation_default`

:`sum`

`aggregation_mutate_rate`

:`0.0`

`aggregation_options`

:`sum`

`bias_init_mean`

:`0.0`

`bias_init_stdev`

:`1.0`

`bias_max_value`

:`30.0`

`bias_min_value`

:`-30.0`

`bias_mutate_power`

:`0.5`

`bias_mutate_rate`

:`0.7`

`bias_replace_rate`

:`0.1`

`compatibility_disjoint_coefficient`

:`1.0`

`compatibility_weight_coefficient`

:`0.5`

`conn_add_prob`

:`0.5`

`conn_delete_prob`

:`0`

`enabled_default`

:`True`

`enabled_mutate_rate`

:`0.01`

`feed_forward`

:`True`

`initial_connection`

:`full`

`node_add_prob`

:`0.2`

`node_delete_prob`

:`0`

`num_hidden`

:`0`

`num_inputs`

: the same as the observation dimension of the environment.`num_outputs`

: the same as the action dimension of the environment.`response_init_mean`

:`1.0`

`response_init_stdev`

:`0.0`

`response_max_value`

:`30.0`

`response_min_value`

:`-30.0`

`response_mutate_power`

:`0.0`

`response_mutate_rate`

:`0.0`

`response_replace_rate`

:`0.0`

`weight_init_mean`

:`0.0`

`weight_init_stdev`

:`1.0`

`weight_max_value`

:`30`

`weight_min_value`

:`-30`

`weight_mutate_power`

:`0.5`

`weight_mutate_rate`

:`0.8`

`weight_replace_rate`

:`0.1`

**[DefaultSpeciesSet]**`compatibility_threshold`

:`3.0`

**[DefaultStagnation]**`species_fitness_func`

:`max`

`max_stagnation`

:`20`

`species_elitism`

:`2`

**[DefaultReproduction]**`elitism`

:`-9999`

`survival_threshold`

:`0.2`

[1]

K. O. Stanley, J. Clune, J. Lehman, and R. Miikkulainen, “Designing neural networks through neuroevolution,” *Nature Machine Intelligence*, vol. 1, no. 1, pp. 24–35,
2019.

[2]

J. Lehman and booktitle=Proceedings. of the 13th. annual conference on G. and evolutionary computation Stanley Kenneth O, “Evolving a diversity of creatures through novelty search
and local competition,” 2011, pp. 211–218.

[3]

.
[4]

K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” *Evolutionary Computation*, vol. 10, no. 2, pp. 99–127, 2002.

[5]

K. O. Stanley, B. D. Bryant, I. Karpov, and R. Miikkulainen, “Real-time evolution of neural networks in the NERO video game,” 2006 , isbn = {9781577352815}, vol. 2, p.
1671?1674.

[6]

S. Pham, K. Zhang, T. Phan, J. Ding, and booktitle=Proceedings. of the A. C. on A. I.-. S. Dancy Christopher, “Playing SNES games with neuroevolution of augmenting
topologies,” 2018, vol. 32.

[7]

F. Silva, P. Urbano, S. Oliveira, and A. L. Christensen, “odNEAT: An algorithm for distributed online, onboard evolution of robot behaviours,” Jul. 2012, vol. ALIFE 2012: The
Thirteenth International Conference on the Synthesis and Simulation of Living Systems , booktitle = The Thirteenth International Conference on the Synthesis and Simulation of Living Systems, pp. 251–258, doi: 10.1162/978-0-262-31050-5-ch034.

[8]

J. E. Auerbach and booktitle=Proceedings. of the 13th. annual conference on G. and evolutionary computation Bongard Josh C, “Evolving complete robots with CPPN-NEAT: The utility of
recurrent connections,” 2011, pp. 1475–1482.

[9]

M. E. Yuksel, “Agent-based evacuation modeling with multiple exits using NeuroEvolution of augmenting topologies,” *Advanced Engineering Informatics*, vol. 35, pp.
30–55, 2018.

[10]

K. O. Stanley, D. B. D’Ambrosio, and J. Gauci, “A hypercube-based encoding for evolving large-scale neural networks,” *Artificial life*, vol. 15, no. 2, pp. 185–212,
Apr. 2009, doi: 10.1162/artl.2009.15.2.15202.

[11]

S. Risi, J. Lehman, and booktitle=Proceedings. of the 12th. annual conference on G. and evolutionary computation Stanley Kenneth O, “Evolving the placement and density of neurons
in the hyperneat substrate,” 2010, pp. 563–570.

[12]

R. Miikkulainen *et al.*, “Evolving deep neural networks,” Elsevier, 2019, pp. 293–312.

[13]

K. Pinitas, K. Makantasis, A. Liapis, and G. N. Yannakakis, “RankNEAT: Outperforming stochastic gradient search in preference learning tasks,” 2022 , isbn = {9781450392372},
p. 1084?1092, doi: 10.1145/3512290.3528744.

[14]

T. Brown *et al.*, “Language models are few-shot learners,” *Advances in Neural Information Processing Systems*, vol. 33, pp. 1877–1901, 2020.

[15]

R. Frostig, M. J. Johnson, and C. Leary, “Compiling machine learning programs via high-level tracing,” *Systems for Machine Learning*, vol. 4, no. 9, 2018.

[16]

Y. Tang, Y. Tian, and D. Ha, “EvoJAX: Hardware-accelerated neuroevolution,” 2022 , isbn = {9781450392686}, p. 308?311, doi: 10.1145/3520304.3528770.

[17]

R. T. Lange, “Evosax: JAX-based evolution strategies.” Association for Computing Machinery , address = New York, NY, USA, p. 659?662, 2023 , isbn =
{9798400701207}, doi: 10.1145/3583133.3590733.

[18]

B. Huang, R. Cheng, Y. Jin, and K. C. Tan, “EvoX: A distributed GPU-accelerated library towards scalable evolutionary computation.” 2023 , eprint={2301.12457},
archivePrefix={arXiv}.

[19]

A. McIntyre, M. Kallada, C. G. Miguel, C. Feher de Silva, and M. L. Netto, “Python implementation of the NEAT neuroevolution algorithm.” 2023 ,
howpublished="\url{https://github.com/CodeReclaimers/neat-python}", note="Accessed: 2023-08-07.

[20]

peter-ch, “MultiNEAT: Portable NeuroEvolution library.” 2019 , howpublished="\url{https://github.com/peter-ch/MultiNEAT}", note="Accessed: 2023-08-15.

[21]

b2developer, “MonopolyNEAT: NEAT implemented into monopoly with a knockout tournament scheme.” 2022 , howpublished="\url{https://github.com/b2developer/MonopolyNEAT}",
note="Accessed: 2023-08-15.

[22]

A. Gajewsky, “PyTorch NEAT.” 2023 , howpublished="\url{https://github.com/uber-research/PyTorch-NEAT}", note="Accessed: 2023-08-07.

[23]

K. O. Stanley, “Compositional pattern producing networks: A novel abstraction of development,” *Genetic Programming and Evolvable Machines*, vol. 8, pp. 131–162,
2007.

[24]

C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem, “Brax–a differentiable physics engine for large scale rigid body simulation,” 2021 , booktitle =
{Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks}.

[25]

R. T. Lange, “Gymnax : A JAX-based reinforcement learning environment library.” 2022 , howpublished="\url{https://github.com/ukuleleplayer/pureples}",
note="Accessed: 2023-08-07.

[26]

Z. Gao and G. Lan, “A NEAT-based multiclass classification method with class binarization,” 2021 , isbn = {9781450383516}, p. 277?278, doi: 10.1145/3449726.3459509.

[27]

S. Sarti and booktitle=Applications. of E. C. Ochoa Gabriela, “A NEAT visualisation of neuroevolution trajectories,” 2021 , organization={Springer}, pp. 714–728.

[28]

A. Paszke *et al.*, “Advances in neural information processing systems 32 , editor = Wallach, H. and Larochelle, H. and Beygelzimer, A. and d’Alch?-Buc,
F. and Fox, E. and Garnett, R.” 2019, pp. 8024–8035, [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.

[29]

M. Abadi *et al.*, “TensorFlow, large-scale machine learning on heterogeneous systems.” Nov. 2015, doi: 10.5281/zenodo.4724125 , license = {Apache-2.0}.

[30]

C. R. Harris *et al.*, “Array programming with NumPy,” *Nature*, vol. 585, no. 7825, pp. 357–362, 2020.

[31]

B. Lim, M. Allard, L. Grillotti, and A. Cully, “QDax: On the benefits of massive parallelization for quality-diversity,” 2022 , isbn = {9781450392686}, p. 128?131, doi: 10.1145/3520304.3528927.

[32]

S. Ghosh-Dastidar and H. Adeli, “Spiking neural networks,” *International Journal of Neural Systems*, vol. 19, no. 4, pp. 295–308, 2009.

[33]

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks,” *Advances in Neural Information Processing Systems*, vol. 29,
2016.

[34]

G. Brockman *et al.*, “OpenAI gym.” 2016 , eprint={1606.01540}, archivePrefix={arXiv}.