October 01, 2025
Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code specifically designed for accelerated materials discovery on modern supercomputers. exa-AMD addresses the computational challenges inherent in large-scale materials discovery by employing task-based parallelization strategies and optimized data management tailored for high performance computers. The code features a modular design, supports both distributed and on-node parallelism, and is designed for flexibility and extensibility to accommodate a wide range of materials science applications. We detail the underlying algorithms and implementation, and provide comprehensive benchmark results demonstrating strong scaling across multiple high performance computing platforms. We provide two example applications, the design of Fe-Co-Zr and Na-B-C compounds, to illustrate the code’s effectiveness in accelerating the discovery and characterization of novel materials. With only a set of elements as input, exa-AMD automates the workflow on CPU or GPU-enabled clusters, outputs the structures and energies of promising candidates, and updates the phase diagram. exa-AMD is publicly available on GitHub, with detailed documentation and reproducible test cases to support community engagement and collaborative research. This work aims to advance materials science by providing a robust, efficient, and extensible tool ready for exascale platforms.
exascale computing, high-performance computing, materials discovery, machine learning
PROGRAM SUMMARY/NEW VERSION PROGRAM SUMMARY
Program Title: exa-AMD
CPC Library link to program files: (to be added by Technical Editor)
Developer’s repository link: https://github.com/ml-AMD/exa-amd/
Code Ocean capsule: (to be added by Technical Editor)
Licensing provisions: BSD3
Programming language: Python
Supplementary material: [if any]
Nature of problem (approx. 50-250 words): The discovery of novel functional materials in multinary chemical systems is hindered by the combinatorial explosion of possible compositions and structures, making exhaustive exploration computationally intractable with traditional methods. High-throughput density functional theory (DFT) screening is limited by its immense resource demands, particularly for predicting thermodynamic stability and functional properties in complex spaces such as ternary or quaternary alloys, where millions of candidates must be evaluated. Additionally, integrating machine learning (ML) acceleration, workflow automation, and exascale resource management into a unified, reproducible framework remains a significant challenge, often resulting in inefficient or non-portable solutions that fail to guide experimental synthesis effectively.
Solution method (approx. 50-250 words): exa-AMD addresses these challenges through a modular, Python-based workflow that automates structure generation via template substitution, rapid stability screening using machine-learning models for formation-energy prediction, and subsequent DFT calculations for validation. These stages are managed by the Parsl library for dynamic task distribution across CPU/GPU clusters. The framework efficiently filters candidates by energy thresholds and structural similarity, computes convex hulls for thermodynamic stability assessment, and supports elastic scaling on high performance computing platforms, enabling the discovery of stable and metastable compounds from user-specified elements within hours to days.
Additional comments including restrictions and unusual features (approx. 50-250 words): Structural motifs must be provided by the user in the initial prototype set (e.g., from databases such as the Materials Project or user-provided templates), so novel structure types not represented therein may be missed. However, users can fully customize the input structure pool, ML models, or DFT backends for specific applications. The code requires access to compatible quantum simulation software (e.g., VASP) and HPC schedulers (e.g., SLURM); performance is optimized for GPU acceleration in the ML and DFT stages, though the code runs on CPUs as well.
The discovery of novel functional materials is one of the major scientific challenges of the twenty-first century. The challenge is particularly acute for multinary systems, where the number of potential compositions and crystal structures grows exponentially with the number of constituent elements. Exhaustive searches with traditional trial-and-error and brute-force approaches are therefore impractical, owing to the combinatorial explosion of atomic configurations and the high computational cost of predicting electronic, magnetic, and thermodynamic properties from first principles. To address this, the field has shifted towards a computational paradigm that leverages high-throughput first-principles calculations to accelerate the discovery cycle, systematically screening hypothetical candidates to identify promising materials for targeted synthesis.
Traditional high-throughput methods, primarily based on density functional theory (DFT), are used to establish large materials databases such as the Materials Project [1], Automatic FLOW for Materials Discovery (AFLOW) [2], and the Open Quantum Materials Database (OQMD) [3], [4]. These approaches focus on computing key properties such as formation energies, band gaps, and elastic moduli for known or enumerated structures, providing valuable insight into phase stability and guiding experimental efforts [1]–[6]. However, they are severely limited in coverage and efficiency for complex systems, as brute-force DFT evaluations become computationally intractable for millions of candidates, often requiring months of supercomputing time and lacking the flexibility to intelligently prioritize low-energy structures without exhaustive calculations.
A solution to this problem is the use of machine learning (ML) models to rapidly predict formation energies and stability metrics for novel materials and to down-select candidates before performing costly DFT validation, thereby accelerating the exploration of significantly larger chemical and structural spaces. ML models such as graph neural networks have shown promise in achieving DFT accuracy at a fraction of the cost, reducing screening time from months to minutes while maintaining predictive reliability [7], [8]. This data-driven acceleration is essential for tackling scientific problems such as designing rare-earth-free magnets or novel battery materials, where multi-objective optimization across stability, magnetism, and synthesizability is required.
In this work, we present exa-AMD, an open-source, modular, and scalable framework tailored for Accelerated Materials Discovery on exascale platforms. exa-AMD unifies data mining, machine learning, advanced workflow automation, and first-principles computation into a holistic framework, enabling rapid prediction of stable and metastable compounds and property optimization in complex, multi-element chemical spaces that are otherwise inaccessible through conventional empirical or brute-force computational methods. It supports ternary and quaternary alloys and can be extended to higher-order systems. exa-AMD integrates advanced ML models and high-throughput quantum mechanical calculations to enable rapid screening and characterization of novel compounds through dynamic computing-resource management. The software streamlines the entire materials discovery process, from structural hypothesis generation to thermodynamic stability and property prediction, thereby significantly reducing the time and resources needed to discover promising candidates. The code’s flexible, modular design supports the diverse workflow needs of researchers aiming to efficiently target new compounds for synthesis and technological applications, as demonstrated by two examples in designing rare-earth-free magnets and beyond. We also present benchmarking results demonstrating its excellent scaling on some of the largest high performance computing (HPC) resources available.
The exa-AMD framework is designed as a modular, high-throughput pipeline for accelerated materials discovery, automating the entire process from user-specified chemical elements to the prediction of stable compounds and phase diagrams. Implemented in Python, it leverages ML for rapid stability screening and DFT for precise characterization, while ensuring extensibility, reproducibility, and efficient scaling from laptops to GPU-rich HPC environments. The basic workflow (Fig. 1) comprises five major steps: (1) crystal structure construction; (2) rapid stability screening using ML models; (3) duplicate removal and structure selection; (4) first-principles DFT optimization and characterization of physical properties; and (5) post-processing, which evaluates formation energies relative to elemental references, assesses thermodynamic stability via convex-hull construction, and generates updated phase diagrams.
Each of these stages is encapsulated as a configurable module, allowing users to adapt to new chemical systems, swap ML models, or integrate alternative quantum calculation packages, with workflow orchestration handled by Parsl [9] for dynamic resource management. In the following, we describe each step in detail.
At the beginning of the workflow, an initial pool of crystal structures in the Crystallographic Information File (CIF) format must be provided. These are the “seeds” from which new materials will be generated. The crystal structure files can be obtained from existing databases (Materials Project [1], GNoME [10], OQMD [3], [4], AFLOW [2], and NovoMag [11]), ensuring broad coverage of known and plausible motifs. By default, exa-AMD provides an initial pool of X structures. Users can also prepare their own prototype crystal structures as a customized initial structure pool tailored to their research interests.
Hundreds of thousands to nearly one million candidate structures are then generated using three methods: (i) systematic elemental substitution, where ...; (ii) combinatorial atom-type shuffling over these prototype structures [12]–[14], where ...; and (iii) lattice-volume scaling, which samples realistic bond lengths across different chemistries, typically with a factor of 0.9–1.1 relative to the reference bond lengths, as described in [12]–[14].
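To make these generation modes concrete, a minimal sketch using pymatgen (not the exa-AMD implementation itself) is given below; the prototype file path and the discrete set of scaling factors are illustrative placeholders, and the linear bond-length factors of 0.9–1.1 are applied as cubic factors to the cell volume.

from itertools import permutations
from pymatgen.core import Structure

# Hypothetical prototype CIF; in exa-AMD the prototypes come from the initial structure pool.
prototype = Structure.from_file("prototypes/Fe2CoZr.cif")
target_elements = ["Fe", "Co", "Zr"]

candidates = []
original_species = sorted({el.symbol for el in prototype.composition.elements})
# (i)/(ii): elemental substitution and combinatorial atom-type shuffling over prototype sites
for perm in permutations(target_elements, len(original_species)):
    s = prototype.copy()
    s.replace_species(dict(zip(original_species, perm)))
    # (iii): lattice-volume scaling around the reference volume (linear factor 0.9-1.1)
    for factor in (0.9, 1.0, 1.1):
        scaled = s.copy()
        scaled.scale_lattice(s.volume * factor ** 3)
        candidates.append(scaled)

print(f"Generated {len(candidates)} candidate structures from one prototype")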
To efficiently predict formation energies and eliminate high-energy candidates, exa-AMD employs an ML model, the Crystal Graph Convolutional Neural Network (CGCNN), for high-throughput stability screening [7]. CGCNN represents each hypothetical crystal as a graph, encoding atoms as nodes (featuring atomic number and chemical environment) and bonds as edges (including interatomic distances). Each hypothetical structure generated in the previous step is evaluated using the CGCNN model, which is trained to predict the formation energy. The initial, universal CGCNN model we utilize in this framework was trained on XXX structures from the Materials Project, as described in [7]. The mean absolute error of this model is typically around 0.1–0.2 eV/atom. Down-selection is then carried out by thresholding the formation energies to retain low-energy configurations (e.g., \(E_\mathrm{f} < 0\) eV/atom) and removing duplicates using structure-similarity criteria. This typically results in 1,000–4,000 unique candidates for first-principles calculations in the next step [12]–[14].
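Continuing the sketch above, the thresholding and duplicate-removal logic can be illustrated with pymatgen's StructureMatcher; here `candidates` refers to the generation step and `predicted_energies` stands in for the per-structure CGCNN predictions (both names are illustrative, and the default matcher tolerances may differ from those used in exa-AMD).

from pymatgen.analysis.structure_matcher import StructureMatcher

E_F_THRESHOLD = 0.0  # eV/atom; retain only low-energy candidates

# predicted_energies: CGCNN-predicted formation energies (eV/atom), one per candidate
low_energy = [s for s, e_f in zip(candidates, predicted_energies) if e_f < E_F_THRESHOLD]

# Remove duplicates by structural similarity
matcher = StructureMatcher()
unique = []
for s in low_energy:
    if not any(matcher.fit(s, ref) for ref in unique):
        unique.append(s)

print(f"{len(unique)} unique low-energy candidates selected for DFT")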
Our previous work shows that the use of ML models drastically reduces the computational time from months (using brute-force DFT) to minutes for over \(10^6\) structures. This screening enables the rapid exclusion of high-energy or chemically implausible candidates before any quantum calculation is performed.
After the completion of the first round of the framework (i.e., including DFT calculations and post-processing as discussed in Sections 2.3 and 2.4), a system-specific, second-generation CGCNN model is retrained using several hundred to thousands of new DFT-relaxed ground states, reducing the mean absolute error to as low as 0.03–0.05 eV/atom for the chemical systems we investigated [12], [13], [15]–[17]. Key hyperparameters for training include 3–6 convolutional layers, a batch size of 64–256, 100–200 epochs, stochastic gradient descent optimization, and 80/10/10 data splits for training/validation/testing, respectively. Since the evaluation of each candidate structure is independent, this approach provides a scalable and accurate procedure to select the most promising candidate materials for the costly quantum calculations in the next step. Importantly, our framework allows users to replace or augment this model with other ML architectures or retrained models specific to their system of interest, enabling improved prediction accuracy and tailored screening strategies. Such flexibility allows adaptation to a wide range of material classes and accelerates convergence to plausible candidate structures.
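The retraining setup can be summarized schematically as follows; the specific values are drawn from the ranges quoted above, the dataset size is an arbitrary placeholder, and the CGCNN training loop itself is omitted.

import numpy as np

hyperparams = {
    "n_conv_layers": 4,   # 3-6 convolutional layers
    "batch_size": 128,    # 64-256
    "epochs": 150,        # 100-200
    "optimizer": "SGD",   # stochastic gradient descent
}

# 80/10/10 train/validation/test split over the newly DFT-relaxed structures
n_structures = 2000  # placeholder count of DFT-relaxed entries
rng = np.random.default_rng(seed=0)
indices = rng.permutation(n_structures)
train_idx, val_idx, test_idx = np.split(indices, [int(0.8 * n_structures), int(0.9 * n_structures)])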
In exa-AMD, we use density functional theory as implemented in the Vienna Ab initio Simulation Package (VASP) [18], [19]. By default, structural relaxations and total energy calculations employ the projector augmented-wave (PAW) method [20] and the Perdew-Burke-Ernzerhof (PBE) GGA exchange-correlation functional [21], with a plane-wave cutoff of 520 eV and Monkhorst-Pack \(k\)-point meshes at a density of \(2\pi \times 0.025\)–0.03 Å\(^{-1}\) to ensure convergence for metals and intermetallics [12], [13], [17]. The lattice parameters and internal atomic positions are relaxed until all forces fall below 0.01 eV/Å. The electronic band structures are also computed with high accuracy. For magnetic systems, spin polarization is included, with the initial moments set to a ferromagnetic configuration. The saturation magnetization (\(M_s\), \(J_s\)) is computed from the total moment and the relaxed cell volume.
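A sketch of how such default settings could be expressed with pymatgen's VASP input objects is shown below; this is illustrative rather than exa-AMD's actual input generator, `structure` is assumed to be a pymatgen Structure from the candidate pool, and the volume-based k-point density is a stand-in for the linear spacing quoted above.

from pymatgen.io.vasp.inputs import Incar, Kpoints

incar = Incar({
    "ENCUT": 520,     # plane-wave cutoff (eV)
    "GGA": "PE",      # PBE exchange-correlation functional
    "ISPIN": 2,       # spin polarization for magnetic systems
    "IBRION": 2,      # conjugate-gradient ionic relaxation
    "ISIF": 3,        # relax cell shape, volume, and atomic positions
    "NSW": 100,       # maximum number of ionic steps
    "EDIFFG": -0.01,  # stop when all forces fall below 0.01 eV/Angstrom
})
incar.write_file("INCAR")

kpoints = Kpoints.automatic_density_by_vol(structure, kppvol=100)  # placeholder k-point density per reciprocal volume
kpoints.write_file("KPOINTS")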
Formation energies per atom are always referenced to the relaxed energies of the elemental (and, if needed, binary) phases, and convex hull constructions are used to assess thermodynamic stability—a compound is designated as stable, metastable (\(E_{\rm hull} < 0.1\) eV/atom), or unstable in this step. For correlated systems, a Hubbard \(U\) correction is applied within standard DFT for more accurate valence treatments when necessary. Dynamical stability checks, such as phonon calculations and ab initio molecular dynamics, can be conducted for representative new phases as post-processing steps. By parallelizing DFT jobs across CPU/GPU clusters using Parsl, thousands of structure relaxations with varying size and complexity can be completed within hours to days—enabling practical exploration of otherwise intractable structural and compositional spaces [12]–[16]. Although this first-principles calculation stage primarily uses VASP as the default DFT engine, the workflow is designed to be agnostic about the choice of DFT software. Depending on user preferences and available computational resources, alternative software packages such as Quantum ESPRESSO [22], [23] can be used. This modularity not only offers flexibility in computation but also enables benchmarking and method comparison to ensure robust validation of candidate materials.
The formation energy of a candidate material is defined as \[E_\text{form} = E_\text{tot}(\text{compound}) - \sum_i n_i E_\text{ref}(i)\] where \(E_\text{tot}\) is the total energy per atom of the candidate, \(n_i\) are the atomic fractions, and \(E_\text{ref}(i)\) are the per-atom reference energies of the constituent elements.
Candidates are compared against all known phases to compute their energy above the convex hull (\(E_\mathrm{hull}\)). Compounds on the hull (\(E_\mathrm{hull}=0\) eV/atom) or near the hull (e.g., \(E_\mathrm{hull} < 0.1\) eV/atom) are considered stable or metastable, respectively. This rigorous convex hull analysis is essential for predicting which compounds can be synthesized in practice, as confirmed by literature and data-mined studies.
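The post-processing logic can be illustrated with pymatgen's phase-diagram tools; the entries and total energies below are placeholders rather than computed values, and exa-AMD's own implementation may differ.

from pymatgen.core import Composition
from pymatgen.analysis.phase_diagram import PDEntry, PhaseDiagram

# Elemental references and one hypothetical ternary candidate (total energies in eV; placeholder numbers)
entries = [
    PDEntry(Composition("Fe"), -8.46),
    PDEntry(Composition("Co"), -7.11),
    PDEntry(Composition("Zr"), -8.55),
    PDEntry(Composition("Fe2CoZr"), -34.0),
]
phase_diagram = PhaseDiagram(entries)

candidate = entries[-1]
e_form = phase_diagram.get_form_energy_per_atom(candidate)  # formation energy relative to elemental references
e_hull = phase_diagram.get_e_above_hull(candidate)          # energy above the convex hull
status = "stable" if e_hull == 0 else ("metastable" if e_hull < 0.1 else "unstable")
print(f"E_form = {e_form:.3f} eV/atom, E_hull = {e_hull:.3f} eV/atom ({status})")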
It is important to note that the scope of the predicted phases and phase diagrams generated by exa-AMD is fundamentally determined by the set of structure prototypes supplied at the beginning of the workflow. These initial prototypes, either provided by default with exa-AMD or customized by the user, define the structural motifs accessible to elemental substitution, combinatorial screening, and all subsequent machine learning and DFT evaluation. Consequently, all new compounds, low-energy structures, and phase diagram updates produced by the current version of exa-AMD are restricted to the chemical and structural space spanned by these templates. While this approach enables high throughput and chemical flexibility, it does not guarantee exhaustive exploration of all possible structure types. As demonstrated in our previous work on the Fe-Co-C system [17], this limitation can be addressed by integrating adaptive genetic algorithms (AGA) [24], [25] capable of exploring and predicting new structure types that are absent from existing databases or user-supplied templates. In future releases of exa-AMD, we plan to natively incorporate AGA-based structure prediction to overcome the prototype bottleneck and expand the framework’s reach to uncover genuinely novel structural motifs and compounds beyond the current prototype-driven paradigm.
exa-AMD is implemented in Python and designed for clarity, portability, and extensibility. It encompasses plug-in scripts for structure handling, database access, machine learning inference, and DFT job submission. The use of the Parsl library [9] ensures efficient, scalable workflow orchestration involving task parallelism, fault tolerance, and elastic resource management. Users can configure resource allocation, execution platforms, and job scheduling via flexible configuration files, allowing seamless portability from local workstations to large-scale HPC systems. Key features of the software implementation include:
Modular design: The framework’s modular architecture allows users to customize or replace key components to adapt to diverse research needs. For example, users can provide their own customized initial structure pool, as discussed above. Moreover, the machine learning model (currently CGCNN by default) can be replaced by other advanced models [add references here]. The first-principles component also supports multiple DFT engines, such as VASP, Quantum ESPRESSO, and others.
Workflow orchestration: Our framework manages scheduling, distribution, and monitoring of tasks, supporting both CPU and GPU execution with elasticity and fault tolerance to efficiently use heterogeneous computing nodes.
I/O and database integration: Structures are sourced from our exa-AMD dataset [26] (curated from multiple databases) or from user-provided datasets, with outputs systematically organized into logs, results, and configuration files to ensure comprehensive data provenance and reproducibility.
Parallel execution: High-throughput tasks like ML screening and DFT calculations are automatically parallelized, with configurable resource allocations (node types and counts) for elastic scaling on supercomputers and clouds.
User interface: Flexible command-line tools and configuration files support simple and advanced job submission, including the ability to restart workflows at any stage.
Documentation and testing: Comprehensive user guides, API documentation, and automated test suites are provided to facilitate adoption and ensure software reliability.
At the heart of exa-AMD’s scalability and automation is the use of Parsl [9], a flexible, Python-based parallel programming and workflow library designed to efficiently orchestrate scientific pipelines across heterogeneous computational resources. Parsl enables exa-AMD to decompose each major workflow step—structure generation, ML-based screening, structure similarity filtering, DFT calculations, and property post-processing—into fine-grained, asynchronous tasks (“apps” in Parsl terminology) that can be dynamically mapped onto available compute resources such as CPUs, GPUs, or hybrid clusters. In practice, each workflow stage is implemented as a Parsl app, either a “Python app” or a “Bash app”. The user specifies execution back-ends (e.g., SLURM, PBS, or local), resource allocation (cores, GPUs, wall time), and queue preferences in a standard Parsl configuration file, which is agnostic about the scientific workflow. Parsl’s executor-provider abstraction allows exa-AMD to submit thousands of jobs to HPC queueing systems and efficiently manage task distribution, resource scaling, and result collection without manual intervention. For example, ML screening tasks are offloaded as batches to available GPUs, while high-throughput DFT tasks are partitioned into separate job arrays, each tracked and managed independently. As tasks complete, outputs are automatically aggregated and passed to downstream stages, ensuring full automation and rapid throughput.
Parsl also enables robust resumability and error handling within exa-AMD. Every task’s state and results are tracked, allowing failed calculations to be retried or workflows to resume from intermediate stages in case of interruption. Through elastic scaling, Parsl dynamically grows or shrinks the compute pool depending on job queue length and available resources; for example, releasing idle GPU nodes during less demanding workflow stages. This highly modular and fault-tolerant approach not only maximizes resource usage and throughput on shared HPC queues but also supports rapid development and reproducibility—key requirements for cutting-edge, community-driven materials discovery frameworks.
By decoupling workflow logic from execution configuration, exa-AMD with Parsl facilitates portability from laptops to exascale clusters. Users can port identical workflows across computing systems by simply adjusting configuration files, making rapid prototyping, exploration, and full production campaigns equally straightforward. Extensive documentation and tutorials are provided on the exa-AMD documentation website, including ready-to-use Parsl configuration templates for common platforms and guidance for workflow customization.
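The pattern described above can be illustrated with a minimal Parsl sketch; the executor label, SLURM partition, and app bodies are placeholders (the actual exa-AMD apps and ready-to-use configuration templates are provided in the repository and documentation), and running it requires access to a SLURM cluster.

import parsl
from parsl import bash_app, python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

config = Config(executors=[
    HighThroughputExecutor(
        label="gpu_nodes",
        provider=SlurmProvider(
            partition="gpu",        # site-specific queue name
            nodes_per_block=4,
            walltime="04:00:00",
        ),
    ),
])
parsl.load(config)

@python_app
def screen_batch(structures):
    # placeholder for CGCNN inference over one batch of candidate structures
    return [0.0 for _ in structures]

@bash_app
def relax_structure(workdir, stdout="vasp.out", stderr="vasp.err"):
    # placeholder command; the real app would launch the DFT engine in `workdir`
    return f"cd {workdir} && srun vasp_std"

# Tasks execute asynchronously; results are gathered as the futures complete
futures = [screen_batch(batch) for batch in [["s1", "s2"], ["s3"]]]
results = [f.result() for f in futures]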
The exa-AMD framework is explicitly designed to leverage modern high-performance computing (HPC) facilities, including exascale clusters with both CPU and GPU architectures, through its use of Parsl. By decoupling workflow logic from execution resources, exa-AMD can efficiently multiplex tens of thousands of independent jobs, automatically scaling the workload to available nodes and managing asynchronous dependencies. Importantly, Parsl supports dynamic provisioning and workload elasticity: as the workflow proceeds through its stages, computing resources can be grown or shrunk, and jobs can transparently recover or resume from failures without user intervention. This ensures efficient backfilling and utilization of large, shared supercomputers.
To quantify performance, we conducted systematic scaling tests on three representative systems: Na-B-C, Ce-Co-B, and Fe-Co-Zr. Na-B-C involves only light elements, making it the simplest system among the three. Ce-Co-B is more complicated, involving a rare-earth element as well as a 3d transition metal. Fe-Co-Zr is selected as a typical rare-earth-free magnetic system. For each system, we measured the total wall-clock time required to complete a workflow consisting of structure generation, CGCNN-based screening, and a fixed-size pool of parallelized DFT relaxations. Benchmarks were performed on both CPU and GPU partitions, with node counts up to 32 for the Na-B-C system and up to 256 nodes for the Ce-Co-B and Fe-Co-Zr systems. [explain why 32 nodes for Na-B-C system with light element, and 256 nodes for complex systems]. Figs. 2, 3, and 4 summarize the strong scaling results: wall-clock times decrease nearly ideally with node count, following a 1/\(N\) scaling, on both CPU and GPU backends, illustrating excellent parallel efficiency. We performed the benchmark tests on two large supercomputers, Perlmutter at NERSC and Chicoma at Los Alamos National Laboratory. [details on GPU nodes, CPU nodes]. For example, in the Na-B-C system, the wall-clock time was reduced from 1550 minutes on a single node with 4 GPUs to 88 minutes on 32 GPU nodes, and from 1520 to 98 minutes over the same range on CPUs. GPU benchmarks consistently achieve slightly faster runtimes than their CPU counterparts. Moreover, compared to CPU benchmarks, GPU benchmarks exhibit strong scaling that is closer to ideal behavior at higher node counts, reflecting the efficiency of accelerating ML inference and ab initio calculations on modern GPU architectures.
Performance comparisons for the Ce-Co-B and Fe-Co-Zr systems reveal similar strong-scaling behavior: for Ce-Co-B, the total workflow time dropped from 2112 to 50 minutes on GPUs (4 to 256 nodes), and from 2091 to 118 minutes on CPUs (4 to 128 nodes). For the more demanding Fe-Co-Zr system, the wall-clock time decreased from 2768 to 63 minutes as GPU node counts were increased from 4 to 256, and from 3018 to 155 minutes as CPU node counts were increased from 4 to 128. The parallel efficiency typically remains above 80% across the tested range. The highly modular design managed by Parsl ensures that each job (whether an ML inference, structure relaxation, or post-processing stage) is distributed, tracked, and aggregated without significant idle time or manual intervention. Dynamic resource allocation and robust fault tolerance further maximize throughput and utilization, particularly in shared supercomputing environments with variable queueing latency and hardware availability.
These results directly demonstrate that exa-AMD achieves almost optimal scaling and efficient throughput across diverse computational environments. This enables million-structure ML screening in under an hour and ab initio relaxation of thousands of candidates within a day on modern clusters.
To provide deeper insight into exa-AMD’s resource utilization, we further analyzed the time distribution among the major phases of the workflow, using Na-B-C as an example. As illustrated in Fig. 5, the vast majority of the total wall-clock time is consumed by VASP calculations (97.6%). Structure generation, CGCNN machine-learning inference, and the selection/filtering steps together require less than 3% of the total time.
With GPU acceleration, the CGCNN model can process over 1 million structures within minutes, allowing exa-AMD to rapidly down-select candidates and prioritize compute resources for only the most promising compounds. This efficient distribution—made possible by Parsl’s fine-grained, parallel task scheduling—demonstrates that the workflow overhead is minimal and that exa-AMD achieves near-optimal throughput for production-scale discovery, limited primarily by the computationally intensive quantum mechanical calculations.
This workflow profiling confirms that by using advanced job control and asynchronous task management, non-ab initio workflow steps incur negligible bottlenecks, even as campaign size and node count increase. Resources are automatically released from structure screening phases and reallocated for the surge in DFT job submissions, allowing seamless scaling and minimized overall turnaround time for large discovery efforts.
A compelling demonstration of the exa-AMD framework is its application to the accelerated discovery and design of rare-earth-free permanent magnets in the ternary Fe-Co-Zr system. One of the hallmarks of exa-AMD is its user-focused interface and workflow: the user simply specifies a set of chemical elements—in this case, Fe, Co, and Zr—and the directory containing the prototype structures, and the framework automates the entire process of generating, screening, and validating hypothetical compounds, requiring minimal manual intervention.
After specifying the elements, the workflow proceeds by generating a broad pool of candidate structures using database-driven template substitution, combinatorial permutation, and lattice scaling. CGCNN predicts the formation energies for nearly 900,000 candidate Fe-Co-Zr compositions in under 15 minutes using a single GPU node. Structures with predicted formation energy below a set threshold (e.g., \(E_\mathrm{f} < -0.1\) eV/atom) are automatically retained. After removing duplicate structures, this yields about 3,100 distinct candidates for ab initio validation. The subsequent first-principles (VASP) relaxation of all 3,100 structures was completed in less than 12 hours on a modest GPU cluster. The only user input required throughout this discovery pipeline was the choice of target elements. All stability screening, filtering, relaxation, and post-processing were handled automatically, demonstrating both the simplicity and the power of the exa-AMD approach.
As a final output, exa-AMD provided significantly updated convex hulls for the Fe-Co-Zr phase diagram (Fig. 6), as well as an annotated list of newly predicted stable and low-energy (metastable) ternary compounds. Nine new stable Fe-Co-Zr phases and 81 promising metastable compounds (within 0.1 eV/atom above the hull) were discovered, dramatically extending the known landscape for this key system. For selected candidates, exa-AMD also provided DFT-optimized crystal structures, site-specific magnetic properties, and both electronic band structures and densities of states (as in Table 1 and supplementary figures of the Fe-Co-Zr study).
The exa-AMD workflow is equally applicable to a wide range of materials and other target properties. For demonstration, in the Na-B-C system, the user again simply specifies the three desired elements. exa-AMD then systematically explores all possibilities generated by template substitution, screens stable candidates by ML, and refines the energies and properties of selected compounds by DFT calculations. The outputs include both updated convex hulls (for phase-stability visualization) and, if desired, relaxed structure information for experimental synthesis or further theoretical study. Fig. 7 shows a sample output for Na-B-C. The discovery of quaternary and more complex systems can be pursued using the same high-throughput approach.
This modular, scalable, and element-agnostic strategy allows exa-AMD users to efficiently perform reliable high-throughput discovery campaigns for virtually any combination of elements, fulfilling a critical need for rapid, reproducible computational materials design.
Clone the repository from https://github.com/ml-AMD/exa-amd/ and follow the conda installation instructions for exa-AMD and its dependencies (conda env create -f amd_env.yml).
Prepare your computational environment with compatible quantum simulation packages (e.g., VASP, Quantum ESPRESSO).
Populate the structure pool with initial templates or import from a supported materials database, in the Crystallographic Information File (CIF) format.
Edit the JSON input file specifying target elements, scaling factors, and workflow parameters (an illustrative sketch follows these steps).
Prepare a Parsl configuration for the target computing platform.
Launch the workflow via command-line or Python API, specifying node type and count (CPU/GPU).
Monitor progress via logs; restart is possible from saved stages for long or batch jobs.
Analyze the results in the output directory: stable/metastable candidates, updated convex hull, relaxed structures, property tables.
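For orientation only, a minimal sketch of preparing such an input file is shown below; the key names are hypothetical and do not reflect exa-AMD's actual JSON schema, which is documented in the user guide and tutorial.

import json

workflow_input = {
    "elements": ["Na", "B", "C"],        # target chemical system (hypothetical key name)
    "scaling_factors": [0.9, 1.0, 1.1],  # lattice-volume scaling range (hypothetical key name)
    "prototype_dir": "prototypes/",      # user-provided CIF templates (hypothetical key name)
    "formation_energy_cutoff": 0.0,      # eV/atom threshold for ML screening (hypothetical key name)
}
with open("input.json", "w") as fh:
    json.dump(workflow_input, fh, indent=2)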
Example workflows and datasets are included in the GitHub documentation site and online guides.
Troubleshooting tips for common errors (convergence, resource allocation, file I/O) are detailed in the user documentation. The tutorial provides an example of predicting novel Na–B–H–C compounds (https://ml-amd.github.io/exa-amd/tutorial.html).
exa-AMD represents a significant advancement in computational materials science by providing an exascale-capable, modular framework that dramatically accelerates the exploration of complex composition-structure-property spaces. By automating structure generation, stability screening using state-of-the-art machine learning methods, and rigorous first-principles (DFT) calculations in a high-throughput manner using Parsl, exa-AMD enables researchers to efficiently identify new stable and metastable compounds from vast candidate pools. Its capability is demonstrated by a few selected, challenging multinary systems, achieving rapid down-selection and efficient resource utilization with minimal manual intervention. One of the goals of exa-AMD is to support transparent, reproducible, and community-driven research by providing open-source code with detailed documentation, as well as a user-friendly design.
At present, exa-AMD focuses on first-principles characterization of carefully selected compounds and on phase-diagram refinement based on known structural templates. A limitation is that the proposed compounds are restricted to the structure types present in the existing templates or user-supplied prototypes, making the discovery of entirely new structural motifs challenging. To address this problem, future developments of exa-AMD will incorporate two new features: (1) the integration of advanced machine learning potentials for efficient atomistic simulation and structure optimization, and (2) the integration of the adaptive genetic algorithm (AGA) for new structure generation [24], [25]. These new features will enable the prediction of novel crystal structures beyond the predefined prototypes, providing a more robust and comprehensive approach to materials discovery.
Work at Ames National Laboratory and Los Alamos National Laboratory was supported by the U.S. Department of Energy (DOE), Office of Science, Basic Energy Sciences, Materials Science and Engineering Division through the Computational Material Science Center program. Ames National Laboratory is operated for the U.S. DOE by Iowa State University under contract No. DE-AC02-07CH11358. Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy under Contract No. 89233218CNA000001. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported under Contract No. DE-AC02-05CH11231. (LA-UR-25-28449)