Generative AI for Architectural Design: A Literature Review

Chengyuan Li\(^{1}\) Tianyu Zhang\(^{2}\) Xusheng Du\(^{2}\) Ye Zhang\(^{1}\) Haoran Xie\(^{2}\)
\(^1\)Tianjin University \(^2\)Japan Advanced Institute of Science and Technology


Abstract

Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have been extensively applied in this domain, significantly advancing design innovation and efficiency. With continual technological advancements, state-of-the-art Diffusion Models and 3D Generative Models are progressively being integrated into architectural design, offering designers a more diversified set of creative tools and methodologies. This article further provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights their applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature since 2020, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked growth in research indicates an increasing inclination within the architectural design community towards embracing generative AI, catalyzing shared research enthusiasm. The reviewed cases and methodologies have not only been shown to significantly enhance efficiency and innovation but have also challenged the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review on generative AI for architectural design, and we believe this work can facilitate more research on this significant topic in architecture.

Keywords: Generative AI, Architectural Design, Diffusion Models, 3D Generative Models, Large-scale models.

1 Introduction↩︎

Nowadays, generative artificial intelligence (AI) techniques are increasingly expanding their power and revolutionizing architectural design. Here, generative AI refers to artificial intelligence technologies dedicated to content generation, such as text, images, music, and videos. Generative AI benefits from the rapid development of deep generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models (DMs). GANs and VAEs are traditional generative models and have been widely explored in architectural design, as illustrated in Figure 1. In this paper, we focus on the recent progress of generative AI, especially the revolutionary diffusion models. DMs have achieved state-of-the-art performance in various content generation tasks such as text-to-image and text-to-3D generation.

Figure 1: Examples of architecture design using generative AI techniques: (a) church design [1]; (b) matrix of cuboid shapes [2]; (c) Frank Gehry’s Walt Disney Concert Hall [3]; (d) Bangkok urban design [4]; (e) foresting architecture [4]; (f) urban interiors [4]; and (g) text-to-architectural design [5].

Architectural design may encompass multiple themes and scopes, with each project having distinct design requirements and individual styles, leading to diversity and complexity in design approaches. In this work, we adopt six main steps of the architectural design process for the literature review: 1) architectural preliminary 3D forms design, 2) architectural layout design, 3) architectural structural system design, 4) detailed and optimization design of architectural 3D forms, 5) architectural facade design, and 6) architectural imagery expression. After surveying research papers published from 2020 to 2023, we observed a significant increase in the number of papers applying generative AI to architectural design. The number of research papers using generative AI technology in different architectural design steps reveals the development trends within each subfield, as illustrated in Figure 2 (a). Most research is concentrated in the area of architectural plan design. Research on preliminary 3D form design and architectural image expression has increased rapidly in the past two years. More research remains to be done on architectural structural system design, architectural 3D form refinement and optimization design, and architectural facade design.

This sustained growth distinctly demonstrates that generative AI in architectural design is expanding at an unprecedented rate, and it reflects the high level of attention and increasing investment in generative AI technologies from both the architectural design and computer science communities. The most frequently used generative AI techniques are illustrated in Figure 2 (b). In computer science, many studies focus on GANs and VAEs, while research on DDPM, LDM, and GPT is still at an early stage; the situation is the same in architecture.

Figure 2: Overview of generative AI applications in architectural design: statistics on research paper numbers and generative models.

1.1 Motivation↩︎

Leveraging recent generative AI models in architectural design could significantly improve design efficiency and provide architects with new processes and ideas, expanding the possibilities of architectural design and revolutionizing the entire workflow. However, the use of advanced generative models in architectural design has not been explored extensively. The primary reasons hindering their adoption are twofold: professional barriers and the issue of training data.

In terms of professional barriers, deep learning and architectural design are both highly specialized fields requiring extensive professional knowledge and experience. The aim of this study is to narrow the professional gap between architecture and computer science, assist architectural designers in connecting generative AI technologies with applications, promote interdisciplinary research, and delineate future research directions. This review systematically analyzes and summarizes case studies and research outcomes of generative AI applications in architectural design, and showcases the possibilities and potential of the intersection between computer science and architecture. This interdisciplinary perspective encourages collaboration among experts from different fields to address complex issues in architectural design, thus advancing scientific research and technological innovation.

In terms of the issue of training data, deep learning models require high-quality training data to analyze and verify their generalization ability. However, data in the field of architecture is usually unstructured. The search for and organization of architectural training data pose a significant challenge, making the initial stages of model training difficult. In addition, high-performance Graphics Processing Units (GPUs) are required to train deep learning models on millions of data samples, especially models dealing with complex images and datasets. The scarcity of high-performance GPUs and the difficulty of mastering GPU programming skills may prevent architects from exploring recent diffusion models and large foundation models.

1.2 Structure and Methodology↩︎

This article first introduces the development and application directions of generative AI models, then elaborates on the methods of applying generative AI in the architectural design process, and finally, forecasts the potential application development of generative AI in the architectural field.

In section 2, the article offers an in-depth introduction to the principles and evolution of various generative AI models, with a focus on Diffusion Models (DMs), 3D Generative Models, and Foundation Models. In section 2.1, the article elaborates on the principles and development of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). In section 2.2, the discourse on Diffusion Models elaborates on the working mechanisms and developmental trajectories of DDPM and LDM. In section 2.3, the segment on 3D Generative Models focuses on 3D shape representation, encompassing voxels, point clouds, meshes, and implicit representations; within implicit representations, the paper details Occupancy Fields, Signed Distance Functions (SDF), Unsigned Distance Functions (UDF), and Neural Radiance Fields (NeRF), explaining their respective operational principles. In section 2.4, the Foundation Models section comprehensively describes the progress and achievements of Large Language Models (LLM) and Large Vision Models. In section 2.5, the paper discusses the applications and developments of these models in image generation, video generation, and 3D model generation.

In section 3, this paper delves into the application development of generative AI models in architectural design. Given the complexity of the architectural design process, this article delineates the architectural design process into six steps, as presented in the introduction. In each step, the article summarizes and discusses the current application methods of generative AI models in these six domains. By analyzing these research papers, the study demonstrates how generative AI can facilitate innovation in architectural design, improve design efficiency, and optimize architectural solutions. Throughout this summarization process, literature retrieval was conducted using databases such as Cumincad and Web of Science, supplemented by searches on Litmaps. To ensure the targeted and accurate nature of the search, specific search queries were set for each design process.

In Section 4, this article explores the potential applications of generative AI technology in generating architectural design images, architectural design videos, and architectural design 3D models, as well as in human-centric architectural design. Section 4.1 anticipates applications of image generation for floor plans, facade images, and architectural imagery. Section 4.2 anticipates architectural design video generation, foreseeing applications such as generating videos from a single architectural image or from a set of architectural images, and style transfer for specific video content. Section 4.3 envisions possibilities for 3D model generation, including generating 3D models from images and text prompts, transferring styles to 3D models, and generating and editing detailed styles for 3D models. Section 4.4 elaborates on the potential of generative AI in enhancing the human-centric architectural design process.

2 Generative AI Models↩︎

Generative AI models are currently experiencing rapid development, with new methods continually emerging. The evolution of deep learning-based approaches, particularly Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and Diffusion Models (DM), has significantly advanced and enhanced image generation techniques. VAEs played a pioneering role among deep learning-based generative models; they employ an encoder-decoder architecture integrated with probabilistic graphical models to learn latent representations for image generation [6]. GANs represent a milestone in the realm of image generation: with a generator and a discriminator, they engage in an adversarial training process that prompts the generator to produce images progressively resembling the distribution of real data [7], [8]. Moreover, diffusion models stand out as the most revolutionary technology to have emerged in recent years, with remarkable image generation quality [9], [10].

2.1 Generative Adversarial Networks↩︎

Figure 3: The framework of GAN, VAE, and diffusion models (DM), where \(z\) is a compressed low-dimensional representation of the input.

Generative Adversarial Network (GAN) [11] comprises a generator \(G\) and a discriminator \(D\), as illustrated in Figure 3. The generator \(G\) is responsible for generating samples from noise \(z\), while the discriminator \(D\) determines the authenticity of the generated samples \(G(z)\) against the ground-truth image \(\bar{x}\). Ideally: \[\label{equ:GAN_D} D(\bar{x})=1, \quad D(G(z))=0\tag{1}\] This adversarial nature enables the model to maintain a dynamic equilibrium between generation and discrimination, propelling the learning and optimization of the entire system. Despite its advantages, GAN still faces challenges, such as mode collapse during training.
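To make the adversarial objective in Equation (1) concrete, the following is a minimal training sketch in PyTorch under illustrative assumptions (flattened 28x28 images and toy network sizes); it is not the architecture of any specific work reviewed here.

```python
# Minimal GAN training sketch (PyTorch), assuming 28x28 grayscale images
# flattened to 784-dimensional vectors; network sizes are illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_x):                       # real_x: (batch, 784) scaled to [-1, 1]
    batch = real_x.size(0)
    z = torch.randn(batch, 100)
    fake_x = G(z)

    # Discriminator: push D(x) -> 1 for real samples and D(G(z)) -> 0 for fakes.
    opt_d.zero_grad()
    loss_d = bce(D(real_x), torch.ones(batch, 1)) + \
             bce(D(fake_x.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: fool D so that D(G(z)) -> 1.
    opt_g.zero_grad()
    loss_g = bce(D(fake_x), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```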

2.1.0.1 Conditional GAN

Conditional image generation controls the generation process by introducing conditional information, such as text, labels, or hand-drawn sketches, so that the generated images match the given conditions. To address the limited controllability of GAN models, the Conditional GAN (CGAN) [12] was introduced; it uses additional auxiliary information as a condition for both \(G\) and \(D\). The generator of CGAN receives conditional information in addition to random noise, and by providing this conditional information, CGAN can control the generated results more precisely. Variants such as pix2pix [13] and StyleGAN [7] have also been developed.
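A minimal sketch of how the condition enters the generator is given below, assuming a class-label condition and the same toy settings as above; real CGAN variants condition the discriminator analogously.

```python
# Minimal conditional-generator sketch (PyTorch): the condition y (here a class
# label) is embedded and concatenated with the noise z for G; in a full CGAN the
# discriminator receives the condition in the same way.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, n_classes=10, z_dim=100, out_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 32)          # label -> 32-d vector
        self.net = nn.Sequential(nn.Linear(z_dim + 32, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = CondGenerator()
fake = G(torch.randn(8, 100), torch.randint(0, 10, (8,)))  # 8 label-conditioned samples
```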

2.2 Diffusion Models↩︎

In image generation, diffusion models outperform GANs and VAEs [14], [15]. Most diffusion models currently in use are based on Denoising Diffusion Probabilistic Models (DDPM) [15], which simplify the diffusion model through variational inference. As shown in Figure 3, diffusion models contain both a forward diffusion process and a reverse denoising (inference) process. The forward process follows the concept of a Markov chain and turns the input image into Gaussian noise. Given a data sample \(x_0\), Gaussian noise is progressively added to the data sample over \(T\) steps in the forward process, producing the noisy samples \(x_t\), where the timestep \(t=\{1, \ldots, T\}\). As \(t\) increases, the distinguishable features of \(x_0\) gradually diminish. Eventually, when \(T \rightarrow \infty\), \(x_T\) is equivalent to a Gaussian distribution with isotropic covariance. The inference process can be understood as a sequence of denoising autoencoders with shared weights \(\epsilon_\theta\left(x_t, t\right)\) (\(\epsilon_\theta\) is typically implemented as a U-Net [16]), which are trained to predict denoised versions of their corresponding inputs \(x_t\).
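The forward process has a convenient closed form, \(x_t=\sqrt{\bar{\alpha}_t}\,x_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon\), which can be sketched in a few lines; the linear noise schedule below is illustrative.

```python
# Sketch of the DDPM forward (noising) process in closed form:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
# Schedule values are illustrative (linear beta schedule, T = 1000).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # cumulative products alpha_bar_t

def q_sample(x0, t, eps=None):
    """Sample x_t ~ q(x_t | x_0) for integer timesteps t (shape: [batch])."""
    if eps is None:
        eps = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over image dims
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

# During training, a U-Net eps_theta(x_t, t) is optimized to predict eps, e.g.
# loss = ((eps_theta(q_sample(x0, t, eps), t) - eps) ** 2).mean()
```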

2.2.0.1 Latent Diffusion Model

Different from DDPM, the Latent Diffusion Model (LDM) [9] does not operate directly on images but in a latent space, an approach called perceptual compression. LDM reduces the dimensionality of the data by projecting it into a low-dimensional, efficient latent space in which high-frequency, imperceptible details are abstracted away. The framework of LDM is illustrated in Figure 4. After the image \(x\) is compressed by the encoder \(\mathcal{E}\) into the latent representation \(z\), the diffusion process is performed in the latent representation space; this diffusion process is similar to that of DDPM. Finally, LDM infers the data sample \(z\) from the noise \(z_T\), and the decoder \(\mathcal{D}\) restores \(z\) to the original pixel space, producing the resulting image \(\widetilde{x}\).

Figure 4: The framework of the latent diffusion model proposed by Rombach et al. [9].

Specifically, given an image \(x \in \mathbb{R}^{H \times W \times 3}\) with height \(H\) and width \(W\) in RGB space, LDM first utilizes an encoder \(\mathcal{E}\) to encode the image \(x\) into a latent representation space: \[\label{equ:1} z=\mathcal{E}(x)\tag{2}\] where \(z \in \mathbb{R}^{h \times w \times c}\) with height \(h\) and width \(w\), and the constant \(c\) represents the number of channels. Then the decoder \(\mathcal{D}\) recovers the image from the latent representation space: \[\label{equ:2} \tilde{x}=\mathcal{D}(z)=\mathcal{D}(\mathcal{E}(x))\tag{3}\]
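A conceptual sketch of this pipeline is shown below; the encoder, decoder, and denoiser are lightweight placeholders standing in for the trained VAE and U-Net, so only the data flow of Equations (2) and (3) is illustrated.

```python
# Conceptual sketch of the LDM pipeline: compress x to a latent z with an
# encoder E, run the iterative denoising loop in latent space, then map back to
# pixel space with a decoder D. The modules below are placeholders, not the
# actual Stable Diffusion networks.
import torch
import torch.nn as nn

E = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # x (3,H,W) -> z (4,H/8,W/8)
D = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # z -> reconstructed image
denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)    # stand-in for the U-Net

def generate(shape=(1, 4, 32, 32), steps=50):
    z = torch.randn(shape)                  # start from Gaussian noise z_T
    for _ in range(steps):                  # iterative denoising in latent space
        z = z - 0.1 * denoiser(z)           # placeholder update rule
    return D(z)                             # map the clean latent back to pixels

x = torch.rand(1, 3, 256, 256)
z = E(x)                                    # Equation (2): z = E(x)
x_rec = D(z)                                # Equation (3): x~ = D(E(x))
```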

To accelerate the generation speed, the Latent Consistency Model (LCM) [17] was proposed to optimize the step of denoising inference.

2.3 3D Generative Models↩︎

In the field of three-dimensional shape modeling, implicit functions are commonly represented as an Occupancy Field, a Signed Distance Function (SDF), or an Unsigned Distance Function (UDF), along with the recently emerging Neural Radiance Fields (NeRF).

2.3.0.1 3D Shape Representation

Representation in 3D visual problems can generally be divided into four categories: voxel-based, point cloud-based, mesh-based, and implicit representation-based.

Voxel. As shown in Figure 5 (a), the voxel format describes a 3D object as a fixed-size matrix of volume occupancy. Researchers [18] adopted voxel representation for the generation of 3D shapes. The voxel format requires high resolution to describe fine-grained details, so the computational cost explodes as the shape resolution increases. The reconstruction results of voxel-based research are therefore limited in resolution and do not provide topological guarantees or represent sharp features.

Point Cloud. As shown in Figure 5 (b), point clouds are a lightweight 3D representation composed of \((x, y, z)\) coordinates and are a natural way to represent shapes. PointNet [19] extracts global shape features using a max-pooling operation and is widely used as an encoder for point-based generative networks [20]. However, point clouds do not represent topology and are unsuitable for generating watertight surfaces.

Mesh. As shown in Figure 5 (c), meshes are widely used and are constructed from vertices and faces. The work in [21] deforms a pre-defined template with graph convolutions, which restricts the output to a fixed topology. Recently, meshes have been used to represent shapes in deep learning techniques [22]. Although meshes are better suited to describing the topological structure of objects, they usually require advanced preprocessing steps.

Implicit. As shown in Figure 5 (d), an implicit representation describes a surface as the zero level set of a volume function \(\psi : \mathbb{R}^3 \to \mathbb{R}\). A 3D shape can be represented as a level set of a deep network that maps 3D coordinates to a signed distance [23] or an occupancy value [24]. Implicit representations are lightweight and continuous, with no resolution limits.

Figure 5: Representation examples of 3D shapes from [24]: (a) voxel, (b) point cloud, (c) mesh, and (d) implicit.

2.3.0.2 Occupancy Field

Occupancy Field is one of the implicit function methods based on deep learning [24]. Occupancy Field assigns binary values to each point in three-dimensional space, determining whether the point is occupied by an object. This approach utilizes neural networks to learn the representation of occupancy fields, facilitating highly detailed three-dimensional reconstruction. The advantage of Occupancy Field lies in its dynamic modeling of object occupancy in scenes, making it suitable for handling complex three-dimensional environments.

Figure 6: DeepSDF [23] representation applied to the Stanford Bunny: (a) depiction of the underlying implicit surface SDF = 0, trained on points sampled inside (SDF < 0) and outside (SDF > 0) the surface; (b) 2D cross-section of the signed distance field; (c) rendered 3D surface recovered from SDF = 0. Note that (b) and (c) are recovered via DeepSDF.

Figure 7: An overview of the NeRF scene representation and differentiable rendering procedure [25]: images are synthesized by sampling 5D coordinates (location and viewing direction) along camera rays (a), feeding those locations into an MLP to produce a color and volume density (b), and using volume rendering techniques to composite these values into an image (c); the residual between synthesized and ground-truth observed images is then minimized (d).

SDF. Building upon the Occupancy Field, the Signed Distance Function (SDF) has become a crucial direction in implicit function representation within deep learning. SDF assigns a signed distance value to each point, indicating the shortest distance from the point to the object’s surface; positive values signify points outside the object, while negative values indicate points inside it. As shown in Figure 6, DeepSDF [23] provides an end-to-end approach for continuous SDF learning, enabling precise modeling of irregular shapes and local geometry.
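A minimal DeepSDF-style sketch is shown below: a small MLP regresses the analytic signed distance of a unit sphere, which stands in for a learned shape; the per-shape latent code used by DeepSDF is omitted for brevity.

```python
# Minimal DeepSDF-style sketch: an MLP regresses the signed distance of 3D
# points to a sphere (an analytic stand-in for a real training shape).
import torch
import torch.nn as nn

sdf_net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 1))
opt = torch.optim.Adam(sdf_net.parameters(), lr=1e-3)

def sphere_sdf(p, radius=0.5):
    return p.norm(dim=-1, keepdim=True) - radius    # negative inside, positive outside

for _ in range(2000):
    pts = torch.rand(1024, 3) * 2 - 1               # sample points in [-1, 1]^3
    loss = (sdf_net(pts) - sphere_sdf(pts)).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The reconstructed surface is the zero level set: points p with sdf_net(p) ~ 0.
```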

UDF. UDF and SDF are two distinct yet interrelated implicit function representation approaches. UDF assigns an unsigned distance value to each point, representing the distance to the nearest surface without considering surface direction. UDF is particularly useful for capturing more intuitive surface distance information without involving directional aspects. Zhao et al. [26] contribute significantly by jointly exploring the learning of both signed and unsigned distance functions. This approach aims to enrich the expressiveness of implicit functions, simultaneously capturing intricate details through both signed and unsigned distance information.

NeRF. Neural Radiance Fields (NeRF) [25] have revolutionized the field of computer vision and graphics by introducing a novel approach to scene representation, as shown in Figure 7. At the heart of NeRF lies the concept of representing a scene as a continuous function capturing radiance information at every point. The fundamental equation driving NeRF is the rendering equation, which mathematically formulates the observed radiance along a viewing ray. The NeRF formulation is expressed as:

\[C(\mathbf{p}) = \int T(\mathbf{p}_t) \cdot \sigma(\mathbf{p}_t) \cdot L(\mathbf{p}_t, -\mathbf{d}) \, d\mathbf{p}_t\]

Where \(C(\mathbf{p})\) represents the observed color at point \(\mathbf{p}\), \(\mathbf{p}_t\) represents points along the viewing ray, \(T(\mathbf{p}_t)\) is the transmittance function, \(\sigma(\mathbf{p}_t)\) represents volume density, and \(L(\mathbf{p}_t, -\mathbf{d})\) represents emitted radiance. NeRF introduces an implicit representation, enabling the encoding of detailed and continuous volumetric information. This allows for high-fidelity reconstruction and rendering of scenes with fine-scale structures, surpassing the limitations of explicit representations. Recently, 3D Gaussian Splatting [27] was introduced, projecting 3D information onto the 2D domain with Gaussian kernels and achieving better rendering performance than NeRF.
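In practice the integral above is evaluated by discrete quadrature along each ray; the sketch below illustrates that compositing step with a random stand-in for the radiance field.

```python
# Sketch of the discrete volume-rendering quadrature used by NeRF: given
# densities sigma_i and colors c_i sampled along a ray, accumulate them into a
# pixel color. The radiance values here are random stand-ins, not a trained MLP.
import torch

def render_ray(sigmas, colors, deltas):
    """sigmas: (N,), colors: (N, 3), deltas: (N,) distances between samples."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)                 # opacity per sample
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)  # T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(dim=0)              # composited RGB

N = 64
sigmas = torch.rand(N) * 5.0        # would come from the MLP density sigma(p_t)
colors = torch.rand(N, 3)           # would come from the MLP radiance L(p_t, -d)
deltas = torch.full((N,), 1.0 / N)  # uniform sample spacing along the ray
pixel_rgb = render_ray(sigmas, colors, deltas)
```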

2.4 Foundation Models↩︎

In computer science, foundation models, also called large-scale models, are deep learning models with numerous parameters and intricate structures, applied particularly to natural language processing and computer vision tasks. These models demand substantial computational resources for training but exhibit exceptional performance across diverse tasks. The evolution from basic neural networks to sophisticated diffusion models, as depicted in Figure 8, illustrates the continuous quest for more robust and adaptable AI systems.

Figure 8: The evolution of prominent large-scale models in computer science.

2.4.1 Large Language Models (LLM)↩︎

Transformer. The Transformer model, which has achieved remarkable success in natural language processing (NLP), consists of several components: an encoder, a decoder, positional encoding, and the final linear and softmax layers. Both the encoder and decoder are composed of multiple identical layers, each containing attention sublayers and feedforward network sublayers. Additionally, positional encoding is used to inject positional information into the text embeddings, indicating the position of words within the sequence. Notably, the Transformer has paved the way for two prominent model families: Bidirectional Encoder Representations from Transformers (BERT) [28] and the Generative Pre-trained Transformer (GPT) [29]. The main difference is that BERT is based on bidirectional pre-training and fine-tuning, while GPT is based on autoregressive pre-training and prompting.
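The attention sublayer at the core of these models can be sketched compactly; the tensor sizes below are illustrative.

```python
# Sketch of scaled dot-product attention, the core operation of the Transformer:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import math
import torch

def attention(Q, K, V, mask=None):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:                                 # e.g. a causal mask in GPT
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

x = torch.randn(2, 10, 64)            # (batch, sequence length, model dimension)
out = attention(x, x, x)              # self-attention over the sequence
```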

GPT. GPT aims to pre-train models using large-scale unsupervised learning to facilitate understanding and generation of natural language. The training process involves two primary stages: Initially, a language model is trained in an unsupervised manner on extensive corpora without task-specific labels or annotations. Subsequently, supervised fine-tuning occurs during the second stage, catering to specific application domains and tasks.

BERT. BERT has emerged as a breakthrough approach, achieving state-of-the-art performance across diverse language tasks. BERT’s training methodology comprises two key stages: pre-training and fine-tuning. Pre-training involves the utilization of extensive text corpora to train the language model. The primary objective of pre-training is to endow the BERT model with robust language understanding capabilities, enabling it to effectively tackle various natural language processing tasks. Subsequently, fine-tuning utilizes the pre-trained BERT model in conjunction with smaller labeled datasets to refine the model parameters. This process facilitates the customization of the model to specific tasks, thereby enhancing its suitability and performance for targeted applications.

In recent years, LLMs have witnessed explosive and rapid growth. Basic language models refer to models that are only pre-trained on large-scale text corpora, without any fine-tuning. Examples of such models include LaMDA[30] and OpenAI’s GPT-3[31].

2.4.2 Large Vision Models↩︎

In computer vision, pretrained vision-language models like CLIP[32] have demonstrated powerful zero-shot generalization performance across various downstream visual tasks. These models are typically trained on hundreds of millions to billions of image-text pairs collected from the web. In addition, some research efforts also focus on large-scale base models conditioned on visual input prompts. For example, SAM[33] can perform category-agnostic segmentation from given images and visual prompts (such as boxes, points, or masks).
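As an illustration of zero-shot use, the following hedged sketch scores an image against free-form text labels with a pretrained CLIP checkpoint through the Hugging Face transformers library; the checkpoint name, image file, and labels are assumptions for the example.

```python
# Hedged sketch of CLIP zero-shot classification via the transformers library;
# the checkpoint and the local image file are assumed to be available.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("building.jpg")                     # any local photograph
labels = ["a brutalist apartment block", "a glass office tower", "a timber pavilion"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)   # similarities -> probabilities
print(dict(zip(labels, probs[0].tolist())))
```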

Current generative models based on diffusion present unprecedented understanding and creative capabilities. Stable Diffusion [9] uses the CLIP [32] text encoder and can be steered through text prompts; its diffusion process starts from random noise and gradually denoises until a complete data sample is generated. DALL-E 3 [34] applies the diffusion model to massive data to generate impressive results, and Midjourney excels at adapting to actual artistic styles to create images with any combination of effects the user desires.
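For reference, a hedged sketch of prompting Stable Diffusion through the diffusers library is given below; the checkpoint name and prompt are illustrative, and a CUDA-capable GPU is assumed.

```python
# Hedged sketch of text-to-image generation with Stable Diffusion via diffusers;
# the checkpoint name and prompt are assumptions for illustration only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a tapering brick apartment building with terraced balconies, golden hour"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("concept.png")
```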

Figure 9: Example of generated results from generative AI models.

2.5 Applications of Generative AI↩︎

In this section, we introduce widely used applications of generative AI, including image generation (Section 2.5.1), video generation (Section 2.5.2), and 3D model generation (Section 2.5.3). Furthermore, we present results from these models in Figure 9 as illustrative references.

2.5.1 Image Generation↩︎

2.5.1.1 Text-to-image

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision; StackGAN [35] proposed a two-stage model to address this issue. In the first stage, StackGAN generates the primitive shape and colors of the object based on the given text description, yielding initial low-resolution images. In the second stage, StackGAN takes the low-resolution result and text prompts as inputs and generates high-resolution images with photo-realistic details; it can rectify defects in the first-stage results and add exhaustive details during the refinement process. GLIDE [36] extends the core concepts of the diffusion model by adding text information to enhance the training process, ultimately generating text-conditioned images. With the release of LDM [9], Stable Diffusion, which builds on LDM and massive training data, has also emerged; follow-up works cover areas such as image editing and more powerful 3D generation, further advancing image generation and bringing it closer to human needs.

2.5.1.2 Image-to-image

Image-to-image translation can convert the content in an image from one image domain to another, that is, cross-domain conversion between images.

Sketch. The objective of sketch-to-image generation is to ensure that the generated image maintains consistency in both appearance and context with the provided hand-drawn sketch. Pix2Pix[13] stands out as a classic GAN model capable of handling diverse image translation tasks, including the transformation of sketches into fully realized images. In addition, SketchyGAN[37] focuses on the sketch-to-image generation task and aims to achieve more diversity and realism. Currently, ControlNet[38] can control diffusion models by adding extra conditions. The sketch-to-image generation tasks are applied in both photo-realistic and anime-cartoon styles[39], [40].
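A hedged sketch of sketch-conditioned generation with ControlNet through the diffusers library is given below; the checkpoint names, input file, and prompt are assumptions, and the conditioning image is expected to be a line drawing or edge map.

```python
# Hedged sketch of conditioning Stable Diffusion on a sketch/edge map with
# ControlNet via diffusers; checkpoint names and the input file are assumptions.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("facade_sketch.png")    # line drawing / edge map of a facade
image = pipe("a concrete facade with deep window reveals", image=sketch).images[0]
image.save("facade_render.png")
```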

Layout. Layout typically encompasses details such as the position, size, and relative relationships of individual objects. Layout2Im[41] is designed to take a coarse spatial layout, consisting of bounding boxes and object categories, for generating a set of realistic images. These images accurately depict the specified objects in their intended locations. To enhance the global attention in context, He et al.[42] introduced the Context Feature Conversion Module to ensure that the generated feature encoding for objects remains aware of other coexisting objects in the scene. As for diffusion models, GLIGEN[43] facilitates grounded text-to-image generation in open worlds using prompts and bounding boxes as condition inputs.

Scene Graph. The scene graph was first proposed and utilized for image generation in 2018 [44] to enable explicit reasoning about objects and their relationships. Thereafter, Sortino et al. [45] proposed a model that satisfies semantic constraints defined by a scene graph and models relations between visual objects in the scene by taking into account a user-provided partial rendering of the desired target. Currently, SceneGenie [46] combines scene graphs with advanced diffusion models to generate high-quality images, enforcing geometric constraints during sampling using the bounding box and segmentation information predicted from the scene graph.

2.5.2 Video Generation↩︎

Since text prompts provide only discrete tokens, text-to-video generation is more difficult than tasks such as image retrieval and image captioning. The Video Diffusion Model [47] is the first work to use diffusion models for video generation; it proposes a 3D U-Net that can be applied to variable sequence lengths and can thus be jointly trained on video and image modeling objectives, making it suitable for video generation tasks. Additionally, Make-A-Video [48] builds on a pre-trained text-to-image model and adds one-dimensional convolution and attention layers in the time dimension to transform it into a text-to-video model; by learning the connection between text and vision through the T2I model, single-modal video data is utilized to learn the generation of temporally dynamic content. Furthermore, the controllability and consistency of video generation models have also garnered increased attention from researchers. PIKA [49] supports dynamic transformations of elements in the scene based on prompts, without causing the overall image to collapse. DynamiCrafter [50] utilizes pre-trained video diffusion priors to add animation effects to static images based on textual prompts; it supports high-resolution models, providing better dynamic effects, higher resolution, and stronger consistency.

2.5.3 3D Model Generation↩︎

2.5.3.1 Text-to-3D

Recent advancements in text-to-3D synthesis have demonstrated remarkable progress, with researchers employing various sophisticated strategies to bridge the gap between natural language descriptions and the creation of detailed 3D content. The pioneering work DreamFusion[51] harnesses a pre-trained 2D text-to-image diffusion model to generate 3D models without large-scale labeled 3D datasets or specialized denoising architectures. Magic3D[52] improves upon DreamFusion’s[51] limitations by implementing a two-stage coarse-to-fine approach, accelerating the optimization process through a sparse 3D representation before refining it into high-resolution textured meshes via a differentiable renderer.

2.5.3.2 Image-to-3D

Recent 3D reconstruction techniques particularly focus on generating and reconstructing three-dimensional objects and scenes from a single image or a few images. NeRF [53] represents a state-of-the-art technique in which complex scene representations are modeled as continuous neural radiance fields optimized from sparse input views. CLIP-NeRF [54], leveraging the joint language-image embedding space of the CLIP model, proposes a unified framework that allows manipulating NeRF in a user-friendly way, using either a short text prompt or an exemplar image. DreamCraft3D [55] introduces a hierarchical process for 3D content creation that employs bootstrapped score distillation sampling from a view-dependent diffusion model; this two-step method refines textures through personalized diffusion models trained on augmented scene renderings, thereby delivering high-fidelity, coherent 3D objects. Magic123 [56] offers a two-stage solution for generating high-quality textured 3D meshes from unposed wild images, optimizing a neural radiance field for coarse geometry and fine-tuning details using differentiable mesh representations guided by both 2D and 3D diffusion priors.

3 Generative AI for Architectural Design↩︎

This study delineates architectural design into six main steps to facilitate a convenient understanding of the process and essence of architectural design. The output of each step is generated based on the project’s objective conditions and the architects’ subjective intentions. Objective conditions (O) include factors such as site area, building height restrictions, and construction standards that must be adhered to by all architects. Subjective intentions (S) refer to the individual architect’s design concept, architectural style, and other subjective preferences. This study explores how generative AI can assist with preliminary design, layout design, structural design, 3D form design, facade design, and imagery expression based on objective conditions and subjective intentions. It also presents a statistical analysis of the generative AI models used in each architectural step and the tasks they accomplish.

3.1 Architectural Preliminary 3D Forms Design↩︎

To begin with, creating a preliminary 3D architecture model involves considering objective factors such as the building’s type and function, site conditions, and surrounding environment, and subjective factors such as design concepts and morphological intentions. This process can be expressed by Equation (4).

\[F_{\text{P-3D}} = \left\{ y_{\text{P-3D}} \mid y_{\text{P-3D}} \in \bigcap_{i=1}^{4} f_{\text{P-3D}}(o_{\text{P-3D}}^i) \cap f_{\text{P-3D}}(S_{\text{P-3D}}) \right\}\]

Where \(y_{\text{P-3D}}\) is the generated preliminary 3D model of the architecture, \(F_{\text{P-3D}}\) is the collection of all the options. \(O_{\text{P-3D}}\) refers to the Objective conditions of the preliminary design, which includes design tasks (\(o_{\text{P-3D}}^1\)), such as building functions, building area, building height restrictions, and the number of occupants; site conditions (\(o_{\text{P-3D}}^2\)), such as the red line of the site, the shape of the boundaries; surroundings conditions (\(o_{\text{P-3D}}^3\)), such as nearby traffic arteries, neighbor buildings; and environmental performance (\(o_{\text{P-3D}}^4\)), such as the daylighting, wind and thermal environment. \(S_{\text{P-3D}}\) refers to the Subjective intentions of the preliminary design.
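To illustrate how Equation (4) can be read, the sketch below treats each objective condition and the subjective intention as a predicate over candidate forms and keeps only the candidates that satisfy all of them; the candidate attributes, thresholds, and predicates are hypothetical placeholders, not values from the reviewed works.

```python
# Illustrative reading of Equation (4): each objective condition o^i and the
# subjective intention S act as a filter on candidate 3D forms, and F_P-3D is
# the set of candidates passing every filter. All values below are hypothetical.
candidates = [
    {"name": "tapered slab", "height_m": 28, "floor_area_m2": 4200, "daylight_ok": True},
    {"name": "perimeter block", "height_m": 45, "floor_area_m2": 5100, "daylight_ok": True},
    {"name": "deep-plan bar", "height_m": 24, "floor_area_m2": 4600, "daylight_ok": False},
]

constraints = [
    lambda c: c["height_m"] <= 30,          # o^1: height restriction from the design task
    lambda c: c["floor_area_m2"] >= 4000,   # o^2: site/area requirement
    lambda c: c["daylight_ok"],             # o^4: environmental performance
    lambda c: "tapered" in c["name"],       # S:  subjective intention (tapering massing)
]

F_P3D = [c for c in candidates if all(f(c) for f in constraints)]
print([c["name"] for c in F_P3D])           # -> ['tapered slab']
```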

Figure 10: Architectural preliminary 3D forms design process.

Table 1: Application of Generative AI in the preliminary 3D forms design of architecture
Data Transformation Approach Paper & Methodology
\(parameters\) to \(F_{\text{P-3D}}\) VAE [57]; GAN, VAE [58]; 3D-DDPM [59]; GANs [60]; 3D-GAN, CPCGAN [61]
\(classify\) \(F_{\text{P-3D}}\) VAE [62]; 3D-AAE [63]
\(S_{\text{P-3D\_text}}\) to \(F_{\text{P-3D}}\) CVAE [64]
\(R_{\text{P-3D\_sketch}}\) to \(F_{\text{P-3D}}\) VAE, GAN [65]
\(R_{\text{P-3D\_2d}}\) to \(R_{\text{P-3D\_2d}}\) pix2pix [66], [67]; DCGAN [68]; pix2pix, CycleGAN [69], [70]
(\(o_{\text{P-3D}}^2\) + \(o_{\text{P-3D}}^3\)) to \(F_{\text{P-3D}}\) pix2pix [71]; ESGAN[72]
(\(S_{\text{P-3D}}\) + \(o_{\text{P-3D}}^3\)) to \(F_{\text{P-3D}}\) cGAN [73]; GAN [74]
\(F_{\text{P-3D}}\) to \(F_{\text{P-3D}}\) TreeGAN [75]; DDPM [76]
(\(F_{\text{P-3D}}\) + \(o_{P-3D}^2\)) to \(o_{P-3D}^4\) VAE [77]; pix2pix, cycleGAN [78]

To elucidate the specific architectural design process, this paper takes the Bo-DAA apartment project in Seoul, South Korea, as an example. The project requirements include multiple residential units and shared public spaces, encompassing a communal workspace, lounge, shared kitchen, laundry room, and pet bathing area (\(o_{\text{P-3D}}^1\)). The site is a regular rectangle with flat terrain (\(o_{\text{P-3D}}^2\)), located in an urban setting surrounded by multi-story residential buildings (\(o_{\text{P-3D}}^3\)). To enhance resident comfort, the design considered lighting and views for each residential unit (\(o_{\text{P-3D}}^4\)). Based on these requirements, the architect chose "Book House" as the design concept (\(S_{\text{P-3D}}\)), creating a preliminary 3D form (\(F_{\text{P-3D}}\)) that gradually tapers from bottom to top. This design provides excellent lighting and views for each residential unit level. This process is illustrated in Figure 10.

The applications of generative AI in this process fall into four main categories, as shown in Table 1: generating \(F_{\text{P-3D}}\) based on parameters or classifying \(F_{\text{P-3D}}\); generating \(F_{\text{P-3D}}\) based on 2D images or 1D text (usually from \(o_{\text{P-3D}}^1\), \(o_{\text{P-3D}}^2\), \(o_{\text{P-3D}}^3\), \(S_{\text{P-3D}}\), \(S_{\text{P-3D\_text}}\), \(R_{\text{P-3D\_2d}}\), and \(R_{\text{P-3D\_sketch}}\)); generating or redesigning \(F_{\text{P-3D}}\) based on 3D model data (usually from \(F_{\text{P-3D}}\)); and generating environmental performance evaluations (usually \(o_{\text{P-3D}}^4\)) based on 3D model data (usually from \(F_{\text{P-3D}}\)).

Firstly, generative AI can generate preliminary 3D forms from input parameters or conduct classification analysis on preliminary 3D models. Variational Autoencoders (VAE) play a pivotal role in reconstructing and generating detailed 3D models (\(F_{\text{P-3D}}\)) from a set of input parameters (\(parameters\) to \(F_{\text{P-3D}}\)) [57]. Building upon this, Generative Adversarial Networks (GAN) further refine the process by training on the point coordinate data of 3D models and utilizing category parameters for more precise reconstructions (\(parameters\) to \(F_{\text{P-3D}}\)) [60]. This approach also facilitates the creation of innovative architectural 3D forms through input interpolation (\(parameters\) to \(F_{\text{P-3D}}\)) [58]. In addition, diffusion probability models offer a unique method of training on Taihu stone and architectural 3D models; this training enables the discovery of transitional forms between two distinct 3D models by employing interpolation as an input mechanism (\(parameters\) to \(F_{\text{P-3D}}\)) [59]. The Structure GAN model, focusing on point cloud data, enables the generation of 3D models based on specific input parameters such as length, width, and height (\(parameters\) to \(F_{\text{P-3D}}\)) [61]. In a further enhancement to the modeling process, VAE is also utilized for the in-depth training of 3D models (\(F_{\text{P-3D}}\)), allowing a comprehensive classification and analysis of the models’ distribution within the latent space and paving the way for more nuanced model creation (\(classify\) \(F_{\text{P-3D}}\)) [62]. Generative AI techniques such as the 3D Adversarial Autoencoder model are employed for the training and generation of point cloud representations, facilitating the reconstruction and classification of architectural forms (\(classify\) \(F_{\text{P-3D}}\)) [63].

Secondly, 1D text data or 2D image data are used as generation conditions for generative AI to produce preliminary 3D forms. Variational Autoencoders (VAE) are applied to train and generate 3D voxel models guided by textual labels (\(S_{\text{P-3D\_text}}\) to \(F_{\text{P-3D}}\)) [64]. The integration of VAE and GAN models also facilitates the generation of architectural 3D forms from sketches (\(R_{\text{P-3D\_sketch}}\) to \(F_{\text{P-3D}}\)) [65]. Because training on 3D data is more difficult than training on 2D image data, researchers have transformed 3D forms into 2D representations, such as grayscale images enriched with elevation data. This approach simplifies the training process, enhancing efficiency for architectural forms in specific regions and facilitating the generation of 3D models influenced by the surrounding environment (\(R_{\text{P-3D\_2d}}\) to \(R_{\text{P-3D\_2d}}\)) [67][70]. Moreover, the practice of converting 3D models into 2D images for reconstruction, followed by reverting these 2D images back to 3D forms, significantly reduces both training duration and costs while ensuring accurate restoration of the original 3D models (\(R_{\text{P-3D\_2d}}\) to \(R_{\text{P-3D\_2d}}\)) [66]. In other generative AI training strategies, researchers incorporate parameters such as the design site’s scope (\(o_{\text{P-3D}}^2\)) and characteristics of the immediate environment (\(o_{\text{P-3D}}^3\)) as generative conditions, enabling the creation of preliminary 3D models that adhere to predefined rule settings (\(o_{\text{P-3D}}^2\) + \(o_{\text{P-3D}}^3\) to \(F_{\text{P-3D}}\)) [71], [72]. Furthermore, researchers can create architectural 3D models from design concept sketches (\(S_{\text{P-3D}}\) to \(F_{\text{P-3D}}\)) [73], and even from a single concept sketch in conjunction with environmental data (\(S_{\text{P-3D}}\) + \(o_{\text{P-3D}}^3\) to \(F_{\text{P-3D}}\)) [74].

Thirdly, 3D models are used as the basis for generative AI creation, or forms already generated by generative AI are redesigned. TreeGAN is used to train point cloud models of churches, leveraging these models for diverse redesign applications (\(F_{\text{P-3D}}\) to \(F_{\text{P-3D}}\)) [75]. Additionally, diffusion probability models are instrumental in training 3D models, introducing noise into 3D models to create novel forms (\(F_{\text{P-3D}}\) to \(F_{\text{P-3D}}\)) [76]. Lastly, generative AI is utilized to conduct site and architectural environmental performance evaluations based on 3D models, generating images for assessments such as view analysis, sunlight exposure, and daylighting rates (\(F_{\text{P-3D}}\) + \(o_{P-3D}^2\) to \(o_{P-3D}^4\)) [77], [78].

3.2 Architectural Plan Design↩︎

Architectural plan design, the second phase in the architectural design process, involves creating horizontal section views at specific site elevations. Guided by objective conditions and subjective decisions, this step includes arranging spatial elements like walls, windows, and doors into a 2D plan. This process can be expressed by Equation (5).

\[\begin{align} F_{\text{Plan}} = \left\{ y_{\text{Plan}} \mid y_{\text{Plan}} \in \bigcap_{i=1}^{3} f_{\text{Plan}}(o_{\text{Plan}}^i) \right.\\ \left. \cap \bigcap_{j=1}^{2} f_{\text{Plan}}(s_{\text{Plan}}^j) \right\} \end{align}\]

Figure 11: Architectural plan design process.

Where \(y_{\text{Plan}}\) is the generated architectural plan design and \(F_{\text{Plan}}\) is the collection of all the options. \(O_{\text{Plan}}\) refers to the Objective conditions of the architectural plan design, which include the preliminary architectural 3D form design (\(o_{\text{Plan}}^1\)), the result of the prior design phase; spatial requirements and standards (\(o_{\text{Plan}}^2\)), such as space area and quantity needs; and spatial environmental performance evaluations (\(o_{\text{Plan}}^3\)), such as room daylighting ratio and ventilation rate. \(S_{\text{Plan}}\) refers to the Subjective intentions of the architectural plan design, which include the functional space layout (\(s_{\text{Plan}}^1\)), indicating the size and layout of functional spaces, and spatial sequences (\(s_{\text{Plan}}^2\)), such as bubble diagrams and sequence schematics.

By accumulating the plan design results of each layer, the overall plan design outcome is obtained, represented as Equation (6): \[R_{\text{Plan}} = \sum_{i=1}^{n} F_{\text{Plan}}^i\]

Using the Bo-DAA apartment project as an example, architects first create a preliminary 3D model (\(o_{\text{Plan}}^1\)) to outline each floor’s plan based on the model’s elevation contours. They then design functional spaces (\(s_{\text{Plan}}^1\)) according to spatial requirements (\(o_{\text{Plan}}^2\)), such as evacuation distances and space area needs, positioning public areas on the lower floors and residential units above. Spatial sequences (\(s_{\text{Plan}}^2\)) are structured using corridors and atriums to align with the layout. Environmental evaluations (\(o_{\text{Plan}}^3\)) are also conducted to ensure spatial performance. This leads to a comprehensive architectural plan (\(R_{\text{Plan}}\)) that meets all established constraints. This process is shown in Figure 11.

Table 2: Application of generative AI in the architectural plan design
Data Transformation Approach Paper & Methodology
\(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\) to \(F_{\text{Plan}}\) GANs [79][83];
\(s_{\text{Plan}}^1\) to \(F_{\text{Plan}}\) pix2pix [84];
\(s_{\text{Plan}}^2\) to \(F_{\text{Plan}}\) Graph2Plan [85]; pix2pix [86]; CycleGAN [87]
\(F_{\text{Plan}}\) to \(F_{\text{Plan}}\) pix2pix [88]; GANs [89]
\(o_{\text{Plan}}^3\) to \(F_{\text{Plan}}\) pix2pix [90]
(\(s_{\text{Plan}}^1 + o_{\text{P-3D}}^4\)) to \(s_{\text{Plan}}^1\) Genetic-Algorithm, FCN [91]
\(s_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\) GNN, VAE [92]
\(o_{\text{Plan}}^2\) to \(s_{\text{Plan}}^1\) CoGAN [93]
(\(s_{\text{Plan}}^1 + o_{\text{P-3D}}^3\)) to \(s_{\text{Plan}}^1\) GANs [94][96]; CNN, pix2pixHD [94];
(\(s_{\text{Plan}}^1 + o_{\text{Plan}}^2\)) to \(s_{\text{Plan}}^1\) Transformer [97]
\(s_{\text{Plan}}^2\) to \(s_{\text{Plan}}^1\) GANs [98][105]; Transformer [106]; DM [107]
\(o_{\text{P-3D}}^2\) to \(s_{\text{Plan}}^1\) pix2pix [104], [108], [109]; GauGAN [110]
\(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\) GANs [111]; StyleGAN, Graph2Plan, RPLAN [112]; pix2pix [113]
\(R_{\text{Plan}}\) to \(s_{\text{Plan}}^2\) EdgeGAN [114]; cGAN [115]
\(o_{\text{Plan}}^3\) to \(s_{\text{Plan}}^2\) DCGAN [116]; VQ-VAE, GPT [117]
\(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^2\) DDPM [118]
\(F_{\text{Plan}}\) to \(o_{\text{Plan}}^3\) cGAN [119]
\(s_{\text{Plan}}^1\) to \(o_{\text{Plan}}^3\) pix2pix [120]; pix2pix [121]

The applications of generative AI in plan design include four main categories, as shown in Table 2: generating the floor plan \(F_{\text{Plan}}\) from 2D images (usually from \(o_{\text{Plan}}^1\), \(o_{\text{Plan}}^3\), \(s_{\text{Plan}}^1\), \(s_{\text{Plan}}^2\), and \(F_{\text{Plan}}\)); generating the functional space layout \(s_{\text{Plan}}^1\) from 2D images (usually from \(s_{\text{Plan}}^1\), \(s_{\text{Plan}}^2\), \(o_{\text{Plan}}^1\), \(o_{\text{Plan}}^2\), \(o_{\text{P-3D}}^2\), \(o_{\text{P-3D}}^3\), and \(o_{\text{P-3D}}^4\)); generating spatial sequences \(s_{\text{Plan}}^2\) from 2D images (usually from \(R_{\text{Plan}}\), \(o_{\text{Plan}}^1\), and \(o_{\text{Plan}}^3\)); and generating spatial environmental performance evaluations \(o_{\text{Plan}}^3\) from 2D images (usually from \(F_{\text{Plan}}\) and \(s_{\text{Plan}}^1\)).

Firstly, in terms of generating architectural floor plans, researchers can create functional space layout diagrams from preliminary design range schematics or site range schematics and then generate the final architectural plan based on these layouts, progressing from \(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\), and finally to \(F_{\text{Plan}}\) [79][83]. Architectural floor plans can also be generated directly from functional space layout diagrams (\(s_{\text{Plan}}^1\) to \(F_{\text{Plan}}\)) [84]. Additionally, researchers utilize generative models to convert planar spatial bubble diagrams or spatial sequence diagrams into latent spatial vectors, which are then used to generate architectural floor plans (\(s_{\text{Plan}}^2\) to \(F_{\text{Plan}}\)) [85][87]. Moreover, by utilizing GAN models, architectural floor plans can be further refined to obtain floor plans with furniture layouts (\(F_{\text{Plan}}\) to \(F_{\text{Plan}}\)) [88]. Some reconstruction and generation processes are achieved through training on architectural floor plans (\(F_{\text{Plan}}\) to \(F_{\text{Plan}}\)) [89]. Floor plans can also be generated based on spatial environmental evaluations, such as lighting and wind conditions (\(o_{\text{Plan}}^3\) to \(F_{\text{Plan}}\)) [90].

Secondly, generative AI is not limited to producing architectural floor plans; it also plays various roles in the generation of functional space layouts (\(s_{\text{Plan}}^1\)). For instance, it can combine neural networks and genetic algorithms to enhance functional layouts based on wind environment performance evaluations (\(s_{\text{Plan}}^1 + o_{\text{P-3D}}^4\) to \(s_{\text{Plan}}^1\)) [91]. Moreover, generative AI can reconstruct and produce matching functional layout diagrams based on the implicit information within functional space layout maps (\(s_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\)) [92]. Furthermore, it can generate viable functional layouts according to spatial requirements (\(o_{\text{Plan}}^2\) to \(s_{\text{Plan}}^1\)) [93]. Additionally, it can predict and generate functional space layouts based on surrounding environmental performance evaluations (\(s_{\text{Plan}}^1 + o_{\text{P-3D}}^3\) to \(s_{\text{Plan}}^1\)) [94][96], [122]. Similarly, generative AI can complement or augment incomplete functional space layouts based on specific demands (\(s_{\text{Plan}}^1 + o_{\text{Plan}}^2\) to \(s_{\text{Plan}}^1\)) [97], [123], and it can use spatial sequences to generate functional space layout diagrams (\(s_{\text{Plan}}^2\) to \(s_{\text{Plan}}^1\)) [98][107]. In addition, it can generate functional space layout diagrams based on the designated red-line boundary of a design site (\(o_{\text{P-3D}}^2\) to \(s_{\text{Plan}}^1\)) [104], [108][110], and it can use plan design boundaries as conditions to generate functional space layout diagrams (\(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^1\)) [111][113].

Thirdly, generative AI demonstrates strong performance in the generation and prediction of spatial sequences (\(s_{\text{Plan}}^2\)). Specifically, it is capable of identifying and reconstructing wall layout sequences from floor plans (\(R_{\text{Plan}}\) to \(s_{\text{Plan}}^2\)) [114], [115]. Additionally, it can construct spatial sequence bubble diagrams directly from these floor plans (\(R_{\text{Plan}}\) to \(s_{\text{Plan}}^2\)) [124]. Moreover, generative AI can employ isovists to predict spatial sequences (\(o_{\text{Plan}}^3\) to \(s_{\text{Plan}}^2\)) [116], [117]. Lastly, it is also capable of producing these diagrams conditioned on specific plan design boundary ranges (\(o_{\text{Plan}}^1\) to \(s_{\text{Plan}}^2\)) [118].

Lastly, generative AI can predict spatial environmental performance evaluations from floor plans (\(F_{\text{Plan}}\) to \(o_{\text{Plan}}^3\)) [119], such as light exposure and isovist ranges. It can also predict indoor brightness [120] and daylight penetration [121] from functional space layout diagrams (\(s_{\text{Plan}}^1\) to \(o_{\text{Plan}}^3\)).

Figure 12: Architectural structural system design process.

Table 3: Application of Generative AI in Architectural Structural System Design
Data Transformation Approach Paper & Methodology
\(o_{\text{str}}^2\) to \((x_{\text{str}}, y_{\text{str}})\) GANs [125][127]
(\(l_{\text{str\_text}} + o_{\text{str}}^2\)) to \((x_{\text{str}}, y_{\text{str}})\) GANs [128][131]
\((x_{\text{str}}, y_{\text{str}})\) to \((x_{\text{str}}, y_{\text{str}})\) GANs [132]
(\(s_{\text{Plan}}^1 + o_{\text{str}}^2\)) to \((x_{\text{str}}, y_{\text{str}})\) pix2pixHD [133]
(\((x_{\text{str}}, y_{\text{str}}) + d_{\text{str}}\)) to (\((x_{\text{str}}, y_{\text{str}}) + d_{\text{str}}\)) StructGAN-KNWL [134]

3.3 Architectural Structural System Design↩︎

The third phase in the architectural design process, architectural structure system design, involves architects developing the building’s framework and support mechanisms. This process can be expressed by Equation (7). \[F_{\text{str}} = \left\{ y_{\text{str}} \, \middle| \, y_{\text{str}} \in \bigcap_{i=1}^{3} f_{\text{str}}(o_{\text{str}}^i) \cap \bigcap_{j=1}^{2} f_{\text{str}}(s_{\text{str}}^j) \right\}\]

Where \(y_{\text{str}}\) is the generated structure system and \(F_{\text{str}}\) is the collection of all the options. \(O_{\text{str}}\) refers to the objective conditions of the structural system design, which include the structural load distribution (\(o_{\text{str}}^1\)), referring to the schematic of the building’s structural load distribution; the architectural plan design (\(o_{\text{str}}^2\)), the second step of the design process; and the preliminary 3D form of the building (\(o_{\text{str}}^3\)), the result of the first step in the design process. \(S_{\text{str}}\) refers to the subjective decisions of the structure system design, which include structural materials (\(s_{\text{str}}^1\)), typically encompassing parameters characterizing the materials and texture images, and structural aesthetic principles (\(s_{\text{str}}^2\)), usually involving conceptual diagrams and 3D models of the structural form. The design outcome \(y_{\text{str}}\) in the formula encapsulates various structural information, such as structural load capacity (\(l_{\text{str}}\)), structural dimensions (\(d_{\text{str}}\)), and structural layout (\(x_{\text{str}}, y_{\text{str}}\)). This structural information is determined by a set of objective conditions (\(O_{\text{str}}\)) and a set of subjective decisions (\(S_{\text{str}}\)).

Using the Bo-DAA apartment project as an illustration, the architect utilized the preliminary 3D model (\(o_{\text{str}}^3\)) and the architectural plan (\(o_{\text{str}}^2\)) to define the building’s spatial form and structural load distribution (\(o_{\text{str}}^1\)). The architect opted for a frame structure with reinforced concrete as the construction material (\(s_{\text{str}}^1\)), embodying modern Brutalism (\(s_{\text{str}}^2\)). This approach ensured that the final structure (\(R_{\text{str}}\)) adhered to both the aesthetic and the functional constraints. This process is represented in Figure 12.

The applications of generative AI in structural system design primarily involve the prediction of structural layout (\((x_{\text{str}}, y_{\text{str}})\)) and structural dimensions (\(d_{\text{str}}\)).

In the realm of generating architectural structure layout images, generative AI is capable of recognizing architectural floor plans (\(o_{\text{str}}^2\)). Consequently, it leverages this recognition to generate detailed images of the structural layout (\(o_{\text{str}}^2\) to \((x_{\text{str}}, y_{\text{str}})\)) [125][127].

Moreover, this technology is adept at creating structural layout diagrams that correspond to floor plans based on specified structural load capacities (\(l_{\text{str\_text}} + o_{\text{str}}^2\) to \((x_{\text{str}}, y_{\text{str}})\)) [128][131]. Additionally, generative AI can refine and enhance existing structural layouts, optimizing the layout within the same structural space (\((x_{\text{str}}, y_{\text{str}})\) to \((x_{\text{str}}, y_{\text{str}})\)) [132]. Furthermore, generative AI combines functional space layouts (\(s_{\text{Plan}}^1\)) and architectural floor plans (\(o_{\text{str}}^2\)) to create corresponding architectural structure layout diagrams (\(s_{\text{Plan}}^1 + o_{\text{str}}^2\) to \((x_{\text{str}}, y_{\text{str}})\)) [133].

In terms of predicting and generating structural dimensions, generative AI can forecast and create more appropriate structural sizes and layouts based on an existing layout and its dimensions, thereby optimizing these dimensions (\((x_{\text{str}}, y_{\text{str}}) + d_{\text{str}}\) to \((x_{\text{str}}, y_{\text{str}}) + d_{\text{str}}\)) [134]. Furthermore, it can generate dimensions and layouts that meet load requirements based on the structural layout (\(x_{\text{str}}, y_{\text{str}}\)) and load capacity (\(l_{\text{str}}\)) (\((x_{\text{str}}, y_{\text{str}}) + l_{\text{str}}\) to \((x_{\text{str}}, y_{\text{str}}) + d_{\text{str}}\)).
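One simple way to think about the dimension-prediction mapping is as a learned surrogate regression from layout features and load capacity to member sizes. The sketch below is a hypothetical, untrained stand-in; the feature names, units, and network size are assumptions and do not come from the cited works.

```python
# Hypothetical surrogate: (x_str, y_str) layout features + load capacity l_str -> d_str.
import torch
import torch.nn as nn

dimension_regressor = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),  # inputs: grid spacing x, grid spacing y, storeys, load (kN/m^2)
    nn.Linear(64, 2),             # outputs: column size (mm), beam depth (mm)
)

features = torch.tensor([[8.4, 8.4, 6.0, 5.0]])
print(dimension_regressor(features))  # untrained, so outputs are meaningless until fitted to data
```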

3.4 Architectural 3D Forms Refinement and Optimization Design↩︎

The fourth phase of architectural design focuses on refining and optimizing 3D models so that they more closely represent the building’s characteristics based on the initial model. This step enhances detail and form, and the process can be expressed by Equation (8).

\[F_{\text{D-3D}} = \left\{ y_{\text{D-3D}} \, \middle| \, y_{\text{D-3D}} \in \bigcap_{i=1}^{4} f_{\text{D-3D}}(o_{\text{D-3D}}^i) \cap \bigcap_{j=1}^{2} f_{\text{D-3D}}(s_{\text{D-3D}}^j) \right\}\]

Where \(y_{\text{D-3D}}\) is the generated refined 3D model of the architecture and \(F_{\text{D-3D}}\) is the collection of all the options. \(O_{\text{D-3D}}\) refers to the objective conditions of the refined design, which include the requirements (\(o_{\text{D-3D}}^1\)) imposed on the refinement of the architectural 3D form by indicators; the preliminary 3D form design of the architecture (\(o_{\text{D-3D}}^2\)), the result of the first step in the design process; the architectural floor plan design (\(o_{\text{D-3D}}^3\)), the outcome of the second step in the design process; and the architectural structural system design (\(o_{\text{D-3D}}^4\)), the result of the third step in the design process. \(S_{\text{D-3D}}\) refers to the subjective decisions of the refined 3D model of the architecture, which include aesthetic principles (\(s_{\text{D-3D}}^1\)), the principles architects use to control the overall form and proportions of a building, and the design style (\(s_{\text{D-3D}}^2\)), a manifestation of a period’s or region’s specific characteristics and expression methods that can be reflected through elements such as the form, structure, materials, color, and decoration of a building.

Using the Bo-DAA apartment project as an example, the architectural form index (\(o_{\text{D-3D}}^1\)), including key metrics like floor area ratio and height, is first established. Next, the preliminary 3D form (\(o_{\text{D-3D}}^2\)) shapes a tapered volume. In the floor plan phase (\(o_{\text{D-3D}}^3\)), refinements such as a sixth-floor setback for public spaces are made. The structural system design (\(o_{\text{D-3D}}^4\)) guides these modifications within structural principles. Aesthetic principles (\(s_{\text{D-3D}}^1\)) and design styles (\(s_{\text{D-3D}}^2\)) are woven throughout, culminating in a refined 3D form (\(R_{\text{D-3D}}\)) that harmonizes constraints with aesthetics. This process is illustrated in Figure 13.

Figure 13: Architectural 3D forms refinement and optimization design process.

Table 4: Application of Generative AI in Architectural 3D Forms Refinement and Optimization Design.

| Data Transformation Approach | Paper & Methodology |
|---|---|
| \(generate\) \(F_{\text{D-3D}}\) | 3D-GAN [135] |
| \(classify\) \(F_{\text{D-3D}}\) | VAE [136] |
| \(parameters\) to \(F_{\text{D-3D}}\) | DCGAN, StyleGAN [137]; 3D-GAN [138] |
| \(s_{\text{D-3D\_text}}^2\) to \(F_{\text{D-3D}}\) | 3D-GAN [139] |
| \(F_{\text{D-3D\_2d}}\) to \(F_{\text{D-3D\_2d}}\) | StyleGAN, pix2pix [140]; pix2pix, CycleGAN [141] |
| \(o_{\text{D-3D}}^3\) to \(F_{\text{D-3D}}\) | StyleGAN [142] |
| (\(o_{\text{P-3D}}^1\) + \(s_{\text{Plan}}^2\)) to \(F_{\text{D-3D}}\) | GCN [142]; cGAN, GNN [143] |

The applications of generative AI in architectural 3D forms refinement and optimization design include two main categories, as shown in Table 4: using parameters or 1D text to generate \(F_{\text{D-3D}}\) or to conduct classification analysis (usually from \(s_{\text{D-3D\_text}}^2\)); and generating \(F_{\text{D-3D}}\), represented by 2D images or 3D models, based on 2D images (usually from \(F_{\text{D-3D\_2d}}\), \(o_{\text{D-3D}}^3\), or \(s_{\text{Plan}}^2\)).

Figure 14: Architectural facade design process.

In terms of using parameters or 1D text to generate refined architectural 3D models with generative AI, researchers have trained voxel representation models to generate these refined models (\(generate\) to \(F_{\text{D-3D}}\)) [135]. Additionally, generative AI has been employed to train Signed Distance Function (SDF) voxels, coupled with clustering analysis on shallow vector representations of 3D models (\(F_{\text{D-3D}}\)) [136]. Following this, 2D images containing 3D voxel information can be generated based on input RGB channel values (parameters to \(F_{\text{D-3D\_2d}}\)) [138]. Furthermore, new forms of 3D elements can be generated through interpolation (parameters to \(F_{\text{D-3D}}\)) [137]. Voxelized and point cloud representations of enhanced 3D model components (\(F_{\text{D-3D}}\)) can also be trained and generated according to the textual labels of architectural components (\(s_{\text{D-3D\_text}}^2\) to \(F_{\text{D-3D}}\)) [139].

In terms of using 2D images to generate refined architectural 3D models with generative AI, researchers have converted refined architectural 3D models (\(F_{\text{D-3D}}\)) into sectional images and trained Generative Adversarial Networks (GANs) on paired sectional diagrams to learn the connections between adjacent sections. By inputting a single sectional image into the model to reconstruct a new sectional image, and then iterating with the newly generated section as the next input, the 3D model can be reconstructed (\(F_{\text{D-3D\_2d}}\) to \(F_{\text{D-3D\_2d}}\)) [140], [141], as sketched below. Concurrently, generative AI can generate refined 3D models from architectural floor plans (\(o_{\text{D-3D}}^3\) to \(F_{\text{D-3D}}\)) [142], or from spatial sequence matrices and spatial requirements (\(o_{\text{P-3D}}^1\) + \(s_{\text{Plan}}^2\) to \(F_{\text{D-3D}}\)) [143], [144]. In an innovative approach, generative AI can also learn the 3D models of architectural components (\(F_{\text{D-3D}}\)) and combine them to create refined architectural 3D models; for instance, architectural 3D model components can be pixelated into 2D images for training.
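The iterative section-to-section reconstruction described above reduces to a simple loop. The sketch below assumes a hypothetical trained generator, here replaced by a placeholder function, and simply stacks the predicted slices into a voxel-like volume; none of these names come from the cited papers.

```python
# Sketch of the iterative F_D-3D_2d -> F_D-3D_2d reconstruction loop.
import numpy as np


def next_section(section: np.ndarray) -> np.ndarray:
    """Stand-in for a trained GAN generator that predicts the adjacent sectional
    image from the current one (hypothetical; random perturbation used here)."""
    return np.clip(section + np.random.normal(0, 0.01, section.shape), 0, 1)


def reconstruct_volume(seed_section: np.ndarray, num_sections: int) -> np.ndarray:
    """Iteratively generate adjacent sections and stack them into a 3D volume."""
    sections = [seed_section]
    for _ in range(num_sections - 1):
        sections.append(next_section(sections[-1]))
    return np.stack(sections, axis=0)  # shape: (num_sections, H, W)


volume = reconstruct_volume(np.zeros((128, 128)), num_sections=64)
print(volume.shape)  # (64, 128, 128)
```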

3.5 Architectural Facade Design↩︎

The fifth step in architectural design focuses on facade design, aiming to create a building exterior that reflects its style and environmental compatibility while incorporating cultural and symbolic elements. This process can be expressed by Equation (9).

\[F_{\text{Fac}} = \left\{ y_{\text{Fac}} \, \middle| \, y_{\text{Fac}} \in \bigcap_{i=1}^{4} f_{\text{Fac}}(o_{\text{Fac}}^i) \cap \bigcap_{j=1}^{2} f_{\text{Fac}}(s_{\text{Fac}}^j) \right\}\]

Table 5: Application of generative AI in architectural facade design.

| Data Transformation Approach | Paper & Methodology |
|---|---|
| \((a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1)\) to \(F_{\text{Fac}}\) | GANs [145]–[151]; DM, CycleGAN [152] |
| \((a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1)\) to \(R_{\text{Fac}}\) | pix2pix [153] |
| \(F_{\text{Fac}}\) to \(F_{\text{Fac}}\) | CycleGAN [154]; StyleGAN2 [155] |
| (\(F_{\text{Fac}} + s_{\text{Fac}}^2\)) to \(F_{\text{Fac}}\) | GANs [156] |
| \((a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1)\) to \((a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1)\) | GAN [157] |

Where \(y_{\text{Fac}}\) is the generated architectural facade and \(F_{\text{Fac}}\) is the collection of all the options. \(O_{\text{Fac}}\) refers to the objective conditions of the architectural facade, which include the performance evaluation of the architectural facade (\(o_{\text{Fac}}^1\)), such as daylighting, heat insulation, and thermal retention; the architectural plan design (\(o_{\text{Fac}}^2\)), the result of the second step in the design process; the architectural structural system design (\(o_{\text{Fac}}^3\)), the outcome of the third step in the design process; and the architectural 3D forms refinement and optimization design (\(o_{\text{Fac}}^4\)), the result of the fourth step in the design process. \(S_{\text{Fac}}\) refers to the subjective decisions of the facade design, which include facade component elements (\(s_{\text{Fac}}^1\)), the specific facade component styles employed by the architect, reflecting the designer’s style and concept, and the materials and style of the facade (\(s_{\text{Fac}}^2\)); different materials bring various textures and colors to the building, exhibiting unique architectural characteristics and styles.

Subsequently, the final facade design outcome can be obtained by summing the facade design results from each direction. This process can be expressed by Equation (10).

\[R_{\text{Fac}} = \sum_{i=1}^{4} F_{\text{Fac}}^i\]

Each direction’s facade design outcome \(y_{\text{Fac}}\) encapsulates various facade information, such as the area (\(a_{w}\)) and position (\(p_{w}\)) of the wall surface, the area (\(a_{win}\)) and position (\(p_{win}\)) of the window surface, and the adoption of a specific style for the facade components (\(c_{\text{Fac}}\)). This information is derived from the set of objective conditions \(O_{\text{Fac}}\) and the set of subjective decisions \(S_{\text{Fac}}\).

In the Bo-DAA apartment project, the architects use the architectural plan (\(o_{\text{Fac}}^2\)) to define windows and walls, incorporating glass curtain walls on the ground floor. The structural design (\(o_{\text{Fac}}^3\)) guides facade structuring, ensuring alignment with the building’s structure. The refined 3D model (\(o_{\text{Fac}}^4\)) influences the facade’s shape, with residential windows designed to complement the building’s form. Facade performance is enhanced through simulations (\(o_{\text{Fac}}^1\)). Material selection (\(s_{\text{Fac}}^2\)) favors exposed concrete, echoing Brutalist aesthetics (\(s_{\text{Fac}}^1\)), resulting in a minimalist, sculptural facade (\(R_{\text{Fac}}\)). This process is illustrated in Figure 14.

The applications of generative AI in architectural facade design include two main categories, as shown in Table 5: generating \(F_{\text{Fac}}\) based on 2D images (usually from \(s_{\text{Fac}}^2\), \(F_{\text{Fac}}\), or a semantic segmentation map of the facade); and generating semantic segmentation maps of the facade based on 2D images.

In generating architectural facades, generative AI first facilitates the generation of facade images from architectural facade semantic segmentation maps, which annotate the precise location and form of facade elements such as walls, window panes, and other components. This process generates facade images under the constraints of a given wall area (\(a_{w}\)) and position (\(p_{w}\)), window area (\(a_{win}\)) and position (\(p_{win}\)), and component elements (\(s_{\text{Fac}}^1\)), represented as (\(a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1\)) to \(F_{\text{Fac}}\) [145]–[152]. Furthermore, complete facade and roof images for all four directions of a building can be generated using semantic segmentation images (\(a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1\) to \(R_{\text{Fac}}\)) [153]. Additionally, generative AI proves instrumental in training on architectural facade images for both reconstruction and novel generation (\(F_{\text{Fac}}\) to \(F_{\text{Fac}}\)) [154]. Its utility is further demonstrated in style transfer for architectural facades, either by incorporating style images (\(F_{\text{Fac}} + s_{\text{Fac}}^2\) to \(F_{\text{Fac}}\)) [156], [158] or by transferring styles between facade images of diverse architectural styles (\(F_{\text{Fac}}\) to \(F_{\text{Fac}}\)) [155].

In generating semantic segmentation maps for architectural facades, generative AI can be employed for the reconstruction and generation of facade semantic segmentation maps, such as rebuilding the occluded parts of a segmentation map based on the unobstructed parts (\(a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1\) to \(a_{w} + p_{w} + a_{win} + p_{win} + s_{\text{Fac}}^1\)) [157].

3.6 Architectural Image Expression↩︎

Architectural image expression synthesizes design elements into 2D images, reflecting the architect’s vision and design process. This process can be expressed by Equation (11).

Table 6: Application of generative AI in architectural image expression process.

| Data Transformation Approach | Paper & Methodology |
|---|---|
| \(parameter\) to \(F_{\text{Img}}\) | GANs [159] |
| \(s_{Image\_text}^1\) to \(F_{\text{Img}}\) | GANs [160]–[162]; DMs [4], [163]–[166]; GANs, DMs [5] |
| (\(s_{Image\_text}^1 + F_{\text{Img}}\)) to \(F_{\text{Img}}\) | DMs [167]–[170]; GANs [171], [172]; GANs, DMs [173]; GANs, CLIP [174], [175] |
| (\(o_{\text{Img}}^3 + F_{\text{Img}}\)) to \(F_{\text{Img}}\) | GANs [176] |
| \(s_{Image\_mask}^1\) to \(F_{\text{Img}}\) | GANs [177]; CycleGAN [178] |
| \(s_{\text{Img}}^2\) to \(F_{\text{Img}}\) | GANs [179]–[182] |
| \(F_{\text{Img}}\) to \(F_{\text{Img}}\) | GANs [183]–[186] |
| \(s_{\text{Img}}^2\) to \(s_{\text{Img}}^2\) | GAN [187] |
| \(F_{\text{Img}}\) to \(s_{\text{Img}}^1\) | VAE [188] |

\[F_{\text{Img}} = \left\{ y_{\text{Img}} \, \middle| \, y_{\text{Img}} \in \bigcap_{i=1}^{4} f_{\text{Img}}(o_{\text{Img}}^i) \cap \bigcap_{j=1}^{2} f_{\text{Img}}(s_{\text{Img}}^j) \right\}\]

Where \(y_{\text{Img}}\) is the generated architectural image and \(F_{\text{Img}}\) is the collection of all the options. \(O_{\text{Img}}\) refers to the objective conditions of the architectural image expression, which include the architectural plan design (\(o_{\text{Img}}^1\)), the result of the second step in the design process; the architectural structural system design (\(o_{\text{Img}}^2\)), the result of the third step in the design process; the refined 3D form of the architecture (\(o_{\text{Img}}^3\)), the result of the fourth step in the design process; and the architectural facade design (\(o_{\text{Img}}^4\)), the result of the fifth step in the design process. \(S_{\text{Img}}\) refers to the subjective decisions of the architectural image expression, which include aesthetic principles (\(s_{\text{Img}}^1\)) and image style (\(s_{\text{Img}}^2\)), the principles architects use to control the composition and style of architectural images.

The applications of generative AI in architectural image expression include three main categories, as shown in Table 6: generating architectural images \(F_{\text{Img}}\) based on 1D text (usually from parameters or \(s_{Image\_text}^1\)); generating architectural images \(F_{\text{Img}}\) based on 2D images (usually from \(F_{\text{Img}}\), \(s_{Image\_mask}^1\), or \(s_{\text{Img}}^2\)); and generating images of different architectural styles or semantic images (\(s_{\text{Img}}^1\), \(s_{\text{Img}}^2\)) based on 2D images (usually from \(s_{\text{Img}}^2\) or \(F_{\text{Img}}\)).

In generating architectural images based on 1D text, researchers employ linear interpolation techniques to create architectural images from varying perspectives (parameter to \(F_{\text{Img}}\)) [159]. Moreover, the direct generation of architectural images from textual prompts simplifies and streamlines the process (\(s_{Image\_text}^1\) to \(F_{\text{Img}}\)) [4], [5], [161], [163]–[165]. This approach is also effective for generating architectural interior images, as demonstrated by the use of Stable Diffusion for interior renderings (\(s_{Image\_text}^1\) to \(F_{\text{Img}}\)) [166].
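As a hedged illustration of the text-to-image route (\(s_{Image\_text}^1\) to \(F_{\text{Img}}\)), the sketch below uses the publicly available Stable Diffusion pipeline from the diffusers library. The model checkpoint, prompt, and output file name are assumptions; the cited studies may use different or custom-trained models.

```python
# Sketch: text prompt -> architectural image with a public Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint; the cited works may fine-tune their own models.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("exterior rendering of a small concrete apartment building, "
          "brutalist style, soft morning light")
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("architectural_image.png")
```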

In generating architectural images based on 2D images, several researchers have trained generative AI models on architectural images and their paired textual prompts, facilitating the creation of architectural images conditioned on both an input image and a prompt (\(s_{Image\_text}^1 + F_{\text{Img}}\) to \(F_{\text{Img}}\)) [160], [167]–[175]. Additionally, researchers use generative AI models to generate architectural images from images of the refined 3D form of the architecture (\(o_{\text{Img}}^3 + F_{\text{Img}}\) to \(F_{\text{Img}}\)) [162], [176]. Furthermore, architectural images can be generated directly from image semantic labels (masks) or textual descriptions; generating architectural images from semantic labels offers precise control over the content of the generated images (\(s_{Image\_mask}^1\) to \(F_{\text{Img}}\)) [177], [178]. Researchers have also explored the transformation of architectural images across different styles, such as generating architectural images from sketches or line drawings (\(s_{\text{Img}}^2\) to \(F_{\text{Img}}\)) [179]–[182]. By leveraging generative AI models, architectural images can undergo style blending, where images are generated based on two input images, enhancing the versatility of architectural visualization (\(F_{\text{Img}}\) to \(F_{\text{Img}}\)) [186]. GAN models have been employed to generate comfortable underground space renderings from virtual 3D space images (\(F_{\text{Img}}\) to \(F_{\text{Img}}\)) [183] and to create interior decoration images from 360-degree panoramic interior images (\(F_{\text{Img}}\) to \(F_{\text{Img}}\)) [184]. Moreover, StyleGAN2 has been used to generate architectural facade and floor plan images (\(F_{\text{Img}}\) to \(F_{\text{Img}}\)) [185], which serve as a basis for establishing 3D architectural models.

In generating images of different architectural styles or semantic images based on 2D images, generative AI can be instrumental in the reconstruction and generation of architectural line drawings (\(s_{\text{Img}}^2\) to \(s_{\text{Img}}^2\)) [187]. Generative AI is also capable of producing semantic style images that correspond to architectural images (\(F_{\text{Img}}\) to \(s_{\text{Img}}^1\)) [188].

4 Future Research Directions↩︎

In this section, we outline potential future research directions for applying generative AI in architectural design using the latest emerging techniques for image, video, and 3D form generation (Section 2).

4.1 Architectural Design Image Generation↩︎

4.1.0.1 Floor Plan Generation

Researchers have applied various generative AI image generation techniques to the design and generation of architectural plan images. As the technology advances, architects can gradually incorporate more conditional constraints into the generation of floor plans, allowing generative AI to take on more of the architect’s reasoning. Architects can supply text data to the generative models; text data encompasses client design requirements and architectural design standards (\(o_{\text{Plan}}^2\)), such as building type, occupancy, spatial needs, dimensions of architectural spaces, evacuation route settings and dimensions, and fire safety layout standards. Architects can also supply image data to the generative models, such as site plans (\(o_{\text{P-3D}}^2\)), which define the specific land use of architectural projects, nearby buildings and natural features (\(o_{\text{P-3D}}^3\)), as well as floor layout diagrams (\(s_{\text{Plan}}^1\)) or spatial sequence diagrams (\(s_{\text{Plan}}^2\)).

Based on the aforementioned method, several generative AI models hold developmental potential for architectural floor plan generation. A scene graph is a data structure, consisting of nodes and edges, capable of intricately describing the elements within a scene and their interrelations; this structure is particularly suited to depicting the connectivity within architectural floor plans. By integrating diffusion models, SceneGenie [46] can generate architectural floor plans guided by scene graphs. Furthermore, technologies such as Stable Diffusion [9] and Imagen [189] allow for further refinement of the floor plan generation process through text prompts and layout controls.
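To make the scene-graph idea concrete, the sketch below encodes room nodes and adjacency edges with networkx; the room names, areas, and edge attributes are illustrative assumptions and are not taken from SceneGenie itself. A graph-conditioned generator would consume a structure like this as its conditioning input.

```python
# Illustrative scene graph for a small apartment floor plan.
import networkx as nx

plan_graph = nx.Graph()
plan_graph.add_node("living_room", area_m2=24)
plan_graph.add_node("kitchen", area_m2=10)
plan_graph.add_node("bedroom", area_m2=14)
plan_graph.add_node("bathroom", area_m2=5)
plan_graph.add_edge("living_room", "kitchen", relation="open_connection")
plan_graph.add_edge("living_room", "bedroom", relation="door")
plan_graph.add_edge("bedroom", "bathroom", relation="door")

# Adjacency structure that a scene-graph-guided diffusion model could condition on.
print(nx.to_dict_of_lists(plan_graph))
```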

Figure 15: Existing generative models can generate layout diagrams of rooms based on input text and can also be controlled accordingly based on input layouts.

As shown in Figure 15, existing generative models such as Stable Diffusion [9] and Imagen [189] can generate complete architectural designs based on textual input. However, the generated images often fail to meet professional standards, and their layouts may not adhere to designers’ intentions. Nonetheless, with the advancement of conditional image generation, it is now possible to incorporate additional constraints such as bounding boxes to control the generation process of diffusion models. This integration holds promise for aligning generated images with layout considerations in architectural design.

4.1.0.2 Elevation Generation

Applications of generative AI in facade generation are based on semantic segmentation, textual descriptions, and facade image style transfer. These advancements have made the facade generation process more efficient. As generative AI technology advances, researchers can develop more efficient and higher-quality facade generation models. For instance, architects can provide generative AI models with conditions such as facade sketches, facade mask segmentation images, and descriptive terms for facade generation. These conditions can assist architects in generating corresponding high-quality facade images, streamlining the facade design process, and enhancing design efficiency.

Figure 16: Framework example for building floor plans and facade generation.

The key to applying generative models to architectural design lies in integrating professional architectural data with computational data. As illustrated in Figure 16, in 2D image generation, layouts and segmentation masks can often represent the facade information of a building. Architectural constraints can serve as hyperparameter inputs that guide the image generation process of the generative model.

The various methods of generative AI in image generation, such as GLIGEN [43] and MultiDiffusion [190], have also shown unique potential in creating architectural facade images. Moreover, with the development of generative AI technology, ControlNet [38] can precisely control the content generated by diffusion models by adding extra conditions. It is applicable to the style transfer of architectural facades and can enrich facade designs with detailed elements such as brick textures, window decorations, or door designs. ControlNet can also be used to adjust specific elements in facade drawings, for instance, altering window shapes, door sizes, or facade colors, thereby enhancing the personalization and creativity of the design. Simultaneously, analyzing the style and characteristics of surrounding buildings ensures that the new facade design harmonizes with its environment, maintaining consistency in the scene.
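A hedged sketch of this kind of ControlNet-style conditioning with the diffusers library is shown below. The segmentation-conditioned checkpoint, the input file, and the prompt are assumptions for illustration; a production facade workflow would likely fine-tune on facade-specific data.

```python
# Sketch: facade semantic-segmentation map + text prompt -> facade image variant.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumed public checkpoints; a facade-specific workflow might train its own ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Hypothetical facade segmentation map (walls, windows, doors encoded as colors).
seg_map = load_image("facade_segmentation.png")
facade = pipe(
    "brick facade with large timber-framed windows, contemporary style",
    image=seg_map, num_inference_steps=30,
).images[0]
facade.save("facade_variant.png")
```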

4.1.0.3 Architectural Image Generation

The text-to-image generation method is capable of producing creative architectural concept designs (\(F_{\text{Img}}\)) based on brief descriptions or a set of parameters (\(s_{Image\_text}^1\), \(s_{Image\_text}^2\)). The image-to-image generation method enables the generation of architectural images possessing consistent features or styles. This offers the potential to explore architectural forms and spatial layouts yet to be conceived by human designers. Automatically generated architectural concepts can serve as design inspiration, helping designers break traditional mindsets and explore new design spaces. Simultaneously, diffusion probabilistic models can generate realistic architectural renderings suitable for Virtual Reality (VR) or Augmented Reality (AR) applications, providing designers and clients with immersive design evaluation experiences. This advances higher-quality and more interactive architectural visualization technologies, making the design review and modification process more intuitive and efficient.

Stable Diffusion [9], DALL-E 3 [34], and GLIDE [36] have been applied extensively in architectural image generation, demonstrating robust image synthesis capabilities. ControlNet [38], with its exceptional controllability, has increasingly been used by architects to generate architectural images and perform style transfer, substantially enriching design creativity and enhancing design efficiency. Similarly, GLIGEN [43] and SceneGenie [46] have shown potential in controlling image content, which also holds significant value in the generation and creation of architectural imagery.

Figure 17: Currently, generative models such as PIKA [49] and DynamiCrafter [50] are capable of generating high-quality videos from images, supporting multi-angle rotation and style transfer.

4.2 Architectural Design Video Generation↩︎

4.2.0.1 Video Generation based on Architectural Images and Text Prompt

The application of generative AI-based video generation in architectural design has multiple development directions. Through generative AI technology, performance videos can be generated from a single architectural effect image (\(F_{\text{Img}}\)) along with relevant textual descriptions (\(s_{Image\_text}^1\), \(s_{Image\_text}^2\)). Future advancements include compiling multiple images of a structure from various angles to craft a continuous video narrative. Such an approach diversifies presentation techniques and streamlines the design process, yielding significant time and cost savings.

In the field of architectural video generation, Make-A-Video[48], DynamiCrafter[50], and PIKA[49] each showcase their strengths, bringing innovative presentation methods to the forefront. Make-A-Video transforms textual descriptions into detailed dynamic videos, enhancing the visual impact and augmenting audience engagement, enabling designers to depict the architectural transformations over time through text effortlessly. DynamiCrafter employs text-prompted technology to infuse static images with dynamic elements, such as flowing water and drifting clouds, with high-resolution support ensuring the preservation of details and realism. PIKA, conversely, demonstrates unique advantages in dynamic scene transformation, supporting text-driven element changes, allowing designers to maintain scene integrity while presenting dynamic details, thereby offering a rich and dynamic visual experience.

With advancements in diffusion models, current generative models can produce high-quality effect videos. As shown in Figure 17, the first and second rows depict effect demonstration videos generated from input images using PIKA [49], where the buildings undergo minor movements and scaling while maintaining consistency with the surrounding environment. DynamiCrafter [50] can generate rotating buildings, as demonstrated in the third row, where the model predicts the architectural appearance from different angles and ensures consistent generation. From GANs to diffusion models, mature image-to-image style transfer models have been implemented. The application of these models ensures that the generated videos exhibit the desired presentation effects, greatly expanding the application scenarios for video.
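Because PIKA is a closed product, the sketch below substitutes the openly released Stable Video Diffusion pipeline (via the diffusers library) to illustrate the image-to-video step from a single rendering. The model ID, input image, resolution, and frame rate are assumptions; the workflows described above may rely on different systems.

```python
# Sketch: single architectural rendering -> short effect video.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

# Hypothetical architectural rendering used as the conditioning frame.
rendering = load_image("courtyard_rendering.png").resize((1024, 576))
frames = pipe(rendering, decode_chunk_size=8).frames[0]
export_to_video(frames, "courtyard_flythrough.mp4", fps=7)
```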

4.2.0.2 Style Transfer for Specific Video Content

In the outlook for future technologies, the application of generative AI for partial style transfer in architectural video content paves the way to new frontiers in architectural visual presentation. This technology enables designers to replicate an overall style and, more importantly, precisely select which parts of the video should undergo style transformation. Deep learning-based neural style transfer algorithms have proven their efficacy in applying style transfer to images and video content. These algorithms achieve style transformation by learning specific features of a target style image and applying these features to the original content. This implies that distinct artistic styles or visual effects can be applied to selected video portions in architectural videos. Local video style transfer opens up novel possibilities in the architectural domain, allowing designers and researchers to explore and present architectural and urban spaces in ways never before possible. By precisely controlling the scope of style transfer application, unique visual effects can be created, thereby enhancing architectural videos’ expressiveness and communicative value.

PIKA[49] showcases significant advantages in style transfer applications for architectural video content, offering robust support for visual presentation and research within the architectural realm. This technology enables designers and researchers to perform precise and flexible style customization for architectural videos, facilitating style transfers tailored to specific video content. Notably, PIKA allows for the style transfer of specific elements or areas within a video instead of a uniform transformation across the entire content. This capability of localized style transfer enables the accentuation of certain architectural features or details, such as presenting a segment of classical architecture in a modern or abstract artistic style, thereby enhancing the video’s appeal and expressiveness. Furthermore, PIKA excels in maintaining video content’s coherence and visual consistency. By finely controlling the extent and scope of the style transfer, PIKA ensures that the video retains its original structure and narrative while integrating new artistic styles, resulting in an aesthetically pleasing and authentic final product. Additionally, PIKA’s style transfer technology is not confined to traditional artistic styles but is also adaptable to various complex and innovative visual effects, providing a vast canvas for creative expression in architectural video content. Whether emulating the architectural style of a specific historical period or venturing into unprecedented visual effects, PIKA is equipped to support such endeavors.

4.3 Architectural 3D Forms Generation↩︎

4.3.0.1 3D Model Generation based on Architectural Images and Text Prompt

Generating 3D building forms using architectural images, such as site information (\(o_{\text{P-3D}}^2\)), or text prompts, such as design requirements (\(o_{\text{P-3D}}^1\)), as input can improve modeling efficiency.

In architectural 3D modeling, technologies such as DreamFusion [51], Magic3D [52], CLIP-NeRF [54], and DreamCraft3D[55] have emerged as revolutionary architectural design and visualization tools. They empower architects and designers to directly generate detailed and high-fidelity 3D architectural models from textual descriptions or 2D images, significantly expanding the possibilities for architectural creativity and enhancing work efficiency. Specifically, as shown in Figure 18, DreamFusion[51] and Magic3D[52] allow designers to swiftly create architectural 3D model prototypes through simple text descriptions, accelerating the transition from concept to visualization. Designers can easily modify textual descriptions and employ these tools for iterative design, exploring various architectural styles and forms to optimize design schemes. Moreover, CLIP-NeRF[54] and DreamCraft3D[55] enable designers to extract 3D information from existing architectural images, facilitating the precise reconstruction of historical buildings or current sites for restoration, research, or further development. Additionally, designers can create unique visual effects in 3D models by transforming and fusing image styles, further enhancing the artistic appeal and attractiveness of architectural representations.
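DreamFusion and Magic3D are not distributed as simple public libraries, so the sketch below uses OpenAI’s Shap-E (available through the diffusers library) as an analogous, openly available text-to-3D generator to illustrate the prompt-to-form step. The prompt and output path are illustrative assumptions.

```python
# Sketch: text prompt -> rendered turntable views of a generated 3D form (Shap-E).
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

prompt = "a small pavilion with a curved timber roof"
rendered_views = pipe(
    prompt, guidance_scale=15.0, num_inference_steps=64, frame_size=256
).images[0]
export_to_gif(rendered_views, "pavilion_turntable.gif")
```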

Figure 18: Examples of 3D models generated from textual prompts: (a) Castle, (b) Abbey, and (c) Chichen Itza are produced by DreamFusion [51], while (d) Castle, (e) Pisa tower, and (f) Opera house are generated using Magic3D [52].

4.3.0.2 Detail Style Generation and Editing for architectural 3D Models

With advancements in generative AI for 3D model generation, generative AI can produce architectural 3D models with specific styles and textures from a preliminary architectural 3D model and the associated subjective decisions (\(S_{\text{P-3D}}\)) (\(F_{\text{P-3D}} + S_{\text{P-3D}}\) to \(F_{\text{D-3D}}\)). If this technology enables the modification and editing of 3D models according to highly personalized design requirements and allows designers to make real-time adjustments, it would significantly enhance the efficiency of architectural creation and enrich the avenues for architectural design.

GaussianEditor [191] and Magic123[56] demonstrate their applications and advantages in generating and editing detail styles for architectural 3D models by offering designers greater creative freedom and control over editing. As shown in Figure 19, GaussianEditor’s Gaussian semantic tracing and Hierarchical Gaussian Splatting enable more precise and intuitive editing of architectural details. At the same time, Magic123’s two-stage approach facilitates the transformation from complex real-world images to detailed 3D models, as shown in Figure  20. The development of these technologies heralds a future in architectural design and visualization characterized by a richer diversity and higher customization of 3D architectural models.

Figure 19: Examples of 3D model style editing using GaussianEditor [191]: (a) “Turn the bear into a Grizzly bear”; (b) “Make it Autumn”. The left column shows the original models, and the right column presents the edited results.

Figure 20: Examples of converting input images (left) into detailed 3D models (right, dragon and teapots) using Magic123 [56].

4.4 Human-Centric Architectural Design↩︎

As society evolves and technology advances, the challenges faced by architectural design become increasingly complex, requiring consideration of more factors. Traditional design methods demand extensive time from designers to meet requirements and adjust designs. Moreover, user needs are becoming more diverse, and better reflecting human requirements in design is a significant issue that necessitates more intelligent tools. At the same time, the rapid development of AI technology, especially generative AI, offers the possibility of more intelligent architectural design. Based on these needs and visions, future generative AI will not only assist in the architectural design process but will also, following human-centric design principles, receive multimodal inputs, including text, images, and sound, and through intelligent processing quickly understand design requirements and adjust design schemes, thereby generating designs that align with the architect’s vision. Such architectural-design AI large models will be similar to existing co-pilot models but with further enhanced functionality and intelligence.

Realizing this large model requires training the AI model on a vast amount of architectural design data and user feedback to enable it to understand complex design requirements. It also necessitates multimodal input processing, developing technologies capable of handling various types of inputs, such as text, images, and sound, to increase the model’s application flexibility. In addition, developing intelligent interaction interfaces is essential; user-friendly interfaces allow architects to communicate intuitively with the AI model, state their needs, and receive feedback. Finally, the model should provide customized output designs, generating multiple design options based on the input requirements and data for architects to choose and modify.

However, realizing this architectural design AI large model faces numerous challenges: 1) data collection and processing: high-quality training data is critical to the performance of AI models, and efficiently collecting and processing a vast amount of architectural design data is a significant challenge; 2) fusion of multimodal inputs: effectively integrating information from different modalities to improve the model’s accuracy and application scope requires further technological breakthroughs; 3) user interaction: designing an interface that aligns with architects’ habits and enables accessible communication with the AI model is crucial for the technology’s implementation; and 4) design quality: ensuring that AI-generated designs meet practical needs while being innovative and personalized is critical for technological development. By addressing these challenges, the future may see the realization of generative AI models that truly aid architectural design, improving design efficiency and quality and achieving human-centric architectural design optimization.

5 Conclusion↩︎

The field of generative models has witnessed unparalleled advancements, particularly in image generation, video generation, and 3D content creation. These developments span various applications, including text-to-image, image-to-image, text-to-3D, and image-to-3D transformations, demonstrating a significant leap in the capability to synthesize realistic, high-fidelity content from minimal inputs. The rapid advancement of generative models marks a transformative phase in artificial intelligence, where synthesizing realistic, diverse, and semantically consistent content across images, videos, and 3D models is becoming increasingly feasible. This progress paves new avenues for creative expression and lays the groundwork for future innovations in the digital architectural design process. As the field continues to evolve, further exploration of model efficiency, controllability, and domain-specific applications will be crucial in harnessing the full potential of generative AI models for a broad spectrum of architectural design.

In conclusion, the integration of generative AI into architectural design represents a significant leap forward in the realm of digital architecture. This advanced technology has shown exceptional capability in generating high-quality, high-resolution images and designs, offering innovative ideas and enhancing the creative process across various facets of architectural design. As we look to the future, it is clear that the continued exploration and integration of Generative AI in architectural design will play a pivotal role in shaping the next generation of digital architecture. This technological evolution not only simplifies and accelerates the design process but also opens up new avenues for creativity, enabling architects to push the boundaries of traditional design and explore new, innovative design spaces.

References↩︎

[1]
Matias Del Campo, Sandra Manninger, M Sanche, and L Wang. The church of ai—an examination of architecture in a posthuman design ecology. In Intelligent & Informed-Proceedings of the 24th CAADRIA Conference, Victoria University of Wellington, Wellington, New Zealand, pages 15–18, 2019.
[2]
Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion: Framework for learning curated architectural vocabulary. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2023, pages 79–88. Education and research in Computer Aided Architectural Design in Europe, 2023.
[3]
Xinwei Zhuang, CA Design, ED Phase, C Generative, and AN Network. Rendering sketches. eCAADe 2022, 1:517–521, 1973.
[4]
Daniel Koehler. More than anything: Advocating for synthetic architectures within large-scale language-image models. International Journal of Architectural Computing, page 14780771231170455, 2023.
[5]
Mathias Bank Stigsen, Alexandra Moisi, Shervin Rasoulzadeh, Kristina Schinegger, and Stefan Rutzinger. Ai diffusion as design vocabulary - investigating the use of ai image generation in early architectural design and education. Digital Design Reconsidered - Proceedings of the 41st Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2023), page 587–596, 2023.
[6]
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, and Daan Wierstra. Draw: A recurrent neural network for image generation. In International conference on machine learning, pages 1462–1471. PMLR, 2015.
[7]
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
[8]
Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L Rosin. Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10743–10752, 2019.
[9]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[10]
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
[11]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[12]
Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[13]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
[14]
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
[15]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[16]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
[17]
Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
[18]
Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T Freeman, and Joshua B Tenenbaum. Learning shape priors for single-view 3d completion and reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), pages 646–662, 2018.
[19]
Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
[20]
Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the gan latent space. In International conference on machine learning, pages 9786–9796. PMLR, 2020.
[21]
Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European conference on computer vision (ECCV), pages 52–67, 2018.
[22]
Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. Polygen: An autoregressive generative model of 3d meshes. In International conference on machine learning, pages 7220–7229. PMLR, 2020.
[23]
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[24]
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[25]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[26]
Fang Zhao, Wenhao Wang, Shengcai Liao, and Ling Shao. Learning anchored unsigned distance functions with gradient direction alignment for single-view garment reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12674–12683, 2021.
[27]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023.
[28]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[29]
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[30]
Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022.
[31]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[32]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[33]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
[34]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
[35]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 5907–5915, 2017.
[36]
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
[37]
Wengling Chen and James Hays. Sketchygan: Towards diverse and realistic sketch to image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9416–9425, 2018.
[38]
Lvmin Zhang and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023.
[39]
Yichen Peng, Chunqi Zhao, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Difffacesketch: High-fidelity face image synthesis with sketch-guided latent diffusion model. arXiv preprint arXiv:2302.06908, 2023.
[40]
Zhengyu Huang, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Anifacedrawing: Anime portrait exploration during your sketching. In ACM SIGGRAPH 2023 Conference Proceedings. Association for Computing Machinery, 2023.
[41]
Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. Image generation from layout. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[42]
Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, and Tao Xiang. Context-aware layout to image generation with enhanced object appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15049–15058, June 2021.
[43]
Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation. arXiv preprint arXiv:2301.07093, 2023.
[44]
Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1219–1228, 2018.
[45]
Renato Sortino, Simone Palazzo, and Concetto Spampinato. Transforming image generation from scene graphs. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 4118–4124. IEEE, 2022.
[46]
Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Böjrn Ommer, and Nassir Navab. Scenegenie: Scene graph guided diffusion models for image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 88–98, 2023.
[47]
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. arXiv:2204.03458, 2022.
[48]
Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792, 2022.
[49]
Leijie Wang, Nicolas Vincent, Julija Rukanskaitė, and Amy X Zhang. Pika: Empowering non-programmers to author executable governance policies in online communities. arXiv preprint arXiv:2310.04329, 2023.
[50]
Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Xintao Wang, Tien-Tsin Wong, and Ying Shan. Dynamicrafter: Animating open-domain images with video diffusion priors. arXiv preprint arXiv:2310.12190, 2023.
[51]
Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
[52]
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, June 2023.
[53]
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[54]
Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. Clip-nerf: Text-and-image driven manipulation of neural radiance fields. arXiv preprint arXiv:2112.05139, 2021.
[55]
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818, 2023.
[56]
Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
[57]
Jaime de Miguel Rodrı́guez, Maria Eugenia Villafañe, Luka Piškorec, and Fernando Sancho Caparrini. Generation of geometric interpolations of building types with deep variational autoencoders. Design Science, 6:e34, 2020.
[58]
Xinwei Zhuang, Yi Ju, Allen Yang, and Luisa Caldas. Synthesis and generation for 3d architecture volume with generative modeling. International Journal of Architectural Computing, page 14780771231168233, 2023.
[59]
Yubo Liu, Han Li, Qiaoming Deng, and Kai Hu. Diffusion probabilistic model assisted 3d form finding and design latent space exploration: A case study for taihu stone spacial transformation. In The International Conference on Computational Design and Robotic Fabrication, pages 11–23. Springer, 2023.
[60]
Hao Zheng and Philip F Yuan. A generative architectural and urban design method through artificial neural networks. Building and Environment, 205:108178, 2021.
[61]
Panagiota Pouliou, Anca-Simona Horvath, and George Palamas. Speculative hybrids: Investigating the generation of conceptual architectural forms through the use of 3d generative adversarial networks. International Journal of Architectural Computing, page 14780771231168229, 2023.
[62]
Tomas Cabezon Pedroso, Jinmo Rhee, and Daragh Byrne. Feature space exploration as an alternative for design space exploration beyond the parametric space. arXiv preprint arXiv:2301.11416, 2023.
[63]
Frederick Chando Kim and Jeffrey Huang. Towards a machine understanding of architectural form.
[64]
Adam Sebestyen, Urs Hirschberg, and Shervin Rasoulzadeh. Using deep learning to generate design spaces for architecture. International Journal of Architectural Computing, page 14780771231168232, 2023.
[65]
Alberto Tono, Heyaojing Huang, Ashwin Agrawal, and Martin Fischer. Vitruvio: 3d building meshes via single perspective sketches. arXiv preprint arXiv:2210.13634, 2022.
[66]
Qiaoming Deng, Xiaofeng Li, Yubo Liu, and Kai Hu. Exploration of three-dimensional spatial learning approach based on machine learning–taking taihu stone as an example. Architectural Intelligence, 2(1):5, 2023.
[67]
Raffaele Di Carlo, Divyae Mittal, and Ondrej Vesel. Generating 3d building volumes for a given urban context using pix2pix gan. Legal Depot D/2022/14982/02, page 287, 2022.
[68]
Steven Jige Quan. Urban-gan: An artificial intelligence-aided computation system for plural urban design. Environment and Planning B: Urban Analytics and City Science, 49(9):2500–2515, 2022.
[69]
Shiqi Zhou, Yuankai Wang, Weiyi Jia, Mo Wang, Yuwei Wu, Renlu Qiao, and Zhiqiang Wu. Automatic responsive-generation of 3d urban morphology coupled with local climate zones using generative adversarial network. Building and Environment, 245:110855, 2023.
[70]
Jingyi Li, Fang Guo, and Hong Chen. A study on urban block design strategies for improving pedestrian-level wind conditions: Cfd-based optimization and generative adversarial networks. Energy and Buildings, page 113863, 2023.
[71]
Ondrej Vesel. Building massing generation using gan trained on dutch 3d city models. 2022.
[72]
Feifeng Jiang, Jun Ma, Christopher John Webster, Xiao Li, and Vincent JL Gan. Building layout generation using site-embedded gan model. Automation in Construction, 151:104888, 2023.
[73]
Diego Navarro-Mateu, Oriol Carrasco, and Pedro Cortes Nieves. Color-patterns to architecture conversion through conditional generative adversarial networks. Biomimetics, 6(1):16, 2021.
[74]
Suzi Kim, Dodam Kim, and Sunghee Choi. Citycraft: 3d virtual city creation from a single image. The Visual Computer, 36:911–924, 2020.
[75]
Dong Wook Shu, Sung Woo Park, and Junseok Kwon. 3d point cloud generative adversarial network based on tree structured graph convolutions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3859–3868, 2019.
[76]
Adam Sebestyen, Ozan Özdenizci, Robert Legenstein, and Urs Hirschberg. Generating conceptual architectural 3d geometries with denoising diffusion models. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe-Digital Design Reconsidered: eCAADe 2023, 2023.
[77]
Spyridon Ampanavos and Ali Malkawi. Early-phase performance-driven design using generative models. In International Conference on Computer-Aided Architectural Design Futures, pages 87–106. Springer, 2021.
[78]
Chenyu Huang, Gengjia Zhang, Jiawei Yao, Xiaoxin Wang, John Kaiser Calautit, Cairong Zhao, Na An, and Xi Peng. Accelerated environmental performance-driven urban design with generative adversarial network. Building and Environment, 224:109575, 2022.
[79]
Stanislas Chaillou. Archigan: Artificial intelligence x architecture. In Architectural Intelligence: Selected Papers from the 1st International Conference on Computational Design and Robotic Fabrication (CDRF 2019), pages 117–127. Springer, 2020.
[80]
Xiaoni Gao, Xiangmin Guo, and Tiantian Lo. M-strugan: An automatic 2d-plan generation system under mixed structural constraints for homestays. Sustainability, 15(9):7126, 2023.
[81]
Xiao Min, Liang Zheng, and Yile Chen. The floor plan design method of exhibition halls in cgan-assisted museum architecture. Buildings, 13(3):756, 2023.
[82]
Da Wan, Xiaoyu Zhao, Wanmei Lu, Pengbo Li, Xinyu Shi, and Hiroatsu Fukuda. A deep learning approach toward energy-effective residential building floor plan generation. Sustainability, 14(13):8074, 2022.
[83]
Ran Chen, Jing Zhao, Xueqi Yao, Sijia Jiang, Yingting He, Bei Bao, Xiaomin Luo, Shuhan Xu, and Chenxi Wang. Generative design of outdoor green spaces based on generative adversarial networks. Buildings, 13(4):1083, 2023.
[84]
Yubo Liu, Yangting Lai, Jianyong Chen, Lingyu Liang, and Qiaoming Deng. Scut-autoalp: A diverse benchmark dataset for automatic architectural layout parsing. IEICE TRANSACTIONS on Information and Systems, 103(12):2725–2729, 2020.
[85]
Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick, Hao Zhang, and Hui Huang. Graph2plan: Learning floorplan generation from layout graphs. ACM Transactions on Graphics (TOG), 39(4):118–1, 2020.
[86]
Christina Doumpioti and Jeffrey Huang. Intensive differences in spatial design. In 39th eCAADe Conference in Novi Sad, Serbia, pages 9–16, 2021.
[87]
Merve Akdoğan and Özgün Balaban. Plan generation with generative adversarial networks: Haeckel’s drawings to palladian plans. Journal of Computational Design, 3(1):135–154, 2022.
[88]
Ilker Karadag, Orkan Zeynel Güzelci, and Sema Alaçam. Edu-ai: a twofold machine learning model to support classroom layout generation. Construction Innovation, 23(4):898–914, 2023.
[89]
Can Uzun, Meryem Birgül Çolakoğlu, and Arda İnceoğlu. Gan as a generative architectural plan layout tool: A case study for training dcgan with palladian plans and evaluation of dcgan outputs. 17:185–198, 2020.
[90]
Sheng-Yang Huang, Enriqueta Llabres-Valls, Aiman Tabony, and Luis Carlos Castillo. Damascus house: Exploring the connectionist embodiment of the islamic environmental intelligence by design. In eCAADe proceedings, volume 1, pages 871–880. eCAADe, 2023.
[91]
XY Ying, XY Qin, JH Chen, and J Gao. Generating residential layout based on ai in the view of wind environment. In Journal of Physics: Conference Series, volume 2069, page 012061. IOP Publishing, 2021.
[92]
Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, and Dahua Lin. Blockplanner: city block generation with vectorized graph representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5077–5086, 2021.
[93]
Pedram Ghannad and Yong-Cheol Lee. Automated modular housing design using a module configuration algorithm and a coupled generative adversarial network (cogan). Automation in construction, 139:104234, 2022.
[94]
Shuyi Huang and Hao Zheng. Morphological regeneration of the industrial waterfront based on machine learning. In 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA 2022), pages 475–484. The Association for Computer-Aided Architectural Design Research in Asia, 2022.
[95]
Lehao Yang, Long Li, Qihao Chen, Jiling Zhang, Tian Feng, and Wei Zhang. Street layout design via conditional adversarial learning. arXiv preprint arXiv:2305.08186, 2023.
[96]
Hanan Tanasra, Tamar Rott Shaham, Tomer Michaeli, Guy Austern, and Shany Barath. Automation in interior space planning: Utilizing conditional generative adversarial network models to create furniture layouts. Buildings, 13(7):1793, 2023.
[97]
Sepidehsadat Hosseini and Yasutaka Furukawa. Floorplan restoration by structure hallucinating transformer cascades.
[98]
Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-gan: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 162–177. Springer, 2020.
[99]
Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House-gan++: Generative adversarial layout refinement network towards intelligent computational agent for professional architects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13632–13641, 2021.
[100]
Shidong Wang, Wei Zeng, Xi Chen, Yu Ye, Yu Qiao, and Chi-Wing Fu. Actfloor-gan: Activity-guided adversarial networks for human-centric floorplan design. IEEE Transactions on Visualization and Computer Graphics, 2021.
[101]
Pedro Veloso, Jinmo Rhee, Ardavan Bidgoli, and Manuel Ladron de Guevara. A pedagogical experience with deep learning for floor plan generation.
[102]
Ziniu Luo and Weixin Huang. Floorplangan: Vector residential floorplan adversarial generation. Automation in Construction, 142:104470, 2022.
[103]
Morteza Rahbar, Mohammadjavad Mahdavinejad, Amir HD Markazi, and Mohammadreza Bemanian. Architectural layout design through deep learning and agent-based modeling: A hybrid approach. Journal of Building Engineering, 47:103822, 2022.
[104]
Yubo Liu, Zhilan Zhang, and Qiaoming Deng. Exploration on diversity generation of campus layout based on gan. In The International Conference on Computational Design and Robotic Fabrication, pages 233–243. Springer, 2022.
[105]
Mohammadreza Aalaei, Melika Saadi, Morteza Rahbar, and Ahmad Ekhlassi. Architectural layout generation using a graph-constrained conditional generative adversarial network (gan). Automation in Construction, 155:105053, 2023.
[106]
Jiachen Liu, Yuan Xue, Jose Duarte, Krishnendra Shekhawat, Zihan Zhou, and Xiaolei Huang. End-to-end graph-constrained vectorized floorplan generation with panoptic refinement. In European Conference on Computer Vision, pages 547–562. Springer, 2022.
[107]
Mohammad Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa. Housediffusion: Vector floorplan generation via a diffusion model with discrete and continuous denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5466–5475, 2023.
[108]
Pei Sun, Fengying Yan, Qiwei He, and Hongjiang Liu. The development of an experimental framework to explore the generative design preference of a machine learning-assisted residential site plan layout. Land, 12(9):1776, 2023.
[109]
Yubo Liu, Yihua Luo, Qiaoming Deng, and Xuanxing Zhou. Exploration of campus layout based on generative adversarial network: Discussing the significance of small amount sample learning for architecture. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 169–178. Springer, 2021.
[110]
Yuzhe Pan, Jin Qian, and Yingdong Hu. A preliminary study on the formation of the general layouts on the northern neighborhood community based on gaugan diversity output generator. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 179–188. Springer, 2021.
[111]
Chao-Wang Zhao, Jian Yang, and Jiatong Li. Generation of hospital emergency department layouts based on generative adversarial networks. Journal of Building Engineering, 43:102539, 2021.
[112]
Wamiq Para, Paul Guerrero, Tom Kelly, Leonidas J Guibas, and Peter Wonka. Generative layout modeling using constraint graphs. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6690–6700, 2021.
[113]
Ricardo C Rodrigues and Rovenir B Duarte. Generating floor plans with deep learning: A cross-validation assessment over different dataset sizes. International Journal of Architectural Computing, 20(3):630–644, 2022.
[114]
Shuai Dong, Wei Wang, Wensheng Li, and Kun Zou. Vectorization of floor plans based on edgegan. Information, 12(5):206, 2021.
[115]
Seongyong Kim, Seula Park, Hyunjung Kim, and Kiyun Yu. Deep floor plan analysis for complicated drawings based on style transfer. Journal of Computing in Civil Engineering, 35(2):04020066, 2021.
[116]
Mikhael Johanes and Jeffrey Huang. Deep learning spatial signature inverted gans for isovist representation in architectural floorplan. In 40th Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2022, pages 621–629. Education and research in Computer Aided Architectural Design in Europe, 2022.
[117]
Mikhael Johanes and Jeffrey Huang. Generative isovist transformer.
[118]
Peiyang Su, Weisheng Lu, Junjie Chen, and Shibo Hong. Floor plan graph learning for generative design of residential buildings: a discrete denoising diffusion model. Building Research & Information, pages 1–17, 2023.
[119]
Christina Doumpioti and Jeffrey Huang. Field condition - environmental sensibility of spatial configurations with the use of machine intelligence. eCAADe proceedings, 2022.
[120]
Fatemeh Mostafavi, Mohammad Tahsildoost, Zahra Sadat Zomorodian, and Seyed Shayan Shahrestani. An interactive assessment framework for residential space layouts using pix2pix predictive model at the early-stage building design. Smart and Sustainable Built Environment, 2022.
[121]
Qiushi He, Ziwei Li, Wen Gao, Hongzhong Chen, Xiaoying Wu, Xiaoxi Cheng, and Borong Lin. Predictive models for daylight performance of general floorplans based on cnn and gan: a proof-of-concept study. Building and Environment, 206:108346, 2021.
[122]
Tomasz Dzieduszyński. Machine learning and complex compositional principles in architecture: Application of convolutional neural networks for generation of context-dependent spatial compositions. International Journal of Architectural Computing, 20(2):196–215, 2022.
[123]
Viktor Eisenstadt, Jessica Bielski, Burak Mete, Christoph Langenhan, Klaus-Dieter Althoff, and Andreas Dengel. Autocompletion of floor plans for the early design phase in architecture: Foundations, existing methods, and research outlook. In POSTCARBON-Proceedings of the 27th CAADRIA Conference, Sydney, pages 323–332, 2022.
[124]
Yueheng Lu, Runjia Tian, Ao Li, Xiaoshi Wang, and Jose Luis Garcia del Castillo Lopez. Organizational graph generation for structured architectural floor plan dataset. In Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), CUMINCAD, pages 81–90, 2021.
[125]
Wenjie Liao, Xinzheng Lu, Yuli Huang, Zhe Zheng, and Yuanqing Lin. Automated structural design of shear wall residential buildings using generative adversarial networks. Automation in Construction, 132:103931, 2021.
[126]
Yifan Fei, Wenjie Liao, Shen Zhang, Pengfei Yin, Bo Han, Pengju Zhao, Xingyu Chen, and Xinzheng Lu. Integrated schematic design method for shear wall structures: a practical application of generative adversarial networks. Buildings, 12(9):1295, 2022.
[127]
Bochao Fu, Yuqing Gao, and Wei Wang. Dual generative adversarial networks for automated component layout design of steel frame-brace structures. Automation in Construction, 146:104661, 2023.
[128]
Wenjie Liao, Yuli Huang, Zhe Zheng, and Xinzheng Lu. Intelligent generative structural design method for shear wall building based on “fused-text-image-to-image” generative adversarial networks. Expert Systems with Applications, 210:118530, 2022.
[129]
Xinzheng Lu, Wenjie Liao, Yu Zhang, and Yuli Huang. Intelligent structural design of shear wall residence using physics-enhanced generative adversarial networks. Earthquake Engineering & Structural Dynamics, 51(7):1657–1676, 2022.
[130]
Yifan Fei, Wenjie Liao, Xinzheng Lu, Ertugrul Taciroglu, and Hong Guan. Semi-supervised learning method incorporating structural optimization for shear-wall structure design using small and long-tailed datasets. Journal of Building Engineering, 79:107873, 2023.
[131]
Pengju Zhao, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Intelligent design of shear wall layout based on graph neural networks. Advanced Engineering Informatics, 55:101886, 2023.
[132]
Wenjie Liao, Xinyu Wang, Yifan Fei, Yuli Huang, Linlin Xie, and Xinzheng Lu. Base-isolation design of shear wall structures using physics-rule-co-guided self-supervised generative adversarial networks. Earthquake Engineering & Structural Dynamics, 2023.
[133]
Pengju Zhao, Wenjie Liao, Hongjing Xue, and Xinzheng Lu. Intelligent design method for beam and slab of shear wall structure based on deep learning. Journal of Building Engineering, 57:104838, 2022.
[134]
Yifan Fei, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Knowledge-enhanced generative adversarial networks for schematic design of framed tube structures. Automation in Construction, 144:104619, 2022.
[135]
Immanuel Koh. Architectural plasticity: the aesthetics of neural sampling. Architectural Design, 92(3):86–93, 2022.
[136]
Michael Hasey, Jinmo Rhee, and Daniel Cardoso Llach. Form data as a resource in architectural analysis: an architectural distant reading of wooden churches from the carpathian mountain regions of eastern europe. Digital Creativity, 34(2):103–126, 2023.
[137]
Ingrid Mayrhofer-Hufnagl and Benjamin Ennemoser. From linear to manifold interpolation.
[138]
Benjamin Ennemoser and Ingrid Mayrhofer-Hufnagl. Design across multi-scale datasets by developing a novel approach to 3dgans. International Journal of Architectural Computing, page 14780771231168231, 2023.
[139]
Dongyun Kim, Lloyd Sukgyo Lee, and Hanjun Kim. Elemental sabotage.
[140]
Hang Zhang and Ye Huang. Machine learning aided 2d-3d architectural form finding at high resolution. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 159–168. Springer, 2021.
[141]
Hang Zhang and E Blasetti. 3d architectural form style transfer through machine learning (full version), 2020.
[142]
KE Asmar and Harpreet Sareen. Machinic interpolations: a gan pipeline for integrating lateral thinking in computational tools of architecture. In Proceedings of the 24th Conference of the Iberoamerican Society of Digital Graphics, Online, pages 18–20, 2020.
[143]
Hang Zhang. Text-to-form. August 2021.
[144]
Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, and Yoshito Tsuji. Building-gan: Graph-conditioned architectural volumetric design generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11956–11965, 2021.
[145]
Zhenlong Du, Haiyang Shen, Xiaoli Li, and Meng Wang. 3d building fabrication with geometry and texture coordination via hybrid gan. Journal of Ambient Intelligence and Humanized Computing, pages 1–12, 2020.
[146]
Qiu Yu, Jamal Malaeb, and Wenjun Ma. Architectural facade recognition and generation through generative adversarial networks. In 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), pages 310–316. IEEE, 2020.
[147]
Cheng Lin Chuang, Sheng Fen Chien, et al. Facilitating architect-client communication in the pre-design phase. In Projections-Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia, CAADRIA 2021, volume 2, pages 71–80. The Association for Computer-Aided Architectural Design Research in Asia, 2021.
[148]
Cheng Sun, Yiran Zhou, and Yunsong Han. Automatic generation of architecture facade for historical urban renovation using generative adversarial network. Building and Environment, 212:108781, 2022.
[149]
Lei Zhang, Liang Zheng, Yile Chen, Lei Huang, and Shihui Zhou. Cgan-assisted renovation of the styles and features of street facades—a case study of the wuyi area in fujian, china. Sustainability, 14(24):16575, 2022.
[150]
Hongpan Lin, Linsheng Huang, Yile Chen, Liang Zheng, Minling Huang, and Yashan Chen. Research on the application of cgan in the design of historic building facades in urban renewal—taking fujian putian historic districts as an example. Buildings, 13(6):1478, 2023.
[151]
Jiaxin Zhang, Tomohiro Fukuda, Nobuyoshi Yabuki, and Yunqin Li. Synthesizing style-similar residential facade from semantic labeling according to the user-provided example.
[152]
Wenyuan Sun, Ping Zhou, Yangang Wang, Zongpu Yu, Jing Jin, and Guangquan Zhou. 3d face parsing via surface parameterization and 2d semantic segmentation network, 2022.
[153]
Da Wan, Runqi Zhao, Sheng Zhang, Hui Liu, Lian Guo, Pengbo Li, and Lei Ding. A deep learning-based approach to generating comprehensive building façades for low-rise housing. Sustainability, 15(3):1816, 2023.
[154]
Jiahua Dong, Qingrui Jiang, Anqi Wang, and Yuankai Wang. Urban cultural inheritance.
[155]
Shengyu Meng. Exploring in the latent space of design: A method of plausible building facades images generation, properties control and model explanation base on stylegan2. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 55–68. Springer Singapore, 2022.
[156]
Selen Çiçek, Gozde Damla Turhan, and Aybüke Taşer. Deterioration of pre-war and rehabilitation of post-war urbanscapes using generative adversarial networks. International Journal of Architectural Computing, page 14780771231181237, 2023.
[157]
Zhenhuang Cai, Yangbin Lin, Jialian Li, Zongliang Zhang, and Xingwang Huang. Building facade completion using semantic-synchronized gan. In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, pages 6387–6390. IEEE, 2021.
[158]
Xue Sun, Yue Wang, Ting Zhang, Yin Wang, Haoyue Fu, Xuechun Li, and Zhen Liu. An application of deep neural network in facade planning of coastal city buildings. In International Conference on Computer Science and its Applications and the International Conference on Ubiquitous Information Technologies and Applications, pages 517–523. Springer, 2022.
[159]
Frederick Chando Kim and Jeffrey Huang. Perspectival gan.
[160]
J Chen and R Stouffs. From exploration to interpretation: Adopting deep representation learning models to latent space interpretation of architectural design alternatives. 2021.
[161]
Wolf D. Prix, Karolin Schmidbaur, Daniel Bolojan, and Efilena Baseta. The legacy sketch machine: from artificial to architectural intelligence. Architectural Design, 92(3):14–21, 2022.
[162]
Ruşen Eroğlu and Leman Figen Gül. Architectural form explorations through generative adversarial networks. Legal Depot D/2022/14982/02, page 575, 2022.
[163]
Nervana Osama Hanafy. Artificial intelligence's effects on design process creativity: "a study on used ai text-to-image in architecture". Journal of Building Engineering, 80:107999, 2023.
[164]
Ville Paananen, Jonas Oppenlaender, and Aku Visuri. Using text-to-image generation for architectural design ideation. arXiv preprint arXiv:2304.10182, 2023.
[165]
Hanım Gülsüm Karahan, Begüm Aktaş, and Cemal Koray Bingöl. Use of language to generate architectural scenery with ai-powered tools. In International Conference on Computer-Aided Architectural Design Futures, pages 83–96. Springer, 2023.
[166]
Junming Chen, Zichun Shao, and Bin Hu. Generating interior design from text: A new diffusion model-based method for efficient creative design. Buildings, 13(7):1861, 2023.
[167]
Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion.
[168]
Gernot Riether and Taro Narahara. Ai tools to synthesize characteristics of public spaces.
[169]
Junming Chen, Duolin Wang, Zichun Shao, Xu Zhang, Mengchao Ruan, Huiting Li, and Jiaqi Li. Using artificial intelligence to generate master-quality architectural designs from text descriptions. Buildings, 13(9):2285, 2023.
[170]
Sachith Seneviratne, Damith Senanayake, Sanka Rasnayaka, Rajith Vidanaarachchi, and Jason Thompson. Dalle-urban: Capturing the urban design expertise of large text to image transformers. In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–9. IEEE, 2022.
[171]
Jonathan Dortheimer, Gerhard Schubert, Agata Dalach, Lielle Brenner, and Nikolas Martelaro. Think ai-side the box!
[172]
Emmanouil Vermisso. Semantic ai models for guiding ideation in architectural design courses. In ICCC, pages 205–209, 2022.
[173]
Daniel Bolojan, Emmanouil Vermisso, and Shermeen Yousif. Is language all we need? a query into architectural semantics using a multimodal generative workflow. In POST-CARBON, Proceedings of the 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), volume 1, pages 353–362, 2022.
[174]
Dongyun Kim. Latent morphologies: Encoding architectural features and decoding their structure through artificial intelligence. International Journal of Architectural Computing, page 14780771231209458, 2022.
[175]
Kaiyu Cheng, Paulina Neisch, and Tong Cui. From concept to space: a new perspective on aigc-involved attribute translation. Digital Creativity, 34(3):211–229, 2023.
[176]
Jeffrey Huang, Mikhael Johanes, Frederick Chando Kim, Christina Doumpioti, and Georg-Christoph Holz. On gans, nlp and architecture: combining human and machine intelligences for the generation and evaluation of meaningful designs. Technology | Architecture + Design, 5(2):207–224, 2021.
[177]
Dongyun Kim, George Guida, and Jose Luis García del Castillo y López. Participatory urban design with generative adversarial networks.
[178]
Yeji Hong, Somin Park, Hongjo Kim, and Hyoungkwan Kim. Synthetic data generation using building information models. Automation in Construction, 130:103871, 2021.
[179]
X. Zhuang. Rendering sketches - interactive rendering generation from sketches using conditional generative adversarial neural network. Proceedings of the 40th International Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe) [Volume 1], 2022.
[180]
Yuqian Li and Weiguo Xu. Using cyclegan to achieve the sketch recognition process of sketch-based modeling. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 26–34. Springer, 2022.
[181]
Xinyue Ye, Jiaxin Du, and Yu Ye. Masterplangan: Facilitating the smart rendering of urban master plans via generative adversarial networks. Environment and Planning B: Urban Analytics and City Science, 49(3):794–814, 2022.
[182]
Yuqian Li and Weiguo Xu. Research on architectural sketch to scheme image based on context encoder.
[183]
Yingbin Gui, Biao Zhou, Xiongyao Xie, Wensheng Li, and Xifang Zhou. Gan-based method for generative design of visual comfort in underground space. In IOP Conference Series: Earth and Environmental Science, volume 861, page 072015. IOP Publishing, 2021.
[184]
Ka Chun Shum, Hong-Wing Pang, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Conditional 360-degree image synthesis for immersive indoor scene decoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4478–4488, 2023.
[185]
Matias del Campo. Deep house-datasets, estrangement, and the problem of the new. Architectural Intelligence, 1(1):12, 2022.
[186]
Matias Del Campo, Sandra Manninger, and Alexandra Carlson. Hallucinating cities: a posthuman design method based on neural networks. In Proceedings of the 11th annual symposium on simulation for architecture and urban design, pages 1–8, 2020.
[187]
Wenliang Qian, Yang Xu, and Hui Li. A self-sparse generative adversarial network for autonomous early-stage design of architectural sketches. Computer-Aided Civil and Infrastructure Engineering, 37(5):612–628, 2022.
[188]
Sisi Han, Yuhan Jiang, Yilei Huang, Mingzhu Wang, Yong Bai, and Andrea Spool-White. Scan2drawing: Use of deep learning for as-built model landscape architecture. Journal of Construction Engineering and Management, 149(5):04023027, 2023.
[189]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
[190]
Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. Multidiffusion: Fusing diffusion paths for controlled image generation. 2023.
[191]
Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. arXiv preprint arXiv:2311.14521, 2023.

1. Corresponding author: zhang.ye@tju.edu.cn