Global Mapping of Exposure and
Physical Vulnerability Dynamics
in Least Developed Countries using
Remote Sensing and Machine Learning

Joshua Dimasaka
UKRI CDT in AI for Environmental Risks
Centre for Risk in the Built Environment
Department of Architecture,
University of Cambridge
Cambridge, United Kingdom
Emily So
Centre for Risk in the Built Environment
Department of Architecture,
University of Cambridge
Cambridge, United Kingdom
Christian Geiß
German Aerospace Center (DLR)
Institute of Geography,
University of Bonn
Bonn, Germany


As the world marked the midterm of the Sendai Framework for Disaster Risk Reduction 2015-2030, many countries still struggle to monitor their climate and disaster risk because large-scale surveys of the distribution of exposure and physical vulnerability are expensive; hence, they are not on track to reduce risks amid the intensifying effects of climate change. We present an ongoing effort to map this vital information using machine learning and time-series remote sensing from publicly available Sentinel-1 SAR GRD and Sentinel-2 Harmonized MSI imagery. We introduce “OpenSendaiBench”, a dataset covering 47 countries, most of which are least developed countries (LDCs); train ResNet-50 deep learning models; and demonstrate the approach in the region of Dhaka, Bangladesh, by mapping the distribution of its informal constructions. As a pioneering effort in auditing global disaster risk over time, this paper aims to advance large-scale risk quantification and to inform our collective long-term efforts in reducing climate and disaster risk.

1 Introduction

Global concern over the increasing frequency and intensity of climate disasters, the exacerbating effects of climate change, and the faster growth of exposed human settlements despite decreases in their vulnerability prompted the international community to jointly develop the Sendai Framework for Disaster Risk Reduction (SFDRR) 2015-2030 [1]. However, in its 2023 midterm review, the United Nations reported that "a lack of quality, interoperable, or accessible data" to quantify risk as a product of hazard, exposure, and vulnerability remains a challenge, especially in many least developed countries (LDCs), where data-collection tools are disproportionately unaffordable [2]. In particular, the expensive large-scale operations needed to standardize exposure datasets (e.g., human settlements) across countries with different and incomplete physical vulnerability characteristics (e.g., building material and construction type) remain the primary bottleneck to a reliable understanding and audit of the evolving global climate and disaster risk landscape [3].

Early efforts in developing large-scale exposure datasets mapped the distribution of human settlements and their physical vulnerabilities [4], [5], and these datasets have been the basis of several global assessment reports [6]–[9]. Unfortunately, these datasets have limited generalizability and inherent biases that favor developed countries. Specifically, LDCs have different and non-standard vulnerability characteristics because of the ubiquity of informal settlements and differing construction methods [10], [11], and their existing data are increasingly outdated because of rapid urbanization [3].

Furthermore, most data-driven efforts [12], [13] focus on mere building detection (i.e., a binary task of estimating the presence or absence of a building, as a geometric feature or a land use class, from satellite imagery). Several geospatial dasymetric efforts have used digital elevation models (DEMs) and digital surface models (DSMs) as proxies to downscale the distribution of physical vulnerability characteristics [14], but this remains difficult because high-resolution DEMs and DSMs are costly to acquire and have limited temporal availability globally. Hence, many recent efforts have turned to indirect data such as publicly available Sentinel-1 and Sentinel-2 imagery, which captures optical and backscattering signatures of the built environment that can be related to pertinent characteristics such as building height, surface color, and roof roughness [15], [16].

We present an ongoing effort to globally map not only exposure but also its associated physical vulnerability characteristics using time-series medium-resolution satellite imagery (i.e., 5-30 meters/pixel). We introduce an initial version of the benchmark dataset “OpenSendaiBench” (see Figure 1) and present our findings from a multi-pixel, multi-resolution implementation using the ResNet-50 deep convolutional neural network (CNN) architecture [17]. Our primary purpose is to raise awareness of this timely and relevant interdisciplinary problem and to advance large-scale risk quantification in support of our collective SFDRR and post-2030 long-term efforts.

Figure 1: Geographical coverage of the “OpenSendaiBench” dataset with 47 countries.

2 The “OpenSendaiBench” dataset

The global dataset is a 60-GB collection covering 47 countries, 45 of which are LDCs, and is available in our public Zenodo repository [18]. Its two components are described below.

2.1 National Census-derived Exposure Data

We rasterized each country-wide point dataset of building counts from the METEOR project [5], for every defined physical vulnerability type (see Appendix 7.1 for the complete typology), at a spatial resolution of 15 arcseconds, or approximately 500 meters at the equator. We then used a probability-based approach to extract 100 square tiles for each country. Accounting for differences in the relative areal extent of these countries is left for future work.
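The rasterization step can be pictured as binning points into a regular latitude-longitude grid. This is a minimal sketch rather than the authors' pipeline: the helper name `rasterize_counts`, its grid-origin arguments, and the assumption that each METEOR point carries one building count for one vulnerability type are illustrative.

```python
import numpy as np

CELL_DEG = 15.0 / 3600.0  # 15 arcseconds expressed in degrees
EQUATOR_M_PER_DEG = 2 * np.pi * 6378137.0 / 360.0  # WGS84 equatorial metres per degree


def rasterize_counts(lon, lat, counts, west, north, width, height, cell_deg=CELL_DEG):
    """Sum point building counts into a (height, width) grid with top-left corner (west, north).

    Hypothetical helper: points falling outside the grid are ignored.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    counts = np.asarray(counts, dtype=float)
    col = np.floor((lon - west) / cell_deg).astype(int)
    row = np.floor((north - lat) / cell_deg).astype(int)  # rows increase southwards
    inside = (col >= 0) & (col < width) & (row >= 0) & (row < height)
    grid = np.zeros((height, width))
    # np.add.at accumulates correctly when several points fall in the same cell
    np.add.at(grid, (row[inside], col[inside]), counts[inside])
    return grid


# Width of one 15-arcsecond cell at the equator (about 464 m, i.e. "approximately 500 m").
cell_m = CELL_DEG * EQUATOR_M_PER_DEG
```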

In sampling these 100 square tiles, we considered the number of physical vulnerability types present in each pixel to ensure that every label, including unlabeled pixels, is represented. Specifically, we analyzed the empirical probability distribution of each physical vulnerability type present and assigned a joint probability to each point on the map, assuming that the types are independent of one another (e.g., the probability of a point being an informal settlement is not influenced by the probabilities of any other building types). We leave the effects of other highly vulnerable types on the agglomeration of informal settlements to future work.
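Under the stated independence assumption, a pixel's joint probability is the product of per-type marginal probabilities. The following is a minimal sketch that assumes, as a simplification not stated in the text, that each type's state in a pixel is reduced to present or absent and that empirical probabilities are computed over all pixels of one country; `joint_presence_probability` is a hypothetical name.

```python
import numpy as np


def joint_presence_probability(counts):
    """Per-pixel joint probability of the observed presence/absence pattern of K types.

    counts: (K, H, W) building counts per vulnerability type for one country.
    Types are treated as independent, so the joint probability is a product of marginals.
    """
    present = np.asarray(counts) > 0          # (K, H, W) presence mask
    p_present = present.mean(axis=(1, 2))     # empirical P(type k present), shape (K,)
    p_k = p_present[:, None, None]
    # probability of the state actually observed for each type in each pixel
    marginal = np.where(present, p_k, 1.0 - p_k)
    return marginal.prod(axis=0)              # (H, W)
```

Pixels with rare combinations of types (including pixels where no type is present, if those are rare) receive small joint probabilities, which is what the subsequent sampling step can exploit.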

We used the resulting joint probability values as inputs to our importance sampling technique to ensure a balanced representation of sampled pixels. Instead of sampling individual pixel locations, we extracted square tiles, each an 8-pixel-by-8-pixel group of 64 sampled locations.
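One way to turn joint probabilities into tile selections is to weight each candidate 8x8 tile by the mean inverse joint probability of its pixels, so that rare type combinations are favored, and then draw tiles without replacement. This is a hedged sketch in the spirit of importance sampling, not the paper's exact scheme: the inverse-probability weighting, the non-overlapping tile grid, and the name `sample_tiles` are assumptions.

```python
import numpy as np


def sample_tiles(joint_prob, n_tiles=100, tile=8, seed=0):
    """Return (row, col) origins of up to n_tiles non-overlapping tile x tile windows.

    joint_prob: (H, W) strictly positive joint probabilities per pixel.
    """
    h, w = joint_prob.shape
    nr, nc = h // tile, w // tile
    # view the grid as (tile-row, pixel-row, tile-col, pixel-col) blocks
    blocks = joint_prob[: nr * tile, : nc * tile].reshape(nr, tile, nc, tile)
    weights = (1.0 / blocks).mean(axis=(1, 3)).ravel()  # rarer tiles get larger weights
    p = weights / weights.sum()
    rng = np.random.default_rng(seed)
    k = min(n_tiles, p.size)
    idx = rng.choice(p.size, size=k, replace=False, p=p)
    rows, cols = np.unravel_index(idx, (nr, nc))
    return np.stack([rows * tile, cols * tile], axis=1)
```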

2.2 Time-series Satellite Imagery

With the previously extracted geographical extents, we obtained the following pre-processed time-series satellite imagery via Google Earth Engine [19].

Sentinel-1 SAR GRD. At 10-m spatial resolution, we used the annual mean of the Ground Range Detected (GRD) scenes acquired by the dual-polarization C-band Synthetic Aperture Radar (SAR) instrument (5.405 GHz) of the Sentinel-1 satellite [20]. Covering the years from 2019 to 2023, we extracted the annual means of two bands: VV (vertical transmit, vertical receive) and VH (vertical transmit, horizontal receive). To avoid data gaps across large areas, we did not filter by orbit number or pass direction. Some countries, such as Angola, Comoros, Ethiopia, Kiribati, and Tuvalu, have partially or fully incomplete VV and VH signals, either because the Sentinel-1 orbit did not cover these areas for some period or because only the VV signal is available (see Appendix 7.2).
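The annual-mean compositing can be illustrated in NumPy for scenes already available as arrays. This is illustrative only: the compositing itself was done in Google Earth Engine, and the `(T, band, H, W)` layout, the use of NaN to mark a missing VH band, and the helper name `annual_means` are assumptions.

```python
import warnings

import numpy as np


def annual_means(stack, years, target_years):
    """Per-year mean of SAR scenes.

    stack: (T, B, H, W) scenes with bands ordered (VV, VH); NaN marks a missing band.
    years: (T,) acquisition year of each scene.
    Returns (len(target_years), B, H, W); years without any scene stay NaN.
    """
    years = np.asarray(years)
    out = np.full((len(target_years),) + stack.shape[1:], np.nan)
    for i, year in enumerate(target_years):
        selected = stack[years == year]
        if selected.shape[0] == 0:
            continue  # no acquisition that year (e.g., an orbit gap)
        with warnings.catch_warnings():
            # an all-NaN band (e.g., VH absent) would otherwise warn "Mean of empty slice"
            warnings.simplefilter("ignore", category=RuntimeWarning)
            out[i] = np.nanmean(selected, axis=0)
    return out
```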

Sentinel-2 Harmonized MSI. At the same 10-m spatial resolution, we also extracted the annual median of the atmospherically corrected surface reflectance in the red, green, and blue (RGB) bands acquired by the MultiSpectral Instrument (MSI) of the Sentinel-2 satellite [21]. Annual aggregation, combined with the corresponding Sentinel-2 cloud probability dataset [22], also suppresses noisy cloud and shadow signals. Unlike Sentinel-1 SAR GRD, the resulting five annual median maps from 2019 to 2023 are available for all 47 countries.
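Cloud-masked median compositing can be sketched as follows: observations whose cloud probability exceeds a threshold are set to NaN before taking the per-pixel median over the year. The 50% threshold, the percent scale of the cloud probability, and the name `cloud_masked_median` are hypothetical choices for illustration, not values from the text.

```python
import warnings

import numpy as np


def cloud_masked_median(rgb, cloud_prob, max_cloud_prob=50.0):
    """Annual median of Sentinel-2 RGB reflectance after masking cloudy observations.

    rgb: (T, 3, H, W) surface reflectance for one year.
    cloud_prob: (T, H, W) per-pixel cloud probability in percent (0-100).
    max_cloud_prob: hypothetical threshold above which an observation is discarded.
    """
    cloudy = np.asarray(cloud_prob) > max_cloud_prob
    masked = np.where(cloudy[:, None, :, :], np.nan, np.asarray(rgb, dtype=float))
    with warnings.catch_warnings():
        # pixels that are cloudy in every scene yield NaN ("All-NaN slice encountered")
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(masked, axis=0)
```

The median, rather than the mean, limits the influence of residual cloud or shadow values that fall under the threshold.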

3 Problem Definition: A Multi-resolution Multi-pixel Framing

Because the ground truth labels and the satellite imagery inputs have different spatial resolutions, we approach this as a multi-resolution, multi-pixel problem. We hypothesize that the satellite imagery, with a roughly 50x finer resolution than the labels, contains detailed spatial patterns that are informative for training our machine learning models. Hence, we perform the aggregation from imagery resolution to label resolution within the ResNet-50 architecture, so that the resulting predictions have the same dimensions as the ground truth labels. Although a simpler single-pixel approach would also be worth investigating, we assume that a multi-pixel representation in the form of an 8x8 array captures pertinent information from neighboring pixels.
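The geometric alignment between an 8x8 label tile and its imagery can be made concrete with non-learned block averaging, which is the simplest fixed version of the aggregation the network learns. This sketch assumes the approximate 50x ratio stated above, so one tile spans 400x400 imagery pixels; `block_mean` is a hypothetical name.

```python
import numpy as np


def block_mean(image, factor=50):
    """Average a (C, H, W) image over non-overlapping factor x factor blocks.

    Returns (C, H // factor, W // factor); any remainder rows/columns are dropped.
    """
    c, h, w = image.shape
    hh, ww = h // factor, w // factor
    trimmed = image[:, : hh * factor, : ww * factor]
    return trimmed.reshape(c, hh, factor, ww, factor).mean(axis=(2, 4))
```

Applied to a (5, 400, 400) stack of red, green, blue, VV, and VH bands, this yields a (5, 8, 8) array aligned with the label tile; a CNN replaces the plain average with learned features.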

Moreover, we frame the problem so that predictions are made at the coarser label resolution, rather than downscaling the ground truth labels to the resolution of the satellite imagery, for an ethical reason: overly localized attribution or prediction may cause social harm and undesirable impacts in the applied area of climate and disaster risk, particularly when formulating regional policies on climate financing or insurance. In other words, the lower-resolution approach not only enables efficient large-scale risk quantification but also protects confidential household information, such as building material, physical vulnerability, relative economic value, or other indirect variables that could be inferred from a study at 10-m resolution or finer.

Thus, we define our problem as follows: in every square tile \({T_{i}}^{year}\) for \(i \in \left [ 1, 100 \right ]\), there is a pair (\(x_{locationIndexX}^{signalBand}\), \(y_{locationIndexY}^{vulnerabilityType}\)) in which \(locationIndexX\) and \(locationIndexY\) are geographically aligned. For \(x\), \(signalBand\) is any combination from the set \(\left \{ red, green, blue, VV, VH \right \}\). For \(y\), \(vulnerabilityType\) is any combination of the physical vulnerability types common to a subset of countries; for example, the “informal constructions” type is present in 33 out of 47 countries.

Because \(y\) can take extremely high or low values (e.g., hundreds of buildings in a highly dense area), we do not use \(y\) directly as the target of our CNN model. Instead, we treat the building count of a particular \(vulnerabilityType\) as a lognormally distributed random variable (i.e., \(\ln y \sim \mathcal{N}(\mu, \sigma^2)\)) and compute its \(P_{nonexceedance}\) from a lognormal fit. This transformation bounds the target to the range \(\left [ 0, 1 \right ]\) and provides a regional measure of dispersion, a decision variable of interest in the practice of performance-based engineering for earthquakes and other natural hazards [23]. \(P_{nonexceedance}\) is the probability that the building count of a particular vulnerability type at a location does not exceed \(y\), where \(\Phi\) is the standard normal cumulative distribution function: \[\label{nonexceedance} P_{nonexceedance} = P\left [ Y_{locationIndexY}^{vulnerabilityType} \leq y \right ] = \Phi\left(\frac{\ln y - \mu}{\sigma}\right) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{\ln y-\mu}{\sqrt{2} \sigma} \right) \right]\tag{1}\]
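Equation (1) can be evaluated directly. This is a minimal NumPy sketch under two assumptions not stated in the text: the lognormal parameters are maximum-likelihood estimates from the strictly positive building counts of one vulnerability type, and zero-count pixels map to a non-exceedance probability of 0 because \(\ln 0\) is undefined. The helper names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])  # elementwise error function


def fit_lognormal(counts):
    """Maximum-likelihood (mu, sigma) of ln(Y), using strictly positive counts only."""
    counts = np.asarray(counts, dtype=float)
    logs = np.log(counts[counts > 0])
    return float(logs.mean()), float(logs.std())  # ddof=0 gives the MLE of sigma


def p_nonexceedance(y, mu, sigma):
    """Eq. (1) evaluated elementwise; zero counts are mapped to 0 (an assumption)."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    pos = y > 0
    out[pos] = 0.5 * (1.0 + _erf((np.log(y[pos]) - mu) / (math.sqrt(2.0) * sigma)))
    return out
```

A count equal to the lognormal median \(e^{\mu}\) maps to 0.5, and a count one log-standard-deviation above it maps to \(\Phi(1)\).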

4 Baseline Experiment: A ResNet-50 CNN Implementation

After modifying the ResNet-50 CNN architecture to accept a custom number of input signals, we trained the models with the Adam optimizer, an initial learning rate of 0.0001, a batch size of 64, and a training-validation-testing split ratio of 60-20-20. As shown in Table 1, the model S1 trained with VV and VH signals achieved the smallest MSE and MAE scores, estimating \(P_{nonexceedance}\) with a mean absolute error of about 0.01 (roughly one percentage point). Relative to model S2, S1 reduced the MSE by 25% and the MAE by 22%; relative to model S1+S2, it reduced them by 4.1% and 10%, respectively. This suggests that the SAR backscatter signals, which primarily capture surface roughness and texture, were more informative for learning the relevant features than the optical RGB signals.

Table 1: Baseline test set score results for the ‘informal constructions’ type.
Model (Input Bands) \(\mathrm{MSE}\ [\times 10^{-3}]\) \(\mathrm{MAE}\ [\times 10^{-2}]\)
S1 (VV, VH) 4.93 1.07
S2 (R, G, B) 6.56 1.38
S1+S2 (VV, VH, R, G, B) 5.14 1.19
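The relative improvements quoted for model S1 follow from Table 1 as fractional error reductions, \((\text{worse} - \text{better}) / \text{worse}\). The short check below uses only the table values; the names are illustrative.

```python
# Test-set scores for "informal constructions" from Table 1: (MSE, MAE).
TABLE1 = {
    "S1": (4.93e-3, 1.07e-2),
    "S2": (6.56e-3, 1.38e-2),
    "S1+S2": (5.14e-3, 1.19e-2),
}


def relative_reduction(better, worse):
    """Fractional reduction of an error score relative to a worse model."""
    return (worse - better) / worse


gains = {
    ref: tuple(relative_reduction(s1, other) for s1, other in zip(TABLE1["S1"], TABLE1[ref]))
    for ref in ("S2", "S1+S2")
}

for ref, (mse_gain, mae_gain) in gains.items():
    print(f"S1 vs {ref}: MSE -{mse_gain:.1%}, MAE -{mae_gain:.1%}")
# S1 vs S2: MSE -24.8%, MAE -22.5%
# S1 vs S1+S2: MSE -4.1%, MAE -10.1%
```

Rounded, these reproduce the 25%/22% and 4.1%/10% reductions reported in the text.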

However, the optical RGB signals may require further preprocessing to be used effectively, since they capture optical signatures such as roof color. Using existing building datasets and elevation maps as prior beliefs may also help prune the inputs and improve the model, because other land features that change rapidly over time, such as vegetation, may have affected the learned model parameters.

Furthermore, Figure 2 shows the predicted 2019 distribution of exposure and physical vulnerability in the city of Dhaka, which has many informal constructions. We observed that, although the models underestimate large building counts, they can distinguish areas with relatively high and low counts. This underestimation suggests that the probabilistic transformation must be chosen carefully to represent regional building counts, including extreme values.
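One possible contributor to the difficulty with extreme counts is visible when Eq. (1) is inverted: a predicted probability maps back to a count through \(y = \exp(\mu + \sigma\,\Phi^{-1}(P_{nonexceedance}))\), and in the upper tail a small probability error corresponds to a large count error. The sketch below illustrates this with hypothetical parameters \(\mu = 3\) and \(\sigma = 1\); it is not a reproduction of the Dhaka results.

```python
import math
from statistics import NormalDist


def p_from_count(y, mu, sigma):
    """Eq. (1): lognormal non-exceedance probability of a positive building count y."""
    return NormalDist(mu, sigma).cdf(math.log(y))


def count_from_p(p, mu, sigma):
    """Inverse of Eq. (1): the count whose non-exceedance probability is p (0 < p < 1)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


# Hypothetical fit parameters, for illustration only.
MU, SIGMA = 3.0, 1.0

# The same absolute error in probability (0.009) maps to very different count errors:
mid_gap = count_from_p(0.509, MU, SIGMA) - count_from_p(0.500, MU, SIGMA)
tail_gap = count_from_p(0.999, MU, SIGMA) - count_from_p(0.990, MU, SIGMA)
print(f"near the median: {mid_gap:.2f} buildings; in the upper tail: {tail_gap:.1f} buildings")
```

Near the median the gap is well under one building, while in the upper tail it amounts to hundreds, so a model that is accurate on average in probability space can still miss the largest counts.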

Figure 2: Predicted 2019 distribution of “informal constructions” of Dhaka, Bangladesh.

5 Conclusion and Future Work

Amid global uncertainty about whether our local and collective efforts in disaster risk reduction are making progress, we presented an ongoing machine learning effort that addresses the pressing problem of quantifying large-scale risk by mapping exposure and physical vulnerability characteristics using remote sensing. As a pioneering effort, we introduced “OpenSendaiBench”, a global benchmark dataset that enables both machine learning specialists and disaster risk modelers to contribute and build methodologies. We demonstrated the technical feasibility of this effort using a simple deep learning model with promising baseline test scores and highlighted the case of informal constructions in Dhaka, Bangladesh.

For future work, we aim to expand the global catalog with Landsat and additional Sentinel-2 imagery bands, use elevation maps (DEM/DSM) as prior beliefs, and incorporate spatial urban morphology growth models to empirically describe regional dynamics. In partnership with key stakeholders, we plan to localize this effort for selected cities in the Philippines and Bangladesh. Finally, we will implement probabilistic risk analysis and derive regional risk metrics based on the dominant natural hazards in each country.

6 Acknowledgments

This work is funded by the UKRI Centre for Doctoral Training in Application of Artificial Intelligence to the study of Environmental Risks (EP/S022961/1).

7 Appendix

7.1 List of Building Types

Table 2: Description of physical vulnerability types and the number of countries with these types.
Symbol Countries Description
A 32 Adobe blocks (unbaked sundried mud block) walls
C 7 Reinforced concrete
C3L 43 Nonductile reinforced concrete frame with masonry infill walls low-rise
C3M 19 Nonductile reinforced concrete frame with masonry infill walls mid-rise
C3H 6 Nonductile reinforced concrete frame with masonry infill walls high-rise
DS 1 Rectangular cut-stone masonry block
INF 32 Informal constructions.
M 23 Mud walls
RE 3 Rammed Earth/Pneumatically impacted stabilized earth
RM 2 Reinforced masonry
RS 21 Rubble stone (field stone) masonry
RS1 3 Local field stones dry stacked (no mortar) with timber floors, earth, or metal roof.
RS2 1 Local field stones with mud mortar.
RS3 3 Local field stones with lime mortar.
S 9 Steel
S1L 1 Steel moment frame low-rise
S1M 1 Steel moment frame mid-rise
S3 8 Steel light frame
S5 1 Steel frame with unreinforced masonry infill walls
UCB 39 Concrete block unreinforced masonry with lime or cement mortar
UFB 33 Unreinforced fired brick masonry
UFB1 1 Unreinforced brick masonry in mud mortar without timber posts
W 28 Wood
W1 2 Wood stud-wall frame with plywood/gypsum board sheathing.
W2 1 Wood frame, heavy members (with area > 5000 sq. ft.)
W3 5 Wood light unbraced post and beam frame.
W5 31 Wattle and Daub (Walls with bamboo/light timber log/reed mesh and post).

7.2 Completeness of Satellite Imagery

Table 3: Availability of Sentinel-1 GRD VV and VH signals for each country. As a reference, 0, 1, 2, and 3 mean ‘none’, ‘available’, ‘uncovered’, and ‘only VV signal is available’, respectively.
Country 2016 2017 2018 2019 2020 2021 2022 2023
AFG 1 1 1 1 1 1 1 1
AGO 1 1 1 1 1 1 2 2
BDI 1 1 1 1 1 1 1 1
BEN 1 1 1 1 1 1 1 1
BFA 1 1 1 1 1 1 1 1
BGD 1 1 1 1 1 1 1 1
BTN 1 1 1 1 1 1 1 1
CAF 1 1 1 1 1 1 1 1
COD 0 1 1 1 1 1 1 1
COM 0 1 1 1 1 1 0 0
DJI 1 1 1 1 1 1 1 1
ERI 1 1 1 1 1 1 1 1
ETH 1 1 1 1 1 1 1 1
GIN 1 1 1 1 1 1 1 1
GMB 1 1 1 1 1 1 1 1
GNB 1 1 1 1 1 1 1 1
HTI 1 1 1 1 1 1 1 1
KHM 1 1 1 1 1 1 1 1
KIR 3 3 3 3 3 3 3 3
LAO 1 1 1 1 1 1 1 1
LBR 1 1 1 1 1 1 1 1
LSO 1 1 1 1 1 1 1 1
MDG 1 1 1 1 1 1 1 1
MLI 1 1 1 1 1 1 1 1
MMR 1 1 1 1 1 1 1 1
MOZ 1 1 1 1 1 1 1 1
MRT 1 1 1 1 1 1 1 1
MWI 1 1 1 1 1 1 1 1
NER 1 1 1 1 1 1 1 1
NPL 1 1 1 1 1 1 1 1
RWA 1 1 1 1 1 1 1 1
SDN 1 1 1 1 1 1 1 1
SEN 1 1 1 1 1 1 1 1
SLB 1 1 1 1 1 1 0 0
SLE 1 1 1 1 1 1 1 1
SOM 1 1 1 1 1 1 1 1
SSD 1 1 1 1 1 1 1 1
STP 1 1 1 1 1 1 1 1
TCD 1 1 1 1 1 1 1 1
TGO 1 1 1 1 1 1 1 1
TLS 1 1 1 1 1 1 1 1
TUV 3 3 3 3 3 3 3 3
TZA 1 1 1 1 1 1 1 1
UGA 1 1 1 1 1 1 1 1
VUT 1 1 1 1 1 1 1 1
YEM 1 1 1 1 1 1 1 1
ZMB 1 1 1 1 1 1 1 1


UNISDR. Sendai framework for disaster risk reduction 2015–2030., 2015. Accessed: 2023-07-01.
UNDRR. Summary of the high-level meeting of the United Nations General Assembly on the midterm review of the implementation of the Sendai Framework for Disaster Risk Reduction 2015–2030., 2023. Accessed: 2023-07-01.
Emily So. Data and its role in reducing the risk of disasters in the built environment. Natural Hazards, 119 (2): 1127–1130, 2023.
P Gamba, D Cavalca, K Jaiswal, C Huyck, and H Crowley. The GED4GEM project: Development of a global exposure database for the global earthquake model initiative. Proceedings of the 15th WCEE, Lisbon, 2012.
C Huyck, Z Hu, P Amyx, G Esquivias, M Huyck, and M Eguchi. Data classification, metadata population and confidence assessment. Report M3.2/P. Technical report, British Geological Survey, 2019.
UNDRR. Global assessment report on disaster risk reduction 2013. https://www., 2013.
UNDRR. Global assessment report on disaster risk reduction 2015. https://www., 2015.
UNDRR. Global assessment report on disaster risk reduction 2019. https://www., 2019.
UNDRR. Global assessment report on disaster risk reduction 2022. https://www., 2022.
Rashmin Gunasekera, Oscar Ishizawa, Christoph Aubrecht, Brian Blankespoor, Siobhan Murray, Antonios Pomonis, and James Daniell. Developing an adaptive global exposure model to support the generation of country disaster risk profiles. Earth-Science Reviews, 150: 594–608, 2015.
Vitor Silva, Svetlana Brzev, Charles Scawthorn, Catalina Yepes, Jamal Dabbeek, and Helen Crowley. A building classification system for multi-hazard risk assessment. International Journal of Disaster Risk Science, 13 (2): 161–177, 2022.
Thomas Esch, Elisabeth Brzoska, Stefan Dech, Benjamin Leutner, Daniela Palacios-Lopez, Annekatrin Metz-Marconcini, Mattia Marconcini, Achim Roth, and Julian Zeidler. World Settlement Footprint 3D - A first three-dimensional survey of the global building stock. Remote Sensing of Environment, 270: 112877, 2022.
Wojciech Sirko, Sergii Kashubin, Marvin Ritter, Abigail Annkah, Yasser Salah Eddine Bouchareb, Yann Dauphin, Daniel Keysers, Maxim Neumann, Moustapha Cisse, and John Quinn. Continental-scale building detection from high resolution satellite imagery. arXiv preprint arXiv:2107.12283, 2021.
Christian Geiß, Peter Priesmeier, Patrick Aravena Pelizari, Angélica Rocio Soto Calderon, Elisabeth Schoepfer, Torsten Riedlinger, Mabé Villar Vega, Hernán Santa Marı́a, Juan Camilo Gomez Zapata, Massimiliano Pittore, et al. Benefits of global earth observation missions for disaggregation of exposure data and earthquake loss modeling: evidence from Santiago de Chile. Natural Hazards, 119 (2): 779–804, 2023.
Konstantin Müller, Robert Leppich, Christian Geiß, Vanessa Borst, Patrick Aravena Pelizari, and Samuel Kounev. Deep neural network regression for normalized digital surface model generation with Sentinel-2 imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023.
David Frantz, Franz Schug, Akpona Okujeni, Claudio Navacchi, Wolfgang Wagner, Sebastian van der Linden, and Patrick Hostert. National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series. Remote Sensing of Environment, 252: 112128, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
Joshua Dimasaka, Christian Geiß, and Emily So. , 2024. URL
Noel Gorelick, Matt Hancher, Mike Dixon, Simon Ilyushchenko, David Thau, and Rebecca Moore. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 2017. . URL
Copernicus Sentinel data. ., 2024. Accessed: 2024-02-01.
Copernicus Sentinel data. ., 2024. Accessed: 2024-02-01.
Copernicus Sentinel data. ., 2024. Accessed: 2024-02-01.
Pablo Heresi and Eduardo Miranda. RPBEE: Performance-based earthquake engineering on a regional scale. Earthquake Spectra, 39 (3): 1328–1351, 2023.