2.1 Automated simulation framework for flood risk analysis
In order to meet the requirements of real-time, city-scale flood forecasting with due consideration of climate change and urbanization, the proposed automated simulation framework for flood risk analysis comprises four modules, as shown in Fig. 1. More details are described below:
(1) Data acquisition and preprocessing: Because it captures richer information on color and building facades, UAV-based oblique photography is performed in this study to acquire the point cloud data needed for flood modeling. This method provides a more efficient and convenient way to collect up-to-date 3D data of a target area than traditional GIS approaches to flood risk analysis.
(2) Point cloud segmentation based on deep learning: Flood simulation requires extracting the areas of ground, vegetation, and buildings so that different runoff parameters can be assigned to them. Therefore, the first step is point cloud segmentation based on RandLA-Net (Hu et al. 2019), a deep learning-based encoder-decoder network. RandLA-Net is an efficient and lightweight network that can directly infer point-by-point semantics for large-scale point clouds; it markedly reduces the computational workload while enhancing local feature capture. Generally, the point clouds are split into several classes: ground, residential land, farmland, roads, shrubs, and concrete. More details are given in Section 2.2.
(3) Point cloud filtering and DEM reconstruction: A high-precision DEM is the basis of flood simulation, and an improved technical scheme for high-precision DEM reconstruction is proposed herein. First, the original point clouds are pre-filtered based on the segmentation results, removing the points of residential buildings, shrubs, and concrete structures. Then, a hierarchical smoothing filtering algorithm is proposed to extract the ground points. As the skeleton of the terrain, topographic feature lines are extracted by the planar surface fitting and intersecting method to describe the abrupt changes of mountainous terrain more accurately. Finally, the high-precision DEM is reconstructed by integrating the feature lines, and the points in potential flood routing areas are densified by the inverse distance weighting method. More details are given in Section 2.3.
(4) Flood simulation based on hydrodynamics: The high-precision DEM in TIN format is imported into the MIKE 21 software. Grids are generated by partitions using its automatic mesh function and are assigned different Manning coefficients according to the point cloud segmentation results. Generally, the model settings are implemented by an automated script.
2.2 Point cloud segmentation based on deep learning
Generally, RandLA-Net uses an encoder-decoder architecture with skip connections, and its structure is illustrated in Fig. 2, in which (N, D) represents the number of points and the number of feature dimensions, respectively. As shown in Fig. 2, the input point cloud (N, 3) is first fed into a fully connected layer, and then four encoding and four decoding layers are employed to learn the features of each point. Finally, three fully connected layers with a dropout layer are utilized to predict the semantic label of each point. The random down-sampling technique used in RandLA-Net is a key facilitator for processing massive point clouds since it requires less computational time and cost than other sampling techniques. Additionally, the introduced local feature aggregation (LFA) module preserves complicated local structures by gradually increasing the receptive field for each 3D point (Hu et al. 2019).
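To make this layout concrete, the following PyTorch sketch reproduces only the overall shape flow of Fig. 2 (a fully connected input layer, four encoding and four decoding stages, and a three-layer prediction head with dropout). It is a minimal illustration rather than the published implementation: the shared linear layers merely stand in for the LFA blocks, random down-sampling and skip connections are omitted, and the feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class RandLANetSkeleton(nn.Module):
    """Minimal sketch of the encoder-decoder layout of Fig. 2 (illustrative only)."""

    def __init__(self, n_class, dims=(8, 32, 128, 256, 512)):
        super().__init__()
        self.fc_in = nn.Linear(3, dims[0])                      # (N, 3) -> (N, 8)
        # Four encoding stages: feature dimension grows while resolution would drop 4x.
        self.encoders = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(4)
        )
        # Four decoding stages mirror the encoders (skip connections omitted here).
        self.decoders = nn.ModuleList(
            nn.Linear(dims[i + 1], dims[i]) for i in reversed(range(4))
        )
        # Three fully connected layers with a dropout layer predict per-point labels.
        self.head = nn.Sequential(
            nn.Linear(dims[0], 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(32, n_class),
        )

    def forward(self, xyz):                                     # xyz: (N, 3)
        f = self.fc_in(xyz)
        for enc in self.encoders:                               # stands in for LFA + random sampling
            f = torch.relu(enc(f))
        for dec in self.decoders:                               # stands in for NN-interpolation up-sampling
            f = torch.relu(dec(f))
        return self.head(f)                                     # (N, n_class) semantic logits
```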
As shown in Fig. 3, the LFA module consists of three neural units (i.e., local spatial encoding, attentive pooling, and dilated residual block). Among them, the coordinates of all neighboring points are embedded into the local spatial encoding unit to explicitly capture the local geometric patterns, which eventually helps the entire network learn complex local features effectively. Specifically, the local spatial encoding unit performs the following steps: (1) Finding neighboring points. For the \({i}^{th}\) point, the simple K-nearest neighbors (KNN) algorithm is first used to gather its neighboring points based on the point-wise Euclidean distances. (2) Relative point position encoding. For each nearest point \(\{{p}_{i}^{1}\cdots {p}_{i}^{k}\cdots {p}_{i}^{K}\}\), the relative position with respect to the center point \({p}_{i}\) is explicitly encoded by Eq. (1). (3) Point feature augmentation. For each nearest point \({p}_{i}^{k}\), the encoded relative point position \({r}_{i}^{k}\) is concatenated with its corresponding point feature \({f}_{i}^{k}\), yielding an augmented feature vector \({\widehat{f}}_{i}^{k}\). Eventually, the output of the local spatial encoding unit is a new set of neighboring features \({\widehat{F}}_{i}=\{{\widehat{f}}_{i}^{1}\cdots {\widehat{f}}_{i}^{k}\cdots {\widehat{f}}_{i}^{K}\}\) that explicitly encodes the local geometric features for the center point \({p}_{i}\). Then, the attentive pooling unit aggregates the set of neighboring point features \({\widehat{F}}_{i}\) using a shared function that learns a unique attention score for each feature. Finally, the LFA module stacks multiple local spatial encoding and attentive pooling units with a skip connection to form a dilated residual block.
$${r}_{i}^{k}=\mathrm{MLP}\left({p}_{i}\oplus {p}_{i}^{k}\oplus \left({p}_{i}-{p}_{i}^{k}\right)\oplus \left\|{p}_{i}-{p}_{i}^{k}\right\|\right)\tag{1}$$
where \({p}_{i}\) and \({p}_{i}^{k}\) are the positions of the center point and its \({k}^{th}\) neighboring point, respectively; \(\oplus\) is the concatenation operation; \(\left\|\cdot\right\|\) calculates the Euclidean distance between the neighboring and center points; and \({r}_{i}^{k}\) represents the encoded relative point position.
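As a concrete illustration of step (1) and Eq. (1), the short Python sketch below gathers the K nearest neighbors of every point and assembles the concatenated vector \({p}_{i}\oplus {p}_{i}^{k}\oplus ({p}_{i}-{p}_{i}^{k})\oplus \|{p}_{i}-{p}_{i}^{k}\|\). The shared MLP that maps this 10-dimensional vector to the feature dimension is omitted, and K = 16 is an assumed value.

```python
import numpy as np
from scipy.spatial import cKDTree

def relative_position_encoding(points, k=16):
    """Build the concatenated input of Eq. (1) for every point (MLP omitted)."""
    tree = cKDTree(points)                                  # points: (N, 3)
    _, idx = tree.query(points, k=k)                        # (N, K) neighbour indices (includes the point itself)
    neighbours = points[idx]                                # (N, K, 3) positions p_i^k
    centers = np.repeat(points[:, None, :], k, axis=1)      # (N, K, 3) center positions p_i
    offsets = centers - neighbours                          # (N, K, 3) p_i - p_i^k
    dists = np.linalg.norm(offsets, axis=-1, keepdims=True) # (N, K, 1) Euclidean distances
    # Concatenate along the feature axis -> (N, K, 10); the shared MLP would be applied next.
    return np.concatenate([centers, neighbours, offsets, dists], axis=-1)

# Example: encode a random cloud of 1,000 points.
encoded = relative_position_encoding(np.random.rand(1000, 3))
print(encoded.shape)  # (1000, 16, 10)
```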
In the network of this study, four encoding layers are adopted to progressively reduce the resolution of the point cloud. Each encoding layer down-samples the point cloud by a factor of four while increasing the feature dimension of each point to retain more information, using random sampling as the down-sampling strategy. Since random sampling discards points non-selectively, each encoding layer also contains a local feature aggregation (LFA) module that learns features of the point cloud without losing important information (Hu et al. 2019). Four decoding layers follow the four encoding layers. For each decoding layer, the K-nearest neighbors (KNN) algorithm is used to find the nearest neighbor of each point, whose features are then up-sampled through nearest-neighbor interpolation. These up-sampled features are concatenated with the intermediate feature maps generated by the encoding layers via skip connections and then fed into a shared MLP layer. The final output of RandLA-Net is the semantic prediction label for each point, denoted as \(N\times {n}_{class}\), where \({n}_{class}\) represents the number of classes in the input point cloud.
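The two resolution-changing operations described above can be sketched as follows. This is an illustrative, stand-alone example: the LFA module, skip connections, and shared MLPs are not included, the 4x decimation ratio is taken from the text, and the toy data shapes are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def random_downsample(points, features, ratio=4):
    """Encoder-side random sampling: keep N/ratio randomly chosen points and their features."""
    n_keep = max(1, points.shape[0] // ratio)
    idx = np.random.choice(points.shape[0], n_keep, replace=False)
    return points[idx], features[idx]

def nearest_neighbour_upsample(coarse_points, coarse_features, fine_points):
    """Decoder-side up-sampling: each fine point copies the features of its nearest coarse point."""
    _, nn_idx = cKDTree(coarse_points).query(fine_points, k=1)
    return coarse_features[nn_idx]

# Toy round trip: 4x down-sampling followed by up-sampling back to full resolution.
pts = np.random.rand(4000, 3)
feats = np.random.rand(4000, 8)
coarse_pts, coarse_feats = random_downsample(pts, feats, ratio=4)     # (1000, 3), (1000, 8)
restored = nearest_neighbour_upsample(coarse_pts, coarse_feats, pts)  # (4000, 8)
```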
2.3 Point cloud filtering and DEM reconstruction
Most common filtering algorithms distinguish ground points from non-ground points using mature digital image processing theory or height differences among points, judging whether the difference between pixels or points exceeds a pre-determined threshold. To ensure the accuracy and resolution of DEM reconstruction, this study proposes an improved algorithm based on the hierarchical smoothing method (Xu and Wan 2010), as described below:
Step 1: Reference surface construction. To improve the efficiency of data processing, regular grids are used to re-sample the point clouds, and a reference dataset is then formed by mature imagery filtering theory. Note that the generated reference dataset is only taken as a threshold surface to avoid the accuracy loss of the original point clouds; the subsequent filtering process is still conducted on the original point clouds.
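A minimal sketch of this re-sampling step is given below, assuming the cell value is the mean height of the points falling into each regular grid cell (as used in Step 2). The subsequent imagery-based filtering of the reference dataset is not shown, and the cell size is treated as a free parameter.

```python
import numpy as np

def build_reference_grid(points, cell_size):
    """Re-sample a point cloud (N, 3) onto a regular grid of mean heights; empty cells become NaN."""
    xy_min = points[:, :2].min(axis=0)
    cols, rows = (np.ceil((points[:, :2].max(axis=0) - xy_min) / cell_size).astype(int) + 1)
    grid_sum = np.zeros((rows, cols))
    grid_cnt = np.zeros((rows, cols))
    j, i = ((points[:, :2] - xy_min) // cell_size).astype(int).T   # column (x) and row (y) indices
    np.add.at(grid_sum, (i, j), points[:, 2])                      # accumulate heights per cell
    np.add.at(grid_cnt, (i, j), 1)                                 # count points per cell
    with np.errstate(invalid="ignore"):
        return np.where(grid_cnt > 0, grid_sum / grid_cnt, np.nan)
```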
Step 2: Hierarchical smoothing. For each grid cell of the reference dataset, the average vertical coordinate (height) of all points in the cell is taken as the cell value. Then, the reference dataset is smoothed by the hierarchical smoothing method illustrated in Eqs. (2) and (3). If the height difference between the center cell value and the minimum value of the grid cells in four directions is larger than the threshold, the center cell value is replaced by the predicted value; otherwise, the center cell keeps its initial value.
$${Z}_{\left[i,j\right]}=\begin{cases}\sum_{k=1}^{n}{Z}_{{P}_{k}}/n, & \text{if } \varDelta h\le {T}_{h}\\ \min\left\{{Z}_{1},{Z}_{2},{Z}_{3},{Z}_{4}\right\}, & \text{if } \varDelta h>{T}_{h}\end{cases}\tag{2}$$
$${T}_{h}=\begin{cases}{h}_{0}, & \text{if } {S}_{w}\le 1\\ {h}_{0}+{S}_{w}\epsilon, & \text{if } {S}_{w}>1\end{cases}\tag{3}$$
where \({Z}_{\left[i,j\right]}\) is the value of grid cell \(\left[i,j\right]\); \({P}_{k}\) is a point within cell \(\left[i,j\right]\), and the total number of points in the cell is denoted by \(n\); \({Z}_{1}, {Z}_{2},{Z}_{3},{Z}_{4}\) are the predicted values of the grid cells in the four directions; \(\varDelta h\) and \({T}_{h}\) denote the height difference and the threshold, respectively; \({h}_{0}\) represents the minimum height difference; \({S}_{w}\) is the resolution of the dataset; and \(\epsilon\) represents a scale factor ranging from 0 to 1.
Subsequently, a new dataset with higher resolution is generated by integrating the original point clouds and the smoothed reference dataset. If the height difference between the new dataset and the smoothed reference dataset is larger than the threshold, the new cell value is replaced by the corresponding value of the smoothed reference dataset. The above steps are repeated until the final dataset resolution is less than the threshold. During this process, the dataset resolution varies with the threshold, and a higher resolution requires a stricter threshold.
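The per-cell decision of Eqs. (2) and (3) can be sketched as a single smoothing pass over the reference grid, as below. This is an illustrative simplification: the four directional prediction values \({Z}_{1},\ldots,{Z}_{4}\) are approximated by the four neighboring cell values, and \({h}_{0}\), the grid resolution \({S}_{w}\) (taken here as the cell size), and \(\epsilon=0.5\) are assumed inputs.

```python
import numpy as np

def smooth_reference_grid(grid, h0, cell_size, eps=0.5):
    """One hierarchical-smoothing pass following Eqs. (2) and (3) (illustrative sketch)."""
    t_h = h0 if cell_size <= 1 else h0 + cell_size * eps          # threshold T_h, Eq. (3)
    out = grid.copy()
    for i in range(1, grid.shape[0] - 1):
        for j in range(1, grid.shape[1] - 1):
            # Neighbouring cell values stand in for the directional predictions Z_1..Z_4.
            four_dirs = np.array([grid[i - 1, j], grid[i + 1, j],
                                  grid[i, j - 1], grid[i, j + 1]])
            z_min = np.nanmin(four_dirs)
            if grid[i, j] - z_min > t_h:                          # height difference exceeds T_h
                out[i, j] = z_min                                 # replace by the predicted value, Eq. (2)
    return out
```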
Step 3: DEM reconstruction with embedded features. To improve the DEM reconstruction accuracy in mountainous areas, this study first extracts the topographic feature lines by the planar surface fitting and intersecting method (Xu et al. 2014; Zhao and Wang 2022), dividing the local area into two parts. Then, ground points around the topographic feature lines are densified by the inverse distance weighting method (Huang et al. 2023; Zhao and Wang 2022). The densification can be expressed as Eq. (4).
$$\begin{cases}{w}_{i}={d}_{i}^{-2}\Big/\sum_{i=1}^{S}{d}_{i}^{-2}\\ {h}_{j}=\sum_{i=1}^{S}{w}_{i}{h}_{i}\end{cases}\tag{4}$$
where \({w}_{i}\) denotes the weight of each known ground point in the semicircle; \({d}_{i}\) is the distance between the current position to be densified and the known ground point; \({h}_{i}\) and \({h}_{j}\) represent the elevation of each known ground point in the semicircle and the elevation of the current position to be densified, respectively.
The points on the topographic feature lines are used as the centers of search semicircles with radius R, and a search semicircle is generated on one side of the feature lines for point cloud densification. In this study, the number of known ground points (\(S\)) used for densification is 12 by default. If the number of known ground points in the search semicircle does not reach \(S\), the radius of the search semicircle is dynamically enlarged until it does. In the same way, ground point densification is conducted on the other side of the topographic feature lines, as well as in potential flood routing areas and in the holes in the point clouds induced by the removal of non-ground points.
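A simplified sketch of the densification rule of Eq. (4) for a single position is given below. It keeps the default \(S=12\) from the text, but the initial search radius and the doubling rule for the dynamic adjustment are assumptions, and it searches a full circle rather than the one-sided semicircle described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_elevation(query_xy, ground_xy, ground_z, s=12, radius=5.0):
    """Inverse-distance-weighted elevation of one position to be densified (Eq. (4) sketch).

    Assumes at least `s` known ground points exist; radius and its doubling are illustrative."""
    tree = cKDTree(ground_xy)
    idx = tree.query_ball_point(query_xy, r=radius)
    while len(idx) < s:                                   # dynamically enlarge the search area
        radius *= 2.0
        idx = tree.query_ball_point(query_xy, r=radius)
    d = np.linalg.norm(ground_xy[idx] - query_xy, axis=1)
    d = np.maximum(d, 1e-6)                               # avoid division by zero at coincident points
    order = np.argsort(d)[:s]                             # keep the S nearest known ground points
    d, z = d[order], ground_z[np.asarray(idx)[order]]
    w = d**-2 / np.sum(d**-2)                             # weights w_i of Eq. (4)
    return float(np.sum(w * z))                           # elevation h_j of Eq. (4)
```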
Finally, the high-precision DEM is reconstructed, meeting the requirement of the Delaunay criterion. A point-by-point insertion and growth strategy is applied to connect the discrete point clouds into a triangulated irregular network based on the Delaunay criterion. For the topographic feature lines and the boundary lines of different land covers (such as buildings, farmland, and residential land), the elevation of each inserted node is first calculated by interpolating the elevations of points on both sides of the lines.
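For illustration, the following sketch connects a set of discrete ground points into a TIN satisfying the Delaunay criterion using SciPy's Delaunay triangulation of the planar coordinates; the constrained insertion of feature lines and land-cover boundary lines described above is not enforced in this minimal example.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_tin(ground_points):
    """Triangulate ground points (N, 3) into a TIN via 2-D Delaunay triangulation of x-y."""
    tri = Delaunay(ground_points[:, :2])       # Delaunay criterion on the planar coordinates
    return ground_points, tri.simplices        # vertices and triangle index triplets

# Example: triangulate a synthetic set of ground points.
pts = np.random.rand(500, 3) * [100.0, 100.0, 10.0]
vertices, triangles = build_tin(pts)
print(triangles.shape)                         # (n_triangles, 3)
```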