
Failure Mode Detection of Reinforced Concrete Shear Walls Using Ensemble Deep Neural Networks


Reinforced concrete structural walls (RCSWs) are among the most efficient lateral force-resisting systems used in buildings, providing sufficient strength, stiffness, and deformation capacity to withstand the forces generated during earthquake ground motions. Identifying the failure mode of RCSWs is a critical task that can assist engineers and designers in choosing appropriate retrofitting solutions. This study evaluates the efficiency of three ensemble deep neural network models, namely the model averaging ensemble, the weighted average ensemble, and the integrated stacking ensemble, for predicting the failure mode of RCSWs. The ensemble deep neural network models are compared against previous studies that used well-known traditional ensemble models (AdaBoost, XGBoost, LightGBM, CatBoost) and traditional machine learning methods (Naïve Bayes, K-Nearest Neighbors, Decision Tree, and Random Forest). The weighted average ensemble model is proposed as the best-suited prediction model for identifying the failure mode, since it has the highest accuracy, precision, and recall among the alternative models. In addition, since complex and advanced machine learning-based models are commonly referred to as black boxes, the SHapley Additive exPlanations (SHAP) method is also used to interpret the model workflow and illustrate the importance and contribution of the components that determine the failure mode of RCSWs.


In building design, shear walls are commonly employed to protect the structure from lateral forces such as those generated by earthquake ground motions. Shear walls are a cost-effective way to reinforce a building’s structural system against lateral loads. A reinforced concrete structural wall (RCSW) improves the building’s stiffness in the wall plane, lowering the building’s lateral sway and boosting its stability in that plane. For this reason, this system is widely used in buildings. Existing RCSW buildings are assessed and retrofitted following local jurisdictional regulations, and it is critical to predict the severe damage and failure modes of RCSWs accurately. According to the aspect ratio (the ratio of wall height to wall length), RCSWs are classified as slender or squat (Barkhordari et al., 2021; Massone et al., 2021). Slender walls (aspect ratio > 3) are more prone to ductile failure characterized by bar buckling and concrete crushing, bar fracture, or global or local lateral instabilities. Squat walls (aspect ratio < 1.5) are prone to shear-controlled failure mechanisms, which can be characterized by diagonal tension, diagonal compression (web crushing), or shear sliding at the base. Walls with an aspect ratio of 1.5 to 3.0 (moderate-aspect-ratio walls) display behavior characterized by yielding in flexure and failing in shear.

In recent years, studies on the application of machine learning models for damage estimation and monitoring in civil engineering have been published. They can be categorized into two main types: (1) regression-based methods and (2) classification-based methods. Some of the previous studies used various machine learning algorithms to estimate the performance of shear walls. Moradi et al. (2020) studied the application of the radial basis function network to assess the impacts of rectangular openings on the behavior of steel plate shear walls. They suggested that the proposed network can be used to design new walls or retrofit existing ones while consuming less time and requiring no specific software knowledge. Using the extreme gradient boosting (XGBoost) technique, Feng et al. (2021) developed a forecasting model for predicting the shear strength of reinforced concrete squat walls. According to their study, the XGBoost model provides a decent prediction for shear strength, with an average computed-to-measured ratio of 1.0. Chen et al. (2018) and Nguyen et al. (2021) utilized neural networks for shear strength prediction of reinforced concrete squat walls. Gondia et al. (2020) used genetic programming, a kind of artificial intelligence, to develop an expression for the shear strength of reinforced concrete squat walls. Keshtegar et al. (2021a, 2021b) used a neural network merged with an adaptive harmony search algorithm and support vector regression with a response surface model to estimate the ultimate shear capacity of RCSWs. Pizarro et al. (2021) and Pizarro and Massone (2021) developed a convolutional network-based solution to produce the ultimate engineering floor plan of reinforced concrete shear wall buildings using a dataset of 165 Chilean residential layouts.
Barkhordari and Tehranizadeh (2021) developed a hybrid technique, a neural network with simulated annealing, for predicting the response of RCSWs, such as forces and bending moment at the base of the wall, curvature of the wall, and normal strain in the vertical/horizontal direction. Parsa and Naderpour (2021) used support vector regression with meta-heuristic optimization algorithms to estimate the shear strength of RCSWs. Mangalathu et al. (2020) investigated the effectiveness of eight machine learning techniques in detecting RCSW failure mechanisms. To sum up, only Mangalathu et al. (2020) examined the efficiency of ensemble learning algorithms, though not ensemble deep neural network algorithms, in predicting the failure mode of RCSWs.

Even though the aspect ratio indicates general behavioral tendencies, several factors affect the failure mode of RCSWs. Appropriate methodologies for assessing their likely failure mechanism during earthquake events are required to acquire a better knowledge of the seismic behavior of existing structures and to create appropriate retrofit solutions. In addition, although there have been attempts to predict the failure mode with different machine learning approaches (Mangalathu et al., 2020), no previous research has used ensemble deep neural network models to predict the failure mechanism of RCSWs. In this study, three ensemble learning approaches, model averaging, weighted average, and integrated stacking, are employed for failure mode detection of RCSWs. Moreover, the relevance of input variables such as the aspect ratio (or, more generally, the moment-to-shear-length ratio), steel ratio, concrete strength, and axial load, among others, has not been studied. The aim here is to evaluate the efficiency of the ensemble deep neural network models for assessing the failure mechanism of RCSWs and to investigate the relevance of the main variables and how they relate to failure modes.

Method and Material

Here, Keras (Chollet, 2015), an open-source artificial neural network library with a Python interface, is used to develop the deep learning models. A brief description of the data used in this study, the various parts of the base neural networks, and the ensemble algorithms is provided below.


This study uses test data generated by experimental studies on RCSWs in which specimens were tested under cyclic loading protocols: 393 experimental results collected from the literature (Grammatikou et al., 2015; Mangalathu et al., 2020; Usta et al., 2017). It is worth noting that the RC walls in this repository are traditional RCSWs; the repository does not contain tests of repaired or precast RCSWs, nor walls subjected to dynamic loading. The RCSWs comprise three cross-section configurations: 238 rectangular walls, 95 barbell walls, and 60 flanged walls (Fig. 1). The distribution of the failure mode (output) of the RCSWs is shown in Table 1. Design parameters are listed in Table 2. In Table 2, \(P\) is the axial load, \(A_{{\text{g}}}\) is the gross area of the section, \(f_{{\text{c}}}\) is the compressive strength of concrete, \(A_{{\text{b}}}\) is the boundary element area, \(\rho_{i,x/y}\) is the reinforcing ratio in the horizontal/vertical direction, \(t_{{\text{w}}}\) is the wall thickness, \(H\) is the height of the wall, \(l_{{\text{w}}}\) is the length of the wall, and \(f_{{\text{y}}}\) is the yield strength of the reinforcement. Because of limited research funds and equipment capacities, many experimental investigations have been undertaken with small-scale specimens. The use of dimensionless values for the input variables is therefore desirable when estimating the failure of the shear walls, so the parameters are normalized to dimensionless variables.
The input variables are \(M/Vl_{{\text{w}}}\), \(A_{{\text{b}}} /A_{{\text{g}}}\), \(l_{{\text{w}}} /t_{{\text{w}}}\), \(P/f_{{\text{c}}} A_{{\text{g}}}\), \(\rho_{{{\text{vb}}}} f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\), \(\rho_{{{\text{vw}}}} f_{{{\text{y}},{\text{vw}}}} /f_{{\text{c}}}\), \(\rho_{{{\text{hb}}}} f_{{{\text{y}},{\text{hb}}}} /f_{{\text{c}}}\), and \(\rho_{{{\text{hw}}}} f_{{{\text{y}},{\text{hw}}}} /f_{{\text{c}}}\). Table 3 shows the statistical properties of the input variables, where Min., Max., and STD are the minimum, maximum, and standard deviation of the variables, respectively. All 393 datasets are split randomly into training and testing sets: 80% of the data are used for model development and the remaining 20% are used to determine the model accuracy. Input data are normalized so that all values lie within the range of −1 to 1. A limitation of the data is that the target class has an uneven distribution of observations.
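As an illustration of the preprocessing described above, the sketch below performs min-max scaling of each input variable to [−1, 1] and a random 80/20 split. The function names are mine, not from the paper, and the scaling formula is the standard min-max rescaling assumed to match the paper's description.

```python
import numpy as np

def scale_to_unit_range(X):
    """Min-max scale each column of X to [-1, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return 2.0 * (X - x_min) / (x_max - x_min) - 1.0

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Random 80/20 split, as used for the 393 specimens."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

For the 393-sample database this yields 314 training and 79 test specimens.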

Fig. 1
figure 1

Cross-sections of the reinforced concrete walls.

Table 1 Distribution of failure mode.
Table 2 Statistical features of database.
Table 3 Statistical characteristics of the input variables.

Fig. 2 provides stacked bar charts of wall type and cross-section vs. failure mode. In Fig. 2, F, FS, S, and SL stand for ‘flexural failure’, ‘flexure–shear failure’, ‘shear failure’, and ‘sliding shear failure’. Fig. 2a shows the distribution of wall types vs. failure mode. It is clear that the distribution of failure modes differs across wall types: all slender walls exhibit only flexural failure, and there is a large growth in the number of walls with shear failure when moving from the moderate-aspect-ratio sector to the squat sector. Fig. 2b shows the distribution of cross-sections vs. failure mode. It is apparent from this graph that rectangular walls mostly have flexural failure, while flanged walls mostly have shear failure. This is most likely because when walls have a boundary element, their flexural strength increases. As a result, shear failure may occur before flexural failure under lateral loading if the lateral force corresponding to flexural strength is greater than the lateral force corresponding to shear strength.

Fig. 2
figure 2

Distribution of section/types of walls vs. failure mode. a Distribution of types of walls vs. failure mode. b Distribution of section vs. failure mode.

Basic Models

For almost all ensemble methods, a series of models must first be created as basic models (or sub-models) to form an ensemble model. This means that several models are trained using the training data. The baseline models consist of five different deep neural networks. Models in Keras are defined as a sequence of layers, and each layer has some nodes (neurons). When generating deep neural networks, one of the most common questions is how many neurons each layer should have. Here, the learning rate, activation functions, optimizer, and the number of neurons per layer are determined using the Keras-Tuner library, which helps to pick the optimal set of hyperparameters for deep neural networks. Fig. 3 is a simplified form of the workflow. Six activation functions (Sigmoid, ReLU, Softplus, Tanh, SELU, and ELU) are considered.
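Keras-Tuner automates this search; the underlying idea can be illustrated with a minimal random search in plain Python. The search space below mirrors the choices listed in this section, but the exact ranges and the `build_and_score` callback (a stand-in for training a Keras model and returning its validation accuracy) are illustrative assumptions, not the paper's configuration.

```python
import random

# Hypothetical search space mirroring the paper's hyperparameter choices.
SEARCH_SPACE = {
    "activation": ["sigmoid", "relu", "softplus", "tanh", "selu", "elu"],
    "optimizer": ["adagrad", "adadelta", "sgd", "rmsprop",
                  "adam", "adamax", "nadam", "ftrl"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "units": list(range(4, 65, 4)),
}

def random_search(build_and_score, n_trials=20, seed=0):
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = build_and_score(cfg)  # e.g., validation accuracy
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```

Keras-Tuner offers more sophisticated strategies (e.g., Hyperband, Bayesian optimization), but they all follow this propose-evaluate-keep-best loop.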

Fig. 3
figure 3

Model workflow.

The optimizer is a relevant component of the training phase. The optimizer function assists the network in determining how to update the weights to minimize the loss. Eight optimization algorithms are utilized, namely adaptive gradient (Adagrad) (Duchi et al., 2011), adaptive delta (Adadelta) (Zeiler & Adadelta, 2012), Stochastic Gradient Descent (SGD) (Krogh & Hertz, 1992), Root Mean Square Prop (RMSProp) (Zeiler & Adadelta, 2012), Adaptive Moment Estimation (Adam) (Kingma & Ba, 2014), Adamax (a variant of Adam based on the infinity norm) (Kingma & Ba, 2014), Nadam (Adam with Nesterov momentum) (Dozat, 2016), and Follow-the-regularized-leader (Ftrl) (McMahan et al., 2013). In this study, the basic input parameters comprise \(M/Vl_{{\text{w}}}\), \(A_{{\text{b}}} /A_{{\text{g}}}\), \(l_{{\text{w}}} /t_{{\text{w}}}\), \(\rho_{{{\text{vb}}}} f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\), \(\rho_{{{\text{hb}}}} f_{{{\text{y}},{\text{hb}}}} /f_{{\text{c}}}\), \(P/f_{{\text{c}}} A_{{\text{g}}}\), \(\rho_{{{\text{hw}}}} f_{{{\text{y}},{\text{hw}}}} /f_{{\text{c}}}\), and \(\rho_{{{\text{vw}}}} f_{{{\text{y}},{\text{vw}}}} /f_{{\text{c}}}\). The output of the model is the predicted failure mode class: ‘flexural failure’, ‘flexure–shear failure’, ‘shear failure’, or ‘sliding shear failure’.

Among all candidates, five basic models are selected in this study based on their performance on the test data. Table 4 summarizes the information of the final basic models (sub-models) that had the highest accuracy. Overfitting occurs when a model learns the patterns and noise in the training set to the degree that it degrades the model’s performance on hold-out data. In ensemble techniques, the sub-models may sometimes suffer from overfitting. In this study, the learning curve (e.g., Fig. 4) of all sub-models was monitored to ensure that overfitting did not occur.

Table 4 Characteristics of the basic models.
Fig. 4
figure 4

Learning curve of sub-model 1.

Model Averaging Ensemble (MAE)

MAE is an ensemble method that involves training many models on the same data set (Brownlee, 2018). The outputs from each of the trained models are then added together, and the average is used as the final predicted value. The number of models needed for the ensemble can vary depending on the solution space complexity. One technique is to construct new models on a constant schedule (increasing the number of layers), add them to the group, and then assess their contribution to performance by predicting on a test set. Fig. 5 shows how the MAE method works using sub-models (Table 4).
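Assuming each sub-model outputs a matrix of class probabilities (one row per sample, one column per failure mode), the averaging step can be sketched as follows; the function name is mine.

```python
import numpy as np

def ensemble_average(prob_list):
    """Average the class-probability outputs of the sub-models and
    return the predicted class index for each sample."""
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)  # (n_samples, n_classes)
    return avg.argmax(axis=1), avg
```

For example, averaging two sub-models that disagree on a sample lets the more confident one dominate the final prediction.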

Fig. 5
figure 5

Model averaging ensemble framework.

Weighted Average Ensemble (WAE)

The averaging performed in the MAE method means that the output values of the sub-models have an equal effect on the predicted final value (Brownlee, 2018). The WAE approach makes it possible for superior models to have a larger share of the predicted final value, while weaker models have a smaller share. In this method, a weight is assigned to the output of each sub-model. The values of these weights are usually determined by an optimization algorithm.

Metaheuristic search algorithms are divided into three categories (Ahmadianfar et al., 2021), namely swarm-based algorithms, evolutionary-based algorithms, and trajectory-based algorithms. Evolutionary algorithms are developed mostly from Darwin’s theory of evolution and natural selection. Differential evolution (DE) is a kind of evolutionary algorithm (Bennis & Bhattacharjya, 2020) that initializes the population with several potential solutions at the start. The iterative DE technique then proceeds by applying the difference vector through the DE operators: mutation, crossover, and the selection mechanism. Each solution is evaluated using a specified objective function in an iterative optimization process. The mutation process aims to vary the population member vector for the next iteration based on any available information from the previous step in the search (Eq. 1):

$$\vec{y}_{i} (t + 1) = m_{{\text{r}}} \cdot \left( {x_{2,j} (t) - x_{3,j} (t)} \right) + x_{1,j} (t),$$

where \(\vec{y}_{i}\) is the mutant vector (MV) of the ith member, \(x_{1,j}\), \(x_{2,j}\), and \(x_{3,j}\) are chosen at random from the population, and \(m_{{\text{r}}}\) is the scaling factor that fine-tunes the size of the perturbation in the process. The crossover operator is utilized to broaden the genetic variation of the population among the MVs; as a result, the MV exchanges its elements with those of the current population. The selection mechanism is used to determine which of the offspring individuals (and their parents) will survive in the next cycles, as well as to keep the pre-determined population size constant. The population is built from the individuals, selected between each trial vector and its predecessor, that perform better in terms of the objective function. Here, differential evolution is used to calculate the weight of each sub-model because of its advantages, such as discovering the global minimum of a search space independent of the initial values, fast convergence, and the use of a few control factors (Karaboga & Cetinkaya, 2004). Fig. 6 shows the flowchart of the WAE method with the differential evolution algorithm.
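A minimal sketch of the weight-optimization step using SciPy's `differential_evolution` is given below. The paper does not specify its implementation; the objective (negative ensemble accuracy), the [0, 1] bounds, and the normalization of the weights to a convex combination are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def fit_ensemble_weights(prob_list, y_true):
    """Find sub-model weights (summing to 1) that maximize the accuracy
    of the weighted-average ensemble via differential evolution."""
    probs = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)

    def neg_accuracy(w):
        w = np.abs(w)
        w = w / (w.sum() + 1e-12)                 # normalize to a convex combination
        combined = np.tensordot(w, probs, axes=1)  # weighted sum over models
        preds = combined.argmax(axis=1)
        return -(preds == y_true).mean()

    bounds = [(0.0, 1.0)] * probs.shape[0]
    result = differential_evolution(neg_accuracy, bounds, seed=1, tol=1e-7)
    w = np.abs(result.x)
    return w / (w.sum() + 1e-12), -result.fun
```

Because the accuracy objective is piecewise constant (non-differentiable), a population-based optimizer such as DE is a natural fit here.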

Fig. 6
figure 6

Flowchart of the WAE method with differential evolution algorithm.

Integrated Stacking Ensemble (ISE)

Although the model average can be improved by weighting the influence of each sub-model, it may be further improved by training a completely new model (a neural network) to discover how to best combine the sub-models, using the so-called integrated stacking ensemble (ISE) (Brownlee, 2018; Naimi & Balzer, 2018). The new model is usually called a meta-learner, and the sub-models are integrated with it through a neural network. In other words, the ISE can be interpreted as a single large model that learns how to merge the results from every sub-model in the most efficient way possible. Here, the architecture of the meta-learner consists of only one hidden layer with 5 neurons. The number of neurons in the hidden layer was determined by trial and error (after examining the range of 2 to 20 neurons, the best performance of the ISE model was obtained with 5 neurons). Fig. 7 presents a diagram of the ISE model process.
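The stacking idea can be sketched as follows: each sub-model's class probabilities are concatenated into a meta-feature vector, which the meta-learner maps to the final class. As a simplified stand-in for the paper's one-hidden-layer meta-learner, the sketch trains a plain softmax regression with gradient descent; all function names and training settings are mine.

```python
import numpy as np

def stack_meta_features(prob_list):
    """Concatenate each sub-model's class probabilities into one feature
    row per sample; these become the meta-learner's inputs."""
    return np.concatenate(prob_list, axis=1)  # (n_samples, n_models * n_classes)

def train_softmax_meta_learner(Z, y, n_classes, lr=0.5, epochs=500, seed=0):
    """Stand-in meta-learner: softmax regression trained by gradient descent
    (the paper uses a one-hidden-layer neural network instead)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.01, (Z.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot targets
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(Z)                    # cross-entropy gradient
        W -= lr * Z.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def meta_predict(Z, W, b):
    return (Z @ W + b).argmax(axis=1)
```

In the actual ISE, the sub-models' output layers feed directly into the meta-learner's hidden layer within one Keras graph, so the combination is learned end to end.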

Fig. 7
figure 7

Flowchart of the ISE method.

Results and Discussion

For the MAE method, the number of members can change the result. Therefore, the influence of the number of sub-models on the model’s accuracy is explored, and the best model with the minimum number of members is chosen. Fig. 8 shows the effect of the number of members on the accuracy. The ensemble size is increased by first creating a model with the first two sub-models (sub-model 1 and sub-model 2 from Table 4) and then adding one more sub-model to the group for each subsequent ensemble model, examining the accuracy of each ensemble model on the test set. It can be seen that from 1 to 2 sub-models there is a marked rise in the ensemble model’s accuracy, and from 2 to 3 sub-models a modest rise. This is followed by constant accuracy for models with more than three members. As a result, a model with three members (sub-models 1–3) is selected for this method.

Fig. 8
figure 8

Effect of the number of ensemble members.

As mentioned, the WAE model permits higher-performing models to have a bigger proportion, while lower-performing models have a lesser share by assigning a weight to the sub-models’ output. Table 5 shows optimized weights, which are determined using the differential evolution algorithm. The WAE models’ accuracy with the optimized weights is 0.987.

Table 5 Optimized weights.

The last ensemble model is the ISE model, whose accuracy is 0.962. Considering the stochastic nature of neural networks’ learning algorithms, it is possible that each time a neural network model is trained, it will discover a mildly or significantly different version of the mapping function between inputs and outputs; that is, neural networks have high variance, resulting in differences in performance on the training and test sets. Ensembles of neural networks work well in most situations because different neural networks do not always produce the same errors on the test set (Goodfellow et al., 2017). In addition, the sub-models have different numbers of layers and neurons in each layer. Varying the number of layers helps to capture various levels of nonlinearity. The number of neurons per layer is also important since it governs the interactions between the parameters: increasing the number of neurons creates more relationships, which may reduce the efficiency of the neural network and the accuracy of its predictions if these relationships are not appropriate, and vice versa.

The ISE model performs worse than the WAE model. This could be due to local minima: the feed-forward neural network trained using backpropagation has a variety of drawbacks, such as falling into local minima and learning at a slow rate (Lee et al., 1991).

Fig. 9 shows the confusion matrices of the various models. In Fig. 9, F, FS, S, and SL stand for ‘flexural failure’, ‘flexure–shear failure’, ‘shear failure’, and ‘sliding shear failure’. The failure modes successfully identified by the classification algorithm are represented by the diagonal cells in the confusion matrix, whereas the failure modes incorrectly predicted are represented by the off-diagonal cells. The bottom-right cell of the figure shows the overall accuracy (Eq. 2). The column on the far right of the confusion matrix indicates the precision metric (Eq. 3). The row at the bottom of the confusion matrix indicates the recall metric (Eq. 4):

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FP}} + {\text{FN}} + {\text{TN}}}},$$
$${\text{Precision }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}},$$
$${\text{Recall }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}},$$

where TN indicates that the model predicted ‘False’ and the real outcome was ‘False’, FP denotes that the model predicted ‘True’ but the real outcome was ‘False’, FN means that the model predicted ‘False’ but the real outcome was ‘True’, and TP denotes that the model predicted ‘True’ and the real outcome was ‘True’. Among the ensemble algorithms, the WAE model fares much better. The WAE model has the highest accuracy with 0.987 for the test set, whereas the MAE and ISE models have the same accuracy (0.962). It appears that recognizing the FS failure mode is often challenging; the MAE, WAE, and ISE models have 0.88, 0.94, and 0.84 precision, respectively, in identifying the FS failure mode in the test set.
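All three metrics can be read directly off a multi-class confusion matrix. The sketch below assumes rows are true classes and columns are predictions, so precision (Eq. 3) is computed column-wise and recall (Eq. 4) row-wise; the function name is mine.

```python
import numpy as np

def per_class_metrics(cm):
    """Overall accuracy plus per-class precision and recall from a
    confusion matrix (rows = true classes, cols = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP), column-wise
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN), row-wise
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall
```

The weighted averages reported later are these per-class values weighted by the number of true instances of each class.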

Fig. 9
figure 9

Confusion matrix of various models. a ISE model. b MAE model. c WAE model.

Comparisons with Previous Studies

Mangalathu et al. (2020) used eight machine learning models to distinguish the failure mode of the RCSWs. The following is a list of the machine learning models that Mangalathu et al. (2020) utilized to determine the failure modes of concrete shear walls.

  1. Naive Bayes classifier (Domingos & Pazzani, 1997; Osisanwo et al., 2017): A probabilistic machine learning technique used to perform classification tasks. The Bayes theorem lies at the core of the classifier. It is one of the most basic Bayesian network models, but when combined with kernel density estimation, it can reach higher precision.

  2. k-Nearest Neighbors (Altman, 1992): The most basic nonparametric method. It finds the k training samples closest to a given input x and returns the most common label (or, in regression, the mean of the values) among them. In other words, the KNN algorithm assumes that similar objects are close together.

  3. Decision tree (Jaworski et al., 2017; Quinlan, 1983): A decision tree is generated by repeatedly dividing the dataset into a sequence of subsets. The training set is made up of pairs (x, y), where y is the label that corresponds to the input x. The learning approach divides the training data set into classes based on x, seeking to make each group’s assignments as similar as possible. The training process must choose a characteristic and a corresponding threshold for that characteristic by which the data will be divided.

  4. Random Forests (Breiman, 2001): Using ensemble learning, it is feasible to merge a group of decision trees into a bigger composite model that outperforms its individual elements. The composite model helps to reduce decision trees’ main flaw: large variance. Random forest classifiers reduce variance by training the constituent trees on random subsets of the training dataset and averaging out their estimations.

  5. Boosting ensembles (Friedman et al., 2001): Ensembles are groups of models that work together to form a classifier. Bagging and boosting are the two main methods for creating ensembles. Bagging benefits individual high-variance classifiers, since averaging over the classifiers smooths out individual errors and yields a more reliable joint solution. Boosting, on the other hand, is especially useful for high-bias classifiers. Mangalathu et al. (2020) used CatBoost, XGBoost, AdaBoost, and LightGBM, which are boosting methods.

Tables 6, 7 and 8 show the three performance measures of the models used by Mangalathu et al. (2020) and the ensemble models examined in this study. It should be noted that the Mangalathu et al. (2020) database was used in this study. Weighted-average precision or recall means that precision or recall is calculated for each class and weighted by the number of instances of that class. Overall, the WAE model outperforms the other approaches: the best model of Mangalathu et al. (2020) has an accuracy of 0.86, while the WAE model has an accuracy of 0.99. In terms of the other performance measures, the WAE model also fares well.

Table 6 Accuracy of various methods.
Table 7 Weighted-average precision of various methods.
Table 8 Weighted-average recall of various methods.

Because various splits of the data can generate significantly diverse results, repeated tenfold cross-validation is performed to measure the best deep neural network’s performance. This means that the data is apportioned into training and test sets with a 90–10 split every time. Fig. 10 shows this by presenting model performance using tenfold cross-validation with 10 repetitions. The green triangle reflects the arithmetic mean, whereas the orange line denotes the distribution’s median. The average appears to be around 0.85, 0.84, and 0.8 for the Random Forest (Fig. 10b), CatBoost (Fig. 10c), and Decision Tree (Fig. 10d), respectively. For the WAE model, the accuracy score fluctuates slightly around 0.98. As a result, these scores can be considered the most reliable estimate of the models’ performance. The analysis of the test accuracy of the Random Forest, CatBoost, and Decision Tree also clearly demonstrates variance in the performance of the models trained on the dataset using tenfold cross-validation. Although common machine learning techniques provide more flexibility and can scale with the amount of available training data, they learn using a stochastic learning algorithm (Brownlee, 2018; Maclin & Opitz, 2011), which means they are sensitive to the specifics of the training data and may discover a different set of model parameters each time they are trained, which in turn captures different levels of nonlinearity and of interaction between parameters, resulting in different predictions. These algorithms therefore involve a lot of instability, which can be problematic when trying to settle on a final model for generating predictions (Brownlee, 2018; Maclin & Opitz, 2011). Training several deep neural networks instead of a single model and combining their results is a powerful way to lower this variance. This approach, used in this study, is known as ensemble deep neural network modeling; it can not only minimize prediction variance, but also produce results that are better than any single model.
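The repeated tenfold cross-validation procedure can be sketched as follows; `fit_predict` is a placeholder for training any of the compared models on a fold's training split and predicting on its held-out split, and the function name is mine.

```python
import numpy as np

def repeated_kfold_scores(X, y, fit_predict, k=10, repeats=10, seed=0):
    """Repeated k-fold cross-validation: each repetition reshuffles the
    data, splits it into k folds (a 90-10 train/test split for k=10),
    and records the test accuracy of every fold.
    fit_predict(X_tr, y_tr, X_te) -> predicted labels for X_te."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(X))
        folds = np.array_split(idx, k)
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
            scores.append((preds == y[test_idx]).mean())
    return np.array(scores)  # k * repeats accuracy values
```

The resulting k × repeats scores are what the box-and-whisker plots in Fig. 10 summarize.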

Fig. 10
figure 10

Box and Whisker plots of accuracy. a WAE model. b Random Forest (Mangalathua’s model). c CatBoost (Mangalathua’s model). d Decision Tree (Mangalathua’s model).

Model Features Analysis

In this work, SHAP (Lundberg & Lee, 2017) is utilized to analyze the WAE model's predictions. SHAP is a game theory-based technique that can be used to indicate how the parameters affect the response. The output model in SHAP is created by adding input variables in a linear form (Eq. 5):

$$f(x) = k(x^{\prime}) = \varphi_{0} + \sum\limits_{i = 1}^{M} {\varphi_{i} x^{\prime}_{i} } ,\quad x = h\left( {x^{\prime}} \right).$$

In Eq. 5, \(f(x)\) is the original model, \(x\) is the original input, k is the explanation model for \(f(x)\). A connection is made between \(x\) and \(x^{\prime}\) employing a function called \(h_{x} (x^{\prime})\). The decision score for each class is averaged across the samples in the training set to approximate \(\varphi_{0}\), which is stored as the expected value attribute of the explainer. The unknowns of Eq. 5 are calculated using Eq. 6:

$$\begin{aligned} \varphi_{i} (f,x) & = \sum\limits_{{z^{\prime} \subseteq x^{\prime}}} {\frac{{\left| {z^{\prime}} \right|!\,\left( {M - \left| {z^{\prime}} \right| - 1} \right)!}}{M!}} \left[ {f_{x} (z^{\prime}) - f_{x} (z^{\prime}\backslash i)} \right], \\ f_{x} (z^{\prime}) & = f(h_{x} (z^{\prime})) = E[f(z)\left| {z_{S} } \right.]. \\ \end{aligned}$$

In Eq. 6, \(M\) is the number of simplified inputs, \(\left| {z^{\prime}} \right|\) is the count of non-zero entries in \(z^{\prime}\), \(S\) is the set of non-zero indices in \(z^{\prime}\), \((z^{\prime}\backslash i)\) denotes setting \(z^{\prime}_{i} = 0\), and \(E[f(z)\left| {z_{S} } \right.]\) is the conditional expectation of the model output given the feature subset \(z_{S}\), which forms the SHAP explanation.
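For intuition, Eq. 6 can be evaluated exactly for a tiny model by enumerating all feature subsets. In this sketch (function name mine), absent features are replaced by a fixed baseline value, a simplifying assumption standing in for the conditional expectation \(E[f(z)|z_{S}]\).

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shap_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature subsets
    (Eq. 6); features outside the subset are set to the baseline."""
    M = len(x)
    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            for S in combinations(others, size):
                # Shapley kernel weight |S|! (M - |S| - 1)! / M!
                weight = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                z_with = baseline.copy()
                z_with[list(S) + [i]] = x[list(S) + [i]]
                z_without = baseline.copy()
                z_without[list(S)] = x[list(S)]
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi
```

For a linear model the Shapley value of each feature reduces to its coefficient times its deviation from the baseline, and the values sum to f(x) minus f(baseline), the efficiency property that SHAP exploits.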

The SHAP summary chart, shown in Fig. 11, ranks features according to their importance in identifying failure modes. As can be seen, the model’s most critical component is the aspect ratio (\(M/Vl_{{\text{w}}}\)). This is most likely due to the relative involvement of shear and flexural deformations: flexural deformations cause the majority of lateral deformations in slender walls, while the contribution of shear deformations is notably higher for moderate-aspect-ratio walls and especially short walls due to the presence of load transmission systems (e.g., strut action). The effect of each feature on each output (type of failure mode) is also shown in different colors. As an example, the ratio of length to thickness of the wall (\(l_{{\text{w}}} /t_{{\text{w}}}\)) has a greater effect on walls with the flexure–shear failure mode: its mean (|SHAP|) value is about 0.16 − 0.09 = 0.07 in the shear failure mode class and 0.28 − 0.17 = 0.11 in the flexure–shear failure mode class, which means that the feature \(l_{{\text{w}}} /t_{{\text{w}}}\) influences the prediction of the flexure–shear failure mode more than the shear failure mode. The other features that most affect the detection of the different failure modes are the boundary element area to cross-section area ratio (\(A_{{\text{b}}} /A_{{\text{g}}}\)) and the vertical and horizontal boundary element reinforcing contributions (\(\rho_{{{\text{vb}}}}\)\(f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\); \(\rho_{{{\text{hb}}}}\)\(f_{{{\text{y}},{\text{hb}}}} /f_{{\text{c}}}\)).

Fig. 11
figure 11

Failure mode—SHAP summary plot.

To visualize the impact of the features on the decision scores associated with each class, a different type of summary plot is employed (Fig. 12). In Fig. 12, the attributes with the greatest impact on the decision score for each class are placed at the top, and blue- or red-colored points represent low or high values of the parameter, respectively. Except for the flexure–shear and shear failure mode classes (Fig. 12b, c), the aspect ratio has the greatest impact on the model output. As the aspect ratio increases, its impact also increases and the model is more likely to predict the flexure failure class (Fig. 12a), which corresponds to a greater likelihood of the wall yielding in flexure before reaching its nominal shear strength; consequently, flexural behavior dominates the inelastic response.

Fig. 12
figure 12

Summary plot for each class. a Flexural failure mode. b Flexure–shear failure mode. c Shear failure mode. d Sliding-shear failure mode.

On the other hand, the ratio of boundary element area to cross-sectional area is the next most important feature (Fig. 12a), with lower values corresponding to a higher chance of predicting the flexure failure class. This observation is likely related to the increase in wall flexural strength when the boundary element area is augmented: if the lateral force associated with flexural capacity exceeds the lateral force corresponding to shear capacity, shear failure may occur before flexural failure under lateral loading. In the case of the shear failure class (Fig. 12c), the boundary element area to cross-sectional area ratio has the greatest impact on the model output, and the effect of the aspect ratio is nearly the inverse of its effect in the flexure failure class (Fig. 12a): low values of the feature increase the likelihood of shear failure. In the case of the flexure–shear failure class (Fig. 12b), which is influenced mostly by the wall length-to-thickness ratio, no clear correlation emerges, probably because of the difficulty of assigning and identifying this failure mode. In the case of the shear sliding failure mode, the aspect ratio again has the greatest impact: the model is more likely to predict shear sliding as the aspect ratio decreases (Fig. 12d), because the shear sliding strength tends to remain constant with wall height, while the lateral load can increase as the height is reduced, suppressing flexural failure. Regarding the least important factors, the SHAP values of the axial load ratio and the web reinforcing ratios in the vertical and horizontal directions are close to zero for all failure types, signifying that these are the least important parameters. Their effect on failure mode identification cannot be interpreted from Fig. 12, since the dots are mixed and do not show a clear change in SHAP value with the variation of the input features. In addition, the maximum SHAP value of the aspect ratio is higher for flexural failure cases than for the other cases, which means that a small increase in the aspect ratio raises the probability of the flexural mode more than that of the other modes. For the flexure–shear failure mode (Fig. 12b), the cross-sectional aspect ratio, defined as the ratio of wall length to thickness, is the dominant parameter. A similar trend was reported by Lowes et al. (ACI Committee, 2019) for the cross-sectional aspect ratio, whose study showed that walls with moderate to high shear stress demand and a higher cross-sectional aspect ratio are susceptible to flexure–shear failures.

Fig. 13a plots the aspect ratio on the x-axis against its SHAP value with respect to the flexural failure mode on the y-axis, colored by the boundary element reinforcing ratio in the vertical direction (\(\rho_{{{\text{vb}}}} f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\)); blue points represent lower values of \(\rho_{{{\text{vb}}}} f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\). The blue dots lie mostly on the right-hand side of Fig. 13a, where aspect ratios are high. Hence, increasing the aspect ratio while the vertical boundary element reinforcing ratio is low results in a higher chance of the flexural failure mode. In Fig. 13b, despite some noise, the SHAP values (with respect to shear failure) for low aspect ratios are above zero, suggesting that increasing the boundary element reinforcing ratio in the vertical direction (\(\rho_{{{\text{vb}}}} f_{{{\text{y}},{\text{vb}}}} /f_{{\text{c}}}\)) while the wall aspect ratio is low increases the probability of the shear failure mode. Boundary element reinforcement helps to increase the wall flexural strength, delays the onset of bar buckling, and enhances the normal strain capacity of the concrete core.

Fig. 13
figure 13

SHAP dependency analysis. a Dependence plot for flexural failure class. b Dependence plot for shear failure class.

Comparisons with Design Code

Three types of structural wall failure, namely flexural, shear, and shear sliding failures, can be categorized using the ACI 318–19 design code (ACI Committee, 2019) and the concept of strength calculation. If the shear strength of an RCSW (Eq. 7) is lower than the shear force associated with its flexural capacity, failure occurs in a shear mode. ACI 318–19 also provides a shear friction limit (Eq. 8), commonly applied to shear sliding in walls, which is used here as a guideline for when sliding shear governs. In this paper, ACI 318–19 is used to calculate the shear and flexural capacities of the RCSWs. ACI 318–19 further suggests a shear stress limit of \(0.66\sqrt {f^{\prime}_{{\text{c}}} }\) MPa as a guideline to prevent diagonal compression failure:

$$\begin{aligned} & V_{n} = A_{{{\text{cv}}}} (\alpha_{{\text{c}}} \sqrt {f^{\prime}_{{\text{c}}} } + \rho_{t} f_{y} ) \\ & {\text{if}}\;{\text{wall}}\;{\text{aspect}}\;{\text{ratio}} \le 1.5 \to \alpha_{{\text{c}}} = 0.25 \\ & {\text{if}}\;{\text{wall}}\;{\text{aspect}}\;{\text{ratio}} \ge 2.0 \to \alpha_{{\text{c}}} = 0.17, \\ \end{aligned}$$
$$\begin{aligned} &V_{n} = \mu A_{{{\text{vf}}}} f_{{\text{y}}} \le V_{\max }^{n} ,\;\mu = 1.4 \\ & V_{\max }^{n} = \min (0.2f^{\prime}_{{\text{c}}} A_{{\text{c}}} ,\;(3.3 + 0.08f^{\prime}_{{\text{c}}} )A_{{\text{c}}} ,\;11A_{{\text{c}}} ), \\ \end{aligned}$$

where \(V_{{\text{n}}}\) is the nominal shear strength, \(A_{{{\text{cv}}}}\) is the gross area of the section, \(f^{\prime}_{{\text{c}}}\) is the concrete strength, \(\rho_{{\text{t}}}\) is the transverse reinforcement ratio, \(f_{{\text{y}}}\) is the yield strength of the transverse reinforcement, \(\mu\) is the coefficient of friction, \(A_{{{\text{vf}}}}\) is the area of reinforcement crossing the assumed shear plane to resist shear, and \(A_{{\text{c}}}\) is the area of the concrete section resisting shear transfer. Fig. 14 shows the confusion matrix based on the code concept for the test set. Using ACI 318–19, the accuracy in failure mode prediction is almost 53.2%, which is much lower than the WAE model’s accuracy of 98.7%. In addition, the code concepts cannot identify the flexure–shear failure mode; for this reason, walls with a flexure–shear failure mode fall either into the flexural failure mode group or into the shear sliding failure mode group.
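Eqs. 7 and 8 are straightforward to code. The sketch below (MPa and mm² units; function names and example numbers are our own, and \(\alpha_c\) is interpolated linearly for aspect ratios between 1.5 and 2.0, consistent with ACI 318-19) returns the nominal shear strength and the sliding-shear limit:

```python
import math

def wall_shear_strength(Acv, fc, rho_t, fy, aspect_ratio):
    """Nominal shear strength V_n (Eq. 7), in N for mm^2 / MPa inputs.

    alpha_c = 0.25 for aspect ratio <= 1.5, 0.17 for >= 2.0,
    interpolated linearly in between."""
    if aspect_ratio <= 1.5:
        alpha_c = 0.25
    elif aspect_ratio >= 2.0:
        alpha_c = 0.17
    else:
        alpha_c = 0.25 - (0.25 - 0.17) * (aspect_ratio - 1.5) / 0.5
    return Acv * (alpha_c * math.sqrt(fc) + rho_t * fy)

def sliding_shear_strength(Avf, fy, fc, Ac, mu=1.4):
    """Shear-friction (sliding) strength capped by V_max (Eq. 8)."""
    vmax = min(0.2 * fc * Ac, (3.3 + 0.08 * fc) * Ac, 11.0 * Ac)
    return min(mu * Avf * fy, vmax)

# hypothetical wall: shear strength drops as the wall becomes more
# slender because alpha_c decreases with aspect ratio
vn_squat = wall_shear_strength(Acv=2.4e5, fc=30.0, rho_t=0.0025,
                               fy=420.0, aspect_ratio=1.0)
vn_slender = wall_shear_strength(Acv=2.4e5, fc=30.0, rho_t=0.0025,
                                 fy=420.0, aspect_ratio=2.5)
```

Comparing these strengths against the shear force associated with the flexural capacity then classifies the governing failure mode in the code-based approach.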

Fig. 14
figure 14

Confusion matrix based on code concept.


Reinforced concrete structural walls (RCSWs) are often used as the primary lateral force-resisting system for residential and commercial low- to high-rise buildings in regions prone to severe earthquakes. Many analytical models and experimental studies have investigated the nonlinear behavior of RCSWs and the identification of their failure modes, and several studies have addressed failure mode identification using machine learning methods. This study examined the efficiency of ensemble neural networks in predicting the failure mechanism of the RCSWs. The strongest model for predicting the failure mode is determined by evaluating three ensemble deep neural network models: model averaging, weighted average, and integrated stacking. The ensemble models are built on five neural network sub-models whose individual accuracies range between 0.81 and 0.84. The primary conclusions are summarized as follows:

  • The weighted average ensemble model outperforms the other ensemble neural network models, yielding an accuracy of 0.987, since it assigns higher weights to the better-performing sub-models.

  • The performance of the weighted average ensemble model of this study is compared with well-known ensemble models (AdaBoost, XGBoost, LightGBM, and CatBoost) and traditional machine learning methods (Naïve Bayes, K-Nearest Neighbors, Decision Tree, and Random Forest). The weighted average ensemble model outperforms these alternatives in detecting the failure mode, achieving the highest accuracy, precision, and recall.

  • Merging the estimations from multiple deep neural networks counters the variance of a single trained model. The resulting predictions are less sensitive to the specific details of the training data, to the choice of training scheme, and to the particular combination of hyperparameter values found during tuning.

  • A game theory-based technique (SHAP) is employed to explain the weighted average ensemble model's predictions. The results show that the parameters governing shear wall failure depend on the failure mechanism; however, in all four failure modes, the aspect ratio of the wall ranked either first or second in importance.

  • Other features that mostly affect the detection of different types of failure modes are the boundary element area to area of cross-section ratio, the ratio of length to thickness of the wall, and the vertical and horizontal boundary element reinforcing contribution.

  • Comparison between the results of the weighted average ensemble model against internationally recognized building standard code (ACI 318–19 design code and the concept of strength calculation) shows that the machine learning model is more accurate in identifying the failure mechanism of the RCSWs. In addition, code concepts cannot be used to identify flexure–shear failure mode.

The results demonstrate the capability of ensemble models built on neural network sub-models to improve the prediction of failure modes in shear walls, while consistently explaining the impact of feature values on the failure mode that occurs.
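As an illustration of the weighted-average scheme summarized above, the combination reduces to a weight-normalized sum of the sub-models' class probabilities. The weights and sub-model outputs below are hypothetical, not the trained models of this study:

```python
import numpy as np

def weighted_average_predict(probas, weights):
    """Weighted-average ensemble: combine sub-model class probabilities.

    probas: array-like of shape (n_models, n_samples, n_classes)
    weights: one non-negative weight per sub-model (normalized here)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(probas, dtype=float), axes=1)

# three hypothetical sub-models scoring one wall over four failure-mode
# classes (flexure, flexure-shear, shear, sliding-shear)
probas = [[[0.70, 0.10, 0.10, 0.10]],
          [[0.40, 0.30, 0.20, 0.10]],
          [[0.60, 0.20, 0.10, 0.10]]]
avg = weighted_average_predict(probas, weights=[0.5, 0.2, 0.3])
predicted_class = int(np.argmax(avg[0]))  # 0 -> flexure failure mode
```

In practice, the weights themselves would be chosen by optimizing ensemble accuracy on a held-out validation set.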

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.


  • Ahmadianfar, I., et al. (2021). RUN beyond the metaphor: An efficient optimization algorithm based on Runge Kutta method. Expert Systems with Applications, 181, 115079.

  • ACI Committee. (2019). Building code requirements for structural concrete (ACI 318–19). American Concrete Institute.

  • Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175–185.

  • Barkhordari, M. S., & Tehranizadeh, M. (2021). Response estimation of reinforced concrete shear walls using artificial neural network and simulated annealing algorithm. Structures. Elsevier.

  • Barkhordari, M. S., Tehranizadeh, M., & Scott, M. H. (2021). Numerical modelling strategy for predicting the response of reinforced concrete walls using Timoshenko theory. Magazine of Concrete Research, 73(19), 988–1010.

  • Bennis, F., & Bhattacharjya, R. K. (2020). Nature-inspired methods for metaheuristics optimization: Algorithms and applications in science and engineering (Vol. 16). Springer.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Brownlee, J. (2018). Better deep learning: Train faster, reduce overfitting, and make better predictions. Machine Learning Mastery.

  • Chen, X., et al. (2018). Prediction of shear strength for squat RC walls using a hybrid ANN–PSO model. Engineering with Computers, 34(2), 367–383.

  • Chollet, F. (2015). Keras.

  • Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2), 103–130.

  • Dozat, T. (2016). Incorporating Nesterov momentum into Adam.

  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2121–2159.

  • Feng, D.-C., et al. (2021). Interpretable XGBoost-SHAP machine-learning model for shear strength prediction of squat RC walls. Journal of Structural Engineering, 147(11), 04021173.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning. Springer Series in Statistics (Vol. 1). Springer.

  • Gondia, A., Ezzeldin, M., & El-Dakhakhni, W. (2020). Mechanics-guided genetic programming expression for shear-strength prediction of squat reinforced concrete walls with boundary elements. Journal of Structural Engineering, 146(11), 04020223.

  • Goodfellow, I., Bengio, Y., & Courville, A. (2017). Deep learning. Adaptive Computation and Machine Learning Series (pp. 321–359). The MIT Press.

  • Grammatikou, S., Biskinis, D., & Fardis, M. N. (2015). Strength, deformation capacity and failure modes of RC walls under cyclic loading. Bulletin of Earthquake Engineering, 13(11), 3277–3300.

  • Jaworski, M., Duda, P., & Rutkowski, L. (2017). New splitting criteria for decision trees in stationary data streams. IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2516–2529.

  • Karaboga, N., & Cetinkaya, B. (2004). Performance comparison of genetic and differential evolution algorithms for digital FIR filter design. In International Conference on Advances in Information Systems. Springer.

  • Keshtegar, B., et al. (2021a). Novel hybrid machine learning model for predicting shear strength of reinforced concrete shear walls. Engineering with Computers.

  • Keshtegar, B., et al. (2021b). Predicting load capacity of shear walls using SVR–RSM model. Applied Soft Computing, 112, 107739.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

  • Krogh, A., & Hertz, J. A. (1992). A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems. MIT Press.

  • Lee, Y., Oh, S.-H., & Kim, M. W. (1991). The effect of initial weights on premature saturation in back-propagation learning. In IJCNN-91-Seattle International Joint Conference on Neural Networks. IEEE.

  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems.

  • Maclin, R., & Opitz, D. (2011). Popular ensemble methods: An empirical study. arXiv preprint arXiv:1106.0257.

  • Mangalathu, S., et al. (2020). Data-driven machine-learning-based seismic failure mode identification of reinforced concrete shear walls. Engineering Structures, 208, 110331.

  • Massone, L. M., López, C. N., & Kolozvari, K. (2021). Formulation of an efficient shear-flexure interaction model for planar reinforced concrete walls. Engineering Structures, 243, 112680.

  • McMahan, H. B., et al. (2013). Ad click prediction: A view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

  • Moradi, M. J., et al. (2020). Prediction of the load-bearing behavior of SPSW with rectangular opening by RBF network. Applied Sciences, 10(3), 1185.

  • Naimi, A. I., & Balzer, L. B. (2018). Stacked generalization: An introduction to super learning. European Journal of Epidemiology, 33(5), 459–464.

  • Nguyen, D.-D., et al. (2021). A machine learning-based formulation for predicting shear capacity of squat flanged RC walls. Structures. Elsevier.

  • Osisanwo, F., et al. (2017). Supervised machine learning algorithms: Classification and comparison. International Journal of Computer Trends and Technology (IJCTT), 48(3), 128–138.

  • Parsa, P., & Naderpour, H. (2021). Shear strength estimation of reinforced concrete walls using support vector regression improved by teaching–learning-based optimization, particle swarm optimization, and Harris hawks optimization algorithms. Journal of Building Engineering, 44, 102593.

  • Pizarro, P. N., et al. (2021). Use of convolutional networks in the conceptual structural design of shear wall buildings layout. Engineering Structures, 239, 112311.

  • Pizarro, P. N., & Massone, L. M. (2021). Structural design of reinforced concrete buildings based on deep neural networks. Engineering Structures, 241, 112377.

  • Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In Machine Learning (pp. 463–482). Springer.

  • Usta, M., et al. (2017). ACI 445B shear wall database.

  • Zeiler, M. D. (2012). Adadelta: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.


Author information


MSB: formal analysis, investigation, resources, data curation, and writing—original draft. LMM: conceptualization, methodology, and writing—review. Both authors read and approved the final manuscript.

Authors’ information

(1) Mohammad Sadegh Barkhordari, Ph.D., Department of Civil and Environmental Engineering, Amirkabir University of Technology, Tehran, Iran.

(2) Leonardo M. Massone, Professor, Department of Civil Engineering, University of Chile, Blanco Encalada 2002, Santiago, Chile.

Corresponding author

Correspondence to Leonardo M. Massone.

Ethics declarations

Consent for publication

Authors give consent for the publication of identifiable details, which can include photograph(s) and/or videos and/or text and/or details within the text (“Material”) to be published in the International Journal of Concrete Structures and Materials.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Journal information: ISSN 1976-0485 / eISSN 2234-1315

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this article


Cite this article

Barkhordari, M.S., Massone, L.M. Failure Mode Detection of Reinforced Concrete Shear Walls Using Ensemble Deep Neural Networks. Int J Concr Struct Mater 16, 33 (2022).



Keywords

  • deep neural network
  • failure mode
  • shear wall
  • classification