Improved Meta-learning Neural Network for the Prediction of the Historical Reinforced Concrete Bond–Slip Model Using Few Test Specimens

Abstract

The bond–slip model plays an important role in the structural analysis of reinforced concrete structures. However, many factors affect bond–slip behavior, so a large number of tests are usually required to establish an accurate bond–slip model. This paper aims to establish a data-driven method for predicting the bond–slip model of historical reinforced concrete from few test specimens with many features. To this end, a new Mahalanobis-Meta-learning Net algorithm is proposed, which can solve the implicit regression problem in few-shot learning. Compared with existing algorithms, the Mahalanobis-Meta-learning Net achieves fast convergence, accurate prediction and good generalization without requiring a large number of tests. The algorithm was applied to the task of predicting the bond–slip model of square rebar-reinforced concrete. First, the first large pretraining database for the bond–slip model, BondSlipNet, was established, containing 558 samples from the existing literature; it provides prior knowledge for learning. Then, another database, named SRRC-Net, was obtained from 16 groups of pull-out tests with square rebar; it provides posterior knowledge. Finally, based on these databases, the algorithm successfully predicted not only the bond–slip model of square rebar-reinforced concrete but also those of 23 other types of reinforced concrete. The research results can provide a scientific basis for the conservation of square rebar-reinforced concrete structures and can contribute to bond–slip model prediction for other types of reinforced concrete structures.

Introduction

The accuracy of the bond–slip constitutive model between the rebar and concrete influences the reliability of the calculated bearing capacity (Nabilah et al., 2020; Wang et al., 2021), regardless of whether theoretical derivation or finite element numerical simulation is applied. However, many new types of concrete or rebar still exhibit new bond–slip behavior, which needs to be studied. For example, square-section rebar and concrete with a special mix ratio were found in historical buildings in China (built approximately between the 1910s and 1950s). This type of square rebar-reinforced concrete is abbreviated as SRRC hereafter. As shown in Fig. 1, SRRC concrete has unique proportions and different material configuration requirements. At the same time, SRRC rebar was mainly ribbed square rebar, which is very different from current deformed rebar and thereby determines a unique bond–slip model (Zhang et al., 2021a, 2021b). Therefore, the prediction of the bond–slip model between square rebar and SRRC concrete is a new prediction task.

Fig. 1 Square rebar and concrete.

In recent years, many meaningful studies have been performed on the bond–slip problem. The smooth surface caused by epoxy coating or zinc coating will reduce the bonding performance. In addition, sand-coating treatment on the rebar surfaces will improve bonding performance (Islam et al., 2020; Pauletta et al., 2020; Rabi et al., 2020; Wang et al., 2020; Zhou & Qiao, 2018). Some studies show that the types of concrete, such as green concrete, high-strength concrete, fine aggregate concrete, and recycled aggregate concrete, also exert a certain effect on the bond–slip performance (Liu et al., 2020; Nguyen et al., 2020; Paswan et al., 2020; Wang, 2016; Wardeh et al., 2017). According to the literature (Fu et al., 2021; Hou et al., 2018; Yang et al., 2015; Zhou et al., 2020a, 2020b), the corrosion of rebar exerts a very complex effect on the bond–slip properties. On the one hand, the iron rust attached to the surface of the rebar will increase the friction between the rebar and the concrete, but on the other hand, the uneven volume expansion will cause the concrete near the rebar to crack, reducing the grip force between the concrete and the rebar. In addition, the rib shape, spacing and projection area of rebar (Cai et al., 2020; Li et al., 2019; Metelli & Plizzari, 2014; Prince & Singh, 2015), the cover thickness (Rockson et al., 2020), the diameter and spacing of the stirrups (Koulouris & Apostolopoulos, 2020), the size of specimens (Zhang et al., 2020a, 2020b), and the loading modes (pull-out, beam-type drawing or beam-type bending) (Kaffetzakis & Papanicolaou, 2016; Rockson et al., 2020) all affect the bond–slip model.

Thus, the bond–slip model between rebar and concrete is very complex, and scholars have made valid attempts to determine it. Based on the elastic mechanical solution for a thick-walled cylinder under constant pressure, an analytical solution was derived for the key point coordinates of the bond–slip curve (Gao et al., 2019). However, as shown in Fig. 1, SRRC rebar has a square section, which violates the assumptions of that derivation, and the results for SRRC rebar contain complex variable functions, which are not convenient for engineering applications. The application of two-dimensional and three-dimensional finite element methods to bond–slip models has been proposed (Biscaia & Soares, 2020; Liu et al., 2020). However, finite element calculation presupposes accurate prior knowledge of the bond–slip behavior. Since the friction coefficient and chemical interactions between rebar and concrete are difficult to test accurately, it is challenging to use the finite element method to directly predict the bond–slip model. A unified formula for the bond–slip model was constructed by collecting a large amount of test data from the literature (Wu & Zhao, 2013). However, the present results only have a good confidence level for the bond–slip model between ordinary concrete and deformed rebar. At the same time, the model expression (polynomial, exponential, or logarithmic) of bond–slip must be chosen according to the experience of researchers. Therefore, if the prediction task changes, the regression parameters and regression function expressions of the new and old tasks may be difficult to reconcile.

Therefore, data-driven methods can be used to solve such complex prediction problems (Cai et al., 2019; Jang et al., 2019; Kakavand et al., 2021; Vanneschi et al., 2018). A deep neural network (DNN) method was proposed to predict the bond strength between fiber-reinforced polymers (FRPs) and concrete, obtaining good results (Naderpour et al., 2019; Zhou et al., 2020a, 2020b). However, the bond–slip relationship between rebar and concrete is a new research topic. Moreover, only the ultimate bond strength was predicted; the other characteristic points of the ascending, descending and residual stages were not, which means that the prediction task in this paper has more output features. In fact, the existing neural network technology for predicting bond strength often uses the traditional DNN framework, whose training results depend on the feature distribution and size of the dataset. It is difficult to use an existing network for transfer learning with a small amount of test data for new types of concrete or rebar. Otherwise, a large number of tests must be performed to study the strength of concrete, specimen size, stirrup arrangement, cover thickness, rib shape, proportion of concrete and other factors, and then a new DNN must be trained, which consumes considerable resources and time. In addition, large numbers of square rebar samples are difficult to obtain for tests to build a large neural network training database because SRRC reinforced concrete buildings are mainly protected cultural relics. Fortunately, a large number of tests have been conducted on bond–slip between other types of rebar and concrete, which provide a considerable degree of prior knowledge.
Moreover, the bond behavior of the steel–concrete composite structure is very complex and has similar characteristics to rebar-reinforced concrete (Wang et al., 2019), which contains a sufficient number of features for training and provides prior knowledge for the prediction task. Although different prediction tasks will lead to differences in the final bond–slip model, some basic knowledge about the model can be shared, which provides the basis for the Mahalanobis-Meta-learning Net (MMN).

In summary, to solve the task of using a small amount of test data to predict the bond–slip model, such as SRRC concrete reinforced with square rebar, the following method was proposed in this paper. First, in Sect. 2, the databases used for training were established. Based on the test results from 36 studies, a database named BondSlipNet was established by determining the main factors affecting the bond–slip model between rebar and concrete. This database can be used to provide prior knowledge through the so-called pretraining process. Then, the pull-out test of the SRRC reinforced concrete was designed to obtain the bond–slip curve data, which can be used to establish the database for the SRRC task named SRRC-Net. This database is different from the BondSlipNet database due to its few samples, and it is used to provide posterior knowledge through the so-called fine-tuning process. Second, in Sect. 3, based on the improvement in the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017), the MMN algorithm was established and used to solve the prediction task of the SRRC bond–slip model. It should be noted that although the prediction task of the SRRC bond–slip model was used as an example, the MMN algorithm and the BondSlipNet database can be used to solve other prediction tasks with few test specimens and many features, which is discussed in Sect. 4. Finally, the MMN algorithm was compared with other algorithms to verify the prediction ability and generalization level of the network.

Dataset Preparation

BondSlipNet Database for Prior Knowledge

In this paper, based on 36 references (Biscaia & Soares, 2020; Cai et al., 2020; Coccia et al., 2016; Fu et al., 2021; Hou et al., 2018; Islam et al., 2020; Kaffetzakis & Papanicolaou, 2016; Khaksefidi et al., 2021; Koulouris & Apostolopoulos, 2020; Leibovich et al., 2020; Li et al., 2019, 2020; Liu et al., 2020; Metelli & Plizzari, 2014; Mo & Chan, 1996; Nguyen et al., 2020; Paswan et al., 2020; Pauletta et al., 2020; Prince & Singh, 2013, 2015; Qi et al., 2020; Rabi et al., 2020; Rafi, 2019; Rockson et al., 2020; Wang, 2016; Wang et al., 2018, 2020; Wardeh et al., 2017; Xiao & Falkner, 2007; Xing et al., 2015; Yang et al., 2015; Yeih et al., 1997; Zhang et al., 2020a, 2020b, 2021a, 2021b; Zhou & Qiao, 2018; Zhou et al., 2020a, 2020b) with 2039 experimental tests, a large bond–slip database containing 558 samples (after averaging repeated groups), BondSlipNet, was labeled and established. The slip value was measured using an extensometer, and the bond stress was determined from rebar strain gauges or the drawing force. In total, more than 2000 tests were collected from the 36 articles, and after merging repetitive samples, 558 samples were ultimately included. These samples mainly focus on the pull-out test and also cover a small number of beam pull-out and bending tests. In fact, many other meaningful research results that can provide prior knowledge, such as lap splice tests, are still available; these valuable results remain to be added to the dataset in future studies. However, the main innovations of this paper are the method used to construct large databases and the task-based MMN algorithm built on existing databases. The algorithm and database construction method can easily be applied to new test results, such as FRP rod-reinforced concrete and lap splice tests.

The following assumptions were used to simplify the labeling process:

(1) If the labeled parameters were not specified in the original literature, they could be determined according to other relevant literature (Béton 1990; Gao et al., 2019; CAoB 2010).

(2) Since the coefficient of friction between rebar and concrete is difficult to determine quantitatively, only a qualitative ordering by surface roughness was used during the labeling process to distinguish sand-coated, rusted, ordinary, epoxy-coated, and zinc-coated surfaces.

(3) If the concrete batching contained substitute materials, the substitute materials were removed from the proportion of concrete, which means that only the proportion of ordinary batching was calculated to distinguish concrete with special batching.

(4) If the rebar yields or the concrete breaks, the descending stage is not tracked in most of the literature. Therefore, it is assumed that the bond stress reaches 0 after a very small additional relative slip, and the remaining characteristic points of the descending and residual stages are linearly interpolated.

(5) Since the Combin 39 element used to simulate the bond–slip behavior in ANSYS supports only 20 feature points, 20 output curve feature points are used in this database: ten feature points with a uniform abscissa distribution describe the ascending stage, and the other ten describe the descending stage.

(6) Since task-based MMN learning is performed in the following section, BondSlipNet distinguishes different prediction tasks based on five subitems: concrete type, rebar type, specimen cross-sectional shape, loading mode, and stirrup arrangement mode. When dividing the samples into different tasks, some samples may belong to two or more tasks; such samples were divided evenly to ensure that no task has too few samples.
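Assumption (5) can be sketched as a simple resampling routine: each measured curve is reduced to ten uniformly spaced abscissa points on the ascending branch and ten on the descending branch. This is an illustrative sketch; the function and argument names are hypothetical, not the authors' labeling code:

```python
import numpy as np

def resample_bond_slip(slip, stress, n_up=10, n_down=10):
    """Reduce a measured bond-slip curve to n_up + n_down feature
    points: uniform slip spacing on the ascending branch (up to the
    peak bond stress) and on the descending branch (after the peak)."""
    slip = np.asarray(slip, dtype=float)
    stress = np.asarray(stress, dtype=float)
    i_peak = int(np.argmax(stress))                 # peak bond stress
    s_up = np.linspace(slip[0], slip[i_peak], n_up)
    s_down = np.linspace(slip[i_peak], slip[-1], n_down)
    # linear interpolation on each branch, as in labeling assumption (4)
    t_up = np.interp(s_up, slip[:i_peak + 1], stress[:i_peak + 1])
    t_down = np.interp(s_down, slip[i_peak:], stress[i_peak:])
    return np.concatenate([s_up, s_down]), np.concatenate([t_up, t_down])
```

Applied to a raw curve, this yields 20 slip values and 20 bond stresses per sample, matching the output format described below.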

Then, the samples in BondSlipNet were divided into 23 prediction tasks. Among these 23 prediction tasks, 2 tasks will be selected randomly in Sect. 4 to construct the test set together with the SRRC tasks; they are not fed to the network until the test process. More details about the method for dividing the training and test sets are discussed in Sect. 4. In the following text, the above classifications are abbreviated as in Table 1. For example, specimen #6 was labeled AcirBplaCnorDnstiEnor, which means that the specimen is normal concrete with a circular section and normal rebar without stirrups.

Table 1 Letter abbreviation used in the prediction task classification.

The BondSlipNet database contains three sets of data: data X (features), data Y-S (slip), and data Y–T (bond stress). X, the resource of the input layer of the subsequent network, contains 558 samples, each with 24 features, as shown in Table 2 (the 24 features run from the row of Water to the row of Coefficient of friction). Y-S and Y–T are the resources of the output layers of the subsequent network, each containing 558 samples (namely, bond–slip curves) corresponding to X. In general, each output sample contains 40 features (20 points for bond stress Y–T and 20 points for slip Y-S), which are the point coordinates of the curves. It should be noted that since the training process of the DNN is based on samples and that of the MMN is based on tasks, "resource of the input/output layers" means that the training/test sets were established directly or indirectly from these data. Y-S and Y–T are plotted in Fig. 2.
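The layout described above can be mirrored in a short sketch; the random values are placeholders for the labeled data, and only the array shapes follow the database description:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((558, 24))     # data X: 558 samples x 24 input features
Y_S = rng.random((558, 20))   # data Y-S: 20 slip coordinates per curve
Y_T = rng.random((558, 20))   # data Y-T: 20 bond stress coordinates
Y = np.hstack([Y_S, Y_T])     # 40 output features per sample
```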

Table 2 Data X in the BondSlipNet database.
Fig. 2 Bond–slip curves in BondSlipNet database output layer.

SRRC-Net Database for Posterior Knowledge

Although the BondSlipNet database had been established, the tests described below were designed with SRRC square rebar to obtain the distribution characteristics of the SRRC reinforced concrete bond–slip model (Zhang et al., 2021a, 2021b). Sixteen groups of pull-out specimens were prepared considering 16 different conditions, such as different batches of concrete, rebar section sizes, cover thicknesses, stirrup ratios, and degrees of corrosion. Each group contains duplicate test specimens. The 16 groups of test details were labeled according to the format of data X in BondSlipNet; an example is shown in Table 3. A 1000 kN hydraulic testing machine was used to apply the pull-out force. The test specimens and loading device are shown in Fig. 3.

Table 3 Data X in the SRRC-Net database.
Fig. 3 Loading device and specimen detail diagram.

At displacement test points 1 and 2, the displacements relative to the fixed end (Sa, Sb) are measured using laser displacement meters. At displacement test point 3, the displacement of the fixed end (Sc) is measured using a laser extensometer. The pull-out force P is measured by a load sensor on the machine and applied by oil pressure control at a speed of 2 mm/min. The bond–slip curves of the 16 groups of specimens are shown in Fig. 4, and the key point details can be found in Additional file 2.

Fig. 4 Bond–slip curves in SRRC-Net database output layer.

Figs. 2 and 4 show that because the section, rib form and concrete ingredients of the SRRC specimens are special, the bond strength, peak slip value, characteristic point coordinates and overall trend of the final bond–slip curves differ from those of modern reinforced concrete.

Now, the databases (BondSlipNet and SRRC-Net) for the MMN algorithm in this paper have been basically completed. The specific algorithm of the MMN will be introduced in the next section.

Training Network Construction

MAML Model Algorithm

For many years, the training sample size has been an unavoidable issue when using neural networks for prediction. A neural network trained on a database with a small sample size cannot easily reach ideal accuracy. However, in the big data era, almost every prediction task has similar tasks that can provide some prior knowledge. Taking the bond–slip model prediction of SRRC reinforced concrete as an example, although only a few tests have been performed on it, the bond–slip performance between other types of rebar and concrete has been extensively studied. Admittedly, the functional form of the bond–slip curve and the ranges of its parameters change with various influencing factors. However, some properties have common characteristics: for example, the stronger the tensile strength of the concrete is, the larger the maximum bond stress will be, provided that neither rebar yield nor rib failure occurs; likewise, the presence of stirrups improves the ductility of the concrete and increases the residual bond stress. These common characteristics provide a good knowledge basis for neural network learning. Therefore, when the sample size of the target prediction task is small, it is recommended to learn prior knowledge on a large dataset with similar task objectives and then transfer it to the target task. In the field of deep learning, this problem is called few-shot learning, and the MAML algorithm was proposed in the literature (Finn et al., 2017) to solve it.

The traditional DNN model is shown in Fig. 5, which can be described as follows and can be realized by Box 1.

Fig. 5 Traditional model framework of DNN.

Box 1 Pseudocode of the traditional DNN training process.

For a certain distribution \(P(X,Y,\theta )\), X is the input sample, and Y is the output result. The dimensions of X are m × nx, where m is the number of samples and nx is the number of features of each sample. The dimensions of Y are m × ny, where ny is the number of features of each output, and \(\theta\) denotes the parameters of the neural network (weights, biases and other learnable parameters). For each different task, \(\theta\) starts from random initialization, and the predicted value \(\hat{y}\left( {X,\theta } \right)\) of each iteration step is obtained after forward propagation. Generally, let the loss function of the regression problem be \(L = \frac{1}{m}\sum\nolimits_{i = 1}^{m} {(y - \hat{y}\left( {X,\theta } \right))^{2} }\) (measuring the error between the predicted and labeled values). Finally, the parameters are updated by calculating \(\theta^{\prime} = \theta - \alpha \nabla_{\theta } L\left( {X,Y,\theta } \right)\) until the most suitable \(\theta\) is found to describe the relationship between X and Y, where \(\nabla_{\theta }\) represents the gradient with respect to \(\theta\), and \(\alpha\) represents the learning rate. However, in many engineering problems, the gradient descent process of a neural network is not the optimization of a smooth convex function. If the number of samples is small and the distributions of the new and old tasks differ, the network has difficulty escaping from saddle points or prematurely enters a local optimum, resulting in lower prediction accuracy.
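A minimal numeric sketch of this update rule, using a single linear layer so the gradient of the MSE loss can be written in closed form (the real network has several hidden layers and nonlinear activations):

```python
import numpy as np

def train_dnn_step(X, Y, theta, alpha=0.01):
    """One gradient-descent step theta' = theta - alpha * grad(L) for a
    linear model Y_hat = X @ theta with MSE loss L = mean((Y - Y_hat)^2)."""
    m = X.shape[0]
    Y_hat = X @ theta                       # forward propagation
    L = float(np.mean((Y - Y_hat) ** 2))    # loss before the update
    grad = 2.0 / m * X.T @ (Y_hat - Y)      # dL/dtheta
    return theta - alpha * grad, L
```

Iterating this step drives L toward a (possibly local) minimum, which is the traditional training process that MAML later builds on.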

The training objects of the traditional DNN are the samples, namely, X itself. The goal of the MAML algorithm is to ‘learn how to learn’, and its training objects are tasks, that is, \(\theta\). In other words, the DNN framework trains on each sample point in BondSlipNet, whereas the MAML framework first trains on the tasks divided from BondSlipNet and then on the samples within the specific prediction task. Therefore, the MAML algorithm can be divided into two parts. The first part is meta-learning, which trains on different tasks from a large database and finally obtains \(\theta^{ * }\) with good generalization performance. Then, \(\theta^{ * }\) is used on a small-sample database, and after a small number of gradient updates, the final model is obtained; this second part is called the fine-tuning process.

The MAML algorithm is shown in Fig. 6, which can be described as follows and can be realized by Box 2.

Fig. 6 Framework of MAML algorithm.

Box 2 Pseudocode of the MAML algorithm.

The global task can be divided into B batches, and one batch is extracted for each update. Assuming that the number of tasks in a batch is \(m_B\), the global model parameter is initialized to \(\theta\), and the training set R and test set R′ are extracted from the \(m_t\)th task. First, the training set R is trained, and the model parameter is updated by the gradient. Then, for the R′ set, the loss is calculated, and the model parameter is updated. The loss function of the \(m_t\)th task is set to \(l_{{m_{t} }}\), and the parameters for the \(m_B\) training tasks in this batch are initialized to \(\theta\). The parameters of the \(m_t\)th task change to \(\theta_{{m_{t} }}^{i}\) via iterative Eq. (1) after i updates:

$$\theta_{{m_{t} }}^{i} = \theta_{{m_{t} }}^{i - 1} - \alpha \nabla_{{\theta_{{m_{t} }}^{i - 1} }} l_{{m_{t} }} (\theta_{{m_{t} }}^{i - 1} ),$$
(1)

where \(\alpha\) is the learning rate for each task.

After the training tasks in each batch are completed, the loss function of the global model is set to \(L\left( \theta \right)\); then:

$$L\left( \theta \right) = \sum\limits_{{m_{t} = 1}}^{{m_{B} }} {l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)} = \sum\limits_{{m_{t} = 1}}^{{m_{B} }} {l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i - 1} - \alpha \nabla_{{\theta_{{m_{t} }}^{i - 1} }} l_{{m_{t} }} (\theta_{{m_{t} }}^{i - 1} )} \right)} .$$
(2)

\(L\left( \theta \right)\) is a functional of \(\theta\), where \(m_{B}\) represents the number of tasks processed in each batch. The global parameter \(\theta\) can be updated to \(\theta^{\prime}\) using Eq. (3):

$$\theta^{\prime} = \theta - \beta \nabla_{\theta } L\left( \theta \right) = \theta - \beta \nabla_{\theta } \sum\limits_{{m_{t} = 1}}^{{m_{B} }} {l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)} = \theta - \beta \sum\limits_{{m_{t} = 1}}^{{m_{B} }} {\nabla_{\theta } l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)} .$$
(3)

Note that both \(\theta \left( {w_{j} } \right),j = 1,2,3,\ldots,k\) and \(\theta_{{m_{t} }}^{i} \left( {w_{{l \cdot m_{t} }}^{i} } \right),l = 1,2,3,\ldots,k\) contain k parameters, and the second term of Eq. (3) can be written as Eq. (4):

$$\nabla_{\theta } l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right) = \left[ {\begin{array}{*{20}c} {\partial l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)/\partial w_{1} } \\ \vdots \\ {\partial l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)/\partial w_{j} } \\ \vdots \\ \end{array} } \right] = \left[ {\begin{array}{*{20}l} {\sum\limits_{l = 1}^{k} {\frac{{\partial l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)}}{{\partial w_{{l \cdot m_{t} }}^{i} }}\frac{{\partial w_{{l \cdot m_{t} }}^{i} }}{{\partial w_{1} }}} } \\ \vdots \\ {\sum\limits_{l = 1}^{k} {\frac{{\partial l_{{m_{t} }} \left( {\theta_{{m_{t} }}^{i} } \right)}}{{\partial w_{{l \cdot m_{t} }}^{i} }}\frac{{\partial w_{{l \cdot m_{t} }}^{i} }}{{\partial w_{j} }}} } \\ \vdots \\ \end{array} } \right],$$
(4)

where \(w\) is a component of the array \(\theta\), i.e., a single weight or bias value. Similarly, Eq. (1) can be rewritten as Eq. (5):

$$\begin{array}{*{20}l} {w_{{l \cdot m_{t} }}^{1} = w_{l} - \alpha \frac{{\partial \left( {l_{{m_{t} }} (\theta )} \right)}}{{\partial w_{l} }}} \\ \vdots \\ {w_{{l \cdot m_{t} }}^{i} = w_{{l \cdot m_{t} }}^{i - 1} - \alpha \frac{{\partial \left( {l_{{m_{t} }} (\theta_{{m_{t} }}^{i - 1} )} \right)}}{{\partial w_{{l \cdot m_{t} }}^{i - 1} }}} \\ \vdots \\ \end{array} .$$
(5)

Combining Eqs. (4) and (5), \(\frac{\partial w_{l \cdot m_{t} }^{i} }{\partial w_{j} } = \frac{\partial w_{l} }{\partial w_{j} } - \alpha \left( \frac{\partial^{2} l_{m_{t} } (\theta )}{\partial w_{l} \partial w_{j} } + \sum\limits_{s = 1}^{i - 1} \frac{\partial^{2} l_{m_{t} } (\theta_{m_{t} }^{s} )}{\partial w_{l \cdot m_{t} }^{s} \partial w_{j} } \right)\) can be obtained. Then, each component \(w^{\prime}_{j}\) of \(\theta^{\prime}\left( w^{\prime}_{j} \right), j = 1,2,3,\ldots,k\) in Eq. (3) can be simplified to Eq. (6):

$$w^{\prime}_{j} = w_{j} - \beta \sum\limits_{m_{t} = 1}^{m_{B} } \sum\limits_{l = 1}^{k} \frac{\partial l_{m_{t} } \left( \theta_{m_{t} }^{i} \right)}{\partial w_{l \cdot m_{t} }^{i} } \left( \frac{\partial w_{l} }{\partial w_{j} } - \alpha \left( \frac{\partial^{2} l_{m_{t} } (\theta )}{\partial w_{l} \partial w_{j} } + \sum\limits_{s = 1}^{i - 1} \frac{\partial^{2} l_{m_{t} } (\theta_{m_{t} }^{s} )}{\partial w_{l \cdot m_{t} }^{s} \partial w_{j} } \right) \right).$$
(6)

It should be noted that MAML makes two simplifications in deriving Eq. (6): only one gradient update is conducted in meta-learning, and the Hessian term is ignored. Through mathematical derivation and computational experimental analysis, it can be concluded that for implicit learning tasks, these two simplifications may reduce the accuracy of the trained network; specific explanations are given in Sects. 3.3.2 and 4.3.
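The two-level update of Eqs. (1) and (3) can be sketched on toy tasks. Here each task's loss is taken as \(l_{m_t}(\theta) = \lVert \theta - c \rVert^2\) for a task-specific target c, a hypothetical stand-in for the per-task bond–slip regression loss, and the outer gradient is evaluated at the adapted parameters, i.e., the first-order simplification in which the Hessian term of Eq. (6) is dropped:

```python
import numpy as np

def maml_batch_update(theta, tasks, alpha=0.1, beta=0.1, inner_steps=1):
    """One MAML outer update over a batch of tasks.
    Inner loop: Eq. (1), gradient steps on each task's own loss.
    Outer loop: Eq. (3), first-order version (Hessian term ignored)."""
    outer_grad = np.zeros_like(theta)
    for c in tasks:
        th = theta.copy()
        for _ in range(inner_steps):          # Eq. (1): task-level update
            th = th - alpha * 2.0 * (th - c)  # grad of ||th - c||^2
        outer_grad += 2.0 * (th - c)          # grad of l at adapted params
    return theta - beta * outer_grad          # Eq. (3): global update
```

Starting from θ = 0 with targets c = 1 and c = 3, one update moves θ toward 2, an initialization from which each task can be reached in a few inner steps.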

At this point, the training of one batch in the MAML algorithm is completed, and the aforementioned process is repeated to train and update the remaining batches until all batches have been input into the network, which is regarded as the completion of one epoch. It should be noted that in each epoch, the selection of task sets is random, which has an effect similar to cross-validation. After several epochs, the meta-learning process is completed. Based on the obtained optimal initialization parameters, traditional DNN training is then performed for the specific tasks using a database with a small sample size. Then, the parameters of the shallow layers in the DNN are frozen, and only the parameters in the subsequent layers are trained, after which the fine-tuning process is finally completed. The shallow layers are the first several layers, and the subsequent layers are the last several layers. In this study, the parameters in the shallow layers are those between the input layer, hidden layer 1 and hidden layer 2; the parameters in the subsequent layers are those between hidden layer 2, hidden layer 3 and the output layer.
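The freezing step of fine-tuning can be illustrated with a two-layer linear sketch: the meta-learned shallow weights W1 stay fixed, and only the subsequent-layer weights W2 receive gradient updates (names and shapes are illustrative):

```python
import numpy as np

def fine_tune_step(X, Y, W1, W2, alpha=0.05):
    """One fine-tuning step with the shallow layer frozen: the
    representation H = X @ W1 is kept fixed and only W2 is updated."""
    m = X.shape[0]
    H = X @ W1                        # frozen shallow-layer output
    Y_hat = H @ W2
    grad_W2 = 2.0 / m * H.T @ (Y_hat - Y)
    return W1, W2 - alpha * grad_W2   # W1 returned unchanged
```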

Modification of the MAML–MMN

The original MAML algorithm was demonstrated on a regression problem with the sine function as an example, where it obtained good results. However, for the vast majority of engineering problems, an explicit relationship between the input and output data is often unavailable. The training of the network is not only aimed at regressing several parameters but also needs to determine the number of parameters and even the function itself, or the implicit relationship between the data. At the same time, many engineering regression problems have larger input and output feature dimensions and smaller sample sizes (compared with image classification problems). Therefore, many problems arise when directly using the MAML algorithm to predict the bond–slip model of reinforced concrete (more details are provided in Sect. 4).

In this paper, the MMN algorithm is created according to the particularity of the implicit regression problem of the bond–slip model, as shown in Fig. 7.

Fig. 7 Framework of MMN algorithm.

Compared with the MAML algorithm, the following improvements are incorporated in the MMN algorithm.

(1) For the expression of the loss function, the single-layer perceptron is used to modify the Mahalanobis distance loss, replacing the mean square error (MSE) loss.

(2) Multiple gradient updating is considered in meta-learning.

(3) The overall framework is changed into a multitask learning framework. The output task is divided into two tasks, namely, the prediction of the slip stage curve and the prediction of the failure stage curve, which are learned jointly (Sun et al., 2020). The multitask learning framework plays a dimension reduction role in the learning tasks. Furthermore, when using a multitask learning framework for joint training, the different tasks establish linkages between their minima through shared parameter constraints. Thus, the multitask learning framework has a parameter-sharing mechanism, which improves the generalization of the network and reduces the risk of overfitting. Dropout (Srivastava et al., 2014) and L2 regularization (Rahaman et al., 2018) were added to improve the generalization level of the network, and gradient clipping (Zhang et al., 2020a, 2020b) was added to avoid gradient explosion. Batch normalization (BN) is replaced with filter response normalization (FRN), which also helps prevent the model from overfitting (Singh & Krishnan, 2020).
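Two of the stabilizers listed above, L2 regularization and gradient clipping, act directly on the parameter update and can be sketched as follows (dropout and FRN act inside the forward pass and are omitted; the hyperparameter values are illustrative):

```python
import numpy as np

def regularized_step(theta, grad, alpha=0.01, lam=1e-3, clip=1.0):
    """Gradient step with an L2 penalty lam * ||theta||^2 added to the
    loss and clip-by-norm applied to the resulting gradient."""
    grad = grad + 2.0 * lam * theta        # gradient of the L2 penalty
    norm = np.linalg.norm(grad)
    if norm > clip:                        # rescale oversized gradients
        grad = grad * (clip / norm)
    return theta - alpha * grad
```

Clip-by-norm keeps every applied step bounded by alpha * clip, which is what prevents a single exploding gradient from derailing meta-learning.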

The MMN algorithm can be realized using Box 3. A mathematical derivation is provided in Sect. 3.3 to analyze the significance of these improvements. Among them, the improvements in (3) have already been explained by many studies. Therefore, the next section mainly focuses on explaining improvement points (1)–(2).

Box 3 Pseudocode of the MMN algorithm.

Mathematical Explanation of the MMN

Modified Mahalanobis Distance Loss

Let the output of the MMN be \(\hat{y}_{a \times b}\), where a represents the sample size and b represents the output feature dimension. Assuming that the number of predicted feature points in this task is m, then b = 2 × m. \(\hat{y}_{a \times b}\) is rewritten as the matrix \(\hat{y}_{{2 \times \frac{ab}{2}}}\) with 2 rows and ab/2 columns. The first row of \(\hat{y}_{{2 \times \frac{ab}{2}}}\) then holds the abscissas (slip Si) of the predicted points, and the second row holds the ordinates (bond stress Ti). Similarly, the labels in the dataset are set as \(y_{{2 \times \frac{ab}{2}}}\).

The Mahalanobis distance loss between the output and the label item can be calculated by Eq. (7):

$$l_{{m_{t} }} \left( {y_{{2 \times \frac{ab}{2}}} ,\hat{y}_{{2 \times \frac{ab}{2}}} } \right) = \overline{tr} \left[ {\left( {\hat{y}_{{2 \times \frac{ab}{2}}} - y_{{2 \times \frac{ab}{2}}} } \right)^{T} C_{{}}^{{ - 1}} \left( {\hat{y}_{{2 \times \frac{ab}{2}}} - y_{{2 \times \frac{ab}{2}}} } \right)} \right],$$
(7)

\(\overline{tr}\) is defined as the operator that takes the diagonal elements and averages them, and \(C_{{}}^{{ - 1}}\) is the inverse of the covariance matrix of the matrix \(Y_{2 \times ab} = \left[ {\hat{y}_{{2 \times \frac{ab}{2}}} ,y_{{2 \times \frac{ab}{2}}} } \right]\). Equation (7) shows that the Mahalanobis distance is equivalent to the following process. First, principal component analysis (PCA) is performed on the data points in the sample space. Then, the sample space is rotated along the principal components so that the dimensions become mutually independent. Finally, the distance between sample points is obtained after standardization.
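The loss of Eq. (7) can be sketched in a few lines of NumPy. This is an illustrative re-implementation under the conventions just defined (row 0 of each matrix holds the slips, row 1 the bond stresses); the paper computes the loss inside its TensorFlow graph, and the function name here is an assumption:

```python
import numpy as np

def mahalanobis_loss(y_true, y_pred):
    """Mahalanobis distance loss of Eq. (7) for 2 x (ab/2) coordinate
    matrices: row 0 holds the slips S_i, row 1 the bond stresses tau_i."""
    # Covariance of the pooled matrix Y = [y_pred, y_true] (2 x ab)
    Y = np.hstack([y_pred, y_true])
    C_inv = np.linalg.inv(np.cov(Y))   # inverse of the 2 x 2 covariance

    D = y_pred - y_true                # 2 x (ab/2) deviation matrix
    # tr-bar operator: mean of the diagonal of D^T C^{-1} D
    return float(np.mean(np.einsum('ij,ik,kj->j', D, C_inv, D)))
```

The `einsum` subscript computes only the diagonal of \(D^T C^{-1} D\), avoiding the full (ab/2) × (ab/2) product before averaging.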

The Mahalanobis distance degenerates to the Euclidean distance only when the covariance matrix is the identity matrix, that is, when the dimensions are independent and identically distributed. Thus, the Euclidean distance treats the dimensions of a sample ‘fairly’, ignoring their different distribution characteristics. For the implicit regression problem of the bond–slip model, the mapping between the input features and the output coordinate points has multiple expressions and can be regarded as an implicit relationship. The horizontal and vertical coordinates of an output point have different physical meanings and exhibit different distribution characteristics. For example, assume a set of data points, shown in Fig. 8, that satisfies the distribution P(x, y) of the predicted bond–slip model. The coordinates of point A are (2.5 mm, 5.8 MPa); the initial output for that point is A1 (5.5 mm, 6.8 MPa), and the second output is A2 (3.5 mm, 2.8 MPa). Both outputs lie at the same Euclidean distance from A, yet A2 is clearly more likely to satisfy the distribution P(x, y).

Fig. 8
figure 8

Disadvantages of Euclidean distance.

Equation (7) shows that the Mahalanobis distance between two points is independent of the measurement units, which eliminates the interference of correlations between variables. However, the Mahalanobis distance is easily affected by outlier samples, which sacrifices the overall accuracy.

Therefore, a single-layer perceptron, a neural network with only one layer, was introduced to modify the matrix \(C_{{}}^{{ - 1}}\). In essence, to weaken the effect of outliers, the modified Mahalanobis distance lets the element values of the covariance matrix of the original Mahalanobis distance participate in the training process. Since the number of outliers is generally small and the covariance matrix of the Mahalanobis distance is a 2 × 2 matrix, the learning capacity of a single-layer perceptron is sufficient for this task; the modified Mahalanobis distance therefore has strong anti-noise ability. At the same time, the small number of parameters introduced by the perceptron does not substantially increase the model complexity, which preserves the training speed and generalization ability. Let \(C_{{}}^{{ - 1}} = \left[ {\begin{array}{*{20}c} {c_{11} } & {c_{12} } \\ {c_{21} } & {c_{22} } \\ \end{array} } \right]\), \(C^{\prime - 1} = \left[ {\begin{array}{*{20}ll} {\eta \tanh \left( {\omega_{11} c_{11} + \delta_{11} } \right)} & {\eta \tanh \left( {\omega_{12} c_{12} + \delta_{12} } \right)} \\ {\eta \tanh \left( {\omega_{21} c_{21} + \delta_{21} } \right)} & {\eta \tanh \left( {\omega_{22} c_{22} + \delta_{22} } \right)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {c^{\prime}_{11} } & {c^{\prime}_{12} } \\ {c^{\prime}_{21} } & {c^{\prime}_{22} } \\ \end{array} } \right]\), where the coefficient \(\eta\) limits the parameter interval and thus constrains the effect of outliers on \(C_{{}}^{{ - 1}}\).
The perceptron is initialized with \(C_{{}}^{{ - 1}} = \left[ {\begin{array}{*{20}c} {c_{11} } & {c_{12} } \\ {c_{21} } & {c_{22} } \\ \end{array} } \right]\), the learnable parameters are \(\omega_{ij}\), \(\delta_{ij}\) (i, j = 1, 2) and \(\eta\), and \(C_{{}}^{{ - 1}}\) is updated during training to further learn the distribution of the data.
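The element-wise perceptron correction can be sketched as follows. This is a minimal NumPy illustration with hypothetical initial values (unit weights, zero biases); in the paper, \(\omega_{ij}\), \(\delta_{ij}\) and \(\eta\) are trained jointly with the rest of the network:

```python
import numpy as np

class ModifiedCovInverse:
    """Single-layer perceptron correction applied element-wise:
    C'^{-1}_{ij} = eta * tanh(w_{ij} * C^{-1}_{ij} + d_{ij})."""

    def __init__(self, eta=2.0):
        self.w = np.ones((2, 2))    # learnable weights, one per element
        self.d = np.zeros((2, 2))   # learnable biases
        self.eta = eta              # bounds corrected elements to (-eta, eta)

    def __call__(self, C_inv):
        # tanh saturates, so an outlier-inflated covariance element
        # cannot push the corrected value beyond +/- eta
        return self.eta * np.tanh(self.w * C_inv + self.d)
```

The bounded `tanh` output is what limits the influence of outlier samples on the corrected covariance, as described above.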

The corrected \(C^{\prime - 1}\) is substituted into Eq. (7). Equation (8) can be obtained as follows:

$$l_{{m_{t} }} \left( {y_{{2 \times \frac{ab}{2}}} ,\hat{y}_{{2 \times \frac{ab}{2}}} } \right) = \overline{tr} \left[ {\left( {\begin{array}{*{20}c} \vdots & \vdots \\ {\Delta S_{i} } & {\Delta \tau_{i} } \\ \vdots & \vdots \\ \end{array} } \right)\left[ {\begin{array}{*{20}c} {c^{\prime}_{11} } & {c^{\prime}_{12} } \\ {c^{\prime}_{21} } & {c^{\prime}_{22} } \\ \end{array} } \right]\left( {\begin{array}{*{20}c} \cdots & {\Delta S_{i} } & \cdots \\ \cdots & {\Delta \tau_{i} } & \cdots \\ \end{array} } \right)} \right].$$
(8)

Then, let \(W = c^{\prime}_{11} \Delta S_{i}^{2} + \left( {c^{\prime}_{12} + c^{\prime}_{21} } \right)\Delta S_{i} \Delta \tau_{i} + c^{\prime}_{22} \Delta \tau_{i}^{2}\). Thus, in addition to weighting the slip deviation and the bond stress deviation as in the Euclidean distance, the product of the two deviations is also considered. Let \(S\sim \left( {\overline{S} \pm \sigma (S)} \right)\) and \(\tau \sim \left( {\overline{\tau } \pm \sigma (\tau )} \right)\). If \(\sigma (S) \approx \sigma (\tau )\), the dispersion of the two distributions is similar, and the role of \(\Delta S_{i} \Delta \tau_{i}\) can be ignored (and vice versa), which is consistent with the above analysis.

Finally, \(W = c^{\prime}_{11} \Delta S_{i}^{2} + \left( {c^{\prime}_{12} + c^{\prime}_{21} } \right)\Delta S_{i} \Delta \tau_{i} + c^{\prime}_{22} \Delta \tau_{i}^{2}\) can be regarded as a quadratic form in \(\Delta S_{i}\) and \(\Delta \tau_{i}\), so W > 0 must hold for the loss to have practical significance. Therefore, the modified Mahalanobis distance loss function proposed in this paper adds the following strengthened constraint: \(\begin{array}{*{20}c} { - 2 < \frac{{\left( {c^{\prime}_{12} + c^{\prime}_{21} } \right)}}{{\sqrt {c^{\prime}_{11} c^{\prime}_{22} } }} < 2} & {and} & {c^{\prime}_{11} > 0} \\ \end{array}\).
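The constraint is simply the positive-definiteness condition of the 2 × 2 quadratic form, which can be checked directly. A small sketch (the function name is illustrative, not from the paper):

```python
import math

def is_positive_definite_form(c11, c12, c21, c22):
    """Check that W = c11*dS^2 + (c12 + c21)*dS*dTau + c22*dTau^2 > 0
    for every nonzero (dS, dTau): the discriminant of the quadratic form
    must be negative, i.e. -2 < (c12 + c21)/sqrt(c11*c22) < 2, with
    c11 > 0 (the ratio being real already forces c22 > 0 as well)."""
    if c11 <= 0 or c22 <= 0:
        return False
    return abs(c12 + c21) < 2.0 * math.sqrt(c11 * c22)
```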

Hessian Matrix

In Sect. 3.1, the parameter updating Eq. (6) of meta-learning was derived. The Hessian matrix term essentially measures the second-order sensitivity of the loss function \(l_{{m_{t} }} (\theta )\) of each task in the meta-learning stage to the network parameters. It reflects the curvature of the global loss function and helps gradient descent escape saddle points and local minima. In addition, Sect. 3.3.1 derived the following conclusion: for the implicit regression problem of the bond–slip model, the loss function \(l_{{m_{t} }} (\theta )\) is related to \(\Delta S_{i} \Delta \tau_{i}\), which means that there is an internal connection between the neurons in the output layer. Beyond sharing certain parameters between layers, a second-order effect exists between the parameters of some neurons. Therefore, the Hessian matrix term cannot be omitted.

For an explicit regression problem, since the number of parameters to be regressed is known, the network parameter space can be searched for the global update direction according to the gradient directions produced by the inner-task updates of meta-learning, as shown in Fig. 9. For implicit regression problems, however, the Hessian matrix correction term is needed to search the parameter space of the mapping mode according to the curvature characteristics of the loss function, after which the direction for updating the global parameters is searched in this space.

Fig. 9
figure 9

Analysis of gradient descent.
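The role of the retained second-order term can be illustrated with a toy MAML-style outer gradient for one task. This is a sketch under simplifying assumptions (explicit gradient and Hessian callables, a single inner step); the paper computes these quantities through its TensorFlow graph:

```python
import numpy as np

def meta_gradient(grad_fn, hess_fn, theta, alpha):
    """One outer gradient that keeps the Hessian term.

    Inner step:  theta' = theta - alpha * grad(theta)
    Outer grad:  dL(theta')/dtheta = (I - alpha * H(theta)) @ grad(theta')

    First-order MAML approximates the Jacobian (I - alpha*H) by I;
    the MMN keeps the curvature information H(theta).
    """
    g = grad_fn(theta)
    theta_inner = theta - alpha * g                 # task-level update
    jac = np.eye(theta.size) - alpha * hess_fn(theta)
    return jac @ grad_fn(theta_inner)
```

For a quadratic loss \(L = \tfrac{1}{2}\theta^T A \theta\), the outer gradient is \((I - \alpha A)A(I - \alpha A)\theta\), which the sketch reproduces.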

Results and Discussion

To verify the generalization level of the MMN prediction method, in addition to the SRRC bond–slip prediction task, task-A (AsquBplaCnorDnstiEcoa) and task-B (AsquBplaCnwatDstiEnor) were randomly selected to test the network performance. It should be noted that task-A and task-B were randomly selected and never trained by the network; they can therefore be regarded as test sets that verify whether the algorithm can be used in practical engineering applications. In general, the ratio of training to validation data ranges from 7:3 to 8:2. However, when the dataset is small, a trade-off between bias and variance is inevitable. Many scholars, such as Finn et al. (2017), have attempted to solve this problem, for which few-shot learning has been proposed. In few-shot learning, special operators such as MAML are used to train a good model with only a few training and test samples, and the learning framework itself weakens the effect of overfitting. Therefore, the samples of each task were divided into a training set and a validation set at a ratio of approximately 10:1 during meta-learning. Under this partition ratio, for the SRRC task, task-A and task-B, the training set is used only during fine-tuning, and the test set is fed to the MMN only when testing the network performance. In the present study, the training and validation sets were not fixed across batches or epochs: after completing one batch or epoch of learning, the model shuffles the dataset and resamples at the 10:1 ratio.
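The shuffle-and-resample step can be sketched as a small helper. This is an illustrative implementation of the 10:1 repartitioning described above, not the paper's actual code:

```python
import numpy as np

def resample_split(n_samples, ratio=10, rng=None):
    """Shuffle sample indices and split them ~ratio:1 into training and
    validation sets, redone after every batch/epoch as in the paper."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(n_samples)
    n_val = max(1, round(n_samples / (ratio + 1)))  # at least one val sample
    return idx[n_val:], idx[:n_val]                 # (train_idx, val_idx)
```

Calling `resample_split` at the top of each epoch yields a fresh partition, so no sample is permanently confined to either set.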

At present, the four-stage linear model and the three-stage nonlinear model are the most common prediction models for bond–slip between rebar and concrete. Therefore, in this paper, the test results are compared with the calculation results of the eight models described in Table 4. The models were implemented in Python 3.6.2 and TensorFlow 1.6.0 in a Jupyter Notebook. The hyperparameters were tuned based on the time cost and the accuracy of the results: the epoch in Box 1 and epoch1/epoch2 in Boxes 2 and 3 were varied from 30 to 60 and finally set to 50; the learning rates \(\alpha\) and \(\beta\) were varied from 0.01 to 0.00001 and finally set to 0.0001; and the hyperparameter i, which controls the number of updates in each task, was varied from 2 to 10 and finally set to 6.

Table 4 Description of calculation models.

Training and Test Results of the MMN

Since there are too many samples to display them all, training results of the MMN on the target tasks were randomly selected, as shown in Fig. 10. The coefficient of determination R2 was introduced in this paper to evaluate the prediction results; it is calculated as follows:

$$R^{2} = \left\{ {\begin{array}{*{20}ll} {0,} & {R^{2} \le 0} \\ {1 - \frac{{\sum {(y_{i} - \hat{y}_{i} )^{2} } }}{{\sum {(y_{i} - \overline{y})^{2} } }},} & {0 \le R^{2} \le 1} \\ {1,} & {R^{2} \ge 1} \\ \end{array} } \right.$$
Fig. 10
figure 10

Results of MMN network. a Prediction results of training set in task A. b Prediction results of training set in task B. c Prediction results of training set in SRRC task. d Prediction results of test set in task A. e Prediction results of test set in task B. f Prediction results of test set in SRRC task.

\(y_{i}\) is the ith test value, \(\hat{y}_{i}\) is the ith predicting value, and \(\overline{y}\) is the average value of the test values.
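The piecewise definition clamps the usual coefficient of determination to [0, 1]. A minimal sketch of this metric (the function name is illustrative):

```python
import numpy as np

def r2_clamped(y_true, y_pred):
    """Coefficient of determination clamped to [0, 1], matching the
    paper's piecewise definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return float(np.clip(1.0 - ss_res / ss_tot, 0.0, 1.0))
```

A prediction worse than simply predicting the mean would yield a negative raw value and is reported as 0 under this clamping.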

Fig. 10 shows that for the three tasks, the R2 of the training set ranges from 0.8973 to 0.9913, and the R2 of the test set ranges from 0.9181 to 0.9902. Therefore, both the training and test sets of the MMN perform well, and only some outliers appear in the descending stages of some specimens. The explanation for this phenomenon is that hypothesis (4) in Sect. 2.1 may introduce some errors.

Therefore, the MMN prediction method can not only be used to solve the SRRC task but can also be used to solve the other bond–slip prediction tasks, which means that the MMN method has a high generalization level. In the next section, the MMN method will be further compared with the existing prediction methods to verify its reliability.

Algorithm Comparison

In Fig. 11, the results of eight methods were compared with the test results. Both the R2 and mean square error (MSE) are introduced in this section to evaluate the prediction results. The MSE can be calculated as follows:

$$\text{MSE} = \frac{{\sum {(y_{i} - \hat{y}_{i} )^{2} } }}{m},$$
Fig. 11
figure 11

Learning accuracy of eight algorithms on three tasks. a Comparison of training set. b Comparison of test set.

\(y_{i}\) is the ith test value, \(\hat{y}_{i}\) is the ith predicted value, and m is the sample number. A result is better when R2 is closer to 1 and MSE is closer to 0. In addition, the success rate is defined as the ratio of the number of samples with R2 > 0 to the total sample number.
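The two comparison metrics can be sketched directly from these definitions (function names are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error over m samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def success_rate(r2_scores):
    """Ratio of samples with R^2 > 0 to the total sample number."""
    r2_scores = np.asarray(r2_scores, dtype=float)
    return float(np.mean(r2_scores > 0))
```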

Because the fitting objects of these models are ordinary rebar and ordinary concrete, Model (6) (CAoB 2010) and Model (8) (Wu & Zhao, 2013) have low prediction accuracy for the three special tasks: the SRRC task, task-A (coated-rebar task) and task-B (geopolymer-concrete task). The training results of the other models are compared in Table 5, which shows that the MMN performs best in almost all cases. The prediction accuracy of the four-stage model (83.33% success rate) and MAML (58.33% success rate) is second only to that of the MMN, which has a certain anti-interference ability. However, when there are outlier samples in the prediction task (e.g., #320 and #66), the accuracy on the test set decreases. The remaining models perform poorly and almost fail in some tasks (success rate less than 50%).

Table 5 Details of the results of each model.

Taking Sample #320 as an example, the causes of this situation are further explained. Sample #320 (Mo & Chan, 1996) belongs to the coated-rebar task but comes from a different batch of concrete (Wang et al., 2020), so its curve is quite different from those of the other samples in this task. Because of Sample #320, models that overtrain on small datasets fail to predict its curve. Moreover, if similar samples are present in the training set, updating the parameters on the training set sacrifices overall accuracy to accommodate these outliers. Therefore, only the MMN and MAML models, which learn on the basis of tasks, have a certain anti-interference effect. However, when the number or the dissimilarity of such samples increases, the MAML model, which measures error with the Euclidean distance, also loses accuracy or even fails because it does not fully learn the relationship between the sample dimensions (e.g., the task AsquBplaCnwatDstiEnor has outliers #66, #75 and #78 in the training set). The lower prediction accuracy of MAML relative to the MMN can be further explained as follows. On the one hand, the loss function of the MMN is the single-layer perceptron-modified Mahalanobis distance, replacing the Euclidean distance, so the MMN further learns the distribution of the output data in two dimensions. On the other hand, as described in Sect. 3.3.2, the Hessian matrix correction term in the MMN helps search the parameter space of the mapping mode, which helps the model escape local minima and approach the global minimum as closely as possible.

In summary, the MMN can be considered to have better applicability and accuracy for predicting the bond–slip performance. In the next section, the reason why the MMN model has advantages will be further verified.

Advantages of the MMN

Smaller Loss

The training loss of the five neural networks is shown in Fig. 12. The loss of the MMN is smaller than those of the other models; it converges at approximately 20 epochs and gradually flattens after 40 epochs.

Fig. 12
figure 12

Training loss versus epoch curve.

However, the other networks still fluctuate obviously in the later stages. The fine-tune model even jumps out of the convergence range late in training because the samples are shuffled at each iteration; its anti-interference ability is not outstanding, and the data distributions of the tasks differ considerably, which prevents its loss from converging.

Better Initial Value

The meta-learning process of two randomly selected samples, Sample #185 (from task AsquBplaCarepDnstiEcor) and Sample #297 (from task AsquBplaCarepDnstiEnor), in the MMN is shown in Fig. 13. After 50 epochs, a certain accuracy is achieved even when the meta-learning results are used directly for prediction. It should be noted that DNN prediction models train on the errors between points, which means that during training, some points may violate the physical meaning (the slip decreasing while the bond stress increases at a certain level). As more epochs are carried out, this phenomenon is gradually alleviated, although limitations of the model itself prevent it from disappearing completely. As shown in Fig. 13, the results for Sample #185 show that the MMN largely solves this problem, while those for Sample #297 show that the MMN alleviates it. The result for Sample #297 can be explained as follows: Fig. 13 presents the training process during the meta-learning period, not the final result. At this stage, the task containing Sample #297 has not yet been fine-tuned, which means the data distribution of the samples in this task has not been trained specifically, so the modified Mahalanobis distance in the MMN cannot yet take full effect.

Fig. 13
figure 13

Output curve during meta-learning.

The prediction results of the MMN, MAML, fine-tuning and DNN (all datasets) algorithms after the initial parameter optimization process are compared in Fig. 14. In terms of the overall trend, the training results of the MMN before fine-tuning are even better than those of the fully trained DNN.

Fig. 14
figure 14

Comparison of initialization parameter optimization ability of different algorithms.

Easier Gradient Descent

The relationship between two randomly selected parameter dimensions of the five models and the global loss is shown in Fig. 15, with the gradient search direction marked in the figure. The loss surface of the MMN method is relatively smooth because the loss function learns the distribution of the samples themselves. Moreover, the MMN takes the Hessian matrix into account, so with the curvature characteristics of the loss function, the search direction of the optimal path is clearer. The global loss of the other models has many local minima or saddle points that are difficult to escape. Even if a model introduces noise, in most cases it may jump out of the current extreme point only to fall into a worse one. Therefore, gradient descent can be considered easier for the MMN.

Fig. 15
figure 15

Gradient descent of different algorithms. a Loss function of the MMN algorithm. b Loss function of the MAML algorithm. c Loss function of the Fine-tune algorithm. d Loss function of the DNN (all data set) algorithm. e Loss function of the DNN (target data set) algorithm.

Parameter Analysis About Section Size of Rebar

In this section, based on the database and model established in this paper, the change in the ultimate bond strength of SRRC and C30 ordinary reinforced concrete with the rebar section size is analyzed (with the other design conditions identical: concrete cover thickness c = 5d, anchorage length L = 5d), as shown in Fig. 16a. In addition, the variation in the relative slip Su corresponding to the ultimate bond strength with the rebar section size is analyzed in Fig. 16b. The ultimate bond strengths of SRRC and C30 ordinary reinforced concrete vary with rebar section size according to the same general rule, with peak values at d = 12–14 mm. Compared with C30 ordinary reinforced concrete, the ultimate bond strength of SRRC decreases on average by approximately 33.59% (coefficient of variation approximately 0.28, standard deviation 9.45%). For Su, the slip of C30 ordinary reinforced concrete is far less than that of SRRC, with an average decrease of approximately 74.63% (coefficient of variation approximately 0.15, standard deviation 11.35%).

Fig. 16
figure 16

Parameter analysis about rebar section size and bond–slip behavior. a Rebar size versus ultimate bond strength. b Rebar size versus Su slip.

Conclusion

Based on more than 2000 tests reported in 36 references, a large database, BondSlipNet, containing 558 samples was built, and 16 groups of bond–slip tests on SRRC were performed, from which the SRRC-Net database was established. These two databases can be used directly by researchers studying data-driven methods for the bond–slip problem. Then, a neural network algorithm, the MMN, was proposed, which is suitable for prediction tasks with a small sample size and an implicit relationship between bond–slip characteristics and curves. The algorithm achieved sufficient accuracy on the SRRC task and two other randomly selected tasks and was compared with seven existing algorithms to verify its accuracy, convergence ability and generalization level. The well-trained MMN neural network can be used directly in real engineering for each task in the BondSlipNet database. The applicable design parameter range of the MMN model is outlined in the second column of Table 2, which covers most common parameter conditions of reinforced concrete bond–slip behavior. Furthermore, if a prediction task is out of the scope of the BondSlipNet database (the parameter range exceeds the values listed in the second column of Table 2), the MMN algorithm and the built databases can still be used to transfer prior knowledge to the new task. The following conclusions and prospects can be drawn:

(1) The concrete proportion and section shape of the rebar in SRRC reinforced concrete are significantly different from those of modern reinforced concrete, leading to different bond–slip performances. Overall, the bond strength between SRRC rebar and concrete is relatively low (compared with C30 ordinary reinforced concrete, the average strength decline is approximately 33.59% with a coefficient of variation of approximately 0.28 and a standard deviation of 9.45%), and it is difficult to explicitly develop a unified bond–slip model.

(2) The existing prediction models for bond–slip often target only a specific prediction task, which makes them difficult to apply directly to new tasks. The traditional DNN model is not sensitive to new tasks because it overfits the characteristics of the samples. The four-linear-stage model and three-nonlinear-stage model require many tests on new tasks to re-regress the parameters and may even need a reselected fitting function, which greatly increases the test cost and research effort.

(3) The MMN algorithm proposed in this paper possesses the characteristics of MAML algorithm learning based on tasks and uses the learned single-layer perceptron-modified Mahalanobis distance loss to replace the MSE loss. At the same time, the network is rewritten as a multitask learning framework, and the role of the Hessian matrix is considered. The calculation results and mathematical analysis show that these improvements contribute to the prediction of the bond–slip model.

The accuracy of the MMN exceeds that of the existing algorithms compared here, but it still encounters a bottleneck: for some specific samples, outlier points appear in the descending stage. In addition, although the overall trend of the model is close to that of the tests, the distance between the predicted points and the label points can still be reduced. This can be achieved by further improving the labels, expanding the database capacity and adjusting the local network algorithm.

Availability of data and materials

All data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

The authors gratefully acknowledge sponsorship by the Jiangsu Provincial Key Research and Development Program (Grant No. BE2017717) and Scientific Research Foundation of Graduate School of Southeast University (YBPY2101).

Funding

This research was supported by the Jiangsu Provincial Key Research and Development Program (Grant No. BE2017717) and the Scientific Research Foundation of the Graduate School of Southeast University (YBPY2101).

Author information

Authors and Affiliations

Authors

Contributions

CZ contributed to the conceptualization, methodology, software, formal analysis, and writing—original draft. QC contributed to validation, supervision, project administration, funding acquisition and writing—review and editing. AS contributed to resources, data curation, writing—review and editing and visualization. YL contributed to resources, investigation and writing—review and editing. HW contributed to resources and investigation. All authors read and approved the final manuscript.

Authors’ information

Chengwen Zhang: Ph.D., School of Architecture, Southeast University, Nanjing, Jiangsu, 210096, China, ORCID: https://orcid.org/0000-0002-5938-7559. E-mail: zhang1chengwen@163.com.

Qing Chun: Professor, School of Architecture, Southeast University, Nanjing, Jiangsu, 210096, China (corresponding author). E-mail: cqnj1979@163.com.

Ao Sun: Ph.D. Candidate, School of Civil Engineering, Southeast University, Nanjing, Jiangsu, 210096, China, E-mail: 821031305@qq.com.

Yijie Lin: Ph.D., School of Architecture, Southeast University, Nanjing, Jiangsu, 210096, China, E-mail: linyijie1223@163.com.

Haoyu Wang: Ph.D. Candidate, School of Architecture, Southeast University, Nanjing, Jiangsu, 210096, China, E-mail address: mathewwhy@163.com.

Corresponding author

Correspondence to Qing Chun.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal information: ISSN 1976-0485 / eISSN 2234-1315

Supplementary Information

Additional file 1. 

A large pretraining database for bond-slip model.

Additional file 2. 

A small fine-tune database for bond-slip model.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Zhang, C., Chun, Q., Sun, A. et al. Improved Meta-learning Neural Network for the Prediction of the Historical Reinforced Concrete Bond–Slip Model Using Few Test Specimens. Int J Concr Struct Mater 16, 41 (2022). https://doi.org/10.1186/s40069-022-00530-y


Keywords

  • neural network
  • bond–slip model
  • meta-learning
  • architectural heritage
  • square rebar