
Machine Learning Based Reactivity Prediction of Fly Ash Type F Produced from South Korea

Abstract

Fly ash (FA) is the most commonly used supplementary cementitious material in the world. However, the reactivity of FA varies substantially. In this study, a new machine learning (ML) model was developed to efficiently predict the amorphous content of FA type F. Whereas the existing ML model was built using FA of both types F and C from different countries, this study focuses on improving the prediction for FA type F produced in South Korea only. The CaO and SiO2 contents were found to have the greatest impact on predicting the amount of aluminosilicate glass, whereas the contributions of Al2O3 and Fe2O3 are ranked differently. The improved modeling approach combines three ensemble techniques: bagging, boosting, and stacking. On the test data, the final model achieves an \({R}^{2}\) of 0.80 in predicting the amount of aluminosilicate glass in FA type F.

1 Introduction

Limestone (CaCO3) is used as a raw material in the manufacture of cement, the main binding component of concrete. CaCO3 decomposes completely at around 800 ℃, emitting a large amount of CO2. Producing 1 ton of cement emits approximately 0.8 ton of CO2, and cement production accounts for about 5–8% of global CO2 emissions (Hasanbeigi et al. 2012). As carbon neutrality has emerged as a global concern, there has been growing research interest in supplementary cementitious materials (SCMs) as substitutes for ordinary Portland cement (Paris et al. 2016). Because it is an abundant industrial byproduct and can enhance the quality of concrete, FA has become a popular SCM, offering significant economic benefits. However, the quality of FA varies greatly depending on the facilities and operating conditions of coal-fired power plants and on the type of raw coal (Xu and Shi 2018). Since the quality of FA has a significant impact on concrete performance (Chancey et al. 2010; Oey et al. 2017), it is critical to assess the material properties of FA before it is used in cement-based materials. Under ASTM standards, however, FA is simply classified as Class C or Class F according to its CaO content, and many studies have found such classifications to be inaccurate (Göktepe et al. 2008; James and Maria 2001; John 2017; Suárez-Ruiz et al. 2017). These claims are also supported by research showing that the strength development of concrete is influenced more by the complex reactivity of FA than by its CaO content alone (Donatello et al. 2010; Snellings and Scrivener 2016).

Although the chemical composition of FA varies significantly depending on the raw coal type, the crystalline phase consists primarily of quartz (SiO2) and mullite (3Al2O3·2SiO2). Meanwhile, the amorphous (noncrystalline) phase makes up 40–80 wt.% of FA (Vassilev and Vassilev 1996). The reactivity of FA is generally governed by its amorphous phase, because the crystalline phase does not actively participate in the reaction (Ward and French 2006; Williams and Riessen 2010). The most reactive component of the amorphous content is recognized to be aluminosilicate glass, a combination of alumina (Al2O3) glass and silicate (SiO2) glass (Brouwers and Eijk 2002; Moomen and Siddiqui 2022; Pietersen et al. 1989; Sindhunata et al. 2006). Estimating the quantity of aluminosilicate glass (amorphous aluminosilicate) is therefore critical in predicting the strength of FA-containing concrete. The amorphous component of FA can be analyzed indirectly using quantitative X-ray diffraction (QXRD). This analysis applies the partial or no known crystal structure (PONKCS) method, which defines an unknown phase as a virtual crystal structure and quantifies it in a mixture with other minerals using the Rietveld method (Kim et al. 2018). The bulk chemical composition of FA, on the other hand, can be obtained simply by rapid X-ray fluorescence (XRF) analysis.

According to some studies, the composition of the amorphous phase of FA is correlated with (but not identical to) the chemical composition of bulk FA (i.e., crystalline and amorphous phases combined) (Aughenbaugh et al. 2016; Xu and Shi 2018). However, the mapping between the amorphous phase composition and the chemical composition of bulk FA remains uncertain. Such a linkage could be identified with machine learning (ML) techniques. QXRD analysis of mineralogical phase composition is limited in that it requires a skilled experimenter and is considerably more difficult than the easily implemented XRF test. Establishing such a mapping with ML is therefore significant in and of itself. Furthermore, even without an exact solution for this mapping, ML has the potential to rapidly predict the amount of amorphous phase from quick XRF data.

Meanwhile, most FA-related research focuses solely on the compressive strength and durability of concrete, which certainly vary with the content and type of FA. There has been little research on using ML to forecast the chemical reactivity or structure of FA. Such an attempt was recently reported by Song et al. (2021), who were the first to predict the chemical component of the amorphous phase (i.e., as calculated by QXRD) from the XRF-based chemical composition of FA. However, given the number and characteristics of the FA data used, the artificial neural network (ANN) algorithm may have certain limitations, as the authors themselves concluded. This study was motivated by the inaccurate predictions of the existing ML model for FA type F from Korea. FA is highly complex, and its properties (e.g., chemical compatibility with cement-based materials) are strongly influenced by its geographical origin and the operating conditions of thermal power plants (Cho and Lee 2019). It is therefore not surprising that the existing ML model, built on an international database of both type F and C FA, could not accurately predict the reactivity of FA type F from a particular country. Furthermore, the hydration mechanisms of FA types F and C in cementitious materials are well known to differ (Shon 2004; Sumer 2012; Wardhono 2017; Yoon et al. 2022). It is thus rational to separate types F and C when constructing a reliable ML model. This study proposes a modified ML model to tackle this issue. The performance of the new ML model is evaluated by how accurately it predicts the target value (aluminosilicate glass content estimated by QXRD) of a given data set (the test set), i.e., by the \({R}^{2}\) value. First, we recreated the ML model (ANN) of Song et al. (2021), which was built using FA from various countries (i.e., the United States, India, Canada, the Netherlands, Spain, Greece, and Italy). Korean FA was then added as a new data set to validate the model's applicability in Korea. Second, another ML model was built using only Korean FA type F. Third, this new model was refined and validated.

2 Materials and Methods

2.1 Data Materials

2.1.1 ML Model A

To build an ML model, data are required both for the machine to learn from and for testing the trained model. The first model was built based on FA from various countries and will be referred to as model A. As data for model A, 90 FA samples from 13 papers were used (Abualrous 2017; Aughenbaugh et al. 2014; Bhagath Singh et al. 2016; Chancey et al. 2010; Durdziński et al. 2015, 2017; Fowler 2013; Moreno et al. 2005; Mukhopadhyay et al. 2019; Oey et al. 2017; Saraber 2017; Sheare 2014; Singh and Subramaniam 2018). These papers cover a wide variety of FA from 7 countries. Each FA sample has several features, which are divided into input and target (output) features. The goal of ML is to derive a data-driven function of the input features from the given data and then to predict the output features of the test data using only their input features. In model A, each FA sample has six input features and one output feature. The inputs comprise the chemical compositions (wt.%) of six major oxides: (1) Al2O3, (2) CaO, (3) Fe2O3, (4) SiO2, (5) MgO, and (6) Na2O + 0.658K2O (i.e., total alkali content). The output is the aluminosilicate glass content. Fig. 1 shows the distribution of the model A dataset, and the variations of the six selected input features are summarized in Table 1. All 90 FA samples comply with the ASTM C618 requirement; that is, the sum of the Al2O3, SiO2, and Fe2O3 contents exceeds 50% (ASTM C618 2019; Song et al. 2021). Among the input features, SiO2 has the highest fraction and variance, followed by Al2O3 and CaO. The values of each feature and the relevant references for model A can be found in Additional file 1.
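To make the feature definition concrete, the sketch below turns one XRF oxide analysis into the six input features and applies the oxide-sum screening rule stated above. The 0.658 factor is the ratio of the molar masses of Na2O and K2O, so 0.658K2O expresses K2O as an equimolar mass of Na2O. The helper names and the example composition are hypothetical and for illustration only; they are not taken from the datasets of this study.

```python
# Molar masses (g/mol); their ratio gives the 0.658 factor in Na2O + 0.658K2O.
M_NA2O = 61.98
M_K2O = 94.20

# Feature order follows the list in the text: (1) Al2O3 ... (6) total alkali.
FEATURE_NAMES = ["Al2O3", "CaO", "Fe2O3", "SiO2", "MgO", "Na2Oeq"]


def total_alkali(na2o: float, k2o: float) -> float:
    """Na2O-equivalent alkali content in wt.%: Na2O + 0.658 * K2O."""
    return na2o + 0.658 * k2o


def meets_oxide_sum_rule(sio2: float, al2o3: float, fe2o3: float,
                         limit: float = 50.0) -> bool:
    """True if SiO2 + Al2O3 + Fe2O3 exceeds `limit` wt.% (the rule stated in the text)."""
    return sio2 + al2o3 + fe2o3 > limit


def to_feature_vector(oxides: dict) -> list:
    """Map an XRF oxide analysis (wt.%) to the six input features (hypothetical helper)."""
    return [
        oxides["Al2O3"],
        oxides["CaO"],
        oxides["Fe2O3"],
        oxides["SiO2"],
        oxides["MgO"],
        total_alkali(oxides["Na2O"], oxides["K2O"]),
    ]


# Hypothetical XRF analysis (wt.%), for illustration only
sample = {"SiO2": 55.0, "Al2O3": 25.0, "Fe2O3": 6.0, "CaO": 5.0,
          "MgO": 1.5, "Na2O": 0.8, "K2O": 1.2}

features = to_feature_vector(sample)
print(dict(zip(FEATURE_NAMES, features)))
print(meets_oxide_sum_rule(sample["SiO2"], sample["Al2O3"], sample["Fe2O3"]))  # True (86.0 > 50)
```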

Fig. 1 Histogram, density curve and rug plot of model A dataset

Table 1 Statistical information on the chemical composition (wt.%) of six input features of model A as determined by XRF

2.1.2 ML Model B

Because FA of both types F and C varies across countries, the second ML model was built using only Korean FA type F and will be referred to as model B. As data for model B, 62 FA samples from 17 papers were collected (Cho and Lee 2019; Cho et al. 2016a, b, 2019; Jang and Lee 2016; Jeon et al. 2015, 2018; Jung-Il et al. 2020; Kang et al. 2013; Kim et al. 2017, 2018; Moon et al. 2016; Oh et al. 2014, 2015; Park and Choi 2019; Suh and Park 2019; Suh et al. 2019). Like the model A data, the model B data comprise the XRF chemical composition of bulk FA as input features and the amount of aluminosilicate glass determined by QXRD as the output feature. Fig. 2 shows the distribution of the model B dataset, and the variations of the six input features are summarized in Table 2. All 62 FA samples also comply with the ASTM C618 requirement. As in the model A database, the 62 Korean samples show the highest fraction and variance in SiO2, followed by Al2O3 and CaO. The model B dataset can also be found in Additional file 1.

Fig. 2 Histogram, density curve and rug plot of model B dataset

Table 2 Statistical information on the chemical composition (wt.%) of six input features of model B as determined by XRF

However, the target feature could not be predicted for some of the Korean FA samples in the model B dataset, so the model B dataset was reconstructed by removing these outlier samples. In the end, 43 Korean FA samples selected from 13 papers were used as data, while 19 samples were excluded. To identify outliers in the original model B dataset, an ML model was built as part of data pre-processing; this model is called the pre-processing model. The pre-processing model used tree-based ensembles so that the identification of outliers would not be biased toward any single model. The outlier (removed) samples and the selected samples determined by the pre-processing model are described below.

The material properties of FA vary considerably depending on many factors. Even when all input feature values are similar, the target feature values can differ drastically if other factors (especially particle size or geographical region) differ from those of typical FA, as can be seen in Fig. 3. The axes of the ternary plot are the network modifiers (i.e., 2Ca + Na + K + 2Mg) and the two network formers (i.e., Al and Si). These three atomic compositions are calculated from the six input features (i.e., the XRF chemical composition). According to network theory (Zachariasen 1932), a higher content of network modifiers tends to produce a higher amorphous phase content in FA (Diamond 1983; Oh et al. 2015; Shi and Zheng 2007). However, the ternary plot shows little evidence of this tendency in the dataset used herein. This is attributed to differences in particle size, geographical origin, or experimental error, which are not included among the input features but are important factors for material properties. In this study, such samples are considered type 1 outliers. In addition, samples whose ranges differ from those of the rest of the dataset were treated as type 2 outliers. The 19 samples that could not be predicted by the pre-processing model were thus classified as outliers, and the remaining 43 samples were used as the final data for model B. The selected FA samples and the relevant references can be found in Additional file 1.
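One plausible way to obtain the ternary coordinates of Fig. 3 from the six oxide features is sketched below. The paper does not give the exact conversion, so the molar masses, the treatment of total alkali, and the function name are assumptions. Because Na2O + 0.658K2O is an Na2O-equivalent mass, it converts to moles of alkali atoms (Na + K) using the molar mass of Na2O; Fe2O3 is not an axis of the plot and is ignored.

```python
# Molar masses (g/mol) of the oxides (assumed standard values)
MOLAR_MASS = {"SiO2": 60.08, "Al2O3": 101.96, "CaO": 56.08, "MgO": 40.30, "Na2O": 61.98}


def ternary_coordinates(sio2, al2o3, cao, mgo, na2o_eq):
    """Return (modifiers, Al, Si) in % for a ternary plot, from oxide wt.% (hypothetical helper).

    Modifiers are weighted as 2Ca + Na + K + 2Mg, following the text.
    """
    si = sio2 / MOLAR_MASS["SiO2"]          # mol Si per 100 g (1 Si per SiO2)
    al = 2 * al2o3 / MOLAR_MASS["Al2O3"]    # 2 Al per Al2O3
    ca = cao / MOLAR_MASS["CaO"]
    mg = mgo / MOLAR_MASS["MgO"]
    # Na2O-equivalent mass -> moles of alkali atoms (2 per Na2O formula unit)
    alkali = 2 * na2o_eq / MOLAR_MASS["Na2O"]
    modifiers = 2 * ca + alkali + 2 * mg
    total = modifiers + al + si
    return (100 * modifiers / total, 100 * al / total, 100 * si / total)


# Hypothetical sample: SiO2 55, Al2O3 25, CaO 5, MgO 1.5, Na2O-eq 1.59 (wt.%)
print(ternary_coordinates(55.0, 25.0, 5.0, 1.5, 1.59))
```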

Fig. 3 Triangular compositional plot of original model B dataset (atomic %)

2.2 Data Sampling: Stratified Sampling

To build an ML model, the data set must be split in two ways. First, it is separated into input values and a target value (or several target values). Second, it is separated into train-data and test-data; this second process is called data sampling. Random sampling is the ordinary sampling method, but when data are scarce it may introduce bias. To mitigate this problem of limited data, stratified sampling was applied in this study. In stratified sampling, the target feature is divided into n strata (layers) according to its frequency distribution, and a proportional amount of data is sampled from each stratum (see Fig. 4). The sampled data therefore have the same ratio for each stratum as the whole dataset, so the train-data and test-data each have a distribution similar to that of the entire dataset. The effect of stratified sampling is shown in Fig. 5. In model A, 85% (76) and 15% (15) of the samples were selected as train-data and test-data, respectively. In the pre-processing model, train-data account for 80% (49) and test-data for 20% (13). In model B, the 19 outlier FA samples are excluded from the original model B dataset, and the remaining samples are divided into 80% (34) train-data and 20% (9) test-data.
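Stratified sampling of a continuous target can be sketched with scikit-learn's train_test_split by first binning the target into strata and passing the bin labels as stratify. The quantile binning, the number of strata, and the random seed below are assumptions, and the data are synthetic placeholders. With 62 samples and test_size=0.2, scikit-learn rounds the test size up, which gives the 49/13 split mentioned above.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, n_strata=5, test_size=0.2, random_state=0):
    """Split (X, y) so that each stratum of the continuous target y appears in
    train and test data in roughly the same proportion as in the full set."""
    # Stratum boundaries at the quantiles of y (equal-frequency bins; an assumption)
    edges = np.quantile(y, np.linspace(0, 1, n_strata + 1))
    # Stratum index 0..n_strata-1 for each sample, using only the inner boundaries
    strata = np.digitize(y, edges[1:-1], right=True)
    return train_test_split(X, y, test_size=test_size, stratify=strata,
                            random_state=random_state)


# Synthetic stand-in for 62 samples (six features, target in wt.%)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(62, 6))
y = rng.normal(55, 8, size=62)

X_train, X_test, y_train, y_test = stratified_split(X, y)
print(len(y_train), len(y_test))  # 49 13
```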

Fig. 4 Visual representation of stratification of a target feature (pre-processing model)

Fig. 5 Density curves of train-data and test-data (pre-processing model)

2.3 Data Transformation: Feature Scaling

Well-organized data is the foundation of a good prediction model. Any artifacts, missing values, or outliers in the acquired data should be handled properly to create a good ML model. The data should also be transformed into a form suitable for modeling; this is the data transformation step of data preprocessing. One such transformation is feature scaling, which unifies the ranges of the data features by normalization or standardization. Standardization transforms each feature into a distribution with a mean of 0 and a variance of 1 according to Eq. 1, which can be done automatically with the 'StandardScaler' class of the 'sklearn' (scikit-learn) library.

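$$x_{i,new} = \frac{x_{i,old} - \overline{x}}{\sigma} \tag{1}$$

where \(\overline{x}\) and \(\sigma\) are the mean and standard deviation of the feature over the samples.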

In this study, all features share the same unit (wt.%), but their distributions differ greatly. The weight fractions of SiO2 and Al2O3 are significantly higher than those of the other oxides; in model A, for instance, CaO ranges from 0.1 to 29.2 wt.%, whereas SiO2 ranges from 27.1 to 70.8 wt.%. Because the CaO values are relatively low, the impact of variation in CaO on the aluminosilicate content can be underestimated, whereas that of SiO2 may be overestimated. Feature scaling (i.e., standardization) was therefore applied in the data preprocessing step to ensure that the ML models are not biased toward particular features of the data set.
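A minimal sketch of Eq. 1 with scikit-learn's StandardScaler, checked against the formula evaluated by hand. The two-column matrices (CaO and SiO2 in wt.%) are hypothetical. Fitting the scaler on the train-data only and reusing its mean and standard deviation for the test-data is a common practice assumed here; the paper does not state which data the scaler was fitted on.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical wt.% values; columns: CaO, SiO2
X_train = np.array([[1.2, 52.0], [4.5, 60.3], [8.1, 48.7], [2.7, 55.5]])
X_test = np.array([[3.0, 58.0]])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learns per-feature mean and std from the train-data
X_test_std = scaler.transform(X_test)        # applies the same mean and std to the test-data

# Eq. 1 evaluated by hand; StandardScaler uses the population std (ddof=0)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.round(X_train_std.mean(axis=0), 12))  # each column now has mean 0
print(np.round(X_train_std.std(axis=0), 12))   # and standard deviation 1
```

After scaling, a 1-standard-deviation change in CaO and in SiO2 has the same numerical size, which is the point made in the paragraph above.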

2.4 Algorithms

2.4.1 Artificial Neural Network

Model A was built with the ANN algorithm. The structure of an ANN mirrors that of a biological neural network: it is a network of numerous perceptrons (artificial neurons), as shown in Fig. 6. The ANN procedure is as follows (Matias et al. 2014): (1) In the input layer, nodes (i.e., perceptrons) determine whether to consider each input value using a binary variable \({s}_{i}\) for each input variable. (2) In the hidden layer, the input values from the input layer (or the output values of the prior nodes, in the case of multiple hidden layers) are multiplied by synaptic weights \(w\), and a node transmits its output value to the subsequent nodes when the result exceeds its threshold \(\theta\) (bias). A weight represents the impact of a change in the value of one node on another. The activation function \(f\) (here, the rectified linear unit, ReLU) is applied in this step. (3) In the output layer, a node yields the predicted target value, i.e., the output of the last hidden layer multiplied by a synaptic weight; this step uses the function \(g\) (here, linear) instead of the activation function. Steps (1) to (3) are collectively called the “feedforward” approach, and one pass constitutes a unit of an epoch (iteration). (4) The error between the predicted and actual values is defined as the cost function; using the derivative of the cost function obtained through the chain rule, the ANN updates the weights and thresholds, returns to step (1), and repeats the epochs until the error no longer decreases and appropriate weights and thresholds are found. This process is referred to as the “backpropagation” approach. Model A was developed using the multilayer perceptron regressor (MLPRegressor) of scikit-learn, with ReLU as the activation function and the mean squared error (MSE) as the cost function.
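The configuration described above (ReLU hidden layers, linear output, squared-error cost) can be sketched with scikit-learn's MLPRegressor as follows. The hidden-layer sizes, the 'lbfgs' solver (which the scikit-learn documentation suggests for small datasets), and the synthetic data are assumptions for illustration; they are not the settings or data of model A.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 90 samples x 6 oxide features, target in wt.%
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(90, 6))
y = 40 + 20 * X[:, 3] - 15 * X[:, 1] + rng.normal(0, 1, size=90)

ann = make_pipeline(
    StandardScaler(),                 # feature scaling (Eq. 1)
    MLPRegressor(
        hidden_layer_sizes=(16, 16),  # two hidden layers (assumed sizes)
        activation="relu",            # activation function f
        solver="lbfgs",               # assumed; suits small data sets
        max_iter=2000,
        random_state=0,
    ),
)
ann.fit(X, y)  # feedforward passes + backpropagation on the squared-error loss

mlp = ann[-1]
print(mlp.out_activation_)        # identity: the linear output function g
print(round(ann.score(X, y), 3))  # training R^2
```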

Fig. 6 Schematic of an ANN (Tangri et al. 2008)

Model A, which corresponds to the preceding study (Song et al. 2021), used the ANN algorithm. Given the number and characteristics of the FA data, this algorithm may not be appropriate. The ANN algorithm typically requires at least several hundred data points (Schocken 1991), and FA data may require even more because of their intrinsic complexity. If an ANN trained on fewer than 100 samples nonetheless predicts the test data well, the given data are likely not representative of all the FA data available globally. To resolve this issue, a tree-based ensemble model for Class F FA was developed in this study.

2.4.2 Ensemble

Ensemble algorithms were used in the pre-processing model and in model B. An ensemble combines multiple learners. Whereas the ANN described above creates a single strong learner (optimal model) from a network of numerous perceptrons, the ensemble method creates multiple weak learners and combines them into a more powerful learner. The advantage of this approach is that it can help resolve the trade-off between bias and variance. Bias increases when the model is not sufficiently trained (i.e., an underfitting model); variance increases when the model is overtrained and its applicability to new data decreases (i.e., an overfitting model) (Geman and Doursat 1992). Lowering the bias may therefore increase the variance (and vice versa), which is the trade-off problem (Doroudi 2020). The best way to address this problem is to collect and train on a large amount of data, but it is difficult to obtain a large amount of XRD data for Korean FA. Here, we attempted to address the problem with ensemble algorithms and to reveal the nonlinear structure of the dataset.
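For squared error, the trade-off can be written as the standard bias-variance decomposition. Assuming \(y = f(x) + \varepsilon\) with \(\mathbb{E}[\varepsilon] = 0\) and \(\operatorname{Var}(\varepsilon) = \sigma^{2}\), and taking the expectation over training sets and noise for a learned model \(\hat{f}\):

$$\mathbb{E}\big[(y - \hat{f}(x))^{2}\big] = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}}_{\text{bias}^{2}} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^{2}\big]}_{\text{variance}} + \sigma^{2}$$

Averaging many learners (bagging) mainly shrinks the variance term, while sequentially correcting residuals (boosting) mainly shrinks the bias term; \(\sigma^{2}\) is irreducible noise that no model can remove.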

The most popular ensemble methods are bagging, boosting, and stacking. Bagging aims to lower the variance (Derbeko et al. 2022). Boosting aims to further reduce bias by fitting the data that could not be fitted even by bagging (Schwenk 2000). Stacking attempts to resolve the trade-off problem by blending ensemble models (e.g., bagging and boosting) (Breiman 1996; Doroudi 2020; Wolpert 1992). The pre-processing model used all three ensembles, whereas model B used a single boosting ensemble. Boosting was chosen to better fit the data after the outliers had been removed.

First, bagging combines several weak learners in parallel. In this study, the RandomForestRegressor class was used as the bagging ensemble. As the name suggests, numerous individual trees (i.e., weak learners) are combined to produce a forest (i.e., a strong learner). For regression problems, the average of the values predicted by the individual trees is used as the predicted value of the forest. Fig. 7a shows the process of the bagging algorithm: (1) A subset of the train-data is drawn at random with replacement for each learner (so samples may overlap between learners). (2) Models are trained using the same algorithm but different sample data, so their predicted values differ. (3) These predicted values are averaged to produce the target value \(y\) of the final strong learner (bagging model). Because the results of the weak learners are averaged into a single value, this method is less prone to overfitting noise and outliers. This algorithm was used in the pre-processing model.
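Steps (1) to (3) of bagging can be sketched directly with decision trees, and the same averaging can be observed inside scikit-learn's RandomForestRegressor. The function bagging_predict, the number of learners, and the synthetic data are hypothetical. Note that a random forest may additionally randomize the features considered at each split through max_features; with scikit-learn's default for the regressor, all features are considered.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor


def bagging_predict(X_train, y_train, X_new, n_learners=50, seed=0):
    """Plain bagging of regression trees, following steps (1)-(3) of the text."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    predictions = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)  # (1) bootstrap sample drawn with replacement
        tree = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])  # (2)
        predictions.append(tree.predict(X_new))
    return np.mean(predictions, axis=0)   # (3) average of the weak learners


# Synthetic stand-in data (49 samples x 6 features)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(49, 6))
y = 50 + 10 * X[:, 0] - 8 * X[:, 3] + rng.normal(0, 1, size=49)

manual = bagging_predict(X, y, X[:5])

# RandomForestRegressor: its prediction is likewise the mean over its trees
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
per_tree_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_], axis=0)
print(np.allclose(rf.predict(X[:5]), per_tree_mean))  # True
```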

Fig. 7 Schematics of ensembles (Yang et al. 2019)

Next, boosting combines several weak learners sequentially. The GradientBoostingRegressor class was used as the boosting ensemble. Unlike target-value fitting, a gradient boosting machine (GBM) fits errors (residuals): each learner predicts not the value itself but the residual, focusing on the cases that previous learners found difficult. Fig. 7b shows the process of the GBM algorithm, drawn in the style of the bagging illustration of Yang et al. (2019). (1) The first learner is trained with a subset of the train-data, and its residual becomes the training target of the next learner. (2) Since each learner has learned the residual of the prior learner, it predicts the residual rather than the value itself; the residual is reduced iteratively up to the last learner. (3) The predicted values of the weak learners are summed to obtain the target value \(y\) of the final strong learner (boosting model). This algorithm was used in the pre-processing model and model B.
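Residual fitting can be sketched for the squared-error loss as below: the ensemble starts from the mean of the target, each shallow tree is fitted to the current residuals, and the scaled predictions are added. This is a simplified illustration, not scikit-learn's GradientBoostingRegressor internals; the learning rate, tree depth, number of learners, and synthetic data are assumptions, and each learner here sees all of the train-data rather than a subset.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_learners=100, learning_rate=0.1, max_depth=2):
    """Least-squares gradient boosting with regression trees (simplified sketch)."""
    f0 = y.mean()                         # initial prediction: mean of the target
    pred = np.full(len(y), f0)
    learners, train_mse = [], []
    for _ in range(n_learners):
        residual = y - pred               # (1) residual = training target of the next learner
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)  # (2) shrink the remaining residual
        learners.append(tree)
        train_mse.append(np.mean((y - pred) ** 2))
    return f0, learners, train_mse


def predict_gradient_boosting(f0, learners, X, learning_rate=0.1):
    # (3) final prediction = initial value + sum of the scaled learner outputs
    return f0 + learning_rate * sum(tree.predict(X) for tree in learners)


# Synthetic stand-in data (34 samples x 6 features)
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(34, 6))
y = 55 - 12 * X[:, 1] + 6 * X[:, 3] + rng.normal(0, 1, size=34)

f0, learners, train_mse = fit_gradient_boosting(X, y)
print(round(train_mse[0], 3), round(train_mse[-1], 3))  # training MSE shrinks
```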

Third, stacking is used to address the trade-off problem. Whereas bagging and boosting combine learners trained on different data with the same algorithm, stacking combines models (base models) trained on the same data with different algorithms to create a new model (meta-model) (Breiman 1996). Fig. 7c shows the process of the stacking algorithm, drawn in the style of the bagging illustration of Yang et al. (2019). (1) In level 1, several different types of base models are trained with the same input data. (2) In level 2, the meta-model is trained with the results of the base models as its input data; in other words, each base model creates new input features, which the meta-model learns. (3) Finally, the prediction of the meta-model is the target value \(y\) of the final stacking model. In this work, the bagging and boosting ensembles were used as base models, and the StratifiedKFold class was used to build the stacking ensemble. This corresponds to the pre-processing model.
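A two-level stacking scheme can be sketched by collecting out-of-fold predictions of the base models with StratifiedKFold and training a meta-model on them. Because StratifiedKFold needs discrete labels, the continuous target is binned into quantile strata here, which is an assumption, as are the linear meta-model, the fold count, and the synthetic data; the paper does not specify these. scikit-learn also provides StackingRegressor for a ready-made version.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold


def make_bins(y, n_bins=4):
    """Quantile strata of a continuous target, so that StratifiedKFold can use them."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    return np.digitize(y, edges[1:-1], right=True)


def base_models():
    # Level 1: bagging and boosting ensembles as base models
    return [RandomForestRegressor(n_estimators=100, random_state=0),
            GradientBoostingRegressor(random_state=0)]


def fit_stacking(X, y, n_splits=5):
    oof = np.zeros((len(y), 2))  # out-of-fold predictions = new input features
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, make_bins(y)):
        for j, model in enumerate(base_models()):
            model.fit(X[train_idx], y[train_idx])
            oof[val_idx, j] = model.predict(X[val_idx])
    meta = LinearRegression().fit(oof, y)                # level 2: meta-model (assumed linear)
    fitted_bases = [m.fit(X, y) for m in base_models()]  # refit base models on all train-data
    return fitted_bases, meta


def predict_stacking(fitted_bases, meta, X_new):
    level1 = np.column_stack([m.predict(X_new) for m in fitted_bases])
    return meta.predict(level1)


# Synthetic stand-in data (49 samples x 6 features)
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(49, 6))
y = 55 - 12 * X[:, 1] + 6 * X[:, 3] + rng.normal(0, 1, size=49)

bases, meta = fit_stacking(X, y)
print(predict_stacking(bases, meta, X[:3]))
```

Using out-of-fold rather than in-sample predictions keeps the meta-model from simply trusting whichever base model overfits the train-data most.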

3 Results and Discussion

3.1 Model A

Fig. 8a shows that the \({R}^{2}\) obtained for the train-data is 0.63 for the existing database of both type F and C fly ash from various countries. \({R}^{2}\) measures how much of the variation in a target feature can be predicted from the independent variables using a trained model. It usually ranges from 0 to 1, although it can be negative for data the model predicts poorly; the closer \({R}^{2}\) is to 1, the better the input features explain the target feature. Model A thus has an explanatory power of 63%, and not all samples of model A are highly predictable. This suggests that an algorithm more suitable for the FA dataset than the ANN algorithm of Song et al. (2021) is needed.

Fig. 8 Relationship between predicted and actual values of model A

Model A was then applied to FA type F from Korea. When accuracy was assessed using the 62 Korean FA type F samples as new data, a negative \({R}^{2}\) was obtained. A negative \({R}^{2}\) means that the predictions are less accurate than simply predicting the mean of the target values; in other words, the model fits the new data very poorly (Barten 1987). This may result from bias in the data used to train model A: the training data do not represent the new data, so a model trained only on them generalizes poorly to such data. Fig. 8b shows that one sample from the existing model A database and one Korean sample (12 FA type F) have similar XRF input features but very different QXRD output features. The quality of FA varies significantly with the location of coal-fired power plants, their infrastructure and operating conditions, and the type of raw coal. Therefore, rather than using FA of both types F and C from various countries, it is more economical and accurate to develop an individual model for a specific country to uncover the uncertain linkage between the XRF input features and the QXRD target feature of FA, especially given the high uncertainty intrinsic to FA as a material.
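The meaning of a negative \({R}^{2}\) follows from its definition, \({R}^{2} = 1 - SS_{res}/SS_{tot}\), where \(SS_{tot}\) is the squared error of a model that always predicts the mean of the actual values. The small sketch below, with hypothetical numbers, compares a manual implementation with scikit-learn's r2_score: a mean-only predictor scores exactly 0, and predictions worse than the mean score below 0.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean-only predictor
    return 1.0 - ss_res / ss_tot


# Hypothetical aluminosilicate glass contents (wt.%)
y_true = [50.0, 55.0, 60.0, 65.0]
good = [51.0, 54.0, 61.0, 64.0]
mean_only = [57.5] * 4
poor = [65.0, 60.0, 55.0, 50.0]

print(r_squared(y_true, good))       # approximately 0.968
print(r_squared(y_true, mean_only))  # 0.0
print(r_squared(y_true, poor))       # -3.0: worse than predicting the mean
```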

To explain the results of model A, SHapley Additive exPlanations (SHAP) analysis is used. A high-dimensional ML model is called a black-box model because the exact relationship between the input features and the target feature is unknown. SHAP analysis, however, can quantify the impact of each input feature on the prediction of the target feature. Fig. 9 shows the SHAP analysis of model A. The SHAP value indicates the importance of each feature based on game theory (Antwarg et al. 2019; Lee and Lundberg 2017): it is the average change in prediction with and without each feature, taken over combinations of the features (Mangalathu et al. 2020). Fig. 9a is a scatter plot of SHAP values. The horizontal axis represents the SHAP values, and the color of each point indicates the feature value from low to high. Because overlapping points are spread out in the y-axis direction, the distribution of SHAP values for each input feature can be seen. The input features are arranged in order of importance. A negative SHAP value means that the feature lowers the value of the target feature, and a positive value means that it increases it. Fig. 9b is a bar plot of the mean absolute SHAP values; the larger this value, the more important the feature is in predicting the target feature. In model A, as the CaO, SiO2, and Al2O3 contents increase, the magnitude of their SHAP values increases in the negative direction. CaO has the highest importance, followed by SiO2. Fe2O3 and Na2O + 0.658K2O (i.e., total alkali content) have some impact on the target feature, but it is small.
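The game-theoretic definition behind SHAP can be illustrated by computing exact Shapley values for a tiny model, treating an "absent" feature as replaced by a baseline value. This is a didactic sketch, not the algorithm of the shap library (which averages over a background dataset and uses faster estimators); the two-feature toy model, the sample, and the baseline are hypothetical. By construction, the values add up to the difference between the prediction for the sample and for the baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at sample x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: z = [CaO, SiO2] in wt.% -> glass content in wt.%."""
    return 60.0 - 0.8 * z[0] + 0.2 * z[1] + 0.01 * z[0] * z[1]


x = [20.0, 50.0]
baseline = [5.0, 55.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # contributions of CaO and SiO2 relative to the baseline prediction
```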

Fig. 9 Analysis of the impact of the six input features on the target feature in model A

3.2 Pre-Processing Model

An ML model was built using all 62 Korean FA type F samples collected from 17 studies. This pre-processing model used all three ensemble methods (i.e., bagging, boosting, and stacking): with the bagging and boosting ensembles as base models, the meta-model (stacking ensemble) constitutes the pre-processing model. As shown in Fig. 10, the \({R}^{2}\) values of the ensembles are low, at 0.38, 0.36, and 0.49, respectively. The pre-processing model thus has an \({R}^{2}\) of 0.49, an improvement over the base models, but the target feature is still not predicted well from the given input features. In particular, where the predicted amorphous aluminosilicate content (the target feature) is about 55 wt.%, the pre-processing model cannot approximate the actual value. As shown in Fig. 3, for some of the poorly approximated FAs, the linkage between the input features and the target feature differs from that of the other FAs in the pre-processing dataset. This is thought to reflect differences in particle size and geographic location, which were not considered as input features. Since features other than the given input features were not covered in this study, these poorly approximated FAs were removed to develop a new model (i.e., model B) targeting only representative Korean FA type F.

Fig. 10 Relationship between predicted and actual values of pre-processing model

3.3 Model B

For model B, 43 Korean FA type F samples from 13 papers were chosen. Fig. 11 displays the performance of model B. The \({R}^{2}\) of the train-data is 0.80; that is, model B has an explanatory power of 80%. With the exception of a few samples, the aluminosilicate glass content of the FA samples can be predicted accurately from the XRF input features.
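Putting the pieces together, a model-B-style workflow (stratified 80/20 split of 43 samples, standardization, gradient boosting, test \({R}^{2}\)) might look like the sketch below. The data are synthetic placeholders and the hyperparameters are scikit-learn defaults, so the score it prints says nothing about the paper's result. Standardization does not change where a tree splits relative to the data, so it is harmless here; it is kept only to mirror the preprocessing described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder for 43 samples x 6 oxide features (not the paper's data)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(43, 6))
y = 55 - 12 * X[:, 1] + 6 * X[:, 3] + rng.normal(0, 1, size=43)

# Quartile strata of the target for stratified sampling (assumed binning)
strata = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]), right=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=strata,
                                          random_state=0)

model_b = make_pipeline(StandardScaler(), GradientBoostingRegressor(random_state=0))
model_b.fit(X_tr, y_tr)
test_r2 = r2_score(y_te, model_b.predict(X_te))
print(len(y_tr), len(y_te), round(test_r2, 3))
```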

Fig. 11 Relationship between predicted and actual values of model B (boosting ensemble)

Fig. 12 shows the SHAP analysis of model B. As the CaO content increases, the magnitude of its SHAP value increases in the negative direction, and CaO has the highest importance. MgO, Fe2O3, and Na2O + 0.658K2O (i.e., total alkali content) have some impact on the target feature, but it is small. These results suggest that the refined model can accurately predict the reactivity of FA type F produced in Korea.

Fig. 12 Analysis of the impact of the six input features on the target feature in model B

In summary, when the ML model was built only for type F FA from Korea, the prediction accuracy (coefficient of determination, \({R}^{2}\)) was 0.80, a 27% increase over the 0.63 of model A with its existing database. This appears to be because the chemical composition and amorphous phase of FA vary with the region and type (F or C) of FA, which also makes their hydration mechanisms in cementitious materials differ. It can therefore be concluded that building an ML model for a specific region and type of FA is more effective and accurate.

4 Conclusion

The XRF chemical composition of six major oxides has been mapped to the aluminosilicate glass content by refining an existing ML model. The most recently proposed model is an ML model for FA of both types F and C from various countries; however, it did not successfully predict the reactivity of FA type F from Korea. To create an ML model for this specific target country, 43 Korean FA type F samples were used to develop a final model (i.e., model B) with a boosting ensemble algorithm. The \({R}^{2}\) (i.e., accuracy score) on the test-data is 0.80, so the amount of aluminosilicate glass in Korean FA can be predicted using the proposed ML model. Additional conclusions drawn from the study are as follows:

  1. Model A was built using the ANN algorithm for FA of both types F and C collected from various countries, namely India, Japan, China, the United States, Canada, and European countries. Model A has an \({R}^{2}\) of 0.63, which means that the input variables explain 63% of the variation in the target variable. However, the model did not accurately predict the amount of aluminosilicate glass calculated by QXRD for Korean FA type F, giving a negative \({R}^{2}\).

  2. The pre-processing model applied the three ensemble algorithms (i.e., bagging, boosting, and stacking) to the Korean FA and shows an \({R}^{2}\) of 0.49. This is because, for some samples, the target variable was predicted to lie in the range of 52 to 58 wt.%, where most of the entire dataset is distributed, regardless of the actual target value. The actual values appear not to have been approximated because of variations in particle size or geographic location that were not taken into account as input features. Model B was developed using a boosting ensemble for the 43 selected FAs, excluding the 19 FAs deemed to be outliers. This is the final model of this study, with an \({R}^{2}\) of 0.80, i.e., an explanatory power of 80%. As shown in Fig. 11, for most FA samples the aluminosilicate glass content is accurately predicted from the chemical composition of the six major oxides obtained by XRF.

  3.

    This study shows that the aluminosilicate glass content can be predicted from the XRF chemical composition alone, without QXRD. The proposed ML model still has room for improvement as more diverse FA samples are added; in its current form, it may be specific to FA of a certain quality. Nevertheless, the model achieved high accuracy because the samples were grouped and selected with a pre-processing model before model optimization. The model is therefore expected to become more general as more data covering a wider range of FA quality are accumulated and tested.

  4.

    As shown in Figs. 9 and 12, model A and the final model B for Korean FA have in common that CaO content is the most important factor in predicting the aluminosilicate glass content, followed by SiO2 content. Other chemical components were also found to have a meaningful influence. This study therefore suggests that the current ASTM standard for FA classification may not be sufficient: not only the content of each oxide but also the relationships between the oxide contents should be considered. However, Al2O3 and Fe2O3 are ranked differently in models A and B, and the SHAP values of SiO2 and the lower-ranked factors vary significantly between the two models (Figs. 10 and 13). These differing contributions of the elemental composition to the amorphous content may be due to variations in raw coal composition or in operating conditions.
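The conclusions rely on bagging, boosting, and stacking ensembles, with a boosting ensemble chosen for model B. The paper does not state the library or hyperparameters here, so the following is only a minimal sketch assuming scikit-learn. It uses randomly generated stand-in data (43 samples by 6 features, matching the sample count of model B but none of its values), and every setting (tree depth, number of estimators, the 80/20 split, the base learners in the stack) is an assumption. The outlier exclusion step is not shown because its criterion is not described in this section.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 43 samples x 6 oxide contents (wt.%).
# These numbers are random and NOT the study's data.
n_samples, n_oxides = 43, 6
X = rng.uniform(0.0, 60.0, size=(n_samples, n_oxides))
# Hypothetical target: glass content driven mostly by the first two columns.
y = 40.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0.0, 2.0, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Bagging: average many trees, each fit on a bootstrap resample.
    "bagging": BaggingRegressor(
        DecisionTreeRegressor(max_depth=3), n_estimators=100, random_state=0
    ),
    # Boosting: add shallow trees sequentially, each fit to current residuals.
    "boosting": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=2, random_state=0
    ),
    # Stacking: a linear meta-model combines out-of-fold base predictions.
    "stacking": StackingRegressor(
        estimators=[
            ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
            ("ridge", Ridge(alpha=1.0)),
        ],
        final_estimator=LinearRegression(),
        cv=5,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: test R^2 = {scores[name]:.2f}")
```

With only 43 samples, the test score depends noticeably on the random split, so repeated or cross-validated splits would give a more stable comparison between the three ensembles.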

Fig. 13
Comparison of prediction accuracy of models A and B
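Conclusion 4 compares SHAP values between models A and B. SHAP attributes a prediction to the input features through Shapley values: feature \(i\) receives the weighted average of its marginal contributions over all coalitions \(S\) of the other features, \(\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S \cup \{i\}) - v(S)]\). The shap package (Lundberg and Lee, listed in the references) uses fast estimators; the sketch below instead enumerates every coalition exactly, which stays cheap for six oxides (64 coalitions). It assumes an interventional value function \(v(S)\) that averages the model output over a background set for the features outside \(S\), and both toy models are hypothetical, not models A or B.

```python
from itertools import combinations
from math import factorial
from statistics import fmean


def coalition_value(model, x, background, subset):
    """Average model output with features in `subset` taken from x and the
    remaining features taken from each background row (interventional)."""
    outputs = []
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        outputs.append(model(mixed))
    return fmean(outputs)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                subset = set(combo)
                gain = (coalition_value(model, x, background, subset | {i})
                        - coalition_value(model, x, background, subset))
                phi[i] += weight * gain
    return phi


def toy_linear(v):
    # Invented coefficients; a linear model has the closed form
    # phi_i = w_i * (x_i - mean of feature i over the background).
    return 3.0 * v[0] - 2.0 * v[1] + 1.0 * v[2]


def toy_interaction(v):
    # Invented model with an interaction between features 0 and 1.
    return v[0] * v[1] + v[2]


background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
x = [1.0, 2.0, 3.0]

print([round(p, 3) for p in shapley_values(toy_linear, x, background)])
# [0.0, -2.0, 2.0]

phi = shapley_values(toy_interaction, x, background)
baseline = fmean(toy_interaction(b) for b in background)
# Efficiency property: attributions sum to f(x) minus the background mean.
print(abs(sum(phi) - (toy_interaction(x) - baseline)) < 1e-9)  # True
```

Because the attributions depend on both the fitted model and the background set, two models trained on different FA populations can rank the same oxide differently even when their top-ranked features agree.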

Availability of Data and Materials

All the datasets associated with this study are available from the corresponding author upon request.

References

  • Abualrous, Y. (2017). Characterization of Indian and Canadian fly ash for use in concrete (PhD thesis). University of Toronto.

  • Antwarg, R. M. L., Shapira, B., & Rokach, L. (2019). Explaining anomalies detected by autoencoders using SHAP. arXiv preprint. https://doi.org/10.48550/arXiv.1903.02407

  • ASTM C618. (2019). Specification for coal fly ash and raw or calcined natural pozzolan for Use in concrete. West Conshohocken: ASTM International.

  • Aughenbaugh, K. L., Stutzman, P., & Juenger, M. C. G. (2016). Identifying glass compositions in fly ash. Frontiers in Materials, 3, 1.

  • Aughenbaugh, K. L., Williamson, T., & Juenger, M. C. G. (2014). Critical evaluation of strength prediction methods for alkali-activated fly ash. Materials and Structures, 48(3), 607–620.

  • Barten, A. P. (1987). The coefficient of determination for regression without a constant term, The Practice of Econometrics (pp. 171–189). Dordrecht: Springer.

  • Bhagath Singh, S. M. A. G. V. P., Kolluru, M. A., & Subramaniam, V. L. (2016). Quantitative XRD analysis of binary blends of siliceous fly ash and hydrated cement. The American Society of Civil Engineers. https://doi.org/10.1061/(ASCE)MT.1943-5533.0001554

  • Breiman, L. (1996). Stacked regressions. Machine Learning, 24(1), 49–64.

  • Brouwers, H. J. H., & Van Eijk, R. J. (2002). Fly ash reactivity: Extension and application of a shrinking core model and thermodynamic approach. Journal of Materials Science, 37, 2129.

  • Chancey, R. T., Stutzman, P., Juenger, M. C. G., & Fowler, D. W. (2010). Comprehensive phase characterization of crystalline and amorphous phases of a class F fly ash. Cement and Concrete Research, 40(1), 146–156.

  • Cho, Y.-H., An, E.-M., Chon, C.-M., & Lee, S. (2016a). Effect of fillers on high temperature shrinkage reduction of geopolymers. Journal of the Korean Institute of Resources Recycling, 25(6), 73–81.

  • Cho, Y.-H., An, E.-M., Lee, S.-J., Chon, C.-M., & Kim, D.-J. (2016b). Influence of fine aggregate properties on unhardened geopolymer concrete. Journal of the Korean Recycled Construction Resources Institute, 4(2), 101–111.

  • Cho, Y. K., Jung, S. H., & Choi, Y. C. (2019). Effects of chemical composition of fly ash on compressive strength of fly ash cement mortar. Construction and Building Materials, 204, 255–264.

  • Cho, Y. K., & Lee, K. M. (2019). Effect of chemical properties of fly ash on the compressive strength of geopolymer. Construction and Building Materials, 204, 255.

  • Derbeko, P., Yaniv, R. E., & Meir, R. (2022). Variance optimized bagging. In European Conference on Machine Learning. Berlin: Springer, Berlin Heidelberg.

  • Diamond, S. (1983). On the glass present in low-calcium and in high-calcium flyashes. Cement and Concrete Research, 13(4), 459–464.

  • Donatello, S., Tyrer, M., & Cheeseman, C. R. (2010). Comparison of test methods to assess pozzolanic activity. Cement and Concrete Composites, 32(2), 121–127.

  • Doroudi, S. (2020). The bias-variance tradeoff: How data science can inform educational debates. AERA Open, 6, 2332858420977208.

  • Durdziński, P. T., Ben Haha, M., Bernal, S. A., De Belie, N., Gruyaert, E., Lothenbach, B., Menéndez Méndez, E., Provis, J. L., Schöler, A., Stabler, C., Tan, Z., Villagrán Zaccardi, Y., Vollpracht, A., Winnefeld, F., Zając, M., & Scrivener, K. L. (2017). Outcomes of the RILEM round robin on degree of reaction of slag and fly ash in blended cements. Materials and Structures, 50(2), 1.

  • Durdziński, P. T., Dunant, C. F., Haha, M. B., & Scrivener, K. L. (2015). A new quantification method based on SEM-EDS to assess fly ash composition and study the reaction of its individual components in hydrating cement paste. Cement and Concrete Research, 73, 111–122.

  • Fowler, D. D. W. (2013). Characterizing fly ash. Center for Transportation Research.

  • Geman, B. E. S., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.

  • Göktepe, A. B., Sezer, A., Sezer, G. İ, & Ramyar, K. (2008). Classification of time-dependent unconfined strength of fly ash treated clay. Construction and Building Materials, 22(4), 675–683.

  • Hasanbeigi, A., Price, L., & Lin, E. (2012). Emerging energy-efficiency and CO2 emission-reduction technologies for cement and concrete production: A technical review. Renewable and Sustainable Energy Reviews, 16(8), 6220–6238.

  • James, C. H., & Maria, M. (2001). An Approach toward a combined scheme for the petrographic classification of fly ash. Energy & Fuels, 15, 1319.

  • Jang, J. G., & Lee, H. K. (2016). Effect of fly ash characteristics on delayed high-strength development of geopolymers. Construction and Building Materials, 102, 260–269.

  • Jeon, D., Jun, Y., Jeong, Y., & Oh, J. E. (2015). Microstructural and strength improvements through the use of Na2CO3 in a cementless Ca(OH)2-activated Class F fly ash system. Cement and Concrete Research, 67, 215–225.

  • Jeon, D., Yum, W. S., Jeong, Y., & Oh, J. E. (2018). Properties of quicklime (CaO)-activated Class F fly ash with the use of CaCl2. Cement and Concrete Research, 111, 147–156.

  • John, M. F. (2017). Fly ash classification: Old and new ideas.

  • Jung-Il, S., Yum, W. S., Sim, S., Park, H.-G., & Oh, J. E. (2020). Effect of magnesium formate as compared with magnesium oxide on the strength enhancement and microstructures of CaO-activated Class F fly ash system. Construction and Building Materials, 253, 119140.

  • Kang, N.-H., Chon, C.-M., Jou, H.-T., & Lee, S. (2013). Effect of particle size and unburned carbon content of fly ash from hadong power plant on compressive strength of geopolymers. Korean Journal of Materials Research, 23(9), 510–516.

  • Kim, B., Heo, Y.-E., Chon, C.-M., & Lee, S.-J. (2018). Influence of Na/Al ratio and curing temperature of geopolymers on efflorescence reduction. Resources Recycling, 27(6), 59.

  • Kim, Y., Kim, K., & Jeong, G.-Y. (2017). Study of detailed geochemistry of hazardous elements in weathered coal ashes. Fuel, 193, 343–350.

  • Lee, S. I., & Lundberg, S. M. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (pp. 4765–4774).

  • Mangalathu, S., Hwang, S.-H., & Jeon, J.-S. (2020). Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Engineering Structures, 219, 110927.

  • Matias, T., Souza, F., Araújo, R., & Antunes, C. H. (2014). Learning of a single-hidden layer feedforward neural network using an optimized extreme learning machine. Neurocomputing, 129, 428–436.

  • Moomen, M., & Siddiqui, C. (2022). Probabilistic deterioration modeling of bridge component condition with random effects. Journal of Structural Integrity and Maintenance, 7(3), 151–160.

  • Moon, G. D., Oh, S., & Choi, Y. C. (2016). Effects of the physicochemical properties of fly ash on the compressive strength of high-volume fly ash mortar. Construction and Building Materials, 124, 1072–1080.

  • Moreno, N., Querol, X., Andres, J., Stanton, K., Towler, M., Nugteren, H., Janssenjurkovicova, M., & Jones, R. (2005). Physico-chemical characteristics of European pulverized coal combustion fly ashes. Fuel, 84(11), 1351–1363.

  • Mukhopadhyay, A. K., Liu, K.-W., & Jalal, M. (2019). An innovative approach to fly ash characterization and evaluation to prevent alkali-silica reaction. ACI Materials Journal. https://doi.org/10.14359/51716751

  • Oey, T., Timmons, J., Stutzman, P., Bullard, J. W., Balonis, M., Bauchy, M., & Sant, G. (2017). An improved basis for characterizing the suitability of fly ash as a cement replacement agent. Journal of the American Ceramic Society, 100(10), 4785–4800.

  • Oh, J. E., Jun, Y., & Jeong, Y. (2014). Characterization of geopolymers from compositionally and physically different Class F fly ashes. Cement and Concrete Composites, 50, 16–26.

  • Oh, J. E., Jun, Y., Jeong, Y., & Monteiro, P. J. M. (2015). The importance of the network-modifying element content in fly ash as a simple measure to predict its strength potential for alkali-activation. Cement and Concrete Composites, 57, 44–54.

  • Paris, J. M., Roessler, J. G., Ferraro, C. C., DeFord, H. D., & Townsend, T. G. (2016). A review of waste products utilized as supplements to Portland cement in concrete. Journal of Cleaner Production, 121, 1–18.

  • Park, B., & Choi, Y. C. (2019). Prediction of self-healing potential of cementitious materials incorporating crystalline admixture by isothermal calorimetry. International Journal of Concrete Structures and Materials. https://doi.org/10.1186/s40069-019-0349-9

  • Pietersen, H., Fraay, A., & Bijen, J. (1989). Reactivity of fly ash at high pH, in fly ash and coal conversion by-products: Characterization, utilization and disposal VI. Materials research society symposium proceedings.

  • Saraber, A. (2017). Fly ash from coal and biomass for use in concrete: Origin, properties and performance.

  • Schocken, A. G. S. (1991). Neural networks for decision support systems: Problems and opportunities. Center for Research on Information Systems. New York: Stern School of Business, New York University.

  • Schwenk, B. Y. H. (2000). Boosting neural networks. Neural Computation, 12(8), 1869–1887.

  • Sheare, C. R. (2014). The Productive reuse of coal, biomass and co-fired fly ash. ACI Materials Journal. https://doi.org/10.14359/51686827

  • Shi, C., & Zheng, K. (2007). A review on the use of waste glasses in the production of cement and concrete. Resources, Conservation and Recycling, 52(2), 234–247.

  • Shon, C. S. (2004). Testing the effectiveness of class C and class F fly ash in controlling expansion due to alkali-silica reaction using modified ASTM C 1260 test method. Journal of Materials in Civil Engineering, 16(1), 20.

  • Sindhunata, J. S. J., van Deventer, G. C., & Lukey, H. X. (2006). Effect of curing temperature and silicate concentration on fly-ash-based geopolymerization. Industrial & Engineering Chemistry Research, 45, 3559.

  • Singh, G. B., & Subramaniam, K. (2018). Characterization of Indian fly ashes using different experimental techniques. Indian Concrete Journal, 92, 10.

  • Snellings, R., & Scrivener, K. L. (2016). Rapid screening tests for supplementary cementitious materials: Past and future. Materials and Structures, 49(8), 3265–3279.

  • Song, Y., Yang, K., Chen, J., Wang, K., Sant, G., & Bauchy, M. (2021). Machine learning enables rapid screening of reactive fly ashes based on their network topology. ACS Sustainable Chemistry & Engineering, 9(7), 2639–2650.

  • Suárez-Ruiz, I., Valentim, B., Borrego, A. G., Bouzinos, A., Flores, D., Kalaitzidis, S., Malinconico, M. L., Marques, M., Misz-Kennan, M., Predeanu, G., Montes, J. R., Rodrigues, S., Siavalas, G., & Wagner, N. (2017). Development of a petrographic classification of fly-ash components from coal combustion and co-combustion (An ICCP Classification System, Fly-Ash working group—commission III.). International Journal of Coal Geology, 183, 188–203.

  • Suh, J. I., & Park, H. G. (2019). Development of one-part lime-activated fly ash binders with formate and nitrate compounds.

  • Suh, J.-I., Yum, W. S., Song, H., Park, H.-G., & Oh, J. E. (2019). Influence of calcium nitrate and sodium nitrate on strength development and properties in quicklime (CaO)-activated Class F fly ash system. Materials and Structures. https://doi.org/10.1617/s11527-019-1413-2

  • Sumer, M. (2012). Compressive strength and sulfate resistance properties of concretes containing class F and class C fly ashes. Construction and Building Materials, 34, 531–536.

  • Tangri, N., Ansell, D., & Naimark, D. (2008). Predicting technique survival in peritoneal dialysis patients: Comparing artificial neural networks and logistic regression. Nephrology, Dialysis, Transplantation, 23(9), 2972–2981.

  • Vassilev, S. V., & Vassilev, C. G. (1996). Occurrence, abundance and origin of minerals in coals and coal ashes. Fuel Processing Technology, 48, 85.

  • Ward, C., & French, D. (2006). Determination of glass content and estimation of glass composition in fly ash using quantitative X-ray diffractometry. Fuel, 85(16), 2268–2277.

  • Wardhono, A. (2017). Comparison study of class F and class C fly ashes as cement replacement material on strength development of non-cement mortar. Materials Science and Engineering, 288, 012019.

  • Williams, R. P., & van Riessen, A. (2010). Determination of the reactive component of fly ashes for geopolymer production using XRF and XRD. Fuel, 89(12), 3683–3692.

  • Wolpert, H. D. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.

  • Xu, G., & Shi, X. (2018). Characteristics and applications of fly ash as a sustainable construction material: A state-of-the-art review. Resources, Conservation and Recycling, 136, 95–109.

  • Yang, X., Wang, Y., Byrne, R., Schneider, G., & Yang, S. (2019). Concepts of artificial intelligence for computer-assisted drug discovery. Chemical Reviews, 119(18), 10520–10594.

  • Yoon, I.-S., Chang, C., & Nam, J.-W. (2022). Effect of carbonation on chloride transportation parameters in cementitious materials. Journal of Structural Integrity and Maintenance, 7(3), 161–167.

  • Zachariasen, W. H. (1932). The Atomic arrangement in glass. Journal of the American Chemical Society, 54, 3841–3851.

Acknowledgements

This work is supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Republic of Korea (NRF-2021R1A2C4001944). The Institute of Engineering Research in Seoul National University provided research facilities for this work.

Funding

This work is supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Republic of Korea (NRF-2021R1A2C4001944).

Author information

Contributions

WP: investigation, data curation, formal analysis, visualization, writing—original draft. J: conceptualization, writing—reviewing and editing, and supervision.

Corresponding author

Correspondence to Juhyuk Moon.

Ethics declarations

Ethics Approval and Consent to Participate

All authors of the manuscript confirm the ethics approval and consent to participate following the Journal’s policies.

Consent for Publication

All authors of the manuscript agree on the publication of this work in the International Journal of Concrete Structures and Materials.

Competing Interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal information: ISSN 1976-0485 / eISSN 2234-1315.

Supplementary Information

Additional file 1.

Fly ash dataset.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Park, WY., Moon, J. Machine Learning Based Reactivity Prediction of Fly Ash Type F Produced from South Korea. Int J Concr Struct Mater 17, 58 (2023). https://doi.org/10.1186/s40069-023-00622-3

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40069-023-00622-3

Keywords