Application of Bayesian-optimized XGBoost to seismic interpretation of small-scale faults

Abstract: To further improve the accuracy of small-scale fault identification in seismic interpretation, this paper reduces the seismic attributes by information value (IV) and uses an improved Bayesian optimization algorithm to tune the parameters of an extreme gradient boosting (XGBoost) model that recognizes small-scale faults across coalbeds. First, the seismic attribute data of the mining area are preprocessed to remove abnormal samples and heavily noise-contaminated samples. Second, chi-square binning is applied to each feature of the processed samples, the weight of evidence (WOE) is calculated for each bin, and the IV of each feature is obtained and used as that feature's importance; features with low IV are removed, which eliminates high-noise attributes. At the same time, a certain degree of noise is added to the small-scale fault seismic data to enhance the noise robustness of the model. Next, the XGBoost model is constructed. Because the positive and negative samples are imbalanced, a modified XGBoost objective function is proposed to balance their training weights. Bayesian optimization is then used to select the model's key parameters. Because Bayesian optimization does not easily balance exploitation ("exploit") and exploration ("explore"), its search is inefficient and prone to falling into local optima. An adaptive balance factor algorithm is therefore proposed that dynamically balances exploitation and exploration in the PI acquisition function, improving the robustness of the parameter optimization. Experimental results show that the new framework (SAPI-Bay-ImpXGBoost), in which Bayesian optimization with the improved acquisition function tunes the XGBoost model with the modified objective function, achieves higher prediction accuracy than a BP neural network, support vector machine (SVM), K-nearest neighbors (KNN), and adaptive boosting (AdaBoost). The proposed method thus helps improve the accuracy of small-scale fault identification in coal mining areas.
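The WOE and IV step described in the abstract can be illustrated with a minimal Python sketch. It assumes the bin edges are already given (the paper derives them by chi-square binning, which is not reproduced here) and uses the common definitions WOE_i = ln(p_i / n_i) and IV = Σ (p_i − n_i) · WOE_i, where p_i and n_i are the shares of fault and non-fault samples falling in bin i. The function names, the smoothing constant `eps`, and the selection threshold are illustrative assumptions, not values taken from the paper.

```python
import math


def information_value(values, labels, edges, eps=0.5):
    """Return (IV, per-bin WOE list) for a single feature.

    values: feature values; labels: 1 = fault, 0 = non-fault.
    edges:  ascending interior cut points (assumed given; the paper
            obtains its bins by chi-square binning).
    eps:    additive smoothing (an assumption) that avoids log(0)
            when a bin contains only one class.
    """
    n_bins = len(edges) + 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for v, y in zip(values, labels):
        b = sum(v > e for e in edges)  # index of the bin containing v
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1

    pos_total = sum(pos) + eps * n_bins
    neg_total = sum(neg) + eps * n_bins

    iv = 0.0
    woe = []
    for p, n in zip(pos, neg):
        p_rate = (p + eps) / pos_total  # share of fault samples in this bin
        n_rate = (n + eps) / neg_total  # share of non-fault samples in this bin
        w = math.log(p_rate / n_rate)
        woe.append(w)
        iv += (p_rate - n_rate) * w
    return iv, woe


def select_features(iv_by_name, threshold):
    """Keep the features whose IV reaches a (hypothetical) threshold."""
    return [name for name, iv in iv_by_name.items() if iv >= threshold]


# Toy data: one attribute that separates the classes, one that does not.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
iv_good, woe_good = information_value(values, [0, 0, 0, 0, 1, 1, 1, 1], [0.5])
iv_flat, woe_flat = information_value(values, [0, 1, 0, 1, 0, 1, 0, 1], [0.5])
kept = select_features({"attr_a": iv_good, "attr_b": iv_flat}, threshold=0.1)
```

On this toy data the separating attribute receives a much larger IV than the uninformative one, so only the former survives the reduction. A suitable threshold for real seismic attributes would have to be chosen empirically.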
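To illustrate how a balance factor shifts an acquisition function between exploration and exploitation, the following Python sketch assumes that PI denotes the probability of improvement, PI(x) = Φ((μ(x) − f_best − ξ) / σ(x)) for a maximization problem, with ξ acting as the balance factor. The abstract does not specify the paper's adaptive rule, so the geometric decay of ξ below is only a hypothetical stand-in, and all names and numbers are illustrative.

```python
import math


def probability_of_improvement(mu, sigma, best, xi):
    """PI for maximization: P(f(x) >= best + xi) under N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    z = (mu - best - xi) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def decayed_xi(iteration, n_iterations, xi_start=0.5, xi_end=0.01):
    """Hypothetical schedule: geometric decay from xi_start to xi_end.

    A large xi early on favors uncertain regions (exploration); a small
    xi later favors points near the incumbent (exploitation).
    """
    t = iteration / max(n_iterations - 1, 1)
    return xi_start * (xi_end / xi_start) ** t


# Two candidate parameter settings given a surrogate posterior (toy numbers):
# "near" sits close to the incumbent with low uncertainty,
# "far" has a lower mean but high uncertainty.
best = 1.0
candidates = {"near": (1.05, 0.05), "far": (0.90, 0.50)}


def pick(xi):
    """Return the candidate name with the highest PI for a given xi."""
    return max(
        candidates,
        key=lambda k: probability_of_improvement(*candidates[k], best, xi),
    )


early_choice = pick(decayed_xi(0, 10))  # large xi
late_choice = pick(decayed_xi(9, 10))   # small xi
```

With the large early ξ the uncertain candidate scores higher, while the small late ξ favors the candidate next to the incumbent. An adaptive scheme such as the one the abstract describes would adjust ξ from the state of the search rather than from a fixed schedule.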

     
