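Intelligent mining of entity knowledge from concrete dam construction documents (混凝土坝施工文档实体知识智能挖掘方法)
Cite this article: TIAN Dan (田丹), SHEN Yang (沈扬), LI Mingchao (李明超), HAN Shuai (韩帅). Intelligent mining of entity knowledge from concrete dam construction documents [J]. Journal of Hydroelectric Engineering (水力发电学报), 2021, 40(6): 139-151.
Authors: TIAN Dan (田丹)  SHEN Yang (沈扬)  LI Mingchao (李明超)  HAN Shuai (韩帅)
Abstract (translated from the Chinese original): Construction information for concrete dams is recorded mostly as document text. It is large in volume, widely distributed, and internally interrelated, so manual processing struggles to extract the knowledge it contains accurately and efficiently, or to untangle the relationships among construction records. In natural language processing, named entities are the carriers of textual knowledge, and accurate, fast entity recognition is a key prerequisite for construction knowledge mining. This paper proposes a method for intelligent recognition and mining of knowledge in concrete dam construction documents that combines deep learning with association rule techniques. The method couples a bi-directional long short-term memory network (Bi-LSTM) with a conditional random field (CRF), defines the entity types for concrete dam construction, builds a named entity recognition model, and produces a set of construction entity knowledge. On that basis, taking into account the expression patterns of construction text and the entity types, it predefines the relationships between entities and determines the forms in which construction entities combine, yielding an entity association rule extraction technique. Guided by this technique, the Apriori algorithm is improved to compute frequent itemsets and obtain strong association rules between entities. Applied to weekly construction supervision reports from an actual concrete dam, the method achieves a named entity recognition precision of 86.42%, which verifies its accuracy. Analysis of the association rules between entities with the improved Apriori algorithm demonstrates the advantages of the improvement and helps make knowledge analysis of concrete dam construction documents more intelligent and fine-grained.

Keywords: concrete dam  construction document  named entity  intelligent recognition  deep learning  knowledge mining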
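
The recognition step described in the abstract ends with a tag sequence that still has to be turned into entity spans. The following is a minimal Python sketch of that final step only, assuming the common BIO tagging scheme; the abstract does not state which scheme the authors use. The function name `extract_entities`, the entity types `LOC` and `ACT`, and the sample tokens are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from a BIO-tagged token sequence.

    Assumption: tags look like "B-TYPE", "I-TYPE" or "O". An "I-" tag that does
    not continue an open span of the same type is treated like "O" and dropped.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the span that is currently open, if any.
        nonlocal current, current_type
        if current and current_type is not None:
            entities.append((sep.join(current), current_type))
        current, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            flush()
    flush()
    return entities


# Hypothetical tagged sentence, as a sequence labelling model might emit it.
tokens = ["dam", "section", "5", "completed", "concrete", "pouring"]
tags = ["B-LOC", "I-LOC", "I-LOC", "O", "B-ACT", "I-ACT"]
entities = extract_entities(tokens, tags)
print(entities)
```

For Chinese text tokenized by character, passing `sep=""` would join the characters of each entity without spaces.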

Intelligent data mining approach of text entity knowledge from construction documents of concrete dams
TIAN Dan, SHEN Yang, LI Mingchao, HAN Shuai. Intelligent data mining approach of text entity knowledge from construction documents of concrete dams [J]. Journal of Hydroelectric Engineering, 2021, 40(6): 139-151.
Authors: TIAN Dan  SHEN Yang  LI Mingchao  HAN Shuai
Abstract: Construction information for concrete dams is mostly recorded as document text, which is large in volume, widely distributed, and internally complex. Manual processing struggles to extract the knowledge it contains accurately and efficiently, or to sort out the complicated relationships among construction records. In natural language processing, named entities are the carriers of textual knowledge, and accurate, fast entity recognition is an important prerequisite for construction knowledge mining. This paper describes a method for intelligent recognition and analysis of knowledge in concrete dam construction documents that combines deep learning with association rule techniques. The types of concrete dam construction entities are defined, and a bi-directional long short-term memory network (Bi-LSTM) is coupled with a conditional random field (CRF) to build a named entity recognition model and generate a set of construction entity knowledge. We then develop an entity association rule extraction technique by considering the expression patterns and entity types of the text, predefining the relationships between entities, and determining the forms in which they combine. Guided by this technique, we improve the Apriori algorithm to compute frequent itemsets and obtain strong association rules between entities. Applied to the weekly construction supervision reports of an actual concrete dam, the method achieves a named entity recognition precision of 86.42%, which verifies its accuracy. Using the improved Apriori algorithm to analyze the association rules between entities demonstrates the advantages of the improvement and helps raise the level of intelligence and refinement in knowledge analysis of concrete dam construction documents.
Keywords: concrete dam  construction document  named entity  intelligent recognition  deep learning  knowledge mining
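
To make the association rule step concrete, the following is a minimal Python sketch of the standard, unmodified Apriori algorithm followed by rule generation from support and confidence. It is not the authors' improved variant, whose details are not given in the abstract. The function names, the thresholds, and the sample entity sets (one set of recognized entities per report sentence) are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: Iterable[Iterable[str]], min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori search: return every itemset with support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset: Itemset) -> float:
        # Fraction of transactions that contain the whole itemset.
        return sum(1 for t in tx if itemset <= t) / n

    current = {frozenset([i]) for t in tx for i in t}
    current = {s for s in current if support(s) >= min_support}
    result: Dict[Itemset, float] = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


def association_rules(freq: Dict[Itemset, float], min_confidence: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Return (antecedent, consequent, support, confidence) for each strong rule."""
    rules = []
    for itemset, supp in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # freq[a] exists because every subset of a frequent itemset is frequent.
                confidence = supp / freq[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, supp, confidence))
    return rules


# Hypothetical entity sets, one per sentence of a supervision report.
transactions = [
    {"dam section 5", "concrete pouring", "tower crane"},
    {"dam section 5", "concrete pouring"},
    {"dam section 7", "formwork"},
    {"dam section 5", "concrete pouring", "formwork"},
]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.8)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), supp, conf)
```

A rule counts as strong here when it clears both thresholds. Restricting candidates to predefined entity type combinations, as the abstract describes, would be one way to shrink the search space, but how the authors do this is not specified in the abstract.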
This article is indexed in CNKI and other databases.