Named entity recognition for Chinese construction documents based on conditional random field
Authors:Qiqi ZHANG  Cong XUE  Xing SU  Peng ZHOU  Xiangyu WANG  Jiansong ZHANG
Affiliation:1. College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; 2. School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China; 3. School of Design and the Built Environment, Curtin University, Perth, Western Australia 6845, Australia; 4. School of Construction Management Technology, Purdue University, West Lafayette, IN 47907, USA
Abstract:Named entity recognition (NER) is essential in many natural language processing (NLP) tasks such as information extraction and document classification. A construction document usually contains critical named entities, and an effective NER method can provide a solid foundation for downstream applications that improve construction management efficiency. This study presents an NER method for Chinese construction documents based on conditional random field (CRF), comprising a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls annotation consistency with a newly designed annotation specification. The CRF model engineers nine transformation features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The F1-measure on a labeled construction data set is 87.9%. Furthermore, as more domain knowledge features are infused, the marginal performance improvement from including POS information decreases, which points to POS customization as a promising research direction for improving NLP performance with limited data.
Keywords:NER  NLP  Chinese language  construction document  
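The abstract names three ingredients of the state features: word position, POS, and word/character states within the context. The sketch below is a hedged illustration of what such per-token features might look like; it is not the authors' feature set. The function names, feature keys, the one-word context window, and the sample sentence with its POS tags are all assumptions made for this example. A dict per token is a common input shape for CRF toolkits, but no particular library is assumed here.

```python
from typing import Dict, List, Tuple

# A token is a (word, POS tag) pair produced by word-based tokenization.
# Words are assumed to be non-empty strings.
Token = Tuple[str, str]


def state_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Build illustrative state features for the token at index i.

    Hypothetical feature set: it covers word position, POS, and
    word/character states in a one-word context window.
    """
    word, pos = sentence[i]
    features: Dict[str, object] = {
        "bias": 1.0,
        "word": word,             # word state
        "pos": pos,               # part-of-speech state
        "first_char": word[0],    # character states
        "last_char": word[-1],
        "length": len(word),
        "position": i,            # word position in the sentence
    }
    # Left context, or a beginning-of-sentence marker.
    if i > 0:
        prev_word, prev_pos = sentence[i - 1]
        features["-1:word"] = prev_word
        features["-1:pos"] = prev_pos
    else:
        features["BOS"] = True
    # Right context, or an end-of-sentence marker.
    if i < len(sentence) - 1:
        next_word, next_pos = sentence[i + 1]
        features["+1:word"] = next_word
        features["+1:pos"] = next_pos
    else:
        features["EOS"] = True
    return features


def sentence_features(sentence: List[Token]) -> List[Dict[str, object]]:
    """Return one feature dict per token in the sentence."""
    return [state_features(sentence, i) for i in range(len(sentence))]


# Made-up sample: three words with assumed POS tags.
example: List[Token] = [("杭州", "ns"), ("地铁", "n"), ("项目", "n")]
feats = sentence_features(example)
```

Widening the context window or adding domain dictionary lookups would follow the same pattern: each new signal becomes one more key in the per-token dict.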
Source:Frontiers of Engineering Management (English edition)