
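Agriculture-related product name extraction and category labeling combining ontology and conditional random field
Cite this article:HUANG Nian'e, HUANG He, WANG Rujing. Agriculture-related product name extraction and category labeling combining ontology and conditional random field[J]. Journal of Computer Applications, 2017, 37(1): 233-238.
Authors:HUANG Nian'e  HUANG He  WANG Rujing
Affiliation:1. Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031; 2. Hefei Institutes of Physical Science, University of Science and Technology of China, Hefei 230027
Funding:National Key Technology R&D Program of China (2013BAD15B03); Key Deployment Project of the Chinese Academy of Sciences (Y622A21291); Science and Technology Research Project of Anhui Province (1401032010).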
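Abstract:When traditional information extraction methods based on Conditional Random Field (CRF) are used to extract agriculture-related product names and label their categories, they need a large training corpus, the annotation workload is heavy, and extraction precision is low. To address this problem, a method that combines an agricultural ontology with CRF is proposed, in which automatic extraction and classification of agriculture-related product names is treated as a sequence labeling task. First, the raw data are word-segmented, and word, part-of-speech, geographical-attribute and ontology-concept features are selected. Then, the CRF model parameters are trained with an improved quasi-Newton algorithm and decoding is performed with the Viterbi algorithm; four groups of comparative experiments were completed, seven categories were recognized, and CRF was compared experimentally with the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM). Finally, CRF was applied to supply and demand trend analysis of agricultural products. With a suitable feature template, adding ontology concepts raised the overall precision of the CRF open test by 10.20%, the recall by 59.78% and the F-score by 37.17%, which demonstrates the feasibility and effectiveness of combining ontology with CRF for extracting agriculture-related product names and categories, and the method can help match supply and demand of agricultural products.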

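Keywords:conditional random field  agricultural ontology  agriculture-related product name  supply and demand trend  sequence labeling
Received:2016-08-02
Revised:2016-09-19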

Agriculture-related product name extraction and category labeling based on ontology and conditional random field
HUANG Nian'e, HUANG He, WANG Rujing. Agriculture-related product name extraction and category labeling based on ontology and conditional random field[J]. Journal of Computer Applications, 2017, 37(1): 233-238.
Authors:HUANG Nian'e  HUANG He  WANG Rujing
Affiliation:1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China; 2. Hefei Institute of Physical Science, University of Science and Technology of China, Hefei, Anhui 230027, China
Abstract:Traditional information extraction methods based on Conditional Random Field (CRF) require a large labeled corpus when applied to agriculture-related product name extraction and category labeling; labeling the corpus manually is expensive, and the extraction precision is low. To solve this problem, a method of agriculture-related product name extraction and category labeling based on an agricultural ontology and CRF was proposed, in which automatic extraction and classification of agriculture-related product names is treated as a sequence labeling task. Firstly, the original data were word-segmented, and word, part-of-speech, geographical-attribute and ontology-concept features were selected. Then, the parameters of the CRF model were trained by an improved quasi-Newton algorithm, and decoding was implemented with the Viterbi algorithm. A total of four groups of comparative experiments were completed and seven categories were identified. CRF, the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM) were compared through experiments. Finally, a supply and demand trend analysis of agricultural produce was carried out. The experimental results show that, with appropriate CRF feature templates, adding ontology concepts increased the overall precision, recall and F-score of the open test by 10.20%, 59.78% and 37.17% respectively. This demonstrates the feasibility and effectiveness of the method, and its practical significance in promoting automatic matching of supply and demand for agricultural products.
Keywords:Conditional Random Field (CRF)  agricultural ontology  agriculture-related product name  supply and demand trend  sequence labeling  
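
The abstract names four feature types (word, part of speech, geographical attribute, ontology concept) and Viterbi decoding. The following is a minimal sketch of how such features could feed a Viterbi decoder over a linear-chain model. It is not the authors' implementation: the label set, the English example tokens, and all weights are hypothetical and hand-set for illustration, whereas in the paper the weights are learned by an improved quasi-Newton algorithm and the seven categories are not listed in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical label set; the paper's seven categories are not given here.
LABELS = ["O", "B-FRUIT", "B-GRAIN"]

Weights = Dict[Tuple[str, str], float]


def token_features(token: Dict[str, str]) -> List[str]:
    """Map one segmented token to the four feature types named in the abstract."""
    return [
        f"word={token['word']}",
        f"pos={token['pos']}",
        f"geo={token['geo']}",
        f"onto={token['onto']}",
    ]


def viterbi(tokens: List[Dict[str, str]], emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model."""
    if not tokens:
        return []

    def emit(tok: Dict[str, str], label: str) -> float:
        # Sum the weights of all (feature, label) pairs active for this token.
        return sum(emit_w.get((f, label), 0.0) for f in token_features(tok))

    # best[i][y] is the score of the best path ending in label y at position i.
    best = [{y: emit(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + trans_w.get((p, y), 0.0))
            scores[y] = best[-1][prev] + trans_w.get((prev, y), 0.0) + emit(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical tokens and hand-set weights (assumptions, not from the paper).
tokens = [
    {"word": "sell", "pos": "v", "geo": "no", "onto": "none"},
    {"word": "apples", "pos": "n", "geo": "no", "onto": "fruit"},
    {"word": "wheat", "pos": "n", "geo": "no", "onto": "grain"},
]
emit_w = {
    ("onto=fruit", "B-FRUIT"): 2.0,
    ("onto=grain", "B-GRAIN"): 2.0,
    ("pos=v", "O"): 1.0,
    ("onto=none", "O"): 0.5,
}
trans_w = {("B-FRUIT", "B-FRUIT"): -1.0}

print(viterbi(tokens, emit_w, trans_w))
```

In this toy setup the ontology-concept feature is the only signal that separates the two product categories, which loosely mirrors the abstract's claim that ontology concepts help most on recall; the real effect depends on the trained weights and feature templates.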