Semantic Enhanced Multi-strategy Policy Term Extraction System
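Cite this article: CAO Xiu-Juan, MA Zhi-Rou, ZHU Tao, ZHANG Qing-Wen, YANG Yan, YE Dan. Semantic Enhanced Multi-strategy Policy Term Extraction System[J]. Computer Systems & Applications, 2022, 31(9): 152-158
Authors: CAO Xiu-Juan, MA Zhi-Rou, ZHU Tao, ZHANG Qing-Wen, YANG Yan, YE Dan
Affiliations: School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Zhenghe Technology Co. Ltd., Jinan 250000, China
Funding: National Natural Science Foundation of China (61802381)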
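Abstract: Policy terms are time-sensitive, low-frequency, and sparse, and many of them are compound phrases, so traditional term extraction methods struggle to meet practical needs. To address this, we design and implement a semantic-enhanced, multi-strategy policy term extraction system. The system models policy text features along two dimensions: frequent item mining and semantic similarity. It fuses several frequent pattern mining strategies to select feature seed words, and it uses a pre-trained language model to enhance semantic matching so that low-frequency, sparse policy terms can be recalled. This yields semi-automatic policy term extraction that supports the transition from a cold start with no thesaurus to a warm start with an existing thesaurus. The system can improve the effectiveness of policy text analysis and provide technical support for building smart government service platforms.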

Keywords: term extraction; multi-strategy; semantic enhancement; low frequency; thesaurus construction
Received: 2021-12-21
Revised: 2022-01-24
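The abstract outlines a two-stage pipeline: frequent pattern mining selects high-confidence seed terms, and semantic similarity from a pre-trained language model recalls rare terms that frequency alone would miss. The page gives no implementation details, so the Python below is only a minimal sketch under stated assumptions. The two mining "strategies" are assumed to be a document-support threshold over word n-grams plus a closed-pattern filter that prefers whole compound phrases over their fragments; candidate phrases are assumed to come from some upstream chunker; and a toy character-bigram vector stands in for the pre-trained language model encoder. All function names and parameters (`select_seeds`, `recall_by_similarity`, `extract_terms`, `min_support`, `threshold`, `thesaurus`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Sequence, Set, Tuple

Gram = Tuple[str, ...]
Vector = Dict[str, float]


def mine_ngram_support(docs: Sequence[Sequence[str]], max_n: int = 4) -> Counter:
    """Document support of every contiguous word n-gram with 2 <= n <= max_n."""
    support: Counter = Counter()
    for tokens in docs:
        seen: Set[Gram] = set()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(tuple(tokens[i:i + n]))
        support.update(seen)  # each n-gram counts at most once per document
    return support


def _contains(longer: Gram, shorter: Gram) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def select_seeds(docs: Sequence[Sequence[str]], min_support: int = 2,
                 max_n: int = 4, sep: str = " ") -> Set[str]:
    """Assumed strategy 1: keep n-grams with support >= min_support.
    Assumed strategy 2: closed-pattern filter; drop an n-gram if a longer
    frequent n-gram containing it has the same support (prefer compounds)."""
    frequent = {g: s for g, s in mine_ngram_support(docs, max_n).items()
                if s >= min_support}
    seeds: Set[str] = set()
    for gram, s in frequent.items():
        absorbed = any(len(other) > len(gram) and s_other == s
                       and _contains(other, gram)
                       for other, s_other in frequent.items())
        if not absorbed:
            seeds.add(sep.join(gram))
    return seeds


def char_bigram_embed(text: str) -> Vector:
    """Toy stand-in (assumption) for a pre-trained language model encoder."""
    s = text.replace(" ", "")
    grams = Counter(s[i:i + 2] for i in range(max(len(s) - 1, 1)))
    return {k: float(v) for k, v in grams.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recall_by_similarity(candidates: Iterable[str], seeds: Set[str],
                         embed: Callable[[str], Vector],
                         threshold: float = 0.6) -> Set[str]:
    """Recall candidates whose similarity to any seed reaches `threshold`."""
    seed_vecs = [embed(s) for s in seeds]
    recalled: Set[str] = set()
    for cand in candidates:
        if cand not in seeds:
            vec = embed(cand)
            if any(cosine(vec, sv) >= threshold for sv in seed_vecs):
                recalled.add(cand)
    return recalled


def extract_terms(docs: Sequence[Sequence[str]], candidates: Iterable[str],
                  thesaurus: Optional[Iterable[str]] = None,
                  embed: Callable[[str], Vector] = char_bigram_embed,
                  min_support: int = 2, threshold: float = 0.6) -> Set[str]:
    """Cold start when `thesaurus` is empty; warm start otherwise."""
    seeds = select_seeds(docs, min_support)
    if thesaurus:
        seeds |= set(thesaurus)  # existing terms join the mined seeds
    return seeds | recall_by_similarity(candidates, seeds, embed, threshold)


if __name__ == "__main__":
    docs = [
        ["small", "business", "tax", "relief", "policy"],
        ["apply", "small", "business", "tax", "relief", "now"],
        ["micro", "business", "tax", "relief", "fund"],
    ]
    candidates = ["micro business tax relief", "relief now", "apply small"]
    print(sorted(extract_terms(docs, candidates)))
    # -> ['business tax relief', 'micro business tax relief', 'small business tax relief']
```

In this toy run, "micro business tax relief" appears in only one document, so support counting alone discards it; it is recalled because its vector is close to the mined seed "business tax relief". Passing a non-empty `thesaurus` adds existing terms to the seed set, which is one way to model the warm-start mode the abstract describes.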
