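Product Attribute Extraction Based on Pre-trained Language Models
Cite this article: ZHANG Shiqi, MA Jin, ZHOU Xiabing, JIA Hao, CHEN Wenliang, ZHANG Min. Product Attribute Extraction Based on Pre-trained Language Models[J]. Journal of Chinese Information Processing, 2022, 36(1): 56-64.
Authors: ZHANG Shiqi  MA Jin  ZHOU Xiabing  JIA Hao  CHEN Wenliang  ZHANG Min
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding: National Natural Science Foundation of China (61876115)
Abstract: Attribute extraction is a key step in building a knowledge graph; its goal is to extract the attribute values associated with an entity from unstructured text. This paper casts attribute extraction as a sequence labeling problem and uses distant supervision to automatically annotate text from multiple e-commerce sources, which eases the shortage of annotated data for product attribute extraction. To evaluate system performance precisely, a manually annotated test set was built, yielding a multi-domain annotated data set for e-commerce product attribute extraction. Based on the newly constructed data...

Keywords: attribute extraction  distant supervision  pre-trained language model  cross-domain learning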

Pre-trained Language Models for Product Attribute Extraction
ZHANG Shiqi, MA Jin, ZHOU Xiabing, JIA Hao, CHEN Wenliang, ZHANG Min. Pre-trained Language Models for Product Attribute Extraction[J]. Journal of Chinese Information Processing, 2022, 36(1): 56-64.
Authors: ZHANG Shiqi  MA Jin  ZHOU Xiabing  JIA Hao  CHEN Wenliang  ZHANG Min
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Attribute extraction is a key step in constructing a knowledge graph. In this paper, the task of attribute extraction is converted into a sequence labeling problem. Because labeled data for product attribute extraction is scarce, we use distant supervision to automatically label text from multiple e-commerce sources. To evaluate the performance of the system accurately, we construct a manually annotated test set, and finally obtain a new multi-domain data set for product attribute extraction. Based on the newly constructed data set, we carry out in-domain and cross-domain attribute extraction experiments with a variety of pre-trained language models. The experimental results show that pre-trained language models improve extraction performance. Among them, ELECTRA performs best in the in-domain experiments, and BERT performs best in the cross-domain experiments. We also find that adding a small amount of annotated target-domain data effectively improves the performance of cross-domain attribute extraction and enhances the domain adaptability of the model.
Keywords:attribute extraction  distant supervision  pre-trained language model  domain adaptation  
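
The abstract describes labeling text automatically by distant supervision and then treating attribute extraction as sequence labeling, but it does not give the matching rule or the tag scheme. As a rough illustration only, the sketch below assumes a small dictionary of known attribute values, longest-match lookup, and character-level BIO tags. The function name `distant_label`, the dictionary `attr_dict`, and the labels `COLOR` and `MATERIAL` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def distant_label(text: str, attr_dict: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character of `text` with a BIO label via dictionary matching.

    `attr_dict` maps a known attribute value (e.g. a color name) to its
    attribute type. This is an assumed, simplified labeling rule.
    """
    labels = ["O"] * len(text)
    # Try longer values first so a longer value wins over its own substring.
    values = sorted((v for v in attr_dict if v), key=len, reverse=True)
    i = 0
    while i < len(text):
        for value in values:
            if text.startswith(value, i):
                attr = attr_dict[value]
                labels[i] = f"B-{attr}"            # first character of the value
                for j in range(i + 1, i + len(value)):
                    labels[j] = f"I-{attr}"        # remaining characters
                i += len(value)
                break
        else:
            i += 1                                  # no value starts here
    return list(zip(text, labels))


# Hypothetical product title: "dark blue pure cotton shirt".
example_dict = {"蓝色": "COLOR", "深蓝色": "COLOR", "纯棉": "MATERIAL"}
for char, label in distant_label("深蓝色纯棉衬衫", example_dict):
    print(char, label)
```

Labels produced this way are noisy, since a dictionary hit is not always a true attribute mention in context. That is consistent with the abstract's choice to evaluate on a separate, manually annotated test set.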