Enterprise portrait construction method based on label layering and deepening modeling
Cite this article: DING Xingshuo, LI Xiang, XIE Qian. Enterprise portrait construction method based on label layering and deepening modeling[J]. Journal of Computer Applications, 2022, 42(4): 1170-1177. DOI: 10.11772/j.issn.1001-9081.2021071248
Authors: DING Xingshuo, LI Xiang, XIE Qian
Affiliations: Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, Jiangsu 223003, China
Jiangsu Eazytec Company Limited, Yixing, Jiangsu 214200, China
Nanjing Byosoft Company Limited, Nanjing, Jiangsu 210032, China
Funding: National Natural Science Foundation of China (71874067)
Abstract: Label modeling is the basic task of label system construction and portrait construction. Traditional label modeling methods have difficulty processing fuzzy labels, extract labels unreasonably, and cannot effectively integrate multi-modal entities and multi-dimensional relationships. To address these problems, an enterprise portrait construction method based on label layering and deepening modeling, called EPLLD (Enterprise Portrait of Label Layering and Deepening), was proposed. Firstly, multi-feature information was obtained through multi-source information fusion, and the fuzzy labels of enterprises (for example, labels in industries such as wholesale and retail that cannot fully summarize the characteristics of an enterprise) were counted and screened. Secondly, a professional domain lexicon was established for feature expansion and combined with the BERT (Bidirectional Encoder Representations from Transformers) language model for multi-feature extraction. Thirdly, a Bi-directional Long Short-Term Memory (BiLSTM) network was used to obtain the fuzzy label deepening results. Finally, keywords were extracted with TF-IDF (Term Frequency-Inverse Document Frequency), TextRank, and the Latent Dirichlet Allocation (LDA) model to achieve label layering and deepening modeling. Experimental analysis on the same enterprise dataset shows that EPLLD reaches a precision of 91.11% in the fuzzy label deepening task, which is higher than that of eight label processing methods such as BiLSTM+Attention and BERT+Deep CNN.
Keywords: enterprise portrait; label modeling; multi-source information fusion; fuzzy label; feature extraction
Received: 2021-07-16
Revised: 2021-09-01
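The keyword extraction step named in the abstract can be illustrated with a minimal TF-IDF sketch. This is not the authors' implementation: the function name `tfidf_keywords`, the toy pre-tokenized documents, and the plain `tf * log(N / df)` weighting are assumptions made for illustration, and the abstract does not say how the TF-IDF, TextRank, and LDA results are combined. The sketch also shows why a label such as "wholesale" or "retail" is fuzzy: a term that appears in every document gets an IDF of zero and therefore cannot distinguish one enterprise from another.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical, pre-tokenized business-scope descriptions (illustrative only).
docs = [
    ["wholesale", "retail", "steel", "pipes", "steel", "plates"],
    ["wholesale", "retail", "fresh", "fruit", "fruit", "juice"],
    ["wholesale", "retail", "office", "furniture", "furniture", "design"],
]
keywords = tfidf_keywords(docs, top_k=1)
```

In this toy corpus the generic terms shared by all three descriptions rank last, while the term specific to each description ranks first. Real Chinese business text would first need word segmentation, which is outside the scope of this sketch.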
This article is indexed in Wanfang Data and other databases.