1.
Abstract

Retrieving relevant information from Twitter is always a challenging task given its vocabulary mismatch, sheer volume and noise. Representing the content of text tweets is a critical part of any microblog retrieval model. Deep neural networks can therefore be used to learn good representations of text data and thus lead to better matching. In this paper, we aim to improve both representation and retrieval effectiveness in microblogs. To that end, a Hybrid Deep Neural Network (HDNN) text representation model is proposed to extract effective feature representations for clustering-oriented microblog retrieval. HDNN combines recurrent and feedforward neural network architectures. Specifically, using a bidirectional LSTM, we first generate a deep contextualized word representation that incorporates character n-grams from FastText. However, not all of these contextual embeddings, which live in a high-dimensional space, are important: some are redundant, correlated or noisy, making the learning models overfit, grow complex and become less interpretable. To deal with these problems, we propose a Hybrid Regularized Autoencoder method that combines an autoencoder with Elastic Net regularization for effective unsupervised feature selection and extraction. Our experimental results show that the performance of clustering, and especially of information retrieval in microblogs, depends heavily on the feature representation.
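The character n-grams mentioned above are the subword units FastText hashes into its embedding table. A minimal sketch of that extraction (boundary markers `<` and `>`, n-gram lengths 3 to 6 as in the original FastText paper; the function name is illustrative):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """FastText-style subwords: wrap the word in boundary markers,
    then take every character n-gram of length n_min..n_max."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

grams = char_ngrams("tweet", 3, 4)
```

A word's vector is then the sum of the vectors of its subwords, which is what lets out-of-vocabulary tweet tokens still receive a representation.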
2.
We propose Bagging_fastText (B_f), a multi-base-model framework for short-text classification built on bootstrap aggregating (bagging). With fastText as the base model and following the ensemble-learning idea, the framework sets the optimal hyperparameters, trains multiple base models, and obtains the final class label through a voting mechanism. Experiments on classifying short product-name texts show that B_f outperforms fastText, the traditional naive Bayes text classifier, and the text convolutional neural network (TextCNN).
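The voting mechanism that combines the base models can be sketched as a simple majority vote over their predicted labels (the class names below are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels predicted by several base models for one
    sample; ties resolve to the label encountered first among the
    most common (Counter preserves insertion order)."""
    return Counter(predictions).most_common(1)[0][0]

# three hypothetical fastText base models voting on one short text
label = majority_vote(["sports", "sports", "finance"])  # "sports" wins 2-1
```

Each base model would be trained on a bootstrap resample of the training set, which is what makes the ensemble more robust than a single fastText model.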
3.
With the rapid development of information technology, the volume of digital archive data has grown explosively. How to mine and analyse archive data effectively and improve the intelligent management of newly accessioned archives has become a pressing problem. Existing archive classification is manual classification driven by management needs; it is inefficient and ignores the archives' intrinsic content. Moreover, discovering and exploiting archive information requires further mining of the relationships between archive contents. Targeting intelligent archive management, this work analyses manually classified archives from the perspective of their textual content. An LDA model extracts a topic feature vector for each document, and the K-means algorithm then clusters these topic features to obtain the relationships between archives. For classifying newly accessioned archives, a FastText deep-learning model is trained in a supervised fashion on the existing archive data, and the trained model classifies new archives fully automatically. Tests on the dataset show that the proposed clustering method improves accuracy by 6% over traditional clustering on TF-IDF features, and the FastText-based classifier exceeds 96% accuracy, good enough to replace manual classification, confirming the method's effectiveness and practicality.
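The TF-IDF baseline the clustering is compared against weights each term by its in-document frequency times its corpus rarity. A minimal sketch over tokenised documents (raw-count tf, idf = log(N / df); helper name is illustrative):

```python
import math

def tfidf(docs):
    """Plain TF-IDF vectors (as sparse dicts) for a list of
    tokenised documents."""
    n = len(docs)
    df = {}                      # document frequency per term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        vec = {}
        for term in doc:         # raw term frequency
            vec[term] = vec.get(term, 0) + 1
        for term in vec:         # scale by inverse document frequency
            vec[term] *= math.log(n / df[term])
        vectors.append(vec)
    return vectors

vecs = tfidf([["archive", "management"], ["archive", "topic"], ["topic", "model"]])
```

Terms that appear in every document get idf = log(1) = 0, which is why TF-IDF features discard corpus-wide filler words that LDA topic vectors would model differently.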
4.
FastText is an accurate and efficient text classification model, but applying it directly to long Chinese texts yields unsatisfactory accuracy. To address this, we propose a FastText method for classifying long Chinese texts that combines TextRank key-sentence extraction with term frequency-inverse document frequency (TF-IDF). At the FastText input stage, the TextRank algorithm extracts a text's key sub-sentences as training input, while TF-IDF extracts its keywords as supplementary features, reducing the training corpus while preserving as many of the features critical to classification as possible. Experiments show that this method reaches 86.1% accuracy on the dataset, about 4 percentage points above the classic FastText model.
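TextRank scores sentences by iterating a PageRank-style update on a sentence-similarity graph. A toy sketch on an undirected graph given as adjacency lists (damping factor d = 0.85 as in the original PageRank formulation):

```python
def textrank(graph, d=0.85, iters=50):
    """Iterate the TextRank update score(v) = (1-d) + d * sum of
    neighbour scores divided by neighbour degree."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iters):
        new = {}
        for node in graph:
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new[node] = (1 - d) + d * rank
        scores = new
    return scores

# three sentences; s0 is similar to both others, so it should rank highest
scores = textrank({"s0": ["s1", "s2"], "s1": ["s0"], "s2": ["s0"]})
```

The top-ranked sentences are then kept as the "key sub-sentences" fed to FastText; in practice the edge weights would come from sentence similarity rather than the unweighted links used here.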
5.
To address the weak memorisation and missing global word features of word-vector text classification models, we propose WideText, a text classification model based on wide and word-vector features. The text is first cleaned, tokenised and token-encoded, and a vocabulary is defined; the term frequency-inverse document frequency (TF-IDF) score of each global token is computed and each text is vectorised. The input words are mapped into an embedding matrix; the word-vector features are embedded and averaged, then concatenated with the TF-IDF-based text-vector features and passed to the output layer, which computes the probability of each class. Built on low-dimensional word vectors combined with the expressive power of text-vector features, the model generalises and memorises well. Experiments show that once the wide features are introduced, WideText not only clearly outperforms word-vector classification models but is also slightly better than a feed-forward neural network classifier.
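The feature construction described above, averaging the word embeddings (the "deep" side) and concatenating the TF-IDF text vector (the "wide" side), can be sketched as follows; the embedding table and TF-IDF vector below are toy values:

```python
def wide_text_features(token_ids, embeddings, tfidf_vec):
    """Average the word embeddings of the input tokens, then
    concatenate the document's TF-IDF vector onto the result."""
    dim = len(embeddings[0])
    avg = [sum(embeddings[t][j] for t in token_ids) / len(token_ids)
           for j in range(dim)]
    return avg + tfidf_vec

emb = [[1.0, 0.0], [0.0, 1.0]]                 # toy 2-d embedding table
feat = wide_text_features([0, 1], emb, [0.5, 0.0, 0.3])
# feat == [0.5, 0.5, 0.5, 0.0, 0.3]
```

The concatenated vector would then feed a linear output layer with softmax, so the wide part memorises exact token evidence while the averaged embeddings generalise.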
6.
Forestry work has long been weak in data integration, and this initial state inevitably affects forestry project development and decision quality. A Knowledge Graph (KG) provides better abilities to organize, manage and understand forestry knowledge. Relation Extraction (RE) is a crucial task in KG construction and information retrieval. Previous research on relation extraction has demonstrated the value of the attention mechanism. However, those methods focused on the representation of the entire sentence and lost information: the lack of analysis of the words and syntactic features that contribute to a sentence leads to poor performance, especially in Chinese relation extraction. Based on these observations, we propose an end-to-end relation extraction method for forestry KG construction that uses a Bi-directional Gated Recurrent Unit (BiGRU) neural network and a dual attention mechanism. The dual attention operates at both sentence level and word level, capturing relational semantic words and direction words. To enhance performance, we use the pre-trained FastText model to provide word vectors and dynamically adjust them according to context. We build the forestry KG from forestry entities and relationships and store it in Neo4j. Our method achieves better results than previous public models on the SemEval-2010 Task 8 dataset. Trained on the forestry dataset, FastText-BiGRU-Dual Attention exceeds 0.8 in both accuracy and precision, outperforming the comparison methods and confirming the model's validity and accuracy. In the future, we plan to apply the forestry KG to a question-answering system and to build a recommendation system for forestry knowledge.
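The word-level half of the dual attention scores each BiGRU hidden state against a query vector, softmaxes the scores, and pools the states by those weights. A minimal numerically stable sketch (toy 2-d states; the dot-product scoring is an assumption, since the abstract does not fix the scoring function):

```python
import math

def attention_pool(hidden_states, query):
    """Word-level attention: dot-product scores, softmax weights,
    weighted sum of the hidden states."""
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query))
              for h in hidden_states]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states))
              for j in range(dim)]
    return pooled, weights

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, [1.0, 0.0])
```

Sentence-level attention would apply the same pattern one level up, weighting sentence vectors instead of word states.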
7.
Text classification is the process by which a computer learns, under a predefined category system, to assign content to the proper class via some classification algorithm. Text classification algorithms have been applied to web-page classification, digital libraries, news recommendation and other fields. Targeting the characteristics of short-text classification, this paper proposes a hybrid short-text classification model based on multiple neural networks (Hybrid Short Text Classification Model Based on Multi-neural Networks). Keywords extracted from the short text are used to reconstruct its features, which are fed into the multi-network model, and the resulting class vectors are fused, combining the strengths of the FastText and TextCNN models. Experiments show that, compared with currently popular text classification algorithms, the hybrid model performs better on precision, recall, F1 and other metrics.
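The class-vector fusion can be sketched as a weighted average of the two models' class-probability vectors; the equal mixing weight `alpha` below is a hypothetical choice, not specified by the abstract:

```python
def fuse_class_vectors(p_fasttext, p_textcnn, alpha=0.5):
    """Weighted average of two models' class-probability vectors;
    alpha is the (hypothetical) weight given to FastText."""
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(p_fasttext, p_textcnn)]

fused = fuse_class_vectors([0.7, 0.2, 0.1], [0.4, 0.5, 0.1])
label = max(range(len(fused)), key=fused.__getitem__)  # argmax class index
```

Because each input is a probability distribution and the weights sum to 1, the fused vector is again a distribution, so the final class is simply its argmax.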
8.
Nowadays, as the amount of textual data increases exponentially, sentiment analysis has become one of the most significant tasks in natural language processing (NLP), attracting growing attention. Traditional Chinese sentiment analysis algorithms cannot make full use of the order information in context and are inefficient in sentiment inference. In this paper, we systematically review the classic and representative works in sentiment analysis and propose a simple but efficient optimization. First, FastText is trained to obtain the basic classification model, which generates pre-trained word vectors as a by-product. Second, a Bidirectional Long Short-Term Memory network (Bi-LSTM) is trained on the generated word vectors and then merged with FastText for comprehensive sentiment analysis. By combining FastText and Bi-LSTM, we develop a new fast sentiment analysis algorithm, FAST-BiLSTM, which consistently balances performance and speed. In particular, experimental results on real datasets demonstrate that our algorithm effectively judges the sentiment of user comments and is superior to the traditional algorithms in time efficiency, accuracy, recall and F1.
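The accuracy, recall and F1 criteria used in the comparison reduce, per class, to counts of true positives, false positives and false negatives. A minimal sketch for a single class (label strings are illustrative):

```python
def prf1(y_true, y_pred, positive):
    """Precision, recall and F1 for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf1(["pos", "pos", "neg", "neg"],
               ["pos", "neg", "pos", "neg"], "pos")
```

Guarding the divisions keeps the metrics defined even when a class is never predicted or never occurs, which matters on small sentiment test sets.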
9.
To address the sparse, hard-to-extract text features and the out-of-vocabulary (OOV) words caused by non-standard usage in short-text classification, we propose GM-FastText, a short-text classification model based on multi-channel word embeddings from the FastText model and a hybrid network of GRU (gated recurrent unit) and multi-layer perceptron (MLP) (GRU-MLP hybrid network architecture, GM). The FastText model produces different embeddings via its N-gram scheme, which are fed into the GRU and MLP layers to obtain short-text features: the GRU extracts text features while the MLP layer mixes the features of the different channels, and the result is finally mapped to the classes. Multiple comparative experiments show that against TextCNN and TextRNN, GM-FastText improves F1 by 0.021 and 0.023 and accuracy by 1.96 and 2.08 percentage points; against FastText, FastText-CNN and FastText-RNN, it improves F1 by 0.006, 0.014 and 0.016 and accuracy by 0.42, 1.06 and 1.41 percentage points. These comparisons show that, with FastText multi-channel embeddings and the GM hybrid network, the multi-channel word vectors give a better word representation for short texts and the GM structure extracts multi-parameter features more effectively.
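The GRU layer at the heart of the GM structure applies an update gate z and a reset gate r at every step. A toy scalar sketch of one step (real layers operate on vectors with weight matrices and biases; the weight names here are hypothetical and biases are omitted):

```python
import math

def gru_cell(x, h, W):
    """One GRU step on scalar input x and state h:
    z gates how much of the candidate replaces h,
    r gates how much of the old state the candidate sees."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    z = sig(W["zx"] * x + W["zh"] * h)                     # update gate
    r = sig(W["rx"] * x + W["rh"] * h)                     # reset gate
    h_tilde = math.tanh(W["hx"] * x + W["hh"] * (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

W = {"zx": 1.0, "zh": 0.0, "rx": 1.0, "rh": 0.0, "hx": 1.0, "hh": 1.0}
h = gru_cell(0.5, 0.0, W)
```

With only two gates instead of the LSTM's three, the GRU keeps the sequence-modelling ability the model needs while staying cheaper to train on short texts.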
10.
Mongolian-Chinese translation is a low-resource translation task facing a scarcity of parallel corpora. To mitigate the low translation accuracy caused by scarce parallel data and a restricted vocabulary, this work uses the dynamic pre-training method ELMo (Embeddings from Language Models) together with a multi-task, domain-information-sharing Transformer architecture for Mongolian-Chinese translation. ELMo (deep contextualised word representations) pre-trains the monolingual corpora, and the FastText word-embedding algorithm pre-trains the large-scale context-dependent texts of the Mongolian-Chinese parallel corpus. Following the principle of sharing parameters across tasks to share domain information, a one-to-many encoder-decoder model is built for Mongolian-Chinese neural machine translation. Experiments show that this method effectively improves translation quality over the Transformer baseline on long input sequences.