Found 19 similar documents (search time: 109 ms).
1.
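Word Sense Disambiguation Based on a Bayesian Model Improved by Dependency Parsing (total citations: 4; self-citations: 0; citations by others: 4)
Word sense disambiguation (WSD) has long been one of the key problems and difficulties in natural language processing. Much current WSD research uses only a few polysemous words as test objects, which limits its practical applicability. This paper studies WSD on large-scale real text, using a supervised WSD method based on a Bayesian classification model improved by dependency parsing. The model makes full use of dependency parsing: from the internal structure of the sentence it finds the governor-dependent relations between words, and uses them to determine the context that intrinsically constrains a word's meaning, effectively overcoming the noise that irrelevant context introduces into a plain Bayesian classifier. Accuracy reaches 91.89% in the open test and 99.4% in the closed test, which verifies the effectiveness of the improved model.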
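The idea in the abstract above, restricting the Bayesian classifier's context to words linked to the target by a dependency relation, can be illustrated with a minimal sketch. This is not the paper's model: the class name, the add-alpha smoothing, and the toy data are assumptions, and the dependency-linked words are assumed to come from an external parser.

```python
import math
from collections import Counter, defaultdict


class DependencyContextNaiveBayes:
    """Naive Bayes sense classifier over dependency-linked context words.

    Hypothetical illustration: the caller supplies, for each occurrence of
    the target word, only the words that govern it or depend on it.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant (an assumption)
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        # samples: iterable of (dependency_context_words, sense_label)
        for words, sense in samples:
            self.sense_counts[sense] += 1
            for w in words:
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, words):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, float("-inf")
        for sense, n in self.sense_counts.items():
            # log P(sense) + sum of log P(word | sense), smoothed
            score = math.log(n / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for w in words:
                score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy training data for the ambiguous word "bank" (invented for illustration).
train = [
    (["deposit", "money"], "finance"),
    (["loan", "approve"], "finance"),
    (["river", "erode"], "river"),
    (["muddy", "river"], "river"),
]
clf = DependencyContextNaiveBayes().fit(train)
finance_guess = clf.predict(["money", "loan"])
river_guess = clf.predict(["river"])
```

Because only dependency-linked words are passed in, unrelated words elsewhere in the sentence never enter the product of conditional probabilities, which is the noise-reduction effect the abstract describes.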
2.
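A Comparative Study of Neural Networks and Bayesian Networks for Chinese Word Sense Disambiguation (total citations: 5; self-citations: 0; citations by others: 5)
Neural networks and Bayesian networks are two classic machine learning methods. This paper experimentally examines how well these two network models perform on Chinese word sense disambiguation (WSD). The test objects are six pseudowords constructed by specific rules. Using pseudowords avoids the data sparseness problem of supervised WSD methods and allows the sense classifiers to be evaluated thoroughly. A Bayesian network is simple and efficient for sense classification and its model is easy to construct, whereas a neural network has a relatively complex structure and requires the input problem to be solved before it can be used for WSD. In the experiments, mutual information between words was used to construct the input model of the neural network successfully, with fairly good results. The experimental data show that Bayesian networks are better suited than neural networks to Chinese WSD, but their noise tolerance is clearly inferior to that of neural networks.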
3.
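Strategies for Word Sense Problems Based on the Vector Space Model and the Maximum Entropy Model (total citations: 2; self-citations: 0; citations by others: 2)
For the sense problem of monosemous words, a vector space model incorporating trigger pairs is built to compute word sense similarity, and words are clustered on that basis. For the sense problem of polysemous words, a maximum entropy model incorporating long-distance context information is applied to supervised word sense disambiguation. To overcome the small coverage and strong subjective influence of earlier evaluations, which relied on manually constructed, sense-tagged test sentences, the models are evaluated directly in two practical applications: word clustering and word segmentation ambiguity. The accuracy of segmentation ambiguity resolution reaches 92%, and the word clustering results meet the needs of further application.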
4.
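To ease the difficulty of acquiring disambiguation knowledge and the data sparseness problem in translation disambiguation, an unsupervised translation disambiguation method is proposed that mines bilingual word relatedness from the Web. The method extends the indirect association of bilingual words in a corpus to the Web and proposes a Web-based bilingual word indirect association model. On this basis it further proposes a disambiguation method based on Web-derived bilingual word relatedness: different queries are constructed, the page counts of the returned pages are extracted through a search engine, and pointwise mutual information is then used to compute the relatedness between words for the disambiguation decision. The method's best performance (P_(mar)=0.464) exceeds that of TorMd, the best comparable unsupervised system on Task #5 of the international semantic evaluation Semeval-2007.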
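The pointwise mutual information step mentioned in the abstract above can be sketched as follows. The abstract does not say how page counts are normalized, so the `total_pages` estimate, the function names, and the sample counts are all assumptions for illustration.

```python
import math


def pmi_from_page_counts(count_xy, count_x, count_y, total_pages):
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))).

    Probabilities are estimated as page_count / total_pages, where
    total_pages is an assumed size of the indexed Web.
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2(count_xy * total_pages / (count_x * count_y))


def pick_translation(candidates, total_pages):
    """Choose the candidate translation with the highest PMI.

    candidates maps a translation to (count_xy, count_x, count_y), i.e. the
    page counts for the joint query and for each single-word query.
    """
    return max(
        candidates,
        key=lambda t: pmi_from_page_counts(*candidates[t], total_pages),
    )


# Invented page counts, for illustration only.
score = pmi_from_page_counts(1000, 10000, 10000, 1_000_000)
best = pick_translation(
    {"bank": (1000, 10000, 10000), "shore": (10, 10000, 5000)},
    total_pages=1_000_000,
)
```

A positive score means the two words co-occur on more pages than independence would predict, and a negative score means fewer.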
5.
6.
7.
8.
9.
10.
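To address the difficulty that feature extraction in CNN-based stereo matching methods has in learning global and long-range context information, an improved stereo matching network model based on Swin Transformer is proposed (stereo matching net with swin transformer fusion, STransMNet). The necessity of aggregating local and global context information during stereo matching, and the differences among matching features, are analyzed. The feature extraction module is improved by replacing the CNN-based method with the Transformer-based Swin Transformer, and a multi-scale feature fusion module is added to the Swin Transformer so that the output features contain both shallow and deep semantic information. The loss function is improved by introducing a feature differentiation loss to strengthen the model's attention to detail. Finally, in comparative experiments against the STTR-light model on several public datasets, both the End-Point-Error (EPE) and the 3 px error (matching error rate) are clearly reduced.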
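The two metrics named at the end of the abstract above are commonly computed as sketched below. This assumes the usual definitions (mean absolute disparity error, and the fraction of pixels whose error exceeds 3 px); benchmarks differ in details such as validity masks, so the function names and sample values are illustrative only.

```python
def end_point_error(pred, gt):
    """Mean absolute disparity error over all pixels (EPE)."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors)


def pixel_error_rate(pred, gt, threshold=3.0):
    """Fraction of pixels whose absolute disparity error exceeds threshold."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    return sum(e > threshold for e in errors) / len(errors)


# Flattened disparity maps with invented values.
predicted = [10.0, 12.5, 30.0, 8.0]
ground_truth = [10.5, 12.0, 25.0, 8.0]

epe = end_point_error(predicted, ground_truth)
three_px = pixel_error_rate(predicted, ground_truth)
```

Here the per-pixel errors are 0.5, 0.5, 5.0 and 0.0, so a single large outlier dominates both numbers.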
11.
Botnets often use domain generation algorithms (DGA) to connect to a command and control (C2) server, which enables the compromised hosts to connect to the C2 server by accessing many domains. The detection of DGA domains is critical for blocking the C2 server and for identifying the compromised hosts. However, detection is difficult because some DGA domain names look normal. Much of the previous work, based on statistical analysis or machine learning, relies on manual features and contextual information, which causes long response times and rules out real-time detection. In addition, when a new family of DGA appears, the classifier has to be retrained from scratch. This paper presents a deep learning approach to DGA domain detection based on a bidirectional long short-term memory (Bi-LSTM) model. The classifier extracts features without manual feature engineering, and the trainable model can deal effectively with new, unknown DGA family members. In addition, the proposed model needs only the domain name, without any additional context information. All domain names are preprocessed into bigrams, and the length of each processed domain name is set to a value longer than most samples. The bidirectional LSTM model receives the encoded data and returns labels indicating whether domain names are normal or not. Experiments show that our model outperforms state-of-the-art approaches and is able to detect new DGA families reliably.
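The preprocessing described in the abstract above (bigram encoding followed by a fixed sequence length) might look like the sketch below. The vocabulary scheme, the `<pad>` and `<unk>` tokens, and the function names are assumptions; the abstract does not specify them, and the Bi-LSTM itself is not shown.

```python
def to_bigrams(domain):
    """Split a domain name into overlapping character bigrams."""
    domain = domain.lower()
    return [domain[i:i + 2] for i in range(len(domain) - 1)]


def build_vocab(domains):
    """Map each bigram seen in training to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (an assumption)
    for d in domains:
        for bg in to_bigrams(d):
            vocab.setdefault(bg, len(vocab))
    return vocab


def encode(domain, vocab, max_len):
    """Encode a domain as bigram ids, truncated or padded to max_len."""
    ids = [vocab.get(bg, vocab["<unk>"]) for bg in to_bigrams(domain)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab(["abc.com"])
padded = encode("abx", vocab, 4)         # contains one unseen bigram
truncated = encode("abc.com", vocab, 3)  # longer than max_len
```

Choosing `max_len` a little longer than most training samples, as the abstract suggests, keeps truncation rare while bounding the input size of the sequence model.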
12.
Using a Bayes classifier to optimize alarm generation to electric power generator stator overheating
Fischer D., Szabados B., Poehlman W.F.S. IEEE Transactions on Instrumentation and Measurement, 2003, 52(3): 703-709
This paper shows how a Bayes classifier can be implemented for a failure detection system where statistical failure data is not available for one of the classes. Results of field data obtained from a large electric power generator are shown. The classifier is further improved by the iterative re-evaluation of the prior probabilities, which results in the use of higher alarm threshold values when a good agreement between the monitored quantity and its estimated value is observed, while large disagreement values result in smaller thresholds. As expected, the proposed system is an improvement over a classical Bayesian implementation and a large improvement over a fixed, arbitrary value threshold classifier.
13.
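Because convolutional neural networks used for rolling bearing fault diagnosis suffer from hard-to-determine network structures, too many training iterations, and long training times, a LeNet-5 algorithm improved by Bayesian optimization is designed, together with a bearing fault diagnosis model built with it. Bayesian optimization is used to tune hyperparameters such as the learning rate during training; vibration signals from bearings with various faults are fed directly into the improved LeNet-5 network; batch normalization is applied to the pooling outputs and the activation function of the pooling layers is improved to prevent overfitting; a global average pooling layer replaces the fully connected layer to improve the generalization ability of the improved LeNet-5 network; and a Softmax classifier performs the classification of rolling bearing faults. Experiments on a bearing database show that the resulting model achieves an accuracy of 99.94% on the training set, 99.89% on the validation set, and 99.65% on the test set. Compared with one-dimensional and two-dimensional convolutional neural networks, the bearing fault diagnosis model built on the Bayesian-optimized improved LeNet-5 algorithm achieves higher accuracy with fewer training iterations and less training time.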
14.
15.
An improved classifier based on the nearest feature plane (NFP), called the centre-based restricted nearest feature plane with the angle (RNFPA) classifier, is proposed here for face recognition problems. The well-known NFP uses the geometrical information of samples to increase the number of training samples, but it increases the computational complexity and also suffers from an inaccuracy problem caused by the extended feature plane. To solve these problems, RNFPA exploits a centre-based feature plane and uses an angle threshold to restrict the extended feature space. By choosing an appropriate angle threshold, RNFPA can improve performance and decrease computational complexity. Experiments on the AT&T, AR and FERET face databases are used to evaluate the proposed classifier. Compared with the original NFP classifier, the nearest feature line (NFL) classifier, the nearest neighbour (NN) classifier and some other improved NFP classifiers, the proposed one achieves competitive performance.
16.
17.
With the continuous expansion of software scale, software update and maintenance have become more and more important. However, frequent code updates make the software more likely to introduce new defects. How to predict defects quickly and accurately from software changes has therefore become an important problem for software developers. Current defect prediction methods often cannot reflect the feature information of a defect comprehensively, and their detection performance is not ideal. Therefore, in this paper we propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayesian algorithm, which mainly considers the following two aspects: (1) Considering that the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) Considering that each feature dimension has a different effect on defect prediction, we construct the calculation formula for training data weight based on feature dimension weight and data gravity, and then calculate the prior probability and the conditional probability of the training data from the weight information, so as to construct the weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, we use six datasets from large open source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL. We compare the ITNB model with the transfer Naive Bayesian (TNB) model. The experimental results show that our ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
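Step (2) of the abstract above computes the prior and conditional probabilities from per-instance training weights. The abstract does not give the weight formula (feature dimension weight combined with data gravity), so the sketch below takes the weights as given and uses a generic weighted, Laplace-smoothed estimate. The smoothing form, function names, and data are assumptions, not the ITNB formulas.

```python
def weighted_prior(labels, weights, classes):
    """P(c) = (sum of weights of class c + 1) / (sum of all weights + #classes)."""
    total = sum(weights)
    prior = {}
    for c in classes:
        mass = sum(w for y, w in zip(labels, weights) if y == c)
        prior[c] = (mass + 1.0) / (total + len(classes))
    return prior


def weighted_conditional(values, labels, weights, value, cls, n_values):
    """P(x = value | cls) from weighted counts, smoothed over n_values bins."""
    joint = sum(
        w for x, y, w in zip(values, labels, weights) if x == value and y == cls
    )
    class_mass = sum(w for y, w in zip(labels, weights) if y == cls)
    return (joint + 1.0) / (class_mass + n_values)


# Invented training data: 1 = defective change, 0 = clean change.
labels = [1, 0, 0, 1]
weights = [3.0, 1.0, 1.0, 1.0]         # assumed to be precomputed
churn = ["high", "low", "low", "low"]  # one discretized feature

prior = weighted_prior(labels, weights, classes=(0, 1))
p_high_given_defect = weighted_conditional(churn, labels, weights, "high", 1, 2)
```

Training instances with larger weights, presumably those more similar to the test data, contribute more probability mass than they would in an unweighted Naive Bayes count.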
18.
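To address circuit fault diagnosis in a high-precision resonant dew point measurement system, a circuit fault diagnosis model is proposed in which an Improved Sparrow Search Algorithm (ISSA) optimizes the parameters of an intelligent classifier. Among simulation-before-test fault diagnosis methods, the intelligent diagnosis approach is adopted, and a Support Vector Machine (SVM), which suits small-sample, nonlinear problems, is chosen as the intelligent classifier. The sparrow search algorithm is improved to address its slow convergence and its tendency to fall into local optima, and the improved algorithm is used to optimize the SVM parameters, yielding an ISSA-SVM fault diagnosis model for resonant circuit fault diagnosis. Experimental results show that the ISSA-SVM model reaches a fault diagnosis rate of 88.9% on the constructed circuit and is fairly reliable, so it can serve as a fault diagnosis method for high-precision resonant dew point sensor circuits.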
19.
Satellite-based remote sensing imaging can provide continuous snapshots of the Earth’s surface over long periods. River extraction from remote sensing images is useful for the comprehensive study of dynamic changes of rivers over large areas. This paper presents a new method of extracting rivers by using training samples based on mathematical morphology, a Bayesian classifier and a dynamic alteration filter. The use of a training map from erosion morphology helps to extract the unpredictable curves of the river in the image. The algorithm has two phases: creating a profile that separates the river area via evaluated morphological erosion and dilation, namely a training map; and improving the river’s image segmentation using the Bayesian rule algorithm, in which two consecutive filters sweep along the image to remove false positives (non-water areas). The proposed algorithm was tested on the Kuala Terengganu district, Malaysia, an area that includes a river, a bridge, a dam and a fair amount of vegetation. The results were compared with two standard methods based on visual perception and on peak signal-to-noise ratio, respectively. The novelty of this approach is the definition of the contextual information filtering technique, which provides an accurate extraction of river segmentation from satellite images.