1.
2.
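Sentence boundary detection in biomedical literature based on contextual word-form features    Total citations: 1, self-citations: 0, citations by others: 1
To address the characteristics of biomedical literature and the particular requirements of information extraction, this paper proposes a sentence boundary detection algorithm based on contextual word-form features and supervised learning. Unlike sentence boundary detection algorithms designed for general written English, the proposed algorithm uses no special auxiliary word lists and no grammar-level features; it relies only on the word forms of the surrounding words to detect and disambiguate sentence boundaries. Using these features, a maximum entropy classifier and a support vector machine classifier were built and evaluated on Medline abstracts, achieving accuracy above 99%. The results show that the maximum entropy and support vector machine methods perform similarly on sentence boundary disambiguation. They also show that, for biomedical literature, lexical-level features alone, without auxiliary word lists or grammar-level information such as part of speech, can match the performance that other algorithms achieve on general written English using auxiliary word lists and part-of-speech information.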
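As a rough illustration of what "word-form features of the surrounding words" could look like, the sketch below maps each token near a candidate boundary to a coarse shape string. The shape alphabet (`A`, `a`, `9`), the window size, and the function names `word_shape` and `boundary_features` are assumptions made for this example; the abstract does not specify the paper's actual feature set.

```python
def word_shape(token: str) -> str:
    """Map a token to a coarse shape: A = uppercase, a = lowercase, 9 = digit.

    Other characters are kept as-is, and runs of the same class are collapsed.
    This encoding is an illustrative assumption, not the paper's definition.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            code = "A"
        elif ch.islower():
            code = "a"
        elif ch.isdigit():
            code = "9"
        else:
            code = ch
        # Collapse consecutive characters of the same class.
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def boundary_features(tokens, i, window=2):
    """Build shape features for the candidate boundary at tokens[i].

    Positions outside the token list are filled with a padding marker.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"shape[{offset}]"] = word_shape(tokens[j])
        else:
            feats[f"shape[{offset}]"] = "<PAD>"
    return feats
```

A feature dictionary of this kind could then be fed to any maximum entropy or SVM implementation; no dictionary of abbreviations or part-of-speech tagger is involved.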
3.
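才藏太. 《计算机工程与科学》 (Computer Engineering & Science), 2012, 34(6): 187-190
Sentence boundary detection is foundational to Tibetan text analysis, a prerequisite for building sentence-level parallel corpora between Tibetan and other languages, and a basis for Tibetan-Chinese machine translation. By analyzing how Tibetan sentences end and studying the rules that govern their boundaries, this paper proposes a method for Tibetan sentence boundary detection. The method first identifies sentences using special rules and a word list, then applies a maximum entropy model to the ambiguous cases, thereby improving the boundary detection rate.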
4.
5.
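A method for computing Chinese sentence structure similarity based on word-class strings    Total citations: 9, self-citations: 1, citations by others: 9
Measuring sentence similarity is one of the most important topics in example-based machine translation research. In example-based Chinese-English machine translation, the accuracy of the Chinese sentence similarity measure directly affects the final translation output. This paper proposes a method for computing the structural similarity of Chinese sentences. The method compares the word-class (part-of-speech) strings of two sentences, finds an optimal matching between them, and produces a structural similarity value. Preliminary experiments on a small sentence set show that the method is feasible, effective, and consistent with human intuition.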
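The idea of matching two word-class strings can be sketched with a longest common subsequence (LCS) alignment. This is a stand-in: the abstract does not say which optimal-matching procedure or normalization the paper uses, so both the LCS choice and the `2 * LCS / (len_a + len_b)` score below are assumptions, as are the function names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two tag sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structure_similarity(tags_a, tags_b):
    """Similarity in [0, 1] between two word-class (POS tag) sequences.

    Assumed normalization: twice the LCS length over the summed lengths.
    """
    if not tags_a and not tags_b:
        return 1.0
    return 2 * lcs_length(tags_a, tags_b) / (len(tags_a) + len(tags_b))
```

For instance, the tag strings `["r", "v", "n"]` and `["r", "v", "a", "n"]` share a common subsequence of length 3, giving a score of 6/7 under this normalization.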
6.
7.
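To address the poor completeness of boundary features in existing automatic computer image recognition results, a boundary-feature-based automatic image recognition system is designed. Based on image boundary features, the work defines the system's development environment, designs its functional modules, collects and processes image data, applies Fourier coefficient theory to extract and recognize boundary features, and uses the BP neural network algorithm to perform automatic recognition. Performance tests show that, for 10 groups of random image feature data, the mean recognition time is 0.22 s, indicating real-time and efficient operation. The mean peak signal-to-noise ratio of the recognition results is 29.31 dB, indicating good denoising performance. The mean structural similarity index is 0.9315, very close to 1, indicating that the system preserves image boundary features well.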
8.
9.
10.
11.
Novelty detection aims to retrieve new information and filter out redundancy from given sentences that are relevant to a specific topic. In TREC2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The computation of semantic distance between sentences combines WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector, used in the vector space model for classification, consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context that precedes it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to examine the relationship between the different factors and performance. The results indicate that semantic computation is promising for novelty detection. The ratio of new sentence size to relevant size is further studied for different relevant document sizes. It is found that the ratio decreases at a certain rate (about 0.86). Another group of experiments is then performed under the supervision of this ratio. The results demonstrate that the ratio helps improve novelty detection performance.
12.
13.
14.
Nowadays, many payment service providers use discounts and other marketing strategies to promote their products. This also raises the issue of people
who deliberately take advantage of such promotions to reap financial benefits. These people are known as ‘scalper parties’ or ‘econnoisseurs’, and they can
constitute an underground industry. In this paper, we show how to use machine learning to assist in identifying abnormal scalper transactions. Moreover,
we introduce the basic methods of Decision Tree and Boosting Tree, and show how these classification methods can be applied in the detection of abnormal
transactions. In addition, we introduce a graph computing method, which implicitly describes the characteristics of people and merchants through node
correlation, in order to mine deep features. Because of the large volume of data, we carried out block-wise computation and succeeded in reducing
a large amount of data to a series of segments, thereby decreasing the computational resources and memory requirements. Compared with other work on
abnormal transaction detection, we pay more attention to creating and using the portraits of merchants or individuals to assist in decision-making. After
data analysis and model building, we find that focusing on only one transaction or one day does not yield a comprehensive set of characteristics,
whereas many more characteristics can be obtained by examining the transactions of a person or a merchant over a period of time.
After GBDT (Gradient Boosting Decision Tree) based classification prediction and
analysis, we can conclude that there is a clear distinction between abnormal trading shops and conventional shops, facilitating the clustering of abnormal
merchants. By filtering transaction data from multiple dimensions, multiple sub-graphs can be obtained. After hierarchical clustering, the abnormal trading
group is mined and classified according to its features. Finally, we build a scoring model and apply it to the big data platform of one of China’s largest
payment service providers to help enterprises identify abnormal trading groups and specific marketing strategies.
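The point about examining a merchant's transactions over a period of time, rather than one transaction or one day, can be illustrated with a small aggregation sketch. The record fields (`merchant`, `payer`, `day`, `discount`) and the four output features are hypothetical choices for this example; the abstract does not list the features the authors actually built.

```python
from collections import defaultdict
from datetime import date


def merchant_window_features(transactions, start, end):
    """Aggregate per-merchant features over the inclusive window [start, end].

    Each transaction is a dict with hypothetical keys:
    'merchant', 'payer', 'day' (datetime.date) and 'discount' (amount).
    """
    grouped = defaultdict(list)
    for tx in transactions:
        if start <= tx["day"] <= end:
            grouped[tx["merchant"]].append(tx)

    features = {}
    for merchant, txs in grouped.items():
        discounted = sum(1 for tx in txs if tx["discount"] > 0)
        features[merchant] = {
            "tx_count": len(txs),
            "distinct_payers": len({tx["payer"] for tx in txs}),
            "active_days": len({tx["day"] for tx in txs}),
            "discount_share": discounted / len(txs),
        }
    return features
```

Feature rows like these could then serve as input to a gradient boosting classifier; a single day of data would leave `active_days` and `distinct_payers` nearly uninformative, which is the limitation the abstract describes.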
15.
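谢亮. 《数字社区&智能家居》, 2007, 1(6): 1615
Decision trees are a commonly used method in data mining. This paper points out the problems of current intrusion detection systems and, targeting the low performance and high false-positive and false-negative rates of traditional intrusion detection techniques, describes an optimized implementation based on decision tree learning.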
16.
17.
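Uyghur sentence similarity computation plays an important role in example-based Uyghur-Chinese machine translation systems. The agglutinative nature of Uyghur requires that words be stemmed. The proposed method combines a simple sentence structure similarity measure with word stemming to compute sentence similarity. In small-scale experiments, the results are fairly close to human-judged sentence similarity.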