Similar Documents
20 similar records retrieved.
1.
Multi-Strategy Chinese-Uyghur Sentence Alignment
This paper proposes an error-suppressing multi-strategy algorithm for aligning Chinese and Uyghur sentences. Because length-based alignment cannot avoid the propagation of alignment errors, a new error-propagation suppression strategy is introduced: lexical co-occurrence statistics from the bilingual corpus are used to automatically extract Chinese-Uyghur word collocations, which are combined with sentence-length features to locate 1:1 sentence pairs as anchor points, so that error propagation is confined within anchor intervals; between anchors, sentences are aligned with a hybrid method based on punctuation and length. Experiments show that the anchors found by the multi-strategy algorithm are highly precise and effectively suppress the spread of alignment errors. The hybrid alignment algorithm also avoids the high time complexity of purely lexical alignment and clearly outperforms traditional alignment, raising alignment precision from 95.0% to 97.6% and recall from 96.8% to 98.2%. The accompanying alignment-correctness evaluation algorithm effectively detects noisy alignments produced by the automatic aligner.
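The anchor idea in this abstract can be illustrated with a minimal Python sketch. The collocation table `colloc`, the expected length ratio, and the score threshold are hypothetical placeholders; the paper's actual features and parameters are not given in the abstract.

```python
# Hedged sketch of the anchor-selection step for 1:1 Chinese-Uyghur sentence pairs.

def collocation_score(zh_tokens, uy_tokens, colloc):
    """Sum the co-occurrence scores of all word pairs found in the collocation table."""
    return sum(colloc.get((z, u), 0.0) for z in zh_tokens for u in uy_tokens)

def length_score(zh_tokens, uy_tokens, expected_ratio=1.0, tol=0.25):
    """1 if the token-length ratio is close to the expected ratio, else 0."""
    ratio = len(uy_tokens) / max(len(zh_tokens), 1)
    return 1.0 if abs(ratio - expected_ratio) <= tol * expected_ratio else 0.0

def find_anchors(zh_sents, uy_sents, colloc, threshold=3.0):
    """zh_sents / uy_sents: lists of token lists, assumed roughly parallel.
    Returns indices of high-confidence 1:1 pairs used as anchors; alignment
    errors cannot propagate past an anchor, and the intervals between anchors
    are aligned separately by the punctuation/length hybrid method."""
    anchors = []
    for i, (zh, uy) in enumerate(zip(zh_sents, uy_sents)):
        if collocation_score(zh, uy, colloc) + length_score(zh, uy) >= threshold:
            anchors.append(i)
    return anchors
```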

2.
An improved adaptive Chinese-Uyghur sentence alignment algorithm is proposed. Since traditional alignment methods do not adapt well to changes in corpus type, the algorithm uses the byte-length ratio of the Chinese-Uyghur text currently being aligned, together with historical match-pattern data, to dynamically correct the parameters of the alignment model so that it adapts to the corpus type and aligns sentences more accurately. Compared with the length-based model, precision and recall rise by 3.5 and 2.7 percentage points respectively; compared with hybrid alignment, they rise by 1.9 and 1.8 percentage points. Experimental results confirm that the algorithm adapts effectively to changes in corpus type.
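A minimal sketch of the adaptive idea, assuming the alignment model is parameterised by an expected Uyghur-to-Chinese byte-length ratio and a distribution over match patterns; the running updates below are a hypothetical illustration of "dynamically correcting the model parameters", not the paper's exact formulas.

```python
# Hedged sketch: keep a running estimate of the byte-length ratio and of the
# match-pattern distribution (1:1, 1:2, 2:1, ...) observed so far, and feed the
# updated values back into the alignment model for the next text segment.

class AdaptiveLengthModel:
    def __init__(self, init_ratio=1.0, smoothing=5.0):
        self.ratio = init_ratio        # expected Uyghur/Chinese byte-length ratio
        self.weight = smoothing        # pseudo-count smoothing early updates
        self.pattern_counts = {}       # e.g. {(1, 1): 120, (1, 2): 7, ...}

    def update(self, zh_bytes, uy_bytes, pattern):
        """Fold one confirmed alignment into the running parameters."""
        observed = uy_bytes / max(zh_bytes, 1)
        self.ratio = (self.ratio * self.weight + observed) / (self.weight + 1)
        self.weight += 1
        self.pattern_counts[pattern] = self.pattern_counts.get(pattern, 0) + 1

    def pattern_prior(self, pattern):
        """Relative frequency of a match pattern in the history seen so far."""
        total = sum(self.pattern_counts.values()) or 1
        return self.pattern_counts.get(pattern, 0) / total
```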

3.
Starting from the difficulties of Yi-Chinese word alignment for information processing, and building on an analysis of the Brown word alignment model, this paper proposes BiDictAlign, a Yi-Chinese word alignment algorithm based on a Yi-Chinese bilingual dictionary. Experimental tests show that the method performs well, providing a meaningful exploration of word alignment techniques for Yi-Chinese bilingual corpora in information processing.

4.
Uyghur news web pages and their corresponding Chinese translation pages are often not fully comparable: the bilingual sentence sequences may be reordered and some sentences may be missing, which makes Uyghur-Chinese sentence alignment difficult. In addition, many person and place names, which are essential elements of news, are out-of-vocabulary words, further increasing the difficulty. To raise the probability of Uyghur-Chinese lexical matches, the authors automatically extract Chinese person and place names, translate them into Uyghur, and build a bilingual name mapping table that is added to the Uyghur-Chinese bilingual dictionary. The Chinese translations of the dictionary words found in a Uyghur sentence are then matched against the Chinese sentence by string matching, avoiding Chinese word-segmentation errors; accumulating all matched word pairs yields the lexical mutual-translation rate of a bilingual sentence pair. Finally, digit, punctuation, and length features are fused to compute the similarity of the sentence pair. A graph matching algorithm is applied to the matrix of all pairwise sentence similarities to find Uyghur-Chinese parallel sentence pairs, reaching an alignment accuracy of up to 95.67% on 900 sentence pairs.
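A minimal sketch of the lexical mutual-translation rate and the fused similarity described above, assuming a hypothetical Uyghur-to-Chinese dictionary `uy2zh_dict`; the feature weights are illustrative, not the paper's values.

```python
# Hedged sketch: for each Uyghur word with dictionary translations, check whether
# any translation occurs as a substring of the raw Chinese sentence (string
# matching sidesteps Chinese word segmentation), then fuse the rate with other features.

def translation_rate(uy_tokens, zh_sentence, uy2zh_dict):
    matched, covered = 0, 0
    for w in uy_tokens:
        translations = uy2zh_dict.get(w)
        if not translations:
            continue
        covered += 1
        if any(t in zh_sentence for t in translations):   # substring match
            matched += 1
    return matched / covered if covered else 0.0

def similarity(uy_tokens, zh_sentence, uy2zh_dict,
               digit_score, punct_score, length_score,
               w_lex=0.6, w_dig=0.15, w_punct=0.1, w_len=0.15):
    """Fuse the lexical mutual-translation rate with digit, punctuation and length features."""
    lex = translation_rate(uy_tokens, zh_sentence, uy2zh_dict)
    return w_lex * lex + w_dig * digit_score + w_punct * punct_score + w_len * length_score
```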

5.
Chinese-English Sentence Alignment Using an Extended Length-Based Method
This paper presents an extended method for aligning a Chinese-English parallel corpus. The extended method takes the length-based statistical alignment method as its core, introduces lexical information from a bilingual dictionary, and uses a punctuation-based method as a post-processing step. This extension avoids complex Chinese processing, such as word segmentation and part-of-speech tagging, while bringing keyword information into the statistical method to improve sentence alignment accuracy. The bilingual corpus used is the LDC collection of bilingual news reports about Hong Kong, and dynamic programming is used to implement the system. Compared with the purely length-based method and the lexical method, the extended method improves sentence alignment accuracy, and the results are satisfactory.
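The core of the extended method, length-based alignment by dynamic programming, can be sketched as follows in the spirit of Gale & Church (1993); the cost constants and match-pattern penalties are illustrative defaults, not the paper's trained parameters.

```python
import math

# Hedged sketch of length-based sentence alignment by dynamic programming.
PATTERNS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}

def length_cost(src_len, tgt_len, c=1.0, var=6.8):
    """Cost of matching segments of these lengths under a Gaussian length model."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    mean = max((src_len + tgt_len / c) / 2.0, 1e-6)
    delta = (tgt_len - c * src_len) / math.sqrt(var * mean)
    return delta * delta  # negative log-likelihood up to a constant

def align(src_lens, tgt_lens):
    """Return the best list of (n_src, n_tgt) match patterns via dynamic programming."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in PATTERNS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                new_cost = cost[i][j] + PENALTY[(di, dj)] + length_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                if new_cost < cost[ni][nj]:
                    cost[ni][nj] = new_cost
                    back[ni][nj] = (i, j, di, dj)
    # Trace back the best path of match patterns.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, di, dj = back[i][j]
        path.append((di, dj))
        i, j = pi, pj
    return list(reversed(path))
```

In the extended method described in the abstract, the lexical and punctuation information would be folded into this cost function as additional terms rather than replacing it.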

6.
An Automatic Sentence Alignment Algorithm for Chinese-English Bilingual Text
The construction of bilingual corpora and research on their automatic alignment are of great importance to computational linguistics. Bilingual alignment is the core technology for processing bilingual text, and the quality of alignment directly affects subsequent work such as machine-aided translation. Based on the practical characteristics of Chinese-English text, this paper proposes a new hybrid sentence alignment algorithm: it mainly uses a new length-based alignment algorithm, combines it with a dictionary-based alignment algorithm, and further improves sentence alignment accuracy through bidirectional (forward and backward) alignment. The algorithm was validated on 100 documents containing more than 5,000 Chinese-English sentence pairs; the alignment results are satisfactory, which shows that the algorithm is feasible in practical work.

7.
Bilingual Sentence Alignment Based on Automatically Extracted Lexical Information
刘昕, 周明, 朱胜火, 黄昌宁. 《计算机学报》 (Chinese Journal of Computers), 1998, 21(Z1): 151-158
Sentence alignment of bilingual corpora has become a crucial problem in new-generation machine translation research. The main alignment methods are length-based and lexicon-based, each with its own characteristics: the former is simple to implement and efficient but less accurate; the latter is accurate but complex to implement. This paper proposes a new alignment method: it first performs a coarse alignment of the text with the length-based method, then determines anchor points in the bilingual parallel text and automatically extracts corresponding bilingual key words, which reduces the complexity of the alignment problem and limits the propagation of errors; finally, the extracted lexical correspondence information is used to align the sentences. This method combines the advantages of the length-based and lexicon-based approaches, and experiments show that it greatly improves alignment accuracy.

8.
Automatic alignment of bilingual corpora has become an important research topic in machine translation. Current sentence alignment methods are length-based or lexicon-based. This paper first analyzes the length-based method and then proposes a translation-based method: a relatively complete translation dictionary is used as a bridge to connect the correspondences between English and Chinese sentences. For each word in the English text, its translations are looked up in the dictionary and matched against the Chinese sentences; an evaluation function and a dynamic programming algorithm are then used to find the aligned sentence pairs. Experimental results show that this alignment method eliminates the error propagation found in the length-based approach and greatly improves alignment accuracy; the results are satisfactory.

9.
《计算机工程》 (Computer Engineering), 2019, (6): 211-217
Sentence alignment is the process of mapping sentences in a source text to their corresponding translations in a target text. Observing that mutually aligned source and target sentences contain many mutually aligned words, this paper proposes a sentence alignment method within a neural network framework. A gate relevance network is used to capture semantic relations between word pairs of a source sentence and a target sentence, and these semantic relations determine whether the two sentences are aligned. Evaluated on non-monotonic text, the method achieves an F1 score of 93.8%, effectively improving sentence alignment accuracy.
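The abstract describes the gate relevance network only at a high level, so the following is a heavily hedged, generic stand-in: it scores a sentence pair from word-pair semantic relations computed with (assumed) pre-trained embeddings and is not the paper's architecture.

```python
import numpy as np

# Generic illustration: build a word-pair cosine-similarity matrix from
# L2-normalised embeddings and pool it into a single alignment score.

def pair_similarity(src_vecs, tgt_vecs):
    """src_vecs: (n, d), tgt_vecs: (m, d), both L2-normalised."""
    sim = src_vecs @ tgt_vecs.T                  # (n, m) word-pair relation matrix
    s2t = sim.max(axis=1).mean()                 # best target match per source word
    t2s = sim.max(axis=0).mean()                 # best source match per target word
    return 0.5 * (s2t + t2s)                     # symmetric pooled score

def is_aligned(src_vecs, tgt_vecs, threshold=0.55):
    """Decide alignment by thresholding the pooled word-pair similarity."""
    return pair_similarity(src_vecs, tgt_vecs) >= threshold
```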

10.
Targeting the characteristics of a Chinese-Uyghur corpus of government documents, and making full use of the sentence properties of Chinese and Uyghur, this paper proposes a sentence-level Chinese-Uyghur alignment method. The method focuses on analyzing the sentence patterns of Chinese and Uyghur in the government domain and performs boundary identification on the Chinese and Uyghur corpora separately, which avoids the impact of complex sentence patterns on Chinese-Uyghur sentence alignment and yields a sentence alignment accuracy between 97% and 99%. The aligned Chinese-Uyghur sentence pairs can enlarge the corpus and provide translation material for Chinese-Uyghur phrase alignment and Chinese-Uyghur machine translation.

11.
Corpus-Based Research on English Clause Recognition
To improve the translation of complex sentences in English-Chinese machine translation systems, this paper addresses the problem of delimiting clause boundaries in English complex sentences and proposes a corpus-based clause recognition method. The method uses part-of-speech information and combines rule-based and statistical approaches to identify clause boundaries, obtaining good experimental results: 92.69% precision and 91.04% recall in the closed test, and 80.34% precision and 83.93% recall in the open test.

12.
Statistics-Based Recognition of Chinese Place Names
This paper studies the recognition of Chinese place names that contain characteristic words. The system uses statistics obtained from a large-scale gazetteer and a real-text corpus, together with rules summarized from the characteristics of place names, and recognizes Chinese place names by computing word-formation credibility and continuation credibility. The model effectively adjusts the output of automatic word segmentation; closed-test recall and precision are 90.24% and 93.14%, and open-test recall and precision reach 86.86% and 91.48%.
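The abstract does not give the credibility formulas, so the sketch below is a hypothetical reconstruction: word-formation credibility from character-position probabilities estimated on a gazetteer, and continuation credibility from context-word probabilities; the probability tables and threshold are placeholders.

```python
import math

# Hedged, hypothetical scoring of a candidate place name.

def formation_credibility(candidate, char_pos_prob):
    """Sum of log P(char | position in a place name); char_pos_prob[(ch, pos)]
    is assumed to be estimated from a gazetteer, pos in {'begin', 'middle', 'end'}."""
    score = 0.0
    for i, ch in enumerate(candidate):
        pos = 'begin' if i == 0 else ('end' if i == len(candidate) - 1 else 'middle')
        score += math.log(char_pos_prob.get((ch, pos), 1e-6))
    return score

def continuation_credibility(prev_word, next_word, context_prob):
    """How plausibly the candidate is preceded/followed by its context words."""
    left = context_prob.get((prev_word, 'before'), 1e-6)
    right = context_prob.get((next_word, 'after'), 1e-6)
    return math.log(left) + math.log(right)

def is_place_name(prev_word, candidate, next_word,
                  char_pos_prob, context_prob, threshold=-20.0):
    total = (formation_credibility(candidate, char_pos_prob)
             + continuation_credibility(prev_word, next_word, context_prob))
    return total >= threshold
```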

13.
Alzheimer’s disease is a non-reversible, non-curable, and progressive neurological disorder that induces the shrinkage and death of a specific neuronal population associated with memory formation and retention. It is a frequently occurring mental illness that accounts for about 60%–80% of dementia cases and is usually observed in people aged 60 years and above. Depending on the severity of symptoms, patients can be categorized as Cognitively Normal (CN), Mild Cognitive Impairment (MCI), or Alzheimer’s Disease (AD). AD is the last phase of the disease, in which the brain is severely damaged and patients are not able to live on their own. Radiomics is an approach to extracting a huge number of features from medical images with the help of data characterization algorithms. Here, 105 radiomic features are extracted and used to predict Alzheimer’s disease. This paper uses Support Vector Machine, K-Nearest Neighbour, Gaussian Naïve Bayes, eXtreme Gradient Boosting (XGBoost) and Random Forest to predict Alzheimer’s disease. The proposed random-forest-based approach with the radiomic features achieved an accuracy of 85%. It also achieved 88% accuracy, 88% recall, 88% precision and 87% F1-score for AD vs. CN; 72% accuracy, 73% recall, 72% precision and 71% F1-score for AD vs. MCI; and 69% accuracy, 69% recall, 68% precision and 69% F1-score for MCI vs. CN. The comparative analysis shows that the proposed approach performs better than the other approaches.
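A minimal sketch of the random-forest pipeline with scikit-learn; the radiomic feature matrix X (105 features per scan) and labels y (CN / MCI / AD) are assumed to be already extracted, and the hyperparameters are illustrative, not the paper's settings.

```python
# Hedged sketch of training and evaluating a random-forest classifier on radiomic features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

def train_and_evaluate(X, y, seed=42):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))   # per-class precision / recall / F1
    return clf
```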

14.
Context: The component field in a bug report provides important location information required by developers during bug fixes. Research has shown that incorrect component assignment for a bug report often causes problems and delays in bug fixes. A topic model technique, Latent Dirichlet Allocation (LDA), has been used to create a component recommender for bug reports. Objective: We seek to investigate a better way to use topic modeling in creating a component recommender. Method: This paper presents a component recommender that uses the proposed Discriminative Probability Latent Semantic Analysis (DPLSA) model and Jensen–Shannon divergence (DPLSA-JS). The proposed DPLSA model provides a novel method to initialize the word distributions for different topics: it uses the past assigned bug reports from the same component in the model training step, which results in a correlation between the learned topics and the components. Results: We evaluate the proposed approach on five open source projects: Mylyn, Gcc, Platform, Bugzilla and Firefox. The results show that the proposed approach on average outperforms the LDA-KL method by 30.08%, 19.60% and 14.13% for recall@1, recall@3 and recall@5, and outperforms the LDA-SVM method by 31.56%, 17.80% and 8.78% for recall@1, recall@3 and recall@5, respectively. Conclusion: Our experiments show that using comments in the DPLSA-JS recommender does not always contribute to performance, and that vocabulary size matters in DPLSA-JS: different projects need to set the vocabulary size adaptively through experimentation. In addition, the correspondence between the learned topics and components in DPLSA increases the discriminative power of the topics, which is useful for the recommendation task.
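A minimal sketch of the Jensen–Shannon ranking step, assuming the DPLSA model has already produced a topic distribution for the new bug report and one per component; the DPLSA training itself is not shown and the function names are illustrative.

```python
import numpy as np

# Hedged sketch: rank components by Jensen-Shannon divergence between the
# report's topic distribution and each component's topic distribution.

def kl(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """p, q are assumed to be proper probability distributions over topics."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def recommend(report_topics, component_topics, k=5):
    """component_topics: {component_name: topic_distribution}; smaller divergence ranks higher."""
    scored = [(name, js_divergence(report_topics, dist))
              for name, dist in component_topics.items()]
    return sorted(scored, key=lambda x: x[1])[:k]
```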

15.
This paper proposes a novel approach that uses a multi-objective evolutionary algorithm based on decomposition to address the ontology alignment optimization problem. Compared with an approach based on a Genetic Algorithm (GA), our method can simultaneously optimize three goals (maximizing the alignment recall, the alignment precision and the f-measure). The experimental results show that our approach can provide, in one execution, various alignments that are less biased toward any single evaluation of alignment quality than the GA approach; thus the quality of the alignments is clearly better than or equal to those given by the GA-based approach, which considers only precision, recall and f-measure, and by other multi-objective evolutionary approaches such as NSGA-II. In addition, our approach outperforms the NSGA-II approach with an average improvement of 32.79%. Comparing the quality of the alignments obtained by our approach with those of state-of-the-art ontology matching systems, we conclude that our approach is more effective and efficient.

16.
This paper describes a novel approach to morphological tagging for Korean, an agglutinative language with a very productive inflectional system. The tagger takes raw text as input and returns a lemmatized and morphologically disambiguated output for each word: the lemma is labeled with a part-of-speech (POS) tag and the inflections are labeled with inflectional tags. Unlike the standard approach to tagging for morphologically complex languages, in our proposed approach the tagging phase precedes the analysis phase. It comprises a trigram-based tagging component followed by a morphological rule application component, obtaining 95% precision and recall on unseen test data.

17.
The constituent morphemes of a separable trigger word can produce multiple legitimate separated forms through insertion, inversion, or omission, and these separated forms, like the original form, can also denote events. To extract events completely, this paper proposes a dependency-parsing-based algorithm for judging the legitimate separated forms of separable trigger words. The method first uses dependency parsing to examine the dependency constraints that a legitimate separated form of a separable trigger word is subject to in a sentence, then converts these constraints into computable judgment rules, and finally applies the rules to judge the legitimate separated forms of separable trigger words. Experimental results show that, before excluding sparse data, the precision, recall and F-measure of the method are 82.2%, 88.3% and 85.1% respectively; after excluding sparse data, they rise to 82.4%, 88.7% and 85.4%. The method is essentially ready for practical application.

18.
It has been shown in studies of biological synaptic plasticity that synaptic efficacy can change in a very short time window, compared to the time scale associated with typical neural events. This time scale is small enough to possibly have an effect on pattern recall processes in neural networks. We study properties of a neural network which uses a cyclic Hebb rule, and then add short-term potentiation of synapses in the recall phase. We show that this approach preserves the ability of the network to recognize the patterns stored by the network and that the network does not respond to other patterns at the same time. We show that this approach dramatically increases the capacity of the network at the cost of a longer pattern-recall process. We argue that the network possesses two types of recall: the fast recall does not need synaptic plasticity to recognize a pattern, while the slower recall utilizes synaptic plasticity. This is something that we all experience in our daily lives: some memories can be recalled promptly whereas recollection of other memories requires much more time.
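The abstract does not specify the cyclic Hebb rule or the short-term potentiation mechanism, so the following is only a generic Hopfield-style Hebbian store/recall loop to illustrate the kind of pattern-recall process being modified; it is not the paper's model.

```python
import numpy as np

# Generic Hebbian storage and iterative recall of +/-1 patterns.

def store(patterns):
    """Hebbian outer-product storage; patterns: (num_patterns, n) array of +/-1."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W / patterns.shape[0]

def recall(W, probe, steps=20):
    """Iterative synchronous recall from a noisy probe pattern."""
    s = probe.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s
```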

19.
To let knowledge base search go beyond the limitations of traditional keyword-based queries, this paper proposes an ontology-based semantic expansion search method for knowledge bases. Ontologies and semantic expansion are introduced into the knowledge base so that user query conditions are expanded before searching, and the search results are ranked by relevance analysis, optimizing the search effect. Experimental results show that the method improves both search recall and precision.

20.
Addressing the difficulties of Chinese person-name recognition, this paper proposes a recognition method based on the maximum entropy algorithm that combines multiple knowledge sources and multiple models, taking full account of the internal features of person names (small-granularity features) and their contextual information. The main contributions are: probability information is incorporated into the maximum entropy model, greatly improving the precision and recall of person-name recognition; the classification model is refined, dividing person-name recognition into recognition of Chinese person names, transliterated foreign names, and single-character person names; and a dynamic priority method is proposed to prevent a transliterated foreign name from being partially recognized as one or more Chinese person names. The test data are the January 1998 People's Daily corpus and the SIGHAN (2006) named entity test corpora. The results show a recall of 90.06% and precision of 89.27% on the People's Daily (1998-01) corpus, 95.39% recall and 96.71% precision on the SIGHAN (MSRA) corpus, and 87.56% recall and 91.04% precision on the SIGHAN (LDC) corpus. The experimental results demonstrate that the proposed person-name recognition method is highly effective.
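A minimal sketch of a maximum-entropy name classifier, using multinomial logistic regression (equivalent to a maximum-entropy model) over internal-character and context features; the feature templates and the label set below are illustrative, not the paper's exact design.

```python
# Hedged sketch: maximum-entropy classification of name candidates via
# scikit-learn's multinomial logistic regression over dictionary features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(candidate, prev_word, next_word):
    """Internal (small-granularity) and contextual features of a candidate name."""
    return {
        "first_char": candidate[0],   # e.g. surname character
        "length": len(candidate),
        "prev": prev_word,            # left context word
        "next": next_word,            # right context word
    }

def build_model():
    # Labels could be e.g. "chinese_name", "transliterated_name",
    # "single_char_name", "not_name" (hypothetical label set).
    return make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

# Usage sketch:
#   X = [features(c, p, n) for (c, p, n) in training_candidates]
#   model = build_model().fit(X, y_labels)
#   model.predict([features("张三", "记者", "报道")])
```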
