Similar literature
1.
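This work proposes a method that combines syntactic parsing with word relatedness to translate noun phrases in English patent documents. A bilingual noun phrase example base for patent documents is built and organized as a noun phrase (NP) treebank. A term NP to be translated is first parsed, and the NP treebank is then searched for NP trees that match it. For the matching NP trees, HowNet (<知网>) is used to compute the semantic similarity between words and find the most similar NP tree; the relatedness among the candidate translations of each word is then computed to select the word translations, and finally the word order is adjusted to generate the translation. If no matching NP tree exists, the treebank is searched for NP trees that match the sub-NPs of the term's NP tree, and the translation is generated recursively. With BLEU as the automatic evaluation metric, experiments show that the method outperforms the phrase-based statistical translation system Pharaoh.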

2.
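This paper summarizes our research in recent years on syntax-based statistical machine translation, in particular a series of statistical machine translation models and methods based on source-language syntax. These include: a translation model based on maximum entropy bracketing transduction grammar; a tree-to-string translation model based on source-language phrase structure trees, together with the corresponding tree-based translation method, forest-based translation method, and joint parsing-and-decoding translation method; and a translation model based on source-language dependency trees.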

3.
关健, 刘大昕. 《计算机工程》 (Computer Engineering), 2004, 30(5): 129-130, 149
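To address the slow matching speed of expert-system-based intrusion detection systems, which cannot keep up with high-bandwidth networks, a graphical model is proposed. A rule analysis engine model is built using a classification tree, the order in which nodes are selected is determined by the role each attribute plays in the attack description, and tree traversal replaces the string comparison of production rules during search. This effectively reduces the attribute matching time of the misuse detection system and meets real-time requirements.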

4.
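Approximate string matching is widely used in network security. Starting from the similarity of Chinese strings, this paper proposes improving character similarity by subdividing individual Chinese characters. Based on an analysis of the "clustering" property of Chinese characters, it introduces a Key representation of Chinese characters, formulates the mapping between characters and Keys as rules, and discusses how the rules are obtained. A framework for rule-based approximate matching of Chinese strings is designed, a new similarity computation model is proposed, and the whole pipeline is verified experimentally, demonstrating the advantages of rule-based approximate matching of Chinese strings.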

5.
蒋志鹏, 关毅. 《自动化学报》 (Acta Automatica Sinica), 2019, 45(2): 276-288
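Full syntactic parsing is an important structuring step in natural language processing (NLP). Because syntactically annotated corpora of Chinese electronic medical records (CEMR) are scarce, there has so far been no research on full parsing of CEMRs. Targeting the strongly patterned sublanguage characteristics of CEMRs, this paper is the first to formalize the patterns reused in CEMRs as tree fragments, and it proposes a model that fuses data-oriented parsing (DOP) with hierarchical parsing. In the tree fragment extraction stage, more efficient algorithms are proposed for extracting standard tree fragments and local tree fragments; they address the repeated comparison of standard tree fragments and the inefficiency of the quadratic tree kernel (QTK), respectively, and yield a standard tree fragment set and a local tree fragment set. Based on these two sets, a mixed word and part-of-speech matching strategy and a maximal tree fragment combination algorithm are proposed to improve the DOP model, mitigating the noise introduced by invalid tree fragments. Experimental results show that the fusion model effectively improves parsing of CEMRs: with a small annotated corpus it reaches an F1 of 80.87%, the highest reported to date, and in cross-department parsing it outperforms the Stanford parser and the Berkeley parser by more than 2%.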

6.
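To address the shortcomings of word segmentation methods based on string matching, on understanding, and on statistics, a domain-specific word segmentation method based on ontology and syntactic analysis is proposed. A genre ontology is built to support syntactic analysis, and word lookup is performed in an intelligent manner, avoiding problems such as the loss of semantics that arises when traditional methods ignore context. Experimental results show that the method considerably improves segmentation accuracy.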

7.
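To address the long-distance dependencies of Uyghur and the data sparsity of language models in Chinese-Uyghur statistical machine translation, a generalization-based Uyghur language model is proposed. Using the text generated during training of the Uyghur language model together with a string similarity algorithm, the model takes similar Uyghur strings, normalizes them to extract rules, and computes the parameter values of the rules. The rules are then used to rescore the n-best translations generated for the test set during decoding, and the highest-scoring translation is taken as the best translation. Experimental results show that the generalized language model reduces storage space and that appropriate use of the rules effectively improves translation quality.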

8.
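To achieve exact string matching in network intrusion detection systems, this paper proposes a string matching algorithm based on leaf-attaching and binary search trees, together with its implementation architecture. First, a leaf-attaching algorithm preprocesses the given pattern set to eliminate overlaps between patterns. Next, leaf patterns and their match vectors are extracted to build a binary search tree, and exact string matching is performed by traversing left or right according to the comparison result at each node. To further improve the memory efficiency of the algorithm, a cascaded binary search tree is proposed. Finally, the overall architecture for exact string matching and the architecture of each functional module are given. Experimental results show that the proposed design outperforms current state-of-the-art designs in memory efficiency and throughput and is flexibly scalable.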

9.
A text filtering model for objectionable information based on rule calculus   Cited by: 2 (self-citations: 0, citations by others: 2)
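Based on definitions of meta-symbols and calculus rules, this paper presents a text filtering model for objectionable information built on string matching. Because rules are generated from meta-symbols or from other rules by calculus, the model has strong filtering capability.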

10.
刘颖, 姜巍. 《计算机工程与应用》 (Computer Engineering and Applications), 2012, 48(32): 98-101, 146
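Phrase alignment is the core module that determines the quality of a statistical machine translation system. A hierarchical phrase model based on phrase structure trees is proposed, extending the hierarchical phrase model with ideas from the string-to-tree model. The model extracts translation rules from bilingual aligned phrases combined with English phrase structure trees and uses a heuristic strategy to obtain extended syntactic labels for the translation rules. A statistical machine translation system using these translation rules gives stable results across different data sets, and its average BLEU scores on the training and test sets are higher than those of the phrase model and the hierarchical phrase model.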

11.
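Long-distance reordering is an important problem in statistical machine translation. The hierarchical phrase model offers a good solution: its hierarchical phrase rules can represent both local and long-distance reordering well. However, extracting long-distance hierarchical rules with conventional algorithms causes the rule table to grow sharply, increasing memory and time consumption during decoding. To solve this problem, this paper proposes a new method that uses dependency constraints to extract long-distance reordering rules. Experiments show that the proposed ...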

12.
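This paper reviews and analyzes the state of research, in China and abroad, on automatic scoring of subjective questions. Building on the closeness degree theory of fuzzy mathematics and a one-way closeness string matching method, and incorporating the idea of dynamic programming, it designs and implements an automatic scoring algorithm based on semantic context. The algorithm takes the sentence as the basic semantic unit, decomposes the standard answer into word strings that represent scoring points, and attaches synonym chains to these word strings for matching against the sentences of the student's answer, making the semantic representation more complete and accurate. Dynamic programming makes matching proceed in word order, avoiding the mechanical matching errors that result from merely counting character occurrences. Finally, a score is given according to how well the sentences in the text match the keywords. The main ideas of the basic algorithm and its program flowchart are presented, and a worked example demonstrates the algorithm's feasibility.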
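The order-preserving matching step in this abstract can be illustrated with a longest-common-subsequence dynamic program. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `ordered_match_score`, the `synonyms` mapping that stands in for the synonym chains, and the choice of LCS as the dynamic program are all illustrative, and the closeness degree computation is not modeled.

```python
def ordered_match_score(key_terms, answer_tokens, synonyms=None):
    """Fraction of key terms found in the answer in the same order.

    key_terms     -- scoring-point words from the standard answer, in order
    answer_tokens -- tokenized student answer
    synonyms      -- hypothetical mapping: key term -> set of accepted synonyms
    """
    synonyms = synonyms or {}

    def same(term, token):
        return token == term or token in synonyms.get(term, ())

    rows, cols = len(key_terms), len(answer_tokens)
    # lcs[i][j] = longest ordered match between the first i key terms
    # and the first j answer tokens.
    lcs = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if same(key_terms[i - 1], answer_tokens[j - 1]):
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[rows][cols] / rows if rows else 0.0


key = ["stack", "last", "first"]
syn = {"last": {"newest"}}
in_order = "a stack removes the newest item first".split()
shuffled = "first last stack".split()
print(ordered_match_score(key, in_order, syn))   # all key terms matched in order
print(ordered_match_score(key, shuffled, syn))   # same words, wrong order
```

Because the match must respect word order, an answer that merely contains the right words in the wrong order scores lower than one that expresses them in sequence, which is the behavior the abstract attributes to the dynamic programming step.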

13.
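In a machine simultaneous interpretation (MSI) pipeline, feeding the output of automatic speech recognition (ASR) directly into neural machine translation (NMT) causes semantic incompleteness. To solve this problem, a model based on BERT (Bidirectional Encoder Representation from Transformers) and Focal Loss is proposed. First, several segments generated by the ASR system are cached and combined into a word string. Then, a BERT-based sequence labeling model restores the punctuation of the word string, with Focal Loss used as the training loss function to mitigate the class imbalance caused by unpunctuated samples outnumbering punctuated ones. Finally, the punctuation-restored word string is fed into NMT. Experiments on English-German and Chinese-English translation show that, in translation quality, MSI with the proposed punctuation restoration model improves on MSI that feeds ASR output directly into NMT by 8.19 BLEU and 4.24 BLEU respectively, and on MSI using a punctuation restoration model based on an attention-based bidirectional recurrent neural network by 2.28 BLEU and 3.66 BLEU respectively. The proposed model can therefore be applied effectively in MSI.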
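To show why Focal Loss helps with the class imbalance mentioned in this abstract, the sketch below compares it with plain cross-entropy for a single token. It assumes the standard formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the abstract does not state the values of gamma or alpha, so the defaults here, like the function names, are illustrative only.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one token."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one token: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted class probabilities for the token
    target -- index of the true class (e.g. 0 = "no punctuation")
    gamma, alpha -- illustrative defaults, not taken from the abstract
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = [0.95, 0.03, 0.02]  # true class 0 predicted confidently
hard = [0.30, 0.60, 0.10]  # true class 0 predicted with low probability

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(name, "cross-entropy:", round(ce, 4), "focal:", round(fl, 4))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified tokens much more than that of misclassified ones, so the abundant, easy "no punctuation" tokens contribute less to training; with gamma = 0 the expression reduces to cross-entropy.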

14.
Fuzzy matching techniques are currently used for translating words, and neural machine translation (NMT) and statistical machine translation are the methods used in machine translation (MT). A machine translation tool must handle large datasets, which can degrade its ability to retrieve the correct matching output. To improve the matching score of MT, the existing fuzzy-based translator and neural machine translator can be modified. However, modifying architectures and encoding schemes in the conventional way is tedious, and preprocessing the datasets consumes considerable time and memory. This article presents a new spider-web-based, searching-enhanced translation scheme for use with a neural machine translator. The proposed scheme searches the available dataset deeply to find the accurate matching result. In addition, translation quality is improved by an optimal selection scheme for the sentence matches used in source augmentation: the matches retrieved with various matching scores are passed to an optimization algorithm, and source augmentation with the optimal retrieved matches increases translation quality. Selecting the optimal match combination also reduces the time required, since not all retrieved matches need to be tested to find the target sentence. Translation quality is measured with BLEU and METEOR scores, which reach about 92% and 86%, respectively, for the TA-EN language pair in different configurations. The results are compared with other available NMT methods to validate the work.

15.
Earley's algorithm has been commonly used for parsing general context-free languages and for error-correcting parsing in syntactic pattern recognition. The time complexity of parsing is O(n^3). This paper presents a parallel Earley's recognition algorithm in terms of an "X*" operator. By restricting the input context-free grammar to be λ-free (that is, without empty-string productions), the parallel algorithm can be executed on a triangular-shape VLSI array. This array system has an efficient way of moving data to the right place at the right time. Simulation results show that this system can recognize a string of length n in 2n + 1 system time. We also present a parallel parse-extraction algorithm, a complete parsing algorithm, and an error-correcting recognition algorithm. The parallel complete parsing algorithm has been simulated on a processor array that is similar to the triangular VLSI array. For an input string of length n, the processor array gives the correct right-parse at system time 2n + 1 if the string is accepted. The error-correcting recognition algorithm has also been simulated on a triangular VLSI array. This array recognizes an erroneous string of length n in time 2n + 1 and gives the correct error count. These parallel algorithms are especially useful for syntactic pattern recognition.
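For reference, a sequential Earley recognizer for grammars without empty productions can be sketched as follows. It illustrates only the classical predictor, scanner, and completer steps, not the parallel formulation or the VLSI array described in the abstract; the grammar encoding and the function name `earley_recognize` are assumptions made for this sketch.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar -- dict mapping each nonterminal to a list of right-hand sides,
               each a non-empty tuple of symbols (no empty productions)
    """
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) valid after i tokens.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for prod in grammar[sym]:
                        item = (sym, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)
                elif i < n and tokens[i] == sym:
                    # Scanner: the terminal after the dot matches the input.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every item that was waiting for `lhs`.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        item = (plhs, prhs, pdot + 1, porigin)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


# A small arithmetic grammar with left recursion and no empty productions.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
print(earley_recognize(GRAMMAR, "E", list("a+a*a")))
print(earley_recognize(GRAMMAR, "E", list("a+*a")))
```

Excluding empty productions keeps the completer simple: a completed item always started at an earlier chart position, so the set it consults is already final. The same restriction is what the abstract relies on to map the algorithm onto a fixed array.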

16.
Support vector learning for fuzzy rule-based classification systems   Cited by: 11 (self-citations: 0, citations by others: 11)
Designing a fuzzy rule-based classification system (fuzzy classifier) with good generalization ability in a high-dimensional feature space has been an active research topic for a long time. As a powerful machine learning approach for pattern recognition problems, the support vector machine (SVM) is known to have good generalization ability. More importantly, an SVM can work very well on a high- (or even infinite-) dimensional feature space. This paper investigates the connection between fuzzy classifiers and kernel machines, establishes a link between fuzzy rules and kernels, and proposes a learning algorithm for fuzzy classifiers. We first show that a fuzzy classifier implicitly defines a translation invariant kernel under the assumption that all membership functions associated with the same input variable are generated from location transformation of a reference function. Fuzzy inference on the IF-part of a fuzzy rule can be viewed as evaluating the kernel function. The kernel function is then proven to be a Mercer kernel if the reference functions meet a certain spectral requirement. The corresponding fuzzy classifier is named positive definite fuzzy classifier (PDFC). A PDFC can be built from the given training samples based on a support vector learning approach, with the IF-part fuzzy rules given by the support vectors. Since the learning process minimizes an upper bound on the expected risk (expected prediction error) instead of the empirical risk (training error), the resulting PDFC usually has good generalization. Moreover, because of the sparsity properties of SVMs, the number of fuzzy rules is irrelevant to the dimension of the input space. In this sense, we avoid the "curse of dimensionality." Finally, PDFCs with different reference functions are constructed using the support vector learning approach. The performance of the PDFCs is illustrated by extensive experimental results. Comparisons with other methods are also provided.
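The link between the IF-part of a fuzzy rule and a translation invariant kernel can be checked numerically. The sketch below assumes a Gaussian reference function and product inference over the input variables, which are common choices but are not specified in the abstract; the function names and the `width` parameter are illustrative.

```python
import math


def gaussian_reference(u, width=1.0):
    """Reference function a(u); a Gaussian is one assumed choice."""
    return math.exp(-(u / width) ** 2)


def rule_firing_strength(x, center, reference=gaussian_reference):
    """IF-part of one fuzzy rule, assuming product inference.

    Each membership function is the reference function shifted to the
    rule's center, so the result depends only on the difference x - center.
    """
    strength = 1.0
    for x_k, c_k in zip(x, center):
        strength *= reference(x_k - c_k)
    return strength


def rbf_kernel(x, z, width=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / width^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / width ** 2)


x = [0.5, -1.0]
center = [1.5, 2.0]
shift = [3.0, -2.0]
x_shifted = [a + s for a, s in zip(x, shift)]
center_shifted = [c + s for c, s in zip(center, shift)]

print(rule_firing_strength(x, center), rbf_kernel(x, center))
print(rule_firing_strength(x_shifted, center_shifted))
```

Under these assumptions the firing strength of a rule centered at a given point coincides with the Gaussian RBF kernel evaluated between the input and that point, and shifting both arguments by the same vector leaves the value unchanged, which is the translation invariance the abstract refers to.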

17.
Recent years have seen a surge of interest in extending statistical regression to fuzzy data. Most recent fuzzy regression models perform poorly when the functional relationships are nonlinear. In this study, we propose a novel fuzzy regression model, called the kernel-based nonlinear fuzzy regression model, which deals with crisp inputs and fuzzy output by introducing the kernel strategy into fuzzy regression. The model is identified using a maximum likelihood estimation strategy based on the fuzzy Expectation Maximization (EM) algorithm. Experiments are designed to show its performance, and the results suggest that the proposed model can handle nonlinearity and has high prediction accuracy. Finally, the proposed model is used to monitor an unmeasured parameter, the level of coal powder filling in a ball mill in a power plant. Driven by running data and expertise, a strategy is first proposed to construct fuzzy outputs that reflect the possible values taken by the unmeasured parameter. With this engineering application, we then demonstrate the performance of our model.

18.

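Pattern matching is a very important class of algorithm in intrusion detection systems (IDS). Based on a study and analysis of several commonly used pattern matching algorithms, a fast improved algorithm based on BM (Boyer-Moore) pattern matching, called the IBM algorithm, is proposed. The algorithm makes full use of the uniqueness of the last character of the pattern and of the two text characters that follow the text character aligned with it, and it also draws on information in the text itself to increase the shift of the pattern, so that after each mismatch the pattern jumps as far forward as possible without missing any possible match. Experimental results show that, compared with other pattern matching algorithms, the algorithm has clear advantages in detection performance and matching efficiency and can effectively improve the detection efficiency and performance of an IDS.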
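As background for the shift idea in this abstract, the sketch below implements the well-known Boyer-Moore-Horspool variant, which shifts the pattern according to the text character aligned with the pattern's last position. It is a baseline illustration only and does not implement the IBM algorithm's use of the following two text characters; the function name `horspool_search` is an assumption of this sketch.

```python
def horspool_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each character of the pattern except the last, record the distance
    # from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the table entry of the text character under the pattern's
        # last position; characters absent from the table allow a full jump.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_search("abracadabra", "abra"))
```

A character that does not occur in the pattern lets the window jump by the full pattern length, which is where the speedup over character-by-character scanning comes from; the abstract's improvement aims to make such jumps larger by inspecting more of the text.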

19.
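Objective: In traditional bag-of-words image search, much work has been devoted to improving the discriminative power of local features. The retrieved images resemble the query image in their details, but they sometimes differ greatly at the semantic level. Image search based on global features, in contrast, loses much detail information, so that images with a similar layout that are in fact irrelevant are treated as relevant. To solve this problem, this paper uses deep convolutional features to construct a dynamic match kernel. Method: The dynamic match kernel encourages matching pairs between relevant images while suppressing the number of matching pairs between irrelevant images. It takes as input the image's features from the last fully connected layer of a deep convolutional neural network. For relevant images, both the number and the quality of local feature matches between images are enhanced. Conversely, for irrelevant images, the dynamic match kernel reduces the number of local feature matches and lowers their matching scores. Results: The proposed dynamic match kernel is evaluated in terms of quantity and quality, and two metrics are proposed to quantify the behavior of a match kernel. Based on these two metrics, the intermediate results are analyzed, confirming the superiority of the dynamic match kernel over a static match kernel. Finally, extensive experiments are conducted on five public datasets; the average precision obtained in retrieval on each dataset ranges from 85.11% to 98.08%, higher than that of comparable work in the field. Conclusion: The experimental results show that the proposed method is effective and outperforms current comparable work in the field. It has an advantage over various deep learning feature extraction methods: because it uses the features to construct a dynamic match kernel rather than coarsely encoding them for similarity matching, it achieves better performance on all datasets.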
