Similar Literature
Found 19 similar documents (search time: 125 ms)
1.
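A Hierarchical LSD Rule System and Its Parsing Algorithm   Cited by: 1 (self-citations: 0, by others: 1)
This paper proposes a hierarchical LSD rule system based on lexical attribute structure description and rule inheritance. It discusses the rule search strategy and the implementation of lexicalized rule indexing under this rule system and, on that basis, gives the first nondeterministic parsing algorithm for LSD grammars. The rule system scales from traditional attribute grammars to modern lexical grammars, and it handles the complex rule interactions found in linear rule bases well.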

2.
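This paper describes a disambiguation method for Chinese syntactic structure analysis based on binary combination grammar. It first presents the basic concepts and ideas of binary combination grammar, then examines the limitations of the independence assumptions of probabilistic context-free grammars. To address these limitations, it introduces context-dependent probabilistic information based on binary combination grammar and proposes a new scoring scheme. Experimental results show that the method is effective for resolving ambiguity during parsing.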

3.
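Building on lexicalist formal grammar, a new parsing method is established that combines link grammar with unification theory. In closed tests, the precision and recall of unification-based link grammar parsing improved by 9.6% and 14.1%, respectively, over traditional link grammar. The experiments indicate that the method has a degree of originality and efficiency.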

4.
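The core of cloud computing is to provide users with dynamic, easily scalable computing resources over the Internet, built on virtualization technology. Managing the large number of cloud resources on the network with a central-server computing model makes the central server the bottleneck of the whole system, which hinders large-scale application of cloud computing. Peer-to-peer networking has therefore been proposed for building distributed systems that store and query cloud resource indexes, but systems with structured topologies are relatively complex to maintain and generally do not support queries with complex search conditions. This paper proposes a multi-keyword cloud resource search algorithm. It improves the routing algorithm of a cloud resource search algorithm based on hierarchical super nodes, with the aim of supporting exact multi-keyword queries. The generation, splitting, and storage of multi-keywords are described in detail, and an effective dataset-based index search strategy is proposed that supports efficient and accurate queries containing three or more keywords. Analysis of the experimental results shows that the algorithm clearly improves the hit rate of resource search; in particular, as the number of keywords grows, it maintains the hit rate while greatly increasing resource recall.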

5.
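To obtain accurate model parameters, the H.264 rate control algorithm is improved in two respects: data point selection and model parameter estimation. A strategy that combines Manhattan distance with a two-dimensional sliding window mechanism yields an efficient data point selection algorithm, and weighted linear regression is used for effective model parameter prediction. Experimental results show that the improved algorithm improves video coding quality and the control accuracy of the output bit rate.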
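The abstract above names weighted linear regression as its parameter-prediction technique but does not give the model or the weighting scheme. The following is a minimal sketch of closed-form weighted least squares for a one-variable line, assuming caller-supplied non-negative weights; the function name `weighted_linear_fit`, the variables `xs`/`ys`, and the sample data are illustrative assumptions, not the authors' rate-control model.

```python
def weighted_linear_fit(xs, ys, weights):
    """Fit y = slope * x + intercept by weighted least squares (closed form)."""
    if not (len(xs) == len(ys) == len(weights)) or len(xs) < 2:
        raise ValueError("need at least two points and equal-length inputs")
    w_sum = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    # Weighted covariance and variance (unnormalised; the common factor cancels).
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical samples: later (more recent) points receive larger weights.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]
weights = [0.25, 0.5, 0.75, 1.0]
slope, intercept = weighted_linear_fit(xs, ys, weights)
```

Giving recent samples larger weights is one plausible choice when the underlying statistics drift over time; any other non-negative weighting can be passed in unchanged.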

6.
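For the problem of learning grammar probabilities for multi-function radar (MFR) modeled with stochastic context-free grammars (SCFG), a fast MFR grammar probability learning algorithm based on the Earley algorithm is proposed, building on the traditional Inside-Outside (IO) and Viterbi-Score (VS) algorithms. The algorithm preprocesses the intercepted radar data, constructs an Earley parse chart that reflects the derivation process, extracts the optimal parse tree from the chart according to a maximum-subtree-probability principle, and learns the grammar probabilities with modified IO and VS algorithms to estimate the MFR parameters. Once the grammar parameters are obtained, the Viterbi algorithm is used to estimate the MFR state. Theoretical analysis and simulation show that, compared with the IO and VS algorithms, the improved algorithms maintain estimation accuracy while effectively reducing computational complexity and running time, confirming that the Earley algorithm can speed up grammar probability learning.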

7.
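Applied Research on Part-of-Speech Tagging Based on the Hidden Markov Model (HMM)   Cited by: 3 (self-citations: 0, by others: 3)
A hidden Markov model (HMM) is used for part-of-speech tagging of English text. The paper first describes an improvement to the Viterbi algorithm and the steps for training the tagger with the HMM-based method. A series of comparative experiments then yields two conclusions: the bigram model offers a more satisfactory cost-performance ratio than the trigram model, and the size of the tag set affects tagging accuracy. Finally, closed and open tests are carried out on the basis of these conclusions.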
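For readers unfamiliar with the decoding step, the following sketch shows textbook Viterbi decoding for a bigram (first-order) HMM tagger. It is the standard algorithm, not the improved variant the abstract refers to, and the tag set, probabilities, and sentence are toy values invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence under a first-order HMM."""

    def log(p):
        # Log space avoids underflow; zero probability maps to -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at time t.
    best = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor that maximises the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (all numbers are made up for the example).
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.8, "NOUN": 0.2},
    "VERB": {"DET": 0.6, "NOUN": 0.4},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.7, "barks": 0.3},
    "VERB": {"barks": 0.9, "dog": 0.1},
}
tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
```

With this toy model, `tags` is `["DET", "NOUN", "VERB"]`. A trigram tagger would condition each transition on the two preceding tags, which enlarges the state space and is one reason the bigram model can be the more economical choice.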

8.
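Based on the discretization principles of five modeling rules (the parallel sweep rule, the rotational sweep rule, the box rule, the irregular-solid rule, and the curved-surface-solid rule), this paper proposes a method for generating the topological structure of sweeping solids. It discusses the geometric attribute codes of the boundaries of discretized polyhedra and a CSG-index method, as well as an exact B-Rep inverse-computation strategy based on these geometric attribute codes and the CSG index.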

9.
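Most existing retrieval models overlook the fact that the proximity of matched query terms within a document, and passage retrieval as a basis for scoring, can both be used to improve document scoring. Motivated by this, a Chinese information retrieval system based on a positional language model is proposed. First, the concept of a position propagation count is defined and a separate language model is built for each position. Next, the KL-divergence retrieval model is introduced and combined with the positional language model to score each position individually. Finally, a multi-parameter scoring strategy produces the final document score. The experiments also compare the retrieval effectiveness of two Chinese indexing methods, word-based (dictionary-based) and bigram-based, within the positional language model. Results on the standard NTCIR5 and NTCIR6 test collections show that the method significantly improves Chinese retrieval performance under both indexing methods and outperforms the vector space model, the BM25 probabilistic model, and the statistical language model.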
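The per-position scoring idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's system: the Gaussian kernel, Jelinek-Mercer smoothing, the parameters `sigma` and `lam`, the background model `collection_prob`, and taking the best position as the document score are all choices made here for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def propagated_counts(doc_tokens, position, sigma=2.0):
    """Virtual term counts at `position`: an occurrence at index j contributes
    a Gaussian-kernel weight that decays with the distance |position - j|."""
    counts = Counter()
    for j, term in enumerate(doc_tokens):
        counts[term] += math.exp(-((position - j) ** 2) / (2 * sigma ** 2))
    return counts


def position_score(query_tokens, doc_tokens, position, collection_prob,
                   lam=0.5, sigma=2.0):
    """Negative KL divergence between the query model and the smoothed
    language model at `position` (higher means a better match)."""
    counts = propagated_counts(doc_tokens, position, sigma)
    total = sum(counts.values())
    query_model = Counter(query_tokens)
    score = 0.0
    for term, qf in query_model.items():
        p_q = qf / len(query_tokens)
        # Jelinek-Mercer smoothing against a background collection model.
        p_pos = (1 - lam) * counts[term] / total + lam * collection_prob.get(term, 1e-6)
        score -= p_q * math.log(p_q / p_pos)
    return score


def document_score(query_tokens, doc_tokens, collection_prob, **kwargs):
    """Score a document by its best-matching position (one simple strategy)."""
    return max(
        position_score(query_tokens, doc_tokens, i, collection_prob, **kwargs)
        for i in range(len(doc_tokens))
    )
```

Because nearby occurrences contribute more to the virtual counts, a position surrounded by query terms receives a higher score than one far from them, which is how term proximity enters the ranking. The same functions work whether `doc_tokens` holds segmented words or character bigrams.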

10.
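Text clustering is an important research branch of clustering and is widely applied in text processing. After describing the clustering-feature-tree and dynamic-index-tree text clustering methods, the paper changes the merge threshold in the original dynamic-index-tree method from a single linear dependency to one that depends on the radius of the cluster node. Experiments show that the improved algorithm clearly improves both clustering precision and clustering time.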

11.
The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically, which is referred to as transliteration. In this research we propose a generic transliteration framework, which incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved the traditional statistical-based transliteration in three areas: (1) incorporated a simple phonetic transliteration knowledge base; (2) incorporated a bigram and a trigram HMM; (3) incorporated a Web mining model that uses word frequency of occurrence information from the Web. We evaluated the framework on an English–Arabic back transliteration. Experiments showed that when using HMM alone, a combination of the bigram and trigram HMM approach performed the best for English–Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.

12.
Text retrieval systems require an index to allow efficient retrieval of documents at the cost of some storage overhead. This paper proposes a novel full-text indexing model for Chinese text retrieval based on the concept of the adjacency matrix of a directed graph. Using this indexing model, on one hand, retrieval systems need to keep only the indexing data, instead of the indexing data and the original text data as traditional retrieval systems always do. On the other hand, occurrences of an index term are identified by labels of the so-called s-strings where the index term appears, rather than by its positions as in traditional indexing models. Consequently, system space cost as a whole can be reduced drastically while retrieval efficiency remains satisfactory. Experiments over several real-world Chinese text collections are carried out to demonstrate the effectiveness and efficiency of this model. In addition to Chinese, the proposed indexing model is also effective and efficient for text retrieval of other Oriental languages, such as Japanese and Korean. It is especially useful for digital library application areas where storage resources are very limited (e.g., e-books and CD-based text retrieval systems).

13.
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles adding of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence about the feasibility of the query-driven indexing strategy for large scale P2P text retrieval.
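Property (2) in the abstract above, posting lists truncated to a bounded number of top-ranked entries, can be illustrated with a small single-machine sketch. The class name `TruncatedIndex`, the `max_postings` parameter, and the scores are hypothetical; the selection of frequently queried term combinations and the P2P distribution described in the abstract are deliberately left out.

```python
import heapq
from collections import defaultdict


class TruncatedIndex:
    """Maps a term combination to at most `max_postings` top-ranked documents."""

    def __init__(self, max_postings=3):
        self.max_postings = max_postings
        # Key: frozenset of terms. Value: min-heap of (score, doc_id) pairs.
        self._postings = defaultdict(list)

    def add(self, terms, doc_id, score):
        heap = self._postings[frozenset(terms)]
        if len(heap) < self.max_postings:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            # Evict the lowest-ranked entry so the list never exceeds the bound.
            heapq.heapreplace(heap, (score, doc_id))

    def lookup(self, terms):
        heap = self._postings.get(frozenset(terms), [])
        # Return document ids from highest to lowest score.
        return [doc_id for _, doc_id in sorted(heap, reverse=True)]


index = TruncatedIndex(max_postings=2)
index.add(["p2p", "retrieval"], "d1", 0.2)
index.add(["p2p", "retrieval"], "d2", 0.9)
index.add(["p2p", "retrieval"], "d3", 0.5)
top_docs = index.lookup(["retrieval", "p2p"])
```

Here `top_docs` is `["d2", "d3"]`: the lowest-scoring document `d1` was evicted when the bound of two was reached. Keying on a `frozenset` makes the lookup independent of term order, and the min-heap keeps each insertion logarithmic in the bound.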

14.
A Search Engine Ranking Method Incorporating Hyperlink Analysis   Cited by: 5 (self-citations: 0, by others: 5)
吴明礼  施水才 《计算机工程》2004,30(15):143-145
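To improve search engine retrieval performance, the article designs a combined ranking method for search engines. It uses an improved Boolean retrieval model, Chinese word segmentation, hyperlink analysis, and indexing of link text, and has the following main features: the improvement to the classical Boolean retrieval model means that document relevance is no longer strictly 0 or 1; hyperlink analysis computes the quality of each Web document from the link structure of the Internet; and Chinese word segmentation together with link-text indexing captures the information content of a Web document more accurately. Combining the three exploits their respective strengths and compensates for their weaknesses.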

15.
Efficient and robust information retrieval from large image databases is an essential functionality for the reuse, manipulation, and editing of multimedia documents. Structural feature indexing is a potential approach to efficient shape retrieval from large image databases, but the indexing is sensitive to noise, scales of observation, and local shape deformations. It has now been confirmed that efficiency of classification and robustness against noise and local shape transformations can be improved by the feature indexing approach incorporating shape feature generation techniques (Nishida, Comput. Vision Image Understanding 73 (1) (1999) 121-136). In this paper, based on this approach, an efficient, robust method is presented for retrieval of model shapes that have parts similar to the query shape presented to the image database. The effectiveness is confirmed by experimental trials with a large database of boundary contours obtained from real images, and is validated by systematically designed experiments with a large number of synthetic data.

16.
In this paper, we present a novel approach to image indexing by incorporating a neural network model, Kohonen’s Self-Organising Map (SOM), for content-based image retrieval. The motivation stems from the idea of finding images by regarding users’ specifications or requirements imposed on the query, which has been ignored in most existing image retrieval systems. An important and unique aspect of our interactive scheme is to allow the user to select a Region-Of-Interest (ROI) from the sample image, and subsequent query concentrates on matching the regional colour features to find images containing similar regions as indicated by the user. The SOM algorithm is capable of adaptively partitioning each image into several homogeneous regions for representing and indexing the image. This is achieved by unsupervised clustering and classification of pixel-level features, called Local Neighbourhood Histograms (LNH), without a priori knowledge about the data distribution in the feature space. The indexes generated from the resultant prototypes of SOM learning demonstrate fairly good performance over an experimental image database, and therefore suggest the effectiveness and significant potential of our proposed indexing and retrieval strategy for application to content-based image retrieval. Received: 4 June 1998; received in revised form: 7 January 1999; accepted: 7 January 1999.

17.
Semantic Computation for Text Retrieval   Cited by: 14 (self-citations: 1, by others: 14)
赵军  金千里  徐波 《计算机学报》2005,28(12):2068-2078
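With the development of the information society, and of the Internet in particular, demands on text retrieval keep rising. As an improvement on traditional keyword matching, intelligent retrieval has become a research focus and will be one of the core technologies supporting the next-generation Internet. Applying semantic computation to text retrieval is an important direction for intelligent retrieval. This paper introduces semantic computation into two key techniques of text retrieval, indexing and similarity computation, using shallow semantics to guide the retrieval process and improve retrieval accuracy. For indexing, a semantic tree model is proposed. For similarity computation, based on the concept of a semantic tensor and combined with several natural language processing techniques, three computable window models are proposed to approximate the core idea of the semantic tensor. This work realizes semantic computation to a certain extent. Evaluation on TREC data sets shows that semantic computation improves text retrieval accuracy by about 10%.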

18.
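Research on Dynamic Index Structures for Multi-Dimensional Vectors   Cited by: 4 (self-citations: 0, by others: 4)
Indexing of multi-dimensional vectors is one of the key technologies in multimedia database systems. This work focuses on dynamic index structures based on the vector space model, in order to support fast content-based retrieval of image objects in image database systems. Building on an analysis of R-Tree and R*-Tree, the ER-Tree dynamic index structure is proposed. The index tree partitions the multi-dimensional vector space with hyperspheres, which facilitates nearest-neighbor computation; it adopts the reinsertion technique of R*-Tree to strengthen the tree's ability to represent the overall characteristics of the data set and thus improve retrieval efficiency; and it introduces the concepts of insertion safe points and deletion safe points, which effectively improves tree-construction efficiency. A feature-vector insertion algorithm based on this structure is also given. Experimental results show that the tree construction of the proposed index structure ...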

19.
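This paper presents a new application of information extraction and automatic classification, analyzes the shortcomings of traditional classification methods, and describes an improved text classification scheme based on latent semantic indexing. The technique is a new type of retrieval model: through singular value decomposition, it strengthens or weakens the semantic influence of words in documents, making the semantic relationships between documents clearer. Noisy data with weak semantic association can then be removed easily, which improves the precision of feature extraction and the final classification accuracy.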
