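This work builds a Chinese-English bilingual corpus platform for Traditional Chinese Medicine (TCM) and introduces its main functions. The platform takes classical TCM literature as its raw corpus. Corpus processing and database ingestion are implemented with a corpus-matching program and an intelligent self-growing dictionary; corpus retrieval and statistical analysis are implemented with dynamic B-tree indexing, which also reduces retrieval time.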

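Corpus-Based and Statistics-Oriented Natural Language Processing Techniques   Total citations: 15 (self-citations: 1, citations by others: 14)
1 Introduction. Corpus linguistics is a branch of computational linguistics that only came to prominence in the 1980s. It studies the collection, storage, retrieval, statistical analysis, and grammatical tagging of machine-readable natural-language text, its syntactic and semantic analysis, and the application of corpora with these capabilities to quantitative language analysis, dictionary compilation, literary style…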

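This paper proposes an algorithm, based on a pseudo-relevance feedback model, for automatically generating a domain dictionary. Dictionary generation is treated as retrieval of domain terms: the top-ranked strings from the initial retrieval are assumed to be domain-relevant and are added to the dictionary, retrieval is then repeated, and the process iterates until the dictionary reaches a preset size. Experiments show that after several iterations the generated dictionary is more accurate than those produced by existing domain-dictionary generation algorithms.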
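
The iterative loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the retrieval model, so the co-occurrence scoring, the whitespace tokenization, and the names `build_domain_dictionary`, `seed_terms`, `target_size`, and `top_k` are all assumptions made for the example.

```python
from collections import Counter


def build_domain_dictionary(documents, seed_terms, target_size, top_k=2):
    """Grow a term set by feeding top-ranked candidates back into retrieval.

    Assumption: a candidate's score is the sum, over the documents containing
    it, of how many current dictionary terms each of those documents holds.
    """
    dictionary = set(seed_terms)
    while len(dictionary) < target_size:
        scores = Counter()
        for doc in documents:
            tokens = doc.split()  # assumption: whitespace-tokenized text
            overlap = sum(1 for t in tokens if t in dictionary)
            if overlap == 0:
                continue  # document is not "retrieved" by the dictionary
            for t in set(tokens):
                if t not in dictionary:
                    scores[t] += overlap
        if not scores:
            break  # nothing new can be retrieved; stop early
        # Rank by score (descending), then alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        room = target_size - len(dictionary)
        for term, _ in ranked[:min(top_k, room)]:
            dictionary.add(term)  # treat top results as domain-relevant
    return dictionary


docs = ["shale gas reservoir", "gas reservoir pressure", "football match score"]
print(sorted(build_domain_dictionary(docs, {"gas"}, target_size=3)))
```

Starting from a single seed, the loop admits only terms that co-occur with the current dictionary, so vocabulary from the unrelated document never enters.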

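To accelerate shale gas exploration and development in Chongqing, this paper discusses building a shale gas resource database and information retrieval platform. It establishes the technical approach for building them and makes a preliminary determination of the database contents and of the platform's structure, functional requirements, and retrieval model, providing a basis for the platform's construction.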

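To meet practical demand for online Tibetan-Chinese-English dictionaries, the Key Laboratory of Tibetan Information Processing (jointly established by the province and the Ministry of Education) at Qinghai Normal University designed and implemented an online Tibetan-Chinese-English translation dictionary on the WAMP platform, and presents the detailed design and key code for the dictionary database and query pages. In testing, a user can enter a single character or a word and quickly retrieve the corresponding translations across the three languages. The dictionary uses a B/S (browser/server) architecture and supports communication and learning across Tibetan, Chinese, and English.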

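Water-conservancy information resources are numerous in type and content, highly specialized, and widely scattered, which makes them hard to search. Addressing the specific needs of the water-conservancy domain, the paper proposes Water-Searcher, a cloud-platform-based vertical search engine intended to give practitioners timely, comprehensive, and systematic access to the domain's information resources. The work comprises building a list of water-conservancy seed sites, constructing a domain dictionary and a domain stop-word dictionary, selecting core water-conservancy websites, and implementing distributed search on an existing cloud platform. According to the experimental analysis and an expert-review mechanism, Water-Searcher provides water-conservancy practitioners with better specialized retrieval.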

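Ma Lidong. Software (《软件》), 2011, (10): 8-11, 15
This paper studies methods for correcting English spelling errors that suit dictionary compilation work. Following VBA syntax, VB code operates on the programmable objects of Microsoft Word to semi-automate computer-assisted correction of English spelling errors. The focus is on batch automatic extraction and annotation of spelling errors and their suggested corrections. Programmatic control of the user dictionary lowers the false-positive rate and resolves, among other problems, false alarms caused by English spelling variants.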

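The bag of visual words (BoVW) method is the mainstream approach in image retrieval. However, traditional BoVW suffers from heavy computation, weakly discriminative dictionaries, and poor robustness to interference, and it adapts poorly to big-data settings. To address these problems, this paper proposes an image retrieval method based on visual dictionary optimization and query expansion. First, SIFT features are clustered with a density-based clustering method to generate the visual dictionary, improving the efficiency and quality of dictionary generation. Next, a chi-square model analyzes the correlation between visual words and image targets, and visual words carrying no target information are removed, strengthening the dictionary's discriminative power. Finally, a graph-based query expansion method re-ranks the initial retrieval results. Experiments on the Oxford5K and Paris6K image sets show that the new method improves the quality and semantic discriminative power of the visual dictionary to some extent and outperforms current mainstream methods.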

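Research on Chinese Full-Text Retrieval Algorithms   Total citations: 3 (self-citations: 0, citations by others: 3)
I. Overview of full-text retrieval systems. 1. Functions a full-text retrieval system should provide. A full-text retrieval system needs at least two functions: (1) any meaningful word or character in an article can be searched; (2) positional and logical operations can be applied to the relationships between search terms. In addition, response time should be on the order of a second or less. 2. Basic techniques of Chinese full-text retrieval. The basic techniques of the Chinese full-text retrieval systems developed so far fall into three types: (1) Subject-term indexing. A subject-term index is built; using a subject-term dictionary, combinations of adjacent free words obtained by segmenting the query are matched against the dictionary to produce the results. (2) Word indexing. The source documents are segmented and words extracted; the full set of words obtained by segmentation serves as the index terms, from which an index file is built. At retrieval time…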
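
The two required functions named in this abstract, searching on any indexed word and applying logical and positional operations between search terms, can be illustrated with a small positional inverted index. This is only a sketch under stated assumptions: whitespace splitting stands in for the Chinese word segmentation the abstract refers to, and the function names `build_index`, `search_and`, and `search_adjacent` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def build_index(documents):
    """Map each token to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(documents):
        # Assumption: tokens are whitespace-separated; a real Chinese system
        # would run a word segmenter or index single characters instead.
        for pos, token in enumerate(text.split()):
            index[token][doc_id].append(pos)
    return index


def search_and(index, a, b):
    """Logical operation: documents containing both terms."""
    return sorted(set(index.get(a, {})) & set(index.get(b, {})))


def search_adjacent(index, a, b):
    """Positional operation: documents where b immediately follows a."""
    hits = []
    for doc_id in search_and(index, a, b):
        positions_b = set(index[b][doc_id])
        if any(p + 1 in positions_b for p in index[a][doc_id]):
            hits.append(doc_id)
    return hits


docs = ["full text retrieval system", "retrieval of full documents text"]
idx = build_index(docs)
print(search_and(idx, "full", "text"))       # both documents contain both terms
print(search_adjacent(idx, "full", "text"))  # only the first has them adjacent
```

Storing positions alongside document identifiers is what makes the adjacency query possible; an index holding document identifiers alone would support only the logical operations.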

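Advances in electronic technology have fundamentally changed the medium of dictionaries and the ways they are consulted and read. Electronic dictionaries have advantages that traditional printed dictionaries cannot match: varied retrieval methods, convenient query windows, flexible display interfaces, and continuous cross-reference functions all reflect the intelligent, user-friendly character of this knowledge medium. The use of electronic multimedia makes otherwise monotonous, rigid dictionaries vivid, and makes acquiring knowledge from them easy and fast.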

The Accademia della Crusca is involved in historical, philological and lexicographical research into the Italian language. First, we provide some background information on the Accademia. Second, we discuss the problems of selectivity and inertia of nineteenth century lexicography, and define modern day requirements. We then consider the contribution of the computer to modern lexicography, and some computer-based dictionary projects, including those undertaken specifically at the Accademia. Giovanni Nencioni is the President of the Accademia della Crusca (Florence, Italy), quondam professor of Italian linguistics in the Scuola Normale Superiore, Pisa. His main research interests are the history of the literary Italian language and lexicography. He has two major publications: Tra grammatica e retorica, Torino: Einaudi Paperbacks, 1983; and Saggi di lingua antica e moderna, Torino: Rosenberg e Sellier, 1989.

This paper introduces database lexicography, a metadata analysis discipline that applies lexical graph theory to data design. Database lexicography proposes a formal design criterion for data dependencies, and it provides metrics to evaluate the conformance of designs to this criterion. It treats the data dictionary as a first class object encoding design concepts, and its benefits include identification of database dependency architecture; quantification of interdependent data elements' sensitivity to change; categorization of core and peripheral data elements; model integration; and figures of merit by which to fortify data architectures to withstand design fossilization and guide their evolution amidst changing requirements.

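Corpus linguistics discovers and mines linguistic phenomena with the help of large-scale corpora, and many online corpora already support linguistic research. This paper provides a corpus managed in time slices and, on that basis, proposes a community-maintained online dictionary compilation system that dynamically integrates corpus query results into the entry being edited. It also introduces an algorithm for discovering the senses of polysemous words and clustering them hierarchically, used to generate a default entry framework automatically. The paper outlines the overall dictionary compilation system, focusing on its design and usage.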

General-purpose database management systems, whose structure is built in, are not an appropriate solution to situations where problems of translation or areas of research cannot be bounded in advance, for example, when lexicography and linguistic research are closely related. Consequently, an original system has been developed, and is being applied to linguistic and lexicographical data on the Somali language. Jacqueline Lecarme has a master's degree in Lettres Classiques (University of Grenoble, 1969) and a Ph.D. in linguistics (University of Montreal, 1978). Carole Maury has a master's degree and a doctorate in computer science (University of Nice, 1986).

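Translation equivalent pairs are widely used in dictionary compilation, machine translation, and cross-language information retrieval. This article extracts translation equivalent pairs from the translation equivalence trees of bilingual sentence pairs. The automatically acquired pairs are evaluated using features such as the literal-translation rate of the translation, phrase alignment probability, and the length difference between target-language and source-language phrases. An evaluation method based on a multiple linear regression model is proposed and combined with an N-best strategy to filter candidate pairs. Experiments show that, in open tests, the regression-based evaluation and filtering method outperforms other methods.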

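Translation Pair Extraction from Unaligned Chinese-English Bilingual Corpora   Total citations: 5 (self-citations: 2, citations by others: 3)
Wang Bin. Journal of Chinese Information Processing (《中文信息学报》), 2000, 14(6): 40-44, 57
This paper studies translation pair extraction from unaligned Chinese-English bilingual corpora. It first introduces two algorithms designed by Pascale Fung for this task. It then partially improves the second algorithm to make it better suited to extracting translation pairs from real bilingual text. Results from the implementation show that the improved algorithm is effective. The method can be applied to phrase translation extraction from large-scale bilingual corpora, dictionary compilation, and similar tasks, and has considerable practical value.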

The article is devoted to the problem of word sense induction. We propose a method for inducing senses from a raw text corpus. The proposed sense induction algorithm (called SenseSearcher, or SnS) is based on closed frequent sets, and as a result, it provides a multilevel sense representation. To a large extent, it is a knowledge‐poor approach, as it does not need any kind of structured knowledge base about senses and there is no deep language knowledge embedded. By discovering a hierarchy of senses, the algorithm enables identifying subsenses (fine‐grained senses). SnS discovers not only frequent (dominating) senses but also infrequent ones (dominated). The method was evaluated in two main areas: lexicography and information retrieval. With the use of the SnS algorithm, we provide a tool able to induce from a textual corpus a structure of senses, with a varying number of granularity levels. In the area of information retrieval, SnS can be used for clustering search results according to the discovered senses. The experiments have shown that SnS performs better than the methods participating in the SemEval2013 WSI Task 11 competition, and most of the known search result clustering methods.

We present a new method to describe the contextual meaning of a key word in a corpus. The vocabulary of the sentences containing this word is compared to that of the entire corpus in order to highlight the words which are significantly overutilized in the neighbourhood of this key word (they are associated in the author’s mind) and the ones which are significantly underutilized (they are mutually exclusive). This method provides an interesting tool for lexicography and literary studies as is shown by applying it to the word amour (love) in the work of Pierre Corneille, the most famous French playwright of the 17th century.
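
The comparison described in this abstract, the vocabulary of sentences containing the key word against the whole corpus, can be sketched roughly as below. The abstract does not state which significance test the authors use, so the normal-approximation z-score here is an assumption chosen for brevity, as are the whitespace tokenization and the function name `specificity_scores`.

```python
import math
from collections import Counter


def specificity_scores(sentences, key_word):
    """Score each word by how over- or under-used it is near key_word.

    Assumption: under the null hypothesis a word's occurrences fall into the
    key-word sentences with probability equal to those sentences' share of
    all tokens (binomial model, normal approximation).
    """
    corpus = Counter()
    context = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        corpus.update(tokens)
        if key_word in tokens:
            context.update(tokens)
    total = sum(corpus.values())
    share = sum(context.values()) / total if total else 0.0
    scores = {}
    for word, freq in corpus.items():
        if word == key_word:
            continue
        variance = freq * share * (1 - share)
        if variance == 0:
            continue  # key word absent from, or present in, every sentence
        expected = freq * share
        scores[word] = (context[word] - expected) / math.sqrt(variance)
    return scores


toy = ["amour et gloire", "amour et honneur",
       "la guerre cruelle", "la guerre longue"]
scores = specificity_scores(toy, "amour")
print(scores["et"] > 0, scores["guerre"] < 0)
```

Positive scores mark words used more often than expected in the key word's neighbourhood and negative scores mark words used less often; on a corpus this small the magnitudes carry no statistical weight and serve only to show the direction of the comparison.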

Dictionaries and related language reference works constitute a rich but under-exploited resource for the history of languages and of language study in the Middle Ages. Unfortunately, the size and complexity of typical medieval dictionaries make editions and analyses by traditional methods prohibitively expensive in time and money. Using as an example the Latin-Middle English dictionary Medulla grammatice, the paper describes some central problems in the study of medieval English lexicography and the solutions provided by computers, which, with their immense speed, profound memory, and perfect accuracy can help scholars analyze, edit, and promulgate medieval documents and the linguistic data they contain.

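Knowledge representation and prediction for Chinese out-of-vocabulary words, covering part of speech, word-formation structure, word sense, and related items, is a fundamental problem in computational linguistics. Following the "parallel and pervasive" principle (平行周遍), this paper extracts "parallel conditions" from existing knowledge of semantic word formation and adaptively matches the potential word-formation factors of an out-of-vocabulary word against them, producing a relatively complete prediction of its knowledge representation. The method combines a new linguistic theory with the applied problem of understanding out-of-vocabulary words and achieves notable results; its explanatory power, convenience, and granularity surpass those of earlier methods. Beyond its practical value for natural language processing, the work may also advance humanities fields such as dictionary compilation, language research, and language teaching.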
