20 similar documents found.
1.
2.
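Corpus-Based and Statistics-Oriented Natural Language Processing Techniques  Total citations: 15, self-citations: 1, citations by others: 14
1 Introduction. Corpus linguistics is a new branch of computational linguistics that only came to prominence in the 1980s. It studies the collection, storage, retrieval, statistical analysis, grammatical tagging, and syntactic and semantic analysis of machine-readable natural language text, as well as the use of corpora with these capabilities in quantitative language analysis, lexicography, stylistic analysis of works …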
3.
4.
5.
6.
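Water resources information comes in many types and covers a wide range of content; it is highly specialized and widely scattered, which makes it hard to retrieve. Responding to the specific needs of the water resources domain, this paper proposes a cloud-based vertical search engine for the field, Water-Searcher, intended to give practitioners a platform for obtaining timely, comprehensive, and systematic access to the domain's information resources. The work includes building a list of water resources seed sites, constructing a domain dictionary and a domain stop-word dictionary, selecting the core water resources websites, and implementing distributed search on an existing cloud platform. According to the experimental analysis and an expert review mechanism, Water-Searcher can provide water resources practitioners with a better specialized retrieval service.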
7.
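This paper studies a method for correcting English spelling errors that suits the characteristics of dictionary compilation. Following VBA syntax, VB code is written to manipulate the programmable objects of Microsoft Word, making computer-assisted correction of English spelling errors semi-automatic. The focus is on automatic batch extraction and annotation of spelling errors and their suggested corrections. Programmatic control of the user dictionary lowers the false-alarm rate of error checking and resolves problems such as false alarms caused by English spelling variants.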
8.
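The bag of visual words (BoVW) approach is the mainstream method in image retrieval. However, traditional BoVW methods suffer from heavy computation, weakly discriminative dictionaries, and poor robustness to interference, which makes them ill-suited to big-data environments. To address these problems, this paper proposes an image retrieval method based on visual dictionary optimization and query expansion. First, a density-based clustering method clusters SIFT features to generate the visual dictionary, improving the efficiency and quality of dictionary generation. Next, a chi-square model analyzes the correlation between visual words and image targets, and visual words that carry no target information are removed, which strengthens the discriminative power of the dictionary. Finally, a graph-based query expansion method re-ranks the initial retrieval results. Experiments on the Oxford5K and Paris6K image sets show that the new method improves, to some extent, the quality and semantic discriminative power of the visual dictionary and outperforms current mainstream methods.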
9.
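Research on Chinese Full-Text Retrieval Algorithms  Total citations: 3, self-citations: 0, citations by others: 3
I. Overview of full-text retrieval systems. 1. Functions a full-text retrieval system should provide. A full-text retrieval system must offer at least two functions: (1) any meaningful word or character in a document can be searched; (2) positional and logical operations can be applied to the relationships between search terms. In addition, the response time of a full-text search should be on the order of seconds or less. 2. Basic techniques of full-text retrieval. The Chinese full-text retrieval systems developed so far rely on basic techniques that fall into three types: (1) Subject-term indexing. A subject-term index is built. Using a subject thesaurus, combinations of adjacent free words obtained by segmenting the search query are matched against the thesaurus to produce the search results. (2) Word indexing. The source documents are segmented and words are extracted; the full set of words obtained by segmentation serves as the index terms from which the index file is built. At retrieval time …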
10.
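Advances in electronic technology have fundamentally changed the medium of dictionaries and the ways they are consulted and read. Electronic dictionaries have advantages that traditional printed dictionaries cannot match: varied search methods, convenient query windows, flexible display interfaces, and continuous cross-referencing all reflect the intelligent and user-friendly character of this knowledge medium. The use of electronic multimedia in dictionaries further makes the once monotonous and rigid dictionary vivid, so that acquiring knowledge from it becomes easy and quick.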
11.
Giovanni Nencioni 《Computers and the Humanities》1990,24(5-6):345-352
The Accademia della Crusca is involved in historical, philological and lexicographical research into the Italian language. First, we provide some background information on the Accademia. Second, we discuss the problems of selectivity and inertia of nineteenth century lexicography, and define modern day requirements. We then consider the contribution of the computer to modern lexicography, and some computer-based dictionary projects, including those undertaken specifically at the Accademia.
Giovanni Nencioni is the President of the Accademia della Crusca (Florence, Italy), quondam professor of Italian linguistics in the Scuola Normale Superiore, Pisa. His main research interests are the history of the literary Italian language and lexicography. He has two major publications: Tra grammatica e retorica, Torino: Einaudi Paperbacks, 1983; and Saggi di lingua antica e moderna, Torino: Rosenberg e Sellier, 1989.
12.
Gary 《Data & Knowledge Engineering》2002,42(3):293-314
This paper introduces database lexicography, a metadata analysis discipline that applies lexical graph theory to data design. Database lexicography proposes a formal design criterion for data dependencies, and it provides metrics to evaluate the conformance of designs to this criterion. It treats the data dictionary as a first class object encoding design concepts, and its benefits include identification of database dependency architecture; quantification of interdependent data elements' sensitivity to change; categorization of core and peripheral data elements; model integration; and figures of merit by which to fortify data architectures to withstand design fossilization and guide their evolution amidst changing requirements.
13.
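Corpus linguistics is the discipline that uses large-scale corpora to discover and mine linguistic phenomena, and many online corpora already support linguistic research. This paper presents a corpus managed in time slices and, building on it, proposes a community-maintained online dictionary compilation system that dynamically integrates corpus query results into the entries being edited. The paper also introduces an algorithm for discovering the senses of polysemous words and clustering them hierarchically, which is used to generate a default entry skeleton automatically. The paper outlines the dictionary compilation system as a whole and focuses on its design and usage.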
14.
General-purpose database management systems, whose structure is built in, are not an appropriate solution to situations where problems of translation or areas of research cannot be bounded in advance, for example, when lexicography and linguistic research are closely related. Consequently, an original system has been developed, and is being applied to linguistic and lexicographical data on the Somali language.
Jacqueline Lecarme has a master's degree in Lettres Classiques (University of Grenoble, 1969) and a Ph.D. in linguistics (University of Montreal, 1978). Carole Maury has a master's degree and a doctorate in computer science (University of Nice, 1986).
15.
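Translation equivalent pairs are widely used in lexicography, machine translation, and cross-language information retrieval. This paper extracts translation equivalent pairs from the translation equivalence trees of bilingual sentence pairs. The automatically acquired pairs are evaluated with features such as the literal translation rate of the target text, the phrase alignment probability, and the length difference between target-language and source-language phrases. An evaluation method based on a multiple linear regression model is proposed and combined with an N-best strategy to filter candidate translation equivalent pairs. Experimental results show that, in open tests, evaluation and filtering based on the multiple linear regression model outperforms the other methods.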
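The scoring-and-filtering idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the feature order (literal translation rate, phrase alignment probability, length difference), and the weights, which in practice would be fitted by regression on labeled pairs rather than set by hand. It is not the authors' implementation.

```python
def linear_score(features, weights, bias=0.0):
    """Score one candidate pair with a linear model: bias + sum(w_i * x_i)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def n_best_filter(candidates, weights, bias=0.0, n=1):
    """Keep the n highest-scoring target phrases for each source phrase.

    candidates: iterable of (source, target, features) tuples.
    The weights are hypothetical; a real system would fit them on labeled data.
    """
    by_source = {}
    for source, target, features in candidates:
        score = linear_score(features, weights, bias)
        by_source.setdefault(source, []).append((score, target))
    return {
        source: [t for _, t in sorted(scored, key=lambda p: p[0], reverse=True)[:n]]
        for source, scored in by_source.items()
    }


# Hypothetical candidates: (literal rate, alignment probability, length difference)
candidates = [
    ("词典", "dictionary", (0.9, 0.8, 0.0)),
    ("词典", "word book", (0.5, 0.4, 1.0)),
]
best = n_best_filter(candidates, weights=(1.0, 1.0, -0.5), n=1)
```

With these illustrative weights, a larger length difference lowers the score, so the candidate whose length is closer to the source phrase is preferred when the other features are similar.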
16.
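Translation Pair Extraction from Unaligned Chinese-English Bilingual Corpora  Total citations: 5, self-citations: 2, citations by others: 3
This paper studies the extraction of translation pairs from unaligned Chinese-English bilingual corpora. It first introduces two algorithms designed by Pascale Fung for this task. On that basis, it partially improves the second algorithm to make it better suited to extracting translation pairs from real bilingual text. Implementation results demonstrate the effectiveness of the improved algorithm. The method can be applied to tasks such as phrase translation extraction from large-scale bilingual corpora and dictionary compilation, and it has considerable practical value.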
17.
The article is devoted to the problem of word sense induction. We propose a method for inducing senses from a raw text corpus. The proposed sense induction algorithm (called SenseSearcher, or SnS) is based on closed frequent sets, and as a result, it provides a multilevel sense representation. To a large extent, it is a knowledge‐poor approach, as it does not need any kind of structured knowledge base about senses and there is no deep language knowledge embedded. By discovering a hierarchy of senses, the algorithm enables identifying subsenses (fine‐grained senses). SnS discovers not only frequent (dominating) senses but also infrequent ones (dominated). The method was evaluated in two main areas: lexicography and information retrieval. With the use of the SnS algorithm, we provide a tool able to induce from a textual corpus a structure of senses, with a varying number of granularity levels. In the area of information retrieval, SnS can be used for clustering search results according to the discovered senses. The experiments have shown that SnS performs better than the methods participating in the SemEval2013 WSI Task 11 competition, and most of the known search result clustering methods.
18.
We present a new method to describe the contextual meaning of a key word in a corpus. The vocabulary of the sentences containing this word is compared to that of the entire corpus in order to highlight the words which are significantly overutilized in the neighbourhood of this key word (they are associated in the author’s mind) and the ones which are significantly underutilized (they are mutually exclusive). This method provides an interesting tool for lexicography and literary studies as is shown by applying it to the word amour (love) in the work of Pierre Corneille, the most famous French playwright of the 17th century.
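The comparison this abstract describes can be sketched roughly as follows. The abstract does not say which significance test is used, so the z-score under a normal approximation to the binomial, the threshold of 2.0, and the function name are all assumptions of this sketch rather than the authors' procedure.

```python
import math
from collections import Counter


def keyword_associations(sentences, keyword, threshold=2.0):
    """Split the vocabulary into words over- and underused near `keyword`.

    sentences: list of token lists. A word's count in sentences containing
    the keyword is compared with the count expected from its corpus-wide
    frequency. The z-score test and the threshold are assumptions.
    """
    corpus = Counter(tok for sent in sentences for tok in sent)
    context = Counter(tok for sent in sentences if keyword in sent for tok in sent)
    n_corpus = sum(corpus.values())
    n_context = sum(context.values())
    over, under = {}, {}
    if n_context == 0 or n_context == n_corpus:
        return over, under  # keyword absent, or present in every sentence
    for word, total in corpus.items():
        if word == keyword:
            continue
        p = total / n_corpus                      # corpus-wide relative frequency
        expected = n_context * p                  # expected count near the keyword
        sd = math.sqrt(n_context * p * (1 - p))   # binomial standard deviation
        if sd == 0:
            continue
        z = (context[word] - expected) / sd
        if z >= threshold:
            over[word] = z
        elif z <= -threshold:
            under[word] = z
    return over, under


# Toy corpus: "heart" always appears with "love", "war" never does.
toy = [["love", "heart"]] * 10 + [["war", "sword"]] * 10
overused, underused = keyword_associations(toy, "love")
```

On the toy corpus, "heart" lands in the overused set and "war" and "sword" in the underused set. On a real corpus an exact test may be preferable for rare words, where the normal approximation is weak.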
19.
Jeffrey F. Huntsman 《Language Resources and Evaluation》1978,12(1-2):53-60
Dictionaries and related language reference works constitute a rich but under-exploited resource for the history of languages and of language study in the Middle Ages. Unfortunately, the size and complexity of typical medieval dictionaries make editions and analyses by traditional methods prohibitively expensive in time and money. Using as an example the Latin-Middle English dictionary Medulla grammatice, the paper describes some central problems in the study of medieval English lexicography and the solutions provided by computers, which, with their immense speed, profound memory, and perfect accuracy, can help scholars analyze, edit, and promulgate medieval documents and the linguistic data they contain.
20.