Similar literature
1.
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine Translation (MT) system. Linguistic information on Philippine languages is available, but so far the focus has been on theoretical linguistics, and little has been done on the computational aspects of these languages. We therefore report attempts at the manual construction of language resources such as the grammar, lexicon, morphological information, and corpora, which were built from almost non-existent digital forms. Because manual construction is inherently difficult, we also discuss our experiments with various technologies for automatically extracting these resources; the technologies are designed to handle the intricacies of the Filipino language and are intended for use in the MT system. To implement the different MT engines and to improve translation quality, other language tools (such as a morphological analyzer and generator, and a part-of-speech tagger) were developed.

2.
This paper describes a system of morphological and syntactic parsing of the Hebrew language. It contains an extensive morphological analyzer and an augmented transition network-based syntactic parser. The system has been written in the YLISP dialect of Lisp. A parallel effort for English (different grammars that use the same parsing software) has also been developed.

3.
This paper reviews the current state of the art in Natural Language Processing for Hebrew, both theoretical and practical. The Hebrew language, like other Semitic languages, poses special challenges for developers of programs for natural language processing: the writing system, rich morphology, unique word formation process of roots and patterns, and lack of linguistic corpora that document language usage all contribute to making computational approaches to Hebrew challenging. The paper briefly reviews the field of computational linguistics and the problems it addresses, describes the special difficulties inherent to Hebrew (as well as to other Semitic languages), surveys a wide variety of past and ongoing works, and attempts to characterize future needs and possible solutions.

4.
A novel Italian Sign Language MultiWordNet (LMWN), which integrates the MultiWordNet (MWN) lexical database with the Italian Sign Language (LIS), is presented in this paper. The approach relies on LIS lexical resources which support and help to search for Italian lemmas in the database and display corresponding LIS signs. The lexical frequency analysis of the lexicon and some newly created signs approved by expert LIS signers are also discussed. The larger MWN database helps to enrich the variety and comprehensiveness of the lexicon. We also describe the approach which links the Italian lemmas and LIS signs to extract and display bilingual information from the collected lexicon and the semantic relationships of LIS signs with MWN. Users can view the meanings of almost one fourth of the lemmas of MWN in LIS.

5.
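Semi-automatic domain ontology construction based on WordNet and natural language processing techniques   Total citations: 3, self-citations: 0, citations by others: 3
Most existing ontologies are built by hand, yet ontology construction is a very time-consuming and labor-intensive process, and research on semi-automatic construction of domain ontologies has grown in recent years. This paper proposes a semi-automatic method for building domain ontologies based on WordNet and natural language processing techniques. The method greatly improves the efficiency of ontology construction and, to some extent, ensures the quality of the resulting ontology. Experiments show that the method automates the ontology generation process to a certain degree.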

6.
We are delighted to bring you this special issue on speech and language processing for assistive technology. It addresses an important research area that is gaining increased recognition from researchers in speech and language processing as a rich and fulfilling area on which to focus their work, and from researchers in assistive technology as the means to dramatically improve communication technologies for individuals with disabilities. This special issue brings a wide swath of approaches and applications highlighting the variety this area offers.

7.
This paper examines the technologies that enable the representation of Hebrew on websites. Hebrew is written from right to left and in non-Latin characters, issues shared by a number of languages which seem to be converging on a shared solution: Unicode. Regarding the case of Hebrew, I show how competing solutions have given way to one dominant technology. I link processes in the Israeli context with broader questions about the ‘multilingual Internet,’ asking whether the commonly accepted solution for representing non-Latin texts on computer screens is an instance of cultural imperialism and convergence around a western artifact. It is argued that while minority languages are given an online voice by Unicode, the context is still one of western power.
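As a rough illustration of the technical point in this abstract (it is not taken from the paper itself), the sketch below uses Python's standard `unicodedata` module to show how Unicode represents Hebrew: each letter has its own code point, text is stored in logical (reading) order, and the right-to-left behavior comes from each character's bidirectional class rather than from reversed storage. The sample word is an assumption chosen for the example.

```python
import unicodedata

# The Hebrew word "shalom", written with escapes so the source stays
# unambiguous in left-to-right editors. Characters are stored in logical
# (reading) order: SHIN, LAMED, VAV, FINAL MEM.
word = "\u05e9\u05dc\u05d5\u05dd"

for ch in word:
    # Print the code point, the official character name, and the bidi class.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

# Hebrew letters carry bidirectional class "R" (right-to-left);
# Latin letters carry class "L" (left-to-right).
bidi_classes = [unicodedata.bidirectional(ch) for ch in word]
latin_class = unicodedata.bidirectional("a")

# In UTF-8, each Hebrew letter in this block occupies two bytes.
utf8 = word.encode("utf-8")
```

A renderer uses these classes to display the characters from right to left, so a page needs no Hebrew-specific encoding or visually reversed text.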

8.
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources, with a view to developing a prototype web application to support the GlobalWordNet Grid initiative.
Claudia Soria

9.
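吕律 (Lü Lü), Computer Engineering (《计算机工程》), 2010, 36(7): 73-75
To address the low precision of ontology mapping, this paper proposes a method for verifying ontology mapping results based on natural language processing. The method applies heuristic processing to compound words, analyzes the parse trees of the glosses attached to WordNet entries, extracts the words related to the reference ontology and the target ontology, and uses them to verify existing ontology mapping results. Experimental results show that the method effectively improves the precision of ontology mapping.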

10.
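The traditional question-and-answer model of laboratory teaching cannot meet the needs of modern higher education. This paper describes how to make full use of the network and related software to teach students sound program-debugging methods interactively, and how standardizing and sharing electronic lab reports lets teaching and learning reinforce each other. Practice has shown that this approach yields good teaching results.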

11.
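The traditional question-and-answer model of laboratory teaching cannot meet the needs of modern higher education. This paper describes how to make full use of the network and related software to teach students sound program-debugging methods interactively, and how standardizing and sharing electronic lab reports lets teaching and learning reinforce each other. Practice has shown that this approach yields good teaching results.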

12.
Spoken language resources (SLRs) are essential for both research and application development. In this article we clarify the concept of SLR validation. We define validation and how it differs from evaluation. Further, relevant principles of SLR validation are outlined. We argue that the best way to validate SLRs is to implement validation throughout SLR production and have it carried out by an external and experienced institute. We address which tasks should be carried out by the validation institute, and which not. Further, we list the basic issues that validation criteria for SLR should address. A standard validation protocol is shown, illustrating how validation can prove its value throughout the production phase in terms of pre-validation, full validation and pre-release validation.
Henk van den Heuvel

13.
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting natural language processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that a consensual specification on monolingual, bilingual and multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of Lexical Markup Framework (LMF, ISO-24613) is to define a standard for lexicons that covers multilingual lexical data.
Claudia Soria

14.
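In recent years, with the rapid development of deep learning, pre-training techniques for natural language processing have made great progress. Early natural language processing long relied on word-embedding methods such as Word2Vec to encode text; these methods can be regarded as static pre-training techniques. However, such context-independent text representations bring only limited improvement to downstream natural language processing tasks and cannot handle polysemy. ELMo proposed a context-dependent text representation that handles polysemous words effectively. Pre-trained language models such as GPT and BERT were then proposed in succession; BERT in particular achieved significant improvements on many typical downstream tasks, greatly advancing the field and ushering in the era of dynamic pre-training techniques. Since then, a large number of pre-trained language models, including BERT-based improved models and XLNet, have emerged, and pre-training has become an indispensable mainstream technique in natural language processing. This paper first outlines pre-training techniques and their history and describes in detail the classic pre-training techniques in natural language processing, including early static techniques and classic dynamic techniques. It then briefly reviews a series of newer, instructive pre-training techniques, including BERT-based improved models and XLNet. On this basis, it analyzes the problems facing current research on pre-training techniques. Finally, it looks ahead to future development trends.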

15.
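This article discusses an intuitive object query language for an object-oriented database management system. This graph-based query language supports the definition and specification of diagrammatic query languages, so it has both the expressive power of a text-based structured query language and the intuitiveness of a graphical query language.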

16.
We briefly discuss the origin and development of WordNet, a large lexical database for English. We outline its design and contents as well as its usefulness for Natural Language Processing. Finally, we discuss crosslinguistic WordNets and complementary lexical resources.
Christiane Fellbaum

17.
We present a method for combining two bilingual dictionaries to make a third, using one language as a pivot. In this case we combine a Japanese-English dictionary with a Malay-English dictionary, to produce a Japanese-Malay dictionary. Our method differs from previous methods in its improved matching through normalization of the pivot language. We have made a prototype dictionary of around 76,000 Japanese-Malay pairs for 50,000 Japanese head words.
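The pivot idea in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' method: the `normalize` function is a hypothetical stand-in for their pivot-language normalization (here it just lowercases and drops a leading "to" or article), and the tiny dictionaries are invented sample data.

```python
from collections import defaultdict

def normalize(gloss: str) -> str:
    """Hypothetical normalization of an English pivot gloss:
    lowercase it and drop a leading 'to' or article."""
    words = gloss.lower().split()
    if words and words[0] in {"to", "a", "an", "the"}:
        words = words[1:]
    return " ".join(words)

def pivot_merge(src_to_pivot, tgt_to_pivot):
    """Link source and target words whose normalized pivot glosses match."""
    # Index target-language words by their normalized pivot gloss.
    pivot_index = defaultdict(set)
    for tgt_word, glosses in tgt_to_pivot.items():
        for gloss in glosses:
            pivot_index[normalize(gloss)].add(tgt_word)

    merged = {}
    for src_word, glosses in src_to_pivot.items():
        matches = set()
        for gloss in glosses:
            matches |= pivot_index.get(normalize(gloss), set())
        if matches:  # keep only source words with at least one match
            merged[src_word] = sorted(matches)
    return merged

# Invented sample entries (assumptions for illustration only).
ja_en = {"食べる": ["to eat"], "本": ["a book"], "犬": ["dog"]}
ms_en = {"makan": ["eat"], "buku": ["book"], "kucing": ["cat"]}

ja_ms = pivot_merge(ja_en, ms_en)
print(ja_ms)
```

Without normalization, "to eat" and "eat" would not match as plain strings, which is why matching on a normalized pivot gloss can recover pairs that exact matching misses.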

18.
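As Internet penetration among Indonesian and Malay speakers rises, there is a major need for information processing of massive amounts of Indonesian and Malay text. Although researchers have studied Indonesian and Malay fairly extensively, as low-resource languages they have received far less attention than widely used languages, and research on them has not made good use of cutting-edge deep learning methods. This paper surveys natural language processing techniques for Indonesian and Malay, including lexical analysis, syntactic parsing, machine translation, and spell checking. A comparative analysis of the related work shows that, because corpus sizes and evaluation standards differ, most studies make it difficult to compare algorithms objectively. Finally, considering how openly available existing Indonesian and Malay language resources are in each area, the paper identifies the problems facing natural language processing research on these languages and looks ahead to future trends.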

19.
Automatic identification of handprinted Hebrew characters is described in this paper. The recognition model devised constitutes a multi-stage system. In the first stage a coarse classifier allocates the input patterns into one of 17 categories, based on the number and the location of end points within predetermined regions in the character matrix. The second stage uses features extracted in the Hough transform space to classify characters assigned to each of 16 categories. The one remaining category, composed of similar, square-like (rotated L shape) classes, is recognized by structural analysis and a statistical classifier. An additional postprocessing step is added to compensate for the sensitivity of the Hough transform to the existence of similar classes within some of the categories. Experiments were conducted with a multi-author (40 writers) database. An average recognition rate of 86.9% was observed for the system. This compared favorably with the results of two other recognition methods.

20.