Similar Documents
20 similar documents found (search time: 31 ms)
1.
Interlingua and transfer-based approaches to machine translation have long been in use in competing and complementary ways. The former proves economical in situations where translation among multiple languages is involved, and can be used as a knowledge-representation scheme. But given a particular interlingua, its adoption depends on its ability (a) to capture the knowledge in texts precisely and accurately and (b) to handle cross-language divergences. This paper studies the language divergence between English and Hindi and its implications for machine translation between these languages using the Universal Networking Language (UNL). UNL was introduced by the United Nations University, Tokyo, to facilitate the transfer and exchange of information over the Internet. The representation works at the level of single sentences and defines a semantic-net-like structure in which nodes are word concepts and arcs are semantic relations between these concepts. The language divergences between Hindi, an Indo-European language, and English can be considered representative of the divergences between the SOV and SVO classes of languages. The work presented here is, to our knowledge, the only one that describes language divergence phenomena in the framework of computational linguistics through a South Asian language.
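To illustrate the sentence-level semantic-net structure described above, here is a minimal Python sketch. The `agt` (agent) and `obj` (object) relation labels follow common UNL conventions; the Universal Word constraint strings (e.g. `icl>do`) are illustrative, not taken from the paper.

```python
# A UNL-style semantic net for "John eats rice": nodes are word concepts
# ("Universal Words"), arcs are labeled binary semantic relations.
class UNLGraph:
    def __init__(self):
        self.arcs = []  # (relation, source_concept, target_concept)

    def add(self, relation, source, target):
        self.arcs.append((relation, source, target))

    def neighbors(self, concept):
        """Concepts directly related to `concept`, with the relation label."""
        return [(rel, tgt) for rel, src, tgt in self.arcs if src == concept]

g = UNLGraph()
g.add("agt", "eat(icl>do)", "John(iof>person)")   # agent of eating
g.add("obj", "eat(icl>do)", "rice(icl>food)")     # object of eating

print(g.neighbors("eat(icl>do)"))
```

Because the representation is a labeled graph over concepts rather than a string in any one language, the same structure can be deconverted into different target languages.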

2.
There has been a growing interest in the use of networked virtual environment (NVE) technology to implement telepresence that allows participants to interact with each other in shared cyberspace. In addition, nonverbal language has attracted increased attention because of its association with more natural human communication, and sign languages in particular play an important role for the hearing impaired. This paper proposes a novel real-time nonverbal communication system by introducing an artificial intelligence method into the NVE. We extract semantic information as an interlingua from the input text through natural language processing, and then transmit this semantic feature extraction (SFE) to the three-dimensional (3-D) articulated humanoid models prepared for each client in remote locations. Once the SFE is received, the virtual human is animated by the synthesized SFE. Experiments with Japanese and Chinese sign languages show that this system makes real-time animation of avatars available to participants while they chat with each other. The communication is more natural because it is not based merely on text or predefined gesture icons. The proposed system is also suitable for sign language distance training.

3.
The key to an interlingua-based translation system is the design of the interlingua itself. This paper describes the interlingua design of the Chinese-English translation system ICENT. From a practical standpoint, it balances the proportion of syntactic and semantic knowledge in the interlingua's content and, based on the characteristics of Chinese and English, adopts a frame structure that describes the structure of an entire sentence through the grammatical relations between its words. Fairly satisfactory results have been obtained.

4.
DeConverter is the core software in a Universal Networking Language (UNL) system. A UNL system has EnConverter and DeConverter as its two major components: EnConverter converts a natural language sentence into an equivalent UNL expression, and DeConverter generates a natural language sentence from an input UNL expression. This paper presents the design and development of a Punjabi DeConverter. It describes the five phases of the proposed Punjabi DeConverter: UNL parser, lexeme selection, morphology generation, function word insertion, and syntactic linearization. The paper illustrates all these phases with a special focus on the syntactic linearization issues of the Punjabi DeConverter. Syntactic linearization is the process of defining the arrangement of words in the generated output. Algorithms and pseudocode are discussed for the syntactic linearization of a simple UNL graph, a UNL graph with scope nodes, and a node having un-traversed or multiple parents in a UNL graph. Special cases of syntactic linearization with respect to Punjabi for UNL relations such as 'and', 'or', 'fmt', 'cnt', and 'seq' are also presented. The paper also provides implementation results of the proposed Punjabi DeConverter. The DeConverter has been tested on 1000 UNL expressions, using a Spanish UNL language server and agricultural-domain threads developed by the Indian Institute of Technology (IIT) Bombay, India, as gold standards. The proposed system generates 89.0% grammatically correct sentences and 92.0% sentences faithful to the originals, with a fluency score of 3.61 and an adequacy score of 3.70 on a 4-point scale. The system also achieves a bilingual evaluation understudy (BLEU) score of 0.72.
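The core of syntactic linearization, as described above, is turning an unordered graph into a word sequence. A minimal sketch of the idea, under the assumption of a plain depth-first traversal that emits dependents before their head to approximate a head-final (SOV-like) order such as Punjabi's; this is not the paper's actual algorithm, and the relation-based child ordering here is illustrative:

```python
# Linearize a UNL graph (head -> [(relation, child), ...]) by DFS,
# emitting each head after its dependents (head-final order).
def linearize(graph, node, visited=None):
    if visited is None:
        visited = set()
    if node in visited:        # guard for nodes with multiple parents
        return []
    visited.add(node)
    words = []
    for rel, child in sorted(graph.get(node, [])):  # deterministic child order
        words += linearize(graph, child, visited)
    words.append(node)         # head comes after its dependents
    return words

graph = {"eat": [("agt", "boy"), ("obj", "rice")]}
print(linearize(graph, "eat"))  # → ['boy', 'rice', 'eat']
```

The `visited` set corresponds to the un-traversed/multiple-parent cases the paper handles explicitly; scope nodes would additionally require linearizing a subgraph as a unit.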

5.
Discourse parsing has become an inevitable task to process information in the natural language processing arena. Parsing complex discourse structures beyond the sentence level is a significant challenge. This article proposes a discourse parser that constructs rhetorical structure (RS) trees to identify such complex discourse structures. Unlike previous parsers that construct RS trees using lexical features, syntactic features and cue phrases, the proposed discourse parser constructs RS trees using high‐level semantic features inherited from the Universal Networking Language (UNL). The UNL also adds a language‐independent quality to the parser, because the UNL represents texts in a language‐independent manner. The parser uses a naive Bayes probabilistic classifier to label discourse relations. It has been tested using 500 Tamil‐language documents and the Rhetorical Structure Theory Discourse Treebank, which comprises 21 English‐language documents. The performance of the naive Bayes classifier has been compared with that of the support vector machine (SVM) classifier, which has been used in the earlier approaches to build a discourse parser. It is seen that the naive Bayes probabilistic classifier is better suited for discourse relation labeling when compared with the SVM classifier, in terms of training time, testing time, and accuracy.
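The discourse-relation labeling step above can be sketched as a multinomial naive Bayes classifier over bag-of-feature inputs. The features and relation labels below are toy stand-ins, not the UNL-derived features used by the paper:

```python
# Multinomial naive Bayes with Laplace smoothing for relation labeling.
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_list, label) pairs."""
    label_counts = Counter(lbl for _, lbl in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, lbl in examples:
        feat_counts[lbl].update(feats)
        vocab.update(feats)
    return label_counts, feat_counts, vocab, len(examples)

def predict_nb(model, feats):
    label_counts, feat_counts, vocab, n = model
    best, best_lp = None, -math.inf
    for lbl, c in label_counts.items():
        lp = math.log(c / n)                       # log prior
        total = sum(feat_counts[lbl].values())
        for f in feats:                            # smoothed log likelihoods
            lp += math.log((feat_counts[lbl][f] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = lbl, lp
    return best

data = [(["because", "agt"], "Cause"), (["but", "obj"], "Contrast"),
        (["since", "agt"], "Cause"), (["however", "obj"], "Contrast")]
model = train_nb(data)
print(predict_nb(model, ["because"]))  # → 'Cause'
```

The closed-form counting in `train_nb` is also why naive Bayes trains faster than an SVM, one of the advantages the article reports.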

6.
We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language translation between English and Mandarin Chinese for limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and the translation performance is highly affected by the performance of other components, such as the recognizer and semantic parser. All components in the proposed system are statistically trained using an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that does not require hand-crafted grammars or rules. Furthermore, the parser provides semantic information that helps further re-scoring of the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for an interlingua-based translation. A maximum-entropy-based sentence-level natural language generation (NLG) approach is used to generate sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer. Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model and is trained from a semantically annotated corpus; the output of the semantic parser is used to select a more specific language model to refine the speech recognition performance; the NLG component uses a statistical model and is also trained from the same annotated corpus. These features give MARS the advantages of robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results.

7.
Using English to retrieve software   Cited by: 2 (self-citations: 0; by others: 2)

8.
9.
Natural language requirements specifications form the basis for the subsequent phase of the information system development process, namely the development of conceptual schemata. Neither the textual nor the conceptual representations are really appropriate for being thoroughly captured and validated by the 'requirement holders', i.e., the end users. Therefore, in our approach the textual specifications are first linguistically analyzed and translated into a so-called conceptual predesign schema. That schema is formulated using an interlingua based on a lean semantic model, allowing users to participate more efficiently in the design and validation process. After validation, the predesign schema is mapped to a conceptual representation (e.g., UML). The sequence of these translation and transformation steps is described by the "NIBA workflow". This paper focuses on the information supporting a step-by-step mapping of natural language requirements specifications to a conceptual model, and on how that information is gained. In particular, we present a four-level interpretation of tagging output.

10.
Concepts and relations in ontologies and in other knowledge organisation systems are usually annotated with natural language labels. Most ontology matchers rely on such labels in element-level matching techniques. State-of-the-art approaches, however, tend to make implicit assumptions about the language used in labels (usually English) and are either domain-agnostic or are built for a specific domain. When faced with labels in different languages, most approaches resort to general-purpose machine translation services to reduce the problem to monolingual English-only matching. We investigate a thoroughly different and highly extensible solution based on semantic matching where labels are parsed by multilingual natural language processing and then matched using language-independent and domain aware background knowledge acting as an interlingua. The method is implemented in NuSM, the language and domain aware evolution of the SMATCH semantic matcher, and is evaluated against a translation-based approach. We also design and evaluate a fusion matcher that combines the outputs of the two techniques in order to boost precision or recall beyond the results produced by either technique alone.
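The fusion idea in the last sentence can be sketched simply: treat each matcher's output as a set of correspondences, then intersect to favor precision or unite to favor recall. The correspondences below are toy examples, not NuSM output:

```python
# Fuse two matchers' correspondence sets.
def fuse(matches_a, matches_b, mode="precision"):
    if mode == "precision":
        return matches_a & matches_b   # keep only agreements (fewer false positives)
    return matches_a | matches_b       # accept either matcher (fewer misses)

a = {("car", "automobile"), ("dog", "cat")}
b = {("car", "automobile"), ("tree", "plant")}
print(sorted(fuse(a, b)))              # → [('car', 'automobile')]
print(len(fuse(a, b, mode="recall")))  # → 3
```

A real fusion matcher would typically also weight correspondences by confidence rather than making a hard set-level choice.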

11.
This paper proposes ontology and linguistic techniques to develop a fully automatic annotation technique which, coupled with an automatic ontology construction method, could play a key role in the development of semantic portals. An ontology-supported portal architecture, OntoPortal, is proposed based on this technique, in which three internal components (Portal Interface, Semantic Portal, and OntoCrawler) are integrated to rapidly and precisely collect information on the Internet, capture the user's true intention, and accordingly provide high-quality answers that meet user requests. The paper also demonstrates an OntoPortal prototype that defines how a semantic portal interacts with the user through five types of interaction patterns: keyword search, synonym search, POS (part-of-speech)-constrained keyword search, natural language query, and semantic index search. Preliminary experimental results show that the proposed technology raises the precision and recall of webpage searching, and that it can indeed retrieve better semantically directed information to meet user requests.

12.
Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications; however, they usually require considerable human effort to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.

13.
Word embeddings play an important role in natural language processing and have attracted increasing attention from researchers in recent years. However, traditional embedding-learning methods usually rely on large unannotated text corpora and ignore the semantic information of words, such as the semantic relations between them. To make full use of existing domain knowledge bases, which contain rich word-level semantic information, this paper proposes a word-embedding learning method that incorporates semantic information (KbEMF). The method adds a domain-knowledge constraint term to a matrix-factorization embedding model, so that word pairs with strong semantic relations obtain similar embeddings. Results on word analogy reasoning and word similarity measurement tasks over real data show that KbEMF clearly outperforms existing models.
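The idea of adding a knowledge constraint to matrix-factorization embeddings can be sketched as follows. The objective (squared reconstruction error plus a penalty pulling KB-related word vectors together) and the gradient-descent updates are illustrative, not the paper's exact formulation:

```python
# Factorize a word-context matrix M ≈ W Cᵀ with a knowledge-base penalty
# λ·Σ‖w_i − w_j‖² over related pairs (i, j), by plain gradient descent.
import numpy as np

def kb_matrix_factorize(M, related_pairs, dim=2, lam=0.5, lr=0.05, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    n_words, n_ctx = M.shape
    W = rng.normal(scale=0.1, size=(n_words, dim))   # word embeddings
    C = rng.normal(scale=0.1, size=(n_ctx, dim))     # context embeddings
    for _ in range(epochs):
        E = W @ C.T - M                  # reconstruction error
        gW = E @ C
        gC = E.T @ W
        for i, j in related_pairs:       # knowledge term: pull the pair together
            gW[i] += lam * (W[i] - W[j])
            gW[j] += lam * (W[j] - W[i])
        W -= lr * gW
        C -= lr * gC
    return W, C

M = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
W, _ = kb_matrix_factorize(M, related_pairs=[(0, 1)])
# words 0 and 1 (same contexts, KB-related) should end up closer than 0 and 2
print(np.linalg.norm(W[0] - W[1]) < np.linalg.norm(W[0] - W[2]))
```

In the paper the constraint comes from a domain knowledge base; here `related_pairs` is supplied by hand for illustration.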

14.
Software modeling based on the assembly of reusable components to support software development has not been successfully implemented on a wide scale. Several models for reusable software components have been suggested which primarily address the wiring-level connectivity problem. While this is considered necessary, it is not sufficient to support an automated process of component assembly. Two critical issues that remain unresolved are (1) semantic modeling of components, and (2) a deployment process that supports automated assembly. The first issue can be addressed through domain-based standardization that would make it possible for independent developers to produce interoperable components based on a common vocabulary and understanding of the problem domain. This is important not only for providing a semantic basis for developing components but also for the interoperability between systems. The second issue is important for two reasons: (a) to eliminate the need for developers to be involved in the final assembly of software components, and (b) to provide a basis for the development process to be potentially driven by the user. To resolve issues (1) and (2), a late binding mechanism between components based on meta-protocols is required. In this paper we address these issues by proposing a generic framework for the development of software components and an interconnection language, COMPILE, for the specification of software systems from components. The computational model of the COMPILE language is based on late and dynamic binding of the components' control, data, and function properties [1] through the use of adapters. The use of asynchronous callbacks for method invocation allows control binding among components to be late and dynamic. Data exchanged between components is defined through a meta-language that can describe the semantics of the information without being bound to any specific programming language's type representation. Late binding to functions is accomplished by maintaining domain-based semantics as component meta-information. This information allows clients of components to map a generic requested service to specific functions.
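The late, dynamic binding described above can be sketched with a minimal adapter: a client requests a generic service name, the adapter resolves it to a concrete component function at call time, and the result is delivered through a callback. The `Adapter` class and service names are hypothetical, not COMPILE's actual API; the callback is invoked synchronously here for simplicity, where COMPILE uses asynchronous callbacks:

```python
# Late/dynamic control binding via an adapter and callback-style invocation.
class Adapter:
    def __init__(self):
        self.registry = {}                # generic service name -> concrete function

    def bind(self, service, func):
        self.registry[service] = func     # bindings may change at runtime (late binding)

    def invoke(self, service, arg, callback):
        # Resolution happens at call time, not at compile/link time.
        callback(self.registry[service](arg))

adapter = Adapter()
adapter.bind("format", str.upper)         # bind a concrete component function
results = []
adapter.invoke("format", "hello", results.append)
print(results)  # → ['HELLO']
```

Rebinding `"format"` to a different function between calls changes behavior without touching the client, which is the point of keeping the service-to-function mapping as meta-information.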

15.
Question answering (QA) over knowledge base (KB) aims to provide a structured answer from a knowledge base to a natural language question. In this task, a key step is how to represent and understand the natural language query. In this paper, we propose to use tree-structured neural networks constructed based on the constituency tree to model natural language queries. We identify an interesting observation in the constituency tree: different constituents have their own semantic characteristics and might be suitable to solve different subtasks in a QA system. Based on this point, we incorporate the type information as an auxiliary supervision signal to improve the QA performance. We call our approach type-aware QA. We jointly characterize both the answer and its answer type in a unified neural network model with the attention mechanism. Instead of simply using the root representation, we represent the query by combining the representations of different constituents using task-specific attention weights. Extensive experiments on public datasets have demonstrated the effectiveness of our proposed model. More specifically, the learned attention weights are quite useful in understanding the query. The produced representations for intermediate nodes can be used for analyzing the effectiveness of components in a QA system.
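The query-representation step above (combining constituent vectors with attention weights rather than using only the root vector) can be sketched as a softmax-weighted sum. The vectors and scores below are toy values, not learned parameters:

```python
# Combine constituent representations with softmax attention weights.
import numpy as np

def attend_combine(constituent_vecs, scores):
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ constituent_vecs         # attention-weighted sum

vecs = np.array([[1.0, 0.0],            # e.g., an NP constituent
                 [0.0, 1.0],            # e.g., a VP constituent
                 [1.0, 1.0]])           # e.g., the root
scores = np.array([2.0, 0.0, 1.0])      # task-specific attention scores
rep = attend_combine(vecs, scores)
print(rep.shape)  # → (2,)
```

In the paper the scores are produced by the network and supervised jointly with the answer type; inspecting `w` is what makes the learned weights interpretable.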

16.
Technology in the field of digital media generates huge amounts of nontextual information, audio, video, and images, along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while ensuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are those containing user-specified keywords. But many documents convey desired semantic information without containing these keywords. This limitation is frequently addressed through query expansion mechanisms based on the statistical co-occurrence of terms. Recall is increased, but at the expense of deteriorating precision. One can overcome this problem by indexing documents according to context and meaning rather than keywords, although this requires a method of converting words to meanings and the creation of a meaning-based index structure. We have solved the problem of an index structure through the design and implementation of a concept-based model using domain-dependent ontologies. An ontology is a collection of concepts and their interrelationships that provide an abstract view of an application domain. With regard to converting words to meaning, the key issue is to identify appropriate concepts that both describe and identify documents as well as language employed in user requests. This paper describes an automatic mechanism for selecting these concepts. An important novelty is a scalable disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to associate with documents and participate in query generation. We also propose an automatic query expansion mechanism that deals with user requests expressed in natural language.
This mechanism generates database queries with appropriate and relevant expansion through knowledge encoded in ontology form. Focusing on audio data, we have constructed a demonstration prototype. We have experimentally and analytically shown that our model, compared to keyword search, achieves a significantly higher degree of precision and recall. The techniques employed can be applied to the problem of information selection in all media types. Received: 7 October 2002 / Accepted: 20 May 2003 / Published online: 30 September 2003. Edited by: E. Lochovsky. This research has been funded [or funded in part] by the Integrated Media Systems Center, a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152.
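The ontology-driven expansion idea can be sketched as follows: map a user keyword to a concept, then expand the query with terms the ontology links to that concept, so recall grows without the precision loss of purely statistical co-occurrence expansion. The toy ontology below is illustrative, not the paper's audio-domain ontology:

```python
# Expand a keyword into ontology-related terms for query generation.
toy_ontology = {
    "percussion": {"synonyms": ["drums"], "narrower": ["snare", "timpani"]},
    "strings":    {"synonyms": [], "narrower": ["violin", "cello"]},
}

def expand_query(keyword, ontology):
    """Return the keyword plus concept-related terms; unknown keywords pass through."""
    entry = ontology.get(keyword)
    if entry is None:
        return [keyword]
    return [keyword] + entry["synonyms"] + entry["narrower"]

print(expand_query("percussion", toy_ontology))
# → ['percussion', 'drums', 'snare', 'timpani']
```

Because every added term is tied to the same concept, documents matched via expansion stay on-topic, which is how the approach preserves precision while raising recall.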

17.
Semantic role labeling (SRL) is a fundamental task in natural language processing to find a sentence-level semantic representation. The semantic role labeling procedure can be viewed as a process of competition between many order parameters, in which the strongest order parameter wins the competition and the desired pattern is recognized. To realize this integrative SRL, we use a synergetic neural network (SNN). Since the network parameters of the SNN directly influence its synergetic recognition performance, it is important to optimize them. In this paper, we propose an improved particle swarm optimization (PSO) algorithm based on a log-linear model and use it to effectively determine the network parameters. Our contributions are twofold: first, a log-linear model is introduced into the PSO algorithm, which can effectively exploit the advantages of a variety of different knowledge sources and enhance the decision-making ability of the model. Second, we propose an improved SNN model based on the improved PSO and show its effectiveness on the SRL task. The experimental results show that the proposed model achieves higher performance for semantic role labeling, with more powerful global exploration ability and faster convergence speed, and indicate that the proposed model has a promising future for other natural language processing tasks.
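For reference, here is a sketch of plain PSO, the base algorithm the paper improves with its log-linear model: each particle tracks its personal best and is pulled toward both it and the swarm's global best. The inertia/acceleration constants are common textbook defaults, and the toy sphere function stands in for the SNN-parameter objective:

```python
# Minimal particle swarm optimization minimizing a toy 2-D sphere function.
import random

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=1):
    rnd = random.Random(seed)
    xs = [[rnd.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                 # personal bests
    gbest = min(pbest, key=f)                  # global best
    for _ in range(iters):
        for i, x in enumerate(xs):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - x[d])   # cognitive pull
                            + c2 * r2 * (gbest[d] - x[d]))     # social pull
                x[d] += vs[i][d]
            if f(x) < f(pbest[i]):
                pbest[i] = x[:]
        gbest = min(pbest, key=f)
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
print(sphere(best) < 1e-2)
```

The paper's variant replaces parts of this update with a log-linear combination of knowledge sources; that extension is not reproduced here.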

18.
Chinese-English machine translation is a significant and challenging problem in information processing. This paper presents an interlingua-based Chinese-English natural language translation system (ICENT). It introduces the realization mechanism of Chinese language analysis, which comprises syntactic parsing and semantic analysis, and gives the design of the interlingua in detail. Experimental results and a system evaluation are given; the results are satisfactory.

19.
Converting natural language to structured query language (NL2SQL) is an important task in semantic parsing, whose core is the joint learning of the database schema and the natural language question. Existing work jointly encodes the entire database schema and the question into a heterogeneous graph, which introduces a large amount of useless information into the graph and ignores the differing importance of the information in the database schema. To improve the logical-form and execution accuracy of NL2SQL models, this paper proposes an NL2SQL model based on a self-pruned heterogeneous graph and a relative-position attention mechanism (SPRELA). It adopts a sequence-to-sequence framework with the ELECTRA pre-trained language model as the backbone network. Expert knowledge is introduced to build an initial heterogeneous graph over the database schema and the question; this graph is then self-pruned based on the question, and a multi-head relative-position attention mechanism encodes the pruned schema and question. A tree-structured decoder with a predefined SQL grammar then generates the SQL statement. Experimental results on the Spider dataset show that SPRELA achieves an execution accuracy of 71.1%, 1.1 percentage points higher than the RaSaP model of comparable parameter size, and can better align the database schema with the natural language question, thereby understanding the semantics of natural language queries.

20.
Semantic similarity computation maps the linguistic information between words to a value between 0 and 1. Knowledge-based approaches to semantic similarity use the information provided by a knowledge resource to establish a functional relation between word relations and semantic similarity; such methods are interpretable and simple to use, and have become an important family of semantic similarity methods. This paper proposes a semantic similarity model based on the Cilin thesaurus (《同义词词林》); the model uses a genetic algorithm to explore Cilin's semantic codes...
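The general idea behind thesaurus-code-based similarity can be sketched as follows: each word sense carries a hierarchical code, and similarity is scored from the length of the shared code prefix, scaled into [0, 1]. The codes and the uniform per-level weighting below are illustrative; the paper's model tunes such parameters with a genetic algorithm:

```python
# Similarity from the shared prefix of two hierarchical thesaurus codes.
def code_similarity(code_a, code_b):
    depth = max(len(code_a), len(code_b))
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared / depth if depth else 1.0

# Hypothetical 5-level codes: two near-synonyms vs. an unrelated word.
print(code_similarity("Aa01A", "Aa01B"))  # → 0.8 (differ only at the last level)
print(code_similarity("Aa01A", "Dk33C"))  # → 0.0 (differ at the top level)
```

A tuned model would typically weight the levels unequally (higher levels of the hierarchy mattering more), which is exactly the kind of parameter a genetic algorithm can search for.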
