Similar Documents
20 similar documents found (search time: 31 ms)
1.
This paper studies Semantic Network Language Generation (SNLG), which generates natural language from information represented as Semantic Networks (SN). After a brief analysis of the challenges faced by SNLG, a Semantic Network Serialization Grammar (SNSG) is proposed to generate natural language from semantic networks. The SNSG comprises four components: (a) a semantic pattern approach that serializes a trivial semantic star into a language stream; (b) a transformative generation procedure that serializes a trivial semantic tree by serializing its semantic stars recursively; (c) a trivialization procedure that converts any complicated semantic star or semantic tree into a composition of trivial semantic trees; and (d) a mechanism of semantic pattern priority and semantic pattern networks that guarantees that a sentence generated from a semantic tree is well formed. Based on the SNSG, a new content-planning approach for SNLG is proposed to improve content integrity. For discourse planning, a trivialization-time splitting method is presented to produce well-formed sentences, and a splitting-time aggregation method is proposed to improve sentence readability. Finally, a fully semanticized Semantic Wiki system, the Natural Wiki, is developed to verify and demonstrate the theory and techniques addressed in this paper.
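As a rough illustration of the serialization idea in components (a) and (b), and not the paper's SNSG formalism (the star representation, pattern keys, and templates are all assumptions of this sketch), a trivial semantic star can be rendered by looking up a sentence template keyed by its edge labels:

    # Hedged sketch: serializing a trivial "semantic star" with a pattern template.
    # The star layout and the templates are illustrative assumptions, not the SNSG.
    SEMANTIC_PATTERNS = {
        frozenset({"agent", "action", "object"}): "{agent} {action} {object}.",
        frozenset({"entity", "attribute"}): "{entity} is {attribute}.",
    }

    def serialize_star(star):
        """star: dict mapping the centre node's edge labels to neighbour strings."""
        template = SEMANTIC_PATTERNS.get(frozenset(star))
        if template is None:
            raise ValueError(f"no semantic pattern for edge labels {sorted(star)}")
        return template.format(**star)

    print(serialize_star({"agent": "The system", "action": "generates", "object": "a sentence"}))

Recursive serialization of a trivial semantic tree would apply serialize_star to sub-stars and substitute the results, which is roughly the role the abstract assigns to transformative generation.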

2.
By recording the analysis processes and results of a large number of correctly parsed instances and then, during syntactic parsing, searching for similar instances or fragments and matching similar linguistic structures and analysis processes, parsing embodies the idea that "language analysis depends on experience." Based on this idea, this paper proposes a pattern-matching approach to syntactic parsing: the syntactic patterns implicit in a large-scale annotated treebank are extracted to build a library of patterns, sub-patterns, and their reduction rules, and the parsing process is transformed into one of pattern matching and local pattern transformation. Experiments show that all parsing metrics are satisfactory and that processing is particularly efficient, with an average of 0.46 seconds per sentence (Intel dual-core 2.8 GHz CPU, 1 GB of memory).
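As a rough illustration of the "parsing as pattern matching" idea (the pattern format below, a POS-tag sequence mapped to a stored bracketing, is an assumption of this sketch and not the paper's pattern/sub-pattern library):

    # Hedged sketch of treebank-pattern lookup for parsing.
    from collections import defaultdict

    def extract_patterns(treebank):
        """treebank: list of (pos_tags, bracketing) pairs from annotated parses."""
        patterns = defaultdict(list)
        for pos_tags, bracketing in treebank:
            patterns[tuple(pos_tags)].append(bracketing)
        return patterns

    def parse_by_pattern(pos_tags, patterns):
        """Reuse the analysis of the most similar recorded instance, if any."""
        exact = patterns.get(tuple(pos_tags))
        if exact:
            return exact[0]                      # whole-pattern match
        # fall back to the longest matching prefix sub-pattern; a real system
        # would then transform the local remainder rather than give up
        for length in range(len(pos_tags) - 1, 0, -1):
            partial = patterns.get(tuple(pos_tags[:length]))
            if partial:
                return partial[0]
        return None

    patterns = extract_patterns([(["NN", "VV", "NN"], "(S (NP NN) (VP VV (NP NN)))")])
    print(parse_by_pattern(["NN", "VV", "NN"], patterns))

Because parsing reduces to table lookups over pre-extracted patterns, the per-sentence cost stays low, which is consistent with the efficiency the abstract reports.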

3.
Computer understanding and implementation of Chinese natural-language information queries   (cited by: 7; self-citations: 0; citations by others: 7)
刘忠  王成道 《计算机应用》2004,24(1):8-10,13
Based on the two-level semantic analysis structure of Chinese, namely the deep semantic structure (intention orientation) and the surface semantic structure (semantic orientation), this paper analyzes four types of Chinese interrogative sentences and the theoretical methods and rules for realizing their computer understanding. After correct Chinese word segmentation, ontological-speech and ontological-behavior annotations are established for each word according to its intention orientation and semantic orientation; words are then combined into phrases that conform to the intended meaning, ontological behaviors are converted into ontological speech, and the result is reduced to the semantics of a domain-specific database. Finally, the approach is verified with an experimental system.

4.
Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods by learning a hypothesis comprising a set of rules, given background knowledge and constraints on the search space. We focus on extending the XHAIL algorithm for ILP, which is based on Answer Set Programming, and we evaluate our extensions on the Natural Language Processing task of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, unlike other statistical methods, ILP does not require huge amounts of training examples, and it produces interpretable results, namely a set of rules that can be analysed and tweaked if necessary. As contributions, we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm that enables learning from larger datasets, (ii) better usage of modern solver technology through recently developed optimisation methods, and (iii) a time budget that permits the use of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow learning on bigger datasets, with results of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on the datasets to gain insights into the structure of each dataset.
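As a rough illustration of contribution (iii) only (the solver interface below is a stand-in, not the actual XHAIL or clingo API), a time budget can wrap an anytime hypothesis search and return the best suboptimal result found when time runs out:

    # Hedged sketch of a time-budget wrapper around an anytime hypothesis search.
    import time

    def search_with_budget(improve_hypothesis, initial, budget_seconds):
        """Return the best (possibly suboptimal) hypothesis found within the budget.

        improve_hypothesis: assumed callable taking the current hypothesis and
        returning (candidate, cost); initial is a (hypothesis, cost) pair.
        """
        best, best_cost = initial
        deadline = time.monotonic() + budget_seconds
        while time.monotonic() < deadline:
            candidate, cost = improve_hypothesis(best)
            if cost < best_cost:          # lower cost = preferred hypothesis
                best, best_cost = candidate, cost
            else:
                break                     # no further improvement found
        return best, best_cost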

5.

Natural language processing techniques have recently contributed more and more to analyzing legal documents, supporting the computer-assisted implementation of laws and rules. Previous approaches to representing a legal sentence were often based on logical patterns that capture the relations between concepts in the sentence, which usually consist of multiple words. Such representations lack semantic information at the word level. In our work, we aim to tackle this shortcoming by representing legal texts in the form of abstract meaning representation (AMR), a graph-based semantic representation that has recently gained a lot of popularity in the NLP community. We present our study of AMR parsing (producing AMR from natural language) and AMR-to-text generation (producing natural language from AMR) specifically for the legal domain. We also introduce JCivilCode, a human-annotated legal AMR dataset created and verified by a group of linguistic and legal experts. We conduct an empirical evaluation of various approaches to parsing and generating AMR on our own dataset and show the current challenges. Based on our observations, we propose a domain adaptation method applied in the training and decoding phases of a neural AMR-to-text generation model. Our method improves the quality of text generated from AMR graphs compared to the baseline model. (This work is extended from our two previous papers: "An Empirical Evaluation of AMR Parsing for Legal Documents", published in the Twelfth International Workshop on Juris-informatics (JURISIN) 2018; and "Legal Text Generation from Abstract Meaning Representation", published in the 32nd International Conference on Legal Knowledge and Information Systems (JURIX) 2019.)
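As a rough illustration of what an AMR looks like (the example is the standard one from the AMR literature, not a sentence from JCivilCode, and the concept frames follow common AMR practice), the sentence "The boy wants the girl to believe him" can be written in PENMAN notation and, equivalently, as variable/relation/value triples:

    # Hedged illustration of an AMR graph in PENMAN notation and as triples.
    penman = """
    (w / want-01
       :ARG0 (b / boy)
       :ARG1 (b2 / believe-01
                 :ARG0 (g / girl)
                 :ARG1 b))
    """

    triples = [
        ("w", "instance", "want-01"),
        ("b", "instance", "boy"),
        ("b2", "instance", "believe-01"),
        ("g", "instance", "girl"),
        ("w", "ARG0", "b"),
        ("w", "ARG1", "b2"),
        ("b2", "ARG0", "g"),
        ("b2", "ARG1", "b"),   # re-entrancy: the boy is both the wanter and the one believed
    ]

AMR parsing maps the sentence to such a graph; AMR-to-text generation maps the graph back to a sentence.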


6.
7.
Because Chinese GIS query sentences differ greatly from spatially extended SQL statements, direct conversion is very difficult, so an intermediate language is needed as a bridge. This paper studies the construction of an intermediate language for a Chinese GIS query system, proposes an intermediate language structured around a sentence stack, an entity stack, a query-target stack, a query-condition stack, and a sentence-pattern string, formulates grammar rules for spatial query sentences, and designs an algorithm for converting Chinese GIS query sentences into the intermediate language. Experiments show that the algorithm can convert most query sentences into the intermediate language.
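As a rough illustration of the intermediate-language record named above (the field names follow the abstract; the field contents, the example query, and the pattern string are assumptions of this sketch):

    # Hedged sketch of the intermediate language: sentence stack, entity stack,
    # query-target stack, query-condition stack, and a sentence-pattern string.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class IntermediateQuery:
        sentence_stack: List[str] = field(default_factory=list)   # clauses of the Chinese query
        entity_stack: List[str] = field(default_factory=list)     # spatial entities / layers
        target_stack: List[str] = field(default_factory=list)     # objects or attributes to return
        condition_stack: List[str] = field(default_factory=list)  # spatial / attribute predicates
        pattern: str = ""                                          # sentence-pattern string

    # e.g. a query like "查询距离学校500米以内的医院" (hospitals within 500 m of a school)
    iq = IntermediateQuery(
        sentence_stack=["查询距离学校500米以内的医院"],
        entity_stack=["医院", "学校"],
        target_stack=["医院"],
        condition_stack=["distance(医院, 学校) < 500"],
        pattern="QUERY-ENTITY-WITHIN-DISTANCE-OF-ENTITY",
    )

A later stage of the described algorithm would translate such a record into spatially extended SQL; that mapping is not sketched here.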

8.
An intelligent machine can be thought of as a human-friendly machine system that identifies or understands problems and then generates tasks, develops plans, and compiles and executes those tasks automatically. High-performance, dependable intelligent systems must understand and translate natural languages. The translation of natural languages for intelligent systems has been one of the most challenging problems in the field from the very beginning. It is the responsibility of a translation system to hand the machine the ability to generate tasks so that program generation can be automated.

In this paper, the problem of advanced machine translation capabilities is approached by examining the Sinhala natural language. Sinhalese had not previously been analyzed using computational linguistics; our earlier system for Sinhalese morphology was the first attempt at such a study. This paper extends it to syntactic and semantic analysis. We formalize grammar rules for units, phrases, clauses, and sentences, and we develop a Sinhalese dictionary annotated with semantic characteristics as well as a conceptual dictionary based on English, Japanese, and Sinhalese. The syntactic and semantic analyses are implemented on a computer, and sound experimental results are obtained.

9.
Symbolic connectionism in natural language disambiguation   (cited by: 1; self-citations: 0; citations by others: 1)
Natural language understanding involves the simultaneous consideration of a large number of different sources of information. Traditional methods employed in language analysis have focused on developing powerful formalisms to represent syntactic or semantic structures, along with rules for transforming language into these formalisms. However, they make use of only small subsets of knowledge. This article describes how to use the whole range of information through a neurosymbolic architecture, a hybrid of a symbolic network and subsymbolic vectors generated by a connectionist network. Besides initializing the symbolic network with prior knowledge, the subsymbolic vectors are used to enhance the system's capability in disambiguation and to provide flexibility in sentence understanding. The model captures a diversity of information, including word associations, syntactic restrictions, case-role expectations, semantic rules, and context. It attains highly interactive processing by representing knowledge in an associative network on which actual semantic inferences are performed. An integrated use of previously analyzed sentences in understanding is another important feature of our model. The model dynamically selects one hypothesis among multiple hypotheses. This notion is supported by three simulations, which show that the degree of disambiguation depends both on the amount of linguistic rules and on the semantic-associative information available to support the inference processes in natural language understanding. Unlike many similar systems, our hybrid system is more sophisticated in tackling language disambiguation problems, using linguistic clues from disparate sources as well as modeling context effects in sentence analysis. It is potentially more powerful than systems relying on a single processing paradigm.

10.
11.
12.
Compound sentences make up the majority of Chinese text, and identifying the relation category of a compound sentence, that is, discerning the semantic relation between its clauses, is key to analyzing its semantics. In compound sentences with non-saturated connectives, some connectives are omitted, so the relation category cannot be identified by rules over connective collocations, and manually analyzing clause features for classification is time-consuming and labor-intensive. Taking two-clause compound sentences with non-saturated connectives as the object of study, this paper adopts an FCNN model that fuses connective features into a convolutional neural network, minimizing the dependence on linguistic knowledge and rules and automatically learning the syntactic and semantic features between the two clauses in order to identify the relation category. The proposed method achieves 97% accuracy in identifying compound-sentence relation categories, and the experimental results demonstrate its effectiveness.
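As a rough illustration of fusing connective-word features into a text CNN (the layer sizes, the multi-hot connective feature, and the two-clause input encoding are assumptions of this sketch, not the paper's FCNN configuration):

    # Hedged sketch of a relation classifier that fuses connective features
    # into a convolutional encoding of the two clauses.
    import torch
    import torch.nn as nn

    class ConnectiveFusionCNN(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, n_filters=100,
                     n_connective_feats=20, n_classes=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
            self.classify = nn.Linear(n_filters + n_connective_feats, n_classes)

        def forward(self, token_ids, connective_feats):
            # token_ids: (batch, seq_len), the two clauses concatenated
            # connective_feats: (batch, n_connective_feats), multi-hot connectives
            x = self.embed(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
            x = torch.relu(self.conv(x)).max(dim=2).values   # max-over-time pooling
            x = torch.cat([x, connective_feats], dim=1)      # fuse connective features
            return self.classify(x)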

13.
A key problem in automatic summarization systems is finding the salient sentences that can form an abstract. There are many methods for finding such sentences, but few use machine learning. This paper proposes an automatic learning method for summary sentence patterns: using a set of lightly preprocessed sentences as the training sample set, bottom-up generalization learning is performed starting from the positive example sentences, abstracting general concepts of sentence patterns into a set of pattern rules that serve as an effective means of judging which sentences in a document can be used as summary sentences. This is the core component of the summarization system.
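As a rough illustration of bottom-up generalization from positive examples (the pattern representation, equal-length POS-tag sequences with "*" as a generalized slot, is an assumption of this sketch, not the paper's sentence-pattern language):

    # Hedged sketch: generalize positive summary sentences, given as POS-tag
    # sequences, into a small set of sentence-pattern rules.
    def generalize(p, q):
        """Least general pattern covering two equal-length tag sequences."""
        if len(p) != len(q):
            return None
        return ["*" if a != b else a for a, b in zip(p, q)]

    def learn_patterns(positive_examples):
        patterns = [list(e) for e in positive_examples]
        merged = True
        while merged:
            merged = False
            for i in range(len(patterns)):
                for j in range(i + 1, len(patterns)):
                    g = generalize(patterns[i], patterns[j])
                    if g is not None:
                        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)]
                        patterns.append(g)
                        merged = True
                        break
                if merged:
                    break
        return patterns

    print(learn_patterns([["NN", "VV", "NN"], ["NR", "VV", "NN"], ["NN", "VV", "PN"]]))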

14.
Building a large-scale semantic role labeling (SRL) corpus provides useful training data for computers to understand the semantics of natural language. This paper focuses on SRL rules that serve the construction of such a corpus. Based on manual semantic role annotation, the correspondence between sentence patterns and sentence models is analyzed, and a set of sentence-pattern-based SRL rules is derived, reaching 78.73% accuracy on the test set. With these rules, automatic labeling can be performed when building an SRL corpus; annotators then only need to proofread the output, which effectively reduces the workload.
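As a rough illustration of rule-based labeling keyed on a sentence pattern (the patterns and role labels below are invented for illustration and are not the paper's rule set):

    # Hedged sketch: each rule maps a constituent pattern around the predicate
    # ("V") to a sequence of semantic role labels.
    SRL_RULES = {
        ("NP", "V", "NP"): ["A0", "PRED", "A1"],              # agent - predicate - patient
        ("NP", "V", "NP", "NP"): ["A0", "PRED", "A2", "A1"],  # e.g. a double-object pattern
    }

    def label_by_pattern(constituents):
        """constituents: phrase labels of the clause, with the predicate as 'V'."""
        roles = SRL_RULES.get(tuple(constituents))
        if roles is None:
            return None                # no rule fires: left for manual annotation
        return list(zip(constituents, roles))

    print(label_by_pattern(["NP", "V", "NP"]))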

15.
With the spread of smart homes, users expect to control smart devices through natural-language commands and to receive personalized smart home services. Existing challenges include the interoperability of smart devices and a comprehensive understanding of the user's environment. To address these problems, a framework supporting on-device personalization of smart home service recommendation is proposed. First, a runtime knowledge graph of the smart home is built to reflect the contextual information of a specific home and to generate use-case scenario sentences. Second, a general recommendation model is trained on pre-collected natural-language commands from generic scenarios and the corresponding use-case scenario sentences. Finally, users manage smart home devices and services on the device in natural language, and the weights of the general model are fine-tuned with their feedback to obtain a personal model. Experiments on three datasets (a basic command set, a paraphrase set, and a scenario command set) show that the user's personal model improves accuracy by 6.5%-30% over word-embedding methods and by 2.4%-25% over a Sentence-BERT model, verifying that the on-device, deep-learning-based smart home service framework achieves high recommendation accuracy and can effectively manage smart home devices and services.
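As a rough illustration of the recommendation step only (the sentence encoder, the similarity measure, and the fine-tuning stub are assumptions of this sketch, not the paper's on-device model):

    # Hedged sketch: score use-case scenario sentences, generated from the
    # runtime knowledge graph, against a natural-language command.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def recommend(command, scenario_sentences, encode):
        """encode: callable mapping a sentence to a vector (the general model)."""
        cmd_vec = encode(command)
        scored = [(cosine(cmd_vec, encode(s)), s) for s in scenario_sentences]
        return max(scored)[1]

    def personalize(general_model, feedback_pairs):
        """Placeholder: fine-tune the general model's weights on the device with
        (command, accepted scenario) feedback pairs to obtain the personal model."""
        raise NotImplementedError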

16.
刘广灿  曹宇  许家铭  徐波 《自动化学报》2019,45(8):1455-1463
Current natural language inference (NLI) models rely heavily on word-level information for inference. Although word-level discriminative information plays an important role, an inference model should pay more attention to the intrinsic meaning of continuous text and its linguistic expression, reasoning from a holistic grasp of sentence meaning rather than performing shallow inference based only on opposition or similarity between individual words. In addition, traditional supervised learning makes models overly dependent on the linguistic priors of the training set while lacking an understanding of linguistic logic. To explicitly emphasize the importance of learning sentence-sequence encodings and to reduce the influence of language bias, this paper proposes an NLI method based on adversarial regularization. The method first introduces an inference model based on word encodings, which takes the word encodings of the standard inference model as input and can succeed only by exploiting language bias; adversarial training between the two models then prevents the standard model from over-relying on language bias. Experiments on two public benchmarks, SNLI and Breaking-NLI, show that the method achieves the best performance among existing sentence-embedding-based inference models on SNLI, with 87.60% accuracy on the test set, and also achieves the best published result on Breaking-NLI.
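As a rough illustration of the adversarial idea (a gradient-reversal layer is one standard way to realize adversarial training between two models and may differ from the paper's exact scheme; the model interfaces and the weight lam are assumptions of this sketch):

    # Hedged sketch: a bias-only classifier reads the standard model's word
    # encodings; a gradient-reversal layer lets it learn to exploit language
    # bias while discouraging the encoder from exposing that bias.
    import torch
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass, flips the gradient sign in the backward pass."""
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output.neg()

    def training_step(standard_model, bias_model, batch, lam=0.5):
        premise, hypothesis, labels = batch
        logits, word_encodings = standard_model(premise, hypothesis)  # assumed interface
        nli_loss = F.cross_entropy(logits, labels)
        bias_logits = bias_model(GradReverse.apply(word_encodings))
        bias_loss = F.cross_entropy(bias_logits, labels)
        return nli_loss + lam * bias_loss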

17.
This paper discusses a fundamental problem in natural language generation: how to organize the content of a text in a coherent and natural way. In this research, we set out to determine the semantic content and the rhetorical structure of texts and to develop heuristics to perform this process automatically within a text generation framework. The study was performed on a specific language and textual genre: French instructional texts. From a corpus analysis of these texts, we determined nine senses typically communicated in instructional texts and seven rhetorical relations used to present these senses. From this analysis, we then developed a set of presentation heuristics that determine how the senses to be communicated should be organized rhetorically in order to create a coherent and natural text. The heuristics are based on five types of constraints: conceptual, semantic, rhetorical, pragmatic, and intentional constraints. To verify the heuristics, we developed the spin natural language generation system, which performs all steps of text generation but focuses on the determination of the content and the rhetorical structure of the text.

18.
An implemented model of language processing has been developed that views the propositional components of a sentence as neural units. The propositional sentence units are linked through symbolic, reified representations of subordinate sentence parts. Large numbers of these highly standardized propositional units are encoded in a manner that interconnects propositional data through the declarative knowledge base structures, thus minimizing the importance of the procedural component and the need for backward chaining and inference generation. The introduction of new sentence information triggers a connectionist-like flurry of activity in which constantly changing propositional weights and reification strengths effect changes in the belief states encoded within the knowledge base. ©1999 John Wiley & Sons, Inc.

19.
There are numerous logical formalisms capable of drawing conclusions using default rules. Such systems, however, do not normally determine where the default rules come from; i.e., what it is that makes "Birds fly" a good rule, but "Birds drive trucks" a bad one.
Generic sentences such as "Birds fly" are often used informally to describe default rules. I propose to take this characterization seriously, and claim that a default rule is adequate if the corresponding generic sentence is true. Thus, if we know that Tweety is a bird, we may conclude by default that Tweety flies, just in case "Birds fly" is a true sentence.
In this paper, a quantificational account of the semantics of generic sentences is presented. It is argued that a generic sentence is evaluated not in isolation, but with respect to a set of relevant alternatives. For example, "Mammals bear live young" is true because among mammals that bear live young, lay eggs, undergo mitosis, or engage in some alternative form of procreation, the majority bear live young. Since male mammals do not procreate in any form, they do not count. Some properties of alternatives are presented, and their interactions with the phenomena of focus and presupposition are investigated.
It is shown how this account of generics can be used to characterize adequate default reasoning systems, and several desirable properties of such systems are proved. The problems of the automatic acquisition of rules from natural language are discussed. Because rules are often explicitly expressed as generics, it is argued that the interpretation of generic sentences plays a crucial role in this endeavor, and it is shown how the theory presented here can facilitate such interpretation.
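One schematic way (my own notation, not the author's formalization) to spell out the majority-over-alternatives truth condition described above, where $\mathrm{ALT}(\psi)$ is the set of relevant alternative properties to the predicated property $\psi$:

\[
\mathrm{GEN}\,x\,[\varphi(x)][\psi(x)] \ \text{is true iff}\ \
\frac{\bigl|\{x : \varphi(x) \wedge \psi(x)\}\bigr|}
     {\bigl|\{x : \varphi(x) \wedge \exists\chi \in \mathrm{ALT}(\psi).\ \chi(x)\}\bigr|} \;>\; \frac{1}{2}
\]

For "Mammals bear live young", $\varphi$ is being a mammal, $\psi$ is bearing live young, and $\mathrm{ALT}(\psi)$ contains laying eggs, undergoing mitosis, and the other forms of procreation; male mammals satisfy no alternative, so they drop out of the denominator, as in the abstract's own example.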

20.
Reading comprehension is currently a research hotspot in NLP. A good strategy for answering complex questions in reading comprehension requires not only extracting answer sentences but also fusing them to generate the corresponding answer, yet most existing research focuses on the former. This paper studies sentence fusion for complex question answering and proposes a fusion method that jointly considers the key information in sentences, their relevance to the question, and sentence fluency. The main idea is as follows: first, select the parts to be fused based on sentence splitting and word importance; then, merge the shared information across sentences based on word alignment; finally, generate the fused sentence via integer linear programming optimization based on dependency relations, a bigram language model, and word importance. Tests on a dataset of reading-comprehension questions from previous years' Gaokao (Chinese college entrance examination) show that the method achieves an F-score of 82.62% while better preserving the readability and informativeness of the results.
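A schematic integer-linear-programming objective of the general kind described above, under my own notation (the weights, variables, and constraint set are illustrative assumptions; the paper additionally scores relevance to the question). Let $x_i \in \{0,1\}$ indicate that word $w_i$ is kept in the fused sentence and $y_{ij} \in \{0,1\}$ that $w_i$ is immediately followed by $w_j$ in the output:

\[
\max_{x,\,y}\ \ \alpha \sum_i \mathrm{imp}(w_i)\, x_i \;+\; \beta \sum_{i \neq j} \log P(w_j \mid w_i)\, y_{ij}
\]

subject to consistency constraints $y_{ij} \le x_i$ and $y_{ij} \le x_j$, a length budget $\sum_i x_i \le L$, and dependency constraints $x_i \le x_{\mathrm{head}(i)}$ requiring that a kept word's syntactic head is also kept.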

