Similar Articles
20 similar articles found (search time: 90 ms)
1.
Automatic Identification of Predicate Heads in Chinese Sentences (total citations: 7; self: 2; others: 7)
Identifying the predicate head is a crucial part of syntactic constituent analysis. This paper proposes a predicate identification method combining rules with feature learning, dividing the process into three stages: chunk bundling, coarse predicate filtering, and fine predicate filtering. In the coarse filtering stage, rules remove words that clearly cannot serve as predicates, yielding a candidate predicate set. In the fine filtering stage, supporting features for predicates are selected and each feature's degree of support is computed statistically; the candidates are then filtered again according to the contextual features with which each appears in the sentence, determining the sentence's predicate head. Tests show that the method is effective and practical.
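A minimal sketch of the rule-then-feature pipeline described above (chunk bundling omitted). The POS rules and feature support scores are invented assumptions for illustration, not the paper's actual resources.

```python
# Hypothetical two-stage predicate-head filter: rules first, then
# statistically learned feature support scores (both invented here).

def coarse_filter(tagged):
    """Rule stage: keep indices of words whose POS can head a predicate."""
    return [i for i, (w, pos) in enumerate(tagged) if pos in {"v", "a"}]

SUPPORT = {  # assumed support degree of each contextual feature
    "follows_adverb": 0.6,
    "precedes_noun": 0.5,
}

def fine_filter(tagged, candidates):
    """Feature stage: pick the candidate with the highest total support."""
    def score(i):
        s = 0.0
        if i > 0 and tagged[i - 1][1] == "d":                # adverb before
            s += SUPPORT["follows_adverb"]
        if i + 1 < len(tagged) and tagged[i + 1][1] == "n":  # noun after
            s += SUPPORT["precedes_noun"]
        return s
    return max(candidates, key=score)

sent = [("他", "r"), ("想", "v"), ("去", "v"), ("北京", "n")]
cands = coarse_filter(sent)       # both verbs survive the rule stage
head = fine_filter(sent, cands)   # features pick the one before the noun
print(sent[head][0])  # 去
```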

2.
Identifying the Predicate Head of Chinese Simple Sentences for EBMT (total citations: 9; self: 3; others: 9)
In example-based Chinese-English machine translation (EBMT) systems, sentences must be analyzed to some degree in order to compute sentence similarity. This paper first proposes a compromise approach to Chinese sentence analysis, skeleton dependency analysis, which captures the overall structure of a sentence by determining its predicate head. It then proposes a strategy for identifying the predicate head of a Chinese example sentence from the predicate head of the corresponding English sentence in a Chinese-English example base. The experimental results are satisfactory.

3.
The pivotal construction (jianyu sentence) is a common yet special sentence pattern in textual knowledge, and knowledge acquisition from pivotal sentences is an important direction in text knowledge acquisition. Classifying pivotal semantic classes is the foundation of such acquisition. To build a new classification scheme for pivotal constructions, this paper first divides pivotal sentences into eight major classes from the perspective of the first predicate in the sentence and, on the basis of semantic classification and description frames, further subdivides these eight classes; it then classifies them by the temporal order in which the second predicate occurs; finally, for semantic classes that cannot serve as the first predicate of a pivotal construction, it analyzes and summarizes the reasons and regularities at the level of semantic classes. The proposed scheme is more comprehensive and fine-grained than existing ones, covers almost all pivotal sentences in textual knowledge, and proves clearly effective for expanding pivotal-construction corpora.

4.
Identifying the predicate verb is key to understanding a sentence. Because Chinese predicate verbs are structurally complex, flexibly used, and variable in form, their identification is a challenging task in Chinese natural language processing. From an information extraction perspective, this paper introduces concepts related to Chinese predicate verb identification and proposes an annotation scheme for Chinese predicate verbs. On this basis, it studies an identification method based on an attentional BiLSTM-CRF neural network: a bidirectional recurrent network captures dependencies within the sentence, an attention mechanism models the sentence's focal roles, and a conditional random field (CRF) layer returns a maximizing tag path. In addition, to ensure that exactly one predicate verb is output, a convolutional neural network based uniqueness model is proposed. In experiments, the algorithm outperforms the traditional sequence labeling model CRF, reaching an F-measure of 76.75% on the Chinese predicate verb data annotated for this paper.
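A pure-Python sketch of what the CRF layer above does when it "returns a maximizing tag path": Viterbi decoding over emission and transition scores. The scores and the toy sentence are invented for illustration.

```python
# Minimal Viterbi decoder over {tag: score} emissions and
# {(prev_tag, tag): score} transitions (all scores invented).

def viterbi(emissions, transitions, tags):
    best = [{t: emissions[0][t] for t in tags}]   # best path score per tag
    back = []                                     # backpointers
    for em in emissions[1:]:
        cur, ptr = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: best[-1][q] + transitions[(q, t)])
            cur[t] = best[-1][p] + transitions[(p, t)] + em[t]
            ptr[t] = p
        best.append(cur)
        back.append(ptr)
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for ptr in reversed(back):                    # walk backpointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy sentence "他 喜欢 音乐" with tags B (predicate verb) / O (other)
tags = ["O", "B"]
emissions = [{"O": 2, "B": 0}, {"O": 0, "B": 3}, {"O": 2, "B": 0}]
transitions = {("O", "O"): 0, ("O", "B"): 0, ("B", "O"): 0, ("B", "B"): -2}
print(viterbi(emissions, transitions, tags))  # ['O', 'B', 'O']
```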

5.
A previous publication introduced the logic Q for qualified syllogisms, which formalizes arguments such as "Most birds can fly; Tweety is a bird; therefore it is likely that Tweety can fly". This provided a probabilistic semantics for fuzzy quantifiers (most, many, few, etc.), fuzzy usuality modifiers (usually, often, seldom, etc.), and fuzzy likelihood modifiers (likely, uncertain, unlikely, etc.). That work dealt only with crisp (nonfuzzy) predicates like Bird and CanFly, however. The present work proposes a simple and intuitively natural way to expand the semantics of Q so as to accommodate syllogisms such as "Most Swedes are tall; Helge is a Swede; therefore it is likely that Helge is tall", where tall is a fuzzy predicate. To accomplish this, a formulation of the notion of the probability of a fuzzy event is proposed. It is shown that the new semantics validates the intended syllogisms as well as numerous other propositions and rules, including a rendition of the axioms and inference rules of classical first-order predicate calculus. These results may be viewed as steps toward future applications. Future work will follow the lines of the previous work and show how Q thus extended may be employed for nonmonotonic reasoning with fuzzy predicates.
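One standard formulation of the probability of a fuzzy event (due to Zadeh) takes the expected membership, P(A) = Σ μ_A(x)·p(x), over a discrete distribution. A small sketch, where the membership function for "tall" and the height distribution are illustrative assumptions:

```python
# Probability of a fuzzy event as expected membership:
# P(tall) = sum over heights x of mu_tall(x) * p(x).

def fuzzy_event_probability(membership, dist):
    return sum(membership(x) * p for x, p in dist.items())

def tall(height_cm):  # assumed piecewise-linear membership for "tall"
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30

# Assumed height distribution for a population of Swedes
heights = {160: 0.2, 175: 0.5, 190: 0.3}
p_tall = fuzzy_event_probability(tall, heights)
print(round(p_tall, 2))  # 0.55
```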

6.
Feature Engineering for Predicate Identification in Semantic Analysis (total citations: 2; self: 0; others: 2)
The predicate is the most important constituent of a sentence, and identifying it correctly strongly affects semantic analysis. Since numerous features directly affect predicate identification performance, how to organize them is especially important. This work selects 7 basic features and more than 30 new features together with their combinations, using a maximum entropy classifier and adding beneficial features on top of the basic ones. The F1 of predicate identification increased by about 5% (from 84.7% to 89.8%) and that of word sense recognition by about 2% (from 80.3% to 82.1%), showing that the new features and their combinations greatly improve performance.
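The "add beneficial features on top of the basic ones" procedure can be sketched as greedy forward feature selection. The evaluation function and per-feature gains below are made-up stand-ins for a maximum entropy classifier's F1 on held-out data.

```python
# Greedy forward feature selection: keep a candidate feature only if
# it improves the (here simulated) held-out score.

def greedy_select(basic, candidates, evaluate):
    selected = list(basic)
    best = evaluate(selected)
    for f in candidates:
        score = evaluate(selected + [f])
        if score > best:              # beneficial feature: keep it
            selected.append(f)
            best = score
    return selected, best

GAIN = {"pos": 0.10, "lemma": 0.05, "parent_pos": 0.02, "noisy": -0.01}

def evaluate(features):               # stand-in for classifier F1
    return 0.8 + sum(GAIN.get(f, 0.0) for f in features)

sel, f1 = greedy_select(["pos"], ["lemma", "parent_pos", "noisy"], evaluate)
print(sel, round(f1, 2))  # ['pos', 'lemma', 'parent_pos'] 0.97
```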

7.
Automatic predicate identification is an important part of shallow parsing. Taking the "predicate-centric" view of Chinese as its linguistic basis, this paper analyzes in detail the contexts in which predicates occur in Chinese sentences and discusses the main contextual factors affecting predicate occurrence. It proposes a statistically grounded probabilistic model for automatic predicate identification: the probability that a candidate word serves as the predicate is approximated by maximum likelihood estimation, and the parameters are smoothed with an absolute discounting model. Experiments on a small corpus achieved identification rates of up to 80.6% for verbal predicates and 83.2% for adjectival predicates, demonstrating the feasibility and effectiveness of the method.
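A toy sketch of the smoothing step named above: maximum likelihood counts discounted by a constant d, with the freed probability mass shared uniformly among unseen candidates. The counts and vocabulary are invented for illustration.

```python
# Absolute discounting over a (tiny, invented) candidate vocabulary.
from collections import Counter

def absolute_discount(counts, vocab, d=0.5):
    total = sum(counts.values())
    unseen = vocab - set(counts)
    freed = d * len(counts) / total          # mass freed by discounting
    probs = {w: (c - d) / total for w, c in counts.items()}
    for w in unseen:
        probs[w] = freed / len(unseen)       # shared uniformly
    return probs

p = absolute_discount(Counter({"v1": 3, "v2": 1}), {"v1", "v2", "v3", "v4"})
print(round(p["v1"], 3), round(p["v3"], 3))  # 0.625 0.125
```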

8.
Identifying the Predicate Head Using the Syntactic Relation between Subject and Predicate (total citations: 4; self: 0; others: 4)
Predicate head identification plays an important role in parsing the whole sentence. Existing methods determine the predicate head from the static and dynamic grammatical features of candidate words. Building on these, this paper proposes a method that additionally exploits the syntactic relation between a sentence's subject and predicate to identify the predicate head. Experiments show that, compared with traditional methods, this approach improves identification accuracy by about 3%.

9.
卢露, 矫红岩, 李梦, 荀恩东. Acta Automatica Sinica (自动化学报), 2022, 48(12): 2911-2921
To rapidly build a large-scale, multi-domain, high-quality treebank, this paper proposes an annotation scheme based on phrase function and syntactic-role chunks that is convenient for annotating multi-level structures. At the discourse level, punctuation, syntactic structure, and expressive function jointly serve as criteria for judging sentence boundaries, establishing reasonable boundaries and levels. Within a sentence, chunk syntactic function is primary, with discourse and interpersonal function as reference: 4 property tags, 8 function tags, and 4 sentence tags describe 3 categories (5 kinds) of chunks, annotating the basic sentence-pattern skeleton and highlighting head-word information. A quality-assured shallow-structure treebank of roughly ten million Chinese characters has been built, containing a sentence-pattern base of over 9,000 structures drawn from more than 600,000 clauses, with texts from over 10,000 documents in encyclopedia, news, patent, and other application domains. An efficient crowdsourced annotation management workflow was also explored.

10.
Traditional stylistics describes register mostly in terms of vocabulary, sentence patterns, and rhetoric. In recent years scholars have paid more attention to register in grammatical research, but current studies are mostly microscopic analyses without the support of a macroscopic theoretical framework, making it hard to probe deeper questions of register. Generalized topic theory, grounded in the characteristics of Chinese discourse, takes punctuation-delimited clauses with clear boundaries as its basis and proposes the concepts of generalized topic and topic structure. From this perspective, the paper contrasts the registers of government work reports and fiction, covering named-entity topics, adverbial topics, predicative topics, logical topics, and relational topics, and offers a reasoned explanation of the differences. Although work reports and fiction differ markedly in register, they had never been compared from a topic-comment perspective, let alone with statistical analysis over large-scale corpora. This work enriches the theory of statistical stylistics and lays a solid foundation for applications such as automatic topic-structure analysis, automatic essay scoring, and register-based text classification.

11.
From the perspective of modern Chinese semantics, sentence meanings can be divided into four types: simple, complex, compound, and multiple. As one way of describing sentence-meaning structure as a whole, sentence-meaning type identification is an important step toward a complete semantic-structure analysis of Chinese sentences. This paper proposes a method based on predicates and sentence-meaning type blocks that identifies all four types. It first makes a preliminary judgment from the number of predicates in the sentence to pick out some simple sentences; for the remaining sentences, C4.5 machine learning determines the maximum number of sentence-meaning type blocks the predicates pass through, and a final decision is made in combination with the top sentence node of the syntactic structure. In an open test on 10,221 sentences from the BFS-CTC annotated Chinese corpus, overall identification accuracy reached 97.6%, laying a technical foundation for research based on modern Chinese semantics.

12.
Experimental work in software testing has generally focused on evaluating the effectiveness and effort requirements of various coverage criteria. The important issue of testing efficiency has not been sufficiently addressed. In this paper, we describe an approach for comparing the effectiveness and efficiency of test coverage criteria using mutation analysis. For each coverage criterion under study, we generate multiple coverage-adequate minimal test suites for a test-program from a test-pool, which are then executed on a set of systematically generated program mutants to obtain the fault data. We demonstrate the applicability of the proposed approach by describing the results of an experiment comparing the three code-based testing criteria, namely, block coverage, branch coverage, and predicate coverage. Our results suggest that there is a trade-off between effectiveness and efficiency of a coverage criterion. Specifically, the predicate coverage criterion was found to be most effective but least efficient whereas the block coverage criterion was most efficient but least effective. We observed high variability in the performance of block test suites whereas branch and predicate test suites were relatively consistent. Overall results suggest that the branch coverage criterion performs consistently with good efficiency and effectiveness, and it appears to be the most viable option for code-based control flow testing.
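The two quantities being compared can be computed directly from mutation results: effectiveness as the fraction of mutants killed, and (one simple notion of) efficiency as mutants killed per test case. The kill data below are invented and merely mirror the reported trade-off.

```python
# Illustrative effectiveness/efficiency metrics over invented
# mutation-analysis results for three minimal coverage-adequate suites.

TOTAL_MUTANTS = 100
suites = {
    "block":     {"size": 5,  "killed": 60},
    "branch":    {"size": 8,  "killed": 75},
    "predicate": {"size": 12, "killed": 85},
}

def effectiveness(s):
    return s["killed"] / TOTAL_MUTANTS   # fraction of mutants killed

def efficiency(s):
    return s["killed"] / s["size"]       # mutants killed per test case

for name, s in suites.items():
    print(f"{name}: effectiveness={effectiveness(s):.2f} "
          f"efficiency={efficiency(s):.1f}")
```

With these (assumed) numbers, predicate coverage is the most effective but least efficient and block coverage the reverse, with branch coverage in between on both axes.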

13.
This paper proposes a predicate named dosim which provides a new function for parallel execution of logic programs. The parallelism achieved by this predicate is a simultaneous mapping operation, similar to the bagof and setof predicates. However, the degree of parallelism can be easily controlled by arranging the arguments of the dosim goal. The parallel processing system with dosim was realized on a tightly coupled multiprocessor machine. To control the degree of parallelism and reduce the amount of memory required for execution, we introduce a grouping method for the goals executed in parallel and some variations of the dosim predicate. The effectiveness of the proposed method is demonstrated by the results of executing several applications.

14.
Sentence constituent analysis is a key and difficult problem in natural language processing research. This paper first addresses constituent division for modern Uyghur simple sentences, covering the relation between phrases and sentences and the interrelations among syntactic categories; it then discusses preprocessing of a modern Uyghur corpus, a phrase tag set, the basic approach to constituent division, and a constituent analysis algorithm; it further explores the design of a predicate identification algorithm for modern Uyghur, the identification of other sentence constituents, and an automatic boundary prediction algorithm; and it concludes with the implementation of a modern Uyghur sentence constituent analysis system and an analysis of the experimental data.

15.
Every pregroup grammar is shown to be strongly equivalent to one which uses basic types and left and right adjoints of basic types only. Therefore, a semantical interpretation is independent of the order of the associated logic. Lexical entries are read as expressions in a two-sorted predicate logic with ∈ and functional symbols. The parsing of a sentence defines a substitution that combines the expressions associated to the individual words. The resulting variable-free formula is the translation of the sentence. It can be computed in time proportional to the size of the parsing structure. Non-logical axioms are associated to certain words (relative pronouns, indefinite article, comparative determiners). Sample sentences are used to derive the characterizing formula of the DRS corresponding to the translation.

16.
江荻. Journal of Chinese Information Processing (中文信息学报), 2007, 21(4): 111-115
This paper discusses sentential clause objects governed by Tibetan verbs of saying, which include "say"-type verbs, cognition verbs, thinking verbs, question verbs, and other semantically related verbs. The clause itself may be a complete sentence with a subject, a predicate, clause-final verbal aspect marking, and a mood particle, or it may consist of a single predicate verb. Clause objects are predicative in nature and are usually nominalized by adding a nominalization marker. The clause-object marker derives from the grammaticalization of zer, the prototypical classical verb of saying, and in modern Tibetan it has several phonological and orthographic variants. Clause objects also have complex internal relations and layers, similar to direct and indirect quotation in English. When the clause's subject is omitted, the agent can be determined from grammatical words expressing aspect and modality and from context. Clause types include declarative, interrogative, imperative, and exclamative, each of which can take different clause-type mood particles. Finally, it should be noted that some clause objects of verbs of saying frequently lack the nominalization marker, which complicates syntactic processing algorithms; the causes and solutions require further study.

17.
Reasoning about program heap, especially if it involves handling unbounded, dynamically heap-allocated data structures such as linked lists and arrays, is challenging. Furthermore, sound analysis that precisely models heap becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software. The reachability predicate has already proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fields. In this paper, we present a memory model suitable for reasoning about low-level pointer operations that is accompanied by a formalization of the reachability predicate in the presence of internal pointers and pointer arithmetic. We have designed an annotation language for C programs that makes use of the new predicate. This language enables us to specify properties of many interesting data structures present in the Windows kernel. We present our experience with a prototype verifier on a set of illustrative C benchmarks.
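Informally, the reachability predicate holds between two cells when one can be reached from the other by repeatedly following a link field. A small sketch under an assumed heap encoding (a dict mapping each cell to its `next` cell):

```python
# reach(heap, start) returns the cells reachable from `start` by
# following the 'next' field, stopping at None or on a cycle.

def reach(heap, start):
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = heap.get(node)   # dereference the link field
    return seen

# Assumed heap: a singly linked list 1 -> 2 -> 3 -> nil
heap = {1: 2, 2: 3, 3: None}
print(reach(heap, 1))  # [1, 2, 3]
print(reach(heap, 3))  # [3]
```

The cycle check (`node not in seen`) is what keeps the predicate well-defined on heaps with internal pointers that loop back.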

18.
Topic continuation and topic shift are important pragmatic functions in discourse. From the perspective of sentence-initial topic sharing, this paper classifies them into five kinds: sentence-initial topic continuation, sentence-internal subtopic continuation, complete topic shift, pivotal topic shift, and new-branch topic shift, and then studies the special case of topic shift, the new-branch topic. Based on a 330,000-character generalized topic structure corpus, the syntactic constituents and semantic roles of new-branch topics are counted and analyzed. The syntactic analysis shows that subjects of object or complement clauses, small subjects of subject-predicate-predicate sentences, subjects of sentences opening with adverbial elements, sentence-final objects, non-final objects of serial-verb sentences, pivots of pivotal sentences, prepositional objects, and even adverbials can all become new-branch topics and introduce new-branch clauses; sentence-final objects do so most often, while no case of an indirect object serving as a new-branch topic was found. The semantic-role analysis shows that most subject arguments (agent, affective, experiencer, theme) and object arguments (patient, relatum, result, target, dative), along with a few instrumental (manner) and circumstantial (location, endpoint) arguments, can serve as new-branch topics; relatum and patient roles do so most prominently, followed by agent, result, and target, while arguments such as cause and purpose rarely do. The study reveals one possible way in which syntax and semantics constrain the pragmatic phenomenon of topic shift, helps both humans and computers understand the topic-shift mechanism of Chinese discourse more deeply, and aims to ground this pragmatic phenomenon step by step in semantic and ultimately syntactic form, eventually enabling automatic analysis of topic shift by computer.

19.
This paper is concerned with an algorithm for identifying an unknown regular language from examples of its members and non-members. The algorithm is based on the model inference algorithm given by Shapiro. In our setting, however, the given first-order language for describing a target logic program has countably many unary predicate symbols: q0, q1, q2, …. On the other hand, the oracle which gives information about the unknown regular language to the inference algorithm has no interpretation for predicates other than the predicate q0. In such a setting, we cannot directly take advantage of the contradiction backtracing algorithm, which is one of the most important components for the efficiency of the model inference algorithm. To overcome this disadvantage, we develop a method for indirectly giving an interpretation for predicates other than q0, based on the idea of using the oracle together with a one-to-one mapping from a set of predicates to a set of strings. Furthermore, we propose a model inference algorithm for regular languages using this method, and then argue the correctness and the time complexity of the algorithm.

20.