Similar literature
1.
We present HamleDT, a HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. In the present article, we provide a thorough investigation and discussion of a number of phenomena that are comparable across languages, though their annotation in treebanks often differs. We claim that transformation procedures can be designed to automatically identify most such phenomena and convert them to a unified annotation style. This unification is beneficial both to comparative corpus linguistics and to machine learning of syntactic parsing.

2.
Dependency grammar is considered appropriate for many Indian languages. In this paper, we present a study of the dependency relations in the Bangla language. We have categorized these relations into three levels, namely intrachunk relations, interchunk relations and interclause relations. Each of these levels is further categorized and an annotation scheme has been developed. Both syntactic and semantic features have been taken into consideration for describing the relations. In our scheme, there are 63 such syntactico-semantic relations. We have verified the scheme by tagging a corpus of 4167 Bangla sentences to create a treebank (KGPBenTreebank).

3.
We describe the annotation of multiword expressions (MWEs) in the Prague Dependency Treebank, using several automatic pre-annotation steps. We use subtrees of the tectogrammatical tree structures of the Prague Dependency Treebank to store representations of the MWEs in the dictionary and to pre-annotate subsequent occurrences automatically. We also show a way to measure the reliability of this type of annotation.

4.
In the field of constituency parsing, there exist multiple human-labeled treebanks which are built on non-overlapping text samples and follow different annotation standards. Because manual annotation of parse trees is extremely costly, it is desirable to automatically convert one treebank (the source treebank) to the standard of another treebank of interest (the target treebank). Conversion results can be manually corrected to obtain higher-quality annotations or can be used directly as additional training data for building syntactic parsers. To perform automatic treebank conversion, we divide constituency parses into two separate levels, part-of-speech (POS) tags and syntactic structure (bracketing structures and constituent labels), and convert each level with a feature-based approach. The basic idea of the approach is to encode the original annotations in the source treebank as guide features during the conversion process. Experiments on two Chinese treebanks show that our approach can convert POS tags and syntactic structures with accuracies of 96.6% and 84.8%, respectively, which are the best reported results on this task.

5.
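Because parsing with Chinese Combinatory Categorial Grammar (CCG) is difficult, this work studies how two mutually independent techniques can be applied together to Chinese CCG syntactic parsing. First, a pre-tagging algorithm uses a log-linear model to prune the space of possible analyses of a sentence by removing lexical categories with low probability. Then a heuristic search algorithm is applied to further speed up parsing. Finally, the methods are evaluated along two dimensions, time efficiency and parsing accuracy. Experiments show that the parsing algorithm based on heuristic search and pre-tagging significantly improves both parsing efficiency and parsing accuracy.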
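
The pruning step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the cut-off is chosen, so the relative threshold `beta` (keep a category only if its probability is at least `beta` times the best one for that word), the function name `prune_categories`, and the example probabilities are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def prune_categories(
    tag_probs: List[Dict[str, float]], beta: float = 0.1
) -> List[List[str]]:
    """Keep, for each word, only the lexical categories whose probability
    is at least `beta` times the probability of that word's best category.

    `tag_probs` holds one {category: probability} dict per word, as a
    log-linear tagger might produce (hypothetical input format).
    """
    pruned = []
    for probs in tag_probs:
        best = max(probs.values())
        # Sort by descending probability so the most likely category comes first.
        ranked = sorted(probs.items(), key=lambda kv: -kv[1])
        pruned.append([cat for cat, p in ranked if p >= beta * best])
    return pruned


# Hypothetical tagger output for a two-word sentence.
sentence = [
    {"NP": 0.70, "N": 0.25, "S/NP": 0.05},
    {"(S\\NP)/NP": 0.90, "S\\NP": 0.10},
]
print(prune_categories(sentence, beta=0.2))
```

A smaller `beta` keeps more categories per word, which enlarges the search space the parser has to explore; a larger one prunes more aggressively at the risk of discarding the correct category.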

6.
7.
In this paper, we deal with the problem of part-of-speech tagging of multi-category words that appear within sentences of the Hindi language. First, a Hindi tagger is proposed which provides part-of-speech tags developed using the grammar of the Hindi language. For this purpose, Hindi Devanagari alphabets are used and their Hindi transliteration is done within the proposed tagger. Thereafter, a rule-based TENGRAM method is described with an illustrative example; it serves to disambiguate multi-category words within sentences of a Hindi corpus. The rules generated in TENGRAM are the result of the computation of discernibility matrices, discernibility functions and reducts. These are computed from decision tables based on the theory of rough sets. Basically, a discernibility matrix helps in cutting down indiscernible condition attributes; a discernibility function has rows corresponding to each column in the discernibility matrix and is used to develop reducts; and the reducts provide a minimal subset of attributes which preserves the indiscernibility relation of the decision tables and hence generates the decision rules.
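
As a rough illustration of the rough-set machinery this abstract mentions, the sketch below builds a discernibility matrix for a small decision table using the standard textbook definition: for each pair of objects with different decisions, record the condition attributes on which they differ. The attribute names (`prev`, `next`, `tag`) and the toy rows are hypothetical and are not taken from TENGRAM.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def discernibility_matrix(
    rows: List[Dict[str, str]],
    condition_attrs: Sequence[str],
    decision_attr: str,
) -> Dict[Tuple[int, int], FrozenSet[str]]:
    """Map each pair (i, j) of rows with different decisions to the set of
    condition attributes whose values differ between the two rows."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][decision_attr] != rows[j][decision_attr]:
            matrix[(i, j)] = frozenset(
                a for a in condition_attrs if rows[i][a] != rows[j][a]
            )
    return matrix


# Toy decision table: context tags as conditions, the word's tag as decision.
table = [
    {"prev": "NN", "next": "VB", "tag": "JJ"},
    {"prev": "NN", "next": "NN", "tag": "NN"},
    {"prev": "JJ", "next": "NN", "tag": "NN"},
]
for pair, attrs in discernibility_matrix(table, ["prev", "next"], "tag").items():
    print(pair, sorted(attrs))
```

In this toy table every entry of the matrix contains `next`, so the single attribute `next` is enough to tell apart all rows with different decisions, which is the kind of minimal attribute subset a reduct captures.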

8.
We describe experimental work in logic programming for architects, leading to the setting up of a fact dependency system. The system operates as an interpreter of the user's instructions, storing the user's decisions and the conclusions inferred from those decisions. Consistency from a user's point of view is automatically maintained. A separate introduction to the Prolog logic programming language is appended to this paper.

9.
Violations of functional dependencies (FDs) and conditional functional dependencies (CFDs) are common in practice, often indicating deviations from the intended data semantics. These violations arise in many contexts such as data integration and Web data extraction. Resolving these violations is challenging for a variety of reasons, one of them being the exponential number of possible repairs. Most of the previous work has tackled this problem by producing a single repair that is nearly optimal with respect to some metric. In this paper, we propose a novel data cleaning approach that is not limited to finding a single repair, namely sampling from the space of possible repairs. We give several motivating scenarios where sampling from the space of CFD repairs is desirable, we propose a new class of useful repairs, and we present an algorithm that randomly samples from this space in an efficient way. We also show how to restrict the space of repairs based on constraints that reflect the accuracy of different parts of the database. We experimentally evaluate our algorithms against previous approaches to show the utility and efficiency of our approach.
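
To make the notion of an FD violation concrete, here is a minimal sketch that only detects violations of a functional dependency X → Y: groups of tuples that agree on X but disagree on Y. It does not implement the repair-sampling algorithm of the paper; the function name `fd_violations` and the `zip`/`city` example data are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def fd_violations(
    rows: List[Dict[str, str]], lhs: Sequence[str], rhs: Sequence[str]
) -> Dict[Tuple[str, ...], Set[Tuple[str, ...]]]:
    """Return, for each left-hand-side value that violates lhs -> rhs,
    the set of distinct right-hand-side values seen with it."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    # The FD holds for a key only if exactly one rhs value occurs with it.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical relation in which zip -> city should hold.
addresses = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(fd_violations(addresses, lhs=["zip"], rhs=["city"]))
```

Each reported group can be repaired in more than one way (change either city value, or change a zip code), which hints at why the number of possible repairs grows so quickly with the number of violations.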

10.
A simple and elegant set-theoretic characterization is given as to when a given set of functional and multivalued dependencies logically implies a given functional or multivalued dependency. A simple proof of the characterization is given which makes use of a result of Sagiv, Delobel, Parker, and Fagin (1981).

11.
12.
The more knowledge industrial practitioners have of their production processes, the better they are able to improve those processes. Nonetheless, there may exist process characteristics and dependencies that are not easily extractable from business models, such as routing-dependent attributes. This paper introduces an algorithm-driven framework to establish whether process path decisions influence attributes in non-direct sequences, for example, whether deploying machine A instead of machine B affects the percentage of rejected parts four stages down the line. This problem is shown to bear similarities to sequential pattern mining problems. The solution framework relies on process mining and data mining techniques. The proposed approach is applied to a real industrial log, unveiling deficiencies in the system and providing recommendations for further improvement.

13.
Context: Dependency management often suffers from labor intensity and complexity in creating and maintaining the dependency relations in practice. This is even more critical in distributed development, in which developers are geographically distributed and a wide variety of tools is used. In those settings, different interpretations of software requirements or the use of different terminologies make it challenging to predict the change impact. Objective: (a) To describe a method facilitating change management in geographically distributed software engineering by effective discovery and establishment of dependency links using domain models; (b) to evaluate the effectiveness of the proposed method. Method: A domain model, providing a common reference point, is used to manage development objects and to automatically support dependency discovery. We propose to associate (annotate) development objects with the concepts from the model. These associations are used to compute dependencies among development objects, and are refined stepwise into direct dependency links (i.e., enabling product traceability). To evaluate the method, we conducted a laboratory-based randomized experiment on two real cases. Six participants used an implemented prototype and two comparable tools to perform simulated tasks. Results: In the paper we elaborate on the proposed method, discussing its functional steps. Results from the experiment show that the method can be used effectively to assist in the discovery of dependency links. Users discovered on average fourteen percent more dependency links than with the comparable tools. Conclusions: The proposed method advocates the use of domain models throughout the whole development life-cycle and is apt to facilitate multi-site software engineering. The experimental study and results suggest that the method is effective in the discovery of dependencies among development objects.

14.
A new dependency and correlation analysis for features   Total citations: 3, self-citations: 0, citations by others: 3
The quality of the data being analyzed is a critical factor that affects the accuracy of data mining algorithms. There are two important aspects of data quality: relevance and redundancy. The inclusion of irrelevant and redundant features in the data mining model results in poor predictions and high computational overhead. This paper presents an efficient method that considers both the relevance of the features and the pairwise feature correlation in order to improve the prediction and accuracy of our data mining algorithm. We introduce a new feature correlation metric $Q_Y(X_i, X_j)$ and a feature subset merit measure $e(S)$ to quantify the relevance and the correlation among features with respect to a desired data mining task (e.g., detection of abnormal behavior in a network service due to network attacks). Our approach takes into consideration not only the dependency among the features, but also their dependency with respect to a given data mining task. Our analysis shows that the correlation relationship among features depends on the decision task and, thus, the features display different behaviors as we change the decision task. We applied our data mining approach to network security and validated it using the DARPA KDD99 benchmark data set. Our results show that, using the new decision-dependent correlation metric, we can efficiently detect rare network attacks such as User to Root (U2R) and Remote to Local (R2L) attacks. The best reported detection rates for U2R and R2L on the KDD99 data sets were 13.2 percent and 8.4 percent with 0.5 percent false alarm, respectively. For U2R attacks, our approach can achieve a 92.5 percent detection rate with a false alarm of 0.7587 percent. For R2L attacks, our approach can achieve a 92.47 percent detection rate with a false alarm of 8.35 percent.

15.
Neural Computing and Applications - Children learn and develop their abilities at their own pace. One of the most basic skills that they acquire is reading. However, some children struggle with...

16.
17.
Developing automatable methods for proving termination of term rewrite systems that resist traditional techniques based on simplification orders has become an active research area in the past few years. The dependency pair method of Arts and Giesl is one of the most popular such methods. However, there are several obstacles that hamper its automation. In this paper we present new ideas to overcome these obstacles. We provide ample numerical data supporting our ideas.

18.
19.
Agile software development is designed to achieve collaborative software development. A supporting pillar of collaboration is effective coordination, which is necessary to manage dependencies in projects. Understanding the dependencies arising in agile software development projects can help practitioners choose appropriate coordinative practices from the large number of practices provided by the various agile methods. To achieve this understanding, this article analyses dependencies in three typical cases of co-located agile software development and presents the dependencies as a taxonomy with decision rules for allocating dependencies into categories. Findings show that knowledge, process, and resource dependencies are present, with knowledge dependencies predominant. In addition, there are agile practices with a coordinative function that address multiple dependencies in these agile software development projects. These practices would be a good choice for coordinating a project and supporting collaboration in agile software projects.

20.
Standardization in the field of geographic information started in the 1990s when geographic information systems (GIS) matured and the advent of the Internet accelerated the exchange of information. Recent developments, such as location-based services and the use of the Global Positioning System (GPS) on handheld devices, have further increased the demand for standardization in the field. ISO/TC 211, Geographic information/Geomatics, develops the ISO 19100 series of geographic information standards and collaborates with other standards organizations, for example, by developing abstract standards for which the Open Geospatial Consortium (OGC) develops implementation specifications. To date, forty-nine ISO 19100 standards have been published. Many of these were recently approved for revision. One of the challenges of standard maintenance is to determine whether a change in a revised standard affects other standards, and how. Object dependency analysis is commonly used in object-oriented software maintenance. Geographic information standards, however, are not composed purely of objects. Instead, dependencies in all the normative elements of the standard have to be analyzed and understood. This paper presents the novel approach of a normative dependency analysis for standard maintenance in which interdependencies between the normative elements of standards are analyzed. In the paper, a normative dependency between two standards is defined for the first time, a notation for normative dependencies is introduced, a normative dependency data model is presented, and results from a normative dependency analysis of the ISO 19100 geographic information standards are discussed. The paper concludes with results, applicable to any suite of standards, and a discussion of further work.
