Similar documents
Found 20 similar documents (search time: 31 ms)
1.
Applications of linguistic techniques for use case analysis
Use cases are effective techniques to express the functional requirements of a system in a very simple and easy-to-learn way. Use cases are mainly composed of natural language (NL) sentences, and the use of NL to describe the behaviour of a system is always a critical point, due to the inherent ambiguities originating from the different possible interpretations of NL sentences. We discuss in this paper the application of analysis techniques based on a linguistic approach to detect, within requirements documents, defects related to such an inherent ambiguity. Starting from the proposed analysis techniques, we will define some metrics that will be used to perform a quality evaluation of requirements documents. Some available automatic tools supporting the linguistic analysis of NL requirements have been used to evaluate an industrial use cases document according to the defined metrics. A discussion on the application of linguistic analysis techniques to support the semantic analysis of use cases is also reported.
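
A minimal sketch of the kind of lexical check such tools perform: sentences containing vague or subjective wording are flagged, and a simple document-level quality metric is reported. The indicator word list and the metric below are illustrative assumptions, not the indicators or metrics defined in the paper.

```python
import re

# Illustrative vagueness indicators; real tools use richer linguistic resources.
VAGUE_TERMS = {"fast", "easy", "user-friendly", "appropriate", "adequate",
               "flexible", "efficient", "as much as possible", "if necessary"}

def split_sentences(text):
    # Naive sentence splitter; proper NLP segmentation would be used in practice.
    return [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]

def vague_hits(sentence):
    low = sentence.lower()
    return [term for term in VAGUE_TERMS if term in low]

def ambiguity_metric(text):
    """Fraction of sentences containing at least one vagueness indicator."""
    sentences = split_sentences(text)
    flagged = [(s, vague_hits(s)) for s in sentences]
    flagged = [(s, hits) for s, hits in flagged if hits]
    return len(flagged) / max(len(sentences), 1), flagged

use_case = ("The user selects a report. The system shall respond fast. "
            "The layout should be user-friendly and flexible.")
score, flagged = ambiguity_metric(use_case)
print(f"ambiguity ratio: {score:.2f}")
for sentence, hits in flagged:
    print(f"  {sentence!r} -> {hits}")
```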

2.
Context: Requirements engineering is one of the most important and critical phases in the software development life cycle, and should be carefully performed to build high-quality and reliable software. However, requirements are typically gathered from various sources and are represented in natural language (NL), making requirements engineering a difficult, fault-prone, and challenging task. Objective: To ensure high-quality software, we need effective requirements verification methods that can clearly handle and address the inherently ambiguous nature of NL specifications. The objective of this paper is to propose a method that can address the challenges of NL requirements verification and to evaluate our proposed method through controlled experiments. Method: We propose a model-based requirements verification method, called NLtoSTD, which transforms NL requirements into a State Transition Diagram (STD) that can help to detect and to eliminate ambiguities and incompleteness. The paper describes the NLtoSTD method to detect requirement faults, thereby improving the quality of the requirements. To evaluate the NLtoSTD method, we conducted two controlled experiments at North Dakota State University in which the participants employed the NLtoSTD method and a traditional fault checklist during the inspection of requirement documents to identify the ambiguities and incompleteness of the requirements. Results: The results of both experiments show that the NLtoSTD method can be more effective in exposing missing functionality and, in some cases, more ambiguous information than the fault-checklist method. Our experiments also revealed areas of improvement that benefit the method's applicability in the future. Conclusion: We presented a new approach, NLtoSTD, to verify requirements documents, along with two controlled experiments assessing our approach. The results are promising and have motivated the refinement of the NLtoSTD method and future empirical evaluation.
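
A minimal sketch of the kind of check an STD representation enables: once requirements are mapped to states and transitions, uncovered state/event pairs (incompleteness) and conflicting transitions for the same state/event pair (ambiguity) can be found mechanically. The states, events, and transitions below are hypothetical, and the NLtoSTD transformation itself is not reproduced; in practice only the relevant state/event pairs would be reported as missing.

```python
from collections import defaultdict

# Hypothetical STD extracted from NL requirements: (state, event) -> set of next states.
transitions = defaultdict(set)
def add(state, event, nxt):
    transitions[(state, event)].add(nxt)

add("Idle", "card_inserted", "AwaitPIN")
add("AwaitPIN", "pin_ok", "Menu")
add("AwaitPIN", "pin_ok", "Idle")        # conflicting target -> ambiguous requirement
# No requirement covers ("AwaitPIN", "pin_wrong") -> incompleteness

states = {"Idle", "AwaitPIN", "Menu"}
events = {"card_inserted", "pin_ok", "pin_wrong"}

def check_std(transitions, states, events):
    """Report uncovered (state, event) pairs and nondeterministic transitions."""
    missing = [(s, e) for s in states for e in events if (s, e) not in transitions]
    ambiguous = {k: v for k, v in transitions.items() if len(v) > 1}
    return missing, ambiguous

missing, ambiguous = check_std(transitions, states, events)
print("missing (state, event) pairs:", missing)
print("ambiguous transitions:", ambiguous)
```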

3.
Engineering activities often produce considerable documentation as a by-product of the development process. Due to the complexity of these documents, technical analysts can benefit from text processing techniques able to identify concepts of interest and analyze deficiencies of the documents in an automated fashion. In practice, text sentences from the documentation are usually transformed into a vector space model, which is suitable for traditional machine learning classifiers. However, such transformations suffer from problems of synonymy and ambiguity that cause classification mistakes. To alleviate these problems, there has been growing interest in the semantic enrichment of text. Unfortunately, using general-purpose thesauri and encyclopedias to enrich technical documents belonging to a given domain (e.g. requirements engineering) often introduces noise and does not improve classification. In this work, we aim at boosting text classification by exploiting information about semantic roles. We have explored this approach when building a multi-label classifier for identifying special concepts, called domain actions, in textual software requirements. After evaluating various combinations of semantic roles and text classification algorithms, we found that this kind of semantically-enriched data leads to improvements of up to 18% in both precision and recall, when compared to non-enriched data. Our enrichment strategy based on semantic roles also allowed classifiers to reach acceptable accuracy levels with small training sets. Moreover, semantic roles outperformed Wikipedia- and WordNET-based enrichments, which failed to boost requirements classification with several techniques. These results drove the development of two requirements tools, which we successfully applied in the processing of textual use cases.
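
A minimal sketch, under assumptions, of what role-based enrichment can look like before vectorization: each requirement is augmented with role-tagged pseudo-tokens (e.g. AGENT__user) and then fed to an off-the-shelf multi-label classifier. The role annotations and the "domain action" labels below are hard-coded for illustration; in the paper the roles come from a semantic role labeler and the label set is the authors' own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# (requirement text, role annotations, domain-action labels) - all illustrative.
samples = [
    ("The user enters the password", {"AGENT": "user", "THEME": "password"}, ["entry"]),
    ("The system stores the record", {"AGENT": "system", "THEME": "record"}, ["storage"]),
    ("The user searches for a product", {"AGENT": "user", "THEME": "product"}, ["search"]),
    ("The system saves the invoice", {"AGENT": "system", "THEME": "invoice"}, ["storage"]),
]

def enrich(text, roles):
    # Append role-tagged pseudo-tokens so the vectorizer can exploit them as features.
    return text + " " + " ".join(f"{role}__{filler}" for role, filler in roles.items())

X = [enrich(text, roles) for text, roles, _ in samples]
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([labels for _, _, labels in samples])

clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(X, Y)

test = enrich("The system stores the payment", {"AGENT": "system", "THEME": "payment"})
print(mlb.inverse_transform(clf.predict([test])))
```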

4.
Named entity recognition (NER) denotes the task of detecting entities and their corresponding classes, such as person or location, in unstructured text data. For most applications, state-of-the-art NER software produces reasonable results. However, as a consequence of methodological limitations and the well-known pitfalls of analyzing natural language data, the NER results are likely to contain ambiguities. In this paper, we present an interactive NER ambiguity resolution technique, which enables users to create (post-processing) rules for named entity recognition data based on the content and entity context of the analyzed documents. We specifically address the problem that, in use cases where ambiguities are problematic, such as the attribution of fictional characters with traits, it is often infeasible to train models on custom data to improve state-of-the-art NER software. We derive an iterative process model for improving NER results, show an interactive NER ambiguity resolution prototype, illustrate our approach with contemporary literature, and discuss our work and future research.
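
A minimal sketch of a content- and context-based post-processing rule applied to NER output: entities matching a user-defined surface form are relabeled when a context cue appears within a token window. The rule format, example sentence, and entity labels are assumptions for illustration; the paper's interactive rule editor and iterative process model are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    entity_text: str      # surface form the rule applies to
    context_cue: str      # word that must appear near the entity
    window: int           # context window size in tokens
    new_label: str        # label to assign when the rule fires

def apply_rules(tokens, labels, rules):
    """Relabel NER output according to simple user-defined context rules."""
    labels = list(labels)
    for i, tok in enumerate(tokens):
        for rule in rules:
            if tok != rule.entity_text:
                continue
            lo, hi = max(0, i - rule.window), min(len(tokens), i + rule.window + 1)
            if rule.context_cue in tokens[lo:hi]:
                labels[i] = rule.new_label
    return labels

# "Holmes" is tagged PERSON by a hypothetical NER tool, but in this corpus the user
# wants the fictional character distinguished from real persons.
tokens = ["Sherlock", "Holmes", "lived", "at", "Baker", "Street"]
labels = ["PERSON", "PERSON", "O", "O", "LOC", "LOC"]
rules = [Rule(entity_text="Holmes", context_cue="Sherlock", window=2, new_label="CHARACTER")]
print(list(zip(tokens, apply_rules(tokens, labels, rules))))
```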

5.
The development of product design specifications (PDS) is an important part of the product development process. Incompleteness, ambiguity, or inconsistency in the PDS can lead to problems during the design process and may require unnecessary design iterations. This generally results in increased design time and cost. Currently, in many organizations, PDS are written using word processors. Since documents written by different authors can be inconsistent in style and word choice, it is difficult to automatically search for specific requirements. Moreover, this approach does not allow the possibility of automated design verification and validation against the design requirements and specifications. In this paper, we present a computational framework, and a software tool based on this framework, for writing, annotating, and searching computer-interpretable PDS. Our approach allows authors to write requirement statements in natural language, consistent with existing authoring practice. The author of the PDS can then annotate different parts of the requirement statements using mathematical expressions, keywords from predefined taxonomies, and other metadata. This approach provides unambiguous meaning for the information contained in the PDS, and helps to eliminate mistakes later in the process when designers must interpret requirements. Our approach also enables users to construct a new PDS document from the results of a search for requirements of similar devices in similar contexts. This capability speeds up the process of creating PDS and helps authors write more detailed documents by utilizing previous, well-written PDS documents. Our approach also enables checking for internal inconsistencies in the requirement statements.
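
A minimal sketch of what a computer-interpretable requirement record could look like: the NL statement is kept verbatim, and annotations (a taxonomy keyword and a machine-checkable numeric constraint) are attached so that requirements can be searched and checked for inconsistency. The schema, taxonomy terms, and consistency check below are illustrative assumptions, not the paper's actual framework.

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    req_id: str
    text: str                 # natural-language statement, kept as written
    taxonomy_keyword: str     # term from a predefined taxonomy (assumed)
    quantity: str             # annotated physical quantity (assumed identifier)
    low: float                # annotated lower bound
    high: float               # annotated upper bound (same unit per quantity)

reqs = [
    Requirement("R1", "The device shall weigh no more than 2.0 kg.",
                "mass", "device_mass_kg", 0.0, 2.0),
    Requirement("R2", "The device shall weigh at least 2.5 kg for stability.",
                "mass", "device_mass_kg", 2.5, float("inf")),
]

def find_inconsistencies(reqs):
    """Report pairs of requirements whose annotated ranges on the same quantity cannot both hold."""
    issues = []
    for i, a in enumerate(reqs):
        for b in reqs[i + 1:]:
            if a.quantity == b.quantity and (a.high < b.low or b.high < a.low):
                issues.append((a.req_id, b.req_id))
    return issues

print("conflicting requirements:", find_inconsistencies(reqs))
```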

6.
We demonstrate a text-mining method, called the associative Naïve Bayes (ANB) classifier, for automated linking of MEDLINE documents to the Gene Ontology (GO). The approach of this paper is a nontrivial extension of document classification methodology from a fixed set of classes C={c1,c2,…,cn} to a knowledge hierarchy like GO. Due to the complexity of GO, we use a knowledge representation structure, on top of which we develop the text-mining classifier, called the ANB classifier, which automatically links MEDLINE documents to GO. To check the performance, we compare several well-known classifiers on our datasets: the NB classifier, the large Bayes classifier, the support vector machine, and the ANB classifier. Our results, described below, indicate the method's practical usefulness.
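
A minimal sketch of the flat Naïve Bayes baseline that the paper extends: a multinomial NB classifier over document text, trained on a fixed set of classes. The toy abstracts and class names are invented; the associative extension and the mapping onto the GO hierarchy are not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy abstracts and flat classes standing in for MEDLINE documents and GO terms.
docs = [
    "kinase activity regulates protein phosphorylation",
    "phosphorylation of the receptor kinase domain",
    "transport of ions across the cell membrane",
    "membrane channel mediates ion transport",
]
classes = ["kinase activity", "kinase activity", "ion transport", "ion transport"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, classes)

print(clf.predict(["receptor phosphorylation by a kinase"]))
print(clf.predict_proba(["ion flow through a membrane channel"]))
```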

7.
Information Systems, 2005, 30(7): 543–563
One of the main problems in (web) information retrieval is the ambiguity of users' queries, since users tend to post very short queries which do not express their information need clearly. This also holds for ontology-based information retrieval, in which the domain ontology is used as the backbone of the search process. In this paper, we present a novel approach for determining possible refinements of an ontology-based query. The approach is based on measuring the ambiguity of a query with respect to the original user's information need. We define several types of ambiguity concerning the structure of the underlying ontology and the content of the information repository. These ambiguities are interpreted with respect to the user's information need, which we infer from the user's behaviour during the search process. Finally, a ranked list of potentially useful refinements of the query is provided to the user. We present a small evaluation study that shows the advantages of the proposed approach.

8.
Modern statistical techniques used in the field of natural language processing are limited in their applications by the fact that they lose most of the semantic information contained in text documents. Fuzzy techniques have been proposed as a way to correct this problem through the modelling of the relationships between words while accommodating the ambiguities of natural languages. However, these techniques are currently either restricted to modelling the effects of simple words or are specialized in a single domain. In this paper, we propose a novel statistical-fuzzy methodology to represent the actions described in a variety of text documents by modelling the relationships between subject-verb-object triplets. The research focuses first on the technique used to accurately extract the triplets from the text, on the equations needed to compute the statistics of the subject-verb and verb-object pairs, on the formulas needed to interpolate the fuzzy membership functions from these statistics, and on those needed to defuzzify the membership value of unseen triplets. Taken together, these sets of equations constitute a comprehensive system that allows the quantification and evaluation of the meaning of text documents, while being general enough to be applied to any domain. In the second phase, the paper experimentally demonstrates the validity of our new methodology by applying it to the implementation of a fuzzy classifier conceived especially for this research. This classifier is trained using a section of the Brown Corpus, and its efficiency is tested with a corpus of 20 unseen documents drawn from three different domains. The positive results obtained from these experimental tests confirm the soundness of our new approach and show that it is a promising avenue of research.
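
A minimal sketch of the statistical part of such an approach, under assumptions: subject-verb and verb-object pair frequencies are collected from already-extracted triplets, turned into membership values by normalizing each pair count against the most frequent pair for the same verb, and an unseen triplet is scored by combining the two memberships with min (a common fuzzy AND). The triplet extraction, the interpolation of membership functions, and the defuzzification formulas of the paper are not reproduced.

```python
from collections import Counter

# Triplets assumed to be already extracted from a corpus.
triplets = [
    ("user", "open", "file"), ("user", "open", "window"),
    ("admin", "open", "file"), ("user", "delete", "file"),
    ("system", "send", "message"), ("system", "send", "message"),
]

sv = Counter((s, v) for s, v, _ in triplets)   # subject-verb pair counts
vo = Counter((v, o) for _, v, o in triplets)   # verb-object pair counts

def membership(counts, pair, verb_index):
    """Normalize a pair count by the maximum count observed for the same verb."""
    verb = pair[verb_index]
    peak = max((c for p, c in counts.items() if p[verb_index] == verb), default=0)
    return counts[pair] / peak if peak else 0.0

def score(triplet):
    s, v, o = triplet
    # Fuzzy AND of the subject-verb and verb-object memberships (illustrative choice).
    return min(membership(sv, (s, v), 1), membership(vo, (v, o), 0))

print(score(("user", "open", "file")))     # seen combination -> high score
print(score(("system", "open", "file")))   # unseen subject-verb pair -> 0.0
```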

9.
Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, the Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click-based features or session-based features performs worse than previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves on the click-based method by 5.6% and on the session-based method by 4.6%.
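
A minimal sketch of the feature-plus-SVM setup described above, under assumptions: each query is represented by the category entropy of its clicked documents and of its follow-up queries (both assumed to have been mapped to predefined categories by a text classifier beforehand), and an SVM separates ambiguous from unambiguous queries. The queries, category counts, and labels below are fabricated for illustration; they are not data from the paper.

```python
import math
from sklearn.svm import SVC

def entropy(category_counts):
    """Shannon entropy (in bits) of a category distribution."""
    total = sum(category_counts)
    probs = [c / total for c in category_counts if c]
    return -sum(p * math.log2(p) for p in probs)

# Per query: category counts of clicked documents and of follow-up session queries.
queries = [
    ("jaguar",          [5, 4, 3], [3, 3]),   # spread over categories -> ambiguous
    ("python download", [9, 1],    [6]),      # concentrated -> unambiguous
    ("apple",           [6, 5],    [4, 4]),
    ("weather london",  [10],      [7]),
]
labels = [1, 0, 1, 0]  # 1 = ambiguous, 0 = unambiguous (illustrative)

X = [[entropy(clicks), entropy(sessions)] for _, clicks, sessions in queries]
clf = SVC(kernel="rbf").fit(X, labels)

new_query_features = [[entropy([4, 4, 2]), entropy([2, 2])]]
print("ambiguous" if clf.predict(new_query_features)[0] else "unambiguous")
```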

10.
Interviews are the most common and effective means to perform requirements elicitation and support knowledge transfer between a customer and a requirements analyst. Ambiguity in communication is often perceived as a major obstacle for knowledge transfer, which could lead to unclear and incomplete requirements documents. In this paper, we analyze the role of ambiguity in requirements elicitation interviews, when requirements are still tacit ideas to be surfaced. To study the phenomenon, we performed a set of 34 customer–analyst interviews. This experience was used as a baseline to define a framework to categorize ambiguity. The framework presents the notion of ambiguity as a class of four main sub-phenomena, namely unclarity, multiple understanding, incorrect disambiguation, and correct disambiguation. We present examples of ambiguities from our interviews to illustrate the different categories, and we highlight the pragmatic components that determine the occurrence of ambiguity. During the study, we discovered a peculiar relation between ambiguity and tacit knowledge in interviews. Tacit knowledge is the knowledge that a customer has but does not pass to the analyst for any reason. From our experience, we have discovered that, rather than an obstacle, the occurrence of an ambiguity is often a resource for discovering tacit knowledge. Again, examples are presented from our interviews to support this vision.

11.
Term frequency–inverse document frequency (TF–IDF) is one of the most popular feature (also called term or word) weighting methods used to describe documents in the vector space model and in applications related to text mining and information retrieval; it can effectively reflect the importance of a term in a collection of documents in which all documents play the same role. However, TF–IDF does not take into account differences in a term's IDF weighting when documents play different roles in the collection, such as the positive and negative training sets in text classification. In view of this, this paper presents a novel TF–IDF-improved feature weighting approach, which reflects the importance of a term in the positive and the negative training examples, respectively. We also build a weighted voting classifier by iteratively applying the support vector machine algorithm, and we implement the one-class support vector machine and Positive Example Based Learning methods for comparison. During classification, an improved 1-DNF algorithm, called 1-DNFC, is also adopted, aiming to identify more reliable negative documents from the unlabeled example set. The experimental results show that the term frequency–inverse positive–negative document frequency based classifier outperforms the TF–IDF-based one, and that the weighted voting classifier also exceeds the one-class support vector machine based classifier and the Positive Example Based Learning based classifier.
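
A minimal sketch of the general idea of weighting a term differently with respect to the positive and negative training sets. The formula below (term frequency times the log ratio of positive to negative document frequency, with add-one smoothing) is an illustrative stand-in; the paper's exact term frequency–inverse positive–negative document frequency formula may differ.

```python
import math
from collections import Counter

pos_docs = [["cheap", "offer", "click"], ["offer", "prize", "click"]]
neg_docs = [["meeting", "schedule", "offer"], ["project", "meeting", "notes"]]

def doc_freq(term, docs):
    return sum(term in d for d in docs)

def tf_ipndf(term, doc, pos_docs, neg_docs):
    """Weight a term by its in-document frequency and by how much more often it
    appears in positive documents than in negative ones (illustrative formula)."""
    tf = Counter(doc)[term]
    dfp = doc_freq(term, pos_docs)
    dfn = doc_freq(term, neg_docs)
    # Add-one smoothing keeps the ratio defined when a term is absent on one side.
    return tf * math.log2((dfp + 1) / (dfn + 1))

doc = ["cheap", "offer", "offer", "meeting"]
for term in set(doc):
    print(term, round(tf_ipndf(term, doc, pos_docs, neg_docs), 3))
```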

12.
Gupta, P.; McKeown, N. IEEE Micro, 2000, 20(1): 34–41
Increasing demands on Internet router performance and functionality create a need for algorithms that classify packets quickly with minimal storage requirements and allow frequent updates. Unlike previous algorithms, the algorithm proposed here meets this need well by using heuristics that exploit structure present in classifiers. Our approach, which we call HiCuts (hierarchical intelligent cuttings), attempts to partition the search space in each dimension, guided by simple heuristics that exploit the classifier's structure. We discover this structure by preprocessing the classifier. We can tune the algorithm's parameters to trade off query time against storage requirements. In classifying packets based on four header fields, HiCuts performs quickly and requires relatively little storage compared with previously described algorithms.
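
A simplified sketch of the cutting idea: the search space is recursively partitioned along one header dimension into equal-width intervals, each leaf keeps only the rules that overlap it, and a packet is classified by walking down the tree and linearly searching the small leaf. Only two header fields are used, the dimension-choice heuristic and the fixed number of cuts are simplified stand-ins, and HiCuts' actual heuristics and tuning parameters differ.

```python
# Simplified HiCuts-style packet classification over two header fields in [0, 255].
# A rule is ((lo0, hi0), (lo1, hi1), action); ranges are inclusive, first match wins.
RULES = [
    ((0, 63),   (0, 255),   "permit-a"),
    ((64, 127), (0, 127),   "permit-b"),
    ((0, 255),  (128, 255), "log"),
    ((0, 255),  (0, 255),   "deny"),      # default rule, lowest priority
]
BINTH, NCUTS = 2, 4   # leaf-size threshold and cuts per node (the tunable parameters)

def build(rules, bounds):
    if len(rules) <= BINTH:
        return ("leaf", rules)
    # Heuristic: cut the dimension onto which the rules project more distinct ranges.
    dim = max((0, 1), key=lambda d: len({r[d] for r in rules}))
    lo, hi = bounds[dim]
    step = (hi - lo + 1) // NCUTS
    if step == 0:
        return ("leaf", rules)            # range too small to cut further
    cuts = []
    for i in range(NCUTS):
        c_lo, c_hi = lo + i * step, lo + (i + 1) * step - 1
        cuts.append([r for r in rules if not (r[dim][1] < c_lo or r[dim][0] > c_hi)])
    if all(len(c) == len(rules) for c in cuts):
        return ("leaf", rules)            # cutting this dimension separates nothing; stop
    children = []
    for i, sub in enumerate(cuts):
        sub_bounds = list(bounds)
        sub_bounds[dim] = (lo + i * step, lo + (i + 1) * step - 1)
        children.append(build(sub, sub_bounds))
    return ("node", dim, lo, step, children)

def classify(tree, packet):
    while tree[0] == "node":
        _, dim, lo, step, children = tree
        tree = children[min((packet[dim] - lo) // step, len(children) - 1)]
    for r0, r1, action in tree[1]:        # small linear search inside the leaf
        if r0[0] <= packet[0] <= r0[1] and r1[0] <= packet[1] <= r1[1]:
            return action
    return None

tree = build(RULES, [(0, 255), (0, 255)])
print(classify(tree, (10, 200)))   # -> permit-a
print(classify(tree, (70, 30)))    # -> permit-b
print(classify(tree, (200, 200)))  # -> log
```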

13.
In this article we propose a data treatment strategy to generate new discriminative features, called compound features (or c-features), for text classification. These c-features are composed of terms that co-occur in documents, without any restrictions on the order of or distance between the terms within a document. This strategy precedes the classification task, in order to enhance documents with discriminative c-features. The idea is that, when c-features are used in conjunction with single features, the ambiguity and noise inherent in the bag-of-words representation are reduced. We use c-features composed of two terms in order to make their usage computationally feasible while improving the classifier's effectiveness. We test this approach with several classification algorithms and single-label multi-class text collections. Experimental results demonstrate gains in almost all evaluated scenarios, from the simplest algorithms such as kNN (a 13% gain in micro-average F1 on the 20 Newsgroups collection) to the most complex one, the state-of-the-art SVM (a 10% gain in macro-average F1 on the OHSUMED collection).
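
A minimal sketch of generating two-term c-features: every unordered pair of terms co-occurring in a document becomes an additional feature, used alongside the single terms. Any feature-selection step the authors apply to keep only the discriminative pairs is omitted here.

```python
from itertools import combinations

def bag_of_words(doc):
    return sorted(set(doc.lower().split()))

def with_c_features(doc):
    """Single terms plus two-term c-features (unordered co-occurring pairs)."""
    terms = bag_of_words(doc)
    pairs = [f"{a}__{b}" for a, b in combinations(terms, 2)]
    return terms + pairs

doc = "bank raises interest rate"
print(with_c_features(doc))
# ['bank', 'interest', 'raises', 'rate', 'bank__interest', 'bank__raises', ...]
```

A pair such as bank__interest carries more context than the polysemous single term bank, which is the intuition behind combining c-features with single features.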

14.
There are three factors involved in text classification: the classification model, the similarity measure, and the document representation model. In this paper, we focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classifier. In our experiments, we have used the centroid-based text classifier, which is a simple and robust text classification scheme. We compare four different types of document representation: N-grams, single terms, phrases, and RDR, a logic-based document representation. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimal linguistic processing. The phrase approach is based on linguistically formed phrases and single words. RDR is based on linguistic processing and represents documents as a set of logical predicates. We have experimented with many text collections and obtained similar results. Here, we base our arguments on experiments conducted on Reuters-21578. We show that RDR, the more complex representation, produces a more effective classifier on Reuters-21578, followed by the phrase approach.
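
A minimal sketch of the centroid-based classification scheme with two of the compared representations expressed as vectorizer settings: single terms (word analyzer) and character N-grams. The phrase and RDR representations require linguistic processing and are not reproduced, and the tiny corpus is invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train = ["wheat prices rose sharply", "corn and wheat harvest figures",
         "central bank raises interest rate", "bank cuts rate after inflation data"]
labels = ["grain", "grain", "money", "money"]

def centroid_classify(vectorizer, train, labels, test_doc):
    X = vectorizer.fit_transform(train)
    classes = sorted(set(labels))
    # Centroid of each class = mean of its document vectors.
    centroids = np.vstack([
        np.asarray(X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0))
        for c in classes
    ])
    sims = cosine_similarity(vectorizer.transform([test_doc]), centroids)[0]
    return classes[int(sims.argmax())]

test = "bank lowers the interest rate"
single_terms = TfidfVectorizer(analyzer="word")
char_ngrams = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4))
print(centroid_classify(single_terms, train, labels, test))
print(centroid_classify(char_ngrams, train, labels, test))
```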

15.
Public information services and documents should be accessible to the widest possible readership. In particular, information from these sources often takes the form of numerical expressions, which pose comprehension problems for many people, including people with disabilities, who are often also exposed to poverty, illiteracy, or lack of access to advanced technology. This paper presents an approach to treating numerical information in the text simplification process in order to make it more accessible. A generic model for automatic text simplification systems is presented, aimed at making documents more accessible to readers with cognitive disabilities. The proposed approach is validated with a real system that simplifies numerical expressions in Spanish. This system is then evaluated, and the results show that it is appropriate for the task at hand.

16.
This paper proposes a two-step approach to identifying ambiguities in natural language (NL) requirements specifications (RSs). In the first step, a tool would apply a set of ambiguity measures to an RS in order to identify potentially ambiguous sentences in the RS. In the second step, another tool would show what specifically is potentially ambiguous about each potentially ambiguous sentence. The final decision on ambiguity remains with the human users of the tools. The paper describes several requirements-identification experiments with several small NL RSs, using four prototypes of the first tool, based on linguistic instruments and resources of different complexity, and a manual mock-up of the second tool.
Daniel M. Berry (Corresponding author)
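
A minimal sketch of the two-step pipeline described above, under assumptions: step 1 scores each sentence with a set of ambiguity measures and keeps those above a threshold; step 2 reports which measure fired and on which words, leaving the final decision to the human reader. The measures below are illustrative regular expressions, far simpler than the linguistic instruments used by the paper's four prototypes.

```python
import re

# Illustrative ambiguity measures; the paper's prototypes use richer linguistic resources.
MEASURES = {
    "vagueness":   re.compile(r"\b(appropriate|adequate|flexible|user-friendly|fast)\b", re.I),
    "optionality": re.compile(r"\b(possibly|eventually|if needed|as required)\b", re.I),
    "weak verb":   re.compile(r"\b(may|might|could|should)\b", re.I),
    "anaphora":    re.compile(r"\b(it|they|this|these)\b", re.I),
}

def step1_flag_sentences(sentences, threshold=1):
    """Step 1: score each sentence and keep those at or above the threshold."""
    scored = []
    for s in sentences:
        score = sum(bool(rx.search(s)) for rx in MEASURES.values())
        if score >= threshold:
            scored.append((score, s))
    return sorted(scored, reverse=True)

def step2_explain(sentence):
    """Step 2: show which measure fired and on which words."""
    return {name: rx.findall(sentence) for name, rx in MEASURES.items() if rx.search(sentence)}

rs = ["The system should respond fast when it is queried.",
      "The operator presses the start button.",
      "Reports may be archived if needed."]
for score, sentence in step1_flag_sentences(rs):
    print(score, sentence, step2_explain(sentence))
```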

17.
Most of the research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). However, in many information repositories documents are organized in a hierarchy of categories to support thematic search by browsing topics of interest. The consideration of the hierarchical relationship among categories opens several additional issues in the development of methods for automated document classification. Questions concern the representation of documents, the learning process, the classification process, and the evaluation criteria for experimental results. They are systematically investigated in this paper, whose main contribution is a general hierarchical text categorization framework where the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning, and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework. It can be applied to any classifier that returns a degree of membership of a document in a category. In this work three learning methods are considered for the construction of document classifiers, namely centroid-based, naïve Bayes, and SVM. The proposed framework has been implemented in the system WebClassIII and has been tested on three datasets (Yahoo, DMOZ, RCV1) which present a variety of situations in terms of hierarchical structure. Experimental results are reported and several conclusions are drawn on the comparison of the flat vs. the hierarchical approach, as well as on the comparison of different hierarchical classifiers. The paper concludes with a review of related work and a discussion of previous findings vs. our findings.
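
A minimal sketch of top-down classification over a category hierarchy: a document descends from the root and is passed to a child category only if that child's membership score passes a threshold (the paper determines such thresholds automatically; a fixed value is used here). The hierarchy, the term-overlap score standing in for the centroid/naïve Bayes/SVM classifiers, and the thresholds are invented for illustration.

```python
# Tiny category hierarchy: each node has children and a term profile used for scoring.
HIERARCHY = {
    "root":    {"children": ["science", "sports"], "terms": set()},
    "science": {"children": ["physics", "biology"],
                "terms": {"theory", "experiment", "cell", "energy"}},
    "sports":  {"children": [], "terms": {"match", "team", "score"}},
    "physics": {"children": [], "terms": {"energy", "particle", "quantum"}},
    "biology": {"children": [], "terms": {"cell", "gene", "protein"}},
}
THRESHOLD = 0.15   # fixed stand-in for the automatically determined thresholds

def score(doc_terms, category):
    """Degree of membership: term overlap with the category profile (stand-in classifier)."""
    profile = HIERARCHY[category]["terms"]
    return len(doc_terms & profile) / len(profile) if profile else 0.0

def classify(doc_terms, node="root"):
    """Top-down traversal: descend into every child whose score passes the threshold."""
    assigned = []
    for child in HIERARCHY[node]["children"]:
        if score(doc_terms, child) >= THRESHOLD:
            assigned.append(child)
            assigned.extend(classify(doc_terms, child))
    return assigned

doc = {"the", "experiment", "measured", "particle", "energy"}
print(classify(doc))   # -> ['science', 'physics']
```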

18.
Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness, and inconsistency. Structured semantic representations allow requirements to be translated into formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications; however, they usually require considerable human effort to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.
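
A minimal sketch of the target of such a mapping: a requirement sentence is turned into a predicate-argument structure that can be stored and queried. Here a toy regular-expression pattern stands in for a real semantic role labeler, and the frame format is an assumption rather than the paper's representation.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    predicate: str            # main action of the requirement
    agent: str                # who performs it
    theme: str                # what it is performed on
    condition: Optional[str]  # optional condition attached to the action

# Toy pattern standing in for a semantic role labeler:
# "When <condition>, the <agent> shall <verb> the <theme>."
PATTERN = re.compile(
    r"(?:When (?P<cond>[^,]+), )?[Tt]he (?P<agent>\w+) shall (?P<verb>\w+) the (?P<theme>[\w ]+)\.")

def to_frame(requirement):
    m = PATTERN.match(requirement)
    if not m:
        return None
    return Frame(predicate=m["verb"], agent=m["agent"],
                 theme=m["theme"], condition=m["cond"])

print(to_frame("When the tank is full, the controller shall close the inlet valve."))
print(to_frame("The system shall log the failed attempts."))
```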

19.
Recently, research on text mining has attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge from a large text repository. The problem is not easy to tackle due to the semi-structured or even unstructured nature of the texts under consideration. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within the theme of that category. Traditionally, the categories are arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for human beings. The determination of category themes and their hierarchical structure has mostly been done by human experts. In this work, we developed an approach to automatically generate category themes and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that the documents can be transformed into a list of separated terms.
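
A minimal sketch of the first part of such an approach, under assumptions: a small self-organizing map is trained on TF–IDF document vectors, and each map cell that attracts documents is labelled with its strongest terms, playing the role of an automatically generated category theme. The from-scratch SOM, the corpus, and the labelling rule are illustrative; the paper's second feature map and the extraction of the hierarchy among themes are not reproduced.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def train_som(data, grid=(2, 2), iters=500, lr=0.5, sigma=1.0, seed=0):
    """Train a small self-organizing map; returns weights of shape (gx, gy, dim)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    weights = rng.random((gx, gy, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # Best-matching unit = node whose weight vector is closest to the sample.
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=2)), (gx, gy))
        decay = np.exp(-t / iters)
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-dist2 / (2 * (sigma * decay) ** 2))   # neighbourhood function
        weights += (lr * decay) * h[..., None] * (x - weights)
    return weights

docs = ["wheat corn harvest crop", "crop yield wheat farm",
        "stock market shares trading", "trading shares bank profit",
        "football match team goal", "team wins the football cup"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()
W = train_som(X)
terms = np.array(vec.get_feature_names_out())

# Label each cell that attracted documents with its strongest terms: these cell
# labels play the role of automatically generated category themes.
cells = {}
for i, x in enumerate(X):
    bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)), W.shape[:2])
    cells.setdefault(bmu, []).append(i)
for cell, doc_ids in cells.items():
    top = terms[X[doc_ids].mean(axis=0).argsort()[-3:][::-1]]
    print(cell, list(top), "<-", doc_ids)
```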

20.
Natural Language (NL) deliverables suffer from ambiguity, poor understandability, incompleteness, and inconsistency. However, NL is straightforward, and stakeholders are familiar with using it to produce their software requirements documents. This paper presents a methodology, SOLIMVA, which aims at model-based test case generation from NL requirements deliverables. The methodology is supported by a tool that makes it possible to automatically translate NL requirements into Statechart models. Once the Statecharts are derived, another tool, GTSC, is used to generate the test cases. SOLIMVA uses combinatorial designs to identify scenarios for system and acceptance testing, and it requires that a test designer define the application domain by means of a dictionary. Within the dictionary there is a Semantic Translation Model in which, among other features, a word sense disambiguation method helps in the translation process. Using a space application software product as a case study, we compared SOLIMVA with a previous manual approach developed by an expert in two respects: test objective coverage and the characteristics of the Executable Test Cases. In the first respect, the SOLIMVA methodology not only covered the test objectives associated with the expert's scenarios but also proposed a better strategy, with test objectives clearly separated according to the directives of combinatorial designs. The Executable Test Cases derived in accordance with the SOLIMVA methodology not only possessed characteristics similar to the expert's Executable Test Cases but also predicted behaviors that did not exist in the expert's strategy. The key benefits of applying the SOLIMVA methodology/tool within a Verification and Validation process are its ease of use and, at the same time, its support for a formal method, which may lead to the acceptance of the methodology in complex software projects.
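
A minimal sketch of the combinatorial-design step, under assumptions: given factors describing the application domain, a greedy pairwise (2-way) selection picks a small set of scenarios so that every pair of factor values is exercised at least once. The factors and values are invented; SOLIMVA's dictionary, Semantic Translation Model, and Statechart generation are not shown.

```python
from itertools import combinations, product

# Hypothetical factors for a satellite ground-control scenario.
factors = {
    "mode":      ["nominal", "safe"],
    "link":      ["uplink", "downlink"],
    "telemetry": ["realtime", "playback", "none"],
}

def uncovered_pairs(scenario, pending):
    """Pairs of factor values in this scenario that are still uncovered."""
    names = list(factors)
    pairs = set()
    for a, b in combinations(range(len(names)), 2):
        pair = ((names[a], scenario[a]), (names[b], scenario[b]))
        if pair in pending:
            pairs.add(pair)
    return pairs

def pairwise_scenarios():
    """Greedily pick scenarios until every pair of factor values is covered."""
    names = list(factors)
    pending = set()
    for a, b in combinations(range(len(names)), 2):
        for va, vb in product(factors[names[a]], factors[names[b]]):
            pending.add(((names[a], va), (names[b], vb)))
    chosen, candidates = [], list(product(*factors.values()))
    while pending:
        best = max(candidates, key=lambda s: len(uncovered_pairs(s, pending)))
        chosen.append(dict(zip(names, best)))
        pending -= uncovered_pairs(best, pending)
    return chosen

for scenario in pairwise_scenarios():
    print(scenario)
```

The greedy covering yields far fewer scenarios than the full 2×2×3 combination space while still exercising every pair of values, which is the usual motivation for using combinatorial designs in scenario selection.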
