Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
2.
Learning non-taxonomic relationships is a sub-field of Ontology Learning that aims at automating the extraction of these relationships from text. Several techniques based on Natural Language Processing and Machine Learning have been proposed. However, as with other Ontology Learning techniques, evaluating techniques for learning non-taxonomic relationships remains an open problem. Three general proposals suggest that learned ontologies can be evaluated in an executable application, by domain experts, or by comparison with a predefined reference ontology. This article proposes two procedures for evaluating techniques that learn non-taxonomic relationships, based on comparing the extracted relationships with those of a reference ontology. These procedures are then used to evaluate two state-of-the-art techniques that extract relationships from two corpora in the domains of biology and Family Law.
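As a rough illustration of the reference-based idea, the sketch below scores a set of learned relationships against a reference ontology with precision, recall, and F1. The triple representation, the exact-match criterion, and the sample relations are illustrative assumptions, not the evaluation procedures proposed in the article.

```python
# Hypothetical sketch: score learned non-taxonomic relations against a reference ontology.
# Relations are assumed to be (domain_concept, relation_label, range_concept) triples.

def evaluate_relations(learned, reference):
    """Return precision, recall and F1 of learned triples w.r.t. reference triples."""
    learned, reference = set(learned), set(reference)
    true_positives = learned & reference
    precision = len(true_positives) / len(learned) if learned else 0.0
    recall = len(true_positives) / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

reference = {("cell", "contains", "nucleus"), ("enzyme", "catalyses", "reaction")}
learned = {("cell", "contains", "nucleus"), ("cell", "produces", "protein")}
print(evaluate_relations(learned, reference))  # (0.5, 0.5, 0.5)
```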

3.
Multimedia Tools and Applications - With the development of GPS, the Internet, and mobile devices, map-searching services have become an essential application in people's lives. However, existing...

4.
Learning from past accidents is pivotal for improving safety in construction. However, hazard records are typically documented and stored as unstructured or semi-structured free text, which makes analysing such data a difficult task. This study presents a novel and robust framework that combines deep learning and text mining technologies to analyse hazard records automatically. The framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN); (3) production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords by Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project. It is envisaged that the framework can provide managers with new insights and knowledge to better ensure positive safety outcomes in projects. The contributions of this research are threefold: (1) it demonstrates that the process of analysing hazard records can be automated by combining deep learning and text mining; (2) hazards can be visualized using a systematic and data-driven process; and (3) the automatic generation of hazard topics and their classification over specific time periods enables managers to understand their patterns of manifestation and put strategies in place to prevent them from recurring.
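To illustrate step (1) only, the following sketch runs LDA over a few toy hazard records with scikit-learn. The sample records, the number of topics, and the preprocessing are placeholder assumptions rather than the configuration used in the study.

```python
# Hypothetical sketch of step (1): LDA topic extraction from free-text hazard records.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

records = [
    "worker struck by excavator near trench",
    "scaffold collapse risk during high wind",
    "electrical cable exposed near walkway",
    "trench wall instability after heavy rain",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(records)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {topic_id}: {top_terms}")
```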

5.
We present a novel framework for knowledge construction (ICC: Independent Co-occurring based Construction) based on co-occurrence relations of objects. We compare its characteristics with those of the general approach (DCC: Dependent Co-occurring based Construction) across several construction aspects: variation of trained probability values, percentage differences (probability value and priority ranking order), and reconstruction time. The similarity of their data content and the faster reconstruction time of ICC suggest that ICC is more suitable for service robot applications. Instead of using visual features, we employed annotated data, such as word-tagged images, as the training set to increase the accuracy of correspondence between related keywords and images. The task of object search in an unknown environment is selected to evaluate the applicability of the constructed knowledge (OCR: Object Co-occurrence Relations). We explore the search behaviors provided by OCR-based search (indirect search) and greedy search (direct search) in simulation experiments with five different starting robot positions. Their search behaviors are also compared in terms of computational time consumed, travel distance, and number of visited locations. The consistent success of OCR-based search demonstrates its benefit. Moreover, an object search experiment in an unknown human environment is conducted by a mobile robot equipped with a stereo camera to show the feasibility of using OCR for search in the real world.
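A minimal sketch of the co-occurrence idea, assuming each training image is reduced to a set of word tags: pairwise counts yield a simple conditional estimate of which objects tend to accompany a search target. The tag sets and the probability estimate are illustrative, not the ICC construction itself.

```python
# Hypothetical sketch: build object co-occurrence relations from word-tagged images,
# then rank likely companion objects for a search target.
from collections import Counter
from itertools import combinations

tagged_images = [
    {"cup", "table", "chair"},
    {"cup", "table", "kettle"},
    {"book", "table", "chair"},
]

pair_counts = Counter()
object_counts = Counter()
for tags in tagged_images:
    object_counts.update(tags)
    pair_counts.update(frozenset(p) for p in combinations(sorted(tags), 2))

def companions(target):
    """P(other | target) estimated from co-occurrence counts, highest first."""
    scores = {}
    for pair, n in pair_counts.items():
        if target in pair:
            other = next(o for o in pair if o != target)
            scores[other] = n / object_counts[target]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(companions("cup"))  # e.g. [('table', 1.0), ('chair', 0.5), ('kettle', 0.5)]
```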

6.
Today’s major search engines return ranked search results that match the keywords the user specifies. There have been many proposals to rank the search results such that they match the user’s intentions and needs more closely. Despite good advances during the past decade, this problem still requires considerable research, as the number of search results has become ever larger. We define the collection of each search result and all the Web pages that are linked to the result as a search-result drilldown. We hypothesize that by mining and analyzing the top terms in the search-result drilldown of search results, it may be possible to make each search result more meaningful to the user, so that the user may select the desired search results with higher confidence. In this paper, we describe this technique, and show the results of preliminary validation work that we have done.
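A minimal sketch of the underlying idea, with page fetching stubbed out by sample strings: aggregate term frequencies over a result page and the pages it links to, and surface the top terms. The stop-word list and plain frequency scoring are assumptions; the authors' term-mining method may differ.

```python
# Hypothetical sketch: summarize a "search-result drilldown" by the top terms found in
# the result page plus the pages it links to (page fetching is stubbed with sample text).
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "for"}

def top_terms(pages, k=5):
    counts = Counter()
    for text in pages:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]

result_page = "Python tutorial for beginners covering syntax and data structures"
linked_pages = [
    "Data structures in Python: lists, dicts and sets",
    "Beginner exercises for Python syntax",
]
print(top_terms([result_page] + linked_pages))
```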

7.
Seo, Jiwan; Yoo, Karam; Choi, Seungjin; Kim, Yura Alex; Han, Sangyong. Multimedia Tools and Applications, 2019, 78(20): 28649-28663
Multimedia Tools and Applications - Unstructured text data is very important in many applications because it reflects the thoughts of the people who create it. However, it is difficult to...

8.
Shneiderman, B. IEEE Software, 1997, 14(2): 18-20
Searching textual databases can be confusing for users. Popular search systems for the World Wide Web and stand-alone systems typically provide a simple interface: users type in keywords and receive a relevance-ranked list of 10 results. This is appealing in its simplicity, but users are often frustrated because search results are confusing or aspects of the search are out of their control. If we are to improve user performance, reduce mistaken assumptions, and increase successful searches, we need more predictable designs. To coordinate design practice, we suggest a four-phase framework that would satisfy first-time, intermittent, and frequent users accessing a variety of textual and multimedia libraries.

9.
Ontology plays an increasingly important role in knowledge management and the Semantic Web. This study presents a novel episode-based ontology construction mechanism to extract domain ontologies from unstructured text documents. Additionally, fuzzy numbers for computing conceptual similarity are presented for concept clustering and the definition of taxonomic relations. Moreover, concept attributes and operations can be extracted from episodes to construct a domain ontology, while non-taxonomic relations can also be generated from episodes. A fuzzy inference mechanism is also applied to obtain new instances for ontology learning. Experimental results show that the proposed approach can effectively construct a Chinese domain ontology from unstructured text documents.

10.
A Semi-Structured Document Model for Text Mining (total citations: 7; self-citations: 0; citations by others: 7)
A semi-structured document has more structured information compared to an ordinary document, and the relations among semi-structured documents can be fully utilized. In order to take advantage of the structure and link information in a semi-structured document for better mining, a structured link vector model (SLVM) is presented in this paper, where a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. Text mining based on SLVM is described in the procedure of K-means for briefness and clarity: calculating document similarity and calculating the cluster center. Clustering based on SLVM performs significantly better than clustering based on a conventional vector space model in the experiments, and its F value increases from 0.65-0.73 to 0.82-0.86.
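A simplified sketch of the idea behind a structured link vector: per-section term vectors are concatenated and a down-weighted contribution from linked documents is added, after which K-means clusters the documents. The section weights, the neighbor weight of 0.5, and the toy link structure are assumptions, not the SLVM formulation in the paper.

```python
# Hypothetical sketch: a simplified "structured link vector" = per-section term vectors
# concatenated, plus a down-weighted average of linked documents' vectors, then K-means.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = [
    {"title": "text mining survey", "body": "clustering and classification of documents"},
    {"title": "document clustering", "body": "vector space models for text mining"},
    {"title": "image retrieval", "body": "features for searching image collections"},
]
links = {0: [1], 1: [0], 2: []}          # assumed link structure between documents

vectorizer = CountVectorizer()
vectorizer.fit([d["title"] + " " + d["body"] for d in docs])

def structured_vector(d):
    title = vectorizer.transform([d["title"]]).toarray()[0]
    body = vectorizer.transform([d["body"]]).toarray()[0]
    return np.concatenate([2.0 * title, body])   # sections weighted differently (assumption)

own = np.array([structured_vector(d) for d in docs])
vectors = own.copy()
for i, neighbours in links.items():
    if neighbours:
        vectors[i] += 0.5 * own[neighbours].mean(axis=0)   # neighbour contribution

labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(vectors)
print(labels)
```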

11.
A framework for fast text analysis, developed as part of the Texterra project, is described. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and from text documents. Details of the project, use cases, and evaluation results for the developed tools are presented.

12.
With the large volume of alerts produced by low-level detectors, management of intrusion alerts is becoming more challenging. Manual analysis of a large number of raw alerts is both time consuming and labor intensive. Alert correlation addresses this issue by finding similarity and causality relationships between raw alerts to provide a condensed, yet more meaningful, view of the network from the intrusion standpoint. While researchers have made some efforts to find the relationships between alerts automatically, little attention has been given to the real-time correlation of alerts. Previous learning-based approaches either fail to cope with the large number of alerts generated in a large-scale network or do not address the problem of concept drift directly. In this paper, we propose a framework for real-time alert correlation which incorporates novel techniques for aggregating alerts into structured patterns and for incremental mining of frequent structured patterns. Our approach to aggregation provides a reduced view of the developed patterns of alerts. At the core of the proposed framework is a new algorithm (FSP_Growth) for mining frequent patterns of alerts that takes their structure into account. In the proposed framework, time-sensitive statistical relationships between alerts are maintained in an efficient data structure and updated incrementally to reflect the latest trends in patterns. The results of experiments conducted with the DARPA 2000 dataset as well as artificial data clearly demonstrate the efficiency of the proposed techniques. A promising reduction ratio of 96% is achieved on the DARPA 2000 dataset. The running time of the FSP_Growth algorithm scales linearly with the size of the artificial datasets. Moreover, testing the proposed framework with the alert logs of a real-world network shows its ability to extract interesting patterns from the alerts. The ability to answer useful time-sensitive queries regarding pattern co-occurrences is another advantage of the proposed method compared to other approaches.
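The sketch below is not FSP_Growth; it only illustrates incremental, time-sensitive counting of alert-type co-occurrences inside a sliding window. The window length and the alert stream are placeholder assumptions.

```python
# Hypothetical sketch: incrementally count alert-type pairs inside a sliding time window
# (a much-simplified stand-in for structured pattern mining such as FSP_Growth).
from collections import Counter, deque

WINDOW = 60.0           # seconds; placeholder value
pair_counts = Counter()
window = deque()        # (timestamp, alert_type), assumed to arrive in time order

def add_alert(timestamp, alert_type):
    # evict alerts that fell out of the time window
    while window and timestamp - window[0][0] > WINDOW:
        window.popleft()
    # count co-occurrence of the new alert with every alert still in the window
    for _, other in window:
        pair_counts[tuple(sorted((alert_type, other)))] += 1
    window.append((timestamp, alert_type))

for t, a in [(0, "scan"), (10, "scan"), (15, "bruteforce"), (90, "exploit"), (95, "bruteforce")]:
    add_alert(t, a)

print(pair_counts.most_common(3))
```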

13.
Real-world applications provide many examples of unstructured processes, where process execution is mainly driven by contingent decisions taken by the actors, with the result that the process is rarely repeated in exactly the same way. In these cases, traditional Process Discovery techniques, aimed at extracting complete process models from event logs, show their limits. In fact, when applied to logs of unstructured processes, Process Discovery techniques usually return complex, “spaghetti-like” models that provide limited support to analysts. As a remedy, in the present work we propose Behavioral Process Mining as an alternative approach to highlight relevant subprocesses that represent meaningful collaboration work practices. The approach is based on the application of hierarchical graph clustering to the set of instance graphs generated by a process. We also describe a technique for building instance graphs from traces. We assess the advantages and limits of the approach on a set of synthetic and real-world experiments.
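A much-simplified sketch of building an instance graph from a trace by linking consecutive activities, plus a crude edge-set similarity that a hierarchical clustering step could consume. The construction technique described in the paper is richer than this.

```python
# Hypothetical sketch: derive a simple instance graph from each trace by connecting
# consecutive activities, then compare instance graphs by their edge sets.

def instance_graph(trace):
    """Return the edge set of a directed graph linking consecutive activities."""
    return {(a, b) for a, b in zip(trace, trace[1:])}

traces = [
    ["receive", "review", "approve", "archive"],
    ["receive", "review", "reject"],
]

graphs = [instance_graph(t) for t in traces]

# a crude similarity between instance graphs (Jaccard on edge sets) that a
# hierarchical clustering step could then consume
def jaccard(g1, g2):
    return len(g1 & g2) / len(g1 | g2)

print(jaccard(graphs[0], graphs[1]))   # 0.25
```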

14.
Named entity recognition (NER) is the core part of information extraction that facilitates the automatic detection and classification of entities in natural language text into predefined categories, such as the names of persons, organizations, locations, and so on. The output of the NER task is crucial for many applications, including relation extraction, textual entailment, machine translation, information retrieval, etc. The literature shows that machine learning and deep learning approaches are the most widely used techniques for NER. However, for entity extraction, these approaches demand the availability of a domain-specific annotated data set. Our goal is to develop a hybrid NER system composed of rule-based, deep learning, and clustering-based approaches, which facilitates the extraction of generic entities (such as person, location, and organization) from natural language texts in domains that lack data sets labeled with generic named entities. The proposed approach takes advantage of both deep learning and clustering approaches, but separately, in combination with a knowledge-based approach applied in a postprocessing module. We evaluated the proposed methodology on court cases (judgments) as a use case, since they contain generic named entities of different forms that are poorly represented or absent in open-source NER data sets. We also evaluated our hybrid models on two benchmark data sets, namely Computational Natural Language Learning (CoNLL) 2003 and Open Knowledge Extraction (OKE) 2016. The experimental results obtained from the benchmark data sets show that our hybrid models achieved substantially better performance in terms of F-score in comparison to other competitive systems.
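As a loose illustration of the rule-based layer only, the sketch below combines a tiny gazetteer with a regular-expression rule; the gazetteer entries, the person pattern, and the example sentence are invented placeholders, and the deep learning and clustering components are not shown.

```python
# Hypothetical sketch: the rule-based layer of a hybrid NER pipeline; a small gazetteer
# plus regular expressions stand in for the deep learning and clustering components.
import re

GAZETTEER = {"supreme court": "ORGANIZATION", "lahore": "LOCATION"}  # placeholder entries

def rule_based_ner(text):
    entities = []
    lowered = text.lower()
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), lowered):
            entities.append((text[m.start():m.end()], label))
    # naive person rule: a title followed by one or two capitalized words
    for m in re.finditer(r"\b(?:Mr|Ms|Justice)\.? [A-Z][a-z]+(?: [A-Z][a-z]+)?", text):
        entities.append((m.group(), "PERSON"))
    return entities

print(rule_based_ner("Justice Ahmed of the Supreme Court heard the appeal in Lahore."))
```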

15.
The proliferation of the Internet has not only led to the generation of huge volumes of unstructured information in the form of web documents, but a large amount of text is also generated in the form of emails, blogs, feedback, etc. The data generated from online communication acts as a potential gold mine for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for handling blogs arises from the fact that online communication text data are often noisy: the texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation, and irrational capitalization. This paper focuses on opinion extraction from noisy text data. It aims at extracting and consolidating opinions of customers from blogs and feedback at multiple levels of granularity. We propose a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids in the process of knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and provide training samples from which the system automatically gathers more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features, and opinion modifiers. These expressions are categorized as positive or negative with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, i.e. feature level or product level, user level or site level, etc. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
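A toy sketch of scoring opinion expressions with membership-style values in [0, 1], using a small lexicon of opinion words, intensifiers, and negations. The lexicons and the scoring rules are assumptions; the paper's localized linguistic techniques are not reproduced here.

```python
# Hypothetical sketch: score short opinion expressions with a toy lexicon plus modifiers,
# yielding membership-style values in [0, 1] (0 = strongly negative, 1 = strongly positive).
OPINION_WORDS = {"good": 0.7, "great": 0.9, "bad": 0.3, "terrible": 0.1}
INTENSIFIERS = {"very": 0.15, "slightly": -0.10}
NEGATIONS = {"not", "never"}

def score_expression(tokens):
    score = None
    for t in tokens:
        if t in OPINION_WORDS:
            score = OPINION_WORDS[t]
    if score is None:
        return None
    for t in tokens:
        if t in INTENSIFIERS:
            # push the score away from neutral (0.5): up for positive words, down for negative ones
            score += INTENSIFIERS[t] if score >= 0.5 else -INTENSIFIERS[t]
    if any(t in NEGATIONS for t in tokens):
        score = 1.0 - score   # a negation flips the polarity
    return min(1.0, max(0.0, score))

print(score_expression(["very", "good"]))     # 0.85
print(score_expression(["not", "terrible"]))  # 0.9
```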

16.
Unstructured Peer-to-Peer (P2P) networks have become a very popular architecture for content distribution in large-scale and dynamic environments. Searching for content in unstructured P2P networks is a challenging task because the distribution of objects has no association with the organization of peers. Although methods proposed in recent years perform better than flooding or random-walk algorithms to some extent, they either depend too heavily on the object replication rate or suffer a sharp decline in performance when the objects stored at peers change rapidly. In this paper, we propose a novel query routing mechanism for improving query performance in unstructured P2P networks. We design a data structure called the traceable gain matrix (TGM), which records every query's gain at each peer along the query hit path and allows query routing decisions to be optimized effectively. Experimental results show that our query routing mechanism achieves a relatively high query hit rate with low bandwidth consumption in different types of network topologies under both static and dynamic network conditions.
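A minimal, dictionary-backed stand-in for the idea of a traceable gain matrix: credit every peer along a query-hit path with a decaying gain per keyword, then route new queries toward the neighbor with the highest accumulated gain. The decay function and the routing rule are assumptions, not the TGM design in the paper.

```python
# Hypothetical sketch: a tiny stand-in for a "traceable gain matrix" (TGM) that records
# a gain per (peer, keyword) along each query-hit path and ranks neighbours for routing.
from collections import defaultdict

gain = defaultdict(float)   # (peer_id, keyword) -> accumulated gain

def record_hit(hit_path, keyword):
    """Credit every peer on the path of a successful query, with decaying gain."""
    for distance, peer in enumerate(reversed(hit_path)):   # peer holding the object first
        gain[(peer, keyword)] += 1.0 / (1 + distance)

def best_neighbour(neighbours, keyword):
    return max(neighbours, key=lambda p: gain[(p, keyword)])

record_hit(["A", "B", "C"], "music")   # query travelled A -> B -> C and hit at C
record_hit(["A", "D"], "music")
print(best_neighbour(["B", "D"], "music"))   # D scores 1.0, B scores 0.5
```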

17.
We propose a novel Probabilistic Rating infErence Framework, known as Pref, for mining user preferences from reviews and then mapping such preferences onto numerical rating scales. Pref applies existing linguistic processing techniques to extract opinion words and product features from reviews. It then estimates the sentimental orientations (SO) and strength of the opinion words using our proposed relative-frequency-based method. This method allows semantically similar words to have different SO, thereby addressing a major limitation of existing methods. Pref takes the intuitive relationships between class labels, which are scalar ratings, into consideration when assigning ratings to reviews. Empirical results validated the effectiveness of Pref against several related algorithms and suggest that Pref can produce reasonably good results using a small training corpus. We also describe a useful application of Pref as a rating inference framework. Rating inference transforms user preferences described as natural language texts into numerical rating scales. This allows Collaborative Filtering (CF) algorithms, which operate mostly on databases of scalar ratings, to utilize textual reviews as an additional source of user preferences. We integrated Pref with a classical CF algorithm and empirically demonstrated the advantages of using rating inference to augment ratings for CF.
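One plausible reading of a relative-frequency-based SO estimate, sketched below: the fraction of labelled reviews containing a word that are positive. The toy reviews and this particular estimator are assumptions; Pref's actual method may differ.

```python
# Hypothetical sketch: estimate a word's sentiment orientation as the relative frequency
# with which it occurs in positively labelled reviews (a simplification of Pref's estimator).
from collections import Counter

reviews = [
    ("the battery life is great and the screen is sharp", 1),   # 1 = positive rating
    ("great camera but the battery is awful", 0),
    ("awful screen, awful battery", 0),
]

pos_counts, total_counts = Counter(), Counter()
for text, label in reviews:
    for word in set(text.split()):
        total_counts[word] += 1
        if label == 1:
            pos_counts[word] += 1

def sentiment_orientation(word):
    """Value in [0, 1]; > 0.5 leans positive, < 0.5 leans negative."""
    return pos_counts[word] / total_counts[word] if total_counts[word] else 0.5

print(sentiment_orientation("great"))   # 0.5  (one positive, one negative review)
print(sentiment_orientation("awful"))   # 0.0
```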

18.
Conversation analysis methods that rely on human coding are ineffective for analyzing large volumes of data. In this paper, we propose a text analytics framework for automated conversation pattern analysis. The framework first extracts speech acts (i.e., activities) from conversation logs and then analyzes their flow through frequent pattern mining algorithms to reveal insightful communication patterns. Using a real-world data set collected from a customer service center, we demonstrate the usefulness of the framework for identifying patterns that are associated with service quality outcomes. Our work has implications for the design of communication policies and systems for customer service management.
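A small sketch of the flow-analysis step, assuming conversations have already been labelled with speech acts: counting act-to-act transitions is a simple stand-in for the frequent pattern mining the framework applies.

```python
# Hypothetical sketch: given conversations already labelled with speech acts, count the
# most frequent act-to-act transitions (a simple stand-in for frequent pattern mining).
from collections import Counter

conversations = [
    ["greeting", "question", "answer", "thanks"],
    ["greeting", "complaint", "apology", "solution", "thanks"],
    ["question", "answer", "question", "answer"],
]

transitions = Counter()
for acts in conversations:
    transitions.update(zip(acts, acts[1:]))

print(transitions.most_common(3))
# e.g. [(('question', 'answer'), 3), (('greeting', 'question'), 1), ...]
```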

19.
20.
Text displayed in a video is an essential part of the high-level semantic information of the video content. Therefore, video text can be used as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. Detected candidate text lines are then refined using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method to separate text from complex backgrounds and make it processable by standard OCR (Optical Character Recognition) software. The operability and accuracy of the proposed text detection and binarization methods have been evaluated using publicly available test data sets.
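Only the entropy-filter idea is sketched here: compute the Shannon entropy of a candidate region's grey-level histogram and discard flat, low-entropy regions. The threshold and the synthetic pixel regions are placeholders; the detector, SWT, and SVM stages are not shown.

```python
# Hypothetical sketch of an image-entropy filter for candidate text regions: regions whose
# grey-level histogram has very low Shannon entropy are unlikely to contain text.
import math
from collections import Counter

def region_entropy(pixels):
    """Shannon entropy (bits) of a flat list of grey-level values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def keep_candidate(pixels, threshold=3.0):   # threshold is a placeholder value
    return region_entropy(pixels) >= threshold

flat_region = [128] * 64                                 # uniform background
texty_region = [0, 255, 30, 200, 90, 160, 10, 240] * 8   # high-contrast strokes
print(keep_candidate(flat_region), keep_candidate(texty_region))   # False True
```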

