Similar documents
20 similar documents found (search time: 15 ms)
1.
Knowledge and Information Systems - Massive open online courses (MOOCs) have emerged as a great resource for learners. Numerous challenges remain to be addressed in order to make MOOCs more useful...

2.
Keyphrase extraction from social media is a crucial and challenging task. Previous studies usually focus on extracting keyphrases that summarize a corpus, but they do not take users’ specific needs into consideration. In this paper, we propose a novel three-stage model to learn a keyphrase set that represents, or is related to, a particular topic. First, a phrase mining algorithm is applied to segment the documents into human-interpretable phrases. Second, we propose a weakly supervised model to extract candidate keyphrases, using a few pre-specified seed keyphrases to guide the model; this makes the extracted keyphrases more specific and more closely related to the seed keyphrases (which reflect the user’s needs). Finally, to identify implicitly related phrases, the PMI-IR algorithm is employed to obtain synonyms of the extracted candidate keyphrases. We conducted experiments on two publicly available datasets from news and Twitter. The experimental results demonstrate that our approach outperforms state-of-the-art baselines and has the potential to extract high-quality task-oriented keyphrases.
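The third stage relies on PMI-IR, which scores the association between two phrases by how much more often they co-occur than chance would predict. Below is a minimal sketch of that scoring step; it estimates the probabilities from document co-occurrence counts in a local corpus rather than from search-engine hit counts as PMI-IR proper does, and the toy corpus and function names are illustrative only.

```python
import math
from collections import Counter

# Toy corpus: each "document" is a set of phrases produced by the
# phrase-mining stage (names and data here are purely illustrative).
docs = [
    {"machine learning", "deep learning", "neural network"},
    {"machine learning", "statistical learning"},
    {"deep learning", "neural network", "backpropagation"},
]

total = len(docs)
counts = Counter(p for d in docs for p in d)
cooc = Counter(frozenset((a, b)) for d in docs
               for a in d for b in d if a < b)

def pmi(w1, w2):
    """PMI(w1, w2) = log2(P(w1, w2) / (P(w1) P(w2))), estimated from
    document co-occurrence; PMI-IR proper uses search-engine hit counts."""
    p12 = cooc[frozenset((w1, w2))] / total
    p1, p2 = counts[w1] / total, counts[w2] / total
    return math.log2(p12 / (p1 * p2)) if p12 else float("-inf")

def related_phrases(seed, k=3):
    """Rank other phrases by PMI with a candidate keyphrase."""
    others = [w for w in counts if w != seed]
    return sorted(others, key=lambda w: pmi(seed, w), reverse=True)[:k]

print(related_phrases("deep learning"))
```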

3.
This paper describes the organization and results of the automatic keyphrase extraction task held at the Workshop on Semantic Evaluation 2010 (SemEval-2010). The keyphrase extraction task was specifically geared towards scientific articles. Systems were automatically evaluated by matching their extracted keyphrases against those assigned by the authors as well as the readers to the same documents. We outline the task, present the overall ranking of the submitted systems, and discuss the improvements to the state-of-the-art in keyphrase extraction.

4.
The P300 is an endogenous event-related potential (ERP) that is naturally elicited by rare and significant stimuli; it arises from the frontal, temporal, and occipital lobes of the brain, although it is usually measured over the parietal lobe. P300 signals are increasingly used in brain-computer interfaces (BCIs) because users of ERP-based BCIs need no special training. Most studies on detecting the P300 signal have focused on supervised approaches, which must contend with over-fitted filters and the need for later validation. In this paper we begin to bridge this gap by modeling an unsupervised classifier of P300 presence based on a weighted score. This is carried out with matched filters that weight events likely to represent the P300 wave; the optimal weights are determined through a study of the data’s features. The combination of different artifact-cancelation methods and P300-extraction techniques provides a marked, statistically significant improvement in accuracy, at the level of the top-performing supervised algorithms reported in the literature to date. This innovation has a notable impact on ERP-based communicators, pointing to the development of faster and more reliable BCI technology.
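A minimal sketch of the matched-filter weighted score follows. The template, channel weights, and threshold heuristic are placeholders (the paper derives the weights from a study of the data’s features); the random arrays only illustrate the expected shapes, not real EEG.

```python
import numpy as np

def matched_filter_scores(epochs, template):
    """Correlate each epoch with a P300 template (matched filtering).

    epochs:   array (n_events, n_channels, n_samples)
    template: array (n_samples,), e.g. an averaged canonical P300 shape
    Returns one correlation score per event and channel.
    """
    t = (template - template.mean()) / template.std()
    x = (epochs - epochs.mean(axis=-1, keepdims=True)) / \
        epochs.std(axis=-1, keepdims=True)
    return (x * t).mean(axis=-1)      # normalized cross-correlation at lag 0

def weighted_score(scores, channel_weights):
    """Combine per-channel matched-filter outputs into a single score.
    The weights here are supplied by the caller (an assumption)."""
    return scores @ channel_weights

# Illustrative use with random data (shapes only; not real EEG):
rng = np.random.default_rng(0)
epochs = rng.standard_normal((100, 8, 256))
template = np.hanning(256)            # stand-in for a P300 waveform
w = np.ones(8) / 8                    # uniform weights as a placeholder
s = weighted_score(matched_filter_scores(epochs, template), w)
detected = s > s.mean() + s.std()     # unsupervised threshold heuristic
```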

5.
6.
Automatic keyphrase extraction has many important applications including but not limited to summarization, cataloging/indexing, feature extraction for clustering and classification, and data mining. This paper presents the KP-Miner system, and demonstrates through experimentation and comparison with widely used systems that it is effective and efficient in extracting keyphrases from both English and Arabic documents of varied length. Unlike other existing keyphrase extraction systems, the KP-Miner system does not need to be trained on a particular document set in order to achieve its task. It also has the advantage of being configurable as the rules and heuristics adopted by the system are related to the general nature of documents and keyphrases. This implies that the users of this system can use their understanding of the document(s) being input into the system to fine-tune it to their particular needs.

7.
An automatic keyphrase extraction system for scientific documents
Automatic keyphrase extraction techniques play an important role for many tasks including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location for potential keyphrases: a new candidate phrase generation method is proposed based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced for selecting the proper granularity. Additional new features are added for phrase weighting. Experiments based on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
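The second refinement, overlap elimination, can be illustrated compactly: when a phrase and one of its sub-phrases both survive as candidates, an inverse-document-frequency comparison selects the granularity to keep. The sketch below is a simplified reading of that rule under assumed data structures; the exact criterion and the core-word-expansion step are not reproduced.

```python
import math

def idf(phrase, doc_freq, n_docs):
    """Inverse document frequency; doc_freq maps phrase -> #docs containing it."""
    return math.log(n_docs / (1 + doc_freq.get(phrase, 0)))

def eliminate_overlaps(candidates, doc_freq, n_docs):
    """When a phrase and one of its sub-phrases both appear as candidates,
    keep the variant with the higher IDF (the more specific granularity).
    A simplified reading of the paper's rule; the exact criterion may differ."""
    kept = set(candidates)
    for p in candidates:
        for q in candidates:
            if p != q and q in p:      # q is a sub-phrase of p
                loser = q if idf(p, doc_freq, n_docs) >= idf(q, doc_freq, n_docs) else p
                kept.discard(loser)
    return kept

# Illustrative counts: the rarer, more specific phrase wins.
doc_freq = {"neural network": 40, "deep neural network": 5}
print(eliminate_overlaps({"neural network", "deep neural network"}, doc_freq, 100))
# -> {'deep neural network'}
```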

8.
Pattern Analysis and Applications - The most significant difficulty in processing natural-scene text images is the presence of fog, smoke, or haze. These intrusive elements reduce the contrast and...

9.
Automatic text summarization (ATS) has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale corpora. However, there is still no guarantee that the generated summaries are grammatical and concise, or that they convey all the salient information of the original documents. To make the summarization results more faithful, this paper presents an unsupervised approach that combines rhetorical structure theory, a deep neural model, and domain knowledge for ATS. The architecture contains three main components: domain knowledge base construction based on representation learning, an attentional encoder–decoder model for rhetorical parsing, and a subroutine-based model for text summarization. Domain knowledge can be used effectively for unsupervised rhetorical parsing, so a rhetorical structure tree can be derived for each document. In the unsupervised rhetorical parsing module, the idea of translation is adopted to alleviate the problem of data scarcity. The subroutine-based summarization model depends purely on the derived rhetorical structure trees and can generate content-balanced results. To evaluate summaries without a gold standard, we propose an unsupervised evaluation metric whose hyper-parameters are tuned by supervised learning. Experimental results on a large-scale Chinese dataset show that our approach obtains performance comparable to existing methods.

10.
Named entity recognition (NER) is the core part of information extraction: it automatically detects entities in natural language text and classifies them into predefined categories, such as the names of persons, organizations, and locations. The output of the NER task is crucial for many applications, including relation extraction, textual entailment, machine translation, and information retrieval. The literature shows that machine learning and deep learning approaches are the most widely used techniques for NER; however, these approaches demand a domain-specific annotated data set for entity extraction. Our goal is to develop a hybrid NER system composed of rule-based, deep learning, and clustering-based approaches, which facilitates the extraction of generic entities (such as person, location, and organization) from natural language texts in domains that lack data sets labeled with generic named entities. The proposed approach takes advantage of deep learning and clustering separately, combined with a knowledge-based approach in a postprocessing module. We evaluated the proposed methodology on court cases (judgments) as a use case, since they contain generic named entities of varied forms that are poorly represented, or absent, in open-source NER data sets. We also evaluated our hybrid models on two benchmark data sets, Computational Natural Language Learning (CoNLL) 2003 and Open Knowledge Extraction (OKE) 2016. The experimental results on the benchmark data sets show that our hybrid models achieve substantially better performance, in terms of F-score, than other competitive systems.

11.
12.
Extending the scope of policy management with Semantic Web technology, this paper proposes a multi-layer policy method for secure access to Web services that enforces Web access control while also achieving dynamic policy adjustment through reasoning, offering a useful exploration of next-generation Web service applications.

13.
The training data matrix used to classify text documents into multiple categories has a large number of dimensions, while the number of manually classified training documents is relatively small. Suitable dimensionality-reduction techniques are therefore required to develop the classifier. This article describes a two-step supervised feature extraction method that takes advantage of projections of terms into document and category spaces. We propose several enhancements that make the method more efficient and faster than the version presented in our former paper. We also introduce an adjustment score that makes it possible to correct defective targets, or helps to identify improper training examples that bias the extracted features.
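A minimal sketch of the term-to-category projection idea follows, under assumed shapes: A is a term-document matrix of the training set and Y a one-hot document-category matrix. The paper’s exact projections, enhancements, and adjustment score are not reproduced here.

```python
import numpy as np

def extract_features(A, Y):
    """Step 1: project terms into category space (term-category weights),
    normalizing by category size so large categories do not dominate."""
    term_cat = A @ Y                           # (n_terms, n_categories)
    term_cat /= np.maximum(Y.sum(axis=0), 1)
    return term_cat

def transform(docs_terms, term_cat):
    """Step 2: reduce documents from term space to category space,
    giving one feature per category."""
    return docs_terms @ term_cat               # (n_docs, n_categories)

# Toy dimensions: 1000 terms, 50 training docs, 4 categories (illustrative).
rng = np.random.default_rng(1)
A = rng.poisson(0.1, (1000, 50)).astype(float)  # term-document counts
Y = np.eye(4)[rng.integers(0, 4, 50)]           # one-hot doc-category labels
features = transform(A.T, extract_features(A, Y))   # (50, 4)
```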

14.
Knowledge and Information Systems - Relation extraction is an important information extraction task that must be solved in order to transform data into a Knowledge Graph (KG), as semantic relations...

15.
Few studies have addressed the problem of extracting data from a limited deep-web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context K_L, which consists of the most recently extracted data, and predicts the size of the query results from the cardinality of the extent X of the formal concept (X, Y) derived from K_L. It can thus be determined in advance whether Y is worth issuing as a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X, Y), so the method avoids building concrete concept lattices during extraction. Moreover, two pruning rules are adopted to reduce redundant queries. Experiments on controlled data sets and real applications were performed; the results confirm that the algorithm's theoretical basis is correct and that it can be applied effectively in the real world.
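The prediction step can be illustrated in a few lines: the extent of an attribute set Y over the local formal context K_L is computed, and its cardinality serves as the estimate of the remote result size. The toy context below is illustrative; the lower-cover generation and pruning rules are omitted.

```python
# A formal context K_L as object -> set of attributes; the attribute set Y
# plays the role of a candidate query (names follow the abstract).
K_L = {
    "rec1": {"a", "b"},
    "rec2": {"a", "b", "c"},
    "rec3": {"a", "c"},
}

def extent(context, Y):
    """Objects of the context possessing every attribute in Y."""
    return {obj for obj, attrs in context.items() if Y <= attrs}

def predicted_result_size(context, Y):
    """|extent(Y)| over the locally extracted data estimates how many
    records the remote database would return for query Y."""
    return len(extent(context, Y))

print(predicted_result_size(K_L, {"a", "b"}))   # -> 2
```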

16.
Semantic relation extraction is a significant topic in the semantic web and natural language processing, with important applications such as knowledge acquisition, web and text mining, information retrieval and search engines, and text classification and summarization. Many approaches, including rule-based, machine learning, and statistical methods, have been applied, targeting relation types ranging from hyponymy, hypernymy, meronymy, and holonymy to domain-specific relations. In this paper, we present a computational method for extracting explicit and implicit semantic relations from text by applying statistical and linear-algebraic approaches alongside syntactic and semantic processing of the text.

17.
Reformatting blocks of semi-structured information is a common editing task that typically involves highly repetitive action sequences, but ones where exceptional cases arise constantly and must be dealt with as they arise. This paper describes a procedural programming-by-example approach to repetitive text editing which allows users to construct programs within a standard editing interface and extend them incrementally. Following a brief practice period during which they settle on an editing strategy for the task at hand, users commence editing in the normal way. Once the first block of text has been edited, they inform the learning system, which constructs a generalized procedure from the actions that have been recorded. The system then attempts to apply the procedure to the next block of text, by predicting editing actions and displaying them for confirmation. If the user accepts a prediction, the action is carried out (and the program may be generalized accordingly); otherwise the user is asked to intervene and supply additional information, in effect debugging the program on the fly. A pilot implementation is described that operates in a simple interactive point-and-click editor (Macintosh MINI-EDIT), along with its performance on three sample tasks. In one case the procedure was learned correctly from the actions on the first text block, while in the others minor debugging was needed on subsequent text blocks. In each case a much smaller number of both keystrokes and mouse-clicks was required than with normal editing, without the system making any prior assumptions about the structure of the text except for some general knowledge about lexical patterns. Although a smooth interactive interface has not yet been constructed, the results obtained serve to indicate the potential of this approach for semi-structured editing tasks.

18.

19.
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
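FRaC's core idea, feature models trained on normal data whose disagreement flags anomalies, can be sketched briefly. The version below trains one regression tree per feature and scores disagreement by squared error; FRaC itself scores surprisal against per-feature error distributions, so this is a simplification, and all names and data are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_feature_models(X_normal):
    """One model per feature, each predicting that feature from all the
    others, trained on normal instances only (FRaC-style ensemble)."""
    models = []
    for j in range(X_normal.shape[1]):
        rest = np.delete(X_normal, j, axis=1)
        models.append(DecisionTreeRegressor(max_depth=4)
                      .fit(rest, X_normal[:, j]))
    return models

def anomaly_scores(models, X):
    """Instances whose features disagree with the learned models score high."""
    score = np.zeros(len(X))
    for j, m in enumerate(models):
        rest = np.delete(X, j, axis=1)
        score += (m.predict(rest) - X[:, j]) ** 2
    return score

rng = np.random.default_rng(2)
normal = rng.normal(size=(500, 5))
normal[:, 1] = normal[:, 0] * 2            # a learnable feature dependency
models = fit_feature_models(normal)
test = np.vstack([rng.normal(size=(5, 5)),
                  [[0, 9, 0, 0, 0]]])      # last row breaks the dependency
print(anomaly_scores(models, test))        # last score is much larger
```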

20.
Multi-orientation occurs frequently in ancient handwritten documents, where writers update a document by adding annotations in the margins. Because the margins are narrow, this gives rise to lines in different directions and orientations, and document recognition must find the lines wherever they are written, whatever their orientation. We therefore propose in this paper a new approach for extracting multi-oriented lines from scanned documents. Because the lines are multi-oriented and dispersed across the page, we use an image meshing that allows us to determine the lines progressively and locally. Once the meshing is established, the orientation is determined by applying the Wigner–Ville distribution to the projection-histogram profile. This local orientation is then enlarged to constrain the orientation in the neighborhood. Afterward, the text lines are extracted locally in each zone by following the orientation lines and the proximity of connected components. Finally, connected components that overlap or touch across adjacent lines are separated, using a morphological analysis of the terminal letters of Arabic words. The proposed approach has been evaluated on 100 documents, reaching an accuracy of about 98.6%.
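The orientation step lends itself to a compact illustration. The paper estimates each zone's orientation with the Wigner–Ville distribution of the projection-histogram profile; the sketch below substitutes a simpler criterion, picking the rotation angle that maximizes the variance of the horizontal projection profile, to show the overall shape of a per-zone orientation search. Function and parameter names are illustrative.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_orientation(zone, angles=np.arange(-45, 46, 1)):
    """Estimate the dominant text-line orientation of one mesh zone.

    zone: 2-D binary array, 1 where ink is present.
    A sharp, high-variance horizontal projection profile indicates that
    the rotation has aligned the text lines with the image rows; this
    stands in for the paper's Wigner-Ville analysis of the profile.
    """
    best_angle, best_var = 0.0, -1.0
    for a in angles:
        profile = rotate(zone, a, reshape=True, order=0).sum(axis=1)
        v = profile.var()
        if v > best_var:
            best_angle, best_var = a, v
    return best_angle
```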
