Similar Documents
10 similar documents found.
1.
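Since the beginning of the 21st century, large amounts of knowledge data have been stored in documents, but the granularity and structure of these documents make the knowledge difficult to process, integrate, and manage. To extract semantics from these unordered, unstructured data (knowledge) sources, the first task is to extract the knowledge embedded in the data and information and build a semantic web of the text resources, using RDF to represent the semantic data. Next, the TFIDF algorithm is used to compute the confidence of the text's feature words. Finally, the text information is entered into a database to achieve automatic classification of text resources. The ultimate goal is to share the knowledge contained in text resources.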
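The abstract does not say which TF-IDF variant is used to score feature words. The sketch below assumes the common form (length-normalized tf) × log(N/df) and assumes the documents are already tokenized into word lists. Chinese text would first need word segmentation, which is not shown. The function name `tfidf` is illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    This is one common formulation, not necessarily the paper's.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["knowledge", "semantic", "rdf", "knowledge"],
    ["text", "classification", "knowledge"],
    ["rdf", "triple", "semantic"],
]
weights = tfidf(docs)
```

Under this variant, a term that occurs in only one document (such as "text") outweighs a term shared across documents (such as "knowledge"). A term that occurs in every document receives weight 0.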

2.
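TTFS: Design and Implementation of a Tendency Text Filtering System   Total citations: 3, self-citations: 0, citations by others: 3  
Previous research on text filtering has focused mainly on topic-based filtering. As the Web has grown, however, tendency text filtering (filtering by the attitude a text expresses) has become increasingly important for network information security. This paper describes TTFS (Tendency Text Filtering System), a system that filters texts expressing a particular tendency toward a given topic. The system makes full use of domain knowledge and uses techniques such as semantic pattern analysis. Experiments show that it achieves high recall and precision at relatively high speed.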
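The abstract does not describe TTFS's semantic patterns. The following is a hypothetical, much-simplified illustration of tendency filtering. A text's tendency toward a topic is scored by counting positive and negative cue words in the sentences that mention the topic. The cue lists and the functions `tendency` and `filter_texts` are invented for illustration and are not the TTFS method.

```python
import re

# Hypothetical cue lists; a real system would derive them from domain knowledge.
POSITIVE_CUES = {"support", "praise", "welcome"}
NEGATIVE_CUES = {"oppose", "condemn", "criticize"}


def tendency(text, topic):
    """Return 'positive', 'negative', or None for the text's stance on a topic.

    Only sentences that mention the topic contribute to the score.
    """
    score = 0
    for sentence in re.split(r"[.!?]", text.lower()):
        if topic.lower() not in sentence:
            continue
        words = set(re.findall(r"[a-z]+", sentence))
        score += len(words & POSITIVE_CUES) - len(words & NEGATIVE_CUES)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def filter_texts(texts, topic, unwanted="negative"):
    """Drop texts whose tendency toward the topic matches the unwanted one."""
    return [t for t in texts if tendency(t, topic) != unwanted]
```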

3.
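Knowledge Discovery Based on Search Engines   Total citations: 3, self-citations: 0, citations by others: 3  
Data mining is generally applied to large, highly structured databases to discover the knowledge they contain. As online text grows, so does the knowledge it contains, yet that knowledge is difficult to analyze and exploit. An effective approach to discovering the knowledge in text is therefore very important and is a major current research topic. This paper uses the Google search engine to retrieve relevant Web pages, which are filtered and cleaned to obtain the relevant text. The text is then clustered, and Episode is used for event recognition and information extraction, followed by data integration and data mining to achieve knowledge discovery. Finally, a prototype system is presented and used to test the approach in practice, with good results.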
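The abstract does not name the text clustering algorithm. As a stand-in, this sketch uses single-pass (leader) clustering over bag-of-words vectors with cosine similarity. This is a swapped-in technique, not the paper's. The similarity threshold of 0.3 and whitespace tokenization are arbitrary assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(texts, threshold=0.3):
    """Assign each text to the most similar cluster, or start a new one.

    Returns a list of clusters, each a list of text indices.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(texts):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # Summed counts point in the same direction as the mean,
            # and cosine similarity ignores vector length.
            best["centroid"].update(vec)
            best["members"].append(i)
    return [c["members"] for c in clusters]
```

Single-pass clustering needs no preset number of clusters. That suits a stream of retrieved pages, but its result depends on the order in which the texts arrive.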

4.
This article provides an overview of, and thematic justification for, the special issue of the journal Artificial Intelligence and Law entitled “E-Discovery”. In attempting to define a characteristic “AI & Law” approach to e-discovery, and since a central theme of AI & Law involves computationally modeling legal knowledge, reasoning and decision making, we focus on the theme of representing and reasoning with litigators’ theories or hypotheses about document relevance through a variety of techniques including machine learning. We also identify two emerging techniques for enabling users’ document queries to better express the theories of relevance and connect them to documents: social network analysis and a hypothesis ontology.

5.
In the context of information retrieval (IR) from text documents, the term weighting scheme (TWS) is a key component of the matching mechanism when using the vector space model. In this paper, we propose a new TWS that computes the average term occurrences of terms in documents. It also uses a discriminative approach based on the document centroid vector to remove less significant weights from the documents. We call our approach Term Frequency With Average Term Occurrence (TF-ATO). An analysis of commonly used document collections shows that test collections are not fully judged, because full judgement is expensive and may be infeasible for large collections. A document collection is fully judged when every document in the collection acts as a relevant document for a specific query or group of queries. The discriminative approach used in TF-ATO is a heuristic method for improving IR effectiveness and performance. It has the advantage of not requiring prior knowledge of relevance judgements. We compare the performance of TF-ATO with the well-known TF-IDF approach and show that TF-ATO yields better effectiveness in both static and dynamic document collections. In addition, this paper investigates the impact that stop-word removal and our discriminative approach have on TF-IDF and TF-ATO. The results show that both stop-word removal and the discriminative approach have a positive effect on both term-weighting schemes. More importantly, the proposed discriminative approach is shown to improve IR effectiveness and performance without any information on the collection's relevance judgements.
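The abstract gives no formulas. The sketch below rests on two assumptions. First, the average term occurrence of document j is its total term occurrences divided by its number of distinct terms, and a term's weight is its frequency divided by that average. Second, the discriminative step removes any weight that falls below the corresponding component of the centroid (the term's mean weight over all documents). Treat both as assumptions rather than the paper's exact definitions.

```python
from collections import Counter


def tf_ato(docs):
    """TF-ATO weights for tokenized documents.

    Assumed: ATO_j = total term occurrences in doc j / distinct terms in doc j,
    and w_ij = tf_ij / ATO_j.
    """
    weights = []
    for doc in docs:
        counts = Counter(doc)
        if not counts:  # empty document: no weights
            weights.append({})
            continue
        ato = sum(counts.values()) / len(counts)
        weights.append({t: c / ato for t, c in counts.items()})
    return weights


def discriminate(weights):
    """Remove (treat as zero) weights below the document centroid.

    The centroid component of a term is its mean weight over all documents,
    with documents lacking the term contributing 0 (an assumed rule).
    """
    n = len(weights)
    totals = Counter()
    for w in weights:
        totals.update(w)  # Counter.update adds the float weights
    centroid = {t: s / n for t, s in totals.items()}
    return [{t: v for t, v in w.items() if v >= centroid[t]} for w in weights]
```

This step needs no relevance judgements, which matches the abstract's claim. It uses only the weights already present in the collection.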

6.
Documenting software architecture rationale is essential for reusing and evaluating architectures, and several modeling and documentation guidelines have been proposed in the literature. However, in practice creating and updating these documents is rarely a primary activity in most software projects, and rationale remains hidden in casual and semi-structured records, such as e-mails, meeting notes, wikis, and specialized documents. This paper describes the TREx (Toeska Rationale Extraction) approach to recover, represent and explore rationale information from text documents, combining: (1) pattern-based information extraction to recover rationale; (2) ontology-based representation of rationale and architectural concepts; and (3) facet-based interactive exploration of rationale. Initial results from TREx’s application suggest that some kinds of architecture rationale, namely decisions, alternatives and requirements, can be semi-automatically extracted from a project’s unstructured text documents. The approach and some tools are illustrated with a case study of rationale recovery for a financial securities settlement system.
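The abstract does not list TREx's extraction patterns. This sketch only illustrates the general idea of pattern-based extraction for the three rationale categories it names: decisions, alternatives and requirements. The regular expressions and the `extract_rationale` function are hypothetical.

```python
import re

# Hypothetical surface patterns, one per rationale category.
PATTERNS = {
    "decision": re.compile(r"\bwe (?:decided|chose|agreed) to ([^.]+)", re.I),
    "alternative": re.compile(r"\b(?:instead of|rather than|alternatively,?) ([^.,]+)", re.I),
    "requirement": re.compile(r"\bthe system (?:must|shall) ([^.]+)", re.I),
}


def extract_rationale(text):
    """Return (category, phrase) pairs matched in unstructured text."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((category, match.group(1).strip()))
    return found


note = ("We decided to use a message queue instead of direct RPC calls. "
        "The system must settle trades within T+2.")
rationale = extract_rationale(note)
```

Surface patterns like these are brittle. That fits the abstract's description of the extraction as semi-automatic, with the results explored and refined interactively.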

7.
8.
Sharing sustainable and valuable knowledge among knowledge workers is a fundamental aspect of knowledge management. In organizations, knowledge workers usually have personal folders in which they organize and store needed codified knowledge (textual documents) in categories. In such personal folder environments, providing knowledge workers with needed knowledge from other workers’ folders is important because it increases the workers’ productivity and the possibility of reusing and sharing knowledge. Conventional recommendation methods can be used to recommend relevant documents to workers; however, those methods recommend knowledge items without considering whether the items are assigned to the appropriate category in the target user’s personal folders. In this paper, we propose novel document recommendation methods that integrate text categorization techniques to recommend documents to a target worker’s personalized categories: content-based filtering and categorization, collaborative filtering and categorization, and hybrid methods. Our experiment results show that the hybrid methods outperform the pure content-based and the collaborative filtering and categorization methods. The proposed methods not only proactively notify knowledge workers about relevant documents held by their peers, but also facilitate push-mode knowledge sharing.
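The abstract does not give the algorithms, so this sketch shows only the general shape of content-based filtering and categorization. Each category in the target worker's folder is summarized as a term-count profile, and each peer document is recommended into the most similar category when cosine similarity reaches a threshold. The folder structure, the threshold of 0.2, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def category_profiles(folders):
    """Sum term counts per category; folders maps category -> tokenized docs."""
    return {cat: sum((Counter(d) for d in docs), Counter())
            for cat, docs in folders.items()}


def recommend(target_folders, peer_docs, threshold=0.2):
    """Recommend each peer document to the target's most similar category.

    Returns (doc_id, category, similarity) triples; documents whose best
    similarity is below the threshold are not recommended.
    """
    profiles = category_profiles(target_folders)
    recs = []
    for doc_id, tokens in peer_docs.items():
        vec = Counter(tokens)
        best_cat, best_sim = None, threshold
        for cat, profile in profiles.items():
            sim = cosine(vec, profile)
            if sim >= best_sim:
                best_cat, best_sim = cat, sim
        if best_cat is not None:
            recs.append((doc_id, best_cat, best_sim))
    return recs
```

A collaborative or hybrid variant, as described in the abstract, would also draw on peers' categorization behaviour. That part is not sketched here.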

9.
A key task for students learning about a complex topic from multiple documents on the web is to establish the existing rhetorical relations between the documents. Traditional search engines such as Google® display the search results in a listed format, without signalling any relationship between the documents retrieved. New search engines such as Kartoo® go a step further, displaying the results as a constellation of documents, in which the existing relations between pages are made explicit. This presentation format is based on previous studies of single-text comprehension, which demonstrate that providing a graphical overview of the text contents and their relation boosts readers’ comprehension of the topic. We investigated the assumption that graphical overviews can also facilitate multiple-documents comprehension. The present study revealed that undergraduate students reading a set of web pages on climate change comprehended them better when using a search engine that makes explicit the relationships between documents (i.e. Kartoo-like) than when working with a list-like presentation of the same documents (i.e. Google-like). The facilitative effect of a graphical-overview interface was reflected in inter-textual inferential tasks, which required students to integrate key information between documents, even after controlling for readers’ topic interest and background knowledge.

10.
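Web text representation is the foundation of all Web text analysis and strongly affects its results. This paper proposes a multi-dimensional Web text representation method. Traditional text representation methods generally extract features from the text content alone, but a document's deeper features and external features can also be used to represent it. This paper studies three kinds of features: surface features, latent features, and social features. Surface and latent features can be extracted and learned from the text content, while social features can be obtained by analyzing users' interactions with documents. The proposed representation is easy to use and can be applied in a variety of text analysis models. In the experiments, two common text clustering algorithms, K-means and hierarchical clustering, were extended and named multi-dimensional K-means (MDKM) and multi-dimensional hierarchical clustering (MDHAC). Extensive experiments demonstrate the method's efficiency. In addition, the experiments on combinations of the different features yielded some further insights.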
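The abstract does not say how the three feature types are combined inside MDKM. One simple possibility, assumed here purely for illustration, works in two steps. First, L2-normalize the surface, latent and social feature blocks separately, scale each by a block weight, and concatenate them. Then run standard (Lloyd's) k-means on the combined vectors. The block weights and the functions `combine` and `kmeans` are hypothetical.

```python
import math


def normalize(v):
    """L2-normalize a vector; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def combine(surface, latent, social, weights=(1.0, 1.0, 1.0)):
    """Concatenate normalized feature blocks, each scaled by a block weight."""
    out = []
    for block, w in zip((surface, latent, social), weights):
        out.extend(w * x for x in normalize(block))
    return out


def kmeans(points, init_centroids, iters=20):
    """Lloyd's k-means from given initial centroids; returns cluster labels."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            k = dists.index(min(dists))
            labels.append(k)
            groups[k].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[k]
               for k, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return labels
```

Normalizing each block separately keeps one feature type from dominating the distance just because its raw values are larger. The block weights then control how much each dimension counts.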


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417