Similar documents
 Found 20 similar documents; search took 31 ms
1.
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.
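The abstract reports cluster recall and cluster precision without defining them. The sketch below assumes one common pairwise definition: a pair of documents counts as correct when both the generated and the reference clustering place the two documents together. The document ids and both clusterings are hypothetical, and this may differ from the measure used in the study.

```python
from itertools import combinations


def same_cluster_pairs(clusters):
    """Return the set of unordered document pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def cluster_precision_recall(generated, reference):
    """Pairwise precision and recall of generated clusters against a reference."""
    gen_pairs = same_cluster_pairs(generated)
    ref_pairs = same_cluster_pairs(reference)
    correct = gen_pairs & ref_pairs
    precision = len(correct) / len(gen_pairs) if gen_pairs else 0.0
    recall = len(correct) / len(ref_pairs) if ref_pairs else 0.0
    return precision, recall


# Hypothetical ids: "en*" are English documents, "zh*" are Chinese documents.
reference = [{"en1", "zh1", "en2"}, {"en3", "zh2"}]
generated = [{"en1", "zh1"}, {"en2", "en3", "zh2"}]
precision, recall = cluster_precision_recall(generated, reference)
```

Restricting the pairs to same-language or mixed-language pairs would give separate monolingual and cross-lingual scores, which is one way to look at the balance the abstract mentions.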

2.
Finding relevant information in a large and comprehensive collection of cross-referenced documents like Wikipedia usually requires a fairly accurate idea of where to look for the data being sought. A user might not yet have enough domain-specific knowledge to form a precise search query that returns the desired result on the first try. Another problem arises from the highly cross-referenced structure of such document collections. When researching a subject, users usually follow some references to get additional information not covered by a single document. With each document, more opportunities to navigate are added, and the structure and relations of the visited documents get harder to understand.

3.
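A Web document clustering algorithm based on swarm intelligence   Total citations: 31, self-citations: 0, citations by others: 31
This paper applies a swarm intelligence clustering model to document clustering and proposes a swarm-intelligence-based Web document clustering algorithm. Web documents are first represented with the vector space model, and a text feature set is obtained by conventional methods such as stop-word removal and feature-term reduction rules. The document vectors are then randomly distributed on a plane and clustered with a swarm-intelligence-based clustering method, and finally a recursive algorithm collects the clustering results from the plane. To make the algorithm more practical, the original algorithm is combined with the k-means algorithm into a hybrid clustering algorithm. Experimental comparison shows that the swarm-intelligence-based Web document clustering algorithm has good clustering characteristics: it groups Web documents related to one topic into a single cluster fairly completely and accurately.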

4.
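Health-status management documents for satellite constellations involve querying and computing many telemetry parameters, have strict formatting requirements, and take a great deal of manual effort and time to compile. To address this, a method for automatically generating such documents is proposed. The basic data types contained in the documents are categorized and analyzed, storage rules for configuration files are defined, document templates are customized, and a document auto-generation algorithm uses the templates and related parameters to produce data summary documents. The method enables knowledge reuse and generation of common content during document compilation and establishes a standardized, effective document compilation process.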
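The template-plus-configuration idea can be illustrated with a minimal sketch. It is not the method from the paper: the template text, the `CONFIG` limits, the parameter name `battery_voltage`, and the satellite names are all hypothetical, and Python's standard `string.Template` stands in for whatever template mechanism the authors use.

```python
from string import Template

# Hypothetical template for one line of a health-status report.
REPORT_LINE = Template("$satellite | $parameter | mean=$mean | status=$status")

# Hypothetical configuration: parameter name -> (lower limit, upper limit).
CONFIG = {"battery_voltage": (26.0, 32.0)}


def summarize(satellite, parameter, samples, low, high):
    """Fill the template from telemetry samples and the configured limits."""
    mean = sum(samples) / len(samples)
    status = "OK" if low <= mean <= high else "CHECK"
    return REPORT_LINE.substitute(
        satellite=satellite, parameter=parameter, mean=f"{mean:.2f}", status=status
    )


def build_report(telemetry):
    """Generate one report line per (satellite, parameter) entry."""
    lines = []
    for (satellite, parameter), samples in sorted(telemetry.items()):
        low, high = CONFIG[parameter]
        lines.append(summarize(satellite, parameter, samples, low, high))
    return "\n".join(lines)


telemetry = {
    ("SAT-1", "battery_voltage"): [28.0, 29.0],
    ("SAT-2", "battery_voltage"): [24.0, 25.0],
}
report = build_report(telemetry)
```

Keeping the limits in a configuration table and the wording in a template means a new parameter or a format change does not require touching the generation code, which is the kind of reuse the abstract describes.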

5.
Various techniques for computer-based knowledge representation and processing are widely used in management and economics. Other techniques such as rules and demons have arisen in the artificial intelligence field. These too can be useful in managerial and economics settings. A major issue is how to effectively employ multiple traditional and artificial intelligence techniques when working on a problem. In this paper, we examine the various knowledge management techniques with respect to their applicability to handling distinct types of knowledge. An object-oriented framework is presented as a basis for the unified and coordinated treatment of multiple knowledge management techniques in a single environment. Using this framework, two approaches are identified for delivering these techniques to a knowledge worker: skeletal environments and furnished environments.

6.
Complex business models in large-scale enterprises deal with voluminous knowledge, from which the most decisive official and technical documents are generated. Template processors are now available for generating such documents. However, the existing template processors are either labor-intensive or too complicated to suit well-established business models and knowledge repositories in a heterogeneous environment. Hence, a novel generalized, adaptable, and flexible template processor that uses the existing resources without modifying the business model is proposed. The tacit business intelligence defined as rules, the knowledge repositories, and the document structure are the nodal agents of this approach. Further, an XML-based Object Query Definition Markup Language for rule definition is proposed. The rules are reorganized into hierarchical DAG-structured rules using a transformation algorithm and traversed using hybrid traversal. The required output document is represented through a template. Object wrappers act as the communicating agents between diversified datasets and the templates. The proposed architecture is modeled and implemented using set theory. It is evaluated in a web-based distributed environment using Java and tested on a real-world dataset of a large-scale engineering enterprise. The results demonstrate its adaptability and extensibility to any multi-organizational structure.

7.
This paper reports on ways of using digitised video from television cameras in user interfaces for computer systems. The DigitalDesk is built around an ordinary physical desk and can be used as such, but it has extra capabilities. A video camera mounted above the desk, pointing down at the work surface, is used to detect where the user is pointing and to read documents that are placed on the desk. A computer-driven projector is also mounted above the desk, allowing the system to project electronic objects onto the work surface and onto real paper documents. The animated paper documents project is considering particular applications of the technology in electronic publishing. The goal is to combine electronic and printed documents to give a richer presentation than that afforded by either separate medium. This paper describes the framework that has been developed to assist with the preparation and presentation of these mixed-media documents. The central component is a registry that associates physical locations on pieces of paper with actions. This is surrounded by a number of adaptors that assist with the creation of new documents either from scratch or by translating from conventional hypermedia, and also allow the documents to be edited. Finally the DigitalDesk itself identifies pieces of paper and animates them with the actions described in the registry.

8.
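Research on methods for integrated processing of intelligence information   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses preprocessing for intelligence information fusion so that intelligence can be processed in real time: inconsistent intelligence is clustered and graded by importance, and prototypes are extracted from the clusters to classify newly acquired intelligence. The method can be used both when arriving intelligence relates to a known event and when it is not known which event it relates to. A background clustering operation divides the intelligence into subsets according to the associated events, while a fast classification algorithm running in parallel in the foreground assigns newly acquired intelligence to the correct information fusion process.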
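The split between slow background clustering and fast foreground classification can be sketched with a nearest-prototype classifier. This is an illustration under assumptions, not the paper's algorithm: the prototype is taken to be the cluster centroid, distance is Euclidean, and the feature vectors and event labels are hypothetical.

```python
import math


def centroid(vectors):
    """Prototype of a cluster: the component-wise mean of its vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(vector, prototypes):
    """Return the label of the prototype closest to the vector (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(vector, prototypes[label]))


# Hypothetical clusters, as the background clustering step might produce them.
clusters = {
    "event_a": [[1.0, 0.0], [0.8, 0.2]],
    "event_b": [[0.0, 1.0], [0.2, 0.8]],
}

# Prototypes are computed once in the background.
prototypes = {label: centroid(vectors) for label, vectors in clusters.items()}

# Foreground step: one distance computation per prototype for each new item.
label = classify([0.9, 0.3], prototypes)
```

Classification cost grows with the number of prototypes rather than the number of stored items, which is what makes the foreground step fast enough for newly arriving intelligence.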

9.
RUBRIC: A System for Rule-Based Information Retrieval   Total citations: 1, self-citations: 0, citations by others: 1
A research prototype software system for conceptual information retrieval has been developed. The goal of the system, called RUBRIC, is to provide more automated and relevant access to unformatted textual databases. The approach is to use production rules from artificial intelligence to define a hierarchy of retrieval subtopics, with fuzzy context expressions and specific word phrases at the bottom. RUBRIC allows the definition of detailed queries starting at a conceptual level, partial matching of a query and a document, selection of only the highest ranked documents for presentation to the user, and detailed explanation of how and why a particular document was selected. Initial experiments indicate that a RUBRIC rule set better matches human retrieval judgment than a standard Boolean keyword expression, given equal amounts of effort in defining each. The techniques presented may be useful in stand-alone retrieval systems, front-ends to existing information retrieval systems, or real-time document filtering and routing.
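A hierarchy of subtopics with word phrases at the bottom and partial matching can be sketched as a small weighted rule tree. The abstract does not give RUBRIC's rule syntax or its scoring calculus, so everything here is an assumption: the tuple encoding of rules, the use of `min` for AND and `max` for OR, the rule weights, and the example topic.

```python
def evaluate(node, text):
    """Score a text against a rule tree; returns a value in [0, 1].

    Assumed node shapes (hypothetical encoding):
      ("phrase", "some words")          -> 1.0 if the phrase occurs, else 0.0
      ("and", weight, [child, ...])     -> weight * min(child scores)
      ("or", weight, [child, ...])      -> weight * max(child scores)
    """
    kind = node[0]
    if kind == "phrase":
        return 1.0 if node[1] in text.lower() else 0.0
    scores = [evaluate(child, text) for child in node[2]]
    combined = min(scores) if kind == "and" else max(scores)
    return node[1] * combined


# Hypothetical topic: documents about organizing document collections.
topic = ("or", 1.0, [
    ("and", 0.8, [("phrase", "clustering"), ("phrase", "document")]),
    ("phrase", "knowledge map"),
])

documents = [
    "Document clustering techniques",
    "A knowledge map of the repository",
    "Satellite telemetry report",
]

# Rank documents by score, highest first.
ranked = sorted(documents, key=lambda d: evaluate(topic, d), reverse=True)
```

Because each node returns a graded score instead of true or false, a document that satisfies only a weaker rule still gets ranked, and the tree itself records which rules contributed to the score.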

10.
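Text copy detection determines whether the content of a document has been copied or plagiarized from one or more other documents. Many algorithms exist in the field of document copy detection. Detection based on sentence similarity combines the advantages of string-comparison methods and word-frequency-statistics methods: it captures the global features of a document while also taking its structural information into account, which makes it a good approach. Building on it, this paper improves the similarity computation and proposes a new sentence-similarity-based copy detection algorithm for Chinese documents. The algorithm takes the characteristics of Chinese documents fully into account, uses the sentence as the document's feature unit, and removes the need for a manually set threshold, improving detection accuracy. Experiments show that the algorithm is feasible in terms of both efficiency and accuracy.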
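Using the sentence as the comparison unit can be sketched as follows. The paper's improved similarity measure is not given in the abstract, so this sketch substitutes a plain Jaccard similarity over character bigrams (an assumption that sidesteps Chinese word segmentation) and scores a document by averaging each sentence's best match. All function names and sample texts are hypothetical.

```python
import re


def sentences(text):
    """Split on Chinese and Western sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[。!?!?.]", text) if s.strip()]


def bigrams(sentence):
    """Character bigrams of a sentence, ignoring whitespace."""
    chars = [c for c in sentence if not c.isspace()]
    return {"".join(chars[i:i + 2]) for i in range(len(chars) - 1)}


def sentence_similarity(a, b):
    """Jaccard similarity of the two sentences' bigram sets."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def copy_score(suspect, source):
    """Average, over the suspect's sentences, of the best match in the source."""
    src = sentences(source)
    sus = sentences(suspect)
    if not src or not sus:
        return 0.0
    return sum(max(sentence_similarity(s, t) for t in src) for s in sus) / len(sus)


# Hypothetical sample texts.
source = "文档聚类是一种无监督方法。它把相似的文档分到一组。"
copied = "它把相似的文档分到一组。"
unrelated = "今天天气很好。"
```

Here `copy_score(copied, source)` is high because the suspect sentence appears verbatim in the source, while `copy_score(unrelated, source)` is low because the two texts share no bigrams.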

11.
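This paper describes the role of new-generation information technologies such as big data, artificial intelligence, and mobile technology in nuclear power enterprises. Jiangsu Nuclear Power has explored the scope and application scenarios of these technologies in depth and followed current trends in document management. Taking into account the practical problems it faces, its existing IT resources, and its construction roadmap, and following the principle of "innovation-driven, step-by-step implementation", it has applied a range of measures to realize new applications: big-data-based integration and use of document information resources, mobile case-file management and document services, automatic archiving across the full life cycle of electronic records, and hot-spot analysis of document use based on user-behavior big data. These applications have raised the level of document management and advanced innovative document informatization. Together, the measures have brought intelligent management and control to document management in the nuclear power enterprise, markedly improved its document management, and achieved good results.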

12.
The paper tries to bridge the gap between knowledge management and artificial intelligence approaches by proposing an agent-based framework for modelling organizational and personal knowledge. The perspective of knowledge management is chosen to develop two conceptual models: one describes the intelligent enterprise memory, and the other models an intelligent organization's knowledge management system. The concept of an agent-based environment of the knowledge worker for personal and organizational knowledge management support is introduced.

13.
Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be effectively retrieved. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge and the knowledge-rich processing of the content of document texts and information needs, and of their matching. Currently, techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

14.
Patents are a type of intellectual property with ownership and monopolistic rights that are publicly accessible published documents, often with illustrations, registered by governments and international organizations. The registration allows people familiar with the domain to understand how to re-create the new and useful invention but restricts the manufacturing unless the owner licenses or enters into a legal agreement to sell ownership of the patent. Patents reward the costly research and development efforts of inventors while spreading new knowledge and accelerating innovation. This research uses artificial intelligence natural language processing, deep learning techniques and machine learning algorithms to extract the essential knowledge of patent documents within a given domain as a means to evaluate their worth and technical advantage. Manual patent abstraction is a time-consuming, labor-intensive, and subjective process which becomes cost and outcome ineffective as the size of the patent knowledge domain increases. This research develops an intelligent patent summarization methodology using artificial intelligence machine learning approaches to allow patent domains of extremely large sizes to be effectively and objectively summarized, especially for cases where the cost and time requirements of manual summarization are infeasible. The system learns to automatically summarize patent documents with natural language texts for any given technical domain. The machine learning solution identifies technical key terminologies (words, phrases, and sentences) in the context of the semantic relationships among training patents and corresponding summaries as the core of the summarization system. To ensure the high performance of the proposed methodology, ROUGE metrics are used to evaluate precision, recall, accuracy, and consistency of knowledge generated by the summarization system.
The smart machinery technologies domain, with its sub-domains of control intelligence, sensor intelligence, and intelligent decision-making, provides the case studies for training the patent summarization system. The cases use 1708 training pairs of patents and summaries, while testing uses 30 randomly selected patents. The case implementation and verification show that the summary reports achieve average precision and recall of 90% and 84%, respectively.
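To make the reported precision and recall figures concrete, here is a minimal sketch of a ROUGE-1 style score: the overlap of single words between a generated summary and a reference summary, with clipped counts. It assumes whitespace tokenization and lowercasing, it is not the evaluation code used in the study, and the two example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision and recall between two summaries."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps, per word, the smaller of the two counts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Hypothetical reference (human) and candidate (system) summaries.
reference = "the sensor controls the motor speed"
candidate = "the sensor controls speed"
precision, recall = rouge_1(candidate, reference)
```

In this example every candidate word appears in the reference, so precision is perfect, while recall is lower because the candidate omits two of the six reference words.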

15.
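A survey of explainable knowledge graph reasoning methods   Total citations: 2, self-citations: 0, citations by others: 2
In recent years, artificial intelligence research based on deep learning models has made repeated breakthroughs, but most of these models are black boxes, which hinders human understanding of the reasoning process. As a result, high-performing complex algorithms, models, and systems generally lack transparency and explainability in their decisions. In critical fields with strict explainability requirements, such as national defense, healthcare, and network and information security, unexplainable reasoning methods significantly affect reasoning results and the ability to trace them back. Explainability therefore needs to be built into these algorithms and systems, using explicit, explainable knowledge reasoning to support prediction tasks and form a reliable mechanism for explaining behavior. As one of the newest forms of knowledge representation, the knowledge graph models semantic networks and describes the entities and relations of the objective world in structured form, and it is widely used in knowledge reasoning. Built on discrete symbolic representations, knowledge-graph-based reasoning explains the reasoning process through aids such as reasoning paths and logical rules, offering an important route to explainable artificial intelligence. This paper gives a comprehensive survey of explainable knowledge graph reasoning. It sets out the concepts of explainable artificial intelligence and knowledge reasoning, describes recent research progress on explainable knowledge graph reasoning methods in detail, summarizes the methods from the perspective of three research paradigms of artificial intelligence, and discusses research prospects and future directions.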

16.
A large-scale project produces a lot of text data during construction, commonly archived as various management reports. Having the right information at the right time can help the project team understand the project status and manage the construction process more efficiently. However, text information is presented in unstructured or semi-structured formats. Extracting useful information from such a large text warehouse is a challenge. A manual process is costly and often cannot deliver the right information to the right person at the right time. This research proposes an integrated intelligent approach based on natural language processing (NLP) technology, which involves three main stages. First, a text classification model based on a convolutional neural network (CNN) is developed to classify the construction on-site reports by analyzing and extracting report text features. In the second stage, the classified construction report texts are analyzed with term frequency-inverse document frequency (TF-IDF), improved by mutual information, to identify and mine construction knowledge. In the third stage, a relation network based on the co-occurrence matrix of the knowledge is presented for visualization and better understanding of the construction on-site information. Actual construction reports are used to verify the feasibility of this approach. The study provides a new approach for handling construction on-site text data, which can enhance management efficiency and practical knowledge discovery for project management.
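The second and third stages can be illustrated with a minimal sketch of plain TF-IDF weighting and a keyword co-occurrence count. The mutual-information improvement described in the abstract is not reproduced, the CNN classification stage is skipped, and the tokenized report snippets and keyword set are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def cooccurrence(docs, keywords):
    """Count in how many documents each pair of keywords appears together."""
    counts = Counter()
    for doc in docs:
        present = sorted(set(doc) & keywords)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical, already tokenized site-report snippets.
docs = [
    ["crane", "delay", "rain"],
    ["crane", "inspection", "passed"],
    ["rain", "delay", "concrete"],
]
weights = tf_idf(docs)
pairs = cooccurrence(docs, {"crane", "delay", "rain"})
```

The pair counts in `pairs` are the entries of a sparse co-occurrence matrix; treating keywords as nodes and counts as edge weights yields the kind of relation network the third stage visualizes.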

17.
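Concept formation is the foundation for realizing artificial intelligence. To study how concepts form in artificial intelligence systems, this work starts from the process by which humans form concepts about things. Comparing concept formation in humans and in artificial intelligence systems yields the following characteristics. Humans have the advantage of autonomously determining the features and partitions in an object's appearance and function, and of building concepts through thinking and association even when the correspondence among objects, descriptive definitions, and functional definitions is incomplete. Artificial intelligence systems have the advantage of rich perception of object appearance and of long-term storage, computation, and analysis of an object's features and partitions. The weaknesses of concept formation in artificial intelligence largely correspond to the strengths of human concept formation. The paper therefore argues that concept formation in artificial intelligence must attend to intelligent recognition of factors, systematic practice of functions, and supervised learning from human experiential knowledge. With existing technology, an artificial intelligence system that lacks human experiential knowledge cannot autonomously build concepts and knowledge bases and cannot realize intelligent processes.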

18.
Nowadays, decision-making activities of knowledge-intensive enterprises depend heavily on the successful classification of patents. A considerable amount of time is required to achieve successful classification because of the complexity associated with patent information and the large number of potential patents. Several different patent classification approaches have been developed in the past, but most of these studies focus on using computational models for the International Patent Classification (IPC) system rather than using these models in real-world cases of patent classification. In contrast to previous studies that combined algorithms and the IPC system directly without using expert screening, this study proposes a novel artificial intelligence (AI)-aided patent decision-making process. In this process, an expert screening approach is integrated with a hybrid genetic-based support vector machine (HGA-SVM) model for developing a patent classification system with high classification accuracy and generalization ability for real-world patent searching cases. The proposed approach is tested on a real-world case: an expert's patent document searching history that contains 234 patent documents of semiconductor equipment components. The research results demonstrate that our proposed hybrid genetic algorithm approach can optimize all the parameters of the SVM for developing a patent classification system with high accuracy. The proposed HGA-SVM model is able to dynamically and automatically classify patent documents by recording and learning the experts' knowledge and logic. Finally, we propose a new decision-making process for improving the development of the SVM patent classification and searching system.

19.
ACIRD: intelligent Internet document organization and retrieval   Total citations: 6, self-citations: 0, citations by others: 6
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, document classifier, and two-phase search engine. The knowledge acquisition process of ACIRD automatically learns classification knowledge from classified Internet documents. The document classifier applies learned classification knowledge to classify newly collected Internet documents into one or more classes. Experimental results indicate that ACIRD performs as well as or better than human experts in both knowledge acquisition and document classification. By using the learned classification knowledge and the given class lattice, the ACIRD two-phase search engine responds to user queries with hierarchically structured navigable results (instead of a conventional flat ranked document list), which greatly aids users in locating information from numerous, diversified Internet documents.

20.
Machine Learning for Intelligent Processing of Printed Documents   Total citations: 1, self-citations: 0, citations by others: 1
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.
