TEXPROS (TEXT PROcessing System) is an intelligent document processing system; it supports storing, extracting, classifying, categorizing, retrieving, and browsing information from a variety of office documents [76]. This article presents a retrieval subsystem for TEXPROS that can process incomplete, imprecise, and vague queries and provide semantically meaningful responses to the user. The design of the retrieval subsystem integrates several mechanisms for achieving these goals. First, a system catalog including a thesaurus is used to store the knowledge about the database. Second, there is a query transformation mechanism composed of context construction and algebraic query formulation modules. Given an incomplete or imprecise query, the context construction module searches the system for the required terms and constructs a query that has a complete and precise representation. The resulting query is then formulated into an algebraic expression. Third, in practice, the user may not have a clear idea of what he is searching for. A browsing mechanism is employed for such situations to assist the user in the retrieval process. With the browser, vague queries can be entered into the system until sufficient information is obtained for the user to construct a query for his request. Finally, when processing of a query fails and returns a null answer, a generalizer mechanism is used to give the user a cooperative explanation for the null answer. The presented techniques will contribute to our research toward the development of highly intelligent data processing facilities beyond the present scope of database technology. This work was supported, in part, by the New Jersey Institute of Technology under Grant No. 421280 and by a grant from the AT&T Foundation.

This paper formally specifies a document model for office information systems, including formal definitions of document types (frame templates), a document type hierarchy, folders, and folder organizations. Folder Organizations are defined using predicates and directed graphs. A Reconstruction Problem for folder organizations is then formulated; viz., under what circumstances it is possible to reconstruct a folder organization from its folder level predicates. The Reconstruction Problem is solved in terms of such graph-theoretic concepts as Associated Digraphs, transitive closure, and redundant/nonredundant filing paths. A Transitive Closure Inversion algorithm is then presented which efficiently recovers a Folder Organization digraph from its Associated Digraph. This work was supported in part by the National Science Foundation under Grant No. IRI-9224602, by the New Jersey Institute of Technology under Grant No. 421280, and by a grant from the AT&T Foundation.
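The idea of recovering a digraph from its transitive closure can be illustrated with a standard transitive reduction. The sketch below is not the paper's Transitive Closure Inversion algorithm; it assumes that the closure is acyclic, that it is given as a mapping from each folder to the set of all folders reachable from it, and that the desired organization contains no redundant filing paths. The folder names are hypothetical.

```python
def transitive_reduction(closure):
    """Recover the minimal edge set of a DAG from its transitive closure.

    closure maps each node to the set of ALL nodes reachable from it
    (assumed acyclic, so no node reaches itself).
    """
    reduced = {}
    for u, reachable in closure.items():
        # Edge u -> v is redundant if another node w reachable from u
        # also reaches v, because the path u -> ... -> w -> ... -> v exists.
        reduced[u] = {
            v
            for v in reachable
            if not any(v in closure.get(w, set()) for w in reachable if w != v)
        }
    return reduced


# Hypothetical folder names, used only for illustration.
closure = {
    "root": {"projects", "reports", "q1_reports"},
    "projects": set(),
    "reports": {"q1_reports"},
    "q1_reports": set(),
}
organization = transitive_reduction(closure)
```

Here the edge from `root` to `q1_reports` is dropped because `reports` already leads there, which mirrors the distinction between redundant and nonredundant filing paths.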

This paper presents a predicate-driven document filing system for organizing and automatically filing documents. A document model consists of two basic elements: frame templates representing document classes, and folders, which are repositories of frame instances. The frame templates can be organized to form a document type hierarchy, which helps classify and file documents. Frame instances are grouped into a folder on the basis of user-defined criteria, specified as predicates that determine whether a frame instance belongs to a folder. Folders can naturally be organized into a folder organization that represents the user's real-world document filing system. The predicate consistency problem is discussed in order to eliminate two abnormalities from a folder organization: inapplicable edges (filing paths) and redundant folders. An evaluating net (including an association dictionary, an instantiation component, and a production system) is then proposed for evaluating whether a frame instance satisfies the predicate of a folder during document filing. The concept of consistency of a rule base is also discussed. This work was supported by the Separately Budgeted Research (SBR) grant (No. 421190) from the New Jersey Institute of Technology and the Systems Integration Program grant from the AT&T Foundation.
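Predicate-driven filing can be sketched in a few lines. This is a minimal illustration, not the paper's evaluating net: it assumes a frame instance is a plain attribute dictionary, that each folder carries one Boolean predicate, and that a frame descends along a filing path only while every predicate on the way is satisfied. `Folder`, `file_frame`, and the folder names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Frame = Dict[str, object]  # assumed shape of a frame instance: attribute -> value


@dataclass
class Folder:
    name: str
    predicate: Callable[[Frame], bool]  # decides whether a frame belongs here
    children: List["Folder"] = field(default_factory=list)


def file_frame(folder: Folder, frame: Frame) -> List[str]:
    """Return the names of every folder the frame is filed into, top-down."""
    if not folder.predicate(frame):
        return []  # this filing path does not apply to the frame
    filed = [folder.name]
    for child in folder.children:
        filed.extend(file_frame(child, frame))
    return filed


# Hypothetical folder organization.
memos_2024 = Folder("memos_2024", lambda f: f.get("year") == 2024)
memos = Folder("memos", lambda f: f.get("type") == "memo", [memos_2024])
letters = Folder("letters", lambda f: f.get("type") == "letter")
root = Folder("all_documents", lambda f: True, [memos, letters])

filed_in = file_frame(root, {"type": "memo", "year": 2024})
```

In this toy organization, a child whose predicate could never hold under its parent's predicate would correspond to an inapplicable filing path.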

File system metadata management has become a bottleneck for many data-intensive applications that rely on high-performance file systems. Part of the bottleneck is due to the limitations of an almost 50-year-old interface standard with metadata abstractions that were designed at a time when high-end file systems managed less than 100 MB. Today's high-performance file systems store 7–9 orders of magnitude more data, resulting in a number of data items for which these metadata abstractions are inadequate, such as directory hierarchies unable to handle complex relationships among data. Users of file systems have attempted to work around these inadequacies by moving application-specific metadata management to relational databases to make metadata searchable. Splitting file system metadata management into two separate systems introduces inefficiencies and systems management problems. To address this problem, we propose QMDS: a file system metadata management service that integrates all file system metadata and uses a graph data model with attributes on nodes and edges. Our service uses a query language interface for file identification and attribute retrieval. We present our metadata management service design and architecture and study its performance using a text analysis benchmark application. Results from our QMDS prototype show the effectiveness of this approach. Compared to the use of a file system and relational database, the QMDS prototype shows superior performance for both ingest and query workloads.

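Precision and recall are the key metrics for judging the quality of an intelligent question-answering system. The system combines the strengths of Q/A-base search and document-base search: mature Q/A techniques answer frequently asked questions, which ensures precision and efficiency, while intelligent document search answers uncommon questions, which raises recall. Because the documents are preprocessed in advance, search efficiency improves markedly. In addition, the system is developed for a specific course, so its keyword vocabulary is small and precise, which simplifies semantic understanding.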
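For reference, precision and recall as used to evaluate such a system can be computed as follows. This is a generic sketch of the standard definitions, not code from the paper; the function name and the answer identifiers are hypothetical.

```python
def precision_recall(returned, relevant):
    """Standard set-based precision and recall.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system returns four answers, two of which
# are among the three answers judged relevant.
precision, recall = precision_recall(
    returned=["a1", "a2", "a3", "a4"],
    relevant=["a1", "a2", "a5"],
)
```

With these inputs, two of the four returned answers are relevant and two of the three relevant answers are found, so precision is 2/4 and recall is 2/3.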

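Design and Implementation of a Metadata Retrieval System for a University Digital Library   Total citations: 10, self-citations: 0, citations by others: 10
Drawing on a university digital library construction project, this paper analyzes in detail the importance of metadata and the characteristics of Dublin Core metadata, proposes an overall design approach and a unified resource retrieval model for a university digital library information retrieval system, and finally designs a metadata structure for digital resources and a metadata-based retrieval system.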

The development of user-adaptive systems is of increasing importance for industrial applications. User modeling emerged from the need to represent knowledge about the user in the system in order to allow informed decisions on how to adapt to the user's needs. Most of the research in this field, however, has been theoretical and top-down. Our approach, in contrast, was driven by the needs of the application and shows features of bottom-up, user-centered design. We have implemented a user modeling component supporting a task-based interface to a hypermedia information system for hospitals and tested it under realistic conditions. A new architecture for user modeling has been developed which focuses on the tasks performed by users. It allows adaptive browsing support for users with different levels of experience, and a level of adaptability. The requirements analysis shows that the differences in the information needs of users with different levels of experience are not only quantitative, but qualitative. Experienced users are not only able to cope with a wider browsing space, but sometimes prefer to organize their search in a different way. That is why the user model and the interface of the system are designed to support a smooth transition in the access options provided to novice users and to expert users.

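The Meteorological Archival and Retrieval System (MARS) is a framework developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) for managing massive meteorological data of many kinds; at its core, it uses a multidimensional data model and data cubes to organize and manage meteorological data. This paper studies the main architecture of MARS and its hypercube-based data indexing method, and on that basis proposes a metadata query optimization and parallel computing method for data cubes in a big-data setting. Experiments show that the method effectively shortens system response time when querying and archiving large volumes of data.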

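To address the integrated management of drawings and documents, this paper proposes a multidimensional document management model that describes, manages, and controls documents along four dimensions: a document structure view, a document type view, a geographic information view, and a file version view. Using the document as the carrier, the model combines geographic information with the document's other attributes, which broadens what a document can represent, makes data management across the system more flexible, presents document information more completely, and refines the granularity of operations on entity associations. It thereby achieves integrated drawing and document management in a road planning system.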

Retrieval of document fragments has great potential for application in engineering information management. Frequently engineers have neither the time nor the inclination to sift through long documents for small pieces of useful information. Yet the information that they seek is frequently presented in the form of one or more long documents. Supporting the delivery of the right information, in the right format and in the right quantity, motivates the search for better ways of handling document sub-components or fragments. Document fragment retrieval can be facilitated using modern computational technologies. This paper proposes a novel framework for information access utilising state-of-the-art computational technologies and introducing the use of multiple document structure views through decomposition schemes. The framework integrates document structure study, mark-up technologies, automated fragment extraction, faceted classification and a document navigation mechanism to achieve the target of retrieval of specific document fragments using precise, complex queries. These disparate elements have been brought together in an exploratory Engineering Document Content Management System (EDCMS). Using this, investigations using representative engineering documents have shown that information users can access and retrieve document content, at fragment level rather than at document level, both through data in a document and document metadata, through different perspectives and at different granularities, and simultaneously across multiple documents as well as within a single document.

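This paper designs and implements the data processing and control system for the five-band microwave radiometer in the 863 Program project "Research and Development of Three-Dimensional Monitoring Technology for the Dynamic Environment of the Deep-Water Area of the South China Sea". Built around a Xilinx Virtex-4 series FPGA, the system comprises modules for data acquisition, automatic gain control (AGC), system switch control, and data communication, and it meets the system requirements precisely. The paper also presents the system circuit design, logic diagrams of the key modules, and software flowcharts.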

In this paper, we present a prototype system, an integrated data management system, which is capable of querying, retrieving, and visualizing datasets with heterogeneous formats and large sizes without requiring users to have any knowledge of any other specific software. Our system has three distinguishing characteristics: (1) a modular structure and simple architecture, which make it easy and feasible for users to add new functions and features to the system, (2) a new search concept and method based on the bounding box and on dynamically delineated watershed boundaries from GIS (Geographic Information System), and (3) no requirement for knowledge about or installation of any other complicated software. The architecture of our integrated data management system is based on a metadata approach, which consists of four components including a metadata mechanism and a Java-based application engine. The metadata mechanism in conjunction with the Java-based application engine allows users to access and retrieve diverse data formats and structures from many heterogeneous hydrological data sources. The visualization component of the system makes it possible for users to view their queried data first before spending time retrieving them. The extensible and integrative characteristics of our system are illustrated by an example in which new and unique functions for data merging and GIS-based data querying are added to the system. Although the data sources and applications shown in this prototype system are related to the field of hydrology, the ideas, approaches, and system architecture are not domain-specific, and can be applied to other fields as well.
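A bounding-box search over dataset metadata can be sketched as follows. This is an illustration only, not the system's actual engine (which the abstract describes as Java-based): it assumes each metadata record stores a dataset's spatial extent as a longitude/latitude rectangle and that a query matches any dataset whose extent intersects the query box. All names and coordinates are hypothetical, and boxes that cross the antimeridian are not handled.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


class DatasetRecord(NamedTuple):
    name: str
    fmt: str      # data format recorded in the metadata
    extent: BBox  # spatial coverage of the dataset


def intersects(a: BBox, b: BBox) -> bool:
    """Two axis-aligned boxes intersect when they overlap on both axes."""
    return (
        a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
        and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat
    )


def search(catalog: List[DatasetRecord], query: BBox) -> List[str]:
    """Return the names of datasets whose extent intersects the query box."""
    return [record.name for record in catalog if intersects(record.extent, query)]


# Hypothetical metadata catalog.
catalog = [
    DatasetRecord("streamflow_gauges", "csv", BBox(-125.0, 32.0, -114.0, 42.0)),
    DatasetRecord("precip_grid", "netcdf", BBox(-100.0, 25.0, -80.0, 40.0)),
]
matches = search(catalog, BBox(-120.0, 35.0, -118.0, 37.0))
```

Because only the metadata is consulted, the search can filter heterogeneous sources before any data file is opened; a delineated watershed boundary would replace the rectangle with a polygon test.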

We present a general purpose model for routing user requests, e.g. queries, in a network of autonomous heterogeneous databases. The database schemas and other information on the database nodes are used to construct a multi-level knowledge-base (MKB) that resides in various nodes. Access to the databases is not done by creating direct connections between the user and the nodes where the data are presumably located. Rather, the user approaches the network by contents via an intelligent system that utilizes the MKB in order to identify the nodes and databases where the most relevant information resides, and establishes access routes to those nodes.

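E-mail is one of the most important Internet applications, and e-mail classification has become an active research topic. Based on rough set theory, this paper uses 0-1 Bernoulli data to propose a bidirectional e-mail classification model. While preserving the current classification accuracy, the model reduces the term-frequency information needed for classification, improves classification efficiency, and advances the application of rough set theory to e-mail classification.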

In Spain, the subtitling service on television for the deaf has been improving in quantity since the General Law on Audiovisual Communication was enacted in 2010. This law establishes a series of quality standards that must be followed in the subtitling process. One of the most relevant aspects of subtitle quality is the speed at which subtitles are shown on the screen, because an excessively high speed (less time on screen) makes them difficult to read and the information hard to understand. In order to determine whether the speed at which the subtitles are being shown is adequate, it is first necessary to process all the information associated with the broadcast of the digital TV channels, including data from different sources. In this research, the authors have worked with the data obtained between July 2012 and December 2017, that is, with more than 950 million records. This article presents a framework for integration and processing of heterogeneous information associated with the subtitling of audiovisual content from different sources. Moreover, the framework will provide an automatic adjustment of subtitles in broadcasting with regard to quality indicators by means of a genetic algorithm approach. The results show that the system is able to estimate the best relationship between the time and size of the subtitles while maintaining the quality levels established for this research. These results have been validated by experts and users of this domain.

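This paper implements a buffer for distributed heterogeneous data in multidatabase middleware. The model supports bidirectional synchronous and asynchronous updating of distributed heterogeneous data objects. Its global data object storage model and cache management let data be stored in the buffer and accessed by users quickly and efficiently. A global transaction management strategy is also implemented, covering the design of local agents, concurrency control for global transactions, and a global transaction commit protocol.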

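Data processing is a basic function of real-time monitoring systems. For more complex calculations, the computation method and input variables usually have to be fixed at design time, which reduces the system's flexibility. This paper describes a way to perform complex secondary calculations by calling an Excel component. With this approach, the form of the secondary calculation need not be fixed at design time; instead, various secondary calculations are realized at run time through database settings and the design of Excel worksheets.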

In today's open and dynamic learning environment, a significant percentage of students have a preference for flexible learning systems whereby they can reconcile their academic pursuits with their job responsibilities and family obligations. Non-face-to-face educational models, like e-learning (electronic learning), evolved in order to offer such flexibility. E-learning systems have major strengths but also pose major challenges to the educational community. One such challenge is the large spatial and temporal gap between the teacher and student, which is an obstacle to student follow-up by teachers. The information generated by virtual learning systems sometimes overwhelms instructors, who are unable to process the data without the support of special-purpose techniques and tools for analysing large dataflows.
