Found 20 similar documents (search time: 31 ms)
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000
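The layout features named in this abstract (share of text versus non-text regions, density of the content area) can be illustrated with a small sketch. It is not the authors' implementation: the `Region` type, the `layout_features` function and the feature names are hypothetical, and regions are assumed to be non-overlapping rectangles already produced by some page-segmentation step.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular content region on a page (hypothetical representation)."""
    kind: str  # "text", or a non-text label such as "graphics", "image", "table", "ruling"
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def layout_features(regions, page_width, page_height):
    """Return simple layout statistics for one page.

    Assumes the regions do not overlap, so their areas can be summed directly.
    """
    page_area = page_width * page_height
    text_area = sum(r.area for r in regions if r.kind == "text")
    non_text_area = sum(r.area for r in regions if r.kind != "text")
    content_area = text_area + non_text_area
    return {
        # Share of the content area occupied by text and by non-text regions.
        "text_fraction": text_area / content_area if content_area else 0.0,
        "non_text_fraction": non_text_area / content_area if content_area else 0.0,
        # Share of the whole page covered by any content.
        "content_density": content_area / page_area,
    }


page = [
    Region("text", 0, 0, 50, 100),    # area 5000
    Region("image", 50, 0, 100, 50),  # area 2500
]
features = layout_features(page, page_width=100, page_height=100)
print(features)
```

A feature dictionary of this kind could then be passed to any supervised classifier, such as the decision trees the abstract mentions.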

Transforming paper documents into XML format with WISDOM++ (cited by 1: 1 self-citation, 0 by others)
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. Applying OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported. Received June 15, 2000 / Revised November 7, 2000

Interactive voice browsers offer an alternative paradigm that affords ubiquitous mobile access to the WWW using a wide range of consumer devices. This technology can facilitate a safe, “hands-free” browsing environment that is of importance both to car drivers and various mobile and technical professionals. This paper describes the challenges of architecting an interactive voice browser that combines digital audio with the features of a speech synthesizer to make structural elements of the document explicit to the listener. The aesthetics of the audio rendition can simultaneously help reduce the monotony factor and enhance comprehension. The evolution of the voice browser gave rise to a new conceptual model of the HTML document structure and its mapping to a 3D audio space. A number of novel features are discussed for improving both the user’s comprehension of the HTML document structure and their orientation within it. These factors, in turn, can improve the effectiveness of the browsing experience.

In this paper, we address the question of how flesh-and-blood decision makers manage the combinatorial explosion in scenario development for decision making under uncertainty. The first assumption is that the decision makers try to undertake ‘robust’ actions. For the decision maker a robust action is an action that has sufficiently good results whatever the events are. We examine the psychological as well as the theoretical problems raised by the notion of robustness. Finally, we address the false feeling of decision makers who talk of ‘risk control’. We argue that ‘risk control’ results from the belief that one can postpone action until after nature moves. This ‘action postponement’ amounts to changing look-ahead reasoning into diagnosis. We illustrate these ideas in the framework of software development and examine some possible implications for requirements analysis.

The way in which humans perceive and react to visual complexity is an important issue in many areas of research and application, particularly because simplification of complex matter can lead to better understanding of both human behaviour in visual control tasks and the visual environment itself. One area of interest is how people perceive their world in terms of complexity and how this can be modelled mathematically and/or computationally. A prototype model of complexity has been derived using subcomponents called ‘SymGeons’ (Symmetrical Geometric Icons) based on Biederman’s original Geon Model for human perception. The SymGeons are primitive shapes which constitute foreground objects. This paper outlines the derivation and ongoing development of the ‘SymGeon’ model and how it compares to human perception of visual complexity. The application of the model to understanding complex human-in-the-loop problems associated with visual remote control operations, e.g. control of remotely operated vehicles, is discussed.

The contributors to this special issue focus on socio-technical and soft approaches to information requirements elicitation and systems development. They represent a growing body of research and practice in this field. This review presents an overview and analysis of the salient themes within the papers encompassing their common underlying framework, the methodologies and tools and techniques presented, the organisational situations in which they are deployed and the issues they seek to address. It will be argued in the review that the contributions to this special edition exemplify the ‘post-methodological era’ and the ‘contingency approaches’ from which it is formed.

In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume applications. Integration into production lines is under way. Received March 30, 2000 / Revised June 26, 2001

This article offers a research update on a 3-year programme initiated by the Kamloops Art Gallery and the University College of the Cariboo in Kamloops, British Columbia. The programme is supported by a ‘Community–University Research Alliance’ grant from the Social Sciences and Humanities Research Council of Canada, and the collaboration focuses on the cultural future of small cities – on how cultural and arts organisations work together (or fail to work together) in a small city setting. If not by definition, then certainly by default, ‘culture’ is associated with big city life: big cities are equated commonly with ‘big culture’; small cities with something less. The Cultural Future of Small Cities research group seeks to provide a more nuanced view of what constitutes culture in a small Canadian city. In particular, the researchers are exploring notions of social capital and community asset building: in this context, ‘visual and verbal representation’, ‘home’, ‘community’ and the need to define a local ‘sense of place’ have emerged as important themes. As the Small Cities programme begins its second year, a unique but key aspect has become the artist-as-researcher. Correspondence and offprint requests to: L. Dubinsky, Kamloops Art Gallery, 101–465 Victoria Street, Kamloops, BC V2C 2A9 Canada. Tel.: 250-828-3543; Email: ldubinsky@museums.ca

Personalized, interactive news on the Web (cited by 2: 0 self-citations, 2 by others)
We present Krakatoa Chronicle, an interactive, personalized newspaper on the World Wide Web implemented as a Java applet. The newspaper is similar in appearance to newspapers in the real world, with a multi-column layout and justified text. At the same time, it provides various interaction techniques for browsing the content of articles, giving relevance feedback, and dynamically changing layout. As users interact with the system, individual ‘user profiles’ are built up at the webserver site. These are used to tailor the newspaper's content and layout to each user's declared and inferred preferences. The system allows for a balancing of personal and community interests, allowing the user to navigate through a space of newspapers corresponding to a range of viewpoints.

This paper is centred on evaluating some significant features of decision-making in process control tasks. The study was carried out in a petrol refinery, specifically, with distillation console operators. Operator verbalisations were recorded during the completion of two specific tasks and later categorised by raters using a list of cognitive categories. The inter-rater reliability was calculated together with qualitative evaluations of the main overlaps among the categories. From the raters’ evaluations, flow diagrams were drawn that represented the plans and tactics developed by operators and the implied cognitive processes (evaluation, prediction, action, etc.). We found that the operators began the tasks with a primary global situation assessment that determined the choice of whether to cope with the task ‘step by step’ or ‘globally’. The results showed two patterns of decision sequences made in normal adjustment performance or in problem situations. Other findings are related to the importance of characteristics such as prediction, anticipation, feedback and the role of the alarms selected by a situation assessment and individual characteristics.
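The abstract does not say which inter-rater reliability statistic was used. As an illustration only, the sketch below computes Cohen's kappa for two raters, one common choice for categorical codings; the function name, the statistic and the sample labels are assumptions, not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who each assigned one category per segment."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of segments")
    n = len(labels_a)
    # Proportion of segments on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of four verbalisation segments by two raters.
rater_1 = ["evaluation", "prediction", "action", "action"]
rater_2 = ["evaluation", "prediction", "action", "evaluation"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))  # observed 0.75, chance 0.3125, so kappa = 7/11
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few categories dominate the coding scheme.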

This paper presents a novel computer entertainment system which recaptures human touch and physical interaction with the real-world environment as essential elements of the game play, whilst also maintaining the exciting fantasy features of traditional computer entertainment. Our system called ‘Touch-Space’ is an embodied (ubiquitous, tangible, and social) computing based Mixed Reality (MR) game space which regains the physical and social aspects of traditional game play. In this novel game space, the real-world environment is an essential and intrinsic game element, and the human’s physical context influences the game play. It also provides the full spectrum of game interaction experience ranging from the real physical environment (human to human and human to physical world interaction), to augmented reality, to the virtual environment. It allows tangible interactions between players and virtual objects, and collaborations between players in different levels of reality. Thus, the system re-invigorates computer entertainment systems with social human-to-human and human-to-physical touch interactions. Correspondence to: Professor A. Cheok, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260. Email: adriancheok@nus.edu.sg

This paper presents the current state of the A2iA CheckReader™ – a commercial bank check recognition system. The system is designed to process the flow of payment documents associated with the check clearing process: checks themselves, deposit slips, money orders, cash tickets, etc. It processes document images and recognizes document amounts whatever their style and type – cursive, hand- or machine printed – expressed as numerals or as phrases. The system is adapted to read payment documents issued in different English- or French-speaking countries. It is currently in use at more than 100 large sites in five countries and processes over 10 million documents daily. The average read rate at the document level varies from 65 to 85% with a misread rate corresponding to that of a human operator (1%). Received October 13, 2000 / Revised December 4, 2000

The elicitation or communication of user requirements comprises an early and critical but highly error-prone stage in system development. Socially oriented methodologies provide more support for user involvement in design than the rigidity of more traditional methods, facilitating the degree of user–designer communication and the ‘capture’ of requirements. A more emergent and collaborative view of requirements elicitation and communication is required to encompass the user, contextual and organisational factors. Drawing on the accompanying literature on communication issues in requirements elicitation, a four-dimensional framework is outlined and used to appraise comparatively four different methodologies seeking to promote a closer working relationship between users and designers. The facilitation of communication between users and designers is subject to discussion of the ways in which communicative activities can be ‘optimised’ for successful requirements gathering, by making recommendations based on the four dimensions to provide fruitful considerations for system designers.

While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. Received July 18, 2000 / Accepted October 4, 2001
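The four error classes for table detection described in this abstract can be illustrated with a short sketch. It only sorts regions into classes by overlap and does not reproduce the paper's edit-distance scoring. Representing each table as an inclusive range of text lines, and the names `overlaps` and `classify_detection_errors`, are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_detection_errors(truth, detected):
    """Sort ground-truth and detected tables into the four error classes."""
    errors = {"insertion": [], "deletion": [], "splitting": [], "merging": []}
    for d in detected:
        hits = [t for t in truth if overlaps(d, t)]
        if not hits:
            errors["insertion"].append(d)   # non-table region labeled as a table
        elif len(hits) > 1:
            errors["merging"].append(d)     # several true tables combined into one
    for t in truth:
        hits = [d for d in detected if overlaps(t, d)]
        if not hits:
            errors["deletion"].append(t)    # true table missed completely
        elif len(hits) > 1:
            errors["splitting"].append(t)   # true table broken into smaller ones
    return errors


# Hypothetical page: four true tables and four detected regions.
truth = [(1, 10), (20, 25), (30, 35), (50, 60)]
detected = [(1, 4), (6, 10), (20, 35), (40, 45)]
errors = classify_detection_errors(truth, detected)
print(errors)
```

In this example the first true table is split in two, the second and third are merged, the fourth is missed, and the region at lines 40 to 45 is a spurious insertion.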

This paper describes an approach to the problem of articulating multimedia information based on parsing and syntax-directed translation that uses Relational Grammars. This translation is followed by a constraint-solving mechanism to create the final layout. Grammatical rules provide the mechanism for mapping from a representation of the content and context of a presentation to forms that specify the media objects to be realized. These realization forms include sets of spatial and temporal constraints between elements of the presentation. Individual grammars encapsulate the “look and feel” of a presentation and can be used as generators of such a style. By making the grammars sensitive to the requirements of the output medium, parsing can introduce flexibility into the information realization process.

Document image segmentation is the first step in document image analysis and understanding. One major problem centres on the performance analysis of the evolving segmentation algorithms. The use of a standard document database maintained at the Universities/Research Laboratories helps to solve the problem of getting authentic data sources and other information, but some methodologies have to be used for performance analysis of the segmentation. We describe a new document model in terms of a bounding box representation of its constituent parts and suggest an empirical measure of performance of a segmentation algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also produces segment-wise details of common segmentation problems such as horizontal and vertical split and merge as well as invalid and mismatched regions. Received July 14, 2000 / Revised June 12, 2001

In this paper, we present some of the results from our ongoing research work in the area of ‘agent support’ for electronic commerce, particularly at the user interface level. Our goal is to provide intelligent agents to assist both the consumers and the vendors in an electronic shopping environment. Users with a wide variety of different needs are expected to use the electronic shopping application and their expectations about the interface could vary a lot. Traditional studies of user interface technology have shown the existence of a ‘gap’ between what the user interface actually lets the users do and the users’ expectations. Agent technology, in the form of personalized user interface agents, can help to narrow this gap. Such agents can be used to give a personalized service to the user by knowing the user’s preferences. By doing so, they can assist in the various stages of the users’ shopping process, provide tailored product recommendations by filtering information on behalf of their users and reduce the information overload. From a vendor’s perspective, a software sales agent could be used for price negotiation with the consumer. Such agents would give the flexibility offered by negotiation without the burden of having to provide human presence to an online store to handle such negotiations. Published online: 25 July 2001

Managing dynamic environments often requires decision making under uncertainty and risk. Two types of uncertainty are involved: uncertainty about the state and the evolution of the situation, and ‘openness’ of the possible actions to face possible consequences. In an experimental study on risk management in dynamic situations, two contrasted ‘ecological’ scenarios – transposed from effective situations of emergency management – were compared in order to identify the impact of their ‘openness’ on the subjects’ strategies for decision making. The ‘Lost Child’ scenario presented qualitative and irreversible consequences (child’s death) and high uncertainty; it exerted high demands both in risk assessment (risk representation) and action elaboration and choice. A less open situation (‘Hydrocarbon Fire’) required a main choice between two contrasted actions, with quantitative computable consequences. The strategies of ‘experimental subjects’ (university students) and ‘operative subjects’ (professional fire-fighter officers) were compared in order to evaluate the ecological validity of experimental research in this field, from the point of view of the subjects themselves. The two scenarios appeared to be independent, so that quite different models of decision making have to be hypothesised, differing by the importance of assessing risk and defining possible actions on the one hand, and by the process of choice on the other. ‘Experimental’ subjects dramatically differed from ‘operative’ subjects when confronted with the same scenario, particularly for the less technical but more demanding scenario. It is hypothesised that three components might account for the effect of the situations and for the differences between and within groups of subjects: importance of situation assessment, spatial abilities, and global orientation of activity in managing dynamic risk.

Linguistic Problems with Requirements and Knowledge Elicitation (cited by 1: 0 self-citations, 1 by others)
Human and conversational aspects of requirements and knowledge identification are employed to show that requirements ‘engineering’ is not the same as civil engineering or scientific problem solving. Not only can requirements not be made fully explicit at the start of a project, they cannot be made fully explicit at all. A need is identified to enhance computer-based information systems (CBIS) development methods to accommodate: plurality of incommensurable perspectives, languages and agendas; dynamic representations of system features that can be experienced rather than abstracted and forced into an abstract paper-based representation; recognition that CBIS development is in general a continuous process where users changing their minds is a natural and necessary indication of organisational vitality. It is suggested that prototyping and rapid application development go some way to addressing these requirements but that they require further development in the light of the theory concerning the nature of the problem.
