Similar Literature
20 similar documents found (search time: 31 ms)
1.
The combination of SGML and database technology allows both declarative and navigational access mechanisms for structured document collections to be refined: with regard to declarative access, the user can formulate complex information needs without knowing a query language, the respective document type definition (DTD), or the underlying modelling. Navigational access is eased by hyperlink-rendition mechanisms that go beyond plain link-integrity checking. With our approach, the database-internal representation of documents is configurable. It allows for an efficient implementation of operations because DTD knowledge is not needed for document structure recognition. We show how the number of method invocations and the cost of parsing can be significantly reduced. Edited by Y.C. Tay. Received April 22, 1996 / Accepted March 16, 1997

2.
Developed forms of task analysis allow designers to focus on both utility and usability issues in the development of interactive work systems. The models they generate represent aspects of the human, computer and domain elements of an interactive work system. Many interactive work systems are embedded in an organisational context. Pressures for change are present in this context and give stakeholders impetus to change work tasks and the supporting tools. Interactive work systems also exert evolutionary pressures of their own, changing the very tasks they were designed to support. One approach to coping with change has been to evolve interactive work systems. Currently, none of these techniques treats the performance of tasks as central, and consideration of usability is minimal. However, an evolutionary design approach forces an evolutionary experience upon users, and we cannot be sure whether this approach enhances the user’s experience or degrades their performance. Given the strength of task analysis, it is likely to be applied within evolutionary contexts. Yet little work has been undertaken to examine whether its role will, or could, be different. We ask how task analysis can be moved towards principled use in the evolution of interactive work systems. This paper examines a number of features of the approach called task knowledge structures that may be useful in evolving interactive work systems. We look at tasks and their representativeness, roles, goals, objects (their attributes, relationships, typicality and centrality) and actions. We present a developing framework for examining other task analysis approaches for their utility in supporting interactive work systems evolution. Finally, we discuss future work in applying task analysis to the evolution of interactive work systems.

3.
This paper looks at how human values influence the reception of technology in organisations. It suggests that we need to know what values are and how value systems evolve in order to manage technological change effectively. This proposition is based on research into the issues surrounding performance measurement as part of an information system, the cognition of which contains many parallels with that of technology. The analysis places the theory of human values within the context of systems thinking, where values are taken as system components, their groupings as systems, and the expectations and behaviour they produce as emergence.

4.
This paper is predicated on requirements analysis being the Achilles' heel of information systems development, and accepts that information systems often disappoint. Most design paradigms can be located within a rationalistic framework polarised by requirements analysis and system delivery. Such traditional design paradigms are seen as palliatives that prevent us moving toward more satisfying information systems. It is argued that this rationalistic framework forces us to identify, and attempt to solve, problems that are symptomatic of the approach adopted. A pluralistic framework for information system development is presented which rejects the notions of requirements analysis and system optimality. Participatory design, derived from the field of human-computer interaction, is located within this framework and identified as a possible paradigm for information system development. A case study is conducted to assess the benefits of participatory design techniques and to evaluate the extent to which participatory design can overcome the failings of traditional methodologies.

5.
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. Received July 18, 2000 / Accepted October 4, 2001
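To make the “graph probing” idea concrete, here is a minimal sketch in Python. It is illustrative only: the probe set (row count, cells per row, cell contents) and the flat row-list table representation are our assumptions, whereas the paper defines probing over an attributed table DAG.

```python
# Illustrative "graph probing": instead of matching two structure graphs
# directly, issue simple queries ("probes") against each and count
# disagreements.

def probes(table):
    """table: list of rows, each row a list of cell strings."""
    yield ("row-count", len(table))
    for i, row in enumerate(table):
        yield (f"cells-in-row-{i}", len(row))
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            yield (f"cell-{i}-{j}", cell)

def probe_distance(truth, result):
    """Fraction of probes on which the two structures disagree."""
    p, q = dict(probes(truth)), dict(probes(result))
    keys = set(p) | set(q)
    mismatches = sum(1 for k in keys if p.get(k) != q.get(k))
    return mismatches / len(keys)

truth  = [["Name", "Qty"], ["bolt", "4"], ["nut", "8"]]
result = [["Name", "Qty"], ["bolt", "4"], ["nut", "6"]]   # one cell misread
print(round(probe_distance(truth, result), 3))            # -> 0.1
```

The attraction of probing is that it sidesteps brittle graph matching: any disagreement between the two structures eventually shows up as a failed probe.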

6.
Dealing with forward and backward jumps in workflow management systems
Workflow management systems (WfMS) offer a promising technology for the realization of process-centered application systems. A deficiency of existing WfMS is their inadequate support for dealing with exceptional deviations from the standard procedure. In the ADEPT project, therefore, we have developed advanced concepts for workflow modeling and execution which aim to increase flexibility in WfMS. On the one hand, we allow workflow designers to model exceptional execution paths at build time, provided that these deviations are known in advance. On the other hand, authorized users may dynamically deviate from the pre-modeled workflow at runtime in order to deal with unforeseen events. In this paper, we focus on the forward and backward jumps needed in this context. We describe sophisticated modeling concepts for capturing deviations in workflow models at build time, and we show how forward and backward jumps (of different semantics) can be correctly applied in an ad-hoc manner during runtime. We work out the basic requirements, facilities, and limitations arising in this context. Our experiences with applications from different domains have shown that the developed concepts will form a key part of process flexibility in process-centered information systems. Received: 6 October 2002 / Accepted: 8 January 2003 Published online: 27 February 2003 This paper is a revised and extended version of [40]. The described work was partially performed in the research project “Scalability in Adaptive Workflow Management Systems” funded by the Deutsche Forschungsgemeinschaft (DFG).
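A minimal sketch of how such jump checks might look, assuming a strictly sequential workflow; the class, step names, and rules below are hypothetical simplifications, not ADEPT's actual modeling and correctness criteria (which cover branching, data flow, and different jump semantics).

```python
# Minimal sketch of ad-hoc jump handling in a sequential workflow.

class Workflow:
    def __init__(self, steps):
        self.steps = steps            # ordered list of activity names
        self.pos = 0                  # index of the next activity to run
        self.completed = []           # execution history, newest last

    def complete_current(self):
        self.completed.append(self.steps[self.pos])
        self.pos += 1

    def jump_backward(self, target):
        """Re-execute from an already-completed activity; intermediate
        results are discarded (a stand-in for compensation/rollback)."""
        if target not in self.completed:
            raise ValueError(f"backward jump to {target!r}: never executed")
        self.completed = self.completed[:self.completed.index(target)]
        self.pos = self.steps.index(target)

    def jump_forward(self, target, skippable):
        """Skip ahead, but only if every jumped-over activity was
        declared skippable at build time."""
        idx = self.steps.index(target)
        blocked = [s for s in self.steps[self.pos:idx] if s not in skippable]
        if blocked:
            raise ValueError(f"forward jump blocked by mandatory steps {blocked}")
        self.pos = idx

wf = Workflow(["register", "examine", "lab_test", "report"])
wf.complete_current(); wf.complete_current()
wf.jump_forward("report", skippable={"lab_test"})   # ok: lab_test is skippable
wf.jump_backward("examine")                         # ok: examine was completed
```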

7.
This paper describes a minimally immersive three-dimensional volumetric interactive information visualization system for the management and analysis of document corpora. The system, SFA, uses glyph-based volume rendering, enabling more complex data relationships and information attributes to be visualized than in traditional 2D and surface-based visualization systems. Two-handed interaction using three-space magnetic trackers and stereoscopic viewing are combined to produce a minimally immersive interactive system that enhances the user’s three-dimensional perception of the information space. This new system capitalizes on the human visual system’s pre-attentive learning capabilities to quickly analyze the displayed information. SFA is integrated with a document management and information retrieval engine named Telltale. Together, these systems integrate visualization and document analysis technologies to solve the problem of analyzing large document corpora. We describe the usefulness of this system for the analysis and visualization of document similarity within a corpus of textual documents, and present an example exploring the authorship of ancient Biblical texts. Received: 15 December 1997 / Revised: June 1999

8.
Document image processing is a crucial process in office automation; it begins at the OCR phase, but the harder difficulties arise in document analysis and understanding. This paper presents a hybrid and comprehensive approach to document structure analysis: hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are the basis for potential conditions, which in turn are used to express fuzzy-matched rules of an underlying rule base. Rules can be formulated based on features observed within one specific layout object; they can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows easy adaptation to specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common objects (e.g., lists).
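As a flavour of rule-driven, fuzzy-matched labelling, consider the following sketch. The features (font size, vertical position), the triangular membership function, and the min-combination are our assumptions for illustration; the system's actual rule language is richer and also supports conditions spanning several layout objects.

```python
# Illustrative fuzzy-matched labelling rules over layout features.

def near(value, target, tolerance):
    """Triangular membership: 1 at target, falling to 0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)

RULES = [
    # (logical label, fuzzy conditions over a block's features)
    ("title",    [lambda b: near(b["font_size"], 18, 8),
                  lambda b: near(b["y_top"], 0.05, 0.15)]),
    ("footnote", [lambda b: near(b["font_size"], 8, 3),
                  lambda b: near(b["y_top"], 0.95, 0.10)]),
]

def label_block(block, cutoff=0.3):
    """Degree of match for a rule = weakest of its conditions (fuzzy AND);
    the best-matching rule above the cutoff wins."""
    scored = [(min(c(block) for c in conds), label) for label, conds in RULES]
    score, label = max(scored)
    return label if score >= cutoff else "unknown"

block = {"font_size": 20, "y_top": 0.04}   # hypothetical layout block
print(label_block(block))                  # -> 'title'
```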

9.
In this paper, a system for the analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest simply by presenting visual examples, and in the application of reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, and then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, the document classification results fulfil the requirements of high-volume applications. Integration into production lines is under way. Received March 30, 2000 / Revised June 26, 2001

10.
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on the performance analysis of evolving segmentation algorithms. The use of standard document databases maintained at universities and research laboratories helps to solve the problem of obtaining authentic data sources and other information, but methodologies are still needed for the performance analysis of segmentation itself. We describe a new document model in terms of a bounding-box representation of its constituent parts and suggest an empirical measure of the performance of a segmentation algorithm based on this new graph-like model of the document. Besides global error measures, the proposed method also produces segment-wise details of common segmentation problems such as horizontal and vertical splits and merges, as well as invalid and mismatched regions. Received July 14, 2000 / Revised June 12, 2001
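A rough sketch of the split/merge bookkeeping such a bounding-box model enables (illustrative only; the overlap test, threshold, and status labels are our assumptions rather than the paper's empirical measure):

```python
# Hypothetical per-segment error report from ground-truth and result
# bounding boxes; boxes are (x0, y0, x1, y1).

def significant_overlap(a, b, min_frac=0.1):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return False
    return w * h >= min_frac * (a[2] - a[0]) * (a[3] - a[1])

def segment_report(truth, result):
    report = []
    for i, t in enumerate(truth):
        hits = [j for j, r in enumerate(result) if significant_overlap(t, r)]
        if not hits:
            status = "missed"
        elif len(hits) > 1:
            status = "split"      # one true zone covered by several result zones
        else:
            r = result[hits[0]]
            rivals = [u for u in truth if u is not t and significant_overlap(u, r)]
            status = "merged" if rivals else "correct"
        report.append((i, status, hits))
    return report

truth  = [(0, 0, 100, 50), (0, 60, 100, 110)]
result = [(0, 0, 100, 110)]                    # both zones merged into one
print(segment_report(truth, result))
# -> [(0, 'merged', [0]), (1, 'merged', [0])]
```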

11.
This paper discusses multimedia and hypermedia modeling, authoring and formatting tools, presenting the proposals of the HyperProp system and comparing them to related work. It also highlights several research challenges that still need to be addressed. Moreover, it stresses the importance of document logical structuring and considers the use of compositions in order to represent context relations, synchronization relations, derivation relations and task relations in hypermedia systems. It discusses temporal and spatial synchronization among multimedia objects and briefly presents the HyperProp graphical authoring and formatting tools. Integration between the proposed system and the WWW is also addressed.

12.
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000
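A toy version of the supervised route is easy to sketch with scikit-learn (assumed available); the four layout features and the tiny training set below are invented for illustration and stand in for the much richer feature set the paper describes:

```python
# Genre classification from layout statistics with a decision tree.
from sklearn.tree import DecisionTreeClassifier

# Each page: [text_area_frac, image_area_frac, n_columns, median_font_pt]
pages = [
    [0.85, 0.02, 2, 10],   # dense two-column text
    [0.80, 0.05, 2, 10],
    [0.35, 0.50, 1, 14],   # image-heavy single column
    [0.30, 0.55, 1, 16],
]
genres = ["journal-article", "journal-article", "magazine", "magazine"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(pages, genres)
print(clf.predict([[0.78, 0.04, 2, 11]]))   # -> ['journal-article']
```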

13.
Transforming paper documents into XML format with WISDOM++
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of the document image is only one of them. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported. Received June 15, 2000 / Revised November 7, 2000

14.
The most noticeable characteristic of a construction tender document is that its hierarchical architecture is not expressed explicitly but is implied in the citing information. Currently available methods cannot deal with such documents. In this paper, the intra-page and inter-page relationships are analyzed in detail. The creation of citing relationships is essential to extracting the logical structure of tender documents. The hierarchy of tender documents naturally leads to extracting and displaying the logical structure as a tree. This method has been successfully implemented in VHTender, and is the key to the efficiency and flexibility of the whole system. Received February 28, 2000 / Revised October 20, 2000
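The recovery of the implied hierarchy can be pictured as building a tree from citing pairs. In this sketch the input format (each section paired with the section that cites it) is a hypothetical simplification of the intra-page and inter-page citing relationships the paper analyzes:

```python
# Rebuild and print the implied section hierarchy from citing pairs.
from collections import defaultdict

sections = [
    ("1",     None),   # (section id, id of the section that cites it)
    ("1.1",   "1"),
    ("1.2",   "1"),
    ("1.2.1", "1.2"),
    ("2",     None),
]

children = defaultdict(list)
roots = []
for sec, cited_by in sections:
    (children[cited_by] if cited_by else roots).append(sec)

def show(node, depth=0):
    print("  " * depth + node)
    for child in children[node]:
        show(child, depth + 1)

for root in roots:
    show(root)   # prints the tree: 1 > 1.1, 1.2 > 1.2.1; then 2
```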

15.
Binarization of document images with poor contrast, strong noise, complex patterns, and variable modalities in the gray-scale histograms is a challenging problem. A new binarization algorithm has been developed to address this problem for personal cheque images. The main contribution of this approach is optimizing the binarization of a part of the document image that suffers from noise interference, referred to as the Target Sub-Image (TSI), using information easily extracted from another noise-free part of the same image, referred to as the Model Sub-Image (MSI). Simple spatial features extracted from MSI are used as a model for handwriting strokes. This model captures the underlying characteristics of the writing strokes, and is invariant to the handwriting style or content. This model is then utilized to guide the binarization in the TSI. Another contribution is a new technique for the structural analysis of document images, which we call “Wavelet Partial Reconstruction” (WPR). The algorithm was tested on 4,200 cheque images and the results show significant improvement in binarization quality in comparison with other well-established algorithms. Received: October 10, 2001 / Accepted: May 7, 2002 This research was supported in part by NCR and NSERC's industrial postgraduate scholarship No. 239464. A simplified version of this paper has been presented at ICDAR 2001 [3].
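The core MSI-to-TSI idea can be sketched in a few lines of numpy: estimate gray-level statistics of strokes in the clean sub-image and reuse them to decide what counts as ink in the noisy one. Everything here (the rough dark-pixel threshold, the mu + k*sigma rule, the synthetic images) is our assumption; the paper's stroke model and the WPR wavelet analysis are considerably more elaborate.

```python
import numpy as np

def stroke_model(msi, rough_thresh=128):
    """Mean/std of gray levels of stroke (dark) pixels in the clean MSI."""
    strokes = msi[msi < rough_thresh]
    return strokes.mean(), strokes.std()

def binarize_tsi(tsi, mu, sigma, k=2.0):
    """Label TSI pixels as ink if their gray level is plausible under
    the MSI stroke model; everything else becomes background."""
    return (tsi < mu + k * sigma).astype(np.uint8)

rng = np.random.default_rng(0)
msi = np.full((50, 200), 220, np.uint8)              # clean background...
msi[20:25, 10:150] = rng.integers(40, 80, (5, 140))  # ...with one dark stroke
tsi = rng.normal(180, 30, (50, 200)).clip(0, 255)    # noisy target region
mu, sigma = stroke_model(msi)
print(binarize_tsi(tsi, mu, sigma).mean())           # fraction labelled as ink
```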

16.
Automatic character recognition and image understanding of a given paper document are among the main objectives of the computer vision field. For these problems, a basic step is to isolate characters and to group the isolated characters into words. In this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm for distinguishing words among the isolated characters. For extracting characters, we exploit several features (size, elongation, and density) of characters and propose a characteristic value for classification using the run-length frequency of the image component. In the context of word grouping, previous works have largely been concerned with words placed on a horizontal or vertical line. Our word grouping algorithm can group words that lie on inclined lines, intersecting lines, and even curved lines. To do this, we introduce the 3D neighborhood graph model, which is very useful and efficient for character classification and word grouping. In this model, each connected component of a text image segment is mapped onto 3D space according to the area of its bounding box and positional information from the document. We conducted tests with more than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental documents. Received August 3, 2001 / Accepted August 8, 2001
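The per-component features named here (size, elongation, and density) are cheap to compute; the sketch below uses scipy.ndimage and is illustrative only. The run-length characteristic value and the 3D neighborhood graph itself are not shown.

```python
import numpy as np
from scipy import ndimage

def component_features(binary_img):
    """binary_img: 2-D array, nonzero = ink. Yields per-component features."""
    labels, n = ndimage.label(binary_img)
    for i, slc in enumerate(ndimage.find_objects(labels)):
        comp = labels[slc] == i + 1              # mask of this component only
        h, w = comp.shape
        size = h * w                             # bounding-box area
        elongation = max(h, w) / max(1, min(h, w))   # aspect ratio
        density = np.count_nonzero(comp) / size      # ink fill ratio
        yield {"size": size, "elongation": elongation, "density": density}

img = np.zeros((20, 40), np.uint8)
img[5:15, 3:10] = 1            # a compact, character-like blob
img[18:19, 0:40] = 1           # a long, thin rule line
for f in component_features(img):
    print(f)   # the rule line stands out through its extreme elongation
```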

17.
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital style. A quick approach to detecting them is proposed here, based on global shape heuristics that hold for these styles in any font. Important words in a document are sometimes printed in a larger size as well; a smart approach for determining font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized text. Another advantage of identifying word type styles and font size is discussed in the context of extracting (i) different logical labels and (ii) important terms from the document. Experimental results on the performance of the approach on a large number of good-quality, as well as degraded, document images are presented. Received July 12, 2000 / Revised October 1, 2000
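Two such heuristics are easy to sketch: a dominant font size estimated from component heights, and a crude italic test that searches for the shear making vertical strokes line up. Both formulas, the 300 dpi assumption, and the synthetic stroke below are ours, not the paper's.

```python
import numpy as np

def dominant_font_size(component_heights, dpi=300):
    """Estimate the dominant point size from character component heights."""
    median_px = np.median(np.asarray(component_heights))
    return 72.0 * median_px / dpi              # pixels -> printer's points

def slant_angle(char_img, shears=np.linspace(-0.5, 0.5, 21)):
    """Find the shear (roughly tan of the slant angle) that makes the
    vertical projection of ink most peaked; a clearly positive value
    suggests a rightward slant, i.e., italic."""
    h, _ = char_img.shape
    ys, xs = np.nonzero(char_img)
    best, best_score = 0.0, -1.0
    for a in shears:
        sheared = np.round(xs - a * (h - 1 - ys)).astype(int)
        hist = np.bincount(sheared - sheared.min())
        score = float((hist.astype(float) ** 2).sum())   # peakedness
        if score > best_score:
            best, best_score = a, score
    return best

img = np.zeros((20, 20), np.uint8)
for y in range(20):
    img[y, 10 + (19 - y) // 4] = 1      # a stroke drifting right toward the top
print(round(dominant_font_size([41, 42, 40, 43, 41]), 1))  # -> 9.8 (about 10 pt)
print(slant_angle(img) > 0)                                # -> True (italic-like)
```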

18.
Specifications in Context: Stakeholders, Systems and Modelling of Conflict
This paper looks from an ethnographic viewpoint at the case of two information systems in a multinational engineering consultancy. It proposes using the rich findings from ethnographic analysis during requirements discovery. The paper shows how context, organisational and social, can be taken into account during an information system development process. Socio-technical approaches are holistic in nature and provide opportunities to produce information systems utilising social science insights, computer science technical competence and psychological approaches. These approaches provide fact-finding methods that are appropriate to system participants’ and organisational stakeholders’ needs. The paper recommends a method of modelling that results in a computerised information system data model reflecting the conflicting and competing data and multiple perspectives of participants and stakeholders, and that improves interactivity and conflict management.

19.
Shared memory provides a convenient programming model for parallel applications. However, on physically distributed memory systems such a model is provided at the expense of execution efficiency. For this reason, applications can state minimum consistency requirements on the memory system, allowing alternatives to the shared memory model to be used that exploit the underlying machine more efficiently. To be effective, these requirements need to be specified precisely and to be amenable to formal analysis. Most approaches to formally specifying consistency conditions on memory systems have taken the viewpoint of the machine rather than of the application domain. In this paper we show how requirements on memory systems can be given formally from the viewpoint of the application domain in a first-order theory, MemReq, to improve the requirements engineering process for such systems. We show the general use of MemReq in expressing major classes of requirements for memory systems and conduct a case study of its use in a real-life parallel system, out of which the formalism arose.
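As a flavour of an application-level, first-order requirement on a memory system, consider the generic condition that every read returns the value of the latest preceding write to the same location. This is our illustration of the style, not an actual axiom of MemReq:

```latex
% Generic "reads see the latest write" condition, first-order style.
% rd/wr are event predicates, loc/val event attributes, and < the
% ordering the application requires on events.
\forall r\, \big(\, rd(r) \rightarrow \exists w\, (\, wr(w)
    \wedge loc(w) = loc(r) \wedge val(r) = val(w) \wedge w < r
    \wedge \neg \exists w'\, (\, wr(w') \wedge loc(w') = loc(r)
    \wedge w < w' \wedge w' < r \,)\,) \big)
```

Weakening or dropping the "no intervening write" conjunct is exactly the kind of relaxation that lets an implementation avoid full shared-memory semantics.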

20.
With regard to the design of information content in information displays, it is often claimed that the abstraction hierarchy (AH) of the work domain should be considered as a basis for identifying and structuring the information content. The primary advantage of AH-based analysis and design is that functionally abstracted information, which has rarely been presented in traditional displays, can be systematically identified and provided to the operator. This study evaluated the effectiveness of providing the operator with functional information, abstracted and represented through goal–means analysis along the AH, in two task situations (fault diagnosis and operation). The results showed that the operator’s performance improved with the high-level information, and that its utility became greater when the goal–means relations between information at different abstraction levels were exhibited. From the results, three design principles for information displays can be drawn. First, information should be identified and displayed at multiple abstraction levels. Second, the goal–means relations among the abstraction levels should be explicitly presented, especially for analytical cognitive tasks. Third, the information layout should support information integration along the decomposition structure within an abstraction level as well as along abstraction levels.

