20 similar documents found (search time: 31 ms)
1.
Structured document storage and refined declarative and navigational access mechanisms in HyperStorM (cited 2 times: 0 self-citations, 2 by others)
Klemens Böhm Karl Aberer Erich J. Neuhold Xiaoya Yang 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):296-311
The combination of SGML and database technology makes it possible to refine both declarative and navigational access mechanisms for structured document collections: with regard to declarative access, the user can formulate complex information needs without
knowing a query language, the respective document type definition (DTD) or the underlying modelling. Navigational access is
eased by hyperlink-rendition mechanisms going beyond plain link-integrity checking. With our approach, the database-internal
representation of documents is configurable. It allows for an efficient implementation of operations, because DTD knowledge
is not needed for document structure recognition. We show how the number of method invocations and the cost of parsing can
be significantly reduced.
Edited by Y.C. Tay. Received April 22, 1996 / Accepted March 16, 1997
2.
Developed forms of task analysis allow designers to focus on both utility and usability issues in the development of interactive
work systems. The models they generate represent aspects of the human, computer and domain elements of an interactive work
system. Many interactive work systems are embedded in an organisational context. Pressures for change are present in this context and give stakeholders impetus to change work tasks and the supporting tools. Interactive work systems also provide
evolutionary pressures of their own, changing the very task they were designed to support. One approach to coping with change
has been to evolve interactive work systems. Currently, none of these techniques places the performance of tasks at the centre, and consideration of usability is minimal. However, an evolutionary design approach forces an evolutionary experience
upon users, and we cannot be sure whether this approach enhances the user’s experience or degrades their performance. Given
the strength of task analysis it is likely that it will be applied within evolutionary contexts. Yet, little work has been
undertaken to examine whether its role will, or could be different. We ask how we can move task analysis towards being used
in a principled manner in the evolution of interactive work systems. This paper examines a number of features of the approach
called task knowledge structures that may be useful in evolving interactive work systems. We look at tasks and their representativeness,
roles, goals, objects (their attributes, relationships, typicality and centrality) and actions. We present a developing framework
for examining other task analysis approaches for their utility in supporting interactive work systems evolution. Finally,
we discuss future work within the area of applying task analysis in the evolution of interactive work systems.
3.
M. Hebel 《Cognition, Technology & Work》2000,2(2):106-115
This paper looks at how human values influence the reception of technology in organisations. It suggests that we need to
know what values are and how value systems evolve in order to manage technological change effectively. This proposition is
based on research into the issues surrounding performance measurement as part of an information system, the cognition of which
contains many parallels with that of technology. The analysis places human values’ theory within the context of systems thinking,
where values are taken as system components, their groupings as systems and the expectations and behaviour produced by them
as emergence.
4.
The Importance of Context in Information System Design: An Assessment of Participatory Design (cited 2 times: 0 self-citations, 2 by others)
This paper is predicated on requirements analysis as the Achilles heel of information systems development, and accepts that
information systems often disappoint. Most design paradigms can be located within a rationalistic framework polarised by requirements
analysis and system delivery. Such traditional design paradigms are seen as palliatives that prevent us from moving toward more satisfying information systems. It is argued that this rationalistic framework forces us to identify, and attempt to solve,
problems that are symptomatic of the approach adopted. A pluralistic framework for information system development is presented
which rejects the notions of requirements analysis and system optimality. Participatory design, derived from the field of
human computer interaction, is located within this framework and identified as a possible paradigm for information system
development. A case study is conducted to assess the benefits of participatory design techniques and to evaluate the extent
to which participatory design can overcome the failings of traditional methodologies.
5.
J. Hu R.S. Kashi D. Lopresti G.T. Wilfong 《International Journal on Document Analysis and Recognition》2002,4(3):140-153
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition
have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a
fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity.
In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and
table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies
work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield
various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely
(deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined
to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results
of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed
acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by
the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could
be applied to other document recognition tasks as well.
Received July 18, 2000 / Accepted October 4, 2001
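The four error classes above map naturally onto a small overlap-based classifier. The sketch below is an illustrative reconstruction under assumed inputs (axis-aligned boxes `(x0, y0, x1, y1)`), not the authors' exact edit-distance computation:

```python
def classify_detection_errors(truth, detected):
    """Classify table-detection output into correct / insertion /
    deletion / split / merge by box overlap (illustrative sketch,
    not the paper's exact edit-distance metric)."""
    def overlaps(a, b):
        # Axis-aligned boxes (x0, y0, x1, y1); touching edges don't count.
        return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

    errors = {"insertion": 0, "deletion": 0, "split": 0, "merge": 0, "correct": 0}
    t_hits = [[d for d in detected if overlaps(t, d)] for t in truth]
    d_hits = [[t for t in truth if overlaps(t, d)] for d in detected]

    for hits in t_hits:
        if not hits:
            errors["deletion"] += 1   # table missed completely
        elif len(hits) > 1:
            errors["split"] += 1      # one table broken into several regions
    for hits in d_hits:
        if not hits:
            errors["insertion"] += 1  # non-table region labeled as a table
        elif len(hits) > 1:
            errors["merge"] += 1      # several tables fused into one region
    errors["correct"] = sum(
        1 for hits in t_hits
        if len(hits) == 1 and len(d_hits[detected.index(hits[0])]) == 1
    )
    return errors
```

Each error class would then contribute a cost term to an overall edit distance between the detected and ground-truth segmentations.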
6.
Workflow management systems (WfMS) offer a promising technology for the realization of process-centered application systems.
A deficiency of existing WfMS is their inadequate support for dealing with exceptional deviations from the standard procedure.
In the ADEPT project, therefore, we have developed advanced concepts for workflow modeling and execution, which aim to increase flexibility in WfMS. On the one hand, we allow workflow designers to model exceptional execution paths at buildtime, provided that these deviations are known in advance. On the other hand, authorized users may dynamically deviate from the pre-modeled workflow at runtime in order to deal with unforeseen events. In this paper, we focus on forward
and backward jumps needed in this context. We describe sophisticated modeling concepts for capturing deviations in workflow
models already at buildtime, and we show how forward and backward jumps (of different semantics) can be correctly applied
in an ad-hoc manner during runtime as well. We work out basic requirements, facilities, and limitations arising in this context.
Our experiences with applications from different domains have shown that the developed concepts will form a key part of process
flexibility in process-centered information systems.
Received: 6 October 2002 / Accepted: 8 January 2003
Published online: 27 February 2003
This paper is a revised and extended version of [40]. The described work was partially performed in the research project “Scalability
in Adaptive Workflow Management Systems” funded by the Deutsche Forschungsgemeinschaft (DFG).
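One way to picture the kind of runtime correctness check such ad-hoc deviations require is a reachability test over the workflow graph. The sketch below is a deliberate simplification with hypothetical names, not ADEPT's actual correctness rules:

```python
def can_jump_forward(edges, current, target):
    """Allow an ad-hoc forward jump only if the target activity is
    reachable from the current one in the workflow graph, so that no
    control-flow dependency is skipped silently (simplified sketch;
    ADEPT's real correctness criteria are considerably richer)."""
    seen, stack = set(), [current]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))  # follow outgoing control edges
    return False
```

A backward jump would need the symmetric check, plus compensation of already-completed activities.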
7.
Christopher D. Shaw James M. Kukla Ian Soboroff David S. Ebert Charles K. Nicholas Amen Zwa Ethan L. Miller D. Aaron Roberts 《International Journal on Digital Libraries》1999,2(2-3):144-156
This paper describes a minimally immersive three-dimensional volumetric interactive information visualization system for management and analysis of document corpora. The system, SFA, uses glyph-based volume rendering, enabling more complex data relationships and information attributes to be visualized than traditional 2D and surface-based visualization systems. Two-handed interaction using three-space magnetic trackers and stereoscopic viewing are combined to produce a minimally immersive interactive system that enhances the user’s three-dimensional perception of the information space. This new system capitalizes on the human visual system’s pre-attentive learning capabilities to quickly analyze the displayed information. SFA is integrated with a document management and information retrieval engine named Telltale. Together, these systems integrate visualization and document analysis technologies to solve the problem of analyzing large document corpora. We describe the usefulness of this system for the analysis and visualization of document similarity within a corpus of textual documents, and present an example exploring authorship of ancient Biblical texts.
Received: 15 December 1997 / Revised: June 1999
8.
Stefan Klink Thomas Kieninger 《International Journal on Document Analysis and Recognition》2001,4(1):18-26
Document image processing is a crucial process in office automation and begins at the ‘OCR’ phase with difficulties in document
‘analysis’ and ‘understanding’. This paper presents a hybrid and comprehensive approach to document structure analysis: hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are
the base for potential conditions which in turn are used to express fuzzy matched rules of an underlying rule base. Rules
can be formulated based on features which might be observed within one specific layout object. However, rules can also express
dependencies between different layout objects. In addition to its rule driven analysis, which allows an easy adaptation to
specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common
objects (e.g., lists).
Received June 19, 2000 / Revised November 8, 2000
9.
E. Appiani F. Cesarini A.M. Colla M. Diligenti M. Gori S. Marinai G. Soda 《International Journal on Document Analysis and Recognition》2001,4(2):69-83
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described.
This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling bypassing some limitations of existing pre-defined indexing schemes.
The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically
index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled
users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying
reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents
automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to
dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning
passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing
strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining
to the specific document class. Experimental results are encouraging overall; in particular, document classification results
fulfill the requirements of high-volume application. Integration into production lines is under execution.
Received March 30, 2000 / Revised June 26, 2001
10.
Amit Kumar Das Sanjoy Kumar Saha Bhabatosh Chanda 《International Journal on Document Analysis and Recognition》2002,4(3):183-190
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on
the performance analysis of the evolving segmentation algorithms. The use of a standard document database maintained at universities and research laboratories helps to solve the problem of getting authentic data sources and other information, but
some methodologies have to be used for performance analysis of the segmentation. We describe a new document model in terms
of a bounding box representation of its constituent parts and suggest an empirical measure of performance of a segmentation
algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also
produces segment-wise details of common segmentation problems such as horizontal and vertical split and merge as well as invalid
and mismatched regions.
Received July 14, 2000 / Revised June 12, 2001
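A minimal building block for such an empirical measure is the area overlap between a ground-truth bounding box and a segmented region. This intersection-over-union sketch is illustrative only; the paper's graph-like model is richer:

```python
def region_match_score(truth_box, seg_box):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1).
    Scores near 1 indicate a good match; partial overlaps hint at
    split/merge problems (illustrative helper, not the paper's measure)."""
    x0 = max(truth_box[0], seg_box[0])
    y0 = max(truth_box[1], seg_box[1])
    x1 = min(truth_box[2], seg_box[2])
    y1 = min(truth_box[3], seg_box[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(truth_box) + area(seg_box) - inter
    return inter / union if union else 0.0
```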
11.
Luiz Fernando G. Soares Rogério F. Rodrigues Débora C. Muchaluat Saade 《Multimedia Systems》2000,8(2):118-134
This paper discusses multimedia and hypermedia modeling, authoring and formatting tools, presenting the proposals of the
HyperProp system and comparing them to related work. It also highlights several research challenges that still need to be
addressed. Moreover, it stresses the importance of document logical structuring and considers the use of compositions in order
to represent context relations, synchronization relations, derivation relations and task relations in hypermedia systems.
It discusses temporal and spatial synchronization among multimedia objects and briefly presents the HyperProp graphical authoring
and formatting tools. Integration between the proposed system and the WWW is also addressed.
12.
Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000
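A few of the image features named above can be computed directly from labeled content regions. The region format `(kind, x0, y0, x1, y1)` and the exact feature definitions below are assumptions for illustration; the paper's feature set is larger:

```python
def layout_features(regions, page_area):
    """Compute percent-text, percent-non-text, and content density from
    labeled page regions (kind, x0, y0, x1, y1). Such vectors would feed
    a supervised classifier, e.g. a decision tree (sketch, not the
    paper's code)."""
    area = lambda r: max(0, r[3] - r[1]) * max(0, r[4] - r[2])
    text = sum(area(r) for r in regions if r[0] == "text")
    nontext = sum(area(r) for r in regions if r[0] != "text")
    content = text + nontext
    return {
        "pct_text": text / content if content else 0.0,
        "pct_nontext": nontext / content if content else 0.0,
        "density": content / page_area,  # fraction of page covered by content
    }
```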
13.
Transforming paper documents into XML format with WISDOM++ (cited 1 time: 1 self-citation, 0 by others)
Oronzo Altamura Floriana Esposito Donato Malerba 《International Journal on Document Analysis and Recognition》2001,4(1):2-17
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires
solutions to several problems. The application of an OCR to some parts of the document image is only one of the problems.
In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means
of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents
in the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps,
namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps:
document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation,
the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general
layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of
the system components implementing these innovative aspects is reported.
Received June 15, 2000 / Revised November 7, 2000
14.
Shuhua Wang Yang Cao Shijie Cai 《International Journal on Document Analysis and Recognition》2001,4(1):27-34
The most noticeable characteristic of a construction tender document is that its hierarchical architecture is not obviously
expressed but is implied in the citing information. Currently available methods cannot deal with such documents. In this paper,
the intra-page and inter-page relationships are analyzed in detail. The creation of citing relationships is essential to extracting
the logical structure of tender documents. The hierarchy of tender documents naturally leads to extracting and displaying the logical structure as a tree structure. This method is successfully implemented in VHTender, and is the key to the efficiency
and flexibility of the whole system.
Received February 28, 2000 / Revised October 20, 2000
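Once citing relationships have been extracted, turning them into a tree is straightforward. The `(child, parent)` edge format below is a hypothetical simplification; the hard part in the paper is recovering these relationships from page images in the first place:

```python
from collections import defaultdict

def build_hierarchy(citations):
    """Build the implied tree structure of a tender document from
    (child, parent) citing edges (illustrative sketch of the final
    assembly step, not the extraction method itself)."""
    children = defaultdict(list)
    cited = set()
    for child, parent in citations:
        children[parent].append(child)
        cited.add(child)
    # Roots are parents that are never cited as a child themselves.
    roots = [n for n in children if n not in cited]

    def subtree(n):
        return {n: [subtree(c) for c in children.get(n, [])]}

    return [subtree(r) for r in roots]
```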
15.
Amer Dawoud Mohamed Kamel 《International Journal on Document Analysis and Recognition》2002,5(1):28-38
Binarization of document images with poor contrast, strong noise, complex patterns, and variable modalities in the gray-scale
histograms is a challenging problem. A new binarization algorithm has been developed to address this problem for personal
cheque images. The main contribution of this approach is optimizing the binarization of a part of the document image that
suffers from noise interference, referred to as the Target Sub-Image (TSI), using information easily extracted from another
noise-free part of the same image, referred to as the Model Sub-Image (MSI). Simple spatial features extracted from MSI are
used as a model for handwriting strokes. This model captures the underlying characteristics of the writing strokes, and is
invariant to the handwriting style or content. This model is then utilized to guide the binarization in the TSI. Another contribution
is a new technique for the structural analysis of document images, which we call “Wavelet Partial Reconstruction” (WPR). The
algorithm was tested on 4,200 cheque images and the results show significant improvement in binarization quality in comparison
with other well-established algorithms.
Received: October 10, 2001 / Accepted: May 7, 2002
This research was supported in part by NCR and NSERC's industrial postgraduate scholarship No. 239464.
A simplified version of this paper has been presented at ICDAR 2001 [3].
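The MSI-guided idea can be caricatured in a few lines: estimate stroke-intensity statistics from the clean sub-image and threshold the noisy one against them. The mean-plus-two-sigma cutoff and darker-is-ink convention are assumed stand-ins for the paper's stroke model, which additionally uses wavelet partial reconstruction:

```python
import statistics

def guided_binarize(tsi_rows, msi_ink_pixels):
    """Binarize a noisy Target Sub-Image using an ink-intensity model
    (mean/std of stroke pixels) estimated from a clean Model Sub-Image.
    Illustrative sketch only; cutoff choice is an assumption."""
    mu = statistics.mean(msi_ink_pixels)
    sigma = statistics.pstdev(msi_ink_pixels)
    cutoff = mu + 2 * sigma  # gray levels at/below this count as ink
    return [[0 if px <= cutoff else 255 for px in row] for row in tsi_rows]
```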
16.
Hwan-Chul Park Se-Young Ok Young-Jung Yu Hwan-Gue Cho 《International Journal on Document Analysis and Recognition》2001,4(2):115-130
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer
vision field. For these problems, a basic step is to isolate characters and group words from these isolated characters. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
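The size, elongation, and density features above can be computed per connected component from its pixel coordinates. The formulas below are plausible stand-ins, not necessarily the paper's exact definitions:

```python
def component_features(pixels):
    """Bounding-box size, elongation (aspect ratio), and density (filled
    fraction) of one connected component given as (x, y) pixel coords
    (illustrative definitions)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    return {
        "size": w * h,                        # bounding-box area
        "elongation": max(w, h) / min(w, h),  # aspect ratio of the box
        "density": len(pixels) / (w * h),     # filled fraction of the box
    }
```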
17.
B.B. Chaudhuri U. Garain 《International Journal on Document Analysis and Recognition》2001,3(3):138-149
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered.
It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital
style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these
styles of any font. Important words in a document are sometimes printed in larger size as well. A smart approach for the determination
of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized
text. Another advantage of identifying word type styles and font size is discussed in the context of extracting (i) different logical labels and (ii) important terms from the document. Experimental results on the performance of the approach
on a large number of good quality, as well as degraded, document images are presented.
Received July 12, 2000 / Revised October 1, 2000
18.
This paper looks from an ethnographic viewpoint at the case of two information systems in a multinational engineering consultancy.
It proposes using the rich findings from ethnographic analysis during requirements discovery. The paper shows how context
– organisational and social – can be taken into account during an information system development process. Socio-technical
approaches are holistic in nature and provide opportunities to produce information systems utilising social science insights,
computer science technical competence and psychological approaches. These approaches provide fact-finding methods that are
appropriate to system participants’ and organisational stakeholders’ needs.
The paper recommends a method of modelling that results in a computerised information system data model that reflects the
conflicting and competing data and multiple perspectives of participants and stakeholders, and that improves interactivity
and conflict management.
19.
Shared memory provides a convenient programming model for parallel applications. However, such a model is provided on physically
distributed memory systems at the expense of efficiency of execution of the applications. For this reason, applications can
give minimum consistency requirements on the memory system, thus allowing alternatives to the shared memory model to be used
which exploit the underlying machine more efficiently. To be effective, these requirements need to be specified in a precise
way and to be amenable to formal analysis. Most approaches to formally specifying consistency conditions on memory systems
have been from the viewpoint of the machine rather than from the application domain.
In this paper we show how requirements on memory systems can be given from the viewpoint of the application domain formally
in a first-order theory MemReq, to improve the requirements engineering process for such systems. We show the general use of MemReq in expressing major classes of requirements for memory systems and conduct a case study of the use of MemReq in a real-life parallel system out of which the formalism arose.
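As a purely hypothetical illustration of the application-level style in which such requirements might be phrased (this is not a formula from MemReq), a read-your-own-writes condition could be written as:

```latex
\forall p\,\forall x\,\forall v\,\forall u:\;
  \bigl( W_p(x,v) \prec_p R_p(x){=}u
         \;\wedge\; \neg\exists v'\,\bigl(W_p(x,v') \text{ occurs between them}\bigr) \bigr)
  \;\Rightarrow\; u = v
```

i.e., if process p writes v to location x and later reads x with no intervening write of its own, the read returns v.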
20.
With regard to the design of information content in information display, it is often claimed that the abstraction hierarchy
(AH) of the work domain should be considered as a basis for identifying and structuring the information content. The primary
advantage of AH-based analysis and design is that functionally abstracted information can systematically be identified and
provided to the operator, which has rarely been presented in traditional displays. This study evaluated the effectiveness
of providing functional information, which was abstracted and represented based on goal–means analysis along the AH, to the
operator in two task situations (fault diagnosis and operation). The results showed that the operator’s performance improved with the high-level information, and its utility became greater when the goal–means relations between information
at different abstraction levels were exhibited. From the results, three design principles for information display can be drawn.
First, information should be identified and displayed at multiple abstraction levels. Second, the goal–means relations among
the abstraction levels should be explicitly presented, especially for analytical cognitive tasks. Third, information layout
should support information integration along decomposition structure within an abstraction level as well as along abstraction
levels.