Found 20 similar documents; search took 15 ms.
1.
Sreangsu Acharyya Sumit Negi L. Venkata Subramaniam Shourya Roy 《International Journal on Document Analysis and Recognition》2009,12(3):175-184
Noise in textual data, such as that introduced by multilinguality, misspellings, abbreviations, deletions, phonetic spellings, and non-standard transliteration, poses considerable problems for text mining. Such corruptions are very common in instant messenger and short message service data, and they adversely affect off-the-shelf text mining methods. Most techniques address this problem with supervised methods that make use of hand-labeled corrections, but the human-generated labels and corrections they require are expensive and time consuming to obtain because of the multilinguality and complexity of the corruptions. While we do not champion unsupervised methods over supervised ones when quality of results is the singular concern, we demonstrate that unsupervised methods can provide cost-effective results without the expensive human intervention needed to generate a parallel labeled corpus. A generative-model-based unsupervised technique is presented that maps non-standard words to their corresponding conventional frequent forms. A hidden Markov model (HMM) over a “subsequencized” representation of words is used, in which a word is represented as a bag of weighted subsequences. The approximate maximum-likelihood inference algorithm is such that the training phase involves clustering over vectors rather than the customary and expensive dynamic programming over sequences (the Baum–Welch algorithm) that HMMs normally require. A principled transformation of the maximum-likelihood-based “central clustering” cost function of Baum–Welch into a “pairwise similarity” based clustering is proposed. This transformation makes it possible to apply “subsequence kernel” based methods that model delete and insert corruptions well. The novelty of the approach is that the expensive Baum–Welch iterations required for HMMs can be avoided through an approximation of the log-likelihood function and by establishing a connection between the log-likelihood and a pairwise distance. Anecdotal evidence of efficacy is provided on public and proprietary data.
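The “bag of weighted subsequences” representation and the pairwise-similarity view can be illustrated with a minimal sketch. The function names, the span-decay weighting, and the cosine kernel below are illustrative assumptions, not the paper’s exact formulation:

```python
from itertools import combinations
from collections import Counter
import math

def subsequence_bag(word, max_len=3, decay=0.7):
    """Represent a word as a bag of weighted (possibly gappy) subsequences.

    Each subsequence of length <= max_len is weighted by decay**span, so
    subsequences stretched over wide gaps in the word count for less.
    """
    bag = Counter()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(word)), k):
            span = idx[-1] - idx[0] + 1  # positions covered, gaps included
            bag["".join(word[i] for i in idx)] += decay ** span
    return bag

def cosine_similarity(a, b):
    """Pairwise similarity between two subsequence bags (a subsequence kernel)."""
    dot = sum(w * b[s] for s, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) \
         * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

Noisy variants such as “2morow” then score closer to “tomorrow” than unrelated words do, which is the kind of signal a pairwise-similarity clustering can exploit.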
2.
Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world in which a reference collection is given. This article investigates whether plagiarism can be detected by a computer program when no reference can be provided, e.g., if the foreign sections stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism analysis; it is closely related to the problem of authorship verification. Our contributions are threefold. (1) We organize the algorithmic building blocks for intrinsic plagiarism analysis and authorship verification and survey the state of the art. (2) We show how the meta-learning approach of Koppel and Schler, termed “unmasking”, can be employed to post-process unreliable stylometric analysis results. (3) We operationalize and evaluate an analysis chain that combines document chunking, style model computation, one-class classification, and meta learning.
3.
A method to obtain a code representation of handwritten signatures is described and an algorithm for signature verification
based on such representations is proposed. Results of tests to determine efficient methods of image compression for the purpose
of signature verification are presented.
Konstantin Alekseev. Born 1979. Received Master’s degree in engineering and technology (Radioengineering) in 2002. Currently post-graduate student
at St. Petersburg State Electrotechnical University “LETI”, chair of television and video. Scientific interests: digital image
processing and pattern recognition. Author of three papers.
Svetlana Egorova. Born 1931. Graduated from St. Petersburg State Electrotechnical University “LETI” in 1955, received Candidate’s degree (Eng.) in 1965; since 1968 a senior lecturer at the chair of television and video, St. Petersburg State Electrotechnical University “LETI”. Scientific interests: optical and digital image processing and compression methods in signal processing. Author of 141 papers.
4.
G. G. Stetsyura 《Automation and Remote Control》2012,73(5):852-861
Methods are proposed for synchronizing the interaction of the digital devices of distributed systems using a common center that relays the signals from the devices. They are mostly intended to perform operations like “all-to-all,” “all-to-one,” and “one-to-all.” The center substantially accelerates synchronization and improves the efficiency of the communication facilities interconnecting the devices.
5.
Martin Grohe Yuri Gurevich Dirk Leinders Nicole Schweikardt Jerzy Tyszkiewicz Jan Van den Bussche 《Theory of Computing Systems》2009,44(4):533-560
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass”
and “two-pass query processing”. Technically, the model is described in the framework of abstract state machines. Our main
results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin
fragment of the relational algebra.
6.
7.
Ulrich Reffle Annette Gotscharek Christoph Ringlstetter Klaus U. Schulz 《International Journal on Document Analysis and Recognition》2009,12(3):165-174
The detection and correction of false friends—also called real-word errors—is a notoriously difficult problem. On realistic
data, the break-even point for automatic correction so far could not be reached: the number of additional infelicitous corrections
outnumbered the useful corrections. We present a new approach where we first compute a profile of the error channel for the
given text. During the correction process, the profile (1) helps to restrict attention to a small set of “suspicious” lexical
tokens of the input text where it is “plausible” to assume that the token represents a false friend. In this way, recognition
of false friends is improved. Furthermore, the profile (2) helps to isolate the “most promising” correction suggestion for
“suspicious” tokens. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
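As a toy illustration of how an error-channel profile can restrict attention to suspicious tokens and isolate a promising correction: the single-substitution channel, the frequency ratio, and all names below are assumptions for the sketch, not the authors’ method:

```python
def suspicious_corrections(tokens, lexicon_freq, profile):
    """Propose a correction for a token only if one substitution drawn from
    the error profile yields a lexicon word that is much more frequent.

    `profile` maps a character to the characters it is often confused with
    in this text's error channel (e.g. an OCR o->a confusion). A real-word
    ("false friend") error is a valid but low-frequency token here.
    """
    suggestions = {}
    for tok in tokens:
        best = None
        for i, ch in enumerate(tok):
            for rep in profile.get(ch, ()):
                cand = tok[:i] + rep + tok[i + 1:]
                # "plausible" test: candidate must dominate the token's frequency
                if lexicon_freq.get(cand, 0) > 10 * lexicon_freq.get(tok, 0):
                    if best is None or lexicon_freq[cand] > lexicon_freq[best]:
                        best = cand
        if best is not None:
            suggestions[tok] = best
    return suggestions
```

In the paper the profile is computed from the given text itself; here it would simply be supplied as input.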
8.
V. V. Kassandrov 《Gravitation and Cosmology》2009,15(3):213-219
Making use of the Kerr theorem for shear-free null congruences and of Newman’s representation for a virtual charge “moving”
in complex space-time, we obtain an axisymmetric time-dependent generalization of the Kerr congruence, with a singular ring
uniformly contracting to a point and expanding then to infinity. Electromagnetic and complex eikonal field distributions are
naturally associated with the obtained congruence, with electric charge being necessarily unit (“elementary”).
9.
In order to be able to draw inferences about real world phenomena from a representation expressed in a digital computer, it
is essential that the representation should have a rigorously correct algebraic structure. It is also desirable that the underlying
algebra be familiar, and provide a close modelling of those phenomena. The fundamental problem addressed in this paper is
that, since computers do not support real-number arithmetic, the algebraic behaviour of the representation may not be correct,
and cannot directly model a mathematical abstraction of space based on real numbers. This paper describes a basis for the
robust geometrical construction of spatial objects in computer applications using a complex called the “Regular Polytope”.
In contrast to most other spatial data types, this definition supports a rigorous logic within a finite digital arithmetic.
The definition of connectivity proves to be non-trivial, and alternatives are investigated. It is shown that these alternatives
satisfy the relations of a region connection calculus (RCC) as used for qualitative spatial reasoning, and thus introduce
the rigor of that reasoning to geographical information systems. They also form what can reasonably be termed a “Finite Boolean
Connection Algebra”. The rigorous and closed nature of the algebra ensures that these primitive functions and predicates can
be combined to any desired level of complexity, and thus provide a useful toolkit for data retrieval and analysis. The paper
argues for a model with two- and three-dimensional objects, coded in Java, that implement a full set of topological and connectivity functions, which is shown to be complete and rigorous.
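The flavor of RCC-style predicates evaluated exactly in finite arithmetic can be sketched with axis-aligned integer boxes. This is a toy stand-in, not the paper’s Regular Polytope construction:

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned rectangle on the integer grid (finite arithmetic only)."""
    x1: int
    y1: int
    x2: int
    y2: int

def connected(a, b):
    """RCC connection 'C': the closed boxes share at least one grid point.
    Every comparison is on integers, so no real-number rounding can make
    the algebraic behaviour incorrect."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2

def part_of(a, b):
    """RCC parthood 'P': every point of a lies in b."""
    return b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2

def overlaps(a, b):
    """RCC overlap 'O': the interiors share a point."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2
```

Two boxes meeting along an edge are connected but do not overlap, which is exactly the external-connection distinction RCC relies on.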
10.
P. Howard Patrick 《Language Resources and Evaluation》1974,8(5-6):321-331
Conclusion. The program is adequate testimony that the I.M.L.—M.I.R. system can handle complicated musical procedures and, furthermore, that the present computer staff-format can be easily modified to print “normal” music symbols once music type-bars can be added to the printer.
11.
“There will always (I hope) be print books, but just as the advent of photography changed the role of painting or film changed the role of theater in our culture, electronic publishing is changing the world of print media. To look for a one-to-one transposition to the new medium is to miss the future until it has passed you by.”—Tim O’Reilly (2002).
It is not hard to envisage that publishers will leverage subscribers’ information, interest groups’ shared knowledge, and other sources to enhance their publications. While this enhances the value of the publication through more accurate and personalized content, it also brings a new set of challenges to the publisher. Content is now web-driven, in a truly automated system; that is, no designer “re-touch” intervention is envisaged. This paper introduces an exploratory mapping strategy for allocating web-driven content in a highly graphical publication such as a traditional magazine. Two major aspects of the mapping are covered, which enable different levels of flexibility and address different content-flowing strategies. The last contribution is an evaluation of existing standards that could leverage this work to incorporate flexible mapping and, subsequently, composition capabilities. The work published here is an extended version of the article presented at the Eighth ACM Symposium on Document Engineering in fall 2008 (Giannetti 2008).
12.
Ntovros Vasileios 《Nexus Network Journal》2009,11(3):471-488
This paper proposes a “reading” of the church of San Lorenzo in Turin, designed by Guarino Guarini, through the philosophical notion of “fold” introduced by Gilles Deleuze. The paper consists of two parts. The first part explores the notion of “fold” in architecture and in philosophy, examines its use in the theory of Baroque architecture and the range of this new tool in contemporary architectural practice, and considers the fold as a fundamental condition for understanding the Baroque era. The second part applies the notion of the fold as a philosophical and conceptual framework for the “reading” of the chapel.
13.
Dimitrios K. Tsolis Spyros Sioutas Theodore S. Papatheodorou 《Multimedia Tools and Applications》2010,47(3):581-597
The current work focuses on the implementation of a robust multimedia application for watermarking digital images, based on an innovative spread spectrum analysis algorithm for watermark embedding and on a content-based image retrieval technique for watermark detection. Existing highly robust watermark algorithms apply “detectable watermarks”, for which a detection mechanism checks whether the watermark exists or not (a Boolean decision) based on a watermarking key. The problem is that detecting a watermark in a digital image library containing thousands of images requires the detection algorithm to apply all the keys to the digital images, which is inefficient for very large image databases. On the other hand, “readable” watermarks may prove weaker but are easier to detect, as only the detection mechanism is required. The proposed watermarking algorithm combines the advantages of both “detectable” and “readable” watermarks. The result is a fast and robust multimedia application that can cast readable multibit watermarks into digital images. The watermarking application is capable of hiding 2¹⁴ different keys in digital images and of casting multiple zero-bit watermarks onto the same coefficient area while maintaining a sufficient level of robustness.
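The “detectable” (zero-bit, key-based Boolean) side that the abstract contrasts with “readable” watermarks can be sketched with a generic correlation detector. The spread-spectrum scheme below is a textbook illustration with assumed names and parameters, not the paper’s algorithm:

```python
import random

def pn_sequence(key, n):
    """Pseudo-noise +/-1 pattern deterministically derived from a key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(coeffs, key, strength=2.0):
    """Add the key's pattern to (e.g. transform-domain) image coefficients."""
    pn = pn_sequence(key, len(coeffs))
    return [c + strength * p for c, p in zip(coeffs, pn)]

def detect(coeffs, key, threshold=1.0):
    """Zero-bit detection: correlate the coefficients with the key's pattern
    and return a Boolean decision -- the watermark is either there or not."""
    pn = pn_sequence(key, len(coeffs))
    corr = sum(c * p for c, p in zip(coeffs, pn)) / len(coeffs)
    return corr > threshold
```

Because `detect` only answers yes/no for one key, searching a large library means trying every key, which is the inefficiency the abstract points out; a readable scheme recovers the embedded bits in a single detection pass.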
14.
Vincent C. Müller 《Minds and Machines》2007,17(1):101-115
This paper investigates the prospects of Rodney Brooks’ proposal for AI without representation. It turns out that the supposedly characteristic features of “new AI” (embodiment, situatedness, absence of reasoning, and absence of representation) are all present in conventional systems: “new AI” is just like old AI. Brooks’ proposal boils down to the architectural rejection of central control in intelligent agents, which, however, turns out to be crucial. Some more recent cognitive science suggests that we might do well to dispose of the image of intelligent agents as central representation processors. If this paradigm shift is achieved, Brooks’ proposal for cognition without representation appears promising for full-blown intelligent agents, though not for conscious agents.
15.
A general formal model for trust in dynamic networks is presented. The model is based on the trust structures of Carbone,
Nielsen and Sassone: a domain theoretic generalisation of Weeks’ framework for credential based trust management systems,
e.g., KeyNote and SPKI. Collections of mutually referring trust policies (so-called “webs” of trust) are given a precise meaning
in terms of an abstract domain-theoretic semantics. A complementary concrete operational semantics is provided using the well-known
I/O-automaton model. The operational semantics is proved to adhere to the abstract semantics, effectively providing a distributed
algorithm allowing principals to compute the meaning of a “web” of trust policies. Several techniques allowing sound and efficient
distributed approximation of the abstract semantics are presented and proved correct.
BRICS: Basic Research in Computer Science (www.brics.dk), funded by the Danish National Research Foundation.
16.
A labelling approach for the automatic recognition of tables of contents (ToC) is described in this paper. A prototype is
used for the electronic consulting of scientific papers in a digital library system named Calliope. This method operates on
a roughly structured ASCII file, produced by OCR. The recognition approach operates by text labelling without using any a
priori model. Labelling is based on part-of-speech tagging (PoS) which is initiated by a primary labelling of text components
using some specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammar categories
and then reduced in canonical forms corresponding to article fields: “title” and “authors”. Non-labelled tokens are integrated
in one or another field by either applying PoS correction rules or using a structure model generated from well-detected articles.
The designed prototype operates very well on different ToC layouts and character recognition qualities. Without manual intervention,
a 96.3% rate of correct segmentation was obtained on 38 journals, including 2,020 articles, accompanied by a 93.0% rate of
correct field extraction.
Received April 5, 2000 / Revised February 19, 2001
17.
Michael L. Nelson Frank McCown Joan A. Smith Martin Klein 《International Journal on Digital Libraries》2007,6(4):327-349
To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved
from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject
to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications).
For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However,
due to the infrastructure costs (storage, networks, machines) and more importantly the human management costs, this approach
is unsuitable for web scale preservation. The result is that difficult decisions need to be made as to what is saved and what
is not saved. We provide an overview of our ongoing research projects that focus on using the “web infrastructure” to provide
preservation capabilities for web pages and examine the overlap these approaches have with the field of information retrieval.
The common characteristic of the projects is they creatively employ the web infrastructure to provide shallow but broad preservation
capability for all web pages. These approaches are not intended to replace conventional archiving approaches, but rather they
focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the
future. We characterize the preservation approaches by the level of effort required by the web administrator: web sites are
reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar
pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP
email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21
DIDL representations of web pages (“web server enhanced preservation”).
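The lexical signature behind “just-in-time preservation” is a handful of terms that characterize a page strongly enough to re-find it through a search engine. A minimal TF-IDF sketch, with scoring and parameters that are illustrative rather than the projects’ exact recipe:

```python
from collections import Counter
import math
import re

def lexical_signature(page_text, corpus_df, n_docs, k=5):
    """Pick the k terms with the highest TF-IDF score as the page's lexical
    signature. `corpus_df` maps a term to its document frequency in some
    background corpus of n_docs documents (an assumed input here)."""
    words = re.findall(r"[a-z]+", page_text.lower())
    tf = Counter(words)

    def tfidf(w):
        # rare-in-corpus but present-on-page terms score highest
        return tf[w] * math.log(n_docs / (1 + corpus_df.get(w, 0)))

    return sorted(tf, key=tfidf, reverse=True)[:k]
```

Submitting the resulting terms as a search-engine query then retrieves the same page, or the most similar surviving copy of it.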
18.
Summary. Equivalence is a fundamental notion for the semantic analysis of algebraic specifications. In this paper the notion of “crypt-equivalence” is introduced and studied w.r.t. two “loose” approaches to the semantics of an algebraic specification T: the class of all first-order models of T and the class of all term-generated models of T. Two specifications are called crypt-equivalent if for one specification there exists a predicate logic formula which implicitly defines an expansion (by new functions) of every model of that specification in such a way that the expansion (after forgetting unnecessary functions) is homologous to a model of the other specification, and if vice versa there exists another predicate logic formula with the same properties for the other specification. We speak of “first-order crypt-equivalence” if this holds for all first-order models, and of “inductive crypt-equivalence” if this holds for all term-generated models. Characterizations and structural properties of these notions are studied. In particular, it is shown that first-order crypt-equivalence is equivalent to the existence of explicit definitions and that in case of “positive definability” two first-order crypt-equivalent specifications admit the same categories of models and homomorphisms. Similarly, two specifications which are inductively crypt-equivalent via sufficiently complete implicit definitions determine the same associated categories. Moreover, crypt-equivalence is compared with other notions of equivalence for algebraic specifications: in particular, it is shown that first-order crypt-equivalence is strictly coarser than “abstract semantic equivalence” and that inductive crypt-equivalence is strictly finer than “inductive simulation equivalence” and “implementation equivalence”.
19.
Matteo Colombo 《Minds and Machines》2010,20(2):183-202
According to John Haugeland, the capacity for “authentic intentionality” depends on a commitment to constitutive standards
of objectivity. One of the consequences of Haugeland’s view is that a neurocomputational explanation cannot be adequate to
understand “authentic intentionality”. This paper gives grounds to resist such a consequence. It provides the beginning of
an account of authentic intentionality in terms of neurocomputational enabling conditions. It argues that the standards, which
constitute the domain of objects that can be represented, reflect the statistical structure of the environments where brain
sensory systems evolved and develop. The objection that I equivocate on what Haugeland means by “commitment to standards”
is rebutted by introducing the notion of “florid, self-conscious representing”. Were the hypothesis presented plausible, computational
neuroscience would offer a promising framework for a better understanding of the conditions for meaningful representation.
20.
Nian-Shing Chen Daniel Chia-En Teng Cheng-Han Lee Kinshuk 《Computers & Education》2011,57(2):1705-1715
Comprehension is the goal of reading. However, students often encounter reading difficulties due to the lack of background knowledge and proper reading strategy. Unfortunately, print text provides very limited assistance to one’s reading comprehension through its static knowledge representations such as symbols, charts, and graphs. Integrating digital materials and reading strategy into paper-based reading activities may bring opportunities for learners to make meaning of the print material. In this study, QR codes were adopted in association with mobile technology to deliver supplementary materials and questions to support students’ reading. QR codes were printed on paper prints to provide direct access to digital materials and scaffolded questions. Smartphones were used to scan the printed QR codes to fetch pre-designed digital resources and scaffolded questions over the Internet. A quasi-experiment was conducted to evaluate the effectiveness of direct access to the digital materials prepared by the instructor using QR codes and that of scaffolded questioning in improving students’ reading comprehension. The results suggested that direct access to digital resources using QR codes does not significantly influence students’ reading comprehension; however, the reading strategy of scaffolded questioning significantly improves students’ understanding about the text. The survey showed that most students agreed that the integrated print-and-digital-material-based learning system benefits English reading comprehension but may not be as efficient as expected. The implications of the findings shed light on future improvement of the system.