Similar Documents
 A total of 20 similar documents were found (search time: 15 ms).
1.
Noise in textual data, such as that introduced by multilinguality, misspellings, abbreviations, deletions, phonetic spellings, and non-standard transliteration, poses considerable problems for text mining. Such corruptions are very common in instant messenger and short message service data, and they adversely affect off-the-shelf text mining methods. Most techniques address this problem with supervised methods that make use of hand-labeled corrections. These, however, require human-generated labels and corrections that are very expensive and time consuming to obtain because of the multilinguality and complexity of the corruptions. While we do not champion unsupervised methods over supervised ones when quality of results is the singular concern, we demonstrate that unsupervised methods can provide cost-effective results without the expensive human intervention needed to generate a parallel labeled corpus. A generative-model-based unsupervised technique is presented that maps non-standard words to their corresponding conventional, frequent forms. A hidden Markov model (HMM) over a “subsequencized” representation of words is used, where a word is represented as a bag of weighted subsequences. The approximate maximum likelihood inference algorithm is such that the training phase involves clustering over vectors rather than the customary and expensive dynamic programming (Baum–Welch algorithm) over sequences that HMMs normally require. A principled transformation of the maximum-likelihood-based “central clustering” cost function of Baum–Welch into a “pairwise similarity” based clustering is proposed. This transformation makes it possible to apply “subsequence kernel” based methods that model delete and insert corruptions well. The novelty of this approach lies in the fact that the expensive Baum–Welch iterations required for HMMs can be avoided through an approximation of the log-likelihood function and by establishing a connection between the log-likelihood and a pairwise distance. Anecdotal evidence of efficacy is provided on public and proprietary data.
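As a rough illustration of the “bag of weighted subsequences” idea described above, the sketch below builds such a bag for a word and scores word pairs with a simple subsequence-kernel-style similarity. The subsequence length and gap-decay parameter are illustrative assumptions, not values taken from the paper, and this is not the authors’ actual algorithm.

```python
from itertools import combinations
from collections import defaultdict

def weighted_subsequences(word, k=2, lam=0.5):
    """Represent `word` as a bag of length-k character subsequences.

    Each subsequence is weighted by lam ** span, where span is the distance
    covered in the original word, so gappy subsequences (insert/delete noise)
    contribute less, as in subsequence kernels.
    """
    bag = defaultdict(float)
    for idxs in combinations(range(len(word)), k):
        span = idxs[-1] - idxs[0] + 1
        subseq = "".join(word[i] for i in idxs)
        bag[subseq] += lam ** span
    return dict(bag)

def similarity(w1, w2, k=2, lam=0.5):
    """Unnormalised subsequence-kernel-style similarity between two words."""
    b1 = weighted_subsequences(w1, k, lam)
    b2 = weighted_subsequences(w2, k, lam)
    return sum(v * b2.get(s, 0.0) for s, v in b1.items())

# A noisy SMS-style form stays close to its conventional spelling.
print(similarity("tomorrow", "tomorow"))   # higher
print(similarity("tomorrow", "morning"))   # lower
```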

2.
Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world where a reference collection is given. This article investigates the question of whether plagiarism can be detected by a computer program if no reference can be provided, e.g., if the foreign sections stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism analysis; it is closely related to the problem of authorship verification. Our contributions are threefold. (1) We organize the algorithmic building blocks for intrinsic plagiarism analysis and authorship verification and survey the state of the art. (2) We show how the meta learning approach of Koppel and Schler, termed “unmasking”, can be employed to post-process unreliable stylometric analysis results. (3) We operationalize and evaluate an analysis chain that combines document chunking, style model computation, one-class classification, and meta learning.
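A minimal sketch of the chunking and style-model steps of such an analysis chain, using a few hand-picked surface features and a z-score outlier test in place of the paper’s one-class classifier and “unmasking” stage; the features, chunk size, and threshold are assumptions for illustration only.

```python
import re
import statistics

def style_features(chunk):
    """Very small stylometric feature vector for a text chunk."""
    words = re.findall(r"[A-Za-z']+", chunk)
    sentences = [s for s in re.split(r"[.!?]+", chunk) if s.strip()]
    if not words or not sentences:
        return (0.0, 0.0, 0.0)
    avg_word_len = sum(map(len, words)) / len(words)
    avg_sent_len = len(words) / len(sentences)
    type_token = len(set(w.lower() for w in words)) / len(words)
    return (avg_word_len, avg_sent_len, type_token)

def suspicious_chunks(text, chunk_size=2000, z_cut=2.0):
    """Flag chunks whose style deviates strongly from the document average."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    feats = [style_features(c) for c in chunks]
    flagged = set()
    for dim in range(3):
        values = [f[dim] for f in feats]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0
        flagged.update(i for i, v in enumerate(values) if abs(v - mean) / sd > z_cut)
    return sorted(flagged)
```

In the chain described in the abstract, chunks flagged by such a style model would then be passed on to the one-class classifier and the unmasking meta-learner rather than reported directly.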

3.
A method to obtain a code representation of handwritten signatures is described, and an algorithm for signature verification based on such representations is proposed. Results of tests to determine efficient methods of image compression for the purpose of signature verification are presented. Konstantin Alekseev. Born 1979. Received a Master’s degree in engineering and technology (radio engineering) in 2002. Currently a post-graduate student at St. Petersburg State Electrotechnical University “LETI”, chair of television and video. Scientific interests: digital image processing and pattern recognition. Author of three papers. Svetlana Egorova. Born 1931. Graduated from St. Petersburg State Electrotechnical University “LETI” in 1955; received a Candidate’s degree (Eng.) in 1965; since 1968 a senior lecturer at the chair of television and video, St. Petersburg State Electrotechnical University “LETI”. Scientific interests: optical and digital image processing and compression methods in signal processing. Author of 141 papers.

4.
Methods are proposed for synchronizing the interaction of the digital devices of distributed systems using a common center that relays the signals from the devices. They are mostly intended to perform operations such as “all-to-all,” “all-to-one,” and “one-to-all.” The center substantially accelerates synchronization and improves the efficiency of the communication facilities interconnecting the devices.

5.
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass” and “two-pass” query processing. Technically, the model is described in the framework of abstract state machines. Our main results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin fragment of the relational algebra.
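For intuition about the “one-pass” style of processing the model captures, the sketch below computes the semijoin R ⋉ S with a single forward cursor over each relation, assuming both are sorted on the join attribute. This is our own illustrative example, not a formal finite cursor machine from the paper.

```python
def semijoin_sorted(r, s):
    """Compute the semijoin R ⋉ S on the first attribute, assuming both
    relations are sorted by that attribute and each is scanned with one
    forward cursor (no backtracking), in the spirit of one-pass processing."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            result.append(r[i])  # r[i] has at least one join partner in S
            i += 1
    return result

R = [(1, "a"), (2, "b"), (4, "c")]
S = [(2, "x"), (3, "y"), (4, "z")]
print(semijoin_sorted(R, S))  # [(2, 'b'), (4, 'c')]
```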

6.
7.
The detection and correction of false friends—also called real-word errors—is a notoriously difficult problem. On realistic data, the break-even point for automatic correction has so far not been reached: the number of additional infelicitous corrections outnumbered the useful corrections. We present a new approach in which we first compute a profile of the error channel for the given text. During the correction process, the profile (1) helps to restrict attention to a small set of “suspicious” lexical tokens of the input text where it is “plausible” to assume that the token represents a false friend. In this way, recognition of false friends is improved. Furthermore, the profile (2) helps to isolate the “most promising” correction suggestion for “suspicious” tokens. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
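A toy sketch of the word-trigram disambiguation step: candidate corrections for a “suspicious” token are ranked by a smoothed trigram score of the surrounding context. The tiny count table, smoothing constant, and vocabulary size are invented for illustration and do not come from the paper.

```python
import math
from collections import Counter

# Invented toy trigram counts; a real system would estimate these from a corpus.
TRIGRAM_COUNTS = Counter({
    ("the", "boat", "sailed"): 12,
    ("a", "warm", "coat"): 8,
})

def trigram_score(w1, w2, w3, alpha=0.5, vocab=10000):
    """Add-alpha smoothed trigram log score (for ranking only, illustrative)."""
    count = TRIGRAM_COUNTS[(w1, w2, w3)]
    total = sum(TRIGRAM_COUNTS.values())
    return math.log((count + alpha) / (total + alpha * vocab))

def best_correction(left, token, right, candidates):
    """Pick the candidate that makes the surrounding trigram most probable."""
    return max(candidates, key=lambda c: trigram_score(left, c, right))

# "coat" is a false friend for "boat" in this context.
print(best_correction("the", "coat", "sailed", ["coat", "boat"]))  # boat
```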

8.
Making use of the Kerr theorem for shear-free null congruences and of Newman’s representation for a virtual charge “moving” in complex space-time, we obtain an axisymmetric, time-dependent generalization of the Kerr congruence, with a singular ring that uniformly contracts to a point and then expands to infinity. Electromagnetic and complex eikonal field distributions are naturally associated with the obtained congruence, with the electric charge necessarily being a unit (“elementary”) charge.

9.
In order to be able to draw inferences about real-world phenomena from a representation expressed in a digital computer, it is essential that the representation have a rigorously correct algebraic structure. It is also desirable that the underlying algebra be familiar and provide a close modelling of those phenomena. The fundamental problem addressed in this paper is that, since computers do not support real-number arithmetic, the algebraic behaviour of the representation may not be correct and cannot directly model a mathematical abstraction of space based on real numbers. This paper describes a basis for the robust geometrical construction of spatial objects in computer applications using a complex called the “Regular Polytope”. In contrast to most other spatial data types, this definition supports a rigorous logic within a finite digital arithmetic. The definition of connectivity proves to be non-trivial, and alternatives are investigated. It is shown that these alternatives satisfy the relations of a region connection calculus (RCC) as used for qualitative spatial reasoning, and thus bring the rigor of that reasoning to geographical information systems. They also form what can reasonably be termed a “Finite Boolean Connection Algebra”. The rigorous and closed nature of the algebra ensures that these primitive functions and predicates can be combined to any desired level of complexity, and thus provide a useful toolkit for data retrieval and analysis. The paper argues for a model with two- and three-dimensional objects; these have been coded in Java and implement a full set of topological and connectivity functions, which is shown to be complete and rigorous.

10.
Conclusion: The program is adequate testimony that the I.M.L.—M.I.R. system can handle complicated musical procedures, and furthermore that the present computer staff format can easily be modified to print “normal” music symbols once music type-bars can be added to the printer.

11.
“There will always (I hope) be print books, but just as the advent of photography changed the role of painting or film changed the role of theater in our culture, electronic publishing is changing the world of print media. To look for a one-to-one transposition to the new medium is to miss the future until it has passed you by.”—Tim O’Reilly (2002). It is not hard to envisage that publishers will leverage subscribers’ information, interest groups’ shared knowledge, and other sources to enhance their publications. While this enhances the value of the publication through more accurate and personalized content, it also brings a new set of challenges to the publisher. Content is now web-driven and handled in a truly automated system; that is, no designer “re-touch” intervention is envisaged. This paper introduces an exploratory mapping strategy to allocate web-driven content in a highly graphical publication such as a traditional magazine. Two major aspects of the mapping are covered, which enable different levels of flexibility and address different content-flowing strategies. The last contribution is an evaluation of existing standards that could potentially leverage this work to incorporate flexible mapping and, subsequently, composition capabilities. The work published here is an extended version of the article presented at the Eighth ACM Symposium on Document Engineering in fall 2008 (Giannetti 2008).

12.
This paper proposes a “reading” of the church of San Lorenzo in Turin, designed by Guarino Guarini, through the philosophical notion of “fold” introduced by Gilles Deleuze. The paper consists of two parts. The first part explores the notion of “fold” in architecture and in philosophy, examining its use in the theory of Baroque architecture, the range of this new tool in contemporary architectural practice, and the fold as a fundamental condition for understanding the Baroque era. The second part applies the notion of the fold as a philosophical and conceptual framework for the “reading” of the chapel.

13.
The current work focuses on the implementation of a robust multimedia application for watermarking digital images, based on an innovative spread spectrum analysis algorithm for watermark embedding and on a content-based image retrieval technique for watermark detection. Existing highly robust watermark algorithms apply “detectable watermarks,” for which a detection mechanism checks whether the watermark exists (a Boolean decision) based on a watermarking key. The problem is that detecting a watermark in a digital image library containing thousands of images requires the detection algorithm to try all the keys on the digital images, which is inefficient for very large image databases. On the other hand, “readable” watermarks may prove weaker but are easier to detect, as only the detection mechanism is required. The proposed watermarking algorithm combines the advantages of both “detectable” and “readable” watermarks. The result is a fast and robust multimedia application that can cast readable multibit watermarks into digital images. The watermarking application is capable of hiding 2^14 different keys in digital images and casting multiple zero-bit watermarks onto the same coefficient area while maintaining a sufficient level of robustness.
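A bare-bones sketch of additive spread-spectrum embedding and correlation-based detection of a multibit watermark in a coefficient vector. The key-seeded ±1 carriers and the strength parameter are generic textbook choices assumed here, not the paper’s actual embedding or CBIR-based detection scheme.

```python
import numpy as np

def embed_multibit(coeffs, bits, key, strength=0.05):
    """Add one key-seeded pseudo-random +/-1 carrier per watermark bit."""
    rng = np.random.default_rng(key)
    marked = coeffs.astype(float).copy()
    for bit in bits:
        carrier = rng.choice([-1.0, 1.0], size=coeffs.shape)
        marked += strength * (1.0 if bit else -1.0) * carrier
    return marked

def detect_bits(marked, n_bits, key):
    """Recover the bits by correlating against the same key-seeded carriers."""
    rng = np.random.default_rng(key)
    return [int(np.dot(marked, rng.choice([-1.0, 1.0], size=marked.shape)) > 0)
            for _ in range(n_bits)]

coeffs = np.random.default_rng(0).normal(size=4096)  # stand-in for transform coefficients
bits = [1, 0, 1, 1, 0, 0, 1, 0]
print(detect_bits(embed_multibit(coeffs, bits, key=1234), len(bits), key=1234))
# expected, with high probability, to recover the embedded bit string
```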

14.
This paper investigates the prospects of Rodney Brooks’ proposal for AI without representation. It turns out that the supposedly characteristic features of “new AI” (embodiment, situatedness, absence of reasoning, and absence of representation) are all present in conventional systems: “new AI” is just like old AI. Brooks’ proposal boils down to the architectural rejection of central control in intelligent agents—which, however, turns out to be crucial. Some more recent cognitive science suggests that we might do well to dispose of the image of intelligent agents as central representation processors. If this paradigm shift is achieved, Brooks’ proposal for cognition without representation appears promising for full-blown intelligent agents—though not for conscious agents.

15.
Trust structures     
A general formal model for trust in dynamic networks is presented. The model is based on the trust structures of Carbone, Nielsen and Sassone: a domain-theoretic generalisation of Weeks’ framework for credential-based trust management systems, e.g., KeyNote and SPKI. Collections of mutually referring trust policies (so-called “webs” of trust) are given a precise meaning in terms of an abstract domain-theoretic semantics. A complementary concrete operational semantics is provided using the well-known I/O-automaton model. The operational semantics is proved to adhere to the abstract semantics, effectively providing a distributed algorithm allowing principals to compute the meaning of a “web” of trust policies. Several techniques allowing sound and efficient distributed approximation of the abstract semantics are presented and proved correct. BRICS: Basic Research in Computer Science (www.brics.dk), funded by the Danish National Research Foundation.

16.
A labelling approach for the automatic recognition of tables of contents (ToC) is described in this paper. A prototype is used for the electronic consulting of scientific papers in a digital library system named Calliope. The method operates on a roughly structured ASCII file produced by OCR. The recognition approach labels text without using any a priori model. Labelling is based on part-of-speech (PoS) tagging, which is initiated by a primary labelling of text components using some specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to the article fields “title” and “authors”. Non-labelled tokens are integrated into one field or the other either by applying PoS correction rules or by using a structure model generated from well-detected articles. The designed prototype performs very well on different ToC layouts and character recognition qualities. Without manual intervention, a 96.3% rate of correct segmentation was obtained on 38 journals comprising 2,020 articles, together with a 93.0% rate of correct field extraction. Received April 5, 2000 / Revised February 19, 2001
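A much simplified sketch of the primary labelling idea: tokens of an OCR’d ToC line are tagged with small dictionaries and simple patterns, then grouped into “authors” and “title” fields. The mini-dictionaries and rules here are invented stand-ins for the paper’s PoS tagging and correction rules.

```python
import re

# Invented mini-dictionaries standing in for the paper's specific lexica.
FIRST_NAMES = {"john", "maria", "wei", "anne"}
NAME_MARKERS = {"and", ",", "&"}

def label_toc_line(line):
    """Split an OCR'd ToC line into a rough 'authors' / 'title' pair."""
    tokens = re.findall(r"[A-Za-z][A-Za-z.\-]*|,|&", line)
    authors, title = [], []
    in_authors = True
    for tok in tokens:
        low = tok.lower()
        looks_like_name = (low in FIRST_NAMES or low in NAME_MARKERS
                           or re.fullmatch(r"[A-Z][a-z]+|[A-Z]\.", tok) is not None)
        if in_authors and looks_like_name:
            authors.append(tok)
        else:
            in_authors = False
            title.append(tok)
    return {"authors": " ".join(authors), "title": " ".join(title)}

print(label_toc_line("John Smith and Maria Rossi  A study of document layout analysis"))
```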

17.
To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However, due to the infrastructure costs (storage, networks, machines) and, more importantly, the human management costs, this approach is unsuitable for web-scale preservation. The result is that difficult decisions need to be made as to what is saved and what is not. We provide an overview of our ongoing research projects that focus on using the “web infrastructure” to provide preservation capabilities for web pages, and we examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is that they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches; rather, they focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the future. We characterize the preservation approaches by the level of effort required of the web administrator: web sites are reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21 DIDL representations of web pages (“web server enhanced preservation”).
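An illustrative sketch of the lexical-signature step behind “just-in-time preservation”: the few most distinctive terms of a page are selected by TF-IDF and could then be submitted as a search query to relocate the same or a similar page. The weighting scheme and signature length are common choices assumed here, not details taken from the article.

```python
import math
import re
from collections import Counter

def lexical_signature(doc, corpus, k=5):
    """Return the k terms of `doc` with the highest TF-IDF weight.

    `corpus` is a list of other documents used only to estimate IDF.
    """
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    doc_terms = tokenize(doc)
    tf = Counter(doc_terms)
    n_docs = len(corpus) + 1
    df = Counter()
    for other in corpus:
        for term in set(tokenize(other)):
            df[term] += 1

    def weight(term):
        return tf[term] * math.log(n_docs / (1 + df[term]))

    return sorted(set(doc_terms), key=weight, reverse=True)[:k]

page = "lazy preservation reconstructs lost web sites from search engine caches"
others = ["search engines crawl and cache web pages",
          "digital archives curate copies of web resources"]
print(lexical_signature(page, others))  # the page's most distinctive terms (tie order may vary)
```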

18.
Summary: Equivalence is a fundamental notion for the semantic analysis of algebraic specifications. In this paper the notion of “crypt-equivalence” is introduced and studied with respect to two “loose” approaches to the semantics of an algebraic specification T: the class of all first-order models of T and the class of all term-generated models of T. Two specifications are called crypt-equivalent if for one specification there exists a predicate logic formula which implicitly defines an expansion (by new functions) of every model of that specification in such a way that the expansion (after forgetting unnecessary functions) is homologous to a model of the other specification, and if vice versa there exists another predicate logic formula with the same properties for the other specification. We speak of “first-order crypt-equivalence” if this holds for all first-order models, and of “inductive crypt-equivalence” if this holds for all term-generated models. Characterizations and structural properties of these notions are studied. In particular, it is shown that first-order crypt-equivalence is equivalent to the existence of explicit definitions and that, in the case of “positive definability”, two first-order crypt-equivalent specifications admit the same categories of models and homomorphisms. Similarly, two specifications which are inductively crypt-equivalent via sufficiently complete implicit definitions determine the same associated categories. Moreover, crypt-equivalence is compared with other notions of equivalence for algebraic specifications: in particular, it is shown that first-order crypt-equivalence is strictly coarser than “abstract semantic equivalence” and that inductive crypt-equivalence is strictly finer than “inductive simulation equivalence” and “implementation equivalence”.

19.
According to John Haugeland, the capacity for “authentic intentionality” depends on a commitment to constitutive standards of objectivity. One of the consequences of Haugeland’s view is that a neurocomputational explanation cannot be adequate to understand “authentic intentionality”. This paper gives grounds to resist such a consequence. It provides the beginning of an account of authentic intentionality in terms of neurocomputational enabling conditions. It argues that the standards, which constitute the domain of objects that can be represented, reflect the statistical structure of the environments where brain sensory systems evolved and develop. The objection that I equivocate on what Haugeland means by “commitment to standards” is rebutted by introducing the notion of “florid, self-conscious representing”. Were the hypothesis presented plausible, computational neuroscience would offer a promising framework for a better understanding of the conditions for meaningful representation.

20.
Comprehension is the goal of reading. However, students often encounter reading difficulties due to a lack of background knowledge and proper reading strategies. Unfortunately, print text provides very limited assistance to one’s reading comprehension through its static knowledge representations such as symbols, charts, and graphs. Integrating digital materials and reading strategies into paper-based reading activities may bring opportunities for learners to make meaning of the print material. In this study, QR codes were adopted in association with mobile technology to deliver supplementary materials and questions to support students’ reading. QR codes were printed on the paper materials to provide direct access to digital materials and scaffolded questions. Smartphones were used to scan the printed QR codes to fetch pre-designed digital resources and scaffolded questions over the Internet. A quasi-experiment was conducted to evaluate the effectiveness of direct access to the digital materials prepared by the instructor using QR codes and that of scaffolded questioning in improving students’ reading comprehension. The results suggested that direct access to digital resources using QR codes does not significantly influence students’ reading comprehension; however, the reading strategy of scaffolded questioning significantly improves students’ understanding of the text. The survey showed that most students agreed that the integrated print-and-digital-material-based learning system benefits English reading comprehension but may not be as efficient as expected. The implications of the findings shed light on future improvement of the system.
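A minimal sketch of how a printed QR code can point at a supplementary digital resource, assuming the third-party Python qrcode package; the URL and file name are hypothetical and are not taken from the study.

```python
# Requires the third-party "qrcode" package (pip install qrcode[pil]).
import qrcode

# Hypothetical URL of a supplementary reading resource or scaffolded question.
resource_url = "https://example.edu/reading/unit3/scaffolded-question-1"

img = qrcode.make(resource_url)   # encode the URL as a QR code image
img.save("unit3_question1.png")   # print this image next to the relevant text passage
```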

