We present an optimization-based unsupervised approach to automatic document summarization. In the proposed approach, text summarization is modeled as a Boolean programming problem. The model jointly optimizes three properties, namely, (1) relevance: the summary should contain informative textual units that are relevant to the user; (2) redundancy: the summary should not contain multiple textual units that convey the same information; and (3) length: the summary is bounded in length. The proposed approach is applicable to both single- and multi-document summarization. In both tasks, documents are split into sentences during preprocessing, and salient sentences are selected from the document(s). Finally, the summary is generated by concatenating the selected sentences in the order in which they appear in the original document(s). We implemented our model on the multi-document summarization task. Comparing our method to several existing summarization methods on the open DUC2005 and DUC2007 data sets, we found that it improves summarization results significantly. This is because, first, when extracting summary sentences, the method considers not only the relevance scores of sentences with respect to the whole sentence collection but also how representative the sentences are of the topic; second, when generating a summary, the method also addresses the problem of repeated information. The methods were evaluated using the ROUGE-1, ROUGE-2, and ROUGE-SU4 metrics. We also demonstrate that the summarization result depends on the similarity measure: experimental results showed that a combination of symmetric and asymmetric similarity measures yields better results than either used separately.
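As an illustration of the Boolean programming formulation described above, the sketch below selects a subset of sentences that maximizes total relevance minus a redundancy penalty, subject to a length budget. All scores, lengths, and the penalty weight are invented toy values, and exhaustive enumeration stands in for an actual Boolean-programming solver; none of this is taken from the paper itself.

```python
from itertools import combinations

# Toy instance: relevance scores, sentence lengths, and pairwise
# similarities are illustrative values, not from the paper.
sentences = ["s1", "s2", "s3", "s4"]
relevance = {"s1": 0.9, "s2": 0.8, "s3": 0.3, "s4": 0.7}
length    = {"s1": 10,  "s2": 12,  "s3": 5,   "s4": 8}
# Symmetric pairwise similarity; s1 and s2 convey nearly the same information.
similarity = {("s1", "s2"): 0.9, ("s1", "s3"): 0.1, ("s1", "s4"): 0.2,
              ("s2", "s3"): 0.1, ("s2", "s4"): 0.2, ("s3", "s4"): 0.1}

def objective(subset, lam=1.0):
    # Total relevance minus a redundancy penalty over all selected pairs.
    rel = sum(relevance[s] for s in subset)
    red = sum(similarity[p] for p in combinations(sorted(subset), 2))
    return rel - lam * red

def summarize(budget=20):
    # Exhaustive search over all Boolean assignments (feasible for toy sizes).
    best, best_score = (), float("-inf")
    for r in range(len(sentences) + 1):
        for subset in combinations(sentences, r):
            if sum(length[s] for s in subset) <= budget:
                score = objective(subset)
                if score > best_score:
                    best, best_score = subset, score
    # Emit selected sentences in original document order.
    return [s for s in sentences if s in best]
```

Note how the redundancy term steers the search away from selecting both s1 and s2 even though they have the two highest relevance scores.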
It is important to verify assumptions and methods of image retrieval against actual human behavior. A study was conducted to compare similarity methods for color histograms against human assessments of similarity. The similarity methods tested include basic histogram intersection, center histogram matching, locality histogram matching, and size-weighted histogram matching. 161 subjects participated in the empirical study. The findings, based on Spearman correlation analysis, showed that both the basic histogram intersection method and the size-weighted histogram matching method are very close to human assessments of similarity (Spearman correlation coefficient of 0.915). The other two methods are not close to human judgments of similarity. This study illustrates an alternative approach to evaluating matching algorithms. Unlike the usual measures of recall and precision, this approach emphasizes human validation, and fewer images are required with the use of statistical testing.
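For reference, the basic histogram intersection measure tested in the study above can be sketched as follows. Normalizing by the sum of the second histogram follows one common convention (Swain and Ballard's); the abstract does not specify which variant was used, so this is an assumption.

```python
def histogram_intersection(h, g):
    # Sum of bin-wise minima, normalized by the total mass of the
    # second (model) histogram; yields 1.0 for identical histograms.
    return sum(min(a, b) for a, b in zip(h, g)) / sum(g)
```

For two 3-bin histograms [2, 3, 5] and [1, 4, 5], the bin-wise minima sum to 9 and the model mass is 10, giving a similarity of 0.9.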
Studies on odor mixture perception suggest that although odor components can often be identified in mixtures, mixtures can also give rise to novel perceptual qualities that are not present in the components. Using an olfactory habituation task, the authors evaluated how the perceptual similarity between components in a mixture affects the perceptual quality of the mixture itself. Rats perceived binary mixtures composed of similar components as different from their 2 components, whereas binary mixtures composed of dissimilar components were perceived as very similar to their components. Results show that for both types of mixtures, pretraining to Component A reduces subsequent learning about Component B in rats trained in the presence of A. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
Understanding how aging influences cognition across different cultures has been hindered by a lack of standardized, cross-referenced verbal stimuli. This study introduces a database of such item-level stimuli for both younger and older adults, in China and the United States, and makes 3 distinct contributions. First, the authors specify which item categories generalize across age and/or cultural groups, rigorously quantifying differences among them. Second, they introduce novel, powerful methods to measure between-group differences in freely generated ranked data, the rank-ordered logit model and Hellinger Affinity. Finally, a broad archive of tested, cross-linguistic stimuli is now freely available to researchers: data, similarity measures, and all stimulus materials for 105 categories and 4 culture-by-age groups, comprising over 10,000 fully translated unique item responses. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
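The Hellinger affinity mentioned above (also known as the Bhattacharyya coefficient) measures the overlap between two discrete probability distributions; a minimal sketch on toy response-frequency distributions is given below. How the study applies it to ranked category-norm data is not detailed in the abstract, so the example is illustrative only.

```python
from math import sqrt

def hellinger_affinity(p, q):
    # Bhattacharyya coefficient / Hellinger affinity between two discrete
    # distributions: 1.0 for identical distributions, 0.0 for distributions
    # with disjoint support.
    return sum(sqrt(pi * qi) for pi, qi in zip(p, q))
```

Two groups producing identical response distributions have affinity 1.0, while groups that never name the same items have affinity 0.0, so the measure directly quantifies between-group agreement.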
Through a series of model tests at five scales of a 2-D free hydraulic jump, data on the fluctuating pressure acting on the floor within the hydraulic jump were obtained. During the experiments, the Froude number varied from 2.94 to 8.61, and the Reynolds number ranged from 2×10⁴ to 6×10⁵. The experimental results indicate that the amplitude scale of the fluctuating pressure equals the length scale of the model, i.e., P′ = L, which agrees with the gravity similarity law, whereas the frequency scale of the fluctuating pressure is unity, i.e., f = 1, which does not satisfy the gravity similarity law.
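The two scale relations reported above can be sketched as a simple model-to-prototype conversion: fluctuating-pressure amplitudes are multiplied by the length scale (P′ = L), while frequencies are carried over unchanged (f = 1), per the experimental finding. The function name, units, and numeric inputs below are illustrative assumptions, not from the paper.

```python
def to_prototype(p_model_kpa, f_model_hz, length_scale):
    # Amplitude scales with the model length scale (gravity similarity);
    # frequency scale is unity, as found experimentally.
    return p_model_kpa * length_scale, f_model_hz
```

For example, a 2.0 kPa fluctuating-pressure amplitude measured at 5 Hz on a 1:40 model would correspond to an 80 kPa amplitude at the same 5 Hz on the prototype.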
This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We shall adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation method for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is how to determine an optimal way of organizing the records into buckets based upon the MKH concept.
A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. Then a performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system. 
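The DM allocation method referenced above assigns a bucket with multi-attribute address (a₁, …, aₙ) to disk (a₁ + ⋯ + aₙ) mod m, where m is the number of disks; a minimal two-disk sketch follows. The toy query ranges are invented for illustration and are not from the paper.

```python
def dm_disk(bucket_addr, num_disks=2):
    # Disk Modulo allocation: sum the bucket's attribute address
    # components and take the result modulo the number of disks.
    return sum(bucket_addr) % num_disks

# Buckets qualifying for a sample orthogonal range query over a
# 2-attribute file: attribute 1 in [0, 1], attribute 2 in [1, 3].
query_buckets = [(a, b) for a in range(0, 2) for b in range(1, 4)]
disks = [dm_disk(addr) for addr in query_buckets]
```

For this query the six qualifying buckets split evenly, three per disk, so both disks can be accessed concurrently; maximizing this concurrency over all possible ORQs is exactly the design objective stated above.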