12 results found; search took 15 ms
1.
Data-Intensive Web Sites: Design and Maintenance (total citations: 1, self-citations: 0, citations by others: 1)
2.
Valter Crescenzi, Alvaro A. A. Fernandes, Paolo Merialdo, Norman W. Paton 《Knowledge and Information Systems》2017,50(1):1-26
The discovery of informative itemsets is a fundamental building block in data analytics and information retrieval. While the problem has been widely studied, only a few solutions scale. This is particularly the case when (1) the data set is massive, calling for large-scale distribution, and/or (2) the length k of the informative itemset to be discovered is high. In this paper, we address the problem of parallel mining of maximally informative k-itemsets (miki) based on joint entropy. We propose PHIKS (Parallel Highly Informative K-ItemSet), a highly scalable, parallel miki mining algorithm. PHIKS renders the mining process of large-scale databases (up to terabytes of data) succinct and effective. Its mining process is made up of only two efficient parallel jobs. With PHIKS, we provide a set of significant optimizations for calculating the joint entropies of miki having different sizes, which drastically reduce the execution time, the communication cost and the energy consumption in a distributed computational platform. PHIKS has been extensively evaluated using massive real-world data sets. Our experimental results confirm the effectiveness of our proposal by the significant scale-up obtained with high itemset lengths and over very large databases.
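As an illustration of the joint-entropy criterion mentioned in the abstract above, the following is a minimal single-machine sketch. It is not PHIKS and contains none of its parallel jobs or optimizations; the function names `joint_entropy` and `brute_force_miki`, and the representation of transactions as Python sets, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(transactions, itemset):
    """Joint entropy (in bits) of the presence/absence pattern of `itemset`.

    Each transaction is projected onto a tuple of booleans, one per item,
    and the entropy of the resulting empirical distribution is returned.
    """
    items = sorted(itemset)
    counts = Counter(tuple(item in t for item in items) for t in transactions)
    n = len(transactions)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def brute_force_miki(transactions, k):
    """Exhaustive search for a k-itemset with maximal joint entropy.

    Hypothetical helper for illustration only: it enumerates every
    k-combination, which is exactly what scalable algorithms avoid.
    """
    universe = sorted(set().union(*transactions))
    return max(combinations(universe, k),
               key=lambda s: joint_entropy(transactions, s))


if __name__ == "__main__":
    db = [{"a", "b"}, {"a"}, {"b"}, set()]
    print(joint_entropy(db, {"a"}))
    print(brute_force_miki(db, 2))
```

The exhaustive search grows combinatorially with k, which is the scaling problem the abstract refers to.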
3.
Distributed and Parallel Databases - We present a crowdsourcing system for large-scale production of accurate wrappers to extract data from data-intensive websites. Our approach is based on...
4.
Derouault AM, Merialdo B 《IEEE transactions on pattern analysis and machine intelligence》1986,(6):742-749
This paper relates different kinds of language modeling methods that can be applied to the linguistic decoding part of a speech recognition system with a very large vocabulary. These models are studied experimentally on a pseudophonetic input arising from French stenotypy. We propose a model which combines the advantages of statistical modeling with information-theoretic tools, and those of a grammatical approach.
5.
6.
7.
In this paper we propose several novel algorithms for multi-video summarization. The first and essential algorithm, Video Maximal Marginal Relevance (Video-MMR), mimics the principle of a classical algorithm of text summarization, Maximal Marginal Relevance (MMR). Video-MMR rewards relevant keyframes and penalizes redundant keyframes, relying only on visual features. We extend Video-MMR to Audio Video Maximal Marginal Relevance (AV-MMR) by exploiting audio features. We also propose Balanced AV-MMR, which exploits additional semantic features, the balance between audio information and visual information, and the balance of temporal information in different videos of a set. The proposed algorithms are generic and suitable for summarizing various video genres in a multi-video set by using multimodal information. Our series of MMR algorithms for multi-video summarization is shown to be effective by large-scale subjective and objective evaluation.
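The reward-relevance, penalize-redundancy principle described above can be sketched as a generic greedy MMR selection. This is not the paper's Video-MMR: the relevance definition (mean cosine similarity to all other items), the weight `lam`, and the function names are assumptions made for this example, and each keyframe is represented by a plain list of numbers standing in for its visual features.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(features, k, lam=0.7):
    """Greedily pick k item indices, trading relevance against redundancy.

    score(i) = lam * relevance(i) - (1 - lam) * max similarity to picked items
    Relevance here is the mean similarity to all other items (an assumption).
    """
    n = len(features)
    relevance = [
        sum(cosine(features[i], features[j]) for j in range(n) if j != i)
        / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    candidates = set(range(n))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(features[i], features[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        # Iterate in index order so ties are broken deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    print(mmr_select(frames, k=2, lam=0.5))
```

With `lam=1.0` the redundancy penalty vanishes and the selection reduces to ranking by relevance alone, so near-duplicate frames can both be chosen.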
8.
9.
Partition sampling: an active learning selection strategy for large database annotation (total citations: 3, self-citations: 0, citations by others: 3)
Souvannavong F., Merialdo B., Huet B. 《Vision, Image and Signal Processing, IEE Proceedings -》2005,152(3):347-355
Annotating a video database requires an intensive, time-consuming and error-prone human effort. However, this is a mandatory task to efficiently describe multimedia contents and train models for automatic content detection. A new selection strategy for active learning methods to minimise human effort in labelling a large database of video sequences is proposed. Formally, active learning is a process where new unlabelled samples are selected iteratively, then presented to users for annotation, and finally added to the training set. The major problem is then to find the best selection function to quickly reach high classification accuracy. It is shown that existing active learning approaches using selective sampling do not maintain their performance when the number of selected samples per iteration increases. The presented selection strategy attempts to provide a solution to this problem. In practice, selecting many samples offers many advantages when dealing with a large amount of data, among them the possibility of sharing the annotation effort between several users. Finally, an attempt to tackle the more realistic and challenging task of multiple label annotation is made. This would reduce the human effort for labelling to a greater extent.
10.
The information globalization induced by the rapid development of the Internet and the accompanying adoption of the Web throughout society leads to Websites which reach large audiences. The diversity of the audiences and the need for customer retention require active Websites, which present themselves in a customized or personalized way: we call those sites User-adapted Websites. New technologies are necessary to personalize and customize content. Information filtering can be used for the discovery of important content and is therefore a key technology for the creation of user-adapted Websites.
In this article, we focus on the application of collaborative filtering for user-adapted Websites. We studied techniques for combining and integrating content-based filtering with collaborative filtering in order to address typical problems in collaborative filtering systems and to improve performance. Other issues which are mentioned but only lightly covered include user interface challenges. To validate our approaches we developed a prototype user-adapted Website, the Active WebMuseum, a museum Website which presents its collection in a personalized way through the use of collaborative filtering.
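For readers unfamiliar with collaborative filtering, a minimal user-based sketch follows. It is not the Active WebMuseum implementation: the data layout (a dict of per-user rating dicts), the cosine similarity over co-rated items, and the names `similarity` and `predict` are all assumptions made for this example.

```python
from math import sqrt


def similarity(a, b):
    """Cosine similarity of two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = sqrt(sum(a[i] ** 2 for i in common))
    nb = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0


def predict(ratings, user, item):
    """Similarity-weighted mean of other users' ratings for `item`.

    Returns None when no similar user has rated the item, i.e. when
    pure collaborative filtering has nothing to go on.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = similarity(ratings[user], their)
        num += w * their[item]
        den += w
    return num / den if den else None


if __name__ == "__main__":
    ratings = {
        "ann": {"p1": 5, "p2": 1},
        "bob": {"p1": 5, "p2": 1, "p3": 4},
        "cat": {"p3": 1, "p4": 3},
    }
    print(predict(ratings, "ann", "p3"))
    print(predict(ratings, "ann", "p9"))
```

The `None` case for an item nobody comparable has rated is one example of the kind of gap that a content-based component could fill in a combined system.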