1.

Content-based image indexing and searching using Daubechies' wavelets (cited 8 times: 0 self-citations, 8 by others)

James Ze Wang Gio Wiederhold Oscar Firschein Sha Xin Wei 《International Journal on Digital Libraries》1998,1(4):311-328

This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10 000 general-purpose images, the best 100 matches were found in 3.3 seconds.
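
The two-step retrieval described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the WBIIS implementation: the feature layout (a scalar `variance` plus a `coeffs` list), the relative tolerance `var_tol`, and the function name `two_step_search` are all hypothetical, and the wavelet transform that would produce the features is omitted.

```python
import math

def two_step_search(query, database, var_tol=0.5, top_k=3):
    """Hypothetical two-step search: variance prefilter, then vector match."""
    # Step 1: crude selection. Keep images whose stored variance is close
    # to the query's variance (the relative tolerance is an assumption).
    lo = query["variance"] * (1 - var_tol)
    hi = query["variance"] * (1 + var_tol)
    candidates = [img for img in database if lo <= img["variance"] <= hi]

    # Step 2: refine by Euclidean distance between coefficient vectors.
    def dist(img):
        return math.dist(query["coeffs"], img["coeffs"])

    return sorted(candidates, key=dist)[:top_k]

# Toy feature records; real features would come from wavelet coefficients.
database = [
    {"id": "a", "variance": 1.0, "coeffs": [0.1, 0.2, 0.3]},
    {"id": "b", "variance": 1.2, "coeffs": [0.9, 0.8, 0.7]},
    {"id": "c", "variance": 9.0, "coeffs": [0.1, 0.2, 0.3]},  # fails step 1
]
query = {"variance": 1.1, "coeffs": [0.1, 0.2, 0.25]}
result_ids = [img["id"] for img in two_step_search(query, database)]
```

Image `c` has the same coefficients as `a` but is discarded by the variance prefilter, so the more expensive vector comparison only runs on the surviving candidates.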

2.

Using object and trajectory analysis to facilitate indexing and retrieval of video

Carlos Lopez Yi-Ping Phoebe Chen 《Knowledge》2006,19(8):639-646

This paper aims to show that by using low level feature extraction, motion and object identifying and tracking methods, features can be extracted and indexed for efficient and effective retrieval of video, such as an awards ceremony video. Video scene/shot analysis and key frame extraction are used as a foundation to identify objects in video and to find spatial relationships within the video. The compounding of low level features such as colour, texture and abstract object identification leads into higher level real object identification and tracking and scene detection. The main focus is on using a video style that is different to the heavily used sports and news genres. Using different video styles can open the door to creating methods that could encompass all video types instead of specialized methods for each specific style of video.

3.

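A Web text feature algorithm based on similar-image clustering

Fang Shuang (方爽) Yin Junjie (殷俊杰) Xu Wuping (徐武平) 《Computer Engineering》 (计算机工程) 2014,(12)

For low-quality Web pages whose images and text do not match, existing image search engines based on text keywords return poorly relevant results. To address this problem, the similarity clustering information of images and Web page quality factors are incorporated into the text analysis process, and a Web text feature algorithm based on similar-image clustering is proposed. Different weights are assigned according to the page's PageRank value, the HTML tag category of each keyword, and the keyword's part-of-speech category; these weights are substituted into a formula to compute the text feature value of every keyword in the whole cluster, and highly relevant text is extracted by applying a threshold. Experiments on 15 randomly selected image clusters show that, compared with the image search algorithms currently used by Baidu and Google, the algorithm accurately finds the text that truly reflects image content and improves image retrieval precision.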
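
A rough sketch of the weighting idea follows. The abstract does not give the formula or the weight values, so everything here is an assumption for illustration: the multiplicative score, the `TAG_WEIGHT` and `POS_WEIGHT` tables, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights; the paper's actual values and formula are not given.
TAG_WEIGHT = {"title": 3.0, "alt": 2.0, "body": 1.0}
POS_WEIGHT = {"noun": 1.0, "verb": 0.5, "other": 0.2}

def cluster_keyword_scores(occurrences):
    """Sum weighted occurrences of each keyword across one image cluster."""
    scores = defaultdict(float)
    for occ in occurrences:
        # Assumed form: page quality x tag weight x part-of-speech weight.
        scores[occ["word"]] += (
            occ["pagerank"] * TAG_WEIGHT[occ["tag"]] * POS_WEIGHT[occ["pos"]]
        )
    return dict(scores)

def relevant_keywords(occurrences, threshold):
    """Keep keywords whose cluster-wide score reaches the threshold."""
    scores = cluster_keyword_scores(occurrences)
    return sorted(w for w, s in scores.items() if s >= threshold)

# Keyword occurrences gathered from the pages hosting one image cluster.
occurrences = [
    {"word": "panda", "pagerank": 0.5, "tag": "title", "pos": "noun"},
    {"word": "panda", "pagerank": 0.25, "tag": "alt", "pos": "noun"},
    {"word": "click", "pagerank": 0.5, "tag": "body", "pos": "verb"},
]
keywords = relevant_keywords(occurrences, threshold=1.0)
```

Because scores are accumulated over every page in the cluster, a keyword that recurs in prominent positions on several pages outranks one that appears once in body text on a single page.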

4.

Multimedia indexing and retrieval: ever great challenges

Chabane Djeraba Moncef Gabbouj Patrick Bouthemy 《Multimedia Tools and Applications》2006,30(3):221-228

In this introduction, we present a brief state of the art of multimedia indexing and retrieval and highlight some notions explored in the special issue. We hope that the contributions of this special issue will present ingredients for further investigations in this ever challenging domain. The special issue is situated between old problems and new challenges, and contributes to understanding the next generation of multimedia indexing and retrieval. The contributions explore a wide range of fields such as signal processing, data mining and information retrieval.

5.

Image database indexing using JPEG coefficients

Sharlee Climer Sanjiv K. Bhatia 《Pattern recognition》2002,35(11):2479-2488

Image database indexing is used for efficient retrieval of images in response to a query expressed as an example image. The query image is processed to extract information that is matched against the index to provide pointers to similar images. We present a technique that facilitates content similarity-based retrieval of JPEG-compressed images without first having to uncompress them. The technique is based on an index developed from a subset of JPEG coefficients and a similarity measure to determine the difference between the query image and the images in the database. This method offers substantial efficiency as images are processed in compressed format, information that was derived during the original compression of the images is reused, and extensive early pruning is possible. Initial experiments with the index have provided encouraging results. The system outputs a set of ranked images in the database with respect to the query using the similarity measure, and can be limited to output a specified number of matched images by changing the threshold match.

6.

Audio indexing: primary components retrieval

Julien Pinquier Régine André-Obrecht 《Multimedia Tools and Applications》2006,30(3):313-330

This work addresses the soundtrack indexing of multimedia documents. Our purpose is to detect and locate sound unity to structure the audio dataflow in program broadcasts (reports). We present two audio classification tools that we have developed. The first one, a speech music classification tool, is based on three original features: entropy modulation, stationary segment duration (with a Forward–Backward Divergence algorithm) and number of segments. They are merged with the classical 4 Hz modulation energy. It is divided into two classifications (speech/non-speech and music/non-music) and provides more than 90% accuracy for speech detection and 89% for music detection. The other system, a jingle identification tool, uses a Euclidean distance in the spectral domain to index the audio data flow. Results show that it is efficient: among 132 jingles to recognize, we have detected 130. Systems are tested on TV and radio corpora (more than 10 h). They are simple, robust and can be improved on every corpus without training or adaptation.

7.

Searchable words on the Web

Hugh E. Williams Justin Zobel 《International Journal on Digital Libraries》2005,5(2):99-105

In designing data structures for text databases, it is valuable to know how many different words are likely to be encountered in a particular collection. For example, vocabulary accumulation is central to index construction for text database systems; it is useful to be able to estimate the space requirements and performance characteristics of the main-memory data structures used for this task. However, it is not clear how many distinct words will be found in a text collection or whether new words will continue to appear after inspecting large volumes of data. We propose practical definitions of a word and investigate new word occurrences under these models in a large text collection. We inspected around two billion word occurrences in 45 GB of World Wide Web documents and found just over 9.74 million different words in 5.5 million documents; overall, 1 word in 200 was new. We observe that new words continue to occur, even in very large datasets, and that choosing stricter definitions of what constitutes a word has only limited impact on the number of new words found.
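
Vocabulary accumulation of the kind measured in this abstract can be sketched with a set of seen words. The word definition used below (lowercased runs of ASCII letters) and the function name `new_word_rate` are assumptions for illustration; the paper proposes its own definitions, which are not reproduced here.

```python
import re

def new_word_rate(documents):
    """Return (distinct_words, total_occurrences, occurrences_per_new_word)."""
    vocabulary = set()
    total = 0
    for text in documents:
        # Assumed word definition: lowercase runs of ASCII letters.
        for word in re.findall(r"[a-z]+", text.lower()):
            total += 1
            vocabulary.add(word)
    per_new = total / len(vocabulary) if vocabulary else float("inf")
    return len(vocabulary), total, per_new

docs = ["the cat sat", "the cat ran", "a dog ran"]
distinct, total, per_new = new_word_rate(docs)
```

Applied to the figures quoted in the abstract, roughly two billion occurrences divided by 9.74 million distinct words gives about 205 occurrences per distinct word, which agrees with the reported rate of about 1 new word in 200.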

8.

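A Web image retrieval system platform

Zhang Huazhong (张华忠) 《Modern Computer》 (现代计算机) 2013,(12):64-67

The Web image retrieval system platform allows images to be indexed and retrieved through text, much like text-based image search in Google. The system is built on a currently popular J2EE stack, Struts2 + Spring + Hibernate, and is divided into three layers: a presentation layer, a business logic layer, and a persistence layer. The platform is intended to tie together and showcase related image research, to make future image research more convenient, and to serve as a learning exercise toward commercial image applications.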

9.

On indexing metric spaces using cut-regions

《Information Systems》2014

After two decades of research, the techniques for efficient similarity search in metric spaces have combined virtually all the available tricks, resulting in many structural index designs. As the representative state-of-the-art metric access methods (also called metric indexes) that vary in the usage of filtering rules and in structural designs, we could mention the M-tree, the M-Index and the List of Clusters, to name a few. In this paper, we present the concept of cut-regions that could heavily improve the performance of metric indexes that were originally designed to employ simple ball-regions. We show that the shape of cut-regions is far more compact than that of ball-regions, while preserving a simple and concise representation. We present three re-designed metric indexes originating from the above-mentioned ones but utilizing cut-regions instead of ball-regions. We show that cut-regions can be fully utilized in the index structure, positively affecting not only query processing but also the index construction. In the experiments we show that the re-designed metric indexes significantly outperform their original versions.
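
The difference between filtering with a ball-region and with a tighter region can be sketched as below. This is one plausible reading, not the paper's definition: it assumes a cut-region is a ball-region further restricted by distance rings around pivots, and the function names, the `rings` layout, and the use of Euclidean distance as the metric are all hypothetical.

```python
import math

def ball_may_contain(q, r, center, radius, d=math.dist):
    """A ball region can hold results only if the query ball overlaps it."""
    return d(q, center) <= radius + r

def cut_region_may_contain(q, r, center, radius, rings, d=math.dist):
    """Ball test plus ring tests; rings is a list of (pivot, dmin, dmax)."""
    if not ball_may_contain(q, r, center, radius, d):
        return False
    for pivot, dmin, dmax in rings:
        dq = d(q, pivot)
        # By the triangle inequality, any result o satisfies
        # dq - r <= d(o, pivot) <= dq + r, so a disjoint ring rules it out.
        if dq - r > dmax or dq + r < dmin:
            return False
    return True

center, radius = (0.0, 0.0), 5.0
rings = [((10.0, 0.0), 9.0, 11.0)]  # stored objects lie 9..11 from this pivot
query, r = (4.0, 0.0), 1.0
ball_ok = ball_may_contain(query, r, center, radius)
cut_ok = cut_region_may_contain(query, r, center, radius, rings)
```

In this toy case the ball test alone cannot exclude the region, while the additional ring test can, which illustrates why a more compact region shape prunes more of the index during a range query.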

10.

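Research on a method for extracting the main block of Web pages in online travel services

Bai He (白鹤) Zhao Zhiqiang (赵志强) Wang Jinlin (王劲林) 《Microcomputer Information》 (微计算机信息) 2010,(15)

Web information extraction is an important technology for online travel services. The main semantic block of a page concentrates most of its information, and extracting it correctly is the foundation of Web information extraction. Building on a summary of existing page segmentation schemes, this paper proposes an algorithm, combined with machine learning, for identifying the main semantic block node of a Web page, and then validates the positive result set with heuristic rules to locate the best main semantic block node. Experiments show that the proposed scheme achieves a fairly satisfactory accuracy.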

11.

Web image retrieval using majority-based ranking approach (cited 1 time: 0 self-citations, 1 by others)

Gunhan Park Yunju Baek Heung-Kyu Lee 《Multimedia Tools and Applications》2006,31(2):195-219

Web image retrieval has characteristics different from typical content-based image retrieval; web images have associated textual cues. However, a web image retrieval system often yields undesirable results, because it uses limited text information such as surrounding text, URLs, and image filenames. In this paper, we propose a new approach to retrieval, which uses the image content of retrieved results without relying on assistance from the user. Our basic hypothesis is that more popular images have a higher probability of being the ones that the user wishes to retrieve. According to this hypothesis, we propose a retrieval approach that is based on a majority of the images under consideration. We define four methods for finding the visual features of the majority of images: (1) majority-first method, (2) centroid-of-all method, (3) centroid-of-top K method, and (4) centroid-of-largest-cluster method. In addition, we implement a graph/picture classifier for improving the effectiveness of web image retrieval. We evaluate the retrieval effectiveness of both our methods and conventional ones by using precision and recall graphs. Experimental results show that the proposed methods are more effective than conventional keyword-based retrieval methods.
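
Of the four methods listed, the centroid-of-all idea is the simplest to sketch: re-rank the retrieved images by how close each one lies to the mean of all their feature vectors. The two-dimensional feature vectors, the Euclidean distance, and the function name `centroid_of_all_rank` below are assumptions for illustration; the abstract does not specify the visual features or the distance used.

```python
import math

def centroid_of_all_rank(features):
    """Rank image ids by distance to the centroid of all feature vectors."""
    vectors = list(features.values())
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # Images near the centroid are treated as representing the majority.
    return sorted(features, key=lambda name: math.dist(features[name], centroid))

# Hypothetical visual features of images returned by a keyword query.
features = {
    "img1": [1.0, 1.0],
    "img2": [1.0, 3.0],
    "img3": [2.0, 2.0],
    "outlier": [8.0, 8.0],
}
ranking = centroid_of_all_rank(features)
```

An image that looks unlike most of the result set ends up last, which matches the abstract's hypothesis that the visually dominant images are the likelier targets. A centroid over all results is itself pulled toward outliers, which may be one motivation for the top-K and largest-cluster variants.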

12.

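A Web community search engine system*

Liu Wuhua (刘务华) Luo Tiejian (罗铁坚) Wang Wenjie (王文杰) 《Application Research of Computers》 (计算机应用研究) 2007,24(2):275-278

Based on an analysis of how search resources in Web communities are dispersed, the architecture of a Web community search engine is designed using techniques such as a Web crawler, the vector space model, and relevance ranking, and a Web community search engine system, ChinalabSearch, is implemented. A performance evaluation shows that the system meets the search requirements of Web communities, makes finding information within a community more efficient, and facilitates cooperation between organizations.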
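
Relevance ranking with the vector space model can be sketched as TF-IDF vectors compared by cosine similarity. This is a generic textbook sketch, not the ChinalabSearch implementation: the whitespace tokenizer, the smoothed IDF form, and all function names are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed smoothed form
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Return document indices ordered from most to least relevant."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    q = {t: c * idf.get(t, 0.0) for t, c in counts.items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

docs = [
    "web community search engine",
    "cooking recipes for pasta",
    "community forum search tips and tricks",
]
order = rank("community search", docs)
```

Both matching documents contain the two query terms, but the shorter one ranks first because cosine similarity normalizes by vector length; the document sharing no terms with the query scores zero.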

13.

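Design of a unified retrieval system based on Web services (cited 3 times: 0 self-citations, 3 by others)

Sun Suyun (孙素云) 《Modern Computer》 (现代计算机) 2007,(4):79-81

By analyzing current unified retrieval methods for digital libraries and using Web Services technology, the traditional Mediator/Wrapper method for integrating heterogeneous data sources is improved, and a unified retrieval scheme based on Web Services is proposed. The scheme replaces virtual views with a Web service registration mechanism, adds Web service encapsulation and publishing functions to the wrappers, and builds a framework for transparent resource access, achieving unified retrieval over distributed, heterogeneous digital library resources.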

14.

Knowledge fusion framework based on Web page texts

Sikang HU Yuanda CAO 《Frontiers of Computer Science》2009,3(4):457

15.

Polyhedral object recognition by indexing (cited 1 time: 0 self-citations, 1 by others)

Radu Humberto 《Pattern recognition》1995,28(12):1855-1870

In computer vision, the indexing problem is the problem of recognizing a few objects in a large database of objects while avoiding the help of the classical image-feature-to-object-feature matching paradigm. In this paper we address the problem of recognizing three-dimensional (3-D) polyhedral objects from 2-D images by indexing. Both the objects to be recognized and the images are represented by weighted graphs. The indexing problem is therefore the problem of determining whether a graph extracted from the image is present or absent in a database of model graphs. We introduce a novel method for performing this graph indexing process which is based both on polynomial characterization of binary and weighted graphs and on hashing. We describe in detail this polynomial characterization and then we show how it can be used in the context of polyhedral object recognition. Next we describe a practical recognition-by-indexing system that includes the organization of the database, the representation of polyhedral objects in terms of 2-D characteristic views, the representation of these views in terms of weighted graphs and the associated image processing. Finally, some experimental results allow the evaluation of the system performance.

16.

Automated classification and localization of daily deal content from the Web

《Applied Soft Computing》2015

Websites offering daily deals have received widespread attention from end users. The objective of such Websites is to provide time limited discounts on goods and services in the hope of enticing more customers to purchase such goods or services. The success of daily deal Websites has given rise to meta-level daily deal aggregator services that collect daily deal information from across the Web. Due to some of the unique characteristics of daily deal Websites such as high update frequency, time sensitivity, and lack of coherent information representation, many deal aggregators rely on human intervention to identify and extract deal information. In this paper, we propose an approach where daily deal information is identified, classified and properly segmented and localized. Our approach is based on a semi-supervised method that uses sentence-level features of daily deal information on a given Web page. Our work offers (i) a set of computationally inexpensive discriminative features that are able to effectively distinguish Web pages that contain daily deal information; (ii) the construction and systematic evaluation of machine learning techniques based on these features to automatically classify daily deal Web pages; and (iii) the development of an accurate segmentation algorithm that is able to localize and extract individual deals from within a complex Web page. We have extensively evaluated our approach from different perspectives, the results of which show notable performance.

17.

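Web information retrieval at TREC 2002

Yang Zhifeng (杨志峰) Liu Yue (刘悦) Yang Zhe (杨哲) Wang Bin (王斌) Cheng Xueqi (程学旗) 《Computer Engineering and Applications》 (计算机工程与应用) 2003,39(26):37-39,80

The Text REtrieval Conference (TREC) is currently the most important international venue for academic exchange and system evaluation in information retrieval. The conference provides participants with standard data collections, evaluation questions, and reference answers, so that participants run and evaluate their systems against a common standard. The authors took part in the TREC Web information retrieval task on behalf of the Chinese Academy of Sciences. At TREC 2002, the authors identified content retrieval algorithms that perform well on different data collections and jointly considered how factors such as text content, anchor text, and document structure affect Web retrieval effectiveness, achieving good results. The method performed well on different tasks at two editions of the conference.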

18.

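Information system modeling and application based on the Semantic Web

Wang Xiaodong (王晓东) Gao Hongqing (高宏卿) 《Computer Engineering》 (计算机工程) 2005,31(21):209-211

This paper discusses a method for modeling information systems based on the Semantic Web and uses it to design an application system. The design of the system's knowledge base and core modules is discussed, and a retrieval algorithm is given. The work is significant for deeper application of the Web.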

19.

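A survey of research on Web page cleaning and deduplication (cited 1 time: 0 self-citations, 1 by others)

Luo Yuan (罗元) 《Modern Computer》 (现代计算机) 2013,(10)

With the rapid development of the Internet and the widespread use of search engines, Web page data has become one of the important data sources for many applications and studies. However, because of the particular nature of Web pages, not all the information they contain is needed by those applications, for example advertisements and navigation bars, and its presence adversely affects them. In addition, redundant pages with identical content often appear in Web retrieval results. Web page cleaning and deduplication are therefore basic problems in applying Web page data and are currently active research topics, so a summary of the two areas is needed to support further research. Starting from the necessity of Web page cleaning and deduplication, this paper defines and classifies them, outlines a variety of methods and frameworks, and summarizes them.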

20.

Approximate retrieval of high-dimensional data with L₁ metric by spatial indexing

Takeshi Shinohara Jiyuan An Hiroki Ishizaka 《New Generation Computing》2000,18(1):39-47

High-dimensional data, such as documents, digital images, and audio clips, can be considered as spatial objects, which induce a metric space where the metric can be used to measure dissimilarities between objects. We investigate a method for retrieving objects within some distance from a given object by utilizing a spatial indexing/access method R-tree, which usually assumes Euclidean metric. First, we prove that objects in discrete L₁ (or Manhattan distance) metric space can be embedded into vertices of a unit hyper-cube in Euclidean space when the square root of L₁ distance is used as the distance. To take full advantage of R-tree spatial indexing, we have to project objects into space of relatively lower dimension. We adopt FastMap by Faloutsos and Lin to reduce the dimension of object space. The range corresponding to a query (Q, h) for retrieving objects within distance h from an object Q is naturally considered as a hyper-sphere even after FastMap projection, which is an orthogonal projection in Euclidean space. However, it turns out that the query range is contracted into a smaller hyper-box than the hyper-sphere by applying FastMap to objects embedded in the above mentioned way. Finally, we give a brief summary of experiments in applying our method to Japanese chess boards.

Takeshi Shinohara, Dr.Sci.: He is a Professor in the Department of Artificial Intelligence at Kyushu Institute of Technology. He obtained his bachelor's degree in Mathematics from Kyoto University in 1980, and his Dr.Sci. from Kyushu University in 1986. His research interests are in Computational/Algorithmic Learning Theory, Information Retrieval, and Approximate Retrieval of Multimedia Data.

Hiroki Ishizaka, Dr.Sci.: He is an Associate Professor in the Department of Artificial Intelligence at Kyushu Institute of Technology. He obtained his bachelor's degree in Mathematics from Kyushu University in 1984, and his Dr.Sci. from Kyushu University in 1993. His research interests are in Computational/Algorithmic Learning Theory.
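
The embedding claim in the abstract can be checked on a small example. The abstract does not spell out the construction, so the unary coding below is an assumption: it is one standard way to map integer coordinates to hypercube vertices so that the squared Euclidean distance equals the L₁ distance. The function names and the bound `max_value` are hypothetical.

```python
import math

def unary_embed(point, max_value):
    """Map integer coordinates in 0..max_value to a 0/1 hypercube vertex."""
    bits = []
    for x in point:
        bits.extend([1] * x + [0] * (max_value - x))  # unary code of x
    return bits

def l1(a, b):
    """Manhattan (L1) distance between two integer points."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (3, 0, 2), (1, 4, 2)
ep, eq = unary_embed(p, 4), unary_embed(q, 4)
l1_distance = l1(p, q)
# Each unit of coordinate difference flips exactly one bit, so the squared
# Euclidean distance between the codes equals the L1 distance.
squared_euclidean = sum((x - y) ** 2 for x, y in zip(ep, eq))
```

Taking the square root of both sides gives the form stated in the abstract: the Euclidean distance between the embedded vertices equals the square root of the L₁ distance between the original points.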