20 similar documents found (search time: 15 ms)
1.
This paper discusses techniques for improving the performance of keyword-based web image queries. First, a web page is segmented into several text blocks based on semantic cohesion. The text blocks that contain web images are taken as the associated texts of the corresponding images, and a TF*IDF model is initially used to index those images. Second, for each keyword, a relevant web image set and an irrelevant web image set are selected according to their TF*IDF values, and the visual feature distributions of the positive and negative images are each modeled with a Gaussian Mixture Model. An image's relevance to the keyword with respect to visual features is then defined as the ratio of the positive distribution density to the negative distribution density. We combine the text-based relevance model with this visual feature relevance model to improve performance. Third, a query expansion model is used to improve performance further: expansion terms are selected according to their co-occurrences with the query terms in the top-relevant set of the original query. Our experiments show that our approach yields a significant improvement over the traditional keyword-based query model.
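The visual relevance step above (the density of the positive GMM divided by that of the negative GMM) can be illustrated with a minimal sketch. It assumes both mixtures have already been fitted, for example by EM, with diagonal covariances. The fusion rule `combined_relevance` is our own hypothetical choice: a weighted sum of the TF*IDF score and the log of the visual ratio, controlled by an assumed weight `alpha`. The abstract does not say how the two models are combined.

```python
import math
from typing import List, Sequence, Tuple

# One diagonal-covariance mixture component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def diag_gaussian_pdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Density at x of a Gaussian with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_pdf(x: Sequence[float], components: List[Component]) -> float:
    """Mixture density: weighted sum of the component densities."""
    return sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in components)


def visual_relevance(x, positive: List[Component], negative: List[Component], eps: float = 1e-12) -> float:
    """Ratio of positive-set density to negative-set density at feature vector x.

    eps keeps the ratio finite and strictly positive when a density underflows.
    """
    return (gmm_pdf(x, positive) + eps) / (gmm_pdf(x, negative) + eps)


def combined_relevance(tfidf_score: float, x, positive, negative, alpha: float = 0.5) -> float:
    """Hypothetical linear fusion of the text score and the log visual ratio."""
    return alpha * tfidf_score + (1.0 - alpha) * math.log(visual_relevance(x, positive, negative))
```

Taking the log puts the ratio on an additive scale, so images the negative model explains better than the positive one are pushed down rather than merely scaled.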
2.
Annotating images by mining image search results  Total citations: 3 (self-citations: 0; by others: 3)
Xin-Jing Wang Lei Zhang Xirong Li Wei-Ying Ma 《IEEE transactions on pattern analysis and machine intelligence》2008,30(11):1919-1932
3.
《Interacting with computers》2003,15(4):479-495
Small handheld devices, such as mobile phones and Pocket PCs, are increasingly being used to access the web. Search engines are the most-used web services and an important factor in user support. Search engine providers have begun to offer their services on the small screen. This paper presents a detailed evaluation of how easy such services are to use in these new contexts. An experiment was carried out to compare users' abilities to complete search tasks using a mobile-phone-sized, a handheld-computer-sized, and a conventional desktop interface to the full Google™ index. With all three interfaces, when users succeed in completing a task, they do so quickly (within 2–3 min) and with few interactions with the search engine. When they fail, though, they fail badly. The paper examines the causes of failure in small-screen searching and proposes guidelines for improving these interfaces. In addition, we present and discuss novel interaction schemes that put these guidelines into practice.
4.
Algorithms for clustering Web search results have to be efficient and robust. Furthermore, they must be able to cluster a data set without any a priori information, such as the required number of clusters. Clustering algorithms inspired by the behavior of real ants generally meet these requirements. In this article we propose a novel approach to ant-based clustering, based on fuzzy logic. We show that it improves on existing approaches and illustrate how our algorithm can be applied to the problem of Web search results clustering. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 455–474, 2007.
5.
Complex queries are widely used in current Web applications. They express highly specific information needs, but simply aggregating the meanings of primitive visual concepts does not handle them well. To facilitate image search for complex queries, we propose a new image reranking scheme based on concept relevance estimation, which consists of Concept-Query and Concept-Image probabilistic models. Each model comprises visual, web, and text relevance estimation. A new ranked list is obtained by taking a weighted sum of the underlying relevance scores. To account for the Web semantic context, we incorporate related concepts by leveraging lexical and corpus-dependent knowledge, such as WordNet and Wikipedia, together with co-occurrence statistics of tags in our Flickr corpus. The experimental results show that our scheme is significantly better than existing state-of-the-art approaches.
6.
E. Lefever T. Fayruzov M. De Cock 《Information Sciences》2010,180(17):3192-4672
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting because the latter require manually setting a similarity threshold or estimating the number of clusters in advance, whereas the fuzzy ant based clustering algorithm does not.
7.
This paper presents an interactive visualization system, named WebSearchViz, for visualizing Web search results and facilitating users' navigation and exploration. The metaphor in our model is the solar system, with its planets and asteroids revolving around the sun. The location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. In particular, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance between a query and the Web search results in the context of users' subjects of interest. By interacting with the visual space, users can observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concerns. Users' subjects of interest can be dynamically changed, redefined, added to, or deleted from the visual space.
8.
Web spam denotes the manipulation of web pages with the sole intent of raising their position in search engine rankings. Since a better position in the rankings directly and positively affects the number of visits to a site, attackers use different techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that provides undeserved advertising revenue to the page owners. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks install malware on the victims' machines. In this paper, we introduce an approach to detect web spam pages in the list of results returned by a search engine. In a first step, we determine the importance of different page features to the ranking in search engine results. Based on this information, we develop a classification technique that uses the important features to successfully distinguish spam sites from legitimate entries. By removing spam sites from the results, more slots become available for links that point to pages with useful content. Additionally, and more importantly, the threat posed by malicious web sites can be mitigated, reducing the risk of users being infected by malicious code that spreads via drive-by attacks.
9.
Ioannis Kitsos Kostas Magoutis Yannis Tzitzikas 《Distributed and Parallel Databases》2014,32(3):405-446
Although Web search engines index and provide access to huge amounts of documents, user queries typically return only a linear list of hits. While this is often satisfactory for focused search, it does not support exploration or deeper analysis of the results. One way to provide advanced exploration facilities that exploit the availability of structured (and semantic) data in Web search is to enrich the results with entity mining over their full contents. Such services give users an initial overview of the information space, allowing them to gradually restrict it until they locate the desired hits, even if those hits are low ranked. This is especially important in areas of professional search such as medical search and patent search. In this paper we consider a general scenario of providing such services as meta-services (that is, layered over systems that support keyword search) without a priori indexing of the underlying document collection(s). To make such services feasible for large amounts of data, we use the MapReduce distributed computation model on a cloud infrastructure (Amazon EC2). Specifically, we show how the required computational tasks can be factorized and expressed as MapReduce functions. A key contribution of our work is a thorough evaluation of platform configuration and tuning, an aspect that is often disregarded and inadequately addressed in prior work but is crucial for the efficient utilization of resources. Finally, we report experimental results on the speedup achieved in various settings.
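To make the idea of expressing entity mining as MapReduce functions concrete, here is a small in-process sketch of the map, shuffle and reduce phases over a set of search-result documents. It is not the authors' implementation. The hypothetical `GAZETTEER` dictionary stands in for a real entity recognizer, and the shuffle that a framework such as Hadoop would perform across machines is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical gazetteer standing in for a real named-entity recognizer.
GAZETTEER: Dict[str, str] = {"aspirin": "Drug", "ibuprofen": "Drug", "headache": "Disease"}

Key = Tuple[str, str]  # (entity category, entity name)


def map_document(doc_id: str, text: str) -> Iterator[Tuple[Key, str]]:
    """Map: emit ((category, entity), doc_id) once per entity found in one result document."""
    tokens = {tok.strip(".,;:!?()") for tok in text.lower().split()}
    for tok in sorted(tokens):
        category = GAZETTEER.get(tok)
        if category is not None:
            yield (category, tok), doc_id


def shuffle(pairs: Iterable[Tuple[Key, str]]) -> Dict[Key, List[str]]:
    """Shuffle: group intermediate values by key (done by the framework in a real cluster)."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_entity(key: Key, doc_ids: List[str]) -> Tuple[Key, List[str]]:
    """Reduce: collapse the list of documents mentioning one entity."""
    return key, sorted(set(doc_ids))


def mine_entities(results: Dict[str, str]) -> Dict[str, List[Tuple[str, int, List[str]]]]:
    """Build a category -> [(entity, #docs, doc ids)] overview, most frequent entities first."""
    mapped = chain.from_iterable(map_document(d, t) for d, t in results.items())
    facets: Dict[str, List[Tuple[str, int, List[str]]]] = defaultdict(list)
    for (category, entity), docs in (reduce_entity(k, v) for k, v in shuffle(mapped).items()):
        facets[category].append((entity, len(docs), docs))
    for entries in facets.values():
        entries.sort(key=lambda e: (-e[1], e[0]))
    return dict(facets)
```

Because each map call depends only on one document and each reduce call only on one key, the two phases can be spread across workers independently. This property is what makes the task fit the MapReduce model.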
10.
The problem of obtaining relevant results in web searching has been tackled with several approaches. Very effective techniques are used by the most popular search engines when no a priori knowledge of the user's needs besides the search keywords is available; in other settings, however, it is conceivable to design search methods that operate on a thematic database of web pages referring to a common body of knowledge or to specific sets of users. We have used these premises to design and develop a search method that deploys data mining and optimization techniques to provide a more significant and restricted set of pages as the final result of a user search. We adopt a vectorization method based on search context and user profile and apply clustering techniques whose results are then refined by a specially designed genetic algorithm. In this paper we describe the method, its implementation, and the algorithms applied, and we discuss some experiments that have been run on test sets of web pages.
11.
Yihong Gong 《Multimedia Systems》1999,7(6):449-457
In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using a seamless combination of two complementary approaches. On the one hand, we propose a new color-clustering method to better capture the color properties of the original images; on the other hand, anticipating that image regions extracted from the original images inevitably contain many errors, we make use of the available erroneous, ill-segmented image regions to accomplish object-region-based image retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval. A carefully designed experimental evaluation shows that the proposed image retrieval system surpasses the other methods under comparison not only in quantitative measures but also in image retrieval capabilities.
12.
Chunlei Yang Jinye Peng Xiaoyi Feng Jianping Fan 《Multimedia Tools and Applications》2014,70(2):661-688
Keyword-based image search engines are now very popular for accessing large numbers of Web images on the Internet. Most existing keyword-based image search engines may return large numbers of junk images (images irrelevant to the given query word), because text terms that are only loosely associated with the Web images are also used for image indexing. The objective of the proposed work is to effectively filter the junk images out of image search results. To this end, bilingual image search results for the same keyword-based query are integrated to identify clusters of junk images and clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Experiments with a large number of bilingual keyword-based queries (5,000 query words) were performed simultaneously on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and the experimental results show that integrating bilingual image search results can filter out junk images effectively.
13.
Research and Implementation of a Fast Clustering Method for Web Search Results  Total citations: 2 (self-citations: 0; by others: 2)
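To help Web users pick out the documents they need from the large number of document snippets returned by a search engine, this paper presents a fast clustering method for Web search results, based on a study and analysis of the clustering process. The method analyzes and simplifies every stage of the process, from building the index model and computing similarity to forming the clustering result. It computes the similarity between returned results from the information contained in three parts of each result: the title, the URL, and the document snippet. The results returned first are partially clustered by mapping them onto an undirected graph, and the remaining results are then assigned to the most similar clusters to form the final clustering. The method is simple to implement. Experiments show that it responds quickly, yields clusters of relatively high relevance, and uses little memory.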
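The two-stage procedure described above can be sketched as follows: graph-based clustering of the first results, then assignment of the remaining results to the nearest cluster. The abstract does not give the similarity formula, the edge rule, or how the "most similar" cluster is measured. The following are therefore our assumptions: per-field Jaccard similarity, the hypothetical weights in `WEIGHTS`, an edge whenever similarity reaches `threshold`, connected components as the initial clusters, and mean similarity for assignment.

```python
import re
from typing import Dict, List, Sequence, Set

FIELDS = ("title", "url", "snippet")
WEIGHTS = (0.4, 0.2, 0.4)  # hypothetical weights for title, URL and snippet


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens; URLs are split on '.', '/', etc."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Weighted per-field Jaccard similarity over title, URL and snippet."""
    return sum(w * jaccard(tokens(r1[f]), tokens(r2[f])) for w, f in zip(WEIGHTS, FIELDS))


def cluster_results(results: Sequence[Dict[str, str]], first_k: int, threshold: float) -> List[List[int]]:
    if first_k < 1:
        raise ValueError("first_k must be at least 1")
    n = min(first_k, len(results))
    # 1. Map the first n results onto an undirected graph: edge if similarity >= threshold.
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(results[i], results[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    # 2. Each connected component (found by depth-first search) is an initial cluster.
    clusters: List[List[int]] = []
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        clusters.append(sorted(component))
    # 3. Assign every remaining result to the cluster with the highest mean similarity.
    for idx in range(n, len(results)):
        best = max(
            range(len(clusters)),
            key=lambda c: sum(similarity(results[idx], results[m]) for m in clusters[c]) / len(clusters[c]),
        )
        clusters[best].append(idx)
    return clusters
```

Only the first `first_k` results pay the quadratic pairwise cost. Each later result is compared once per cluster member, which is consistent with the fast response the abstract reports.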
14.
15.
Presenting and browsing image search results play key roles in helping users find desired images among the search results. Most existing commercial image search engines present results as a ranked list. However, such a scheme suffers from at least two drawbacks: it is inconvenient for users to get an overview of the whole result set, and finding the desired images in the list is costly. In this paper, we introduce a novel search result summarization approach and use it to build an interactive browsing scheme. The main contributions of this paper are: (1) a dynamic absorbing random walk that finds diversified representatives for image search result summarization; (2) a locally scaled visual similarity measure between two images, obtained by inspecting the relation between each image and the other images; and (3) an interactive browsing scheme, based on a tree structure that organizes the images obtained from the summarization approach, which enables users to browse the image search results intuitively and conveniently. Quantitative experimental results and a user study demonstrate the effectiveness of the proposed summarization and browsing approaches.
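The following sketch shows how an absorbing random walk yields diversified representatives. It follows the GRASSHOPPER-style formulation (Zhu et al.), not the paper's own "dynamic" variant, whose details the abstract does not give. The first representative is the item with the highest stationary probability of a teleporting walk over the similarity graph. Each chosen item is then turned into an absorbing state, and the next pick is the item with the most expected visits before absorption. The damping value `lam=0.9` and the uniform prior are assumptions.

```python
import numpy as np


def absorbing_walk_representatives(sim, k: int, lam: float = 0.9, prior=None) -> list:
    """Pick k diverse, representative items with a GRASSHOPPER-style absorbing random walk.

    sim: symmetric, non-negative similarity matrix whose rows all have positive sums
    (e.g. self-similarity 1 on the diagonal).
    """
    W = np.asarray(sim, dtype=float)
    n = W.shape[0]
    r = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)                      # row-stochastic transition matrix
    P_tel = lam * P + (1.0 - lam) * np.outer(np.ones(n), r)   # walk with teleportation to the prior

    # First representative: highest stationary probability, pi = lam * pi P + (1 - lam) r.
    pi = np.linalg.solve((np.eye(n) - lam * P).T, (1.0 - lam) * r)
    selected = [int(np.argmax(pi))]

    while len(selected) < min(k, n):
        rest = [i for i in range(n) if i not in selected]
        # Selected items become absorbing states; Q is the walk restricted to the others.
        Q = P_tel[np.ix_(rest, rest)]
        N = np.linalg.inv(np.eye(len(rest)) - Q)       # fundamental matrix: expected visit counts
        expected_visits = N.sum(axis=0) / len(rest)     # averaged over a uniform start state
        selected.append(rest[int(np.argmax(expected_visits))])
    return selected
```

Items similar to an already selected representative lose visits, because walks through them are quickly absorbed. This steers later picks toward other regions of the result set.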
16.
《Information Systems》2005,30(4):299-316
We present a page clipping synthesis (PCS) search method that extracts relevant paragraphs from other web search results. The PCS search method applies a dynamically terminated genetic algorithm to generate a set of best-of-run page clippings in a controlled amount of time. These page clippings provide users with the information they are most interested in, saving them the time and trouble of browsing many hyperlinks. We show that the dynamically terminated genetic algorithm yields cost-effective solutions compared with those reached by conventional genetic algorithms. Effectiveness measurements also confirm that PCS performs better than general search engines.
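The following is a rough sketch of a "dynamically terminated" genetic algorithm that selects paragraphs for a page clipping. The abstract does not define the encoding, fitness, or stopping rule, so every one of these is an assumption. Each bit marks whether a paragraph is included. Fitness is the total of hypothetical per-paragraph relevance `scores` under a length `budget`, with over-long clippings repaired greedily. The run stops after `patience` generations without improvement or when `max_seconds` elapses, whichever comes first.

```python
import random
import time
from typing import List, Sequence, Tuple


def clip_length(bits: Sequence[int], lengths: Sequence[int]) -> int:
    return sum(l for b, l in zip(bits, lengths) if b)


def fitness(bits: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical fitness: total relevance score of the selected paragraphs."""
    return sum(s for b, s in zip(bits, scores) if b)


def repair(bits: List[int], scores, lengths, budget: int) -> List[int]:
    """Drop the selected paragraph with the lowest score per unit length until the clipping fits."""
    bits = list(bits)
    while clip_length(bits, lengths) > budget:
        chosen = [i for i, b in enumerate(bits) if b]
        worst = min(chosen, key=lambda i: scores[i] / lengths[i])
        bits[worst] = 0
    return bits


def clip_ga(scores, lengths, budget, pop_size=20, patience=10, max_seconds=1.0, seed=0) -> Tuple[List[int], float, int]:
    """GA over paragraph-selection bit strings with stagnation- and time-based termination."""
    rng = random.Random(seed)
    n = len(scores)
    pop = [repair([rng.randint(0, 1) for _ in range(n)], scores, lengths, budget) for _ in range(pop_size)]
    best = max(pop, key=lambda b: fitness(b, scores))

    def tournament() -> List[int]:
        # Binary tournament selection from the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a, scores) >= fitness(b, scores) else b

    stale, generations = 0, 0
    deadline = time.monotonic() + max_seconds
    while stale < patience and time.monotonic() < deadline:
        generations += 1
        children = [best]  # elitism: the best clipping so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - g if rng.random() < 1.0 / n else g for g in p1[:cut] + p2[cut:]]  # bit-flip mutation
            children.append(repair(child, scores, lengths, budget))
        pop = children
        champion = max(pop, key=lambda b: fitness(b, scores))
        if fitness(champion, scores) > fitness(best, scores):
            best, stale = champion, 0
        else:
            stale += 1
    return best, fitness(best, scores), generations
```

The stagnation counter lets easy instances finish early, while the deadline bounds the running time on hard ones. This is one plausible reading of "a controlled amount of time".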
17.
Concept-based near-duplicate video clip detection for novelty re-ranking of web video search results
State-of-the-art near-duplicate video clip (NDVC) detection for novelty re-ranking uses non-semantic low-level features (color/texture) to detect and eliminate “content-based NDVC”, thereby increasing content-level novelty in the top results. However, humans may also perceive a video as a near duplicate from a semantic perspective. In this paper, we propose a concept-based near-duplicate video clip (CBNDVC) detection technique for novelty re-ranking. We identify “semantic NDVC” using semantic features (events/concepts) and re-rank the top results to increase both content and semantic novelty. Videos are represented as multivariate time series of the confidence values of relevant concepts, and CBNDVC clusters are then discovered by conceptual clustering. The obtained results show higher precision and recall from the user's perspective.
18.
We present the design and implementation of a web mining system that creates a hierarchical clustering of web documents retrieved by commercial web search engines. The cluster hierarchy is produced by a novel method called the Cluster Hierarchy Construction Algorithm (CHCA) and it can be used to explore the topics of interest related to the search query and their relationships. We discuss important design issues for our system, including stemming and dimensionality reduction, as well as some implementation details. We show examples of system results, compare them with results from similar systems, and analyze the responses to a survey of the system's users. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 607–625, 2005.
19.
Giuseppe Fenza Sabrina Senatore 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2010,14(8):811-819
This work describes a system that supports users in discovering semantic web services, taking into account their personal requirements and preferences. The goal is to model an ad hoc service request by selecting conceptual terms rather than using strict syntax formats. Through a concept-based navigation mechanism, the user discovers the conceptual terminology associated with the web resources and uses it to generate an appropriate service request that syntactically matches the names of the input/output specifications. The approach exploits fuzzy formal concept analysis to model the concepts and their relationships elicited from web resources. After the request is formulated and submitted, the system returns the list of semantic web services that match the user query.
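The following sketch shows how formal concepts could be derived from a fuzzy context of web resources and terms. It makes a simplifying assumption that may differ from the paper's fuzzy FCA: the fuzzy context is first reduced to a crisp one with an alpha-cut, keeping a resource-term pair only when its membership degree is at least `alpha`. Classical concepts (extent, intent) are then enumerated. The service names, terms and degrees in the usage check are made up for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

FuzzyContext = Dict[str, Dict[str, float]]       # object -> {attribute: membership degree in [0, 1]}
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def alpha_cut(context: FuzzyContext, alpha: float) -> Dict[str, Set[str]]:
    """Keep an (object, attribute) pair only if its membership degree is at least alpha."""
    return {obj: {a for a, mu in attrs.items() if mu >= alpha} for obj, attrs in context.items()}


def formal_concepts(crisp: Dict[str, Set[str]]) -> Set[Concept]:
    """Enumerate all formal concepts by closing every attribute subset (exponential; small contexts only)."""
    all_attrs = set().union(*crisp.values())
    concepts: Set[Concept] = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            # Extent: objects that have every attribute in the subset.
            extent = frozenset(o for o, attrs in crisp.items() if set(subset) <= attrs)
            # Intent of the extent: attributes shared by all its objects (all attributes if empty).
            intent = frozenset(all_attrs.intersection(*(crisp[o] for o in extent)))
            concepts.add((extent, intent))
    return concepts
```

Each concept groups the services (extent) that share a set of terms (intent). Navigating from general to specific concepts corresponds to the concept-based navigation the abstract describes for building a request.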
20.
Eyas El-Qawasmeh 《Digital Creativity》2013,24(4):212-224
The storage and retrieval of multimedia has become a requirement for many information systems. This paper presents a comprehensive survey of image search engines, with many clarifying comments. First, we look at image search engine architecture, followed by the role of the crawler in detecting images. We then review the common World Wide Web based systems for image retrieval developed in research institutions and in commercial businesses. A comparative performance study of the existing engines is also presented.