1.
This paper discusses techniques for improving the performance of keyword-based web image queries. First, a web page is segmented into several text blocks based on semantic cohesion. The text blocks that contain web images are taken as the associated texts of the corresponding images, and a TF*IDF model is initially used to index those web images. Second, for each keyword, a relevant web image set and an irrelevant web image set are selected according to their TF*IDF values, and the visual feature distributions of the positive and negative images are modeled with Gaussian Mixture Models. An image's relevance to the keyword with respect to visual features is then defined as the ratio of the positive distribution density to the negative distribution density; we combine this visual relevance model with the text-based relevance model to improve performance. Third, a query expansion model is used to improve performance further: expansion terms are selected according to their co-occurrences with the query terms in the top-ranked results of the original query. Our experiments show that this approach yields significant improvement over the traditional keyword-based query model.
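A minimal sketch of the density-ratio relevance score described in this abstract, assuming image features are given as NumPy arrays; the feature dimensionality, component counts, and the mixing weight `alpha` are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_keyword_models(pos_features, neg_features, n_components=5):
    """Fit GMMs on visual features of relevant (positive) and
    irrelevant (negative) images selected by their TF*IDF values."""
    gmm_pos = GaussianMixture(n_components=n_components).fit(pos_features)
    gmm_neg = GaussianMixture(n_components=n_components).fit(neg_features)
    return gmm_pos, gmm_neg

def visual_relevance(gmm_pos, gmm_neg, features):
    """Density ratio p_pos(x) / p_neg(x); score_samples returns log-density."""
    log_ratio = gmm_pos.score_samples(features) - gmm_neg.score_samples(features)
    return np.exp(log_ratio)

def combined_relevance(text_score, vis_score, alpha=0.5):
    """Combine text-based (TF*IDF) and visual relevance; alpha is a
    hypothetical mixing weight."""
    return alpha * text_score + (1 - alpha) * vis_score
```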
3.
Complex queries are widely used in current Web applications. They express highly specific information needs, but simply aggregating the meanings of primitive visual concepts does not perform well. To facilitate image search for complex queries, we propose a new image reranking scheme based on concept relevance estimation, which consists of Concept-Query and Concept-Image probabilistic models. Each model comprises visual, web, and text relevance estimation. Our work computes a weighted sum of the underlying relevance scores to obtain a new ranking list. To account for Web semantic context, we incorporate concepts by leveraging lexical and corpus-dependent knowledge, such as WordNet and Wikipedia, together with co-occurrence statistics of tags in our Flickr corpus. The experimental results show that our scheme significantly outperforms existing state-of-the-art approaches.
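A hedged sketch of the weighted-sum reranking step only; the weight values and the flat per-image score dictionary are illustrative assumptions, not the paper's model structure.

```python
def rerank(images, weights):
    """images: list of dicts with per-source relevance scores, e.g.
    {'id': 'img1', 'visual': 0.7, 'web': 0.4, 'text': 0.9}.
    Returns the images sorted by their weighted-sum relevance."""
    def score(img):
        return sum(weights[k] * img[k] for k in weights)
    return sorted(images, key=score, reverse=True)

# Example (hypothetical weights):
# reranked = rerank(results, weights={'visual': 0.5, 'web': 0.2, 'text': 0.3})
```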
4.
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manually setting a similarity threshold or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not.
5.
This paper presents an interactive visualization system, named WebSearchViz, for visualizing Web search results and facilitating users' navigation and exploration. The metaphor in our model is the solar system, with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. In particular, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance between a query and the Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concerns. Users' subjects of interest can be dynamically changed, redefined, added to, or deleted from the visual space.
6.
Web spam denotes the manipulation of web pages with the sole intent to raise their position in search engine rankings. Since a better position in the rankings directly and positively affects the number of visits to a site, attackers use different techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that provide undeserved advertisement revenues to the page owners. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks then install malware on the victims' machines. In this paper, we introduce an approach to detect web spam pages in the list of results that are returned by a search engine. In a first step, we determine the importance of different page features to the ranking in search engine results. Based on this information, we develop a classification technique that uses important features to successfully distinguish spam sites from legitimate entries. By removing spam sites from the results, more slots are available to links that point to pages with useful content. Additionally, and more importantly, the threat posed by malicious web sites can be mitigated, reducing the risk for users to get infected by malicious code that spreads via drive-by attacks.
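A minimal sketch of the two-step idea in this abstract: first estimate the importance of page features, then train a classifier on the most important ones to separate spam from legitimate results. The random-forest choice, feature matrix layout, and label convention are illustrative assumptions, not the paper's exact models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_important_features(X, y, top_k=10):
    """Rank features by importance and keep the indices of the top_k."""
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1][:top_k]

def train_spam_classifier(X, y, top_k=10):
    """Train on the reduced feature set; returns the classifier and feature indices."""
    idx = select_important_features(X, y, top_k)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, idx], y)
    return clf, idx

def filter_results(clf, idx, X_results):
    """Keep only result rows predicted as legitimate (label 0 = legitimate)."""
    keep = clf.predict(X_results[:, idx]) == 0
    return np.where(keep)[0]
```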
7.
Although Web search engines index and provide access to huge amounts of documents, user queries typically return only a linear list of hits. While this is often satisfactory for focalized search, it does not support exploration or deeper analysis of the results. One way to achieve advanced exploration facilities, exploiting the availability of structured (and semantic) data in Web search, is to enrich it with entity mining over the full contents of the search results. Such services provide users with an initial overview of the information space, allowing them to gradually restrict it until they locate the desired hits, even if these are ranked low. This is especially important in areas of professional search such as medical search, patent search, etc. In this paper we consider a general scenario of providing such services as meta-services (that is, layered over systems that support keyword search) without a priori indexing of the underlying document collection(s). To make such services feasible for large amounts of data, we use the MapReduce distributed computation model on a Cloud infrastructure (Amazon EC2). Specifically, we show how the required computational tasks can be factorized and expressed as MapReduce functions. A key contribution of our work is a thorough evaluation of platform configuration and tuning, an aspect that is often disregarded and inadequately addressed in prior work, but crucial for the efficient utilization of resources. Finally, we report experimental results on the achieved speedup in various settings.
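A toy sketch of how entity mining over search-result contents can be factorized into MapReduce-style map and reduce functions, as the abstract describes. The capitalized-word heuristic used as an "entity extractor" is an assumption for illustration; a real deployment would plug in a proper named-entity recognizer and run on a Hadoop/EC2 cluster rather than this local simulation.

```python
from collections import defaultdict
import re

def map_document(doc_id, text):
    """Map phase: emit (entity, 1) pairs for every mined entity in a document."""
    for entity in re.findall(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b", text):
        yield entity, 1

def reduce_entity(entity, counts):
    """Reduce phase: aggregate occurrence counts per entity."""
    return entity, sum(counts)

def run_local(docs):
    """Local stand-in for the shuffle between the map and reduce phases."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for entity, one in map_document(doc_id, text):
            grouped[entity].append(one)
    return dict(reduce_entity(e, c) for e, c in grouped.items())

# Example: run_local({"d1": "Patent search on Amazon clouds", "d2": "Amazon hosts data"})
```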
8.
The problem of obtaining relevant results in web searching has been tackled with several approaches. Although very effective techniques are currently used by the most popular search engines when no a priori knowledge of the user's desires besides the search keywords is available, in different settings it is conceivable to design search methods that operate on a thematic database of web pages that refer to a common body of knowledge or to specific sets of users. We have considered such premises to design and develop a search method that deploys data mining and optimization techniques to provide a more significant and restricted set of pages as the final result of a user search. We adopt a vectorization method based on search context and user profile, and apply clustering techniques that are then refined by a specially designed genetic algorithm. In this paper we describe the method, its implementation, and the algorithms applied, and discuss some experiments that have been run on test sets of web pages.
9.
In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using a seamless combination of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture the color properties of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain many errors, we make use of the available erroneous, ill-segmented image regions to accomplish object-region-based image retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval. The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses other methods under comparison in terms of not only quantitative measures, but also image retrieval capabilities.
10.
Keyword-based image search engines are now very popular for accessing large amounts of Web images on the Internet. Most existing keyword-based image search engines may return large amounts of junk images (which are irrelevant to the given query word), because text terms that are only loosely associated with the Web images are also used for image indexing. The objective of the proposed work is to effectively filter out the junk images from image search results. To this end, bilingual image search results for the same keyword-based query are integrated to identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Experiments for a large number of bilingual keyword-based queries (5,000 query words) were performed simultaneously on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and our experimental results show that integrating bilingual image search results can filter out junk images effectively.
12.
To help Web users filter the documents they need out of the large number of document snippets returned by search engines, a fast clustering method for Web search results is presented, based on an analysis of the clustering process. Each stage of that process is analyzed and simplified, from building the index model and computing similarities to forming the final clusters. The similarity between returned results is computed from the information contained in three parts of each result: its title, its URL, and its document snippet. The first batch of returned results is partially clustered using an undirected-graph mapping method, and the remaining results are then assigned to their most similar clusters to form the final clustering. The method is simple to implement, and experiments show that it responds quickly, produces clusters of high relevance, and uses little memory.
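A hedged sketch of the similarity and assignment steps suggested by this abstract: each result is compared to another using its title, URL, and snippet, and remaining results are attached to their most similar cluster. The TF-IDF/cosine choice and the field weights are assumptions for illustration; the paper only states that all three parts contribute.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pairwise_similarity(results, weights=(0.5, 0.2, 0.3)):
    """results: list of dicts with 'title', 'url', and 'snippet' keys.
    Returns a combined result-to-result similarity matrix."""
    sims = []
    for field, w in zip(("title", "url", "snippet"), weights):
        texts = [r[field] for r in results]
        tfidf = TfidfVectorizer().fit_transform(texts)
        sims.append(w * cosine_similarity(tfidf))
    return sum(sims)

def assign_to_clusters(sim_matrix, cluster_members, remaining):
    """Assign each remaining result to the cluster whose members it is most
    similar to on average (the final assignment step described above)."""
    assignment = {}
    for r in remaining:
        best = max(cluster_members,
                   key=lambda c: sim_matrix[r, cluster_members[c]].mean())
        assignment[r] = best
    return assignment
```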
13.
Presenting and browsing image search results play key roles in helping users find desired images in search results. Most existing commercial image search engines present them as a ranked list. However, such a scheme suffers from at least two drawbacks: it is inconvenient for users to get an overview of the whole result, and finding desired images in the list is computationally costly for them. In this paper, we introduce a novel search result summarization approach and exploit it to further propose an interactive browsing scheme. The main contributions of this paper include: (1) a dynamic absorbing random walk that finds diversified representatives for image search result summarization; (2) a locally scaled visual similarity evaluation scheme between two images that inspects the relation between each image and the other images; and (3) an interactive browsing scheme, based on a tree structure for organizing the images obtained from the summarization approach, that enables users to intuitively and conveniently browse the image search results. Quantitative experimental results and a user study demonstrate the effectiveness of the proposed summarization and browsing approaches.
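A minimal sketch of selecting diversified representatives with an absorbing random walk, in the spirit of the summarization step above: items already selected become absorbing states, and the next representative is the item that the remaining items visit most, in expectation, before absorption. The row-normalized similarity transition matrix, the seeding choice, and the assumption of strictly positive similarities are illustrative, not the paper's "dynamic" formulation.

```python
import numpy as np

def select_representatives(sim, k):
    """sim: (n, n) positive similarity matrix; returns k representative indices."""
    n = sim.shape[0]
    P = sim / sim.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    selected = [int(np.argmax(sim.sum(axis=1)))]  # seed with the most central item
    while len(selected) < k:
        rest = [i for i in range(n) if i not in selected]
        Q = P[np.ix_(rest, rest)]                 # walk restricted to non-absorbing items
        N = np.linalg.inv(np.eye(len(rest)) - Q)  # expected visits before absorption
        scores = N.mean(axis=0)                   # average expected visits to each candidate
        selected.append(rest[int(np.argmax(scores))])
    return selected
```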
14.
We present a page clipping synthesis (PCS) search method that extracts relevant paragraphs from other web search results. The PCS search method applies a dynamically terminated genetic algorithm to generate a set of best-of-run page clippings in a controlled amount of time. These page clippings provide users with the information they are most interested in and therefore save them the time and trouble of browsing large numbers of hyperlinks. We demonstrate that the dynamically terminated genetic algorithm yields cost-effective solutions compared with those reached by conventional genetic algorithms. Meanwhile, effectiveness measures confirm that PCS performs better than general-purpose search engines.
15.
This correspondence describes an approach to reducing the computational cost of document image decoding by viewing it as a heuristic search problem. The kernel of the approach is a modified dynamic programming (DP) algorithm, called the iterated complete path (ICP) algorithm, that is intended for use with separable source models. A set of heuristic functions is presented for decoding formatted text with ICP. Speedups of 3-25 over DP have been observed when decoding text columns and telephone yellow pages using ICP and the proposed heuristics.
16.
The storage and retrieval of multimedia have become a requirement for many information systems. This paper presents a comprehensive survey of image search engines, with many clarifying comments. First, we look at image search engine architecture, followed by the role of the crawler in detecting images. We then review common Web-based systems for image retrieval developed in research institutions and in commercial businesses. A comparative performance study of the existing engines is also presented.
18.
This work describes a system for supporting the user in the discovery of semantic web services, taking into account personal requirements and preferences. The goal is to model an ad-hoc service request by selecting conceptual terms rather than using strict syntax formats. Through a concept-based navigation mechanism, the user discovers the conceptual terminology associated with the web resources and uses it to generate an appropriate service request that syntactically matches the names of the input/output specifications. The approach exploits fuzzy formal concept analysis to model concepts and the relationships among them elicited from web resources. After the request is formulated and submitted, the system returns the list of semantic web services that match the user query.
19.
To address the currently low efficiency and accuracy of Web clustering, this paper proposes CWPBLT (clustering web pages based on their links and tags), a method that compares the similarity between pages by analyzing the link structure and important tag information within Web pages and then clusters the Web pages of a Web site accordingly. The clustering process takes into account both the structure of the pages and the content information provided by the page tags. Experimental results show that the method effectively improves both the time efficiency and the accuracy of clustering, improving on previous methods that cluster only by page topic content or only by page structure.
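A hedged sketch of a page-to-page similarity in the spirit of this abstract: combine the overlap of outgoing links (structure) with the similarity of text taken from important tags such as <title> and <h1> (content). The Jaccard/cosine measures and the equal weighting are assumptions for illustration, not the CWPBLT definition.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def link_similarity(links_a, links_b):
    """Jaccard overlap of the two pages' outgoing-link sets."""
    a, b = set(links_a), set(links_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def tag_similarity(tag_text_a, tag_text_b):
    """Cosine similarity of the text collected from important tags."""
    tfidf = TfidfVectorizer().fit_transform([tag_text_a, tag_text_b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def page_similarity(page_a, page_b, w_link=0.5, w_tag=0.5):
    """page_*: dicts with 'links' (list of URLs) and 'tags' (concatenated tag text)."""
    return (w_link * link_similarity(page_a["links"], page_b["links"])
            + w_tag * tag_similarity(page_a["tags"], page_b["tags"]))
```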
20.
With the rapid increase of multimedia content, efficient forensic investigation methods for multimedia files are required. For multimedia files, similarity means that identical media (audio and video) data exist across files. This paper proposes an efficient multimedia file forensics system based on file similarity search of video contents. The proposed system relies on two key techniques. The first is a media-aware information detection technique: the critical first step for similarity search is to find the meaningful keyframes or key sequences in the shots of a multimedia file, in order to recognize altered files derived from the same source file. The second is a video fingerprint-based (VFB) technique for file similarity search. Byte-for-byte comparison is an inefficient similarity search method for large files such as multimedia, whereas the VFB technique efficiently extracts video features from large multimedia files. It also provides an independent, media-aware identification method for detecting alterations to the source video file (e.g., changes in frame rate, resolution, or format). In this paper, we focus on two key challenges: generating robust video fingerprints by finding meaningful boundaries of a multimedia file, and measuring video similarity by fingerprint-based matching. Our evaluation shows that the proposed system can be applied to realistic multimedia file forensics tools.
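A sketch of the fingerprint-based matching step only: given per-keyframe fingerprints represented as fixed-length bit vectors, video similarity is the fraction of fingerprints in one file that have a close match (small Hamming distance) in the other. The bit-vector representation and the distance threshold are assumptions; the paper's VFB technique defines its own features.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two 0/1 fingerprint vectors."""
    return int(np.count_nonzero(a != b))

def video_similarity(fps_a, fps_b, max_dist=8):
    """fps_*: arrays of shape (n_keyframes, n_bits) with 0/1 entries.
    Returns the fraction of fps_a fingerprints matched somewhere in fps_b."""
    matched = sum(
        1 for fa in fps_a
        if min(hamming(fa, fb) for fb in fps_b) <= max_dist
    )
    return matched / len(fps_a)
```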