
1.

A dynamic attribute-based data filtering and recovery scheme for web information processing (total citations: 2; self-citations: 2; citations by others: 0)

Amit Ahuja, Yiu-Kai Ng 《Knowledge and Information Systems》2009,18(3):263-291

Transmitting excessive amounts of Web data over a network channel on the Internet causes data processing problems, including the need to choose selectively which useful information to retain for various data applications. In this paper, we present an approach for filtering less-informative attribute data from a source Website. A scheme for filtering attributes, instead of tuples (records), from a Website becomes imperative, since filtering a complete tuple would remove informative, as well as less-informative, attribute data in the tuple. Since filtered data at the source Website may be of interest to the user at the destination Website, we design a data recovery approach that maintains the minimal amount of information for data recovery purposes while imposing minimal overhead for data recovery at the source Website. Our data filtering and recovery approach (1) handles a wide range of Web data in different application domains (such as weather, stock exchanges, Internet traffic, etc.), (2) is dynamic in nature, since each filtering scheme adjusts the amount of data to be filtered as needed, and (3) is adaptive, which is appealing in an ever-changing Internet environment.

2.

Performing Binary-Categorization on Multiple-Record Web Documents Using Information Retrieval Models and Application Ontologies

Linus W. Kwong, Yiu-Kai Ng 《World Wide Web》2003,6(3):281-303

To retrieve Web documents of interest, most Web users rely on Web search engines. All existing search engines provide a query facility for users to search for the desired documents using search-engine keywords. However, when a search engine retrieves a long list of Web documents, the user might need to browse through each retrieved document in order to determine which document is of interest. We observe that there are two kinds of problems involved in the retrieval of Web documents: (1) an inappropriate selection of keywords specified by the user; and (2) poor precision in the retrieved Web documents. To solve these problems, we propose an automatic binary-categorization method that is applicable for recognizing multiple-record Web documents of interest, which appear often in advertisement Web pages. Our categorization method uses application ontologies and is based on two information retrieval models, the Vector Space Model (VSM) and the Clustering Model (CM). We analyze and cull Web documents to just those applicable to a particular application ontology. The culling analysis (i) uses CM to find a virtual centroid for the records in a Web document, (ii) computes a vector in a multi-dimensional space for this centroid, and (iii) compares the vector with the predefined ontology vector of the same multi-dimensional space using VSM, in which we consider the magnitudes of the vectors, as well as the angle between them. Our experimental results show that we have achieved an average of 90% recall and 97% precision in recognizing Web documents belonging to the same category (i.e., domain of interest). Thus our categorization discards very few documents it should have kept and keeps very few it should have discarded.
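
The three culling steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain arithmetic-mean centroid, and the two thresholds (`min_cosine`, `min_magnitude_ratio`) are assumptions made here for illustration, and the abstract does not say how the magnitude and angle comparisons are actually combined.

```python
import math

def centroid(record_vectors):
    """Step (i)/(ii): average the per-record term vectors into one centroid vector."""
    n = len(record_vectors)
    dims = len(record_vectors[0])
    return [sum(v[i] for v in record_vectors) / n for i in range(dims)]

def norm(v):
    """Euclidean magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def is_relevant(record_vectors, ontology_vector,
                min_cosine=0.8, min_magnitude_ratio=0.5):
    """Step (iii): compare the centroid with the ontology vector.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    c = centroid(record_vectors)
    magnitude_ratio = norm(c) / norm(ontology_vector)
    return (cosine(c, ontology_vector) >= min_cosine
            and magnitude_ratio >= min_magnitude_ratio)

# Hypothetical 3-term ontology vector and two small documents.
ontology = [3, 2, 1]
on_topic = [[2, 1, 0], [4, 3, 0]]   # two records sharing the ontology's terms
off_topic = [[0, 0, 5]]             # one record dominated by a minor term

print(is_relevant(on_topic, ontology))
print(is_relevant(off_topic, ontology))
```

Checking the magnitude as well as the angle keeps a document from passing merely because its few matching terms point in the right direction.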

3.

Answering form-based web queries using the data-mining approach

Xiaochun Yang, Yiu-Kai Ng 《Journal of Intelligent Information Systems》2008,30(1):1-32

Web users often post queries through form-based interfaces on the Web to retrieve data from the Web; however, answers to these queries are mostly computed according to keywords entered into different fields specified in a query interface, and their precision and recall could be low. The precision and recall ratios in answering this type of query can be improved by considering closely related previous queries submitted through the same interface, along with their answers. In this paper, we present an approach for enhancing the retrieval of relevant answers to a form-based Web query by adopting the data-mining approach using previous, relevant queries and their answers. Experimental results on a randomly selected set of 3,800 documents retrieved from various Web sites show that our data-mining, query-rewriting approach achieves average precision and true positive ratios on rewritten queries in the upper 80% range, whereas the average false positive ratio is less than 2.0%. Work partially done during a visit to BYU and partially supported by National Natural Science Foundation of China No. 60503036 and Fok YingTong Education Foundation No. 104027.
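
For readers unfamiliar with the ratios quoted in this abstract, the sketch below computes them using their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def retrieval_ratios(tp, fp, fn, tn):
    """Standard ratios from confusion counts.

    tp: relevant answers retrieved      fp: irrelevant answers retrieved
    fn: relevant answers missed         tn: irrelevant answers correctly left out
    """
    return {
        "precision": tp / (tp + fp),            # retrieved answers that are relevant
        "true_positive_ratio": tp / (tp + fn),  # relevant answers retrieved (recall)
        "false_positive_ratio": fp / (fp + tn), # irrelevant answers wrongly retrieved
    }

# Made-up counts for illustration only.
ratios = retrieval_ratios(tp=85, fp=15, fn=15, tn=885)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

A low false positive ratio can coexist with moderate precision when irrelevant documents greatly outnumber relevant ones, which is why the abstract reports both.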

4.

A Hybrid Fragmentation Approach for Distributed Deductive Database Systems

Seung-Jin Lim, Yiu-Kai Ng 《Knowledge and Information Systems》2001,3(2):198-224

Fragmentation of base relations in distributed database management systems increases the level of concurrency and therefore system throughput for query processing. Algorithms for horizontal and vertical fragmentation of relations in relational, object-oriented and deductive databases exist; however, hybrid fragmentation techniques based on variable bindings appearing in user queries and query-access-rule dependency are lacking for deductive database systems. In this paper, we propose a hybrid fragmentation approach for distributed deductive database systems. Our approach first considers the horizontal partition of base relations according to the bindings imposed on user queries, and then generates vertical fragments of the horizontally partitioned relations and clusters rules using affinity of attributes and access frequency of queries and rules. The proposed fragmentation technique facilitates the design of distributed deductive database systems. Received 4 August 1999 / Revised 30 March 2000 / Accepted in revised form 6 October 2000

5.

Exploiting the wisdom of social connections to make personalized recommendations on scholarly articles

Maria Soledad Pera, Yiu-Kai Ng 《Journal of Intelligent Information Systems》2014,42(3):371-391

Existing scholarly publication recommenders were designed to aid researchers, as well as ordinary users, in discovering pertinent literature in diverse academic fields. These recommenders, however, often (i) depend on the availability of users’ historical data in the form of ratings or access patterns, (ii) generate recommendations pertaining to users’ (articles included in their) profiles, as opposed to their current research interests, or (iii) fail to analyze valuable user-generated data at social sites that can enhance their performance. To address these design issues, we propose PReSA, a personalized recommender on scholarly articles. PReSA recommends articles bookmarked by the connections of a user U on a social bookmarking site that are not only similar in content to a target publication P currently of interest to U but are also popular among U’s connections. PReSA (i) relies on the content-similarity measure to identify potential academic publications to be recommended and (ii) uses only information readily available on popular social bookmarking sites to make recommendations. Empirical studies conducted using data from CiteULike have verified the efficiency and effectiveness of (the recommendation and ranking strategies of) PReSA, which outperforms a number of existing (scholarly publication) recommenders.
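
The idea of ranking connections' bookmarks by both content similarity and popularity can be sketched as follows. This is an illustration under stated assumptions, not PReSA itself: Jaccard overlap of term sets stands in for the paper's content-similarity measure, the two signals are combined by a simple product, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Stand-in content similarity: overlap of two term sets (an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target_terms, bookmarks_by_connection, article_terms, top_k=3):
    """Rank articles bookmarked by a user's connections.

    score = similarity to the target publication * fraction of connections
            who bookmarked the article (the product is a hypothetical choice).
    """
    n_connections = len(bookmarks_by_connection)
    counts = {}
    for articles in bookmarks_by_connection.values():
        for article in set(articles):
            counts[article] = counts.get(article, 0) + 1

    scored = []
    for article, count in counts.items():
        similarity = jaccard(target_terms, article_terms[article])
        scored.append((similarity * count / n_connections, article))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article for score, article in scored[:top_k] if score > 0]

# Hypothetical data: target publication P and three connections of user U.
target = ["web", "search", "ranking"]
article_terms = {
    "A": ["web", "search", "ranking", "query"],
    "B": ["web", "search"],
    "C": ["protein", "folding"],
}
bookmarks = {"u1": ["A", "B"], "u2": ["B"], "u3": ["B", "C"]}

print(recommend(target, bookmarks, article_terms))  # ['B', 'A']
```

Article B outranks A here even though A is slightly more similar to the target, because every connection bookmarked B.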

6.

Selective-Splitting and Cache-Maintenance Algorithms for Associative-Client Caches

Jiaxin J. Gao, Dallan Quass, Yiu-Kai Ng 《Distributed and Parallel Databases》2004,16(1):5-43

We propose a number of selective-splitting and cache-maintenance algorithms to reduce the computational complexity of associative-client caches and network load. Our selective-splitting algorithms selectively split query-intersected semantic regions based on the relative region access-latency or relative region size in a semantic data caching and replacement model. Our cache-maintenance algorithms are set up for studying a variety of design issues in synchronizing associative-client caches. We analyzed the performance of our proposed algorithms in a network environment. Results from our study show that the selective-splitting algorithms reduce the number of splitting operations by 80% in most cases, and the avoidance-based maintenance algorithms outperform the detection-based maintenance algorithms not only in reducing the network traffic but also in rendering consistent performance under various experimental variances.

7.

Using maximal spanning trees and word similarity to generate hierarchical clusters of non-redundant RSS news articles

Maria Soledad Pera, Yiu-Kai Dennis Ng 《Journal of Intelligent Information Systems》2012,39(2):513-534

RSS news articles that are either partially or completely duplicated in content are easily found on the Internet these days, which requires Web users to sort through the articles to identify non-redundant information. This manual-filtering process is time-consuming and tedious. In this paper, we present a new filtering and clustering approach, called FICUS, which starts with identifying and eliminating redundant RSS news articles using a fuzzy set information retrieval approach and then clusters the remaining non-redundant RSS news articles according to their degrees of resemblance. FICUS uses a tree hierarchy to organize clusters of RSS news articles. The contents of the respective clusters are captured by the representative keywords from RSS news articles in the clusters so that searching and retrieval of similar RSS news articles is fast and efficient. FICUS is simple, since it uses the pre-defined word-correlation factors to determine related (words in) RSS news articles and filter redundant ones, and is supported by well-known and yet simple mathematical models, such as the standard deviation, vector space model, and probability theory, to generate clusters of non-redundant RSS news articles. Experiments performed on (test sets of) RSS news articles on various topics, which were downloaded from different online sources, verify the accuracy of FICUS in eliminating redundant RSS news articles, clustering similar RSS news articles together, and segregating different RSS news articles in terms of their contents. In addition, further empirical studies show that FICUS outperforms well-known approaches adopted for clustering RSS news articles.
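
The title of this entry refers to maximal spanning trees over article similarities. The sketch below shows the generic technique only (Kruskal's algorithm run on edges in descending weight order, followed by cutting weak tree edges to obtain clusters); it is not FICUS's actual procedure, and the similarity values, the `min_similarity` cut-off, and all names are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure used to detect cycles and to group nodes."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def maximal_spanning_tree(n, edges):
    """Kruskal's algorithm with edges taken from most to least similar.

    edges: list of (similarity, article_u, article_v) tuples.
    """
    uf = UnionFind(n)
    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        if uf.union(u, v):          # keep the edge only if it joins two components
            tree.append((weight, u, v))
    return tree

def clusters_from_tree(n, tree, min_similarity):
    """Drop tree edges below the cut-off; remaining components are clusters."""
    uf = UnionFind(n)
    for weight, u, v in tree:
        if weight >= min_similarity:
            uf.union(u, v)
    groups = {}
    for node in range(n):
        groups.setdefault(uf.find(node), []).append(node)
    return sorted(groups.values())

# Hypothetical pairwise similarities among four articles (0..3).
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.2, 2, 3), (0.1, 0, 3)]
tree = maximal_spanning_tree(4, edges)
print(tree)                               # [(0.9, 0, 1), (0.8, 1, 2), (0.2, 2, 3)]
print(clusters_from_tree(4, tree, 0.5))   # [[0, 1, 2], [3]]
```

Lowering or raising the cut-off yields coarser or finer groupings from the same tree, which is one simple way to obtain a hierarchy of clusters.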

8.

Enhancing web search by using query-based clusters and multi-document summaries

Rani Qumsiyeh, Yiu-Kai Ng 《Knowledge and Information Systems》2016,47(2):355-380


9.

Assisting web search using query suggestion based on word similarity measure and query modification patterns

Rani Qumsiyeh, Yiu-Kai Ng 《World Wide Web》2014,17(5):1141-1160

One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which relieves them of the burden of creating web queries. Unfortunately, the algorithms or design methodologies of the QS module developed by Google, the most popular web search engine these days, are not publicly available, which means that they cannot be duplicated by software developers to build the tool for specifically designed software systems for enterprise search, desktop search, or vertical search, to name a few. Keywords suggested by Yahoo! and Bing, two other well-known web search engines, however, are mostly popular currently-searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.
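
Of the three signals named in this abstract, frequency of occurrence in a query log is the simplest to illustrate. The sketch below covers only that one signal and is not WebQS: it ignores keyword similarity and modification patterns, and the function name, the prefix-matching rule, and the sample log are assumptions made here.

```python
from collections import Counter

def suggest(prefix, query_log, top_k=3):
    """Suggest logged queries that extend `prefix`, most frequent first."""
    # Normalize case and surrounding whitespace before counting.
    counts = Counter(q.strip().lower() for q in query_log)
    p = prefix.strip().lower()
    matches = [(count, query) for query, count in counts.items()
               if query.startswith(p) and query != p]
    # Highest frequency first; ties broken alphabetically for determinism.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [query for count, query in matches[:top_k]]

# Hypothetical query log.
log = [
    "cheap flights", "cheap flights", "cheap hotels", "Cheap Flights ",
    "cheap car rental", "cheap hotels", "weather",
]
print(suggest("cheap", log, top_k=2))  # ['cheap flights', 'cheap hotels']
```

A frequency-only ranking of this kind tends to return popular queries, which is the limitation the abstract attributes to suggestions from Yahoo! and Bing.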