Similar literature
1.
Tamper-proofing web pages is of great importance, and several watermarking schemes have been proposed to address it. However, both these watermarking schemes and traditional hash methods increase the file size. In this paper, we propose a novel watermarking scheme for tamper-proofing web pages that avoids this drawback. For a given web page, the proposed scheme generates watermarks using principal component analysis (PCA). The watermarks are then embedded into the web page through the upper and lower case of letters in HTML tags. When a watermarked web page is tampered with, the extracted watermarks reveal the modifications, so the tampered page can be kept from being published. Extensive experiments show that the proposed scheme is a feasible and efficient tool for tamper-proofing web pages.
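The case-based embedding step in the abstract above can be illustrated with a minimal Python sketch. It assumes the watermark is already available as a bit string (the PCA-based generation is not reproduced here), and the helper names `embed_bits` and `extract_bits` are hypothetical, not taken from the paper. Because only letter case changes, the page length stays the same.

```python
import re

# Matches "<" or "</" followed by a tag name (illustrative, not a full HTML parser).
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def embed_bits(html: str, bits: str) -> str:
    """Hide bits in the letter case of tag names: '1' -> upper, '0' -> lower."""
    queue = iter(bits)

    def recase(match: re.Match) -> str:
        letters = []
        for ch in match.group(2):
            if ch.isalpha():
                bit = next(queue, None)
                if bit is not None:  # once bits run out, leave letters unchanged
                    ch = ch.upper() if bit == "1" else ch.lower()
            letters.append(ch)
        return match.group(1) + "".join(letters)

    return TAG_NAME.sub(recase, html)


def extract_bits(html: str, n: int) -> str:
    """Read back the first n bits from the letter case of tag names."""
    bits = []
    for match in TAG_NAME.finditer(html):
        for ch in match.group(2):
            if ch.isalpha():
                bits.append("1" if ch.isupper() else "0")
    return "".join(bits)[:n]


page = "<html><body><p>Hi</p></body></html>"
marked = embed_bits(page, "1011001")
```

A verifier would recompute the expected watermark from the page content and compare it with `extract_bits(marked, 7)`; a mismatch suggests the page was modified.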

2.
Non-redundant web services composition based on a two-phase algorithm (total citations: 1; self-citations: 0; citations by others: 1)
Recently, there has been growing interest in developing web services composition search systems. Current solutions have the drawback of including redundant web services in the results. In this paper, we propose a non-redundant web services composition search system called NRC, which is based on a two-phase algorithm. In the NRC system, the Link Index is built over web services according to their connectivity. In the forward phase, the candidate compositions are efficiently found by searching the Link Index. In the backward phase, the candidate compositions are decomposed into several non-redundant web services compositions by using the concept of tokens. Results of experiments involving data sets with different characteristics show the performance benefits of the NRC techniques in comparison to state-of-the-art composition approaches.

3.
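唐琳 (Tang Lin). 《微型电脑应用》 (Microcomputer Applications), 2002, 18(7): 28-30, 50
Using the "striving for first-class" management information system of the Dezhou Electric Power Bureau as an example, this paper describes in detail the design and implementation of document upload and query for an enterprise web site.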

4.
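This paper studies the HITS (Hyperlink-Induced Topic Search) algorithm, addresses shortcomings such as topic drift and topic generalization, and proposes an improved link-analysis-based method for ranking web search results by relevance, which is then applied to a topic-specific search engine.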
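For orientation, the baseline HITS iteration that this entry sets out to improve can be sketched in a few lines of Python. This is the standard hub/authority update only; the improved ranking method of the paper is not described in the abstract and is not reproduced here. The function name `hits` and the toy link graph are illustrative assumptions.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 50):
    """Basic HITS: returns (hub, authority) score dictionaries."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {n: sum(auth[d] for d in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Toy graph: pages "a" and "b" both link to "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In this toy graph "c" ends up as the only authority and "a" and "b" as equally good hubs. Topic drift arises when the expanded link neighborhood pulls in densely linked but off-topic pages, which is the kind of problem the entry says it addresses.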

5.
This paper provides a transparent and speculative algorithm for content-based web page prefetching. The algorithm relies on a profile based on the Internet browsing habits of the user. It aims at reducing the perceived latency when the user requests a document by clicking on a hyperlink. The proposed user profile relies on the frequency of occurrence of selected elements forming the web pages visited by the user. These frequencies are employed in a mechanism for predicting the user’s future actions. To anticipate an adjacent action, the anchored text around each of the outbound links is used and weights are assigned to these links. Some of the linked documents are then prefetched and stored in a local cache according to the assigned weights. The proposed algorithm was tested against three different prefetching algorithms and yielded improved cache-hit rates given a moderate bandwidth overhead. Furthermore, the precision of inferring the user’s preference is evaluated through recall-precision curves. Statistical evaluation shows that the achieved improvement in recall-precision performance is significant.

6.
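Mobile handheld devices are ill-suited to browsing ordinary web pages because of their small screens and limited computing and storage capacity. In addition, for PDA and mobile phone users, existing web pages need to be "trimmed" to support user customization and reduce cost. To address these problems, this paper proposes a web page segmentation solution for mobile devices. First, the semi-structured HTML document is converted into a structured form. Next, following the DOM specification, the HTML is transformed into a DOM tree and noise is cleaned from it. The page is then partitioned into blocks based on content and on links, and the resulting blocks are cut and reassembled according to the ideas of layering and user customization. Finally, a prototype system was developed on top of the open-source project HTMLParser, and its execution efficiency and segmentation quality were evaluated. The results show that the solution is feasible and has considerable practical value.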

7.
We present Juxtaposed approximate PageRank (JXP), a distributed algorithm for computing PageRank-style authority scores of Web pages on a peer-to-peer (P2P) network. Unlike previous algorithms, JXP allows peers to have overlapping content and requires no a priori knowledge of other peers’ content. Our algorithm combines locally computed authority scores with information obtained from other peers by means of random meetings among the peers in the network. This computation is based on a Markov-chain state-lumping technique, and iteratively approximates global authority scores. The algorithm scales with the number of peers in the network and we show that the JXP scores converge to the true PageRank scores that one would obtain with a centralized algorithm. Finally, we show how to deal with misbehaving peers by extending JXP with a reputation model. Partially supported by the EU within the 6th Framework Programme under contract 001907 “Dynamically Evolving, Large Scale Information Systems” (DELIS).
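As a point of reference for the "true PageRank scores" mentioned in the abstract above, the centralized computation can be sketched as a simple power iteration. This is only the standard centralized baseline, not JXP itself: the peer meetings and the Markov-chain state-lumping step are not shown. The function name `pagerank`, the damping value 0.85, and the toy graph are assumptions for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Centralized PageRank by power iteration over an adjacency list."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


# Toy graph: "a" and "b" link to each other, "c" links to "a".
scores = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

The scores sum to 1, and "a" ranks highest because it receives links from both other pages. A distributed scheme such as JXP aims to approximate these values without any peer holding the whole graph.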

8.
Device-aware desktop web page transformation for rendering on handhelds (total citations: 1; self-citations: 0; citations by others: 1)
This paper illustrates a new approach to automatic re-authoring of web pages for rendering on small-screen devices. The approach is based on automatic detection of the device type and screen size from the HTTP request header to render a desktop web page or a transformed one for display on small-screen devices, for example, PDAs. Known algorithms (transforms) are employed to reduce the size of page elements, to hide parts of the text, and to transform tables into text while preserving the structural format of the web page. The system comprises a preprocessor that works offline and a just-in-time handler that responds to HTTP requests. The preprocessor employs Cascading Style Sheets (CSS) to set default attributes for the page and prepares it for the handler. The latter is responsible for downsizing graphical elements in the page, converting tables to text, and inserting visibility attributes and JavaScript code to allow the user of the client device to interact with the page and cause parts of the text to disappear or reappear. A system that implements the approach was developed and used to collect performance results and conduct usability testing. The importance of the approach lies in its ability to display hidden parts of the web page without having to revisit the server, thus reducing user wait times considerably, saving battery power, and cutting down on wireless network traffic.
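The device-detection step described above can be sketched as a check on the `User-Agent` request header. The abstract does not say which header fields or matching rules the authors use, so the keyword list and the helper names `is_small_screen` and `choose_page` below are purely illustrative assumptions.

```python
# Hypothetical substrings that hint at a handheld client; not taken from the paper.
SMALL_SCREEN_HINTS = ("windows ce", "palm", "blackberry", "symbian", "mobile")


def is_small_screen(headers: dict[str, str]) -> bool:
    """Guess from the User-Agent header whether the client is a handheld."""
    user_agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":  # header names are case-insensitive
            user_agent = value.lower()
    return any(hint in user_agent for hint in SMALL_SCREEN_HINTS)


def choose_page(headers: dict[str, str], desktop_html: str,
                transformed_html: str) -> str:
    """Serve the transformed page to handhelds and the original otherwise."""
    return transformed_html if is_small_screen(headers) else desktop_html


pda = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC)"}
desktop = {"user-agent": "Mozilla/5.0 (X11; Linux x86_64)"}
```

With these sample headers, `choose_page(pda, "D", "T")` returns the transformed page and `choose_page(desktop, "D", "T")` returns the desktop page. A real deployment would also need screen-size information, which this sketch does not attempt to derive.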

9.
To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However, due to the infrastructure costs (storage, networks, machines) and, more importantly, the human management costs, this approach is unsuitable for web-scale preservation. The result is that difficult decisions need to be made as to what is saved and what is not saved. We provide an overview of our ongoing research projects that focus on using the “web infrastructure” to provide preservation capabilities for web pages and examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is that they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches; rather, they focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the future. We characterize the preservation approaches by the level of effort required by the web administrator: web sites are reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21 DIDL representations of web pages (“web server enhanced preservation”).

10.
11.
This study investigated eighth graders’ (14-year-olds) web searching strategies and outcomes, and then analyzed their correlations with students’ web experiences, epistemological beliefs, and the nature of the searching tasks. Eighty-seven eighth graders were asked to fill out a questionnaire probing their epistemological beliefs (from positivist to constructivist-oriented views) and to complete three different types of searching tasks. Their searching process was recorded by screen capture software, and their answers were reviewed by two expert teachers for accuracy, richness and soundness. Five quantitative indicators were used to assess students’ searching strategies: number of keywords, visited pages, maximum depth of exploration, refinement of keywords, and number of words used in the first keyword. The main findings suggested that students with richer web experiences could find more correct answers in “close-ended” search tasks. In addition, students with better metacognitive skills, such as keyword refinement, tended to achieve more successful searching outcomes in such tasks. However, in “open-ended” tasks, where questions were less certain and answers were more elaborate, students who had more advanced epistemological beliefs, concurring with a constructivist view, had better searching outcomes in terms of soundness and richness. The study concluded that epistemological beliefs play an influential role in open-ended Internet learning environments.

12.
This study presents an analysis of users' queries directed at different search engines to investigate trends and suggest better search engine capabilities. The distribution of queries among search engines, including the spawning of queries, the number of terms per query and query lengths, is discussed to highlight the principal factors affecting a user's choice of search engine and to evaluate the reasons for varying the length of queries. The results could be used to develop long- and short-term business plans for search engine service providers, and to determine whether or not to opt for more focused, topic-specific search offerings to gain better market share.

13.
The limited display size of current small Internet devices is becoming a serious obstacle to information access. In this paper, we introduce a Document REpresentation for Scalable Structures (DRESS) to help information providers make composite documents, typically web pages, scalable in both logic and layout structure to support effective information acquisition in heterogeneous environments. Through this novel document representation structure based on binary slicing trees, the document can dynamically adapt its presentation according to display sizes by maximizing the information throughput to users. We discuss the details of this structure with its key attributes. An automatic approach for generating this structure for existing web pages is also presented. A branch-and-bound algorithm and a capacity ratio-based slicing method are proposed to select proper content representation and aesthetic document layouts respectively. A set of user study experiments has been carried out, and the results show that, compared with the thumbnail-based approach, the DRESS-based interface can reduce browsing time by 23.5%. This work was performed when the second and the third authors were visiting students at Microsoft Research Asia.

14.
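穆万军 (Mu Wanjun), 游志胜 (You Zhisheng), 赵明华 (Zhao Minghua), 余静 (Yu Jing). 《计算机应用》 (Journal of Computer Applications), 2005, 25(10): 2310-2311
Using Grover's quantum search algorithm and probability theory, this paper presents a new method for three web data mining tasks: association rule mining, authoritative page mining, and Weblog record mining. It concludes by arguing that the method is much faster than any classical method.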

15.
16.
The recent increase in HyperText Transfer Protocol (HTTP) traffic on the World Wide Web (WWW) has generated an enormous amount of log records on Web server databases. Applying Web mining techniques on these server log records can discover potentially useful patterns and reveal user access behaviors on the Web site. In this paper, we propose a new approach for mining user access patterns for predicting Web page requests, which consists of two steps. First, the Minimum Reaching Distance (MRD) algorithm is applied to find the distances between the Web pages. Second, the association rule mining technique is applied to form a set of predictive rules, and the MRD information is used to prune the results from the association rule mining process. Experimental results from a real Web data set show that our approach improved the performance over the existing Markov-model approach in precision, recall, and the reduction of user browsing time.
Mei-Ling Shyu received her Ph.D. degree from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN in 1999, and three Master's degrees from Computer Science, Electrical Engineering, and Restaurant, Hotel, Institutional, and Tourism Management from Purdue University. She has been an Associate Professor in the Department of Electrical and Computer Engineering (ECE) at the University of Miami (UM), Coral Gables, FL, since June 2005. Prior to that, she was an Assistant Professor in ECE at UM dating from January 2000. Her research interests include data mining, multimedia database systems, multimedia networking, database systems, and security. She has authored and co-authored more than 120 technical papers published in various prestigious journals, refereed conference/symposium/workshop proceedings, and book chapters. She is/was the guest editor of several journal special issues.
Choochart Haruechaiyasak received his Ph.D. degree from the Department of Electrical and Computer Engineering, University of Miami, in 2003 with the Outstanding Departmental Graduating Student award from the College of Engineering. After receiving his degree, he joined the National Electronics and Computer Technology Center (NECTEC), located in Thailand Science Park, as a researcher in the Information Research and Development Division (RDI). His current research interests include data/text/Web mining, Natural Language Processing, Information Retrieval, Search Engines, and Recommender Systems. He is currently leading a small group of researchers and programmers to develop an open-source search engine for the Thai language. One of his objectives is to promote the use of data mining technology and other advanced applications in Information Technology in Thailand. He is also a visiting lecturer for Data Mining, Artificial Intelligence and Decision Support Systems courses in many universities in Thailand.
Shu-Ching Chen received his Ph.D. from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, IN, USA in December, 1998. He also received Master's degrees in Computer Science, Electrical Engineering, and Civil Engineering from Purdue University. He has been an Associate Professor in the School of Computing and Information Sciences (SCIS), Florida International University (FIU) since August, 2004. Prior to that, he was an Assistant Professor in SCIS at FIU dating from August, 1999. His main research interests include distributed multimedia database systems and multimedia data mining. Dr. Chen has authored and co-authored more than 140 research papers in journals, refereed conference/symposium/workshop proceedings, and book chapters. In 2005, he was awarded the IEEE Systems, Man, and Cybernetics Society's Outstanding Contribution Award. He was also awarded a University Outstanding Faculty Research Award from FIU in 2004, Outstanding Faculty Service Award from SCIS in 2004 and Outstanding Faculty Research Award from SCIS in 2002.
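The association-rule step in the abstract of this entry can be illustrated with a small Python sketch that mines pairwise rules "visited A implies visits B" from session logs. The support and confidence thresholds, the function name `mine_rules`, and the sample sessions are assumptions for illustration. The MRD computation and the MRD-based pruning are not shown, because the abstract does not specify how they work.

```python
from collections import Counter


def mine_rules(sessions: list[list[str]], min_support: int = 2,
               min_confidence: float = 0.6) -> dict[tuple[str, str], float]:
    """Return {(a, b): confidence} for rules 'a in session -> b in session'."""
    page_count: Counter = Counter()
    pair_count: Counter = Counter()
    for session in sessions:
        pages = set(session)  # count each page at most once per session
        page_count.update(pages)
        for a in pages:
            for b in pages:
                if a != b:
                    pair_count[(a, b)] += 1
    rules = {}
    for (a, b), together in pair_count.items():
        confidence = together / page_count[a]
        if together >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


logs = [["home", "news", "sports"], ["home", "news"], ["home", "about"]]
rules = mine_rules(logs)
```

With these sample logs, only the rules between "home" and "news" survive both thresholds. A predictor could then prefetch or recommend the consequent page of the highest-confidence rule whose antecedent the user has just visited.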

17.
This study investigates Chinese students' gender differences in their actual use of the web for online information seeking. One hundred and seven Chinese university students responded to questionnaires regarding their perceptions about the use of the web for learning purposes. Afterwards, all the participants were asked to search online to answer two questions about bees' decision for hive location. As they searched, the online system logged participants' search activities, including the type of each activity, the frequency of each activity and the time spent on each activity. Participants were compared by gender in terms of their web search efficacy, web search anxiety, frequency counts of different web search activities, time spent on each search activity and search task performance. Web search efficacy levels varied by gender but not by performance levels. Anxiety did not vary by gender or performance levels. An interaction effect between gender and performance level was found in several search process variables: significant gender differences were found only in medium-performing students, where males engaged in more search activities than females, as seen in the larger number of searches, search queries, and times male students updated the search queries. One factor that could explain the significant gender differences in the medium-level group was their web search efficacy. The more confident medium-performing male students were in web search, the less need they perceived to access information to solve the task. This pattern was reversed for medium-performing females. The high- and low-performing males did not differ much from females in their search activities. It appeared that students' perceptions of their web search ability did not contribute much to their search activities in these two groups. Implications of the findings are also discussed.

18.
Search engines are increasingly efficient at identifying the best sources for any given keyword query, and are often able to identify the answer within the sources. Unfortunately, many web sources are not trustworthy, because of erroneous, misleading, biased, or outdated information. In many cases, users are not satisfied with the results from any single source. In this paper, we propose a framework to aggregate query results from different sources in order to save users the hassle of individually checking query-related web sites to corroborate answers. To return the best answers to the users, we assign a score to each individual answer by taking into account the number, relevance and originality of the sources reporting the answer, as well as the prominence of the answer within the sources, and aggregate the scores of similar answers. We conducted extensive qualitative and quantitative experiments of our corroboration techniques on queries extracted from the TREC Question Answering track and from a log of real web search engine queries. Our results show that taking into account the quality of web pages and answers extracted from the pages in a corroborative way results in the identification of a correct answer for a majority of queries.
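The score-and-aggregate idea in the abstract above can be sketched as follows. The abstract does not give the actual scoring formula, so this example makes loud simplifying assumptions: each report carries only a source relevance and an answer prominence in [0, 1], the two are combined by multiplication, originality is ignored, and "similar answers" are merged by case and whitespace normalization. The function name `corroborate` and the sample reports are hypothetical.

```python
from collections import defaultdict


def corroborate(reports: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Aggregate (answer, source_relevance, prominence) reports into a ranking."""
    totals: dict[str, float] = defaultdict(float)
    for answer, source_relevance, prominence in reports:
        # Assumed similarity rule: answers match after lowercasing and trimming.
        key = " ".join(answer.lower().split())
        # Assumed per-report score: relevance of the source times prominence.
        totals[key] += source_relevance * prominence
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


reports = [("Paris", 0.9, 1.0), ("paris ", 0.5, 0.5), ("Lyon", 0.8, 0.5)]
ranking = corroborate(reports)
```

Here the two reports for "Paris" reinforce each other, so that answer outranks "Lyon" even though the single source reporting "Lyon" is fairly relevant. This is the corroboration effect the abstract relies on.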

19.
Predicting the goals of internet users can be extremely useful in e-commerce, online entertainment, and many other internet-based applications. One of the crucial steps to achieve this is to classify internet queries based on available features, such as contextual information, keywords and their semantic relationships. Beyond these methods, in this paper we propose to mine user interaction activities to predict the intent of the user during a navigation session. However, since in practice it is necessary to use a suitable mix of all such methods, it is important to exploit all the mentioned features in order to properly classify users based on their common intents. To this end, we have performed several experiments aiming to empirically derive a suitable classifier based on the mentioned features.

20.
In the present article an approach to automatic determination of a user’s sphere of interests is proposed. The approach is based on clustering the documents that the user is interested in. The clustering of documents is reduced to a discrete optimization problem, for which quadratic- and linear-type models are proposed. Identifying the user’s interests makes it possible to determine the context of a request without any effort on the user’s part. Different methods are proposed for determining the context of a request. An ant algorithm for solving the quadratic-type discrete optimization problem is also proposed.
