Similar literature
1.
In this paper we address continuous keyword queries on multiple textual streams and explore techniques for extracting useful information from them. To the best of our knowledge, this is the first approach that performs keyword search over multiple textual streams. The scenario we consider is intuitive: a research or financial analyst searches for information on a topic by continuously polling data from multiple (and possibly heterogeneous) text streams, such as RSS feeds and blogs. The topic of interest can be described with several keywords. Current filtering approaches would only identify single text streams that contain some of the keywords. It would be more flexible and powerful to search across multiple streams, which may collectively answer the analyst’s question. We present a model that takes into account the continuous flow of text in streams and uses efficient pipelined algorithms so that results are output as soon as they are available. The proposed model is evaluated analytically and experimentally, using the Enron dataset and a variety of blog datasets.

2.
With the rapid development of the Internet, the World Wide Web (WWW), mobile computing, and Global Positioning System (GPS) services, location-based services such as Web GIS (Geographical Information System) portals are becoming increasingly popular. Spatial keyword queries over GIS spatial data are receiving more attention than ever from both the academic and industrial communities. In general, a spatial keyword query, which contains spatial location information and keywords, locates a set of spatial objects that satisfy the location condition and the keyword query semantics. Researchers have proposed many solutions to various spatial keyword queries, such as the top-K keyword query, reverse kNN keyword query, moving-object keyword query, and collective keyword query. In this paper, we propose a density-based spatial keyword query that locates a set of spatial objects that not only satisfy the query’s textual and distance conditions but also lie in a high-density area. We use the collective keyword query semantics to find, within a dense area, a group of spatial objects whose keywords collectively match the query keywords. To process the density-based spatial keyword query efficiently, we use an IR-tree as the base data structure to index spatial objects and their text content, and we define a cost function over the IR-tree index nodes to approximate the density of areas. We design a heuristic algorithm that efficiently prunes regions according to both distance and region density while processing a query over the IR-tree index. Experimental results on datasets show that our method achieves the desired results with high performance.

3.
Micro-blogging networks have become the most influential online social networks in recent years, and more and more people use them to obtain and spread information. Detecting topics from the large number of tweets in micro-blogging networks is important for information propagation and business marketing. In particular, detecting emerging topics early could strongly support real-time intelligent systems such as real-time recommendation, ad targeting, and marketing strategy. However, most previous approaches detect emerging topics well at a large scale but are less effective for early detection, because a relatively small volume of data offers fewer informative properties. To solve this problem, we propose a new early detection method for emerging topics in micro-blogging networks based on Dynamic Bayesian Networks (DBNs). We first analyze the topic diffusion process and find two main characteristics of emerging topics: attractiveness and key nodes. Based on this finding, we select features from the topological properties of topic diffusion and build a DBN-based model from the conditional dependencies between features to identify emerging keywords. An emerging keyword not only occurs in a given time period with certain frequency properties but also diffuses with specific topological properties. Finally, we cluster the emerging keywords into emerging topics using the co-occurrence relations between keywords. Experimental results on real data from Sina micro-blogging demonstrate that our method is effective and can detect emerging topics one to two hours earlier than the other methods.
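The final clustering step in this abstract can be pictured with a small sketch. The abstract does not say how co-occurrence is turned into clusters, so the code below assumes one simple reading: two keywords belong to the same topic if they co-occur in at least `min_cooccur` posts, and topics are the connected components of that graph. The function name, the threshold, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_keywords(posts, keywords, min_cooccur=2):
    """Group keywords into topics: connected components of the graph whose
    edges join keywords that co-occur in at least `min_cooccur` posts."""
    pair_counts = Counter()
    for post in posts:
        present = sorted(keywords & set(post.split()))
        pair_counts.update(combinations(present, 2))

    # Union-find over the keywords.
    parent = {k: k for k in keywords}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)

    groups = {}
    for k in keywords:
        groups.setdefault(find(k), set()).add(k)
    return sorted(sorted(g) for g in groups.values())


# Toy posts (assumed whitespace-tokenized; real microblog text would need
# proper word segmentation).
posts = [
    "quake tokyo alert",
    "tokyo quake damage",
    "concert tickets sale",
    "tickets concert tonight",
    "quake concert",
]
topics = cluster_keywords(posts, {"quake", "tokyo", "concert", "tickets"})
```

With the threshold of 2, the single post that mentions both "quake" and "concert" is not enough to merge the two topics.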

4.
Keyword search is the most popular technique for retrieving information from XML (eXtensible Markup Language) documents. It lets users access XML data easily without learning a structured query language or studying complex data schemas. Existing keyword query methods are mainly based on LCA (lowest common ancestor) semantics, in which the returned results match all keywords at the granularity of elements. In many practical applications, information is often uncertain and vague, so identifying useful information in fuzzy data is becoming an important research topic. In this paper, we focus on keyword querying over fuzzy XML data at the granularity of objects. By introducing the concept of an “object tree”, we propose query semantics for keyword queries at the object level. We find the minimum whole-matching result object trees, which contain all keywords, and the partial-matching result object trees, which contain some of the keywords, and return the root nodes of these result object trees as query results. To identify the top-K answers with the highest scores effectively and accurately, we propose a scoring mechanism that considers tf*idf document relevance, users’ preferences, and the possibilities of results. We propose a stack-based algorithm named object-stack to obtain the top-K answers with the highest scores. Experimental results show that the object-stack algorithm significantly outperforms traditional XML keyword query algorithms and achieves high-quality query results with high search efficiency on fuzzy XML documents.
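To make the LCA semantics mentioned in this abstract concrete, here is a minimal sketch. It is not the object-stack algorithm, which the abstract does not specify. It assumes each XML node is identified by a Dewey label (a tuple of child positions from the root), so that the lowest common ancestor of several nodes is the longest common prefix of their labels. The labels and keyword matches are made up for illustration.

```python
from itertools import product


def lca(labels):
    """Return the longest common prefix of Dewey labels, i.e. their LCA."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def keyword_lcas(matches):
    """matches maps each keyword to the Dewey labels of nodes containing it.

    Returns the set of LCAs over every combination that picks one matching
    node per keyword (brute force; fine for small illustrative inputs).
    """
    return {lca(combo) for combo in product(*matches.values())}


# Hypothetical document: (0,) is the root, (0, 1) its second child, and so on.
matches = {
    "fuzzy": [(0, 1, 0), (0, 2, 3)],
    "xml": [(0, 1, 2)],
}
result = keyword_lcas(matches)
```

Here the first "fuzzy" node and the "xml" node meet at element `(0, 1)`, while the second "fuzzy" node only shares the root with it, which is why refinements of LCA semantics prefer the deeper answer.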

5.
Ping-I Chen  Shi-Jen Lin 《Knowledge》2011,24(3):393-405
In recent years, finding the most relevant documents or search results in a search engine has become an important issue. Most previous research has focused on expanding the keyword into a more meaningful sequence or using a higher-level concept to form a semantic search. All of those methods need predictive models, which are based on training data or on Web logs of users’ browsing behavior. As a result, they can only be used in a single knowledge domain, not only because of the complexity of model construction but also because the keyword extraction methods are limited to certain areas. In this paper, we describe a new algorithm called “Word AdHoc Network” (WANET) and use it to extract the most important sequences of keywords to provide the most relevant search results to the user. Our method needs no pre-processing, and all execution is in real time. Thus, we can use this system to extract any keyword sequence from various knowledge domains. Our experiments show that the sequences extracted from the documents achieve high accuracy and, in most cases, find the most relevant information in the top search result. This new system can increase users’ effectiveness in finding useful information for the articles or research papers they are reading or writing.

6.
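Detection of implicit event burstiness in social networks   Cited by: 2 (self-citations: 0, citations by others: 2)
Jie Fei, Xie Fei, Li Lei, Wu Xindong. Acta Automatica Sinica, 2018, 44(4): 730-742
Social networks are closely tied to people's daily lives, and user behavior on them can be used to detect event burstiness in social networks and thereby accurately locate the interval in which an event occurs. However, user behavior is easily affected by subjective and external factors, so implicit event burstiness sometimes arises, which makes burstiness detection difficult. To address implicit event burstiness in social networks, this paper builds on burstiness detection based on social behavior features and introduces keyword features: it dynamically adjusts the candidate keywords of each time window and binds different events to different keyword features, which avoids interference between events and from noise and enables accurate identification of implicit event burstiness. Experiments show that the proposed algorithm effectively improves performance on existing event burstiness detection tasks in social networks.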

7.
With the flood and popularity of multimedia content on the Internet, searching for appropriate content and presenting it effectively has become essential for user satisfaction. Many content recommendation systems have been proposed for this purpose. A popular approach is to select hot or popular content for recommendation using some popularity metric. Recently, social network services (SNSs) such as Facebook and Twitter have become a widespread social phenomenon owing to the smartphone boom. Given their popularity and user participation, SNSs can be a good source for finding social interests or trends. In this study, we propose a platform called TrendsSummary for retrieving trendy multimedia content and summarizing it. To identify trendy multimedia content, we select candidate keywords from raw data collected from Twitter using a syntactic feature-based filtering method. Then, we merge keyword variants based on several heuristics. Next, we select trend keywords and their related keywords from the merged candidate keywords based on term frequency and expand them semantically by referencing portal sites such as Wikipedia and Google. Based on the expanded trend keywords, we collect four types of relevant multimedia content (TV programs, videos, news articles, and images) from various websites. The most appropriate media type for the trend keywords is determined by a naïve Bayes classifier. After classification, appropriate items are selected from the content of the selected media type. Finally, both the trend keywords and their related multimedia content are displayed for effective browsing. We implemented a prototype system and experimentally demonstrated that our scheme provides satisfactory results.

8.
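The explosive growth of academic literature data makes it very difficult to identify important research teams. Questions such as how to evaluate a team's research competitiveness, how to compare the research output of different teams, and how to discover core teams are becoming increasingly important. To this end, we build a graded indicator system for research team competitiveness based on literature data; improve the clique percolation algorithm by introducing clique strength, and use it to find research teams in collaboration networks; and, based on the graded indicator system, design a visual comparative analysis method for comparing the research competitiveness of different teams in the same field. Using real research team data from the field of astronomy as an example, experimental results and expert validation confirm the practicality and effectiveness of the method.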

9.
Keyword search enables inexperienced users to search XML databases easily, with no specific knowledge of complex structured query languages or XML data schemas. Existing work has addressed the problem of selecting data nodes that match keywords and connecting them in a meaningful way, e.g., SLCA and ELCA. However, it is time-consuming and unnecessary to serve all the connected subtrees to users, because in general users are interested in only part of the relevant results. In this paper, we propose a new keyword search approach that uses the statistics of the underlying XML data to decide the promising result types and then quickly retrieves the corresponding results with the help of the selected result types. To guarantee the quality of the selected promising result types, we measure the correlations between result types and a keyword query by analyzing the distribution of relevant keywords and their structures within the XML data to be searched. In addition, relevant result types can be computed efficiently without evaluating the keyword query and without any schema information. To directly return top-k keyword search results that conform to the suggested promising result types, we design two new algorithms that adapt to the structural sensitivity of the keyword nodes over the keyword search results. Lastly, we implement all proposed approaches and present experimental results that show the effectiveness of our approach.

10.
Literature reviews are conducted to understand existing research trends and to inform the development of new research in the field of telecommunications. To investigate research trends, this study applies text mining to identify the knowledge structures of academic research on telecommunications policy and to pinpoint future research opportunities. Three analytical techniques were employed: a productivity analysis; a content analysis based on topic modeling and word co-occurrence; and an author co-citation analysis based on a hierarchical clustering algorithm, multidimensional scaling, and factor analysis. The findings of the research productivity analysis imply that the journal ‘Telecommunications Policy’ has contributed greatly to the publication of studies related to telecommunications policy. Moreover, our research institution analysis indicates that telecommunications policy studies are undertaken by experts in various research fields. The content and citation analysis results demonstrate that many studies related to telecommunications policy cover infrastructure-related topics, including the design, arrangement, and distribution of telecommunications networks. By contrast, recent studies are found to focus on the privacy and digital divide issues that may arise when telecommunications networks are applied to other information technologies or industrial areas.
However, policy research on the application of information technologies still concentrates on methods for applying existing services, such as broadcasting and multimedia, without paying sufficient attention to the policy issues that may arise from the application of cloud computing, the Internet of Things, or big data analytics, services that have emerged with the recent expansion of wireless communications networks. In this sense, there is a need to discuss policies that respond to the increasing use of radio frequencies owing to the expansion of the Internet of Things and that promote the efficient and safe control of data transmitted in real time over the wireless Internet. Studies of new technologies in the telecommunications policy field should be carried out in view of local and national characteristics. At the same time, further studies should consider efficient and reasonable ways to export telecommunications and networking technologies to countries that seek to invest in their telecommunications networks or extend them to new information technologies. To extend this research, more text mining techniques for analyzing, clustering, and visualizing large amounts of text data need to be considered.

11.
12.
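Keyword identification in text is one of the basic problems in text mining. Building on a study of existing keyword identification methods based on complex networks, we examine the importance of each node from the perspective of the information lost from the topological features of the whole complex network. We propose a strength entropy measure to quantify the importance of each node and apply it to Chinese keyword identification. Experimental results show that the evaluation method is simple and effective and is particularly suited to assessing node importance in weighted complex networks.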

13.
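Research on network node prediction currently concentrates on predicting source nodes and hidden nodes, and little work addresses the prediction of newly emerging nodes. Taking the paper-keyword relationship network as the object of study, we predict the appearance of new papers from keyword combinations and thereby study new-node prediction. We first project the paper-keyword bipartite network, with weights, onto a keyword relationship network, and then use the likelihood that a keyword combination will appear in the future to predict new papers. Computing this likelihood involves two effects: similarity, the tendency of keywords to appear together, and mutual exclusivity, the tendency of keywords to repel each other (for example, two keywords with nearly identical meanings rarely appear together). We collect paper and keyword information from journals to build a dataset, validate the proposed paper prediction algorithm, and compare it with existing algorithms; the results show that the proposed algorithm predicts better.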
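The first step described in this abstract, projecting the paper-keyword bipartite network onto a weighted keyword network, can be sketched as follows. The abstract does not give the weighting scheme, so this sketch assumes the simplest one: the weight of an edge is the number of papers that list both keywords. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_keywords(papers):
    """Project a paper -> keywords mapping onto a weighted keyword graph.

    The weight of edge (a, b), with a < b, is the number of papers that
    list both a and b (an assumed weighting, not taken from the paper).
    """
    weights = Counter()
    for keywords in papers.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return weights


papers = {
    "p1": ["graph", "keyword", "prediction"],
    "p2": ["graph", "keyword"],
    "p3": ["keyword", "stream"],
}
edges = project_keywords(papers)
```

Keyword pairs with no edge, such as "graph" and "stream" here, are the candidates whose future co-occurrence a prediction method would have to score.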

14.
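Distributed Top-K keyword querying on the Spark Streaming framework is an active research problem in counting keywords over streaming data. Most studies implement Top-K keyword queries with bounded storage and assume that the keyword set is known in advance. To address this, we propose a distributed Top-K keyword query algorithm that works when the keyword set is unknown: it dynamically adjusts the storage space according to the keywords observed and improves accuracy by optimizing the update strategy. Experimental results show that the algorithm outperforms existing algorithms when the keyword set is unknown.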
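For readers unfamiliar with bounded-storage Top-K counting over a stream whose keyword set is not known in advance, the sketch below uses the well-known Space-Saving summary as a stand-in. It is not the algorithm proposed in this abstract, it runs on a single machine rather than on Spark Streaming, and it uses a fixed `capacity` instead of the dynamically adjusted storage the abstract describes.

```python
def space_saving(stream, capacity):
    """Approximate keyword counts using at most `capacity` counters."""
    counts = {}
    for word in stream:
        if word in counts:
            counts[word] += 1
        elif len(counts) < capacity:
            counts[word] = 1
        else:
            # Evict the current minimum and let the newcomer inherit its
            # count plus one; this may overestimate but never underestimates.
            victim = min(counts, key=counts.get)
            counts[word] = counts.pop(victim) + 1
    return counts


def top_k(counts, k):
    """Return the k keywords with the largest estimated counts."""
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


stream = ["spark", "flink", "spark", "kafka", "spark", "flink", "storm", "spark"]
summary = space_saving(stream, capacity=3)
best = top_k(summary, 2)
```

In this run "storm" replaces "kafka" and inherits an estimated count of 2, although it appears only once; this overestimation is the price of bounded memory.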

15.
Aggregate keyword search on large relational databases   Cited by: 2 (self-citations: 1, citations by others: 1)
Keyword search has recently been extended to relational databases to retrieve information from text-rich attributes. However, all existing methods focus on finding individual tuples that match a set of query keywords from one table or from the join of multiple tables. In this paper, we motivate a novel problem, aggregate keyword search: finding minimal group-bys that cover a set of query keywords well, which is useful in many applications. We develop two approaches to tackle the problem. We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation on both real and synthetic data sets verifies the effectiveness of aggregate keyword search and the efficiency of our methods.

16.
Based on a corpus of 10,120 articles on artificial neural networks (ANNs), this paper uses data mining, including association rules and cluster analysis, to survey ANN papers published from 1995 to 2005 through keyword classification and article clustering, exploring the development of ANN methodologies and applications during that period. Four decision variables (keywords, author’s nationality, research category, and year of publication) are used for data mining, with a total of 110,080 data items. The findings show that specific patterns of ANN methodologies and applications can be extracted from the mining results, and these describe the development of ANNs over this period. In addition, using more data mining approaches for the analysis could provide different explanations of ANN development. Finally, a discussion and a brief conclusion are presented.

17.
POLYPHONET: An advanced social network extraction system from the Web   Cited by: 1 (self-citations: 0, citations by others: 1)
Social networks play important roles in the Semantic Web: knowledge management, information retrieval, ubiquitous computing, and so on. We propose a social network extraction system called POLYPHONET, which employs several advanced techniques to extract relations of persons, to detect groups of persons, and to obtain keywords for a person. Search engines, especially Google, are used to measure co-occurrence of information and obtain Web documents.

Several studies have used search engines to extract social networks from the Web, but our research advances the field in the following respects. First, we reduce the related methods to simple pseudocode using Google so that we can build integrated systems. Second, we develop several new algorithms for social network mining, such as those to classify relations into categories, to make extraction scalable, and to obtain and utilize person-to-word relations. Third, every module is implemented in POLYPHONET, which has been used at four academic conferences, each with more than 500 participants. We give an overview of that system. Finally, a novel architecture called Iterative Social Network Mining is proposed. It utilizes simple modules using Google and is characterized by scalability and relate–identify processes: identification of each entity and extraction of relations are repeated to obtain a more precise social network.


18.
Mining the interests of Chinese microbloggers via keyword extraction   Cited by: 1 (self-citations: 0, citations by others: 1)
Microblogging provides a new platform for communicating and sharing information among Web users. Users can express opinions and record daily life using microblogs. The microblogs that users post indicate their interests to some extent. We aim to mine user interests via keyword extraction from microblogs. Traditional keyword extraction methods are usually designed for formal documents such as news articles or scientific papers. Messages posted by microblogging users, however, are usually noisy and full of new words, which is a challenge for keyword extraction. In this paper, we combine a translation-based method with a frequency-based method for keyword extraction. In our experiments, we extract keywords for microblog users from the largest microblogging website in China, Sina Weibo. The results show that our method can identify users’ interests accurately and efficiently.

19.
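With the growth of data volumes, the increase in computing power, and the emergence of deep learning algorithms, artificial intelligence is attracting more and more attention. This paper studies 6,879 journal papers on artificial intelligence indexed in Web of Science, the US core journal database. Using spatio-temporal and content knowledge map analysis as the main research methods and the information visualization software CiteSpace, it visually compares and analyzes the literature data from five perspectives: collaborating countries, research institutions, cited references, keywords, and burst terms. The analysis clarifies the current state of research and the key literature in artificial intelligence and reveals the field's research hotspots and frontiers. Finally, summarizing the five perspectives of visual analysis, the paper offers guidance for choosing research directions, detecting disciplinary frontiers, and supporting science and technology decisions in the field of artificial intelligence.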

20.
At a time when training new machine learning models is extremely time-consuming and resource-intensive, and selling these models or access to them is more popular than ever, it is important to think about ways to protect these models against theft. In this paper, we present a method for estimating the similarity or distance between two black-box models. Our approach does not depend on knowledge of the specific training data and may therefore be used to identify copies of machine learning models or stolen models. It can also be applied to detect license violations regarding the use of datasets. We validate the proposed method empirically on the CIFAR-10 and MNIST datasets using convolutional neural networks, generative adversarial networks, and support vector machines. We show that it can clearly distinguish between models trained on different datasets. Theoretical foundations of our work are also given.
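The idea of a distance between two black-box models can be illustrated with a very simple measure. The abstract does not describe the paper's method, so the sketch below only assumes one naive possibility: query both models on the same probe inputs and report the fraction of inputs on which their outputs differ. The function and the toy models are hypothetical.

```python
def disagreement(model_a, model_b, probes):
    """Fraction of probe inputs on which two black-box models disagree."""
    if not probes:
        raise ValueError("probes must be non-empty")
    differing = sum(1 for x in probes if model_a(x) != model_b(x))
    return differing / len(probes)


# Toy stand-ins for black-box classifiers: only their outputs are observed.
def original(x):
    return int(x >= 5)


def suspected_copy(x):
    return int(x >= 5)


def unrelated(x):
    return int(x % 2 == 0)


probes = list(range(10))
d_copy = disagreement(original, suspected_copy, probes)
d_other = disagreement(original, unrelated, probes)
```

A distance near zero is consistent with a copied model but does not prove it, since independently trained models can also agree on a small probe set.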
