Similar Literature
20 similar documents found.
1.
Human behavior recognition is an important task in image processing and surveillance systems. One main challenge is how to effectively model behaviors in unconstrained videos, given the tremendous variation caused by camera motion, background clutter, object appearance, and so on. In this paper, we propose two novel Multi-Feature Hierarchical Latent Dirichlet Allocation models for human behavior recognition, extending bag-of-words topic models such as the Latent Dirichlet Allocation model and the Multi-Modal Latent Dirichlet Allocation model. The proposed models, with three hierarchies (low-level visual features, feature topics, and behavior topics), effectively fuse two different types of features (motion and static visual features), avoid detecting or tracking moving objects, and improve recognition performance even when the extracted features are very noisy. Finally, we adopt the variational EM algorithm to learn the model parameters. Experiments on the YouTube dataset demonstrate the effectiveness of the proposed models.
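As a hedged illustration of the underlying bag-of-visual-words topic modeling step (not the authors' multi-feature hierarchical variant), the sketch below fits a standard LDA model with variational inference using scikit-learn; the corpus and vocabulary sizes are invented for the example.

```python
# Minimal sketch: standard LDA over bag-of-visual-words histograms,
# fit with variational Bayes, echoing the abstract's variational EM step.
# The corpus below is synthetic; a real system would first quantize
# motion and static descriptors into visual words.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_videos, vocab_motion, vocab_static = 50, 100, 80

# Concatenate motion and static visual-word counts into one histogram
# per video -- a crude stand-in for the paper's multi-feature fusion.
X = np.hstack([
    rng.poisson(1.0, size=(n_videos, vocab_motion)),
    rng.poisson(1.0, size=(n_videos, vocab_static)),
])

lda = LatentDirichletAllocation(n_components=10, learning_method="batch",
                                random_state=0)
theta = lda.fit_transform(X)   # per-video topic proportions
print(theta.shape)             # (50, 10): features for a behavior classifier
```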

2.
There are two important problems in online retail: 1) the conflict between customers' differing interests in different commodities and the commodity classification structure of the Web site; and 2) many customers simultaneously buy both beer and diapers, items classified in different classes and levels on the Web site (the classic problem in data mining). Both problems force the majority of customers to visit an excessive number of Web pages. To solve them, we mine the Web page data, server data, and marketing data to build an adaptive model. In this model, the frequently purchased commodities and their associated commodity sets, discovered by association rule mining, are placed into suitable Web pages according to a placing method and a backing-off method, so that navigation Web pages become navigation-content Web pages. The Web site thus adapts itself according to users' access and purchase information. In online retail, designers need to understand latent users' interests in order to convert latent users into purchasing users. In this paper, we give an approach to discovering Internet marketing intelligence through OLAP in order to help designers improve their service.
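A hedged sketch of the association-rule step on the beer-and-diapers example, using the mlxtend library's Apriori implementation; the transactions and thresholds are invented for illustration.

```python
# Minimal sketch: discover "beer => diapers"-style rules from transactions.
# Transactions are made up; thresholds would be tuned on real retail data.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "diapers"],
    ["beer", "chips"],
    ["beer", "diapers", "milk"],
]
te = TransactionEncoder()
X = pd.DataFrame(te.fit(transactions).transform(transactions),
                 columns=te.columns_)

frequent = apriori(X, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
# Rules like {beer} -> {diapers} tell the site which commodity sets
# to place together on a navigation page.
print(rules[["antecedents", "consequents", "support", "confidence"]])
```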

3.
The pivot language approach for statistical machine translation (SMT) is a good way to break the resource bottleneck for certain language pairs. However, conventional implementations leave pivot-side context information far from fully utilized, resulting in erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that use different levels of pivot-side context. The first method exploits document-level context by assuming that bridged phrase pairs should be similar in their document-level topic distributions. The second method focuses on the effect of local context. Central to this approach are the ideas that the sense of a phrase is reflected by its local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that brings the two methods together to further enhance system performance. Experimental results on French-Spanish and French-German translation using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT.
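A hedged sketch of the bridging idea: a source-pivot phrase probability and a pivot-target probability are multiplied, then reweighted by the similarity of the two phrases' latent topic distributions. The numbers and the cosine-based compatibility score are illustrative, not the paper's exact formulation.

```python
# Minimal sketch: topic-aware pivoting.
# p(t|s) ~ sum_p p(t|p) * p(p|s), down-weighted when the source and
# target phrases disagree in their latent topic distributions.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented topic distributions over 4 latent topics.
topics_src = np.array([0.7, 0.1, 0.1, 0.1])   # French phrase
topics_tgt = np.array([0.6, 0.2, 0.1, 0.1])   # Spanish phrase

# Invented phrase-table probabilities through two English pivot phrases.
p_pivot_given_src = {"bank_fin": 0.8, "bank_river": 0.2}
p_tgt_given_pivot = {"bank_fin": 0.6, "bank_river": 0.3}

bridged = sum(p_tgt_given_pivot[p] * p_pivot_given_src[p]
              for p in p_pivot_given_src)
score = bridged * cosine(topics_src, topics_tgt)  # topic compatibility
print(f"bridged={bridged:.3f}, topic-aware score={score:.3f}")
```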

4.
Analyzing market performance via social media has attracted a great deal of attention in the finance and machine-learning disciplines. However, the vast majority of research does not consider the enormous influence a crisis has on social media, which further affects the relationship between social media and the stock market. This article addresses these challenges by proposing a multistage dynamic analysis framework. In this framework, we use an authorship analysis technique and a topic model method to identify the stakeholder groups and topics related to a particular firm. We analyze the activities of stakeholder groups and topics in different periods of a crisis to evaluate the crisis's influence on various social media measures. Then, we construct a stock regression model in each stage of the crisis to analyze the relationships between changes in stakeholder groups/topics and stock behavior. Finally, we discuss some interesting and significant results, which show that a crisis affects social media discussion topics and that different stakeholder groups/topics have distinct effects on stock market predictions during each stage of a crisis.

5.
A representative model based algorithm for maximal contractions
In this paper, we propose a representative-model-based algorithm for calculating maximal contractions. For a formal theory Γ and a fact set Δ, the algorithm begins by accepting all models of refutation-by-facts of Γ with respect to Δ and filters these models to obtain the models of R-refutation. By the completeness of the R-calculus, the corresponding maximal contraction is obtained simultaneously. To improve efficiency, we divide the models into classes according to the assignments of the atomic propositions and select only the relevant representative models to verify the satisfiability of each proposition. The algorithm is correct, and it can compute all maximal contractions of a given problem, from which users may select according to their requirements. A benchmark algorithm is also provided. Experiments show that the algorithm performs well on typical revision problems.
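As a hedged, brute-force illustration of what a maximal contraction is (a maximal subset of Γ that is jointly satisfiable with the fact set Δ), the sketch below enumerates truth assignments directly; the representative-model filtering that makes the paper's algorithm efficient is not reproduced.

```python
# Brute-force sketch: maximal contractions of Gamma w.r.t. Delta.
# Formulas are Python predicates over a dict of atomic propositions.
from itertools import combinations, product

atoms = ["p", "q", "r"]
Gamma = [lambda v: v["p"], lambda v: not v["p"] or v["q"], lambda v: not v["q"]]
Delta = [lambda v: v["r"]]

def satisfiable(formulas):
    # Enumerate all assignments of the atoms (2^n candidate models).
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            return True
    return False

# Search subsets of Gamma from largest to smallest; keep the maximal
# ones that are jointly satisfiable with Delta.
maximal = []
for k in range(len(Gamma), -1, -1):
    for subset in combinations(range(len(Gamma)), k):
        if any(set(subset) <= m for m in maximal):
            continue  # already contained in a maximal contraction
        if satisfiable([Gamma[i] for i in subset] + Delta):
            maximal.append(set(subset))
print(maximal)  # indices into Gamma, e.g. [{0, 1}, {0, 2}, {1, 2}]
```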

6.
In this paper, two ring networks, one with unidirectional couplings and one with bidirectional couplings, are studied by theoretical analysis. The effects of cutting a link on the synchronizing ability of the two structures turn out to be completely opposite. The synchronizing ability decreases when a bidirectional ring becomes a bidirectional chain; for large N, the change is a factor of four. In contrast, it increases markedly when a unidirectional ring becomes a unidirectional chain, by a factor of N^2/(2π^2) for large N. Numerical simulations qualitatively confirm these conclusions. The paper also discusses the effect of adding one link of varying length d to each of the two structures; the effects again differ, and the theoretical results agree with the numerical simulations. Synchronization is an essential physics problem, and the results proposed in this paper offer useful reference points for real-world networks such as bioecological networks and circuit design.
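The factor-of-four claim for the bidirectional case can be checked against the standard graph Laplacian spectra of a ring ($C_N$) and a path ($P_N$), whose smallest nonzero eigenvalues govern synchronizability; this is a textbook calculation, sketched under the usual diffusive-coupling assumption.

```latex
% Smallest nonzero Laplacian eigenvalues (algebraic connectivity),
% which set the synchronizing ability of the coupled ring and chain:
\lambda_2(C_N) = 2\Bigl(1 - \cos\frac{2\pi}{N}\Bigr) \approx \frac{4\pi^2}{N^2},
\qquad
\lambda_2(P_N) = 2\Bigl(1 - \cos\frac{\pi}{N}\Bigr) \approx \frac{\pi^2}{N^2},
% so cutting one link of the bidirectional ring changes synchronizability by
\frac{\lambda_2(C_N)}{\lambda_2(P_N)} \;\longrightarrow\; 4 \quad (N \to \infty).
```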

7.
Mining frequent patterns from people's trajectories has become a hot topic in big data research. Previously, such data mostly came from GPS. Compared with densely sampled GPS data, base station data is extremely sparse in both time and space, which makes trajectory discovery much more challenging. In this paper, we propose a new method to solve this problem effectively, assuming the locations of objects are sampled over a long time period. First, a sequential pattern mining algorithm is employed to find the frequent passing areas of a person's daily route. Second, frequent paths are pieced together from the points of records that pass through the frequent passing areas. Finally, to ensure credibility and efficiency, we rely on the location information of the scattered points that make up the frequent paths to mine frequent road paths.
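A hedged sketch of the first step, mining frequent passing areas from sparse daily cell-tower sequences; it counts frequent cell-to-cell transitions with a plain counter rather than a full sequential-pattern miner such as PrefixSpan, and the tower IDs are invented.

```python
# Minimal sketch: frequent transitions between base-station areas across days.
# Each day's record is the ordered list of cell IDs a phone attached to.
from collections import Counter

days = [
    ["A", "B", "E", "C"],   # invented cell-tower IDs
    ["A", "B", "C"],
    ["A", "D", "B", "C"],
    ["A", "B", "C", "F"],
]
min_support = 3  # must appear on at least 3 of the days

transitions = Counter()
for seq in days:
    seen = set(zip(seq, seq[1:]))   # count each transition once per day
    transitions.update(seen)

frequent = {t: c for t, c in transitions.items() if c >= min_support}
print(frequent)  # {('A', 'B'): 3, ('B', 'C'): 3}: a frequent path A->B->C
```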

8.
Computation on Sentence Semantic Distance for Novelty Detection
Novelty detection retrieves new information and filters redundancy from given sentences that are relevant to a specific topic. In TREC2003, the authors tried an approach to novelty detection based on semantic distance computation, motivated by expanding each sentence with semantic information. The semantic distance between sentences is computed by incorporating WordNet with statistical information. Novelty detection is treated as a binary classification problem: new sentence or not. The feature vector used in the vector space model for classification consists of several factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context occurring before it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to survey the relationship between the different factors and performance, showing that semantic computation is promising for novelty detection. The ratio of new-sentence count to relevant-sentence count is further studied for different relevant document sizes; the ratio is found to decrease at a roughly constant rate (about 0.86). Another group of experiments, supervised with this ratio, demonstrates that the ratio helps improve novelty detection performance.
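A hedged sketch of the feature construction: each sentence's cosine similarity to the topic and to the preceding context becomes a feature vector for an SVM, mirroring the abstract's setup but without the WordNet-based semantic distance. The sentences and labels are toy data.

```python
# Minimal sketch: novelty features = similarity to topic + similarity to
# the context seen so far; an SVM then labels sentences new / not new.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import SVC

topic = "earthquake damage in coastal cities"
sentences = [
    "The earthquake damaged several coastal cities.",      # relevant, new
    "Several coastal cities were damaged by the quake.",   # redundant
    "Rescue teams arrived with medical supplies.",         # new information
]
labels = [1, 0, 1]  # 1 = new sentence (toy labels for illustration)

vec = TfidfVectorizer().fit([topic] + sentences)
T = vec.transform([topic])
S = vec.transform(sentences)

features = []
for i in range(len(sentences)):
    sim_topic = cosine_similarity(S[i], T)[0, 0]
    # Max similarity to any earlier sentence approximates the
    # "distance to previous relevant context" feature.
    sim_prev = max((cosine_similarity(S[i], S[j])[0, 0]
                    for j in range(i)), default=0.0)
    features.append([sim_topic, sim_prev])

clf = SVC(kernel="linear").fit(np.array(features), labels)
print(clf.predict(np.array(features)))
```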

9.
Association rule mining is an essential research topic in the data mining field. Association rules reflect the inner relationships of data, and discovering these associations helps decision-makers reach correct and appropriate decisions; the rules provide an effective means to uncover potential links between data items. In this paper, starting from a study of data mining technology, we examine association rule mining in depth, analyze the classic Apriori algorithm, point out its weaknesses, and put forward an improved algorithm, AprioriMend.
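For reference, a hedged sketch of the classic Apriori candidate-generation-and-prune loop that the paper improves on; the AprioriMend improvements themselves are not described in enough detail here to reproduce.

```python
# Minimal sketch of classic Apriori: level-wise candidate generation,
# pruning itemsets whose support falls below min_support.
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "chips"},
                {"beer", "diapers", "chips"}, {"milk", "diapers"}]
min_support = 2

def support(itemset):
    return sum(itemset <= t for t in transactions)

items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
frequent = list(level)

while level:
    # Join step: unions of frequent k-itemsets that form (k+1)-itemsets.
    k = len(next(iter(level))) + 1
    candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    # Prune step: keep only candidates meeting min_support.
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

print([set(f) for f in frequent])
```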

10.
The paper covers three topics in control theory and engineering applications: bifurcation control, manufacturing planning, and formation control. For each topic, we summarize the control problem to be addressed and some key ideas used in our recent research; interested readers are referred to the related publications for details. Each of the three topics is technically independent of the others, but together they reflect the recent research activities of the first author, jointly with researchers in different fields.

11.
Web Topic Mining Based on Sub-topic Concepts
To help users find and compile knowledge on a specific topic on the Web, this paper presents a mining algorithm based on sub-topic concepts. The basic idea is: given a topic, identify its sub-topics or core concepts from the set of pages returned by a search engine, and then obtain the pages covering the topic and its sub-topic concepts, so that users can acquire systematic and comprehensive knowledge of the queried topic without browsing all the pages.

12.
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit of search and retrieval is the individual page, even though information on a topic is often spread across multiple pages. This degrades the quality of search results, especially for long or uncorrelated (multitopic) queries in which the individual keywords rarely occur together in the same document, so that a single page is unlikely to satisfy the user's information need. We propose a technique that, given a keyword query, generates new pages on the fly, called composed pages, which contain all the query keywords. The composed pages are generated by extracting and stitching together relevant pieces from hyperlinked Web pages, retaining links to the original pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms that efficiently generate the top composed pages. The quality of our method is compared to current approaches using user surveys. Finally, we show how our techniques can be used to perform query-specific summarization of Web pages.
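A hedged sketch of one way to stitch a composed page: greedily pick relevant fragments until every query keyword is covered, keeping a link to each fragment's source page. The scoring is deliberately simplified; the paper also folds in hyperlink structure, which is omitted here, and the URLs and fragments are invented.

```python
# Minimal sketch: compose a page by greedy set-cover over query keywords.
# Each fragment is (source_url, text); real ranking would also use links.
query = {"hybrid", "engine", "emissions"}

fragments = [
    ("http://a.example/page1", "The hybrid engine design cuts fuel use."),
    ("http://b.example/page2", "New rules target tailpipe emissions."),
    ("http://c.example/page3", "Hybrid cars lower emissions in cities."),
]

def keywords_in(text):
    return {w for w in query if w in text.lower()}

uncovered, composed = set(query), []
while uncovered:
    # Pick the fragment covering the most still-uncovered keywords.
    best = max(fragments, key=lambda f: len(keywords_in(f[1]) & uncovered))
    gained = keywords_in(best[1]) & uncovered
    if not gained:
        break  # remaining keywords appear in no fragment
    composed.append(best)
    uncovered -= gained

for url, text in composed:
    print(f"{text}  [source: {url}]")
```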

13.
Discovering topic-specific information sources is a prerequisite for topic-oriented Web information integration. This paper proposes a method for discovering topic information sources that casts the problem as Web site topic classification and uses outbound links to discover new sources. Content feature words and structure feature words that reflect a site's topic are extracted from the site to build an improved vector space model describing the site topic. Based on this model, sites are classified by topic using a class-centroid vector method combined with SVM. A Web search strategy is also proposed that crawls as few pages as possible, fetching the pages most representative of the site topic while discovering outbound links. The method is applied to forestry e-commerce information sources, and experiments verify its effectiveness.

14.
Mining Web informative structures and contents based on entropy analysis
We study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (referred to as TOC, i.e., table-of-contents, pages) and a set of article pages linked by these TOC pages. Based on the Hyperlink-Induced Topic Search (HITS) algorithm, we propose an entropy-based analysis mechanism (LAMIS) that analyzes the entropy of anchor texts and links to eliminate the redundancy of the hyperlinked structure, so that the complex structure of a Web site can be distilled. However, to increase the value and accessibility of pages, most content sites publish their pages with intra-site redundant information, such as navigation panels, advertisements, and copyright announcements. To further eliminate such redundancy, we propose another mechanism, called InfoDiscoverer, which applies the distilled structure to identify sets of article pages. InfoDiscoverer also employs entropy information to analyze the information measures of article sets and to extract informative content blocks from these sets. Our results are useful for search engines, information agents, and crawlers that index, extract, and navigate significant information from a Web site. Experiments on several real news Web sites show that the precision and recall of our approaches are much superior to those of conventional methods for mining the informative structures of news Web sites. On average, the augmented LAMIS yields a prominent performance improvement, increasing precision by factors ranging from 122 to 257 percent when the desired recall falls between 0.5 and 1. In comparison with manual heuristics, the precision and recall of InfoDiscoverer are both greater than 0.956.
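A hedged sketch of the entropy measure at the core of this approach: a feature (an anchor text or content block) spread uniformly across many pages has high entropy and is likely navigation or boilerplate, while a low-entropy, concentrated feature is informative. The page counts are invented.

```python
# Minimal sketch: entropy of a feature's distribution across pages.
# Features spread evenly over all pages (navigation bars, footers) get
# high entropy; features concentrated in few pages are informative.
import math

def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Invented occurrence counts of two anchor texts across 8 article pages.
home_link = [1, 1, 1, 1, 1, 1, 1, 1]   # "Home" appears on every page
story_link = [5, 0, 0, 1, 0, 0, 0, 0]  # story-specific anchor

for name, counts in [("home_link", home_link), ("story_link", story_link)]:
    print(name, round(entropy(counts), 3))
# home_link  -> 3.0 (maximal for 8 pages, i.e., redundant)
# story_link -> ~0.65 (concentrated, i.e., informative)
```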

15.
A Domain-Specific Chinese Web Page Classifier Based on Support Vector Machines
This paper presents an SVM-based classification algorithm for domain-specific Chinese Web pages. An SVM performs binary classification to pick out the Chinese Web pages belonging to the target domain; a vector space model then performs multi-class classification on the selected pages. To improve recall, an offset factor is introduced when constructing the SVM. The algorithm only needs to compute a binary SVM classifier, and experiments show that it not only trains efficiently but also achieves high classification precision and recall.
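A hedged sketch of the offset-factor idea: shifting the SVM decision threshold toward the negative class trades precision for recall on the target class. The data is synthetic and the offset value is illustrative.

```python
# Minimal sketch: a bias/offset factor applied to the SVM decision
# function to raise recall for the target domain class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=400, n_features=20, weights=[0.8],
                           random_state=0)
clf = LinearSVC(random_state=0).fit(X, y)

scores = clf.decision_function(X)
offset = 0.5  # illustrative; tuned on held-out data in practice
pred_plain = (scores > 0).astype(int)
pred_offset = (scores > -offset).astype(int)  # accept borderline pages

print("recall without offset:", recall_score(y, pred_plain))
print("recall with offset:   ", recall_score(y, pred_offset))
```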

16.
With the Internet growing exponentially, search engines face unprecedented challenges. A focused search engine selectively seeks out Web pages relevant to user topics, and determining the best strategy for focused search is a crucial and popular research topic. At present, the rank values of unvisited Web pages are computed from hyperlinks (as in the PageRank algorithm), from a vector space model, or from a combination of the two, not from the semantic relations between the user topic and the unvisited pages. In this paper, we propose a concept context graph that stores knowledge context based on the user's history of clicked Web pages and guides a focused crawler in its next crawl. The concept context graph provides a novel semantic ranking that steers the crawler toward pages highly relevant to the user's topic. By computing concept distance and concept similarity among the concepts of the graph, and by matching unvisited Web pages against the graph, we compute the rank values of the unvisited pages to pick out the relevant hyperlinks. We also build the focused crawling system and measure the precision, recall, average harvest rate, and F-measure of our approach against Breadth-First, Cosine Similarity, the Link Context Graph, and the Relevancy Context Graph. The results show that our proposed method outperforms the other methods.

17.
Topic islands formed by Web pages severely degrade the performance of topical crawlers in search engine systems, and manually adding large numbers of initial seed links to discover new topics cannot guarantee comprehensive coverage of topic pages. Building on an analysis of traditional topical crawling strategies based on content analysis, link analysis, and context graphs, this paper proposes a crawling strategy based on dynamic tunneling. The strategy combines page topic relevance computation with URL link relevance prediction to determine the topical relatedness of pages across topic islands, and builds a hierarchical topic judgment model to handle the weak links between topic islands. It also effectively prevents the topic drift caused by a crawler collecting too many off-topic pages, thereby achieving dynamic tunneling control along crawling directions that preserve topic semantics. In the experiments, the hierarchical structure of topic pages is used to check page topic relevance and to extract keywords for the "sports" topic, which are then used in indexing and query tests on the collected topic pages. The results show that the dynamic-tunneling crawling strategy handles the topic island problem well and clearly improves the precision and recall of the "sports" topical search engine.

18.
This paper proposes a topic-oriented keyword query expansion method that combines local statistics with semantic expansion. The method analyzes the feedback pages returned by searching with the initial keywords of a given topic and computes the weights of candidate topic words with a TF*PSF semantic weighting scheme to further select topic keywords. On this basis, an iterative Web-oriented topic keyword query expansion algorithm is designed, which uses combined queries over the topic keywords to iteratively expand the topic's keyword set. Experiments show that the method is effective.

19.
Information Systems, 2006, 31(4-5): 232-246
One of the major problems for automatically constructed portals and information discovery systems is how to assign a proper order to unvisited Web pages. Topic-specific crawlers and information-seeking agents should avoid traversing off-topic areas and concentrate on links that lead to documents of interest. In this paper, we propose an effective approach based on the relevancy context graph to solve this problem. The graph estimates the distance and the degree of relevance between a retrieved document and the given topic. By calculating the word distributions of general and topic-specific feature words, our method preserves the property of the relevancy context graph and reflects it in the word distributions. With the help of the topic-specific and general word distributions, our crawler can measure a page's expected relevance to a given topic and determine the order in which pages should be visited. Simulations show that our method outperforms both breadth-first crawling and the method using only the context graph.

20.
张娜, 张化祥. 《计算机应用》(Journal of Computer Applications), 2006, 26(5): 1171-1173
In the Web environment, the classic link analysis method (the HITS algorithm) focuses too heavily on page authority while ignoring topic relevance, which easily produces topic drift. After a brief introduction to the HITS algorithm, this paper analyzes the causes of its topic drift and, by incorporating a content relevance measure, proposes a new search algorithm, WHITS. Experiments show that the algorithm exploits the latent semantic relations between hyperlinks and can effectively guide topic mining.
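A hedged sketch of combining HITS with content relevance, in the spirit of WHITS though not necessarily its exact update rule: each link's contribution is weighted by the target page's topic relevance, damping drift toward authoritative but off-topic pages. The link graph and relevance scores are invented.

```python
# Minimal sketch: HITS iterations with edges weighted by topic relevance,
# so off-topic authorities stop dominating the ranking.
import numpy as np

# Invented 4-page link graph: adj[i, j] = 1 if page i links to page j.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)
relevance = np.array([0.9, 0.8, 0.1, 0.7])  # topic relevance per page

W = adj * relevance            # weight each link by the target's relevance
auth = np.ones(4)
hub = np.ones(4)
for _ in range(50):
    auth = W.T @ hub           # authority: weighted in-links from hubs
    hub = W @ auth             # hub: weighted out-links to authorities
    auth /= np.linalg.norm(auth)
    hub /= np.linalg.norm(hub)

print("authority:", np.round(auth, 3))  # page 2 (off-topic) is damped
```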
