Similar Documents
10 similar documents found.
1.
In this work we study how people navigate the information network of Wikipedia and investigate (i) free-form navigation by studying all clicks within the English Wikipedia over an entire month and (ii) goal-directed Wikipedia navigation by analyzing wikigames, where users are challenged to retrieve articles by following links. To study how the organization of Wikipedia articles in terms of layout and links affects navigation behavior, we first investigate the characteristics of the structural organization and of hyperlinks in Wikipedia and then evaluate link selection models based on article structure and other potential influences in navigation, such as the generality of an article's topic. In free-form Wikipedia navigation, covering all Wikipedia usage scenarios, we find that click choices are best modeled by a bias towards article structure, such as a tendency to click links located in the lead section. For the goal-directed navigation of wikigames, our findings confirm the zoom-out and the homing-in phases identified by previous work, where users are guided by generality at first and textual similarity to the target later. However, our interpretation of the link selection models emphasizes that article structure is the best explanation for the navigation paths in all except these initial and final stages. Overall, we find evidence that users more frequently click on links that are located close to the top of an article. The structure of Wikipedia articles, which places links to more general concepts near the top, supports this behavior by allowing users to quickly find the better-connected articles that facilitate navigation. Our results highlight the importance of article structure and link position in Wikipedia navigation and suggest that better organization of information can help make information networks more navigable.
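The abstract does not give the concrete form of the link selection models; the following is a minimal illustrative sketch in Python of a position-biased click model, in which the normalized "position" feature and the exponential weighting are assumptions made for illustration only, not the models evaluated in the paper.

import math

def link_click_probabilities(links, beta=1.0):
    # links: list of dicts with a normalized position in the article,
    # 0.0 = top of the page (lead section), 1.0 = bottom.
    # Positional-bias model: links nearer the top receive higher weight.
    weights = [math.exp(-beta * link["position"]) for link in links]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three links near the top, middle, and bottom of an article.
links = [{"target": "Physics", "position": 0.05},
         {"target": "History of physics", "position": 0.5},
         {"target": "See also: Chemistry", "position": 0.95}]
for link, p in zip(links, link_click_probabilities(links)):
    print(link["target"], round(p, 3))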

2.
Using Wikipedia knowledge to improve text classification   (Total citations: 7; self-citations: 7; citations by others: 0)
Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the “Bag of Words” (BOW) representation, which only accounts for term frequency in the documents and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich the text representation by means of manual intervention or automatic document expansion. The improvement achieved is unfortunately very limited, due to the poor coverage of the dictionary and the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.
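As an illustration of the kind of BOW expansion described above, here is a minimal Python sketch; the tiny THESAURUS dictionary and the weighting scheme are hypothetical placeholders, not the Wikipedia-derived thesaurus or the framework from the paper.

from collections import Counter

# Hypothetical thesaurus fragment mapping terms to related Wikipedia concepts
# (synonyms, hypernyms, associative relations); the real thesaurus in the
# paper is built automatically from Wikipedia.
THESAURUS = {
    "automobile": ["car", "vehicle"],
    "puma": ["cougar", "felidae"],
}

def expand_bow(tokens, weight=0.5):
    # Standard bag-of-words counts ...
    bow = Counter(tokens)
    # ... enriched with down-weighted counts for semantically related concepts.
    for term, count in list(bow.items()):
        for related in THESAURUS.get(term, []):
            bow[related] += weight * count
    return bow

print(expand_bow(["automobile", "sales", "automobile"]))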

3.
Research on Wikipedia-based semantic knowledge bases and their construction methods   (Total citations: 1; self-citations: 0; citations by others: 1)
Wikipedia is one of the largest online encyclopedias. Built on the Wiki mechanism of collaborative online editing, it is high in quality, broad in coverage, continuously evolving, and semi-structured, which makes it a high-quality corpus for constructing semantic knowledge bases. This paper analyzes the basic characteristics of the Wikipedia corpus, surveys the various semantic knowledge bases built from Wikipedia together with their concept extraction and relation extraction methods, and discusses the advantages and disadvantages of each class of methods, open problems, and possible research directions.

4.
Recently, Wikipedia has garnered increasing public attention. However, few studies have examined the intentions of individuals who edit Wikipedia content. Furthermore, previous studies ascribed a ‘knowledge sharing’ label to Wikipedia content editors, whereas in this work Wikipedia is viewed as a platform that allows individuals to show their expertise. This study investigates the underlying reasons that drive individuals to edit Wikipedia content. Based on expectation-confirmation theory and expectancy-value theory for achievement motivations, we propose an integrated model that incorporates psychological and contextual perspectives. Wikipedians from the English-language Wikipedia site were invited to participate in a survey, and partial least squares was applied to test the proposed model. Analytical results confirmed that subjective task value, commitment, and procedural justice significantly affected the satisfaction of Wikipedians, and that satisfaction significantly influenced the continuance intention to edit Wikipedia content.

5.
A proper semantic representation of textual information underlies many natural language processing tasks. In this paper, a novel semantic annotator is presented to generate conceptual features for text documents. A comprehensive conceptual network is automatically constructed with the aid of Wikipedia and represented as a Markov chain. The semantic annotator takes a fragment of natural language text and initiates a random walk over this network to generate conceptual features that represent the topical semantics of the input text. The generated conceptual features are applicable to many natural language processing tasks where the input is textual information and the output is a decision based on its context. Consequently, the effectiveness of the generated features is evaluated on document clustering and classification tasks. Empirical results demonstrate that representing text using conceptual features and considering the relations between concepts can significantly improve not only the bag-of-words representation but also other state-of-the-art approaches.
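A minimal Python sketch of the core idea, a random walk over a concept network treated as a Markov chain, is shown below; the toy graph, the restart probability, and the use of visit frequencies as conceptual features are simplifying assumptions, not the paper's exact annotator.

import random

# Toy concept graph (adjacency lists); in the paper the network is built
# from Wikipedia and treated as a Markov chain over concepts.
GRAPH = {
    "machine learning": ["statistics", "artificial intelligence"],
    "statistics": ["probability", "machine learning"],
    "artificial intelligence": ["machine learning", "robotics"],
    "probability": ["statistics"],
    "robotics": ["artificial intelligence"],
}

def conceptual_features(seed_concepts, steps=10000, restart=0.15, seed=0):
    # Random walk with restart from the concepts mentioned in the input text;
    # normalized visit frequencies serve as conceptual features for the text.
    rng = random.Random(seed)
    counts = {c: 0 for c in GRAPH}
    current = rng.choice(seed_concepts)
    for _ in range(steps):
        counts[current] += 1
        if rng.random() < restart or not GRAPH[current]:
            current = rng.choice(seed_concepts)
        else:
            current = rng.choice(GRAPH[current])
    return {c: n / steps for c, n in counts.items()}

print(conceptual_features(["machine learning"]))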

6.
Wikipedia provides a large number of high-quality articles describing well-known concepts, and rich images make these articles even more valuable. However, most Wikipedia articles have no images or only very few. This paper presents WIMAGE, an integrated framework for finding images with high precision, high recall, and high diversity for Wikipedia articles. WIMAGE includes a query generation method and two image ranking methods. Experiments on 40 articles from four common Wikipedia categories show that WIMAGE can effectively find high-precision, high-recall, and highly diverse images for Wikipedia articles, and that the ranking method that considers both visual similarity and textual similarity performs best.  相似文献
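The abstract does not specify how the visual and textual similarities are combined in the best-performing ranking method; the Python sketch below assumes a simple linear fusion purely for illustration.

def rank_images(candidates, alpha=0.5):
    # candidates: list of (image_id, visual_similarity, textual_similarity),
    # with both similarities assumed to be normalized to [0, 1].
    # A linear combination is one simple way to fuse the two signals; the
    # actual ranking functions in WIMAGE are not given in the abstract.
    scored = [(img, alpha * vis + (1 - alpha) * txt)
              for img, vis, txt in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank_images([("img1.jpg", 0.9, 0.2),
                   ("img2.jpg", 0.6, 0.7),
                   ("img3.jpg", 0.1, 0.9)]))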

7.
We present Wiser, a new semantic search engine for expert finding in academia. Our system is unsupervised and jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph via entity linking. Wiser indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author's publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score, which models the pertinence of the corresponding entity to the author's expertise and is computed by means of a random-walk calculation over that graph, and with a latent vector representation, which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author's documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author's expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experimental test on a standard dataset for this task. We show that Wiser achieves better performance than all the other competitors, thus proving the effectiveness of modeling an author's profile via our “semantic” graph of entities. Finally, we comment on the use of Wiser for indexing and profiling the whole research community within the University of Pisa, and its application to technology transfer in our University.
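The abstract does not state how Wiser fuses the document-centric and profile-centric scores; the following Python sketch assumes a simple weighted combination, with all names and values chosen only to illustrate the idea of combining the two families of evidence.

def expert_score(doc_score, profile_relatedness, weight=0.5):
    # doc_score: classic document-centric retrieval score (e.g., how well the
    # query terms match the author's publications).
    # profile_relatedness: semantic relatedness between the query topic and
    # the entities in the author's Wikipedia-based profile graph.
    # The weighted sum is an assumption made for this sketch.
    return weight * doc_score + (1 - weight) * profile_relatedness

candidates = {"author_a": (0.8, 0.3), "author_b": (0.5, 0.9)}
ranking = sorted(candidates,
                 key=lambda a: expert_score(*candidates[a]),
                 reverse=True)
print(ranking)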

8.
With the rapid growth of social media such as microblogs and photo sharing, a large volume of short text content (comments, microblog posts, etc.) is produced every day, and mining it has great practical and academic value. This paper takes microblogs as an example to describe the proposed method in detail. Microblog streams are highly valuable because of their brevity and real-time nature, and have become an important information source for applications such as marketing, stock prediction, and public opinion monitoring. Nevertheless, microblog content features are extremely sparse and contextual information is hard to extract, which makes mining microblog data very challenging. We therefore propose a Wikipedia-based method for semantic concept expansion of microblogs, which enriches their content features by automatically identifying Wikipedia concepts that are semantically related to the microblog text, thereby effectively improving the results of microblog data mining and analysis. The method first finds the Wikipedia concepts corresponding to important n-grams in a microblog post through linkability-based pruning, concept association, and disambiguation. It then applies non-negative matrix factorization (NMF) to a concept-document association matrix to obtain semantic neighbors of Wikipedia concepts, which are used to expand the microblog post with related semantic concepts. Experiments on the TREC 2011 microblog dataset and a 2011 Wikipedia dataset show that the proposed method outperforms two existing related approaches.  相似文献
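A minimal Python sketch of the NMF step, finding semantic neighbors of a concept from a concept-document association matrix, is shown below; the toy matrix, the factor dimensionality, and the use of cosine similarity over the concept factors are illustrative assumptions rather than the paper's exact configuration.

import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

# Toy concept-document association matrix (rows: Wikipedia concepts,
# columns: documents); in the paper it is built from real Wikipedia data.
concepts = ["iPhone", "Apple Inc.", "iOS", "Banana"]
X = np.array([[3, 0, 2, 1],
              [2, 1, 3, 0],
              [1, 0, 2, 1],
              [0, 4, 0, 0]], dtype=float)

# Factorize X into low-dimensional concept factors W (and document factors H).
model = NMF(n_components=2, init="nndsvda", max_iter=500)
W = model.fit_transform(X)

# Semantic neighbors of a concept = concepts with the closest factor vectors.
sims = cosine_similarity(W)
query = concepts.index("iPhone")
neighbors = sorted(range(len(concepts)), key=lambda i: -sims[query, i])
print([concepts[i] for i in neighbors if i != query])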

9.
The Resource Space Model is a data model that can effectively and flexibly manage digital resources in cyber-physical systems from multidimensional and hierarchical perspectives. This paper focuses on constructing resource spaces automatically. We propose a framework that organizes a set of digital resources along different semantic dimensions by combining human background knowledge from WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities, and generating the resource space. An unsupervised statistical topic model (Latent Dirichlet Allocation) is applied to extract candidate keywords for the facets. To better interpret the meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets, and construct the corresponding semantic graphs. Semantic communities are then identified with the GN (Girvan-Newman) algorithm. After extracting candidate axes based on the Wikipedia concept hierarchy, the final axes of the resource space are ranked and selected through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively.
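A minimal Python sketch of the middle steps, building a semantic graph from WordNet noun-synset relatedness and splitting it with Girvan-Newman community detection, is given below; the keyword list, the use of path similarity between first synsets, and the edge threshold are simplifications of the paper's procedure.

import itertools
import networkx as nx
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def semantic_graph(keywords, threshold=0.1):
    # Build a graph over candidate keywords; edge weight is a WordNet-based
    # relatedness between their first noun synsets.
    g = nx.Graph()
    g.add_nodes_from(keywords)
    for a, b in itertools.combinations(keywords, 2):
        syn_a, syn_b = wn.synsets(a, pos=wn.NOUN), wn.synsets(b, pos=wn.NOUN)
        if syn_a and syn_b:
            sim = syn_a[0].path_similarity(syn_b[0]) or 0.0
            if sim >= threshold:
                g.add_edge(a, b, weight=sim)
    return g

g = semantic_graph(["dog", "cat", "animal", "car", "truck", "vehicle"])
# Girvan-Newman community detection; take the first (coarsest) split.
communities = next(nx.algorithms.community.girvan_newman(g))
print([sorted(c) for c in communities])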

10.
In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article, an evaluation of approaches to text representation for machine learning tasks, indicates that the text representation is fundamental for achieving good categorization results. The analysis shows that the choice of representation sets a baseline that cannot be compensated for even by sophisticated machine learning algorithms, confirming the thesis that proper data representation is a prerequisite for achieving high-quality results in data analysis. Evaluation of the text representations was performed on the Wikipedia repository by examining classification performance during the automatic reconstruction of human-made categories. For that purpose, we use a classifier based on the support vector machine method, extended with multilabel and multiclass functionalities. During classifier construction we observed parameters such as learning time, representation size, and classification quality, which allow us to draw conclusions about the text representations. For the experiments presented in the article, we use data sets created from Wikipedia dumps. We describe our software, called Matrix’u, which allows a user to build computational representations of Wikipedia articles. The software is the second contribution of our research, because it is a universal tool for converting Wikipedia from a human-readable form to a form that can be processed by a machine. Results generated using Matrix’u can be used in a wide range of applications that involve the use of Wikipedia data.
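As an illustration of the experimental setup described above (reconstructing human-made categories with a multilabel, multiclass SVM), here is a small Python sketch using scikit-learn; the toy texts, labels, and the TF-IDF representation are stand-ins, not the representations or data evaluated in the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy stand-ins for Wikipedia article texts and their category labels.
texts = ["association football is a team sport played with a ball",
         "the piano is a keyboard musical instrument",
         "a symphony orchestra performs classical music in concert halls",
         "basketball is a team sport played on a rectangular court"]
labels = [["sport"], ["music"], ["music"], ["sport"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# One linear SVM per category (one-vs-rest) over a TF-IDF representation.
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(texts, Y)

pred = clf.predict(["a violin concerto performed by the orchestra"])
print(mlb.inverse_transform(pred))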
