共查询到20条相似文献,搜索用时 93 毫秒
1.
《Knowledge and Data Engineering, IEEE Transactions on》2008,20(11):1550-1565
Nowadays, searches for webpages of a person with a given name constitute a notable fraction of queries to web search engines. Such a query would normally return webpages related to several namesakes, who happened to have the queried name, leaving the burden of disambiguating and collecting pages relevant to a particular person (from among the namesakes) on the user. In this article we develop a Web People Search approach that clusters webpages based on their association to different people. Our method exploits a variety of semantic information extracted from Web pages, such as named entities and hyperlinks, to disambiguate among namesakes referred to on the Web pages. We demonstrate the effectiveness of our approach by testing the efficacy of the disambiguation algorithms and its impact on person search. 相似文献
2.
3.
4.
在系统中搜索某一姓名时,会返回该同名作者的所有文档(如论文、网页),严重影响用户体验,姓名消歧可提高检索精度.因此,文中提出基于异质网络表示学习的姓名消歧方法.首先为每个歧义姓名构造一个论文异质网络.然后使用异质网络表示学习并结合词向量化语义表征学习方法,获取网络中每个论文节点的表征向量.最后使用具有噪声的基于密度的聚类方法与规则匹配结合的聚类方法将论文划分给不同的作者实体.文中方法在OAG-WholsWho比赛数据集上的性能较优,结果验证方法的有效性. 相似文献
5.
针对知识库中存在单条实体定义特征稀疏和人工设置相似度阈值适用性不强的问题,本文提出了一种基于分步聚类的人名消歧算法。首先,将知识库中人名实体定义的人物属性特征作为查询特征,利用文本检索的方式实现基于知识库的初次聚类,弥补了知识库中单条实体定义中特征稀疏的问题;然后,利用初次聚类的结果,采用基于自适应阈值的凝聚层次聚类算法实现知识库人名消歧;最后,采用条件随机场进行Other类识别,利用基于自适应阈值的凝聚层次聚类完成S类聚类,从而实现非知识库人名消歧。在CLP2012的中文人名消歧评测语料上进行实验,结果表明本文的算法能够有效地对人名进行消歧。 相似文献
6.
人名歧义是一种身份不确定的现象,指的是文本中具有相同姓名的字符串指向现实世界中的不同实体人物。人名消歧很长时间一直是一个具有挑战性的问题,关注网页里的人名消歧的问题。因为经典的K-means算法如果选择了一个差的随机初始聚类中心,算法会遇到局部收敛的问题,所以文章提出一种基于最大最小原则的改进的K-means算法来进行人名消歧。同时使用了WePS的训练数据作为实验的语料。实验结果表明,改进的方法比层次聚类方法有着更好的性能。 相似文献
7.
Ruben Verborgh Davy Van Deursen Erik Mannens Chris Poppe Rik Van de Walle 《Multimedia Tools and Applications》2012,61(1):105-129
Automatic generation of metadata, facilitating the retrieval of multimedia items, potentially saves large amounts of manual work. However, the high specialization degree of feature extraction algorithms makes them unaware of the context they operate in, which contains valuable and often necessary information. In this paper, we show how Semantic Web technologies can provide a context that algorithms can interact with. We propose a generic problem-solving platform that uses Web services and various knowledge sources to find solutions to complex requests. The platform employs a reasoner-based composition algorithm, generating an execution plan that combines several algorithms as services. It then supervises the execution of this plan, intervening in case of errors or unexpected behavior. We illustrate our approach by a use case in which we annotate the names of people depicted in a photograph. 相似文献
8.
E. Lefever Author Vitae T. Fayruzov Author Vitae M. De Cock Author Vitae 《Information Sciences》2010,180(17):3192-4672
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manual setting of a similarity threshold, or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not. 相似文献
9.
One of the key challenges to realize automated processing of the information on the Web, which is the central goal of the Semantic Web, is related to the entity matching problem. There are a number of tools that reliably recognize named entities, such as persons, companies, geographic locations, in Web documents. The names of these extracted entities are, however, non-unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem concerns of identifying the entities, which are referring to the same real-world entity. This problem is very similar to the entity resolution problem studied in relational databases, however, there are also several differences. Most importantly Web pages often only contain partial or incomplete information about the entities.Similarity functions try to capture the degree of belief about the equivalence of two entities, thus they play a crucial role in entity matching. The accuracy of the similarity functions highly depends on the applied assessment techniques, but also on some specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method relies on the combination of multiple evidences, with the help of estimated quality of the individual similarity values and with particular attention to missing information that is common in Web context. We study the effectiveness of our method in two specific instances of the general entity matching problem, namely the person name disambiguation and the Twitter message classification problem. In both cases, using our techniques in a very simple algorithmic framework we obtained better results than the state-of-the-art methods. 相似文献
10.
11.
12.
Yuhua Li Aiming Wen Quan Lin Ruixuan Li Zhengding Lu 《Artificial Intelligence Review》2014,41(4):563-578
Name disambiguation is a very critical problem in scientific cooperation network. Ambiguous author names may occur due to the existence of multiple authors with the same name. Despite much research work has been conducted, the problem is still not resolved and becomes even more serious. In this paper, we focus ourselves on such problem. A method of exploiting user feedback for name disambiguation in scientific cooperation network is proposed, which can make use of user feedback to enhance the performance. Furthermore, to make the user feedback more effective, we divide user feedback into three types and assign different weights to them. To evaluate the effectiveness of our proposed method, experiments are conducted with standard public collections. We compare the performance of our proposal with baseline methods. Results show that the proposed algorithm outperforms the previous methods without introducing user interactions. Besides, we investigate into how different types of user feedback can affect the disambiguation results. 相似文献
13.
Hsin-Tsung Peng Cheng-Yu Lu William Hsu Jan-Ming Ho 《Expert systems with applications》2012,39(12):10521-10532
Members of the academic community have increasingly turned to digital libraries to search for the latest work of their peers. On account of their role in the academic community, it is very important that these digital libraries collect citations in a consistent, accurate, and up-to-date manner, yet they do not correctly compile citations for myriads of authors for various reasons including authors with the same name, a problem known as the “name ambiguity problem.” This problem occurs when multiple authors share the same name and particularly when names are simplified as in cases where names merely contain the first initial and the last name. This paper proposes a reliable and accurate pair-wise similarities approach to disambiguate names using supervised classification on Web correlations and authorship correlations. This approach makes use of Web correlations among citations assuming citations that co-refer on publication lists on the Web should to refer to the same author. This approach also makes use of authorship correlations assuming citations with the same rare author name refer to the same author, and furthermore, citations with the same full names of authors or e-mail addresses likely refer to the same author. These two types of correlations are measured in our approach using pair-wise similarity metrics. In addition, a binary classifier, as part of supervised classification, is applied to label matching pairs of citations using pair-wise similarity metrics, and these labels are then used to group citations into different clusters such that each cluster represents an individual author. Results show our approach greatly improves upon the name disambiguation accuracy and performance of other proposed approaches, especially in some name clusters with high degree of ambiguity. 相似文献
14.
15.
Xueping Su Jinye Peng Xiaoyi Feng Jun Wu Jianping Fan Li Cui 《Multimedia Tools and Applications》2014,73(3):1643-1661
For automatically mining the underlying relationships between different famous persons in daily news, for example, building a news person based network with the faces as icons to facilitate face-based person finding, we need a tool to automatically label faces in new images with their real names. This paper studies the problem of linking names with faces from large-scale news images with captions. In our previous work, we proposed a method called Person-based Subset Clustering which is mainly based on face clustering for all face images derived from the same name. The location where a name appears in a caption, as well as the visual structural information within a news image provided informative cues such as who are really in the associated image. By combining the domain knowledge from the captions and the corresponding image we propose a novel cross-modality approach to further improve the performance of linking names with faces. The experiments are performed on the data sets including approximately half a million news images from Yahoo! news, and the results show that the proposed method achieves significant improvement over the clustering-only methods. 相似文献
16.
17.
人名识别常被作为命名实体识别任务的一部分,与其他类型的实体同时进行识别。当前使用NER方法的人名识别依赖于训练语料对特定类型人名的覆盖,在遇到新类型人名时性能显著下降。针对上述问题,该文提出了一种基于数据增强(data augmentation)的方法,使用新类型人名实体替换的策略来生成伪训练数据,该方法能够有效提升系统对新类型人名的识别性能。为了选择有代表性的特定类型人名实体,该文提出了贪心的代表性子类型人名选择算法。在使用1998年《人民日报》数据自动生成的伪测试数据和人工标注的新闻数据的测试结果中,多个模型上人名识别的F1值分别提升了至少12个百分点和6个百分点。 相似文献
18.
跨文本人名消歧是判断出现在不同文本的相同人名是否指称现实中相同实体的过程。跨文本人名消歧是准确获取感兴趣人物相关信息的基础,对多文本摘要、信息融合等具体应用也有重要的作用。该文运用社会网络分析法消歧中文不同文本同名歧义问题,思想是先使用谱聚类对社会网络中的人名聚类,然后根据不同社会网络边权值和不同图划分准则对人名消歧效果的影响,引入了模块度阈值作为社会网络划分的停止条件。在CLP 2010的中文人名消歧数据上进行测试,显示了社会网络分析对人名消歧的有效性。 相似文献
19.
In this paper, we propose to use Harman, Croft and Okapi measures with Lesk algorithm to develop a system for Arabic word sense disambiguation, that combines unsupervised and knowledge based methods. This system must solve the lexical semantic ambiguity in Arabic language. The information retrieval measures are used to estimate the most relevant sense of the ambiguous word, by returning a semantic coherence score corresponding to the context that is semantically closest to the original sentence containing the ambiguous word. The Lesk algorithm is used to assign and select the adequate sense from those proposed by the information retrieval measures mentioned above. This selection is based on a comparison between the glosses of the word to be disambiguated, and its different contexts of use extracted from a corpus. Our experimental study proves that using of Lesk algorithm with Harman, Croft, and Okapi measures allows us to obtain an accuracy rate of 73%. 相似文献
20.
词义消歧要解决如何让计算机理解多义词在上下文中的具体含义,对信息检索、机器翻译、文本分类和自动文摘等自然语言处理问题有着十分重要的作用。通过引入句法信息,提出了一种新的词义消歧方法。构造歧义词汇上下文的句法树,提取句法信息、词性信息和词形信息作为消歧特征。利用贝叶斯模型来建立词义消歧分类器,并将其应用到测试数据集上。实验结果表明:消歧的准确率有所提升,达到了65%。 相似文献