Similar literature
1.
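With e-commerce booming, advertisers are eager to run e-commerce ads, yet most clearly lack the expertise required by SEM (search engine marketing optimization), the core business of e-commerce advertising. They therefore want third-party tools that provide a one-stop service for search-engine ad placement. This paper presents a complete solution for managed bid-keyword services. Taking 淘宝直通车 (Taobao Zhitongche), a new search-bidding model, as the object of study, it analyzes semantic extraction, keyword expansion, bid-keyword generation, model-based bidding, and a positive-feedback model for monitoring ad performance, and offers Zhitongche advertisers an end-to-end solution for an optimal placement strategy. The first stage mines product information to build a keyword recommendation engine. The second stage implements a placement optimization module that applies a pricing strategy and builds a model relating click volume to PPC (average cost per click), so that, under a budget constraint, investment decisions across different bid combinations maximize ROI (return on investment). The practical results are intended to improve the experience of Zhitongche bid-search users.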

2.
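Because English short texts are concise and vary widely in format, this paper proposes a keyword extraction algorithm based on multi-threaded, multi-factor weighting. The algorithm uses term frequency-inverse document frequency (TF-IDF) to compute a term-frequency factor for each word in the text collection, together with a position factor, a word-length factor, and a co-occurrence factor that capture where a word appears, how long it is, and which words it co-occurs with. The weights of the four factors are computed concurrently by multiple threads based on the Future pattern. The four factor weights are then accumulated for each word, and the words are ranked to extract keywords. Experiments show that the algorithm effectively improves the precision and recall of keyword extraction from short texts.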

3.
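Code search is a frequent task in software development and maintenance. Keyword-matching code search faces the same problem as traditional information retrieval: the user's query keywords do not match the vocabulary used in the code. To improve the accuracy of code search, semantically related words in software must be mined for query expansion. This paper designs a Word Embedding based method for mining semantically related words in the software engineering domain and trains it on documents from the IT question-and-answer site Stack Overflow, yielding a table of semantically related words that covers 19332 words. Comparative experiments against prior work confirm that the related words mined by this method effectively improve code search accuracy.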

4.
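TextRank, a classic algorithm for extracting and automatically generating text keywords, treats a text as a set of words and mines latent semantic relations between words by iteratively computing node weights on a word graph. Building on the TextRank graph model, this paper combines a Markov state-transition model with the node graph and proposes TextRank_Revised, a new algorithm in which the edge weights between nodes are conditional probabilities. Validation on labeled and unlabeled sets shows that, without increasing time complexity, the word ranking the new algorithm computes from a single text agrees more closely with manually extracted keywords than that of the original TextRank.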
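For orientation, the baseline that abstract 4 builds on can be sketched as a weighted TextRank over a co-occurrence graph. This is the standard algorithm, not TextRank_Revised: the abstract does not give the conditional-probability edge weights, so this sketch simply normalizes co-occurrence counts per node. The function name and the default window, damping, and iteration values are assumptions.

```python
from collections import defaultdict


def textrank(words, window=2, damping=0.85, iterations=50):
    """Score words by iterating a weighted PageRank on their co-occurrence graph."""
    # Undirected edge weights: how often two distinct words fall within the window.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(len(words), i + window + 1)):
            if words[j] != w:
                weights[w][words[j]] += 1.0
                weights[words[j]][w] += 1.0

    nodes = sorted(set(words))
    out_total = {n: sum(weights[n].values()) for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to the
            # normalized weight of the edge m -> n.
            incoming = sum(
                score[m] * weights[m][n] / out_total[m]
                for m in weights[n]
                if out_total[m] > 0
            )
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score
```

Dividing each edge weight by the node's total outgoing weight turns the graph into a row-stochastic transition matrix, which is the point where a Markov-chain reading of the edge weights would plug in.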

5.
杨朝举  葛维益  王羽  徐建 《计算机应用研究》2021,38(4):1022-1026,1032
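Keyword extraction plays an important role in many text mining tasks, and its quality directly affects the quality of those tasks. Taking text as the object of study, this paper proposes KEK (keyword extraction based on k-truss), a keyword extraction method based on k-truss graph decomposition. Drawing on vector space model theory, the method first builds a text graph whose nodes are the words of the text and whose edges come from co-occurrence between words. It then applies k-truss graph decomposition to obtain semantic features of the text, combines them with term frequency, word position, and complex-network features to construct a parameter-free scoring function, and finally extracts keywords according to the scores. Experiments on benchmark datasets show that KEK achieves a higher F1 score on short-text keyword extraction than other text-graph-based keyword extraction methods.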

6.
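In ad recommendation systems, the relevance between a page and an ad is an important factor in whether users click the ad. Relevance is usually computed from the click-through rate, but differences in where an ad is displayed distort the computed page-ad relevance, so that ads of low relevance are mistaken for highly relevant ones and recommended in error. To address this, a position-unbiased collaborative ad recommendation algorithm is proposed. It improves the position model using Bayes' theorem, removes the effect of position from historical data, and computes page-ad relevance. Collaborative filtering is then used to find neighbor pages similar to a given page, enabling accurate ad recommendation. Experiments on Tencent Soso ad log data show that, compared with traditional collaborative filtering, the algorithm improves recommendation precision, recall, and F-measure by more than 40% each, giving good recommendation results.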
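The position effect described in abstract 6 can be made concrete with a small sketch. It assumes the common examination hypothesis, under which the chance of a click is the chance that the slot is looked at times the ad's relevance, and it divides clicks by the expected number of examinations. This is a generic correction, not the paper's Bayesian position model; `exam_prob` and its values are hypothetical.

```python
def raw_ctr(impressions):
    """Plain click-through rate: clicks divided by impressions."""
    clicks = sum(1 for _, clicked in impressions if clicked)
    return clicks / len(impressions) if impressions else 0.0


def debiased_relevance(impressions, exam_prob):
    """Clicks divided by expected examinations for one page-ad pair.

    impressions: list of (position, clicked) tuples.
    exam_prob:   assumed probability that a user looks at each position.
    """
    expected_views = sum(exam_prob[pos] for pos, _ in impressions)
    clicks = sum(1 for _, clicked in impressions if clicked)
    return clicks / expected_views if expected_views else 0.0


# Hypothetical data: both ads get 2 clicks in 10 impressions,
# but ad_b was always shown in the less visible slot 2.
exam_prob = {1: 1.0, 2: 0.5}
ad_a = [(1, True)] * 2 + [(1, False)] * 8
ad_b = [(2, True)] * 2 + [(2, False)] * 8
```

With these numbers the two ads have the same raw click-through rate, while the corrected estimate ranks `ad_b` higher because it earned its clicks from a slot that is examined half as often.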

7.
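Text classification automatically assigns free text to one of several predefined categories and plays an important role in areas such as information retrieval. Selecting effective text features is a key step that affects classifier performance. In many applications the text to be processed contains many named entities, such as well-known figures in an industry, which can strongly influence the category a text belongs to. Existing text feature methods, however, use only the statistical properties of keywords and ignore the classification features that keywords carry as named entities. To address this, this paper proposes integrating named entity recognition (NER) into feature selection for text classification, retaining the classification features of words as named entities in addition to the statistical features of keywords. Experiments show that, relative to other feature selection methods, the proposed method improves classification accuracy to some extent.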

8.
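With the rapid growth of the Internet, contextual advertising in instant messaging (IM) software has become one of the main online revenue models and an important form of online marketing. Delivering such contextual ads precisely requires correctly extracting keywords from the chat. Chat text differs from ordinary articles: it is short, and the traditional TFIDF algorithm has shortcomings on such text. To address these shortcomings, this paper uses the EFCM clustering algorithm to improve TFIDF's handling of this kind of text.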

9.
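Automatic text summarization has broad application prospects in areas such as web search and web content recommendation. Classic summarization algorithms use statistical methods to extract an article's keywords and then its topic sentences, which to some extent ignores the semantic and syntactic information of the text. In recent years, distributed word embedding has been applied to text retrieval. Based on this technique, a word-vector-based automatic summarization method is proposed. It has four main steps: word vector generation, paragraph vector generation from word vectors, keyword extraction, and topic sentence extraction, which together produce an automatic summary of text paragraphs. Experiments show that the improved method extracts topic sentences effectively.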

10.
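To address the large volume of ads on microblog platforms such as Sina and Tencent, a microblog ad filtering model is proposed. Preprocessing converts the collected raw microblog data into clean data that is easy for a computer to process. In the preprocessing stage, the stop-word list is improved according to the characteristics of microblog text in order to raise precision. A classifier based on a support vector machine is then built and trained on the data, and through continued learning and feedback it achieves good classification results. Experiments show that the model's accuracy in ad filtering exceeds 90%, outperforming keyword-based methods.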

11.
With the flood and popularity of various multimedia contents on the Internet, searching for appropriate contents and representing them effectively have become essential to user satisfaction. So far, many content recommendation systems have been proposed for this purpose. A popular approach is to select hot or popular contents for recommendation using some popularity metric. Recently, various social network services (SNSs) such as Facebook and Twitter have become a widespread social phenomenon owing to the smartphone boom. Considering their popularity and user participation, SNSs can be a good source for finding social interests or trends. In this study, we propose a platform called TrendsSummary for retrieving trendy multimedia contents and summarizing them. To identify trendy multimedia contents, we select candidate keywords from raw data collected from Twitter using a syntactic feature-based filtering method. Then, we merge various keyword variants based on several heuristics. Next, we select trend keywords and their related keywords from the merged candidate keywords based on term frequency and expand them semantically by referencing portal sites such as Wikipedia and Google. Based on the expanded trend keywords, we collect four types of relevant multimedia contents (TV programs, videos, news articles, and images) from various websites. The most appropriate media type for the trend keywords is determined based on a naïve Bayes classifier. After classification, appropriate contents are selected from among the contents of the selected media type. Finally, both trend keywords and their related multimedia contents are displayed for effective browsing. We implemented a prototype system and experimentally demonstrated that our scheme provides satisfactory results.

12.
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit for search and retrieval is an individual page, even though information on a topic is often spread across multiple pages. This degrades the quality of search results, especially for long or uncorrelated (multitopic) queries (in which individual keywords rarely occur together in the same document), where a single page is unlikely to satisfy the user's information need. We propose a technique that, given a keyword query, on the fly generates new pages, called composed pages, which contain all query keywords. The composed pages are generated by extracting and stitching together relevant pieces from hyperlinked Web pages and retaining links to the original Web pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms to efficiently generate the top composed pages. The quality of our method is compared to current approaches by using user surveys. Finally, we also show how our techniques can be used to perform query-specific summarization of Web pages.

13.
Keyword queries have long been popular with search engines and the information retrieval community, and their use has recently gained momentum in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords scattered across these pages. A keyword query with the extended semantics (i.e., more than a list of keywords hyperlinked with each other) is referred to as a graph query. In a graph query, all the query keywords may not be present on a single Web page, so a set of Web pages with the corresponding hyperlinks needs to be presented as the search result. Existing search systems show serious performance problems because they fail to integrate information from multiple connected resources, so an efficient algorithm for keyword queries over graph-structured data is proposed. It integrates information from multiple connected nodes of the graph and generates result trees that contain all the query keywords. We also investigate a ranking measure called graph ranking score (GRS) to evaluate the relevant graph results so that the score can generate a scalar value for keywords as well as for the topology.

14.
李勇  相中启 《计算机应用》2019,39(1):245-250
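To address the problems that existing ciphertext retrieval schemes in cloud computing do not support semantic expansion of search keywords, are not precise enough, and do not support ranking of results, a rankable ciphertext retrieval scheme that supports semantic expansion of search keywords is proposed. First, the term frequency-inverse document frequency (TF-IDF) method is used to compute a relevance score between each keyword and its document; different position weights are assigned to keywords in different fields of the document, and a field-weighted scoring method computes a position-weight score. The product of the relevance score and the position-weight score is used as the value at the keyword's position in the document index vector. Second, the search keywords entered by an authorized user are semantically expanded using the WordNet semantic network to obtain an expanded keyword set; an edit-distance formula computes the similarity between keywords in that set, and the similarity value is used as the value at the keyword's position in the document query vector. Finally, encryption produces a secure index and a document query trapdoor, an inner product is computed under the vector space model (VSM), and the result of the inner product serves as the basis for ranking the retrieved ciphertext documents. Theoretical analysis and simulation show that the scheme is secure under the known-ciphertext model and the known-background-knowledge model and can rank search results. Compared with the multi-keyword ranked ciphertext search (MRSE) scheme, the proposed scheme supports semantic expansion of keywords and achieves higher query accuracy, while its retrieval time differs little from that of MRSE.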
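The scoring pipeline in abstract 14 can be sketched in plaintext, leaving out the encryption step entirely. The sketch below is an illustration under stated assumptions, not the authors' scheme: the similarity formula (one minus normalized edit distance), the hand-supplied expansion list standing in for WordNet, and all function names are hypothetical. Index vector entries stand for the product of a TF-IDF score and a field weight.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed similarity: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def query_vector(keyword, expansions, vocabulary):
    """Query vector over the vocabulary; expanded terms get their similarity."""
    terms = {keyword: 1.0}
    for term in expansions:
        terms[term] = similarity(keyword, term)
    return [terms.get(v, 0.0) for v in vocabulary]


def rank(index, query):
    """Order documents by the inner product of index vector and query vector."""
    scores = {
        doc: sum(x * q for x, q in zip(vec, query))
        for doc, vec in index.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))
```

In the scheme the abstract describes, both vectors would be encrypted before the inner product is taken; the sketch only shows what the ranking computes once that layer is set aside.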

15.
According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have a higher probability of attracting users’ attention based on their previous search history. New objects such as a user, an advertisement, or a query cause precision in targeted advertising to deteriorate because they lack history. This article examines this challenge. In the case of a new object, we first extract observed objects similar to the new object and then use their history as the history of the new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users’ queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that the correlation between users and advertisements’ keywords is high. Thus, users who pay attention to advertisements’ keywords click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.

16.
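To counter the forged HTTPS sites and other obfuscation techniques commonly used by phishing attackers, and drawing on RMLR and PhishDef, current mainstream phishing detection methods based on machine learning and rule matching, this paper adds feature extraction from page text keywords and page sub-links and proposes the Nmap-RF classification method. Nmap-RF is an ensemble phishing site detection method based on rule matching and random forests. Sites are pre-filtered according to the page protocol; if a site is judged to be phishing, the subsequent feature extraction steps are skipped. Otherwise, text keyword confidence, sub-link confidence, similarity to phishing vocabulary, and page PageRank are used as key features, with common URL, Whois, DNS, and page tag information as auxiliary features, and a random forest classification model produces the final classification. Experiments show that the Nmap-RF ensemble method detects phishing pages in 9 to 10 μs on average, filters out 98.4% of illegitimate pages, and reaches an average overall accuracy of 99.6%.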

17.
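A mismatch between web ads and the content of the current page reduces ad effectiveness. This paper uses two methods, site-based Bayesian model expansion and Wikipedia-based semantic expansion, to extract page tag information precisely, and matches web ads against these more precise tags to improve ad effectiveness. A web page tag recommendation system based on semantic expansion is implemented, and experiments confirm that it works well.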

18.
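Based on the characteristics of Chinese news web pages, several kinds of features, including statistical, positional, and part-of-speech features, are used together to assess the weight of candidate keywords. Because some word segmentation results do not reflect the topic well, a compound-word generation method based on a directed graph is proposed, which aims to find frequently occurring adjacent words and treat them as compound words. Experiments show that the method is considerably more efficient than the traditional TF-IDF method and extracts keywords from news web pages effectively.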
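The compound-word idea in abstract 18 can be sketched as counting directed edges between adjacent tokens and keeping the frequent ones. This is a simplified reading, not the paper's algorithm: the threshold `min_count`, the `joiner` parameter, and the function name are assumptions, and the input is taken to be an already segmented token list.

```python
from collections import Counter


def combined_words(tokens, min_count=2, joiner=""):
    """Join adjacent token pairs that occur at least min_count times.

    Each (left, right) pair is a directed edge whose weight is the
    number of times `right` immediately follows `left`.
    """
    edges = Counter(zip(tokens, tokens[1:]))
    return [
        joiner.join(pair)
        for pair, count in edges.most_common()
        if count >= min_count
    ]


# Example with English tokens; for segmented Chinese text the default
# empty joiner would concatenate the two words directly.
tokens = ["machine", "learning", "method", "machine", "learning", "model"]
compounds = combined_words(tokens, joiner=" ")
```

Direction matters here: "machine learning" and "learning machine" are counted as different edges, which is why a directed graph is used rather than an undirected co-occurrence graph.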

19.
McDonald  D.W. 《Computer》2003,36(10):111-112
In many popular visions of ubiquitous computing, the environment proactively responds to individuals who inhabit the space. For example, a display magically presents a personalized advertisement, the most relevant video feed, or the desired page from a secret government document. Such capability requires more than an abundance of networked displays, devices, and sensors; it relies implicitly on recommendation systems that either directly serve the end user or provide critical services to some other application. As recommendation systems evolve to exploit new advances in ubiquitous computing technology, researchers and practitioners from technical and social science disciplines must collaborate to address the challenges to their effective implementation. Although it may be impossible to perfectly anticipate each individual's needs at any place or time, ubiquitous computing will enable such systems to help people cope with an expanding array of choices.

20.
吴代文  詹海生 《微机发展》2011,(10):121-124
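A first-pass full-text search of PDF documents is implemented with the Lucene API. To locate search keywords more precisely, a new secondary index algorithm is designed and implemented; the secondary index records each keyword's page number, coordinates, and surrounding context. Using it, a search result can be located to a specific page of a PDF document and the exact position of the keyword marked on that page, so that second-pass retrieval of PDF documents achieves an effect similar to book search in Google Books. System tests show good retrieval performance, with high recall and precision, meeting users' need for fast retrieval. The system has been in use for two years as the full-text retrieval platform for the Xi'an digital local chronicles and has achieved good results.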
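The secondary index in abstract 20 stores, for each keyword, its page number, coordinates, and context. A minimal data-structure sketch follows. It does not use or imitate the Lucene API, and the `Hit` record, the input layout, and the context window size are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One keyword occurrence: where it is and what surrounds it."""
    page: int
    x: float
    y: float
    context: str


def build_secondary_index(pages, context_words=2):
    """Map each lowercased word to its occurrences.

    pages: {page_number: [(word, x, y), ...]} with words in reading order
           (a hypothetical layout for text already extracted from a PDF).
    """
    index = defaultdict(list)
    for page_no, words in pages.items():
        text = [w for w, _, _ in words]
        for i, (word, x, y) in enumerate(words):
            start = max(0, i - context_words)
            context = " ".join(text[start:i + context_words + 1])
            index[word.lower()].append(Hit(page_no, x, y, context))
    return index


pages = {
    1: [("Annals", 10.0, 20.0), ("of", 60.0, 20.0), ("Xi'an", 80.0, 20.0)],
    2: [("Xi'an", 15.0, 40.0), ("history", 70.0, 40.0)],
}
index = build_secondary_index(pages)
```

A viewer could take the hits for a query term, jump to `hit.page`, and draw a highlight at `(hit.x, hit.y)`, which is the kind of in-page positioning the abstract describes.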
