首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
针对目前数据库知识发现模型系统中传统文本信息抽取算法无法满足用户业务需求的问题,提出了一种基于用户需求描述的文本信息特征抽取模型。通过用户的业务需求模型进行特征化描述,将数据库中存储的原始本文信息进行预处理加工,计算的词频、权重,初步选取文本特征,根据用户需求描述计算特征相似度,过滤不相关的"噪声"信息,进而保留能够精确描述文本信息的特征。  相似文献   

2.
《现代电子技术》2018,(4):143-146
传统智能问答系统能够进行简单的问题答复,但对于问句信息比较相似的问题不能准确判别。基于此设计基于包装产业大数据知识图谱的智能问答系统。通过知识图谱问句映射进行问句信息框架设计,建立大数据知识图谱数据库,为问句信息的判别提供稳定的数据环境;将问句信息转化为e FAQ的判别语句,进行问句信息的理解;利用相似度计算判别相似问句信息的相似度,实现近似问句信息的判别问答。实验数据表明,设计的智能问答系统能够对相似问句信息进行精准的判别,并实现智能问答。  相似文献   

3.
针对个性化网络广告中网页与广告匹配的问题,通过将基于关键词扩展的语义分析技术引入到协同过滤系统中,提出一种基于协同过滤与语义分析结合的个性化网络广告投放方法(CFKE)。该方法首先提取网页与广告文本的关键词,并对关键词扩展同义词;然后,计算网页扩展词与广告扩展词的相似度,并与扩展词的权重进行拟合抽取,得到网页与广告最终的相似度,将三维模型降维成二维模型;最后,再利用协同过滤方法进行匹配。仿真表明,与其他算法相比,该算法不仅具有较高的准确度,同时具有较好的系统响应能力。  相似文献   

4.
通过分析文本挖掘中的2个关键步骤——文本特征空间构造和相似距离度量,指出流行的文本挖掘过程中存在着大量同义和关联噪声。大量存在的同义词和关联词,造成文本特征空间无法准确表达文本语义以及高维计算复杂性问题。利用潜在语义分析和关联规则挖掘构造同义和关联词集,用于减少文本特征空间中的同义词和关联词,降低信息冗余,改进挖掘效率。文中对相应的算法进行了描述,实验结果令人满意。  相似文献   

5.
黄名选 《电子学报》2021,49(7):1305-1313
针对自然语言处理中查询主题漂移和词不匹配问题,提出基于CSC(Copulas-based Support and Confidence)框架的关联模式挖掘与规则扩展算法,并将基于统计学分析的关联模式与具有上下文语义信息的词向量融合,提出关联模式挖掘与词向量学习融合的伪相关反馈查询扩展模型.该模型对伪相关反馈文档集挖掘规则扩展词,对初检文档集进行词嵌入学习训练得到词向量,计算规则扩展词与原查询的向量相似度,提取向量相似度不低于阈值的规则扩展词作为最终扩展词.实验结果表明,所提扩展模型能有效地减少查询主题漂移和词不匹配问题,提高检索性能,与现有基于关联模式的和基于词向量的查询扩展方法比较,MAP(Mean Average Precision)平均增幅最大可达17.52%,对短查询更有效.所提挖掘方法可用于其他文本挖掘任务和推荐系统,以提高其性能.  相似文献   

6.
针对信息增益模型在文本分类中的不足之处,提出了一种基于灰关系与信息增益的文本分类算法.首先基于改进的χ2统计进行类别特征选择用于类内文本表示,提高类别中心向量的表示能力;其次针对IG模型对低频词赋权过大问题,提出了基于频数和位置的改进加权方法;最后提出了基于灰关系的文本相似度计算途径,改善了基于距离的相似度计算模式的不足.试验表明,此算法提高了文本分类效率.  相似文献   

7.
在分布式检索中,基于主题的语言模型集合选择方法首先引入Relevance Model计算用户查询和信息集合中文档的相似度,在此基础上通过文本聚类得到集合中文档的主题信息,加入语言模型计算得到各个信息集合的查询相关度排名,以此完成集合选择.实验表明,与ODRI、CRCS和基于传统语言模型的集合选择算法相比,该方法的检索效果得到了显著提高.  相似文献   

8.
针对当前人机交互系统机器人存在背景知识缺乏、回复连贯性不高的问题,该文提出一种基于知识图谱波纹网络的人机交互模型.为实现更加自然且智能化的人机交互系统,该文模型模拟了真实的人与人交流过程.首先,通过计算人机交互情感评估值和人机交互情感确信度得到人机交互情感友好度.然后,引入外部知识图谱作为机器人的背景知识,将对话实体嵌...  相似文献   

9.
通过分析中文报道的特点,提出了一种改进相似度计算的话题检测算法。该算法以Single-Pass聚类策略为基础,结合新闻报道中的地点信息,分别对新闻报道进行文本内容相似度和地点相似度计算,并将两者结合进行话题检测。实验结果表明,算法性能优于传统的文本相似度算法。  相似文献   

10.
针对基于语义的短文本相似度计算方法在短文本分类中准确率较低这一问题,提出了结合词性的短文本相似度算法( GCSSA)。该方法在基于hownet(“知网”)语义的短文本相似度计算方法的基础上,结合类别特征词并添加关键词词性分析,对类别特征词和其他关键词的词性信息给定不同关键词以不同的权值系数,以此区别各种贡献度词项在短文本相似度计算中的重要程度。实验表明,该算法进行文本相似度计算后应用于短文本分类中较基于hownet的短文本分类算法在准确率宏平均和微平均上提升4%左右,有效提高了短文本分类的准确性。  相似文献   

11.
介绍了中文文本分类系统的原理,在特征提取上采用了文档频率法(DF)与潜在语义分析法(K认)相结合的方法,先采用DF法过滤掉DF值低的词条,降低文本矩阵的稀疏性,然后使用LSA法进行词语间的语义分析,消除同义词和多义词的影响,提高文本分类的速度与精确度。实验结果表明使用此种降维方法取得了良好的效果。  相似文献   

12.
This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query con-text based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm  相似文献   

13.
周国娟 《通信技术》2010,43(11):74-77
为了研究并提高文本的聚类算法的性能,根据蚁群算法在TSP问题中的应用方法,将其改进引用到文本的聚类处理的研究中。在文本的聚类处理研究中,改变蚂蚁的信息素释放机制,道路节点的聚合方式,从而最终将相似文本进行聚合。对改进的算法进行实验后的结果证明,这种新的算法可以使文本聚类的准确度提高,具有良好的聚类效果,能有效提高查询的文本召回率。蚁群算法在文本聚类中的应用是可行的。  相似文献   

14.
汪少敏  杨迪  任华 《电信科学》2018,34(12):117-124
大数据时代,文本分类是文本数据挖掘和文本价值探索领域的重要工作。传统的文本分类系统存在特征提取能力弱、分类准确率不高的问题。相对于传统的文本分类技术,深度学习技术具有准确率高、特征提取有效等诸多优势,有必要将深度学习技术引入文本分类系统,以解决传统文本分类系统存在的问题。在分析传统文本分类系统的基础上,提出了基于深度学习的文本分类系统的体系架构和关键技术,同时对传统分类模型、TextCNN、CNN+LSTM多种分类模型进行了验证比对。  相似文献   

15.
We present a two-pass image retrieval system in which retrieval techniques for text and image documents are combined in a novel approach. In the first pass, the text-based initial query is matched against the text captions of the images in the database to obtain the initial retrieved set. In the second pass, text and image features obtained from this initial retrieved set are used to expand the initial query. Additional images from the database are then retrieved based on the expanded query. The image features that we have used are color histograms, DC coefficients from the discrete cosine transform, and two texture features: multiresolution simultaneous autoregressive model and local binary pattern. These are low-level statistical image features that can be easily computed. Extensive experiments have been performed on 1019 color pictures of mixed variety with captions, relevance judgments and queries supplied by a national archives agency. Objective precision-recall results have been obtained with various combinations of text and image features. The results show that the image features do not perform well when used on their own. However, when image features are used in query expansion, they increase the average precision more significantly than text annotations. Moreover, these findings are valid at all precision levels and are not sensitive to the image feature acquisition parameters.  相似文献   

16.
BilVideo: a video database management system   总被引:1,自引:0,他引:1  
The BilVideo video database management system provides integrated support for spatiotemporal and semantic queries for video. A knowledge base, consisting of a fact base and a comprehensive rule set implemented in Prolog, handles spatio-temporal queries. These queries contain any combination of conditions related to direction, topology, 3D relationships, object appearance, trajectory projection, and similarity-based object trajectories. The rules in the knowledge base significantly reduce the number of facts representing the spatio-temporal relations that the system needs to store. A feature database stored in an object-relational database management system handles semantic queries. To respond to user queries containing both spatio-temporal and semantic conditions, a query processor interacts with the knowledge base and object-relational database and integrates the results returned from these two system components. Because of space limitations, we only discuss the Web-based visual query interface and its fact-extractor and video-annotator tools. These tools populate the system's fact base and feature database to support both query types.  相似文献   

17.
A prototype, content-based image retrieval system has been built employing a client/server architecture to access supercomputing power from the physician's desktop. The system retrieves images and their associated annotations from a networked microscopic pathology image database based on content similarity to user supplied query images. Similarity is evaluated based on four image feature types: color histogram, image texture, Fourier coefficients, and wavelet coefficients, using the vector dot product as a distance metric. Current retrieval accuracy varies across pathological categories depending on the number of available training samples and the effectiveness of the feature set. The distance measure of the search algorithm was validated by agglomerative cluster analysis in light of the medical domain knowledge. Results show a correlation between pathological significance and the image document distance value generated by the computer algorithm. This correlation agrees with observed visual similarity. This validation method has an advantage over traditional statistical evaluation methods when sample size is small and where domain knowledge is important. A multi-dimensional scaling analysis shows a low dimensionality nature of the embedded space for the current test set.  相似文献   

18.
刑侦现勘图像数据库是具有保密性高、图像内容罕见等极具行业特色的图像数据库.针对现勘图像内容复杂、目标物体不明确的特点,提出了DCT-DCT波纹理特征,并与HSV颜色直方图特征、GIST特征相融合构成融合特征.与常用的图像特征相比,DCT-DCT波纹理特征能够得到较高的检索效率,而融合特征的平均检索查准率高于构成其本身的三种特征的平均检索查准率.最后,将语义分析技术引入到检索过程中,提出基于检索结果优化的现勘图像检索算法,利用支持向量机(Support Vector Machine,SVM)分类器对查询图像进行语义提取,并对初次检索的结果进行语义分析,根据初检结果中语义类别的占比选择二次检索方案,该算法能在按例查询的基础上进一步提高平均检索查准率.  相似文献   

19.
For the complex questions of Chinese question answering system, we propose an answer extraction method with discourse structure feature combination. This method uses the relevance of questions and answers to learn to rank the answers. Firstly, the method analyses questions to generate the query string, and then submits the query string to search engines to retrieve relevant documents. Secondly, the method makes retrieved documents segmentation and identifies the most relevant candidate answers, in addition, it uses the rhetorical relations of rhetorical structure theory to analyze the relationship to determine the inherent relationship between paragraphs or sentences and generate the answer candidate paragraphs or sentences. Thirdly, we construct the answer ranking model, and extract five feature groups and adopt Ranking Support Vector Machine (SVM) algorithm to train ranking model. Finally, it reranks the answers with the training model and find the optimal answers. Experiments show that the proposed method combined with discourse structure features can effectively improve the answer extracting accuracy and the quality of non-factoid answers. The Mean Reciprocal Rank (MRR) of the answer extraction reaches 69.53%.  相似文献   

20.
文本分类中改进型互信息特征选择的研究   总被引:5,自引:2,他引:3  
互信息是文本分类中常用的特征选择方法.提出了一种新的基于互信息的特征选择方法.首先分析了特征选择影响文本分类精度的因素,将这些因素组合起来表征特征对于分类的强弱,并用公式直观地表示由组合因素计算出的特征值,根据这些值得大小选择对分类影响大的特征.最后理论证明了其可行性,并通过实验证明了该方法在提高分类精度方面比传统方法提高了10%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号