Similar literature
18 similar documents found.
1.
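Index Object Identifiers and Signature File Management   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a method, based on the object-identifier principle, timestamp ordering, and Chinese-character association, for organizing signature files effectively and automatically building a Chinese-character support environment. It also discusses the design and implementation of a text object indexing system for the text database management system FIMS.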

2.
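Signature file indexing and timestamp ordering are two important topics in database research: the former is commonly used to support indexing and retrieval of text data, and the latter is one of the two basic methods for implementing database concurrency control. For the text database management system FIMS, this paper discusses the formal description of a text object indexing model based on the concept of index timestamps, the computation of retrieval relevance, and the logical design of the signature file system.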

3.
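Full-Text Index Models Based on Adjacency Matrices   Total citations: 5, self-citations: 0, citations by others: 5
周水庚, 胡运发, 关佶红. Journal of Software (《软件学报》), 2002, 13(10): 1933-1942
The rapid growth of textual information and the growing number of users accessing it online have made query efficiency a prominent bottleneck for information retrieval systems. Two new full-text index models are proposed to improve query efficiency. Text strings are represented as directed graphs, which leads to the adjacency matrix of a text string; implementing this adjacency matrix in two different ways yields two new adjacency-matrix-based full-text index models, namely adjacency-matrix-based inverted files and adjacency-matrix-based PAT arrays. Text query algorithms for the new models are given, and their storage and query time costs are analyzed and compared with those of two traditional index models. Tests on real text collections confirm the effectiveness of the new models. The new models achieve a substantial improvement in query efficiency at a small space cost relative to the original text, and are therefore suitable for large-scale text retrieval systems.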

4.
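Text recognition software produces errors such as wrong or missing characters, so its output file must be corrected, which involves three core problems: text display, text correction, and correct data output. To preserve data integrity across input, display, correction, and output, four data structures are designed to store information about text, tables, images, and the erroneous characters to be corrected, so that errors in the original file can be fixed without losing the original information. Using an object-oriented approach, the system is divided into a text view object, a text editing object, and a text document object, which makes it straightforward to implement the transfer of text document data and the display and editing of text content.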

5.
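Ordinary spatial keyword queries often return too many results. To address this, the paper proposes a top-k query and ranking method based on the location-text relevance of spatial objects, which retrieves typical spatial objects that are textually relevant to a given spatial keyword query and close to it in location. The method has an offline processing stage and an online query processing stage. In the offline stage, the location-text closeness of every pair of spatial objects is measured from their location proximity and text similarity. On this basis, a probability-density-based algorithm selects representative spatial objects, and a sequence of spatial objects is built for each representative object according to the location-text relationships between objects. In the online stage, for a given spatial keyword query, cosine similarity is used to compute the relevance between the query condition and the representative spatial objects, and the threshold algorithm (TA) is then run over the precomputed object sequences to quickly select the top-k typical spatial objects that satisfy the query. Experimental results show that the proposed method effectively meets user query needs and achieves high accuracy, typicality, and execution efficiency.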
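
The threshold algorithm (TA) step named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes each object already has one score per criterion (for example text relevance and location proximity), that the aggregate score is a plain sum, and that objects missing from a list score 0 there. The names `threshold_topk` and `score_lists` are hypothetical.

```python
import heapq

def threshold_topk(score_lists, k):
    """Fagin-style threshold algorithm (illustrative sketch).

    score_lists: list of dicts mapping object id -> score (hypothetical input).
    The aggregate score of an object is the sum of its scores over all lists.
    Returns the k best (object id, aggregate score) pairs, best first.
    """
    # Sorted access: every list is scanned in descending score order.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in score_lists]
    max_depth = max((len(lst) for lst in sorted_lists), default=0)
    seen = {}
    for depth in range(max_depth):
        threshold = 0.0
        for lst in sorted_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen objects score 0 here
            obj, score = lst[depth]
            threshold += score
            if obj not in seen:
                # Random access: look the object up in every list.
                seen[obj] = sum(s.get(obj, 0.0) for s in score_lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # No unseen object can beat the threshold, so we may stop early.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early stop is what makes TA attractive for precomputed, sorted sequences: the scan ends as soon as the k-th best aggregate seen so far is at least the sum of the scores at the current scan depth.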

6.
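In traditional crime-information queries, the query condition is text and the result is an ordered list of documents, which cannot show the relationships among results. Based on heterogeneous information networks, this work restructures counterfeit-currency crime data as an information network, builds a counterfeit-currency crime information network, uses name disambiguation to establish relationships among suspects in the network, and applies learning to rank to study node relevance in the network. A counterfeit-currency crime information analysis system is designed and implemented, which addresses the querying of such data by taking entity objects as query items and network graphs as query results.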

7.
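Text recognition is an important application area of deep learning networks, and mainstream algorithms predict natural-scene text from optical information. For text objects in certain specific domains, however, additional key features can further improve recognition accuracy. In security surveillance, timestamp text in the picture has a regular format and a restricted range of values. Based on these characteristics, the paper studies timestamp text recognition networks and proposes a timestamp information constraint mechanism that fuses text semantic constraints with optical features to recognize formatted text, improving the format conformity and numerical plausibility of the output timestamps. The method outperforms classic recognition algorithms based on optical features on metrics such as full-match rate and edit distance.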

8.
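Inverted indexes suffer from high space overhead, low query time efficiency, and difficulty in supporting both conjunctive Boolean queries and ranked queries. To address this, the paper proposes RABIF, an efficient random-access blocked inverted-file self-index that improves both space efficiency and query time efficiency. To reduce space consumption while supporting conjunctive Boolean and ranked queries, RABIF partitions inverted lists into blocks and applies a suitable compression method to each part of every sub-block, achieving fast positioning and random access in the compressed index without inserting any auxiliary information. Theoretical analysis and experimental results show that, compared with the skipped inverted-file self-index SIF, RABIF reduces space overhead by 5.3% on average and Boolean query time by 17.8% on average; for 0.2% and 1% ranked queries, query time is reduced by 34.4% and 27.5% on average, respectively.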

9.
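Existing spatial-textual skyline queries ignore the temporal information of geospatial objects. Given the importance of temporal information to applications, this paper incorporates it into spatial-textual skyline queries and proposes a new query, the Time-aware Spatial-Textual Skyline Query (TSTSQ). The selection of skyline objects in TSTSQ depends on three conditions: textual relevance, spatial proximity, and valid time. Functions are designed to compute the spatial-textual relevance and the temporal-textual relevance of objects; an object index structure for spatio-temporal and textual information, the TKR-Tree, is built; and the TSTSQ query algorithm is implemented with efficient pruning strategies. Analysis and comparison of experimental data verify the effectiveness of TSTSQ.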

10.
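To address polysemy and the "dictionary problem", the paper combines text analysis with user behavior analysis and proposes a topic-based personalized query expansion model. For text analysis, association rules and a graph ranking algorithm are combined to build a TextRank model that removes the dependence on manually built dictionaries and is used to extract topics from multiple texts. For user behavior analysis, a moving time window is used to build a user model that effectively captures the current query topic. During query expansion, user topics are matched with text topics and the corresponding association rules are selected for expansion. Experiments are conducted on topic extraction combining association rules with graph ranking, and the topic-based query expansion model is compared with other query expansion models.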
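
As a rough illustration of the graph-ranking half of this approach, the sketch below runs a PageRank-style iteration over a word co-occurrence graph, which is the general idea behind TextRank. It does not reproduce the paper's combination with association rules; the function name `textrank` and the parameters `window`, `damping`, and `iterations` are assumptions made for this example.

```python
from collections import defaultdict

def textrank(words, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a co-occurrence graph (sketch).

    words: list of tokens in reading order (hypothetical, already tokenized).
    Two distinct words are linked if they occur within `window` positions.
    Returns the words sorted from highest to lowest score.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)
```

The top-ranked words can then serve as candidate topic terms; how they are matched against user topics is left to the model described in the abstract.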

11.
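The paper proposes PQL, a data stream manipulation language that supports real-time queries in data stream management systems. Modeled on SQL_99, PQL introduces concepts such as timestamps, snapshot windows, landmark windows, sliding windows, and continuous queries, and provides full syntactic and semantic support for approximate queries and real-time requirements in continuous queries over data streams. PQL captures the operational characteristics of data streams: it supports selection, projection, and join over streams, supports both physical-time and logical-time timestamps, and can join data streams with relational tables.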
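
The abstract does not show PQL syntax, so the following Python sketch only illustrates the underlying concept of a time-based sliding window with a continuously updated result. The class name `SlidingWindow`, its methods, and the choice of a running average as the continuous query are all assumptions for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class SlidingWindow:
    """Time-based sliding window over a timestamped stream (illustrative)."""

    def __init__(self, width):
        self.width = width    # window length in time units
        self.items = deque()  # (timestamp, value) pairs in arrival order

    def insert(self, timestamp, value):
        """Add a tuple and evict everything outside (timestamp - width, timestamp]."""
        self.items.append((timestamp, value))
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def average(self):
        """Continuous-query result: mean of the values currently in the window."""
        if not self.items:
            return None
        return sum(v for _, v in self.items) / len(self.items)
```

The timestamps may be physical (arrival time) or logical (sequence numbers); the eviction rule is the same in both cases.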

12.
Recent progress in hardware and operating system technologies has made it possible to manage multimedia data consisting of text, static images, sound and/or video. Video data is considered to be the most informative of these types of data. It presents a scene consisting of objects and the motion of objects conveying the particular meaning of the scene. Thus, the inherent feature of video data lies in the motion of objects. In this paper, we present a system that retrieves video data by means of the motion of objects observed in the video data in the database. The system accepts a query for a video database, which is specified by drawing an example trajectory of an object, and retrieves video data by extracting a moving object observed in the video data. The proposed way of specifying a query condition is superior to other ways of representing a condition, e.g. by text, in the sense that it is suitable for representing differences in motion.

13.
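叶靓, 王智斌, 邵谦明. Computer Engineering (《计算机工程》), 2007, 33(17): 228-230
The paper proposes and implements a speech retrieval engine based on relevance feedback. The engine uses the Sphinx speech recognition toolkit to convert speech to text and then indexes the text with Lucene. To improve retrieval quality, the system introduces relevance feedback: local relevance feedback refines the user's query, and a global relevance feedback mechanism mines Sphinx's recognition error patterns to expand the query, which greatly improves the accuracy and real-time responsiveness of the indexing system. Experimental results show that the system meets searchers' needs and has practical value.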

14.
Multimodal retrieval is a well-established approach for image retrieval. Usually, images are accompanied by a text caption along with associated documents describing the image. Textual query expansion as a form of enhancing image retrieval is a relatively less explored area. In this paper, we first study the effect of expanding the textual query on the retrieval of both the image and its associated text. Our study reveals that judicious expansion of the textual query through keyphrase extraction can lead to better results, either in terms of text retrieval or both image and text retrieval. To establish this, we use two well-known keyphrase extraction techniques based on tf-idf and KEA. While query expansion results in increased retrieval efficiency, it is imperative that the expansion be semantically justified. So, we propose a graph-based keyphrase extraction model that captures the relatedness between words in terms of both mutual information and relevance feedback. Most existing works have stressed bridging the semantic gap by using textual and visual features, either in combination or individually. The way these text and image features are combined determines the efficacy of any retrieval. For this purpose, we adopt Fisher-LDA to determine the appropriate weights for each modality. This provides us with an intelligent decision-making process favoring the feature set to be infused into the final query. Our proposed algorithm is shown to significantly outperform the previously mentioned keyphrase extraction algorithms for query expansion. A rigorous set of experiments performed on the ImageCLEF-2011 Wikipedia Retrieval task dataset validates our claim that capturing the semantic relation between words through mutual information, followed by expansion of a textual query using relevance feedback, can simultaneously enhance both text and image retrieval.

15.
Text Database Discovery on the Web: Neural Net Based Approach   Total citations: 1, self-citations: 0, citations by others: 1
As large numbers of text databases have become available on the Web, many efforts have been made to solve the text database discovery problem: finding which text databases (out of many candidates) are most likely to provide relevant documents to a given query. In this paper, we propose a neural net based approach to this problem. First, we present a neural net agent that learns about underlying text databases from the user's relevance feedback. For a given query, the neural net agent, which is sufficiently trained on the basis of the backpropagation learning mechanism, discovers the text databases associated with the relevant documents and retrieves those documents effectively. In order to scale our approach to a large number of text databases, we also propose a hierarchical organization of neural net agents which reduces the total training cost to an acceptable level. Finally, we evaluate the performance of our approach by comparing it to those of conventional, well-known statistical approaches.

16.
In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds information in order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores used to calculate final scores depend on the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated a priori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and the text data type and finds information based on spatial proximity and textual relevance between the query and the object, which are determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance of the top-k MULTI query, the top-k spatial-keyword-relational (SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare the efficiency of the methods in terms of query performance and storage.

17.
Massive amounts of data associated with geographic information are generated on the Internet, and more and more research focuses on how to retrieve geo-textual data effectively. Existing methods mostly allow exact matches for query keywords but fail to support fuzzy preference queries. In this paper, we study the skyline problem of fuzzy preference queries. That is, given a set of geo-textual data, the skyline comprises the objects that are not dominated by others. We consider only two dimensions, the text relevance dimension and the spatial relevance dimension, and introduce two functions to quantify the text relevance and the spatial relevance. We also develop a new index structure to organize the geo-textual data and an algorithm based on it. Theoretical analysis and experimental results show that our method offers scalability and good performance.
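
The dominance relation behind this two-dimensional skyline can be sketched in a few lines of Python. This is a naive quadratic scan for illustration only, not the index-based algorithm the abstract refers to; the names `dominates` and `skyline` are hypothetical, and both scores are assumed to be "higher is better".

```python
def dominates(a, b):
    """True if a is at least as good as b in both dimensions and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def skyline(objects):
    """Return the ids of objects that no other object dominates.

    objects: dict mapping id -> (text_relevance, spatial_relevance),
    a hypothetical input with both scores already computed for the query.
    """
    return {
        oid
        for oid, scores in objects.items()
        if not any(dominates(other, scores) for other in objects.values())
    }
```

For example, with scores `{"p1": (0.9, 0.2), "p2": (0.5, 0.5), "p3": (0.4, 0.4), "p4": (0.1, 0.9)}`, object `p3` is dominated by `p2`, so the skyline is `p1`, `p2`, and `p4`.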

18.
Since documents on the Web are naturally partitioned into many text databases, an efficient document retrieval process requires identifying the text databases that are most likely to provide relevant documents to the query and then searching the identified text databases. In this paper, we propose a neural net based approach to such efficient document retrieval. First, we present a neural net agent that learns about underlying text databases from the user's relevance feedback. For a given query, the neural net agent, which is sufficiently trained on the basis of the BPN learning mechanism, discovers the text databases associated with the relevant documents and retrieves those documents effectively. In order to scale our approach to a large number of text databases, we also propose a hierarchical organization of neural net agents which reduces the total training cost to an acceptable level. Finally, we evaluate the performance of our approach by comparing it to those of conventional, well-known approaches. Received 5 March 1999 / Revised 7 March 2000 / Accepted in revised form 2 November 2000
