Similar Documents
20 similar documents were retrieved (search time: 31 ms).
1.
Taylor, S.M. IT Professional, 2004, 6(6): 28-34
Most readily available tools (basic search engines, possibly a news or information service, and perhaps agents and Web crawlers) are inadequate for many information retrieval tasks and downright dangerous for others. These tools either return too much useless material or miss important material. Even when such tools find useful information, the data is still in a text form that makes it difficult to build displays or diagrams. Employing the data in data mining or standard database operations, such as sorting and counting, can also be difficult. An emerging technology called information extraction (IE) is beginning to change all that, and you might already be using some very basic IE tools without even knowing it. Companies are increasingly applying IE behind the scenes to improve information and knowledge management applications such as text search, text categorization, data mining, and visualization (Rao, 2003). IE has also begun playing a key role in fields such as national security, law enforcement, insurance, and biomedical research, which have highly critical information and knowledge needs. In these fields, IE's powerful capabilities are necessary to save lives or substantial investments of time and money. IE views language up close, considering grammar and vocabulary, and tries to determine the details of "who did what to whom" from a piece of text. In its most in-depth applications, IE is domain focused; it does not try to define all the events or relationships present in a piece of text, but focuses only on items of particular interest to the user organization.

2.
Databases deepen the Web   Cited by: 2 (0 self-citations, 2 by others)
Ghanem, T.M.; Aref, W.G. Computer, 2004, 37(1): 116-117
The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site is creating pages dynamically, pages that are hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web" in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.

3.
Information Extraction (IE) systems that can exploit the vast source of textual information that is the Internet would provide a revolutionary step forward in delivering large volumes of content cheaply and precisely, enabling a wide range of new knowledge-driven applications and services. Despite this enormous potential, however, few IE systems have successfully made the transition from laboratory to commercial application. The reason may be a purely practical one: building usable, scalable IE systems requires bringing together a range of different technologies as well as providing clear and reproducible guidelines on how to configure and deploy those technologies collectively. This paper is an attempt to address these issues. It focuses on two primary goals. First, we show that an information extraction system used for real-world applications and different domains can be built from autonomous, cooperating components (agents). Such a system has several useful properties: clear separation of the different extraction tasks and steps, portability to multiple application domains, trainability, extensibility, and so on. Second, we show that machine learning, and in particular learning in different ways and at different levels, can be used to build practical IE systems. We show that carefully selecting the right machine learning technique for the right task, together with selective sampling, can reduce the human effort required to annotate training examples for such systems.
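The abstract mentions selective sampling as a way to reduce annotation effort but does not spell out a procedure, so the following is only a minimal uncertainty-sampling sketch; the example pool, the scoring model, and the batch size are hypothetical placeholders, not the authors' system.

```python
import random

def uncertainty(probability):
    """Distance from the decision boundary; smaller means the model is less certain."""
    return abs(probability - 0.5)

def selective_sampling(unlabeled_pool, model_probability, batch_size=5):
    """Pick the examples the current model is least certain about.

    unlabeled_pool    -- list of unannotated examples (placeholder objects)
    model_probability -- callable returning P(positive) for one example
    batch_size        -- how many examples to send to a human annotator
    """
    ranked = sorted(unlabeled_pool, key=lambda ex: uncertainty(model_probability(ex)))
    return ranked[:batch_size]

if __name__ == "__main__":
    # Toy pool: each "example" is just an id; the "model" answers with random scores.
    pool = list(range(20))
    random.seed(0)
    scores = {ex: random.random() for ex in pool}
    to_annotate = selective_sampling(pool, lambda ex: scores[ex], batch_size=3)
    print("Examples to hand to the annotator:", to_annotate)
```

The point of the loop is simply to hand the annotator the examples the current model is least sure about, so each new label is maximally informative.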

4.
Information extraction (IE) is an important and growing field, in part because of the development of ubiquitous social media networking millions of people and producing huge collections of textual information. Mined information is being used in a wide array of application areas, from targeted marketing of products to intelligence gathering for military and security needs. IE has its roots in artificial intelligence fields including machine learning, logic and search algorithms, computational linguistics, and pattern recognition. This review summarizes the history of IE, surveys the various uses of IE, identifies current technological accomplishments and challenges, and explores the role that neural and adaptive computing might play in future research. A goal for this review is also to encourage practitioners of neural and adaptive computing to look for interesting applications in the important emerging area of IE.

5.
New Techniques for Intelligent Text Search (智能文本搜索新技术)   Cited by: 1 (0 self-citations, 1 by others)
To cope with the massive amount of information on today's Internet and the demand for accurate, efficient, and personalized search, this paper presents a suite of new intelligent text search techniques covering information retrieval, information extraction, and information filtering. It first introduces subtasks related to new information retrieval techniques: enterprise search, entity search, blog search, and relevance feedback. It then describes the entity linking and slot filling subtasks related to information extraction, and the spam filtering subtask related to information filtering. These key techniques have been combined and applied in several well-known international evaluations, such as the Text REtrieval Conference and the Text Analysis Conference organized in the United States, and have been validated in practical systems including Internet public opinion monitoring, SMS opinion monitoring, and a campus-network object search engine.

6.
宋杰, 于戈, 王大玲, 鲍玉斌. 《计算机工程》(Computer Engineering), 2007, 33(20): 43-45, 4
To effectively resolve the interaction coupling that arises when modules share data, this paper proposes a new design pattern, the registry repository pattern (注册仓模式). The pattern encapsulates shared data, avoids passing data among its consumers, separates data providers from data consumers, and implements a simplified data-access protocol between modules, thereby reducing inter-module coupling. Theory and practice show that the registry repository pattern works well for object-oriented programming in component- or module-based software architectures.
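The registry repository pattern is only summarized in the abstract above; the sketch below is a generic Python rendering of the idea: shared data is published to and looked up from a central registry, so providers and consumers never reference each other directly. The class and key names are illustrative, not taken from the paper.

```python
class Registry:
    """Central store that decouples data providers from data consumers."""

    def __init__(self):
        self._entries = {}

    def register(self, key, value):
        # A provider publishes shared data under a well-known key.
        self._entries[key] = value

    def lookup(self, key, default=None):
        # A consumer fetches the data without knowing who produced it.
        return self._entries.get(key, default)


class ConfigProvider:
    def publish(self, registry):
        registry.register("db.url", "postgres://localhost/app")


class ReportModule:
    def run(self, registry):
        # No direct dependency on ConfigProvider, only on the registry.
        url = registry.lookup("db.url")
        print("Connecting to", url)


if __name__ == "__main__":
    registry = Registry()
    ConfigProvider().publish(registry)
    ReportModule().run(registry)
```

Because both modules depend only on the registry and its keys, either side can be replaced without touching the other, which is the coupling reduction the abstract describes.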

7.
The quality of resource discovery is inseparable from how resources are organized in a grid system and how query messages are routed. With the rise of P2P networks, many new concepts and research directions have provided ideas for grid-related research. This paper introduces the concept of virtual organizations, manages grid resources hierarchically on that basis, and proposes a hierarchical C&G architecture. For processing resource queries, three related algorithms are given: the IE, ID, and IR algorithms. Experimental comparison with traditional algorithms shows a measurable performance improvement.

8.
In recent years, many practical applications need to support not only spatial join queries but also keyword search, to help users find combinations of spatial objects that both satisfy the spatial join condition and contain specified keywords. Driven by this need, this paper defines a spatial join query with keyword search capability (Spatial Join with Keyword Search, SJKS) and proposes an IR2-Tree-based SJKS query processing algorithm (the IR2-TreeSJKS algorithm), aiming at an efficient combination of keyword search and spatial join. Experiments show that the algorithm effectively supports spatial join query processing with keyword search.
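The IR2-Tree-based algorithm itself is not reproduced in the abstract, so the sketch below only illustrates one plausible reading of the SJKS query semantics with a naive nested-loop join: report pairs of objects within a distance threshold whose combined keyword sets cover the query keywords. The data, field names, and the "jointly contain the keywords" interpretation are assumptions for illustration.

```python
import math

def sjks_naive(objects_a, objects_b, keywords, max_distance):
    """Naive spatial join with keyword search (SJKS), for illustration only.

    objects_a, objects_b -- lists of dicts: {"id", "x", "y", "keywords": set()}
    keywords             -- query keywords the pair must jointly contain (assumed semantics)
    max_distance         -- spatial join predicate: Euclidean distance threshold
    """
    required = set(keywords)
    results = []
    for a in objects_a:
        for b in objects_b:
            dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
            if dist <= max_distance and required <= (a["keywords"] | b["keywords"]):
                results.append((a["id"], b["id"], round(dist, 2)))
    return results

if __name__ == "__main__":
    hotels = [{"id": "h1", "x": 0.0, "y": 0.0, "keywords": {"wifi", "pool"}}]
    restaurants = [{"id": "r1", "x": 0.5, "y": 0.5, "keywords": {"seafood"}},
                   {"id": "r2", "x": 5.0, "y": 5.0, "keywords": {"seafood"}}]
    print(sjks_naive(hotels, restaurants, {"wifi", "seafood"}, max_distance=1.0))
```

An index-based method such as the paper's IR2-Tree approach exists precisely to avoid this quadratic pairwise scan.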

9.
Information retrieval (IR) systems exploit relevant information when tailoring search results to individual information needs. However, the search experience suffers when similar queries entered by previous searchers are not taken into account. In this paper, we discuss a solution to this problem, which combines collaborative filtering algorithms with traditional IR models to enable EIR. We also present various iterative refinement methods for improving the raw performance of this system. We validate our theories in an experiment using queries extracted from the click-through log of a commercial search engine. According to our results, an IR system employing iteratively refined, collaborative retrieval significantly outperforms various baseline retrieval models.
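The paper's exact fusion of collaborative filtering with IR models is not given in this abstract; the sketch below shows one common baseline, a weighted linear combination of a content-based relevance score and a click-based signal aggregated over similar past queries. The weight and the toy click log are assumptions.

```python
def combined_score(ir_score, collaborative_score, alpha=0.7):
    """Blend a traditional IR relevance score with a collaborative signal.

    alpha -- weight on the content-based score (hypothetical value)
    """
    return alpha * ir_score + (1 - alpha) * collaborative_score

def collaborative_signal(doc_id, similar_query_clicks):
    """Fraction of clicks on this document among clicks from similar past queries."""
    total = sum(similar_query_clicks.values())
    if total == 0:
        return 0.0
    return similar_query_clicks.get(doc_id, 0) / total

if __name__ == "__main__":
    # Toy click log aggregated over queries judged similar to the current one.
    clicks = {"doc1": 12, "doc2": 3}
    ir_scores = {"doc1": 0.42, "doc2": 0.55, "doc3": 0.61}
    ranked = sorted(
        ir_scores,
        key=lambda d: combined_score(ir_scores[d], collaborative_signal(d, clicks)),
        reverse=True,
    )
    print(ranked)
```

In this toy example the click evidence from previous searchers promotes doc1 above documents with higher content-only scores, which is the effect the abstract argues for.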

10.
Research on Member Selection and Result Merging Strategies in Meta-Search Engines (元搜索引擎中的成员选择和结果合并策略研究)   Cited by: 2 (0 self-citations, 2 by others)
In recent years, information retrieval has become a research hotspot and search engines have become one of the services users rely on most. However, an individual search engine has narrow coverage and limited retrieval effectiveness, so obtaining reasonably complete and accurate results requires querying multiple search engines repeatedly. A meta-search engine is an engine that invokes other independent search engines and can therefore better satisfy users' query needs. This paper briefly describes how meta-search engines work, analyzes and compares several meta-search techniques, and proposes implementation strategies for member (component engine) selection and result merging.
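The member-selection and result-merging strategies themselves are not detailed in the abstract, so the sketch below shows only a standard merging baseline: min-max normalize each member engine's scores so they are comparable, then sum the normalized scores per document across engines. The engine results are made up.

```python
def normalize(results):
    """Min-max normalize one engine's scores into [0, 1] so engines are comparable."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return {doc: 1.0 for doc, _ in results}
    return {doc: (s - lo) / (hi - lo) for doc, s in results}

def merge(engine_results):
    """Merge ranked lists from several engines by summing normalized scores."""
    merged = {}
    for results in engine_results:
        for doc, score in normalize(results).items():
            merged[doc] = merged.get(doc, 0.0) + score
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    engine_a = [("d1", 9.1), ("d2", 4.3), ("d3", 1.0)]
    engine_b = [("d2", 0.92), ("d4", 0.40)]
    for doc, score in merge([engine_a, engine_b]):
        print(doc, round(score, 3))
```

Normalization matters because member engines report scores on incompatible scales; documents returned by several engines (d2 here) naturally accumulate evidence and rise in the merged list.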

11.
Many expert systems need a lot of data. For a long time this point has appeared to be a bottleneck for the growth of Artificial Intelligence applications. A major way to provide an expert system with knowledge is to enter it by hand. With the maturity of Natural Language Processing (NLP), a new way has been opened with automatic Information Extraction (IE) from text. This paper briefly presents a financial decision support system, named SAPE, connected with an IE system. This application is used by Caisse des dépôts et consignations (CDC) in order to anticipate takeover bids on the European stock markets. It provides ways to manage the highly complex and moving network of European shareholdings. SAPE is available on the CDC group's intranet and is used by fund managers as part of their everyday work. This paper also describes how our NLP system, Exosème, uses the economic newswire from the Agence France-Presse (French press agency) to extract information on shareholdings, and how this information is managed by the user to provide the SAPE database with the large amount of information needed for its computations. After months of use, IE has proved to be a powerful, concrete solution. Moreover, while the economic value of takeover bids has led us to pay particular attention to shareholdings, this approach can be extended to other events. In fact, with IE, new possibilities for portfolio decision support systems are emerging. This paper presents the improvements we plan, and discusses those that, though tempting, are still out of reach due to the lack of adaptive tools.

12.
Multimedia news may be organized by keywords and categories for exploration and retrieval applications, but it is very difficult to integrate relation and visual information into the traditional category-browsing and keyword-based search framework. This paper proposes a new semantic model that integrates keyword, relation and visual information in a uniform framework. Based on this semantic representation framework, news exploration and retrieval applications can be organized not only by keywords and categories but also by relations and visual properties. We also propose a set of algorithms to automatically extract this semantic model from large collections of multimedia news reports.

13.
The focus on the use of existing and new technologies to facilitate advances in medical imaging and medical informatics (MIMI) is often directed at the technical capabilities and possibilities that these technologies bring. The technologies, though, in acting as a mediating agent, alter the dynamics and context of information delivery in subtle ways. While these changes bring benefits in more efficient information transfer and offer the potential of better healthcare, they also disrupt traditional processes and practices which were formulated for a different setting. The governance processes that underpin core ethical principles, such as patient confidentiality and informed consent, may no longer be appropriate in a new technological context. Therefore, in addition to discussing new methodologies, techniques and applications, there is a need for a discussion of the ethical, legal and socio-economic (ELSE) issues surrounding the use and application of technologies in MIMI. Consideration of these issues is especially important for medical informatics, which, after all, exists to support patients and healthcare practitioners and to inform science. This paper brings to light some important ethical, legal and socio-economic issues related to MIMI with the aim of furthering an interdisciplinary approach to the increasing use of Information and Communication Technologies (ICT) in healthcare.

14.
A Survey of Natural Language Processing Applications in Information Retrieval (自然语言处理在信息检索中的应用综述)   Cited by: 5 (0 self-citations, 5 by others)
Throughout the development of information retrieval, researchers have repeatedly tried to apply natural language processing (NLP) to retrieval in the hope of improving retrieval effectiveness. Most of these attempts, however, turned out contrary to the researchers' initial expectations: in most cases NLP did not improve retrieval effectiveness and sometimes even hurt it. Even when it helped, the gains were usually small, far from justifying the computational cost that NLP requires. Analyzing these observations, researchers have concluded that NLP is better suited to tasks that require precise results, such as question answering and information extraction, and that NLP must be optimized specifically for information retrieval before it can play a positive role. Some recent advances (for example, incorporating NLP into language-modeling retrieval) confirm this conclusion to some extent.

15.
Peer-to-peer (p2p) networks are being increasingly adopted as an invaluable resource for various information retrieval (IR) applications, including similarity estimation, content recommendation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts regarding the ability to actually extract sufficiently accurate information. This paper quantifies the measurement effort required to obtain and optimize the information obtained from p2p networks for the purpose of IR applications. We identify and measure inherent difficulties in collecting p2p data, namely, partial crawling, user-generated noise, sparseness, and popularity and localization of content and search queries. These aspects are quantified using music files shared in the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the popular content using relatively little effort. However, some applications, like trend prediction, mandate collection of data from the "long tail", hence a much more exhaustive crawl is needed. Furthermore, we show that content and search queries are highly localized, indicating that cross-location conclusions require a widespread spatial crawl. Finally, we present techniques for overcoming noise originating from user-generated content and for filtering non-informative data, while minimizing information loss.

16.
Since the introduction of the smart grid, accelerated deployment of various smart grid technologies and applications has been experienced, allowing the traditional power grid to become more reliable, resilient, and efficient. Despite such widespread deployment, it is still not clear which communication technology solutions are the best fit to support grid applications, because different smart grid applications have different network requirements in terms of data payloads, sampling rates, latency and reliability. Based on a variety of smart grid use cases and selected standards, this paper compiles information about the communication network requirements of different smart grid applications, from those used in a Home Area Network (HAN) or Neighborhood Area Network (NAN) to those in a Wide-Area Network (WAN). Communication technologies used to support implementation of selected smart grid projects are also discussed. This paper is expected to serve as a comprehensive database of technology requirements and best practices for use by communication engineers when designing a smart grid network.

17.
Search has become a hot topic in Internet computing, with rival search engines battling to become the de facto Web portal, harnessing search algorithms to wade through information on a scale undreamed of by early information retrieval (IR) pioneers. This article examines how search has matured from its roots in specialized IR systems to become a key foundation of the Web. The authors describe new challenges posed by the Web's scale, and show how search is changing the nature of the Web as much as the Web has changed the nature of search.

18.
Wildcard Search in Structured Peer-to-Peer Networks   Cited by: 1 (0 self-citations, 1 by others)
We address wildcard search in structured peer-to-peer (P2P) networks, which, to our knowledge, has not yet been explored in the literature. We begin by presenting an approach based on some well-known techniques in information retrieval (IR) and discuss why it is not appropriate in a distributed environment. We then present a simple and novel technique to index objects for wildcard search in a fully decentralized manner, along with some search strategies to retrieve objects. Our index scheme, as opposed to a traditional IR approach, can achieve quite balanced loads, avoid hot spots and single points of failure, reduce storage and maintenance costs, and offer some ranking mechanisms for matching objects. We use the compact disc (CD) records collected in FreeDB (http://freedb.org) as the experimental data set to evaluate our scheme. The results confirm that our index scheme is very effective in balancing the load. Moreover, search efficiency depends on the information given in a query: the more the information, the higher the performance.
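The paper's decentralized index scheme is not described in this abstract, so the sketch below illustrates only the generic character n-gram approach often used for wildcard term matching: index every term under its n-grams, then answer a pattern such as met* by intersecting the posting sets of the pattern's literal n-grams and verifying the candidates. Distribution of the index over a DHT, which is the paper's actual setting, is deliberately omitted.

```python
import fnmatch
from collections import defaultdict

def ngrams(term, n=3):
    """Character n-grams with boundary markers, e.g. 'cat' -> {'$ca', 'cat', 'at$'}."""
    padded = "$" + term + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def build_index(terms, n=3):
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index

def wildcard_search(index, terms, pattern, n=3):
    """Use n-grams of the pattern's literal parts to get candidates, then verify."""
    grams = set()
    for literal in pattern.split("*"):
        if len(literal) >= n:
            grams |= {literal[i:i + n] for i in range(len(literal) - n + 1)}
    candidates = set(terms)
    for gram in grams:
        candidates &= index.get(gram, set())
    return sorted(t for t in candidates if fnmatch.fnmatch(t, pattern))

if __name__ == "__main__":
    terms = ["metallica", "metal", "beatles", "madonna"]
    index = build_index(terms)
    print(wildcard_search(index, terms, "met*"))   # -> ['metal', 'metallica']
```

In a P2P setting each n-gram posting set would live on a different peer, which is why the paper's concern with load balance and hot spots arises in the first place.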

19.
With the deep combination of modern information technology and traditional agriculture, the era of Agriculture 4.0, which takes the form of smart agriculture, has arrived. Smart agriculture provides solutions for agricultural intelligence and automation. However, information security issues cannot be ignored as modern information technology drives agricultural development. In this paper, three typical development modes of smart agriculture (precision agriculture, facility agriculture, and order agriculture) are presented. Then, 7 key technologies and 11 key applications are derived from these modes. Based on them, 6 security and privacy countermeasures (authentication and access control, privacy preservation, blockchain-based solutions for data integrity, cryptography and key management, physical countermeasures, and intrusion detection systems) are summarized and discussed. Moreover, the security challenges of smart agriculture are analyzed and organized into two aspects: 1) agricultural production, and 2) information technology. Most current research projects have not treated agricultural equipment as a potential security threat. Therefore, we conducted additional experiments based on a solar insecticidal lamp Internet of Things, and the results indicate that agricultural equipment has an impact on agricultural security. Finally, further technologies (5G communication, fog computing, Internet of Everything, renewable energy management systems, software-defined networking, virtual reality, augmented reality, and cyber security datasets for smart agriculture) are described as future research directions for smart agriculture.

20.
郎皓, 王斌, 李锦涛, 丁凡. 《软件学报》(Journal of Software), 2008, 19(2): 291-300
Predicting query performance (PQP) is now regarded as one of the most important capabilities of a retrieval system. Research and experiments in recent years have shown that PQP has broad prospects and room for development in text retrieval. This paper surveys PQP in text retrieval, focusing on its main methods and key techniques. It first introduces the commonly used experimental corpora and evaluation frameworks, then discusses the various factors that affect query performance, then reviews the main current PQP methods under a pre-retrieval versus post-retrieval taxonomy, briefly introduces several applications of PQP, and finally discusses some of the challenges PQP faces.
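As a concrete illustration of the pre-retrieval family of predictors surveyed above, the sketch below computes one of the simplest ones, the average IDF of the query terms; a higher value is usually read as a more discriminative and therefore easier query. The collection statistics are toy values, and this particular predictor is a textbook baseline rather than a method from the surveyed paper.

```python
import math

def idf(term, doc_freq, num_docs):
    """Inverse document frequency with add-0.5 smoothing (one common variant)."""
    df = doc_freq.get(term, 0)
    return math.log((num_docs + 0.5) / (df + 0.5))

def avg_idf_predictor(query_terms, doc_freq, num_docs):
    """Pre-retrieval performance predictor: mean IDF of the query terms."""
    if not query_terms:
        return 0.0
    return sum(idf(t, doc_freq, num_docs) for t in query_terms) / len(query_terms)

if __name__ == "__main__":
    # Toy collection statistics (document frequencies out of 10,000 documents).
    doc_freq = {"information": 6200, "retrieval": 900, "latent": 40, "semantics": 75}
    n = 10_000
    for query in (["information", "retrieval"], ["latent", "semantics"]):
        print(query, round(avg_idf_predictor(query, doc_freq, n), 3))
```

Because it needs only collection statistics, such a predictor can run before retrieval; post-retrieval predictors, by contrast, inspect the retrieved result list itself.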
