Similar Documents
20 similar documents retrieved.
1.
Cross impact analysis (CIA) consists of a set of related methodologies that predict the occurrence probability of a specific event and the conditional probability of a first event given a second event. The conditional probability can be interpreted as the impact of the second event on the first. Most CIA methodologies are qualitative, meaning that the occurrence and conditional probabilities are estimated by human experts. In recent years, a growing number of quantitative methodologies have appeared that draw on large amounts of data from databases and the internet. Nearly 80% of the data available on the internet is textual, and approaches that compute the conditional probabilities from knowledge structures in textual information have therefore been proposed in the literature. In contrast to these related methodologies, this work proposes a new quantitative CIA methodology that predicts the conditional probability from the semantic structure of given textual information. Latent semantic indexing is used to identify the hidden semantic patterns behind an event and to calculate the impact of these patterns on other semantic textual patterns representing a different event. This makes it possible to calculate the conditional probabilities semantically. A case study shows that this semantic approach can be used to predict the conditional probability of one technology given another.
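A rough Python sketch of the semantic step is given below: it builds an LSI space over two small, invented sets of event-related texts and uses cosine similarity between the event centroids as a stand-in for the semantic conditional-probability calculation. The corpora and the similarity-as-impact simplification are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical corpora describing two events (e.g. two technologies).
event_a_docs = ["solid state battery energy density improves", "battery cathode research advances"]
event_b_docs = ["electric vehicle range depends on battery capacity", "EV charging networks grow quickly"]

corpus = event_a_docs + event_b_docs
X = TfidfVectorizer().fit_transform(corpus)

# LSI: project the term-document matrix onto a few latent semantic dimensions.
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Centroid of each event's documents in the latent semantic space.
a = Z[: len(event_a_docs)].mean(axis=0)
b = Z[len(event_a_docs):].mean(axis=0)

# Cosine similarity as a crude proxy for the semantic "impact" of event B on event A.
impact = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"semantic impact proxy of event B on event A: {impact:.3f}")
```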

2.
We investigate the automated identification of weak signals according to Ansoff to improve strategic planning and technological forecasting. The literature shows that weak signals can be found in an organization's environment and that they appear in different contexts. We use internet information to represent the organization's environment and select those websites that are related to a given hypothesis. In contrast to related research, we provide a methodology that uses latent semantic indexing (LSI) for the identification of weak signals. This improves existing knowledge-based approaches because LSI considers aspects of meaning and is therefore able to identify similar textual patterns in different contexts. A new weak signal maximization approach is introduced that replaces the prediction modeling approach commonly used with LSI. It makes it possible to extract the largest number of relevant weak signals, represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in the field of on-site medical oxygen production, supporting the planning of research and development (R&D) for a medical oxygen supplier. The results show that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis, helping strategic planners to react ahead of time.

3.
The weak signal concept according to Ansoff aims to advance strategic early warning. It makes it possible to predict, in advance, the appearance of events that are relevant for an organization, for example the appearance of a new technology that is relevant for a research organization. Existing approaches detect weak signals with an environmental scanning procedure that considers textual information from the internet, because about 80% of the data on the internet is textual. The texts are processed by a specific clustering approach in which clusters that represent weak signals are identified. In contrast to these related approaches, we propose a new methodology that investigates a sequence of clusters measured at successive points in time. This makes it possible to trace the development of weak signals over time and thus to identify relevant weak signal developments for an organization's decision making in a strategic early warning environment.

4.
In recent years, governmental and industrial espionage has become a growing problem for governments and corporations. Information about current technology development and research activities is a particularly attractive target. We therefore introduce a new, automated methodology that investigates the risk of information leakage through governmental or industrial espionage for an organization's research and technology (R&T) projects. Latent semantic indexing is applied together with machine learning and prediction modeling to identify semantic textual patterns representing technologies, and their corresponding application fields, that are of high relevance for the organization's strategy. These patterns are used to estimate the organization's cost of an information leakage for each project. Further, a web mining approach identifies the worldwide distribution of knowledge within the relevant technologies and application fields; this information is used to estimate the probability that an information leakage occurs. A risk assessment methodology then calculates the information leakage risk for each project. In a case study, the information leakage risk of defense-related R&T projects is investigated, because defense-related R&T is of particular interest to espionage agents. Overall, it is shown that the proposed methodology successfully calculates the espionage-related information leakage risk of projects, supporting an organization's espionage risk management.
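The final score per project combines the two estimates. Below is a minimal sketch of that arithmetic under the simple assumption that risk is the estimated leakage cost weighted by the estimated leakage probability; the project names, figures, and field names are invented, and the paper's actual risk formula may differ.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    leakage_cost: float   # estimated cost to the organization if the project's information leaks
    leakage_prob: float   # estimated probability of a leakage, e.g. from worldwide knowledge distribution

def leakage_risk(p: Project) -> float:
    """Expected loss: estimated leakage cost weighted by the estimated leakage probability."""
    return p.leakage_cost * p.leakage_prob

# Hypothetical R&T projects ranked by their information leakage risk.
projects = [Project("sensor-fusion", 2_000_000, 0.15), Project("propulsion", 5_000_000, 0.05)]
for p in sorted(projects, key=leakage_risk, reverse=True):
    print(f"{p.name}: risk = {leakage_risk(p):,.0f}")
```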

5.
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
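A tiny sketch of the keyword-frequency idea follows: documents are represented only by their keyword labels, and pairwise co-occurrence counts across documents are tallied. The sample keyword sets are invented, and counting shared documents per keyword pair is only the most basic statistic a KDT-style system would build on.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each already labeled with keywords (as in KDT).
doc_keywords = [
    {"iran", "oil", "export"},
    {"iran", "sanctions"},
    {"oil", "export", "opec"},
    {"iran", "oil"},
]

# Count, for every keyword pair, in how many documents the two keywords co-occur.
cooccur = Counter()
for kws in doc_keywords:
    for pair in combinations(sorted(kws), 2):
        cooccur[pair] += 1

# Keyword pairs that appear together in the most documents.
for pair, count in cooccur.most_common(3):
    print(pair, count)
```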

6.
接警日志包含时间、空间和案件描述信息,属于非结构时空数据.与时空社交媒体相比,接警日志的数据项之间存在较少的联系,数据项之间不能形成复杂网络关系,在挖掘其数据模式时难以提供有价值的线索,因此,其分析更加依赖于其中的语义挖掘和语义时空模式探索.针对这一问题,提出了一个可视分析框架支持对大规模非结构接警日志时空模式的交互探...  相似文献   

7.
We analyze the impact of textual information from e-commerce companies' websites on their commercial success. The textual information is extracted from the web content of e-commerce companies, divided into the Top 100 worldwide most successful companies and the Top 101 to 500. It is shown that latent semantic concepts extracted from this textual information can be adopted as success factors for classifying a company as Top 100, which contributes to the existing literature on e-commerce success factors. As evaluation, a regression model based on these concepts is built that successfully predicts the commercial success of the Top 100 companies. These findings are valuable for the creation of e-commerce websites.
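A compressed sketch of such a pipeline is shown below using scikit-learn: TF-IDF vectors of website text are reduced to latent semantic concepts with truncated SVD, and a classifier is trained on the concept loadings. The toy website texts, the labels, and the choice of logistic regression as the final model are illustrative assumptions rather than the study's actual setup.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Hypothetical website texts with a binary label: 1 = Top 100 company, 0 = rank 101-500.
texts = [
    "free shipping worldwide marketplace electronics fashion deals",
    "secure checkout fast delivery customer reviews loyalty program",
    "local crafts small shop handmade goods",
    "regional store seasonal offers newsletter",
]
labels = [1, 1, 0, 0]

# TF-IDF -> latent semantic concepts (LSI) -> classifier on the concept loadings.
model = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["global marketplace with fast delivery and reviews"]))
```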

8.
In this introduction, we briefly summarize the state of data and text mining today. Taking a very broad view, we use the term information mining to refer to the organization and analysis of structured or unstructured data that can be quantitative, textual, and/or pictorial in nature. The key question, in our view, is, “How can we transform data (in the very broad sense of this term) into ‘actionable knowledge’, knowledge that we can use in pursuit of a specified objective(s)?” After detailing a set of key components of information mining, we introduce each of the papers in this volume and detail the focus of their contributions.

9.
10.
Automatic Image Annotation Incorporating Semantic Topics
Because of the semantic gap, automatic image annotation has become an important research topic. Building on probabilistic latent semantic analysis (PLSA), a method that fuses semantic topics is proposed for image annotation and retrieval. First, to model the training data more accurately, the visual features of each image are represented as a visual "bag of words". A probabilistic model is then designed to capture latent semantic topics from the visual and textual modalities separately, and an adaptive, asymmetric learning method is proposed to fuse the two sets of topics. For each image document, its topic distributions over the modalities are fused by weighting, where the weights are determined by the entropy of the document's visual-word distribution. The fused probabilistic model thus appropriately links the information of the visual and textual modalities and can predict the semantic annotations of unseen images well. On a standard Corel image dataset, the proposed method is compared with several state-of-the-art image annotation methods; the experimental results show that it achieves better annotation and retrieval performance.
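A small sketch of the entropy-based, adaptive fusion of per-modality topic distributions follows. The exact weighting formula is an assumption (here the visual modality's weight shrinks as the image's visual-word distribution becomes more uniform), and the inputs are invented.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a (normalized) visual-word distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

def fuse_topics(visual_topics, text_topics, visual_word_dist):
    """Adaptively fuse per-modality topic distributions for one image document."""
    max_h = np.log(len(visual_word_dist))            # entropy of a uniform distribution
    w_visual = 1.0 - entropy(visual_word_dist) / max_h
    fused = w_visual * np.asarray(visual_topics) + (1.0 - w_visual) * np.asarray(text_topics)
    return fused / fused.sum()

# Example: a peaked visual-word histogram gives the visual topics more weight.
print(fuse_topics([0.7, 0.2, 0.1], [0.3, 0.4, 0.3], visual_word_dist=[50, 3, 2, 1, 1]))
```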

11.
Industrial tabular information extraction and its semantic fusion with text (ITIESF) is of great significance for converting and fusing industrial unstructured data into structured knowledge to guide cognitive intelligence analysis in the manufacturing industry. A novel end-to-end ITIESF approach is proposed to integrate tabular information and construct a tabular-information-oriented causality event evolutionary knowledge graph (TCEEKG). Specifically, an end-to-end joint learning strategy is presented to mine the semantic information in tables. The intrinsic relationships between tables and their rows and columns in engineering documents are defined and modeled to represent the tabular information. Building on this, an end-to-end joint entity-relationship extraction method for textual and tabular information from engineering documents is proposed to construct text-based knowledge graphs (KG) and tabular-information-based causality event evolutionary graphs (CEEG). A novel NSGCN (neighborhoods sample graph convolution network)-based entity alignment is then proposed to fuse the cross-knowledge graphs into a unified knowledge base. Furthermore, a translation-based, graph-structure-driven Q&A (question and answer) approach is designed to support cause analysis and problem tracing. The models can easily be integrated into a prototype system that provides joint information processing and cognitive analysis. Finally, the approach is evaluated on aerospace machining documents, illustrating that the TCEEKG can considerably help workers strengthen their skills in the cause-and-effect analysis of machining quality issues from a global perspective.

12.
We investigate the issue of predicting whether new customers will be profitable based on information about existing customers in a business-to-business environment. In particular, we show how latent semantic concepts from the textual information of existing customers' websites can be used to uncover characteristics of the websites of companies that will turn into profitable customers. The use of predictive analytics thus helps to identify new potential acquisition targets. Additionally, we show that a regression model based on these concepts is successful in predicting the profitability of new customers. In a case study, the acquisition process of a mail-order company is supported by a prioritized list of new customers generated with this approach. The density of profitable customers in this list outperforms the density of profitable customers in traditionally generated address lists (e.g. from list brokers). From a managerial point of view, this approach supports the identification of new business customers and helps to estimate their future profitability for a company. Consequently, the customer acquisition process can be targeted more effectively and efficiently. This leads to a competitive advantage for B2B companies and improves an acquisition process that is time- and cost-intensive and traditionally has low conversion rates.

13.
This paper presents a novel approach to automatic image annotation that combines global, regional, and contextual features through an extended cross-media relevance model. Unlike typical image annotation methods, which use either global or regional features exclusively and neglect the textual context among the annotated words, the proposed approach incorporates all three kinds of information, which help describe image semantics, and annotates images by estimating their joint probability. Specifically, we describe the global features as a distribution vector of visual topics and model the textual context as a multinomial distribution. The global features provide the global distribution of visual topics over an image, while the textual context relaxes the assumption of mutual independence among annotated words that is commonly adopted in most existing methods. Both the global features and the textual context are learned from the training data with a probabilistic latent semantic analysis approach. Experiments over 5k Corel images show that combining these three kinds of information is beneficial for image annotation.

14.
A Zero-Shot Image Classification Method Based on Subspace Learning with Fusion and Reconstruction
Image classification is an important subfield of computer vision. Traditional image classification can only classify samples of categories that appear in the training set. In real applications, however, new categories constantly emerge, which requires collecting large amounts of labeled data for the new categories and retraining the classifier. Unlike traditional image classification, zero-shot image classification can recognize samples of categories that were never seen during training and has therefore attracted wide attention in recent years. Zero-shot classification establishes the relationship between seen and unseen categories through a semantic space, so that knowledge can be transferred and samples of categories unseen during training can be classified. Existing zero-shot methods mainly learn a mapping from the visual space to the semantic space from the visual and semantic features of the seen categories; the learned mapping is then used to project the visual features of unseen categories into the semantic space, where the unseen categories are classified with a nearest-neighbor method. However, the category differences between seen and unseen classes and the different image distributions easily lead to the domain-shift problem, and directly learning a mapping from the visual space to the semantic space causes information loss. To address the information loss and domain shift in the knowledge transfer of zero-shot classification, this paper proposes a zero-shot classification method based on subspace learning and reconstruction. In the training stage, the method makes full use of the available information about the unseen categories to reduce domain shift: it first transfers the relationships between seen and unseen categories from the semantic space to the visual space and learns visual-feature prototypes for the unseen categories. Then, from the visual space containing the visual prototypes of all categories (seen and unseen) and the semantic space containing the semantic prototypes, it learns a latent category-prototype feature space and aligns visual and semantic features in this latent subspace, so that the representation of every category in the subspace carries both the discriminative information of the visual space and the inter-category relationship information of the semantic space. A reconstruction constraint is imposed during subspace learning to reduce information loss and further alleviate domain shift. Finally, in the recognition stage, samples of unseen categories are classified with a nearest-neighbor algorithm in the different spaces. The main contributions of this paper are twofold. First, by transferring the inter-category relationships of the semantic space, visual prototypes of the unseen categories are learned, so that information about unseen categories is exploited during training, which alleviates domain shift to some extent. Second, a shared latent subspace is learned that contains both the rich discriminative information of the visual space and the inter-category relationship information of the semantic space, and reconstruction during subspace learning alleviates the information loss of knowledge transfer. Comparative experiments on four public zero-shot classification datasets show that the proposed method achieves a high average classification accuracy, demonstrating its effectiveness.
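The baseline pipeline that this abstract contrasts itself with (learn a visual-to-semantic mapping on seen classes, then classify unseen-class samples by nearest semantic prototype) can be sketched compactly. The ridge-regression mapping and all variable names below are assumptions; the paper's own method additionally learns a shared latent subspace with reconstruction constraints.

```python
import numpy as np

def learn_visual_to_semantic(X_seen, S_seen, lam=1.0):
    """Ridge regression W mapping visual features X (n x d) to semantic vectors S (n x k)."""
    d = X_seen.shape[1]
    return np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(d), X_seen.T @ S_seen)  # d x k

def predict_unseen(X_test, W, unseen_prototypes, unseen_labels):
    """Project test images into the semantic space and assign the nearest unseen-class prototype."""
    S_pred = X_test @ W
    # Cosine similarity to every unseen-class semantic prototype.
    S_pred = S_pred / (np.linalg.norm(S_pred, axis=1, keepdims=True) + 1e-12)
    P = unseen_prototypes / (np.linalg.norm(unseen_prototypes, axis=1, keepdims=True) + 1e-12)
    nearest = np.argmax(S_pred @ P.T, axis=1)
    return np.asarray(unseen_labels)[nearest]

# Toy usage with random data just to show the shapes involved.
rng = np.random.default_rng(0)
X_seen, S_seen = rng.random((50, 8)), rng.random((50, 4))
W = learn_visual_to_semantic(X_seen, S_seen)
print(predict_unseen(rng.random((3, 8)), W, rng.random((5, 4)), ["a", "b", "c", "d", "e"]))
```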

15.
To address the problem that cross-modal hashing retrieval methods exploit label semantics insufficiently, which leads to hash codes with weak discriminative power and low retrieval precision, a discriminative cross-modal hashing method that preserves semantic similarity is proposed. The method projects the feature data of heterogeneous modalities into a common subspace and uses multi-label kernel discriminant analysis to embed the discriminative information and latent correlations of the label semantics into that subspace. The discriminative power of the hash codes is improved by minimizing the quantization error between the common subspace and the hash codes. In addition, a semantic similarity matrix is constructed from the labels, and the semantic similarity is preserved in the learned hash codes to further improve retrieval precision. Extensive experiments on the LabelMe, MIRFlickr-25k, and NUS-WIDE benchmark datasets verify the effectiveness of the method.
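A minimal sketch of a label-derived semantic similarity matrix follows; treating two items as similar whenever they share at least one label is an assumption about the construction, since the abstract does not give the exact definition.

```python
import numpy as np

# Multi-hot label matrix L: one row per item, one column per label (hypothetical data).
L = np.array([
    [1, 0, 1, 0],   # item 0: labels {0, 2}
    [0, 1, 1, 0],   # item 1: labels {1, 2}
    [0, 0, 0, 1],   # item 2: label  {3}
])

# Items are considered semantically similar if they share at least one label.
S = (L @ L.T > 0).astype(int)
print(S)
```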

16.
Information quality is one of the key determinants of information system success. When information quality is poor, it can cause a variety of risks in an organization. To manage resources for information quality improvement effectively, it is necessary to understand where, how, and how much information quality impacts an organization's ability to successfully deliver its objectives. So far, existing approaches have mostly focused on the measurement of information quality but not adequately on the impact it causes. This paper presents a model that uses a risk-based approach to quantify the business impact arising from poor information quality in an organization, thereby addressing the inherent uncertainty in the relationship between information quality and organizational impact. The model can help information managers obtain quantitative figures that can be used to build reliable and convincing business cases for information quality improvement.

17.
Traditional latent semantic analysis (LSA) cannot capture the spatial distribution of scene objects or the discriminative information of latent topics. To address this, a scene classification method based on multi-scale spatial discriminative probabilistic latent semantic analysis (PLSA) is proposed. First, the image is partitioned at multiple spatial scales with a spatial pyramid to obtain its spatial information, and the PLSA model is applied to obtain the latent semantic information of each local block. The semantic information of the individual blocks is then concatenated to form the image's multi-scale spatial latent semantic representation. Finally, a proposed weight-learning method learns the discriminative information between different image topics, yielding a multi-scale spatial discriminative latent semantic representation, and the learned weights are embedded into a support vector machine (SVM) classifier to perform scene classification. Experiments on three commonly used scene image datasets (Scene-13, Scene-15, and Caltech-101) show that the method outperforms many existing state-of-the-art methods in average classification accuracy, verifying its effectiveness and robustness.
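The spatial-pyramid partitioning and per-block concatenation can be sketched as below. The per-block descriptor is passed in as a callable (the paper uses per-block PLSA topic distributions, whereas the example uses a trivial mean-intensity descriptor), and the pyramid levels are assumptions.

```python
import numpy as np

def spatial_pyramid_blocks(image, levels=(1, 2, 4)):
    """Split an image (H x W x C array) into the blocks of a spatial pyramid:
    an n x n grid at each pyramid level."""
    h, w = image.shape[:2]
    blocks = []
    for n in levels:
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                blocks.append(image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]])
    return blocks

def multiscale_representation(image, block_descriptor):
    """Concatenate per-block descriptors (e.g. PLSA topic distributions) into one vector."""
    return np.concatenate([block_descriptor(b) for b in spatial_pyramid_blocks(image)])

# Example with a trivial mean-intensity descriptor standing in for per-block PLSA topics.
img = np.random.default_rng(0).random((64, 64, 3))
vec = multiscale_representation(img, lambda b: np.array([b.mean()]))
print(vec.shape)   # (1 + 4 + 16) blocks -> 21 values
```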

18.
We present Wiser, a new semantic search engine for expert finding in academia. Our system is unsupervised and jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph via entity linking. Wiser indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author's publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score, which models the pertinence of the corresponding entity to the author's expertise and is computed by means of a proper random-walk calculation over that graph, and with a latent vector representation learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author's documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author's expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experimental test on a standard dataset for this task. We show that Wiser achieves better performance than all the other competitors, thus proving the effectiveness of modeling an author's profile via our “semantic” graph of entities. Finally, we comment on the use of Wiser for indexing and profiling the whole research community of the University of Pisa, and its application to technology transfer at our University.
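The per-node relevance score computed "by means of a proper random-walk calculation over that graph" can be illustrated with a standard weighted PageRank over a small entity graph; the graph below is invented, and networkx's pagerank is used only as a stand-in for the system's exact random-walk formulation.

```python
import networkx as nx

# Hypothetical weighted graph of Wikipedia entities mentioned in one author's papers;
# edge weights stand for the semantic relatedness between the entities.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Information_retrieval", "Latent_semantic_indexing", 0.8),
    ("Information_retrieval", "PageRank", 0.6),
    ("Latent_semantic_indexing", "Singular_value_decomposition", 0.9),
    ("PageRank", "Random_walk", 0.7),
])

# Weighted PageRank as a stand-in for the random-walk relevance score per entity.
relevance = nx.pagerank(G, weight="weight")
for entity, score in sorted(relevance.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.3f}")
```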

19.
In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in applying genetic algorithms (GAs) to document clustering is the thousands, or even tens of thousands, of dimensions in the feature space that are typical of textual data, because the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary is one dimension. Latent semantic indexing (LSI) is a successful technique in information retrieval that attempts to explore the latent semantics implied by a query or a document by representing them in a dimension-reduced space; at the same time, LSI takes the effects of synonymy and polysemy into account, constructing a semantic structure in textual data. GAs are search techniques that can efficiently evolve optimal solutions in the reduced space. We propose a variable-string-length genetic algorithm that automatically evolves the proper number of clusters as well as providing near-optimal clustering of the data set. The GA can be used in conjunction with the reduced latent semantic structure to improve clustering efficiency and accuracy. The superiority of the GAL approach over a conventional GA applied to the VSM is demonstrated by good clustering results on Reuters documents.
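A toy sketch of the idea, clustering documents with a genetic algorithm in an LSI-reduced space, is given below. A fixed number of clusters and a plain assignment-vector chromosome are simplifying assumptions; the paper evolves a variable-length representation that also selects the number of clusters.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsi_vectors(docs, dims=10):
    """TF-IDF followed by truncated SVD; dims must be smaller than the vocabulary size."""
    X = TfidfVectorizer().fit_transform(docs)
    return TruncatedSVD(n_components=dims, random_state=0).fit_transform(X)

def fitness(assign, X, k):
    """Negative within-cluster sum of squared distances (higher is better)."""
    sse = 0.0
    for c in range(k):
        members = X[assign == c]
        if len(members):
            sse += ((members - members.mean(axis=0)) ** 2).sum()
    return -sse

def ga_cluster(X, k=3, pop=30, gens=100, seed=0):
    """Evolve an assignment vector (one cluster id per document) in the reduced space."""
    rng = np.random.default_rng(seed)
    n = len(X)
    population = rng.integers(0, k, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, k) for ind in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]   # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            mutate = rng.random(n) < 0.02                             # point mutation
            child[mutate] = rng.integers(0, k, mutate.sum())
            children.append(child)
        population = np.vstack([parents, children])
    return max(population, key=lambda ind: fitness(ind, X, k))
```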

20.
The proliferation of textual data in society is currently overwhelming; in particular, unstructured textual data is constantly being generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, users' ability to summarise, understand, and make sense of such data for making better business and everyday decisions remains challenging. This paper studies how to analyse textual data, based on layered software patterns, to extract insightful user intelligence from a large collection of documents and to use such information to improve user operations and performance.
