Similar Documents
20 similar documents found (search time: 14 ms)
1.
A knowledge organization system (KOS) can concisely reveal the deep knowledge structure of a patent document set. Compared to classification code systems, a personalized KOS made up of topics can represent technology information in a more agile, detailed manner. This paper presents an approach to automatically construct a KOS for patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is adopted to generate a better bag-of-words for topic modeling, and the LDA model is applied to generate raw topics. Then, by iteratively applying K-Means clustering and PCA to the document set and the topic matrix, new upper-level topics are generated and the relationships between topics are computed to construct the KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics, each represented by terms and their weights, and the leaves are patent documents. We evaluated the approach in an empirical study on a set of Large Aperture Optical Elements (LAOE) patent documents and constructed the LAOE KOS. The method discovered deep semantic relationships between the topics and helped better describe the technology themes of LAOE. Based on the KOS, two types of applications were implemented: automatic classification of patent documents and categorical refinement of search results.
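The pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn is available; the term-clumping step is approximated by plain stop-word filtering, and the corpus and all sizes are invented for demonstration, not taken from the paper.

```python
# Sketch of a term-filtering -> LDA -> K-Means/PCA pipeline (term clumping
# approximated by stop-word removal; corpus and sizes are illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA
from sklearn.cluster import KMeans

docs = [
    "optical element laser damage coating",
    "laser coating damage threshold optics",
    "patent classification topic model clustering",
    "topic model patent document clustering analysis",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                      # bag-of-words

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                 # documents -> raw topics

# Upper-level topics: cluster the topic-term vectors after PCA reduction.
topic_term = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
reduced = PCA(n_components=2).fit_transform(topic_term)
upper = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# Leaves of the KOS: each document attached to its dominant topic.
leaves = doc_topic.argmax(axis=1)
print(upper, leaves)
```

A real KOS construction would iterate the clustering step and keep the topic-topic relationships; this sketch only shows one pass.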

2.

Probabilistic topic modeling algorithms like Latent Dirichlet Allocation (LDA) have become powerful tools for the analysis of large collections of documents (such as papers, projects, or funding applications) in science, technology and innovation (STI) policy design and monitoring. However, selecting an appropriate and stable topic model for a specific application (by adjusting the hyperparameters of the algorithm) is not a trivial problem. Common validation metrics like coherence or perplexity, which focus on the quality of topics, are not a good fit in applications where the quality of the document similarity relations inferred from the topic model is especially relevant. Relying on graph analysis techniques, the aim of our work is to propose a new methodology for the selection of hyperparameters that is specifically oriented to optimize the similarity metrics emanating from the topic model. To this end, we propose two graph metrics: the first measures the variability of the similarity graphs that result from different runs of the algorithm for a fixed value of the hyperparameters, while the second measures the alignment between the graph derived from the LDA model and another obtained from metadata available for the corresponding corpus. Through experiments on various corpora related to STI, it is shown that the proposed metrics provide relevant indicators for selecting the number of topics and building persistent topic models that are consistent with the metadata. Their use, which can be extended to other topic models beyond LDA, could facilitate the systematic adoption of these techniques in STI policy analysis and design.
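The two graph metrics can be illustrated in miniature. These are my own simple formulations, not necessarily the paper's exact definitions: variability is taken as the mean absolute difference between edge weights of similarity graphs from two runs, and alignment as the Pearson correlation between model-based and metadata-based edge weights; all matrices below are random stand-ins for real LDA output.

```python
# Toy versions of the two graph metrics (illustrative formulations):
# (1) variability of similarity graphs across two runs of a topic model,
# (2) alignment between a model-based graph and a metadata-based graph.
import numpy as np

rng = np.random.default_rng(0)

def similarity_graph(doc_topic):
    """Cosine-similarity graph over documents from a doc-topic matrix."""
    norm = doc_topic / np.linalg.norm(doc_topic, axis=1, keepdims=True)
    return norm @ norm.T

run_a = rng.dirichlet(np.ones(5), size=20)   # doc-topic matrix, run 1
run_b = rng.dirichlet(np.ones(5), size=20)   # doc-topic matrix, run 2
meta  = rng.dirichlet(np.ones(5), size=20)   # metadata-based vectors

g_a, g_b, g_m = map(similarity_graph, (run_a, run_b, meta))
iu = np.triu_indices(20, k=1)                # unique edges only

variability = np.abs(g_a[iu] - g_b[iu]).mean()
alignment = np.corrcoef(g_a[iu], g_m[iu])[0, 1]
print(variability, alignment)
```

Low variability across runs and high alignment with metadata would, under this reading, indicate a stable and consistent choice of hyperparameters.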


3.
《World Patent Information》1988,10(2):133-149
Patent literature is a unique source of information. In research management, it is often important to compare the best research areas in which a fit exists between academic scientific production and applied research. Comparison between different sets of patents is also useful. The purpose of this paper is to provide a fast and efficient way to achieve this goal by automatically mapping the research network of the set of references to be examined.

4.
This paper analyzes whether methods from social network analysis can be adopted for the modeling of scientific fields in order to obtain a better understanding of the respective scientific area. The approach proposed is based on articles published within the respective scientific field and certain types of nodes deduced from these papers, such as authors, journals, conferences and organizations. As a proof of concept, the techniques discussed here are applied to the field of ‘Mobile Social Networking’. For this purpose, a tool was developed to create a large data collection representing the aforementioned field. The paper analyzes various views on the complete network and discusses these on the basis of the data collected on Mobile Social Networking. The authors demonstrate that the analysis of particular subgraphs derived from the data collection allows the identification of important authors as well as separate sub-disciplines such as classic network analysis and sensor networks and also contributes to the classification of the field of ‘Mobile Social Networking’ within the greater context of computer science, applied mathematics and social sciences. Based on these results, the authors propose a set of concrete services which could be offered by such a network and which could help the user to deal with the scientific information process. The paper concludes with an outlook upon further possible research topics.
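The subgraph analysis used to identify important authors can be sketched with a tiny co-authorship network. The author names and papers below are invented, and degree (number of distinct co-authors) is used as a simple stand-in for the richer centrality measures a real study would apply.

```python
# Minimal co-authorship network from a paper collection (names invented).
# Degree centrality, the number of distinct co-authors, is a simple
# proxy for identifying "important authors".
from itertools import combinations
from collections import defaultdict

papers = [
    ["Alice", "Bob"],
    ["Alice", "Carol", "Dave"],
    ["Bob", "Carol"],
]

coauthors = defaultdict(set)
for authors in papers:
    for a, b in combinations(authors, 2):
        coauthors[a].add(b)
        coauthors[b].add(a)

degree = {a: len(nbrs) for a, nbrs in coauthors.items()}
ranked = sorted(degree, key=degree.get, reverse=True)
print(ranked, degree)
```

The same adjacency structure extends directly to journals, conferences and organizations as additional node types.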

5.
Patent and scientific literature are the fundamental sources of innovation in knowledge creation and transfer activities. Establishing and understanding the complex relationships between them can help scientists and stakeholders to predict and promote the innovation process. In this paper, we consider the domain of nanoscience, using a large-scale collection of patents and scientific literature to find evolution patterns and distinctive keywords of each topic in a particular period. By extracting semantic-level topics from a dataset of nearly 810,000 scientific publications from Web of Science and 160,000 patents from Derwent, the results reveal that the degree of topic popularity on the two innovation platforms diverged markedly over the 20 years from 1995 to 2015. In addition, the top keywords of patents and scientific literature, representing the topic content of concern, each changed over time. Our analysis can not only confirm existing topics in nanoscience but also offer new views on the relationship between scientific literature and patents.

6.
7.
8.
Rehs  Andreas 《Scientometrics》2020,125(2):1229-1251

The detection of differences or similarities in large numbers of scientific publications is an open problem in scientometric research. In this paper we therefore develop and apply a machine learning approach based on structural topic modelling in combination with cosine similarity and a linear regression framework in order to identify differences in dissertation titles written at East and West German universities before and after German reunification. German reunification and its surrounding time period is used because it provides a structure with both minor and major differences in research topics that could be detected by our approach. Our dataset is based on dissertation titles in economics and business administration and in chemistry from 1980 to 2010. We use university affiliation and year of the dissertation to train a structural topic model and then test the model on a set of unseen dissertation titles. Subsequently, we compare the resulting topic distribution of each title to every other title with cosine similarity. The cosine similarities and the regional and temporal origin of the dissertation titles they come from are then used in a linear regression approach. Our results on research topics in economics and business administration suggest substantial differences between East and West Germany before reunification and a rapid convergence thereafter. In chemistry we observe minor differences between East and West before reunification and a slightly increased similarity thereafter.
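The comparison step can be sketched as follows. The topic distributions below are synthetic vectors standing in for the output of a structural topic model, and the single-indicator regression is a simplification of the paper's framework; everything here is illustrative.

```python
# Sketch of the similarity-plus-regression step: pairwise cosine
# similarities between (synthetic) per-title topic distributions,
# regressed on a same-region indicator via ordinary least squares.
import numpy as np

rng = np.random.default_rng(1)
east = rng.dirichlet(np.ones(4) * 5, size=10) + [0.3, 0, 0, 0]
west = rng.dirichlet(np.ones(4) * 5, size=10) + [0, 0.3, 0, 0]
titles = np.vstack([east, west])
region = np.array([0] * 10 + [1] * 10)   # 0 = East, 1 = West

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rows = []
for i in range(len(titles)):
    for j in range(i + 1, len(titles)):
        same = 1.0 if region[i] == region[j] else 0.0
        rows.append((cos(titles[i], titles[j]), same))

y = np.array([r[0] for r in rows])
X = np.column_stack([np.ones(len(rows)), [r[1] for r in rows]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # beta[1] estimates the same-region similarity premium
```

A positive coefficient on the indicator corresponds to titles from the same region being systematically more similar, the pattern the paper reports for pre-reunification economics.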


9.

A thermodynamic approach has been applied to the problem of selecting the number of clusters/topics in topic modeling. The main principles of this approach are formulated, and the behavior of topic models under temperature variations is studied. Using the thermodynamic formalism, the existence of an entropy phase transition in topic models is shown, and criteria for the choice of the optimum number of clusters/topics are determined.
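One building block of such approaches is an entropy functional evaluated on topic-word distributions. The snippet below is a toy illustration only, not the paper's exact thermodynamic formalism: it computes the Renyi entropy of a distribution and shows that a sharply peaked topic scores lower than a diffuse one, which is the kind of signal an entropy-minimization criterion over the number of topics relies on.

```python
# Toy illustration (not the paper's formalism): Renyi entropy of a
# topic-word distribution. Peaked topics have lower entropy than
# diffuse ones, the contrast an entropy-based criterion exploits.
import math

def renyi_entropy(p, q=2.0):
    """Renyi entropy of order q for a probability distribution p."""
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

diffuse = [0.25, 0.25, 0.25, 0.25]   # probability spread evenly
peaked  = [0.85, 0.05, 0.05, 0.05]   # concentrated on one word
print(renyi_entropy(diffuse), renyi_entropy(peaked))
```

For a uniform distribution over N outcomes the Renyi entropy equals log N for every order q, which makes it a convenient normalized reference point.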


10.
The editorial handling of articles in scientific journals is considered as a human activity process. Using recently proposed approaches from human dynamics theory, we examine the probability distributions of random variables reflecting the temporal characteristics of the studied processes. The first part of this article contains our analysis of real data about articles published in scientific journals. The second part is devoted to modeling the time series connected with editorial work. The purpose of our study is to present a new object that can be studied in terms of human dynamics theory and to corroborate the scientometric application of the results obtained.
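A standard human-dynamics check on such data is to examine the distribution of inter-event (waiting) times. The timestamps below are invented, and the tail exponent is estimated with the Hill / maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)), a common tool for heavy-tailed waiting-time distributions; this is a generic sketch, not the paper's specific analysis.

```python
# Inter-event times of editorial actions (timestamps in days, made up)
# and a Hill / MLE estimate of a power-law tail exponent:
# alpha = 1 + n / sum(ln(x_i / x_min)).
import math

timestamps = [0, 1, 2, 4, 9, 11, 30, 31, 33, 80]
waits = [b - a for a, b in zip(timestamps, timestamps[1:])]

x_min = min(waits)
alpha = 1 + len(waits) / sum(math.log(x / x_min) for x in waits)
print(alpha)
```

Bursty human activity typically yields exponents in the vicinity of 1 to 3; a real analysis would also test the power-law hypothesis against alternatives such as a log-normal.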

11.
Key Technical and Scientific Issues in Remote Sensing of Sea Surface Wind Fields with Fully Polarimetric Microwave Radiometers
The fully polarimetric microwave radiometer is a new type of microwave remote sensor: it measures not only the two orthogonally polarized components of a target's microwave radiation signal but also the complex correlations between those components. This correlation information is more sensitive to the anisotropy of surface microwave emission and thus provides a means of measuring the sea surface wind field. Building on a review of the status and characteristics of fully polarimetric microwave radiometers at home and abroad, the paper first analyzes the principle of sea surface wind field remote sensing with such radiometers and the factors that affect it; it then discusses the key technical problems of hardware implementation from the two aspects of hardware design and calibration; finally, it analyzes the scientific problems of wind field retrieval, focusing on the keys to establishing retrieval algorithms and the impact of the main technical specifications on wind field retrieval errors.

12.
Population modeling of the emergence and development of scientific fields
We analyze the temporal evolution of emerging fields within several scientific disciplines in terms of numbers of authors and publications. From bibliographic searches we construct databases of authors, papers, and their dates of publication. We show that the temporal development of each field, while different in detail, is well described by population contagion models, suitably adapted from epidemiology to reflect the dynamics of scientific interaction. Dynamical parameters are estimated and discussed to reflect fundamental characteristics of the field, such as time of apprenticeship and recruitment rate. We also show that fields are characterized by simple scaling laws relating numbers of new publications to new authors, with exponents that reflect increasing or decreasing returns in scientific productivity.
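A contagion-style growth curve and the scaling law can be sketched together. The discrete logistic recursion below is the simplest member of the contagion family; the recruitment rate, carrying capacity and scaling exponent are all made-up illustrative values, not estimates from any real field.

```python
# Discrete logistic ("contagion") sketch of author growth in a field
# (parameters illustrative, not fitted), plus the scaling-law idea
# publications ~ authors ** gamma.
def logistic_growth(n0, beta, capacity, steps):
    """Authors recruited at rate beta, saturating at `capacity`."""
    n = [float(n0)]
    for _ in range(steps):
        n.append(n[-1] + beta * n[-1] * (1 - n[-1] / capacity))
    return n

authors = logistic_growth(n0=5, beta=0.5, capacity=1000, steps=30)
gamma = 1.2                                   # assumed scaling exponent
pubs = [a ** gamma for a in authors]
print(round(authors[-1]), round(pubs[-1]))
```

An exponent gamma above 1 corresponds to increasing returns in productivity (each new author contributes more than proportionally to publications); gamma below 1 to decreasing returns.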

13.
14.
Viegas  Felipe  Pereira  Antônio  Cecílio  Pablo  Tuler  Elisa  Meira  Wagner  Gonçalves  Marcos  Rocha  Leonardo 《Scientometrics》2022,127(8):5005-5026

Recent efforts have focused on identifying multidisciplinary teams and detecting co-authorship networks by exploring topic modeling to identify researchers' expertise. Though promising, none of these efforts performs a real-life evaluation of the quality of the built topics. This paper proposes a Semantic Academic Profiler (SAP) framework that summarizes articles written by researchers to automatically build research profiles and perform online evaluations of the built profiles. SAP exploits and extends state-of-the-art topic modeling strategies based on CluWords considering n-grams and introduces a new visual interface able to highlight the main topics related to articles, researchers and institutions. To evaluate SAP's capability of summarizing the profiles of such entities, as well as its usefulness for supporting online assessments of topic quality, we perform and contrast two types of evaluation on an extensive repository of Brazilian curricula vitae: (1) an offline evaluation, in which we exploit a traditional metric (NPMI) to measure the quality of several data representation strategies, including (i) TF-IDF, (ii) TF-IDF with bi-grams, (iii) CluWords, and (iv) CluWords with bi-grams; and (2) an online evaluation through an A/B test in which researchers evaluate their own built profiles. We also perform an online assessment of the SAP user interface through a usability test following the SUS methodology. Our experiments indicate that CluWords with bi-grams is the best solution and that the SAP interface is very useful. We also observed essential differences between the online and offline assessments, indicating that using both together is important for a comprehensive quality evaluation. Studies of this type are scarce in the literature, and our findings open space for new lines of investigation in the topic modeling area.
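The offline metric named here, NPMI, is easy to state in full: npmi(a, b) = ln(p(a, b) / (p(a) p(b))) / -ln p(a, b), with probabilities estimated from document co-occurrence. The corpus below is a made-up four-document example, not the paper's data.

```python
# Standalone NPMI coherence between two words, with probabilities
# estimated from document-level co-occurrence on a tiny toy corpus.
import math

docs = [
    {"topic", "model", "lda"},
    {"topic", "model", "coherence"},
    {"graph", "network"},
    {"topic", "graph"},
]

def p(*words):
    return sum(all(w in d for w in words) for d in docs) / len(docs)

def npmi(a, b):
    pab = p(a, b)
    return math.log(pab / (p(a) * p(b))) / -math.log(pab)

print(npmi("topic", "model"))
```

NPMI is bounded in [-1, 1]: 1 for words that only ever occur together, 0 for independence, negative for words that avoid each other, which is what makes it usable as a topic-quality score.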


15.
Jang  Wooseok  Park  Yongtae  Seol  Hyeonju 《Scientometrics》2021,126(8):6505-6532
Scientometrics - As technology rapidly advances with the Fourth Industrial Revolution, many emerging technologies have been developed in several technology sectors. These technologies can (1)...

16.
The goal of our work is inspired by the task of associating segments of text with their real authors. In this work, we focus on analyzing the way humans judge different writing styles. This analysis can help to better understand this process and thus to simulate/mimic such behavior accordingly. Unlike the majority of work done in this field (i.e. authorship attribution, plagiarism detection, etc.), which uses content features, we focus only on the stylometric, i.e. content-agnostic, characteristics of authors. We therefore conducted two pilot studies to determine whether humans can identify authorship among documents with high content similarity. The first was a quantitative experiment involving crowd-sourcing, while the second was a qualitative one executed by the authors of this paper. Both studies confirmed that this task is quite challenging. To gain a better understanding of how humans tackle such a problem, we conducted an exploratory data analysis of the results of the studies. In the first experiment, we compared the decisions against content features and stylometric features, while in the second, the evaluators described the process and the features on which their judgment was based. The findings of our detailed analysis could (1) help to improve algorithms such as automatic authorship attribution and plagiarism detection, (2) assist forensic experts or linguists in creating profiles of writers, (3) support intelligence applications in analyzing aggressive and threatening messages and (4) help editors check conformity to, for instance, a journal-specific writing style.
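Content-agnostic features of the kind contrasted here can be computed with a few lines. The feature set below (average sentence length, type-token ratio, function-word rate) is my own minimal choice for illustration, not the study's feature set, and the tokenization is deliberately naive.

```python
# A minimal stylometric (content-agnostic) feature extractor:
# sentence length, lexical diversity, and function-word rate.
def stylometric_features(text):
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    function_words = {"the", "of", "and", "to", "in", "a"}
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "type_token_ratio": len({w.lower().strip(".,") for w in words}) / len(words),
        "function_word_rate": sum(w.lower() in function_words for w in words) / len(words),
    }

f = stylometric_features("The cat sat on the mat. The dog barked.")
print(f)
```

Because none of these features depend on what the text is about, two documents with near-identical content can still be told apart by them, which is exactly the setting the pilot studies probe.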

17.
Arun Agarwal 《Sadhana》1993,18(2):189-208
We describe the organization and several components of an automated document processing system that begins with digitized images of documents and produces representations at higher levels. Such representations include: the visual sketch (connected components extracted from the background), physical layout (spatial extents of blocks corresponding to text and graphics), logical layout (grouping of strings into words and phrases), and block primitives (e.g., recognized characters and words in text blocks, and recognition of hand-drawn line drawings such as schematic electronic circuits). We describe algorithms for deriving several of the representations and the interaction of the different modules. The methods are illustrated with examples.
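The first representation level, the visual sketch, rests on connected-component extraction from a binarized image. The sketch below uses breadth-first flood fill with 4-connectivity on a toy bitmap; a real system would run this on scanned pages and would likely use 8-connectivity and optimized labeling.

```python
# Connected components of a binary image via BFS flood fill
# (4-connectivity; the image is a toy bitmap, not a scanned page).
from collections import deque

image = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]

def connected_components(img):
    h, w = len(img), len(img[0])
    seen, comps = set(), []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and (sy, sx) not in seen:
                comp, queue = [], deque([(sy, sx)])
                seen.add((sy, sx))
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] \
                                and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

print(len(connected_components(image)))
```

Each component's bounding box then feeds the next level up (physical layout), where components are grouped into text and graphics blocks.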

18.
19.
A method to identify core documents within a given subject domain has been developed by the author. The method builds on the concept of polyrepresentation by using different search rationales in several databases and isolating the overlaps between them. This paper delineates the ideas behind the method and describes the study done to measure its effectiveness. This revised version was published online in June 2006 with corrections to the Cover Date.
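The overlap idea can be shown in miniature. The document IDs and search rationales below are invented; the core set is taken as documents found by at least k of the rationales, with k equal to the number of rationales recovering the strict intersection.

```python
# Polyrepresentation-style overlap: documents retrieved under several
# search rationales/databases (IDs invented); the core set contains
# documents retrieved by at least k rationales.
from collections import Counter

results = {
    "citation_search":  {"d1", "d2", "d3", "d7"},
    "term_search":      {"d2", "d3", "d4"},
    "journal_browsing": {"d2", "d3", "d5"},
}

counts = Counter(doc for hits in results.values() for doc in hits)

def core(k):
    return {doc for doc, c in counts.items() if c >= k}

print(core(3), core(2))
```

Lowering k trades precision for recall: the core sets are nested, with the strict intersection (maximum k) being the most conservative candidate set of core documents.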

20.
This paper briefly describes the role of microfilm, a major component of modern multimedia imaging technology, especially in combination with optical disc technology, in promoting the rapid development of preservation and retrieval techniques for documents and archives, as well as the factors affecting the permanence of microform documents and the appropriate conditions for their storage.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号