Similar literature
1.
Gao, Qiang; Huang, Xiao; Dong, Ke; Liang, Zhentao; Wu, Jiang 《Scientometrics》2022,127(3):1543-1563

The combination of the topic model and the semantic method can help to discover the semantic distributions of topics and the changing characteristics of the semantic distributions, further providing a new perspective for the research of topic evolution. This study proposes a solution for quantifying the semantic distributions and the changing characteristics based on words in topic evolution through the Dynamic topic model (DTM) and the word2vec model. A dataset in the field of Library and information science (LIS) is utilized in the empirical study, and the topic-semantic probability distribution is derived. The evolving dynamics of the topics are constructed. The characteristics of evolving dynamics are used to explain the semantic distributions of topics in topic evolution. Then, the regularities of evolving dynamics are summarized to explain the changing characteristics of semantic distributions in topic evolution. Results show that no topic is distributed in a single semantic concept, and most topics correspond to various semantic concepts in LIS. The three kinds of topics in LIS are the convergent, diffusive, and stable topics. The discovery of different modes of topic evolution can further prove the development of the field. In addition, findings indicate that the popularity of topics and the characteristics of evolving dynamics of topics are irrelevant.


2.
Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR based on mining per-word topic distribution over time. We propose that semantic word shifts not only occur over time, but also over topics. The shifts are examined from two perspectives, the topic-level and the context-level. According to the over-time word-topic distribution, stable words and unstable words are recognized. The diverging and converging trends in the unstable type reveal characteristics of the topic evolution process. The context-level shifts are further detected by similarities between word vectors. Our work associates semantic word shifts with the evolution of topics, which facilitates a better understanding of semantic word shifts from both topics and contexts.
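
The stable/unstable distinction described in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the abstract does not say how shifts in the word-topic distribution are measured, so the Jensen-Shannon divergence, the `threshold` value, and the function names below are all assumptions made for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def classify_word(dists_by_period, threshold=0.1):
    """Label a word from its per-period topic distributions.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    shifts = [
        js_divergence(a, b)
        for a, b in zip(dists_by_period, dists_by_period[1:])
    ]
    return "stable" if max(shifts, default=0.0) < threshold else "unstable"
```

A word whose topic distribution barely moves between consecutive periods is labelled stable; one that jumps from one topic to another is labelled unstable.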

3.
Cagliero, Luca; Garza, Paolo; Kavoosifar, Mohammad Reza; Baralis, Elena 《Scientometrics》2018,116(2):1273-1301

Identifying the most relevant scientific publications on a given topic is a well-known research problem. The Author-Topic Model (ATM) is a generative model that represents the relationships between research topics and publication authors. It allows us to identify the most influential authors on a particular topic. However, since most research works are co-authored by many researchers the information provided by ATM can be complemented by the study of the most fruitful collaborations among multiple authors. This paper addresses the discovery of research collaborations among multiple authors on single or multiple topics. Specifically, it exploits an exploratory data mining technique, i.e., weighted association rule mining, to analyze publication data and to discover correlations between ATM topics and combinations of authors. The mined rules characterize groups of researchers with fairly high scientific productivity by indicating (1) the research topics covered by their most cited publications and the relevance of their scientific production separately for each topic, (2) the nature of the collaboration (topic-specific or cross-topic), (3) the name of the external authors who have (occasionally) collaborated with the group either on a specific topic or on multiple topics, and (4) the underlying correlations between the addressed topics. The applicability of the proposed approach was validated on real data acquired from the Online Mendelian Inheritance in Man catalog of genetic disorders and from the PubMed digital library. The results confirm the effectiveness of the proposed strategy.


4.

Probabilistic topic modeling algorithms like Latent Dirichlet Allocation (LDA) have become powerful tools for the analysis of large collections of documents (such as papers, projects, or funding applications) in science, technology and innovation (STI) policy design and monitoring. However, selecting an appropriate and stable topic model for a specific application (by adjusting the hyperparameters of the algorithm) is not a trivial problem. Common validation metrics like coherence or perplexity, which are focused on the quality of topics, are not a good fit in applications where the quality of the document similarity relations inferred from the topic model is especially relevant. Relying on graph analysis techniques, the aim of our work is to state a new methodology for the selection of hyperparameters which is specifically oriented to optimize the similarity metrics emanating from the topic model. In order to do this, we propose two graph metrics: the first measures the variability of the similarity graphs that result from different runs of the algorithm for a fixed value of the hyperparameters, while the second metric measures the alignment between the graph derived from the LDA model and another obtained using metadata available for the corresponding corpus. Through experiments on various corpora related to STI, it is shown that the proposed metrics provide relevant indicators to select the number of topics and build persistent topic models that are consistent with the metadata. Their use, which can be extended to other topic models beyond LDA, could facilitate the systematic adoption of these kinds of techniques in STI policy analysis and design.


5.

A thermodynamic approach has been applied to solving the problem of selecting the number of clusters/topics in topic modeling. The main principles of this approach are formulated and the behavior of topic models during temperature variations is studied. Using thermodynamic formalism, the existence of the entropy phase transition in topic models is shown and criteria for the choice of the optimum number of clusters/topics are determined.


6.
A knowledge organization system (KOS) can help easily indicate the deep knowledge structure of a patent document set. Compared to classification code systems, a personalized KOS made up of topics can represent the technology information in a more agile, detailed manner. This paper presents an approach to automatically construct a KOS of patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is adopted to generate a better bag-of-words for topic modeling and the LDA model is applied to generate raw topics. Then, by iteratively using K-Means clustering and PCA on the document set and topics matrix, we generated new upper topics and computed the relationships between topics to construct a KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics, which are represented by terms and their weights, and the leaves are patent documents. We evaluated the approach with a set of Large Aperture Optical Elements (LAOE) patent documents as an empirical study and constructed the LAOE KOS. The method discovered the deep semantic relationships between the topics and helped better describe the technology themes of LAOE. Based on the KOS, two types of applications were implemented: the automatic classification of patent documents and the categorical refinement of search results.

7.
In recent years, the Triple Helix model has identified feasible approaches to measuring relations among universities, industries, and governments. Results have been extended to different databases, regions, and perspectives. This paper explores how bibliometrics and text mining can inform Triple Helix analyses. It engages Competitive Technical Intelligence concepts and methods for studies of Newly Emerging Science & Technology (NEST) in support of technology management and policy. A semantic TRIZ approach is used to assess NEST innovation patterns by associating topics (using noun phrases to address subjects and objects) and actions (via verbs). We then classify these innovation patterns by the dominant categories of origination: Academy, Industry, or Government. We then use TRIZ tags and benchmarks to locate NEST progress using Technology Roadmapping. Triple Helix inferences can then be related to the visualized patterns. We demonstrate these analyses via a case study for dye-sensitized solar cells.

8.
An applied mathematics perspective on stochastic modelling for climate (cited by: 1; self-citations: 0; citations by others: 1)
Systematic strategies from applied mathematics for stochastic modelling in climate are reviewed here. One of the topics discussed is the stochastic modelling of mid-latitude low-frequency variability through a few teleconnection patterns, including the central role and physical mechanisms responsible for multiplicative noise. A new low-dimensional stochastic model is developed here, which mimics key features of atmospheric general circulation models, to test the fidelity of stochastic mode reduction procedures. The second topic discussed here is the systematic design of stochastic lattice models to capture irregular and highly intermittent features that are not resolved by a deterministic parametrization. A recent applied mathematics design principle for stochastic column modelling with intermittency is illustrated in an idealized setting for deep tropical convection; the practical effect of this stochastic model in both slowing down convectively coupled waves and increasing their fluctuations is presented here.

9.
The topic of fuzzy set theory was examined using the occurrence of phrases in bibliographic records. Records containing the word fuzzy were downloaded from over 100 databases, and from these records, phrases were extracted surrounding the word fuzzy. A methodology was developed to trim this list of phrases to a list of high frequency phrases relevant to fuzzy set theory. This list of phrases was in turn used to extract records from the original downloaded set, which were (algorithmically) relevant to fuzzy set theory. This set of records was then analysed to show the development of the topic of fuzzy set theory, the distribution of the fuzzy phrases over time and the frequency distribution of the fuzzy phrases. In addition, the field of the bibliographic record in which the phrase occurred was examined, as well as the first appearance of a particular fuzzy phrase.

10.
User‐generated reviews can serve as an efficient tool for evaluating the customer‐perceived quality of online products and services. This article proposes a joint control chart for monitoring the quantitative evolution of document‐level topics and sentiments in online customer reviews. A sequential model is constructed to convert the temporally correlated document collections to topic and sentiment distributions, which are subsequently used to monitor the topics that users are concerned about and the topic‐specific opinions in an ongoing product and service process. Simulation studies on various data scenarios demonstrate the superior performance of the proposed control chart in terms of both detecting shifts and identifying truly out‐of‐control terms.

11.
曹准; 谢吾; 李永建 《包装工程》2018,39(6):89-93
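Objective: As human-computer interaction modes grow richer and the volume of interaction increases, interaction words that form a complete meaning with their superordinate and subordinate terms, through association or collocation, help reduce users' thinking time and improve interaction efficiency. Studying the relationship between interaction words and interaction efficiency is therefore of practical significance. Using methods from cognitive science, this study examines how the semantics of web-page interaction words affect interaction efficiency and seeks to understand the relationship between different interaction words and interaction efficiency. Methods: Theoretical analysis was combined with cognitive experiments. Data were collected through questionnaires and cognitive experiments, the questionnaire and reaction-time data were processed statistically, and the results were analysed in light of the relevant theory. Conclusion: The semantic distance of interaction words has a significant effect on web-page interaction efficiency, and the semantic relationship between superordinate and subordinate phrases is also a factor affecting interaction efficiency. In future interface design, the semantic distance of interaction words and the semantic relationship between superordinate and subordinate phrases can be weighed as design variables.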

12.
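Starting from a geographic ontology, spatial semantic roles and instance patterns are constructed. Through spatial semantic role labeling, phrase recognition, and pattern matching, ontology instances related to spatial location are extracted from the Web, providing a dynamically updatable data source with explicit semantics for question-answering mobile spatial information services. Semantic Web technology is then used to provide mobile users with a question-answering ontology instance query service. Preliminary experiments show that the method achieves good precision, while recall needs further improvement.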

13.
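To address the phrase sparsity problem caused by the exact-matching strategy in phrase-based statistical machine translation (SMT) models, a statistical machine translation model based on phrase similarity is proposed. The model introduces the example-based translation method into SMT. During translation, for phrases that do not appear in the training corpus, the similarity between source-language phrases is computed, a fuzzy-matching strategy is used to find similar example phrases in the phrase table, and a translation is constructed from those example phrases. Compared with exact matching, fuzzy matching based on similarity makes fuller use of the phrase table and alleviates the phrase sparsity problem. Experiments show that the model clearly improves the quality of statistical machine translation, outperforming Moses, the current best phrase-based system.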
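
The fuzzy-matching step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the similarity measure, so token-level matching with Python's `difflib.SequenceMatcher` is a stand-in, and `fuzzy_lookup`, `min_sim`, and the toy phrase table are hypothetical names and values.

```python
from difflib import SequenceMatcher

def phrase_similarity(a, b):
    """Token-level similarity in [0, 1] between two source phrases."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def fuzzy_lookup(phrase, phrase_table, min_sim=0.5):
    """Return (matched_phrase, similarity) for a source phrase.

    An exact match is preferred; otherwise the most similar entry is
    returned if its similarity reaches `min_sim` (a hypothetical cut-off).
    """
    if phrase in phrase_table:
        return phrase, 1.0
    best, best_sim = None, 0.0
    for candidate in phrase_table:
        sim = phrase_similarity(phrase, candidate)
        if sim > best_sim:
            best, best_sim = candidate, sim
    if best_sim < min_sim:
        return None, best_sim
    return best, best_sim

# Toy phrase table mapping source phrases to translations (made-up data).
table = {"the blue car": "la voiture bleue", "a green house": "une maison verte"}
match, score = fuzzy_lookup("the red car", table)
```

Here the unseen phrase "the red car" falls back to the example phrase "the blue car", from which a translation could then be adapted.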

14.
Calculating the semantic similarity of two sentences is an extremely challenging problem. We propose a solution based on convolutional neural networks (CNN) using semantic and syntactic features of sentences. The similarity score between two sentences is computed as follows. First, given a sentence, two matrices are constructed accordingly, which are called the syntax model input matrix and the semantic model input matrix; one records some syntax features, and the other records some semantic features. By experimenting with different arrangements of representing the syntactic and semantic features of the sentences in the matrices, we adopt the most effective way of constructing the matrices. Second, these two matrices are given to two neural networks, which are called the sentence model and the semantic model, respectively. The convolution process of the neural networks of the two models is carried out in multiple perspectives. The outputs of the two models are combined as a vector, which is the representation of the sentence. Third, given the representation vectors of two sentences, the similarity score of these representations is computed by a layer in the CNN. Experiment results show that our algorithm (SSCNN) surpasses the performance of MPCNN, which is notably the best recent work using CNN for sentence similarity computation. Compared with MPCNN, the convolution computation in SSCNN is considerably simpler. Based on the results of this work, we suggest that by further utilization of semantic and syntactic features, the performance of sentence similarity measurements has considerable potential for improvement in the future.

15.
Given that many frontiers and hotspots of science and technology are emerging from interdisciplines, the accurate identification and forecasting of interdisciplinary topics has become increasingly significant. Existing methods of interdisciplinary topic identification have their respective application fields, and each identification result can help researchers acquire partial characteristics of interdisciplinary topics. This paper offers an integrated method for identifying and predicting interdisciplinary topics from scientific literature. It integrates various methods, including co-occurrence networks analysis, high-TI terms analysis and burst detection, and offers an overall perspective into interdisciplinary topic identification. The results of the different methods are mutually confirmed and complemented, further overviewing the characteristics of the interdisciplinary field and highlighting the importance or potential of interdisciplinary topics. In this study, Information Science and Library Science is selected as a case study. The research has clearly shown that more accurate and comprehensive results can be achieved for interdisciplinary topic identification and prediction by employing this integrated method. Further, the integration of different methods has promising potential for application in knowledge discovery and scientific measurement in the future.

16.
This paper deals with two main topics. The first one concerns the equivalence of stress algorithms, based on a Backward-Euler-step applied on viscoplastic models of Chaboche-type, and their elastoplastic counterpart. Generally, the stress algorithm yields a system of non-linear algebraic equations and the corresponding consistent tangent operator, occurring in the principle of virtual displacements, leads to a system of linear equations. This procedure can be obtained utilizing only numerical methods. The second topic concerns a special constitutive relation based on a kinematic hardening model using a sum of Armstrong/Frederick terms, which is equivalent to a multi-surface plasticity model. Applying this model a so-called problem-adapted stress algorithm is derived, where only one non-linear equation must be solved. This result is independent of the number of terms in the hardening model. Furthermore, only the viscoplastic algorithm must be implemented, since it includes the elastoplastic constitutive model as a special case. © 1997 by John Wiley & Sons, Ltd.

17.
Rehs, Andreas 《Scientometrics》2020,125(2):1229-1251

The detection of differences or similarities in large numbers of scientific publications is an open problem in scientometric research. In this paper we therefore develop and apply a machine learning approach based on structural topic modelling in combination with cosine similarity and a linear regression framework in order to identify differences in dissertation titles written at East and West German universities before and after German reunification. German reunification and its surrounding time period is used because it provides a structure with both minor and major differences in research topics that could be detected by our approach. Our dataset is based on dissertation titles in economics and business administration and chemistry from 1980 to 2010. We use university affiliation and year of the dissertation to train a structural topic model and then test the model on a set of unseen dissertation titles. Subsequently, we compare the resulting topic distribution of each title to every other title with cosine similarity. The cosine similarities and the regional and temporal origin of the dissertation titles they come from are then used in a linear regression approach. Our results on research topics in economics and business administration suggest substantial differences between East and West Germany before the reunification and a rapid conformation thereafter. In chemistry we observe minor differences between East and West before the reunification and a slightly increased similarity thereafter.
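
The pairwise comparison step in this abstract, cosine similarity between the topic distributions of titles, can be sketched in a few lines. This is a minimal illustration, not the author's code: the topic distributions are assumed to be plain lists of topic proportions, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Convention for an all-zero vector.
    return dot / (norm_u * norm_v)

def pairwise_similarities(topic_dists):
    """Similarity of every title's topic distribution to every other title's."""
    n = len(topic_dists)
    return {
        (i, j): cosine_similarity(topic_dists[i], topic_dists[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
```

In the study's setting, each resulting similarity would then be paired with the regional and temporal origin of the two titles and passed to the regression step.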


18.
One of the topics that forms part of the CONRAD project addresses the problems related to the dosimetry of complex-mixed radiation fields at workplaces. This topic was included in work package (WP) 6. WP 6 was established to co-ordinate research activities in two areas: the development of new techniques and the improvement of current techniques for characterisation of complex workplace fields (including high-energy fields and pulsed fields), namely measurement and calculation of particle energy and direction distributions (Subgroup A); and model improvements for dose assessment of solar particle events (Subgroup B). In both cases, in order to aid the research, WP 6 increases the efficiency of resource utilisation and facilitates technology transfer to practical application and the development of standards. This contribution presents a general overview of the activities of SG A; specific results related to the benchmark experiment at GSI Darmstadt are presented separately and will be published elsewhere. The results acquired within the SG B activities are presented in the meeting held as part of EURADOS AM 2008.

19.
Research on aquaculture is expanding along with the exceptional growth of the sector and has an important role in supporting even further the future developments of this relatively young food production industry. In this paper we examined the aquaculture literature using bibliometrics and computational semantics methods (latent semantic analysis, topic model and co-citation analysis) to identify the main themes and trends in research. We analysed bibliographic information and abstracts of 14,308 scientific articles on aquaculture recorded in Scopus. Both the latent semantic analysis and the topic model indicate that the broad themes of research on aquaculture are related to genetics and reproduction, growth and physiology, farming systems and environment, nutrition, water quality, and health. The topic model gives an estimate of the relevance of these research themes by single articles, authors, research institutions, species and time. With the co-citation analysis it was possible to identify more specific research fronts, which are attracting a high number of co-citations by the scientific community. The largest research fronts are related to probiotics, benthic sediments, genomics, integrated aquaculture and water treatment. In terms of temporal evolution, some research fronts, such as probiotics, genomics, sea-lice, and environmental impacts from cage aquaculture, are still expanding, while others, such as mangroves and shrimp farming and benthic sediments, are gradually losing weight. While bibliometric methods do not necessarily provide a measure of output or impact of research activities, they proved useful for mapping a research area, identifying the relevance of themes in the scientific literature and understanding how research fronts evolve and interact. By using different methodological approaches, the study takes advantage of the strengths of each method in mapping the research on aquaculture, while also showing possible limitations and some directions for further improvement.

20.
The bibliometrics research on nanotechnology highlights close interrelationships between scientific and technological activities (S&T) in the field of nanotechnology. Notwithstanding abundant empirical evidence on the mutual relations between S&T, the dynamics of the relationship from a contextual perspective have gained relatively little attention. Accordingly, our understanding of how science- and technology-oriented nanotechnology identifies development opportunities from each other is still at a nascent stage. To address this gap, by focusing on nanotechnology in the semiconductor industry, we use a structural topic model to empirically explore the dynamic interrelationships between science- and technology-oriented nanotechnology. We empirically delineate the dynamic development trends in the context of the interrelationships between S&T and demonstrate how development opportunities are identified from each other. These findings show a new window of opportunities for how state-of-the-art models for semantic analysis can be used in the literature on S&T interrelationships.
