Similar Documents
20 similar documents found (search time: 187 ms)
1.
Gao  Qiang  Huang  Xiao  Dong  Ke  Liang  Zhentao  Wu  Jiang 《Scientometrics》2022,127(3):1543-1563

The combination of the topic model and the semantic method can help to discover the semantic distributions of topics and how those distributions change, providing a new perspective for research on topic evolution. This study proposes a solution for quantifying the semantic distributions and their changing characteristics based on words in topic evolution, using the Dynamic Topic Model (DTM) and the word2vec model. A dataset from the field of Library and Information Science (LIS) is used in the empirical study, and the topic-semantic probability distribution is derived. The evolving dynamics of the topics are constructed, and their characteristics are used to explain the semantic distributions of topics in topic evolution. The regularities of the evolving dynamics are then summarized to explain how the semantic distributions change during topic evolution. Results show that no topic is confined to a single semantic concept, and most topics in LIS correspond to several semantic concepts. Topics in LIS fall into three kinds: convergent, diffusive, and stable. The discovery of these different modes of topic evolution further evidences the development of the field. In addition, the findings indicate that the popularity of a topic is unrelated to the characteristics of its evolving dynamics.
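The convergent/diffusive/stable distinction can be illustrated with a toy calculation. The sketch below (with made-up word vectors, not the paper's data or exact measure) quantifies a topic's semantic dispersion as the mean cosine distance of its top words' vectors to their centroid; a dispersion that shrinks across time slices would mark a convergent topic, one that grows a diffusive topic.

```python
import numpy as np

def dispersion(vectors):
    """Mean cosine distance of word vectors to their centroid:
    a rough measure of how semantically spread-out a topic is."""
    V = np.asarray(vectors, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    c = V.mean(axis=0)
    c = c / np.linalg.norm(c)
    return float(np.mean(1.0 - V @ c))

# Toy vectors for a topic's top words in two time slices (made up):
early = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]             # spread out
late = [[1, 0.1, 0], [1, 0, 0.1], [0.9, 0.1, 0.1]]    # tightly clustered
assert dispersion(late) < dispersion(early)           # topic has converged
```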


2.
Traditional topic models have been widely used for analyzing semantic topics in electronic documents. However, the topic words they produce are often poorly readable and inconsistent; usually only domain experts can guess their meaning. In fact, phrases are the main unit people use to express semantics. This paper presents a Distributed Representation-Phrase Latent Dirichlet Allocation (DRPhrase LDA) model, a phrase topic model. Specifically, the model enhances the semantic information of phrases via distributed representations. The experimental results show that the topics acquired by our model are more readable and consistent than those of other similar topic models.
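As a minimal illustration of treating phrases, rather than single words, as the semantic unit (this is not the DRPhrase LDA model itself), the sketch below detects candidate bigram phrases with a naive frequency score; the scoring rule and thresholds are assumptions chosen for the demo.

```python
from collections import Counter

def find_phrases(sentences, min_count=2, threshold=0.5):
    """Naive bigram phrase detection: score(a, b) = count(a, b) /
    (count(a) * count(b)); keep frequent, strongly associated pairs."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        uni.update(s)
        bi.update(zip(s, s[1:]))
    return {f"{a}_{b}" for (a, b), c in bi.items()
            if c >= min_count and c / (uni[a] * uni[b]) >= threshold}

docs = [["latent", "dirichlet", "allocation", "is", "a", "topic", "model"],
        ["we", "train", "latent", "dirichlet", "allocation", "on", "text"]]
phrases = find_phrases(docs)
assert "latent_dirichlet" in phrases and "dirichlet_allocation" in phrases
```

In a full pipeline, the detected phrases would replace their component words before topic modeling, so the topic-word lists come out as readable multiword units.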

3.
The meaning of a word includes a conceptual meaning and a distributive meaning. Word embedding based on distribution suffers from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with distributed word representation, so that both distributed information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlation between the model's results and human judgement is improved to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR represents words in a more informative way than distributed word embedding. This proves that conceptual semantics, especially hypernymous information, is a good complement to distributed word representation.
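A minimal sketch of the aggregation idea, assuming a weighted average of the distributed vector with synset and hypernym centroids; the weights and the combination rule here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def enhance(word_vec, synset_vecs, hypernym_vecs, weights=(0.6, 0.2, 0.2)):
    """Hypothetical aggregation in the spirit of CEWR: blend a word's
    distributed vector with the mean embeddings of its thesaurus synset
    and its hypernyms, then L2-normalize the result."""
    parts = [np.asarray(word_vec, float),
             np.mean(np.asarray(synset_vecs, float), axis=0),
             np.mean(np.asarray(hypernym_vecs, float), axis=0)]
    v = sum(w * p for w, p in zip(weights, parts))
    return v / np.linalg.norm(v)

v = enhance([1, 0, 0], [[0, 1, 0], [0, 0, 1]], [[1, 1, 0]])
assert abs(np.linalg.norm(v) - 1.0) < 1e-9   # unit-length representation
```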

4.
We present a case study of how scientometric tools can reveal the structure of scientific theory in a discipline. Specifically, we analyze the patterns of word use in the discipline of cognitive science using latent semantic analysis, a well-known semantic model, on the abstracts of over a thousand academic papers relevant to its theories. Our results show that it is possible to link these theories with specific statistical distributions of words in the abstracts of papers that espouse them. We show that theories have different patterns of word use, and that their similarity relationships are intuitive and informative. Moreover, we show that it is possible to predict fairly accurately the theory of a paper by constructing a model of the theories based on their distributions of word use. These results may open new avenues for the application of scientometric tools to theoretical divides.

5.
To address the slow processing of text topic clustering on stand-alone architectures in the era of big data, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Because the TF-IDF (term frequency-inverse document frequency) implementation in Spark performs an irreversible word mapping, the mapped word indexes cannot be traced back to the original words. This paper proposes an optimized TF-IDF method for Spark that ensures the original words can be restored. First, text features are extracted with the proposed TF-IDF algorithm combined with CountVectorizer; the features are then fed into the LDA (Latent Dirichlet Allocation) topic model for training, and finally the text topic clustering is obtained. Experimental results show that for large data samples, the processing speed of LDA topic-model clustering is improved on Spark. At the same time, compared with the LDA topic model fed with raw word frequencies, the proposed model achieves lower perplexity.
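The reversibility fix can be illustrated in plain Python (a sketch of the idea, not the Spark implementation): a TF-IDF that keeps an explicit vocabulary, so every feature index always maps back to its original word.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF with an explicit, invertible vocabulary, so feature
    indices can always be traced back to their original words."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}          # word -> index
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(len(docs) / df[w]) + 1 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append({index[w]: tf[w] / len(d) * idf[w] for w in tf})
    return rows, vocab                                   # vocab[i] -> word

docs = [["spark", "lda", "topic"], ["spark", "news", "text"]]
rows, vocab = tfidf(docs)
top = max(rows[0], key=rows[0].get)
assert vocab[top] in docs[0]      # index maps back to an original word
```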

6.
The topic of fuzzy set theory was examined using the occurrence of phrases in bibliographic records. Records containing the word fuzzy were downloaded from over 100 databases, and from these records, phrases were extracted surrounding the word fuzzy. A methodology was developed to trim this list of phrases to a list of high-frequency phrases relevant to fuzzy set theory. This list of phrases was in turn used to extract records from the original downloaded set which were (algorithmically) relevant to fuzzy set theory. This set of records was then analysed to show the development of the topic of fuzzy set theory, the distribution of the fuzzy phrases over time, and the frequency distribution of the fuzzy phrases. In addition, the field of the bibliographic record in which the phrase occurred was examined, as well as the first appearance of each particular fuzzy phrase.
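The extraction step, collecting the phrases surrounding each occurrence of a target word, can be sketched as a simple windowing pass (a simplified stand-in for the paper's methodology; the window width is an assumption):

```python
def surrounding_phrases(text, target="fuzzy", width=1):
    """Collect the phrase of `width` words on either side of each
    occurrence of the target word in a record's text."""
    words = text.lower().split()
    out = []
    for i, w in enumerate(words):
        if w == target:
            out.append(" ".join(words[max(0, i - width): i + width + 1]))
    return out

rec = "A fuzzy set and a fuzzy controller use fuzzy logic"
assert surrounding_phrases(rec) == ["a fuzzy set", "a fuzzy controller",
                                    "use fuzzy logic"]
```

Counting these extracted phrases across all records then yields the frequency distribution the abstract describes.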

7.
Text mining has become a major research area in which text classification is an important task for finding relevant information in new documents. Accordingly, this paper presents a semantic word processing technique for text categorization that utilizes semantic keywords instead of treating the keywords in the documents as independent features, so the dimensionality of the search space can be reduced. A Back Propagation Lion algorithm (BPLion algorithm) is also proposed to overcome a problem in updating the neuron weights. The proposed text classification methodology is evaluated on two data sets, 20 Newsgroups and Reuters. The performance of the proposed BPLion is analysed in terms of sensitivity, specificity, and accuracy, and compared with existing works. The results show that the proposed BPLion algorithm and semantic processing methodology classify the documents with less training time and a higher classification accuracy of 90.9%.
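The dimensionality-reduction idea, collapsing distinct keywords onto shared semantic keywords before classification, can be sketched as follows; the synonym map here is a hypothetical stand-in for a real semantic lexicon.

```python
# Hypothetical synonym map standing in for a semantic lexicon.
SYNONYMS = {"automobile": "car", "vehicle": "car",
            "pc": "computer", "laptop": "computer"}

def semantic_features(words):
    """Collapse keywords onto shared semantic keywords, shrinking the
    feature space the classifier has to search."""
    return sorted({SYNONYMS.get(w, w) for w in words})

# Three raw keywords reduce to two semantic features:
assert semantic_features(["automobile", "vehicle", "laptop"]) == \
    ["car", "computer"]
```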

8.
Much academic research is never cited and may be rarely read, indicating wasted effort from the authors, referees and publishers. One reason that an article could be ignored is that its topic is, or appears to be, too obscure to be of wide interest, even if excellent scholarship produced it. This paper reports a word frequency analysis of 874,411 English article titles from 18 different Scopus natural, formal, life and health sciences categories 2009–2015 to assess the likelihood that research on obscure (rarely researched) topics is less cited. In all categories examined, unusual words in article titles associate with below average citation impact research. Thus, researchers considering obscure topics may wish to reconsider, generalise their study, or to choose a title that reflects the wider lessons that can be drawn. Authors should also consider including multiple concepts and purposes within their titles in order to attract a wider audience.
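One simple way to operationalize "unusual title words" is to score each title by the corpus frequency of its rarest word; the sketch below is an illustrative measure, not the paper's exact method.

```python
from collections import Counter

def obscurity(titles):
    """Score each title by the corpus frequency of its rarest word;
    low scores flag titles built on rarely used (obscure) terms."""
    freq = Counter(w for t in titles for w in t.lower().split())
    return [min(freq[w] for w in t.lower().split()) for t in titles]

titles = ["deep learning for text", "deep learning for images",
          "xenoglossy in text corpora"]
scores = obscurity(titles)
assert scores[2] == min(scores)   # the title with rare words scores lowest
```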

9.
Short reviews of the film Better Days (《少年的你》) were crawled from the Douban website with Python. The review text was cleaned, then segmented and stripped of stop words using a purpose-built segmentation dictionary and stop-word dictionary, yielding a reasonably normalized text. Keywords were extracted from the review text with the TF-IDF algorithm, and an LDA topic model was built on these keywords to extract the review topics quantitatively, thereby analyzing the audience's sentiment toward the film and the hot topics in the reviews. The results offer decision support for consumers' purchasing behavior and suggest development directions for content providers.

10.
A Solution to the Word Sense Problem Based on the Vector Space Model and the Maximum Entropy Model
For the word sense problem of monosemous words, a vector space model incorporating trigger pairs is constructed to compute word sense similarity, which is then used for word clustering. For the word sense problem of polysemous words, supervised word sense disambiguation is studied using a maximum entropy model that incorporates long-distance context information. To overcome the limited coverage and strong subjective influence of previous word sense disambiguation evaluations, which relied on manually constructed sense-tagged test sentences, the models are evaluated directly on two practical applications: word clustering and word segmentation ambiguity. The resolution accuracy for segmentation ambiguity reaches 92%, and the word clustering results meet the needs of further applications.
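A bare-bones stand-in for a word-similarity vector space (plain sentence-level co-occurrence counts rather than the paper's trigger pairs, so the data and scoring are illustrative assumptions):

```python
import numpy as np

def cooc_vectors(sentences, vocab):
    """Build word vectors from sentence-level co-occurrence counts;
    cosine similarity between rows then approximates sense similarity."""
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for a in s:
            for b in s:
                if a != b and a in idx and b in idx:
                    M[idx[a], idx[b]] += 1
    return M

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sents = [["bank", "money", "loan"], ["bank", "money", "deposit"],
         ["river", "water", "flow"]]
vocab = ["bank", "money", "loan", "deposit", "river", "water", "flow"]
M = cooc_vectors(sents, vocab)
assert cosine(M[0], M[1]) > cosine(M[0], M[4])   # bank ~ money, not river
```

Clustering words by this similarity is then the monosemous-word half of the pipeline described above.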

11.
Understanding a word in context relies on a cascade of perceptual and conceptual processes, starting with modality-specific input decoding, and leading to the unification of the word's meaning into a discourse model. One critical cognitive event, turning a sensory stimulus into a meaningful linguistic sign, is the access of a semantic representation from memory. Little is known about the changes that activating a word's meaning brings about in cortical dynamics. We recorded the electroencephalogram (EEG) while participants read sentences that could contain a contextually unexpected word, such as ‘cold’ in ‘In July it is very cold outside’. We reconstructed trajectories in phase space from single-trial EEG time series, and we applied three nonlinear measures of predictability and complexity to each side of the semantic access boundary, estimated as the onset time of the N400 effect evoked by critical words. Relative to controls, unexpected words were associated with larger prediction errors preceding the onset of the N400. Accessing the meaning of such words produced a phase transition to lower entropy states, in which cortical processing becomes more predictable and more regular. Our study sheds new light on the dynamics of information flow through interfaces between sensory and memory systems during language processing.

12.
A New Latent Semantic Analysis Language Model
A clustering-based method is proposed for fast quantitative representation of words, from which a prediction confidence for the latent semantic analysis (LSA) language model is derived. Combined with a trigram model through a newly proposed geometrically weighted static interpolation, this yields a new LSA language model, which is applied to Chinese speech recognition. Experiments show that it outperforms the traditional SVD-based LSA language model in both efficiency and performance; compared with the trigram model, the recognition error rate drops by a relative 3.6%-7.1%. The effective quantitative representation of words also offers a new route to further improving the performance of LSA language models.

13.
刘保旗  林丽  郭主恩 《包装工程》2024,45(2):110-117
Objective: To extract users' imagery cognition from existing online review text, addressing the time cost of imagery experiments and the randomness of small samples in traditional Kansei (affective) design research. Methods: First, a large corpus of car-exterior review text was crawled, a semantic-analysis vocabulary was built, and a word2vec word-vector model was constructed. Then the semantic relations within the vocabulary were obtained from the model, and the semantic dispersion between high-frequency key adjectives was computed to construct a representative imagery-word space. Finally, reviews were mapped into the imagery-word space through quantified semantic matching, yielding large-scale users' salient imagery representations for each car model and the car-exterior matching results for given imagery words. Results: The salient imagery mined with this method showed no significant difference from, and a high correlation with, experimental results based on manual evaluation, demonstrating the method's validity. Conclusion: Mining users' imagery cognition this way exploits existing large-scale user feedback and improves the efficiency of imagery analysis, helping decision-makers quickly understand consumers' affective knowledge of car exteriors and making products better match market expectations during design iteration. Compared with related studies, the quantified semantic matching approach requires no dimensionality reduction or clustering of very high-dimensional vectors, avoiding the loss of semantic relations between word vectors that feature reduction may cause in earlier work and yielding more accurate imagery-mining results.
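The quantified-matching step, mapping a review to the imagery word whose vector best matches it, can be sketched with toy vectors (the vectors and word lists below are made up, standing in for a trained word2vec model):

```python
import numpy as np

# Toy, made-up word vectors standing in for a trained word2vec model.
vecs = {"sporty": np.array([1.0, 0.0]), "elegant": np.array([0.0, 1.0]),
        "fast": np.array([0.9, 0.1]), "graceful": np.array([0.1, 0.9])}

def nearest_imagery(review_words, imagery=("sporty", "elegant")):
    """Map a review to the imagery word whose vector is closest (by
    cosine) to the mean of the review's word vectors."""
    v = np.mean([vecs[w] for w in review_words if w in vecs], axis=0)
    return max(imagery, key=lambda k: float(
        vecs[k] @ v / (np.linalg.norm(vecs[k]) * np.linalg.norm(v))))

assert nearest_imagery(["fast"]) == "sporty"
assert nearest_imagery(["graceful"]) == "elegant"
```

Aggregating these per-review assignments over thousands of reviews gives the salient imagery profile of each product.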

14.
In bibliometric research, keyword analysis of publications provides an effective way not only to investigate the knowledge structure of research domains, but also to explore the developing trends within domains. To identify the most representative keywords, many approaches have been proposed. Most of them focus on using statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for the domain analysis. In this paper, we argue that the domain knowledge is reflected by the semantic meanings behind keywords rather than the keywords themselves. We apply the Google Word2Vec model, a model of a word distribution using deep learning, to represent the semantic meanings of the keywords. Based on this work, we propose a new domain knowledge approach, the Semantic Frequency-Semantic Active Index, similar to Term Frequency-Inverse Document Frequency, to link domain and background information and identify infrequent but important keywords. We adopt a semantic similarity measuring process before statistical computation to compute the frequencies of “semantic units” rather than keyword frequencies. Semantic units are generated by word vector clustering, while the Inverse Document Frequency is extended to include the semantic inverse document frequency; thus only words in the inverse documents with a certain similarity will be counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare and discuss the advantages and disadvantages of the proposed method in relation to existing methods, finding that by introducing the semantic meaning of the keywords, our method supports more effective domain knowledge analysis.
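The "semantic unit" generation step can be sketched as a greedy grouping of words whose vectors exceed a cosine-similarity threshold, so frequencies are counted per unit rather than per keyword; this is a simplified stand-in for the paper's word-vector clustering, with made-up vectors and an assumed threshold.

```python
import numpy as np

def semantic_units(words, vecs, threshold=0.8):
    """Greedily group words whose vectors pass a cosine-similarity
    threshold against an existing unit's first member, so counts can
    be taken over 'semantic units' instead of raw keywords."""
    units = []                               # list of (centroid, members)
    for w in words:
        v = vecs[w] / np.linalg.norm(vecs[w])
        for unit in units:
            if float(unit[0] @ v) >= threshold:
                unit[1].append(w)
                break
        else:
            units.append((v, [w]))
    return [members for _, members in units]

vecs = {"flood": np.array([1.0, 0.1]), "flooding": np.array([0.95, 0.15]),
        "drought": np.array([0.0, 1.0])}
units = semantic_units(["flood", "flooding", "drought"], vecs)
assert sorted(map(sorted, units)) == [["drought"], ["flood", "flooding"]]
```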

15.
Production and operations management has been a significant field of research for many years. However, other than an educated guess by researchers in the field or a perusal of textbook chapter titles, the major topics and their trends over time are not well established. This study provides a comprehensive review of production and operations management literature using a data-driven approach. We use Latent Semantic Analysis on 21,053 abstracts representing all publications in six leading operations management journals since their inception. 18 unique topic clusters were identified algorithmically. Just being aware of the history of research topics should be of great interest to all academics in the field, but to help future researchers we conducted three post hoc analyses: 1) analysis of methods used in all these studies, 2) citation rates by topic area over time, and 3) the growing prevalence of research covering multiple topics.
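The core of Latent Semantic Analysis is a truncated SVD of a term-document matrix; the numpy sketch below (with a tiny made-up matrix, not the 21,053 abstracts) shows how documents sharing vocabulary end up close in the reduced space, which is what makes topic clusters recoverable.

```python
import numpy as np

def lsa(term_doc, k=2):
    """Latent Semantic Analysis: truncated SVD of a term-document
    matrix, returning k-dimensional document vectors (one per column)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T        # one row per document

# Rows = terms, columns = 4 tiny "abstracts"; pairs share vocabulary.
X = np.array([[2, 2, 0, 0],     # "inventory"
              [1, 2, 0, 0],     # "scheduling"
              [0, 0, 2, 1],     # "quality"
              [0, 0, 1, 2]])    # "six-sigma"
D = lsa(X, k=2)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

assert cos(D[0], D[1]) > cos(D[0], D[2])   # docs 0 and 1 cluster together
```

Running a clustering algorithm on the rows of `D` is then the step that yields topic clusters like the 18 reported above.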

16.
In this paper we analyze topic evolution over time within bioinformatics to uncover the underlying dynamics of that field, focusing on the recent developments in the 2000s. We select 33 bioinformatics related conferences indexed in DBLP from 2000 to 2011. The major reason for choosing DBLP as the data source instead of PubMed is that DBLP retains most bioinformatics related conferences, and to study dynamics of the field, conference papers are more suitable than journal papers. We divide a period of a dozen years into four periods: period 1 (2000–2002), period 2 (2003–2005), period 3 (2006–2008) and period 4 (2009–2011). To conduct topic evolution analysis, we employ three major procedures, and for each procedure, we develop the following novel technique: the Markov Random Field-based topic clustering, automatic cluster labeling, and topic similarity based on Within-Period Cluster Similarity and Between-Period Cluster Similarity. The experimental results show that there are distinct topic transition patterns between different time periods. From period 1 to period 3, new topics seem to have emerged and expanded, whereas from period 3 to period 4, topics are merged and display more rigorous interaction with each other. This trend is confirmed by the collaboration pattern over time.
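Matching each period's topic clusters to the next period's can be sketched with simple set overlap; Jaccard similarity here is an illustrative stand-in for the paper's Between-Period Cluster Similarity, and the cluster term sets are made up.

```python
def jaccard(a, b):
    """Set overlap used as a simple cluster-similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def best_matches(period1, period2):
    """For each topic cluster in period 1, find its most similar
    cluster in period 2, tracing how topics carry over between periods."""
    return {name: max(period2, key=lambda n: jaccard(terms, period2[n]))
            for name, terms in period1.items()}

p1 = {"sequence": {"alignment", "blast", "genome"},
      "microarray": {"expression", "chip", "gene"}}
p2 = {"ngs": {"alignment", "genome", "reads"},
      "expression": {"expression", "gene", "rnaseq"}}
assert best_matches(p1, p2) == {"sequence": "ngs",
                                "microarray": "expression"}
```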

17.
User‐generated reviews can serve as an efficient tool for evaluating the customer‐perceived quality of online products and services. This article proposes a joint control chart for monitoring the quantitative evolution of document‐level topics and sentiments in online customer reviews. A sequential model is constructed to convert the temporally correlated document collections to topic and sentiment distributions, which are subsequently used to monitor the topics that users are concerned about and the topic‐specific opinions in an ongoing product and service process. Simulation studies on various data scenarios demonstrate the superior performance of the proposed control chart in terms of both detecting shifts and identifying truly out‐of‐control terms.

18.
The ever-evolving nature of research incessantly creates a cacophony of new topics, resulting in an unstable state in every field of research. Researchers are disseminating their work in a huge volume of articles; in fact, the spectacular growth of the scholarly literature is overwhelmingly widening the choice sets for researchers. Consequently, they face difficulties in identifying a suitable topic of current importance from a plethora of research topics. This remains an ill-defined problem for researchers due to the overload of choices, and it is even more severe for new researchers due to their lack of experience. Hence, there is a definite need for a system that helps researchers decide on appropriate topics. Recommender systems are good candidates for this task: they have proven useful for researchers to keep pace with research dynamics and, at the same time, to overcome the information-overload problem by retrieving useful information from the large information space of scholarly literature. In this article, we present RTRS, a knowledge-based Research Topics Recommender System to assist both novice and experienced researchers in selecting research topics in their chosen field. The core of the system hinges on bibliometric information from the literature. The system identifies active research topics in a particular area and recommends the top N topics to the target users. The results obtained have proven useful to academic researchers, particularly novices, in making an early decision on research topics.
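The top-N recommendation step can be sketched as ranking topics by recent publication activity, one basic bibliometric signal such a system could use; this ranking rule is an illustrative assumption, not the RTRS algorithm itself.

```python
from collections import Counter

def recommend_topics(papers, n=2):
    """Rank topics by how many recent papers mention them and return
    the top n as recommendations (an illustrative activity measure)."""
    counts = Counter(t for p in papers for t in p["topics"])
    return [t for t, _ in counts.most_common(n)]

papers = [{"topics": ["llm", "rag"]}, {"topics": ["llm"]},
          {"topics": ["llm", "graphs"]}, {"topics": ["rag"]}]
assert recommend_topics(papers) == ["llm", "rag"]
```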

19.
Mapping of science and technology can be done at different levels of aggregation, using a variety of methods. In this paper, we propose a method in which title words are used as indicators of the content of a research topic, and cited references are used as the context in which words get their meaning. Research topics are represented by sets of papers that are similar in terms of these word-reference combinations. In this way we use words without neglecting differences and changes in their meanings. The method has several advantages, such as high coverage of publications. As an illustration, we apply the method to produce knowledge maps of information science.
