Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The most popular method for judging the impact of biomedical articles is citation count, the number of citations an article has received. Its most significant limitation is that it cannot evaluate articles at the time of publication, since citations accumulate over time. This work presents computer models that accurately predict citation counts of biomedical publications within a deep horizon of 10 years using only predictive information available at publication time. Our experiments show that it is indeed feasible to accurately predict future citation counts with a mixture of content-based and bibliometric features using machine learning methods. The models pave the way for practical prediction of the long-term impact of publications, and their statistical analysis provides greater insight into citation behavior.

2.
Researchers typically pay greater attention to scientific papers published within the last 2 years, and especially papers that may have great citation impact in the future. However, the accuracy of current citation impact prediction methods is still not satisfactory. This paper argues that objective features of scientific papers can make citation impact prediction relatively accurate. The external features of a paper, features of authors, features of the journal of publication, and features of citations are all considered in constructing a paper’s feature space. The stepwise multiple regression analysis is used to select appropriate features from the space and to build a regression model for explaining the relationship between citation impact and the chosen features. The validity of this model is also experimentally verified in the subject area of Information Science & Library Science. The results show that the regression model is effective within this subject.

3.

Citations play a pivotal role in indicating various aspects of scientific literature. Quantitative citation analysis approaches have been used over the decades to measure the impact factor of journals, to rank researchers or institutions, to discover evolving research topics, etc. Researchers have questioned purely quantitative citation analysis approaches and argued that not all citations are equally important; citation reasons must be considered while counting. In the recent past, researchers have focused on identifying important citation reasons by classifying them into important and non-important classes rather than individually classifying each reason. Most contemporary citation classification techniques either rely on the full content of articles or are dominated by content-based features. However, content is often not freely available, as various journal publishers do not provide open access to articles. This paper presents a binary citation classification scheme that is dominated by metadata-based parameters. The study demonstrates the significance of metadata and content-based parameters in varying scenarios. The experiments are performed on two annotated data sets, which are evaluated by employing SVM, KLR, and Random Forest machine learning classifiers. The results are compared with a contemporary study that performed similar classification employing a rich list of content-based features. The comparison revealed that the proposed model attained an improved precision (0.68) just by relying on freely available metadata. We claim that the proposed approach can serve as the best alternative in scenarios wherein content is unavailable.


4.
Citations play an important role in the scientific community by assisting in measuring multifarious policies like the impact of journals, researchers, institutions, and countries. Authors cite papers for different reasons, such as extending previous work, comparing their study with the state-of-the-art, providing background of the field, etc. In recent years, researchers have tried to conceptualize all citations into two broad categories, important and incidental. Such a categorization is very important to enhance scientific output in multiple ways, for instance, (1) Helping a researcher in identifying meaningful citations from a list of 100 to 1000 citations (2) Enhancing the impact factor calculation mechanism by more strongly weighting important citations, and (3) Improving researcher, institutional, and university rankings by only considering important citations. All of these uses depend upon correctly identifying the important citations from the list of all citations in a paper. To date, researchers have utilized many features to classify citations into these broad categories: cue phrases, in-text citation counts, and metadata features, etc. However, contemporary approaches are based on identification of in-text citation counts, mapping sections onto the Introduction, Methods, Results, and Discussion (IMRAD) structure, identifying cue phrases, etc. Identifying such features accurately is a challenging task and is normally conducted manually, with the accuracy of citation classification demonstrated in terms of these manually extracted features. This research proposes to examine the content of the cited and citing pair to identify important citing papers for each cited paper. This content similarity approach was adopted from research paper recommendation approaches. Furthermore, a novel section-based content similarity approach is also proposed. 
The results show that using only the abstracts of the cited and citing papers can achieve accuracy similar to that of the state-of-the-art approaches. This makes the proposed approach a viable technique that does not depend on manual identification of complex features.

5.
This paper intends to illuminate the relationship between science funding and citation impact in seven STEMM disciplines (science, technology, engineering, mathematics, and medicine). Using a regression model with Heckman bias correction, we find that funding has a positive, significant association with a paper’s citations in STEMM fields. Further analyses show that this association is magnified by the factors of multiple authorship and multiple institutions. For funded papers in STEM, multi-author and multi-institution papers tend to receive even more citations than single-authored and single-institution papers; however, funded papers in Medicine received less gain in citation impact when either factor is considered. Based on the finding that funding support has a stronger association with citation impact when it is treated as a binary variable than as a count variable, this paper recommends the allocation of funding to researchers without active funding support, instead of giving awards to those with multiple funding supports at hand.

6.

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator, and ScispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by “black-box” machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered that referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several pointed to dipeptidyl peptidase 4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel-to-human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronavirus tend to be cited less. Bat coronavirus is the only other virus from a non-human species in the betaB clade along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus both species linked to high citation counts harbor coronaviruses that are more phylogenetically similar to human SARS viruses. On the other hand, feline (FIPV, FCOV) and canine coronaviruses (CCOV) are in the alpha coronavirus clade and more distant from the betaB clade with human SARS viruses. Other results include detection of apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and a binary word incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a “black-box” method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.


7.

This paper examines the citation impact of papers published in scientific-scholarly journals upon patentable technology, as reflected in examiner- or inventor-given references in granted patents. It analyses data created by SCImago Research Group, linking PATSTAT’s scientific non-patent references (SNPRs) to source documents indexed in Scopus. The frequency of patent citations to journal papers is calculated per discipline, year, institutional sector, journal subject category, and for “top” journals. PATSTAT/Scopus-based statistics are compared to those derived from Web of Science/USPTO linkage. A detailed assessment is presented of the technological impact of research publications in social sciences and humanities (SSH). Several subject fields perform well in terms of the number of citations from patents, especially Library and Information Science, Language and Linguistics, Education, and Law, but many of the most cited journals find themselves in the interface between SSH and biomedical or natural sciences. Analyses of the titles of citing patents and cited papers are presented that shed light upon the cognitive content of patent citations. It is proposed to develop more advanced indicators of citation impact of papers upon patents, and ways to combine citation counts with citation content and context analysis.


8.
Lv Yiqin, Xie Zheng, Zuo Xiaojing, Song Yiping. Scientometrics, 2022, 127(8): 4847-4872

The classification of scientific papers can be implemented based on contents or citations. To improve performance on this task, we express papers as nodes and integrate scientific papers’ contents and citations into a heterogeneous graph. It has two types of edges. One type represents the semantic similarity between papers, derived from papers’ titles and abstracts. The other type represents the citation relationship between papers and the journals or conference proceedings of their references. We utilize a contrastive learning method to embed the nodes of the heterogeneous graph into a vector space. Then, we feed the paper node vectors into classifiers, such as a decision tree, a multilayer perceptron, and so on. We conduct experiments on three datasets of scientific papers: the Microsoft Academic Graph with 63,211 scientific papers in 20 classes, the Proceedings of the National Academy of Sciences with 38,243 scientific papers in 18 classes, and the American Physical Society with 443,845 scientific papers in 5 classes. The experimental results on the multi-class task show that our multi-view method achieves classification accuracy of up to 98%, outperforming state-of-the-art methods.


9.
Given the limitations of existing community question answering (CQA) answer quality prediction methods in measuring the semantic information of the answer text, this paper proposes an answer quality prediction model based on question-answer joint learning (ACLSTM). The attention mechanism is used to obtain the dependency relationship between question-and-answer (Q&A) pairs. A Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM) are used to extract semantic features of Q&A pairs and calculate their matching degree. In addition, the answer's semantic representation is combined with other effective extended features as the input representation of the fully connected layer. Compared with other quality prediction models, the ACLSTM model effectively improves the prediction of answer quality. The improvement is most notable for medium-quality answers, and prediction improves further after effective extended features are added. Experiments show that, after learning with the ACLSTM model, the semantic match between the two parts of a Q&A pair is measured better, reflecting the model’s strong performance in processing the semantic information of the answer text.

10.
The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. This assumption is not always true since citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, this would readily improve the performance of existing citation-based metrics by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models for automatically classifying citations as instrumental or non-instrumental. Instrumental citations were manually labeled, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features are able to automatically classify instrumental citations with high predictivity (AUC = 0.86). Additional experiments using independent hold-out data and prospective validation show that the models are generalizable and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations.

11.
We obtained statistically significant data to verify the intuitive impression that collaboration leads to higher impact. We selected eight scientific journals to analyze the correlations between the number of citations and the number of coauthors. Across the journals, single-authored articles always received the fewest citations. Articles with fewer than five coauthors received fewer citations than the journal average. We also provide a simple measure of the value of authorship with regard to the increase in the number of citations. Compared to the citation distribution, similar but smaller fluctuations appeared in the coauthor distribution. Around 70% of the citations were accumulated in 30% of the papers, while 60% of the coauthors appeared in 40% of the papers. We find that predicting the citation number from the coauthor number can be more reliable than predicting the coauthor number from the citation number. For both the citation distribution and the coauthor distribution, the standard deviation is larger than the average value. We caution against the use of such an unrepresentative average value. The average value can be biased significantly by an extreme minority and might not reflect the majority.
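
The concentration and dispersion statistics mentioned in this abstract (the share of citations held by the top fraction of papers, and a standard deviation exceeding the mean) can be illustrated with a minimal sketch. The function name `top_share` and the citation counts below are hypothetical and are not taken from the paper; the counts are made up only to show the calculation.

```python
import statistics


def top_share(counts, fraction):
    """Share of all citations held by the top `fraction` of papers.

    `counts` is a list of per-paper citation counts (hypothetical data).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    ranked = sorted(counts, reverse=True)
    # Number of papers in the top group; at least one paper.
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / total


# Hypothetical, heavily skewed citation counts for ten papers.
counts = [50, 20, 10, 5, 5, 4, 3, 2, 1, 0]

share = top_share(counts, 0.3)        # share held by the top 30% of papers
mean = statistics.mean(counts)        # average citations per paper
spread = statistics.pstdev(counts)    # population standard deviation
print(f"top 30% share: {share:.2f}, mean: {mean}, std dev: {spread:.2f}")
```

With a skewed sample like this one, the standard deviation exceeds the mean, which is the situation in which the abstract cautions against treating the average as representative.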

12.
The present paper addresses some of the many possible uses of citations, including bookmark, intellectual heritage, impact tracker, and self-serving purposes. The main focus is on the applicability of citation analysis as an impact or quality measure. If a paper's bibliography is viewed as consisting of a directed (research impact or quality) component related to intellectual heritage and random components related to specific self-interest topics, then for large numbers of citations from many different citing papers, the most significant intellectual heritage (research impact or quality) citations will aggregate and the random author-specific self-serving citations will be scattered and not accumulate. However, there are at least two limitations to this model of citation analysis for stand-alone use as a measure of research impact or quality. First, the reference to intellectual heritage could be positive or negative. Second, there could be systemic biases which affect the aggregate results, and one of these, the “Pied Piper Effect”, is described in detail. Finally, the results of a short citation study comparing Russian and American papers in different technical fields are presented. The questions raised in interpreting these data highlight a few of the difficulties in attempting to interpret citation results without supplementary information. Leydesdorff (Leydesdorff, 1998) addresses the history of citations and citation analysis, and the transformation of a reference mechanism into a purportedly quantitative measure of research impact/quality. The present paper examines different facets of citations and citation analysis, and discusses the validity of citation analysis as a useful measure of research impact/quality.

13.
In this paper, machine learning tools were used to identify key features influencing citation impact. Both the papers' external and quality information were considered in constructing the papers' feature space. Based on the feature space, the soft fuzzy rough set was used to generate a series of associated feature subsets. Then, the KNN classifier was used to find the feature subset with the best classification performance. The results show that citation impact could be predicted by objectively assessed factors. Both the papers' quality and external features, mainly represented as the reputation of the first author, contribute to future citation impact.

14.
Summary: The authors have constructed an original database of the full text of the Japanese Patent Gazette published since 1994. The database includes not only the front page but also the body text of more than 880,000 granted Japanese patents. By reading the full texts of all 1,500 patent samples, we found that some inventors cite many academic papers in addition to earlier patents in the body texts of their Japanese patents. Using manually extracted academic paper citations and patent citations as “right” answers, we fine-tuned a search algorithm that automatically retrieves cited scientific papers and patents from the entire texts of all the Japanese patents in the database. An academic paper citation in a patent text indicates that the inventor used scientific knowledge in the cited paper when he/she invented the idea codified in the citing patent. The degree of science linkage, as measured by the number of research papers cited in patent documents, is particularly strong in biotechnology. Among other types of technology, those related to photographic-sensitized material, cryptography, optical computing, and speech recognition also show strong science linkage. This suggests that the degree of dependence on scientific knowledge differs from technology to technology and therefore, different ways of university-industry collaboration are necessary for different technology fields.

15.
Huang Heng, Zhu Donghua, Wang Xuefeng. Scientometrics, 2022, 127(9): 5257-5281

Citation counts are commonly used to evaluate the scientific impact of a publication on the general premise that more citations probably mean more endorsements. However, two questionable assumptions underpin this idea: a) that all authors contributed equally to the paper; and b) that the endorsement is positive. Obviously, neither of these assumptions holds true. Hence, with this study, we examine two components of citations: their purpose, i.e., the reason for the citation, and their polarity, i.e., the author’s attitude toward the cited work. Our findings provide a new perspective on the scientific impact of highly-cited publications. Our methodology consists of three steps. Firstly, a pre-trained model composed of Word2Vec (a well-known word embedding approach) and a convolutional neural network (CNN) is used to identify citation polarity and purpose. Secondly, in a set of highly-cited papers, we compare eight categories of purpose from foundational to critical and three categories of polarity: positive, negative, and neutral. We further explore how different types of papers (those discussing discoveries or those discussing utilitarian topics) influence the evaluation of the scientific impact of papers. Finally, we mine and discover the knowledge (e.g. method, concept, tool or data) to explain the actual scientific impact of a highly-cited paper. To demonstrate how combining citation polarity with purpose can provide far greater detail about a paper’s scientific impact, we undertake a case study with 370 highly-cited journal articles spanning “Biochemistry & Molecular Biology” and “Genetics & Heredity”. The results yield valuable insights into the assumption about citation counts as a metric for evaluating scientific impact.


16.
Li Zhai, Xiangbin Yan, Bin Zhu. Scientometrics, 2014, 98(2): 1021-1031
This paper proposes the hl-index as an improvement on the h-index, a popular measure of the research quality of academic researchers. Although the h-index integrates the number of publications and the academic impact of each publication to evaluate the productivity of a researcher, it assumes that all papers that cite an academic article contribute equally to the academic impact of this article. This assumption is, of course, often untrue. A citation from a well-cited paper certainly brings more attention to the article than a citation from a paper that people do not pay attention to. It therefore becomes important to integrate the impact of papers that cite a researcher’s work into the evaluation of the productivity of the researcher. Constructing a citation network among academic papers, this paper therefore proposes the hl-index, which integrates the h-index with the concept of the lobby index, a measure that has been used to evaluate the impact of a node in a complex network based on the impact of the other nodes to which the focal node is directly linked. This paper also explores the characteristics of the proposed hl-index by comparing it with citations, the h-index, and its variant, the g-index.
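
For readers unfamiliar with the two baseline indicators this abstract compares against, the following is a minimal sketch of the standard h-index and g-index definitions. It does not implement the paper's proposed index, whose exact formula is not given in the abstract; the function names and sample citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g**2 citations.

    This variant caps g at the number of papers, a common convention.
    """
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one researcher's five papers.
papers = [10, 8, 5, 4, 3]
print(h_index(papers), g_index(papers))
```

Both indicators treat every citation as equal regardless of which paper it comes from, which is the assumption the abstract questions.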

17.
Donner Paul. Scientometrics, 2021, 126(12): 9431-9456

This study investigates the potential of citation analysis of Ph.D. theses to obtain valid and useful early career performance indicators at the level of university departments. For German theses from 1996 to 2018 the suitability of citation data from Scopus and Google Books is studied and found to be sufficient to obtain quantitative estimates of early career researchers’ performance at departmental level in terms of scientific recognition and use of their dissertations as reflected in citations. Scopus and Google Books citations complement each other and have little overlap. Individual theses’ citation counts are much higher for those awarded a dissertation award than others. Departmental level estimates of citation impact agree reasonably well with panel committee peer review ratings of early career researcher support.


18.
During Eugene Garfield’s (EG’s) lengthy career as an information scientist, he published about 1500 papers. In this study, we use the impressive oeuvre of EG to introduce a new type of bibliometric network: keyword co-occurrence networks based on the context of citations, which are referenced in a certain paper set (here: the papers published by EG). The citation context is defined by the words which are located around a specific citation. We retrieved the citation context from Microsoft Academic. To interpret and compare the results of the new network type, we generated two further networks: co-occurrence networks which are based on title and abstract keywords from (1) EG’s papers and (2) the papers citing EG’s publications. The comparison of the three networks suggests that papers of EG and citation contexts of papers citing EG are semantically more closely related to each other than to titles and abstracts of papers citing EG. This result accords with the use of citations in research evaluation that is based on the premise that citations reflect the cognitive influence of the cited on the citing publication.

19.
Summary The present paper addresses the objective of developing forward indicators of research performance using bibliometric information on the UK science base. Most research indicators rely primarily on historical time series relating to inputs to, activity within and outputs from the research system. Policy makers wish to be able to monitor changing research profiles in a more timely fashion, the better to determine where new investment is having the greatest effect. Initial (e.g. 12 months from publication) citation counts might be useful as a forward indicator of the long-term (e.g. 10 years from publication) quality of research publications, but - although there is literature on citation-time functions - no study to evaluate this specifically has been carried out by Thomson ISI or any other analysts. Here, I describe the outcomes of a preliminary study to explore these citation relationships, drawing on the UK National Citation Report held by Evidence Ltd under licence from Thomson ISI for OST policy use. Annual citation counts typically peak at around the third year after publication. I show that there is a statistically highly significant correlation between initial (years 1-2) and later (years 3-10) citations in six research categories across the life and physical sciences. The relationship holds over a wide range of initial citation counts. Papers that attract more than a definable but field dependent threshold of citations in the initial period after publication are usually among the top 1% (the most highly cited papers) for their field and year. Some papers may take off slowly but can later join the high impact group. It is important to recognise that the statistical relationship is applicable to groups of publications. The citation profiles of individual articles may be quite different. Nonetheless, it seems reasonable to conclude that leading indicators of research excellence could be developed. 
This initial study should now be extended across a wider range of fields to test the initial outcomes: earlier papers suggest the model holds in economics. Additional statistical tests should be applied to explore and model the relationship between initial, later and total citation counts and thus to create a general tool for policy application.
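
The relationship described in this abstract, between citations in years 1-2 and citations in years 3-10, can be sketched as a simple correlation over a group of papers. The abstract does not say which correlation statistic was used, so the Pearson coefficient here is an assumption, and the function names and yearly counts are hypothetical.

```python
import math


def split_early_late(yearly_counts):
    """Split ten yearly citation counts into (years 1-2, years 3-10) totals."""
    return sum(yearly_counts[:2]), sum(yearly_counts[2:10])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical yearly citation counts (years 1..10) for four papers.
papers = [
    [1, 2, 4, 3, 3, 2, 2, 1, 1, 1],
    [0, 1, 2, 2, 1, 1, 1, 0, 0, 0],
    [3, 5, 9, 8, 7, 6, 5, 4, 3, 3],
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
]
early, late = zip(*(split_early_late(p) for p in papers))
print(f"r = {pearson(early, late):.3f}")
```

As the abstract notes, a relationship of this kind applies to groups of publications; the citation profile of an individual article may look quite different.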

20.
Sangwal K. Scientometrics, 2012, 92(3): 643-655
The basic concepts and equations of the progressive nucleation mechanism (PNM) are presented first for the growth and decay of items. The mechanism is then applied to describe the cumulative citations L and citations ΔL per year of the individual most-cited papers i of four selected Polish professors as a function of citation duration t. It was found that the PNM satisfactorily describes the time dependence of cumulative citations L of the papers published by different authors with sufficiently high citations ΔL, as represented by the highest yearly citations ΔL(max) during the entire citation period t (normal citation behavior). The citation period for these papers is less than 15 years and it is even 6-8 years in several cases. However, for papers with citation periods exceeding about 15 years, the growth behavior of citations does not follow the PNM in the entire citation period (anomalous citation behavior), and there are regions of citations in which the citation data may be described by the PNM. Normal and anomalous citation behaviors are attributed, respectively, to the occurrence and nonoccurrence of stationary nucleation of citations for the papers. The PNM also explains the growth and decay of citations ΔL per year of papers exhibiting normal citation behavior.
