In bioinformatics, many classical software packages have become necessary research tools. To measure the influence of scientific software as an important kind of intellectual product, a few strategies have been proposed that identify software names in the full texts of papers in order to collect usage data for packages in bioinformatics research. However, the performance of these strategies is limited because the data in full texts are highly imbalanced. This study proposes EnsembleSVMs-CRF, a two-step refinement strategy based on ensemble learning that gradually increases the proportion of sentences that contain software mentions to improve the performance of named entity recognition. An experiment on a bioinformatics corpus shows that EnsembleSVMs-CRF outperforms both the rule-based bootstrapping method and direct CRF in terms of the local F1 (78.81%) and the global F1-A (73.49%). Applying this strategy to the articles published between 2013 and 2017 in 27 bioinformatics journals extracted 8,239 unique packages. The 50 most popular packages thus identified are mostly professional software, which generally requires interdisciplinary knowledge rather than programming skill. We also found that researchers in bioinformatics tend to use free scientific software, and that the use of general software is increasing relative to professional software.
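As a reading aid for the F1 figures reported above, the following is a minimal sketch of how precision, recall, and F1 are conventionally computed for extracted mentions. The abstract does not define its "local F1" or "global F1-A" variants, so this shows only the standard F1 formula. The function name and the toy mention pairs are hypothetical and are not taken from the study.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of extracted mentions."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (sentence_id, software_name) pairs, not from the paper.
gold = {(1, "BLAST"), (2, "SAMtools"), (3, "R"), (4, "Bowtie")}
predicted = {(1, "BLAST"), (2, "SAMtools"), (5, "Excel")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because true software mentions are rare in full texts, accuracy over all tokens would be dominated by the negative class; F1 over mentions avoids that, which is presumably why it is the metric reported.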
Similar literature: This study explores the patterns of exchange of research knowledge among Education Research, Cognitive Science, and what we call “Border Fields.” We analyze a set of 32,121 articles from 177 selected journals, drawn from five sample years between 1994 and 2014. We profile the references that those articles cite, and the papers that cite them. We characterize connections among the fields in sources indexed by Web of Science (WoS) (e.g., peer-reviewed journal articles and proceedings), and connections in sources that are not (e.g., conference talks, chapters, books, and reports). We note five findings. First, over time the percentage of Education Research papers that extensively cite Cognitive Science has increased, but the reverse is not true. Second, a high percentage of Border Field papers extensively cite and are cited by the other fields. The papers most cited by Border Field authors overlap with those most cited by Education Research and Cognitive Science, whereas the most cited papers of Education Research and Cognitive Science have less in common with each other. This is consistent with Border Fields being a bridge between fields. Third, over time the Border Fields have moved closer to Education Research than to Cognitive Science, and their publications increasingly cite, and are cited by, other Border Field publications. Fourth, Education Research is especially strongly represented in the literature published outside WoS-indexed publications. Fifth, the rough patterns observed among these three fields in the more restricted dataset drawn from the WoS are similar to those observed in the dataset lying outside the WoS, but Education Research shows a far heavier influence than would be indicated by looking at WoS records alone.
Similar literature: Our main objective is to create a framework to analyze signals sent by academic journals. The signals chosen for the framework are journal scopes and the most recently published papers. We apply the framework to the field of accounting, focusing on categorizing journal scopes and recently published articles into research topics using text mining techniques. We analyze the papers published in accounting journals during the 2016–2018 period by research topic. Another objective is to compare the research topics of the most recently published papers with the research topics identified in accounting journal scopes. We found that a majority of journals have a broad scope in terms of accounting research areas, but analyzing the papers reveals a concentration on specific research topics. In addition, the accounting areas most often signaled in scopes are financial accounting and auditing. The framework helps us categorize 5270 research papers into accounting research topics correctly and faster than manually reading titles, abstracts, and keywords. While specific scopes may carry the risk of missing new research trends, broad scopes may require more reviewers from different research areas. Diversity can be seen as applying other methodological choices, theoretical lenses, and conceptual or empirical research approaches. We believe that academic diversity benefits accounting research.
Similar literature: Citation habits differ across research fields, and these differences influence the obsolescence of the research literature. Using a synchronous approach based on the distribution of cited references, we analyze the distinctive obsolescence of research literature in disciplinary journals in eight scientific subfields. We use both negative binomial (NB) and Poisson distributions to capture this obsolescence. The corpus examined was published in 2019 and covers 22,559 papers citing 872,442 references. Moreover, we propose three measures to analyze the tail of the distribution: (i) cited reference survival rate, (ii) cited reference mortality rate, and (iii) cited reference percentile. These measures are of interest because the tail of the distribution captures the behavior of citations at the time when a document starts to become obsolete, in the sense that it is little cited (used). The main conclusion is that the differences observed in obsolescence are so large, even between disciplinary journals in the same subfield, that a measure of the tail of the citation distribution, such as those proposed in this paper, is needed to properly analyze the long-term impact of a journal.
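The choice between Poisson and negative binomial models mentioned above usually comes down to overdispersion: a Poisson distribution has variance equal to its mean, while an NB distribution allows the variance to exceed the mean. The sketch below checks this on a list of cited-reference ages and computes method-of-moments NB parameters. It is an illustration under stated assumptions, not the paper's procedure: the function names, the toy ages, and the `tail_share` helper (the share of references at least a given age) are hypothetical, and the abstract does not give the formulas for its survival rate, mortality rate, or percentile measures.

```python
from statistics import mean, pvariance


def dispersion_summary(ages):
    """Summarise reference ages and fit NB parameters by the method of moments."""
    m = mean(ages)
    v = pvariance(ages)
    summary = {
        "mean": m,
        "variance": v,
        # A dispersion index near 1 is consistent with a Poisson model.
        "dispersion_index": v / m if m else float("nan"),
    }
    if v > m:
        # NB with mean r(1-p)/p and variance r(1-p)/p^2 gives p = m/v.
        summary["nb_p"] = m / v
        summary["nb_r"] = m * m / (v - m)
    return summary


def tail_share(ages, threshold):
    """Share of cited references whose age is at least `threshold` years."""
    return sum(1 for age in ages if age >= threshold) / len(ages)


# Hypothetical toy data: ages in years of ten cited references.
ages = [0, 1, 1, 2, 2, 3, 4, 5, 8, 14]
summary = dispersion_summary(ages)
old_share = tail_share(ages, 5)
print(summary, old_share)
```

For these toy ages the variance is well above the mean, which is the situation in which an NB model is typically preferred over a Poisson model.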
Similar literature: In the last few decades, with the advent of the World Wide Web (WWW), the world has been overloaded with data. These data carry potential information that, once extracted, can be used for the betterment of humanity. Information can be extracted from the data by manual or automatic analysis. Manual analysis is neither scalable nor efficient, whereas automatic analysis relies on computing mechanisms that support information extraction over huge amounts of data. The WWW has also driven growth in the scientific literature, which makes literature review a laborious, time-consuming, and cumbersome job for researchers. Hence there is a pressing need to automatically extract potential information from the immense set of scientific articles in order to automate the process of literature review. This study therefore presents the overall progress in automatic information extraction from scientific articles. The information extracted from scientific articles is classified into two broad categories, i.e., metadata and key insights. Because available benchmark datasets play a significant role in the development of this research domain, existing datasets for both categories are extensively reviewed. Research studies that have applied various computational approaches to these datasets are then consolidated. The major computational approaches include rule-based approaches, Hidden Markov Models, Conditional Random Fields, Support Vector Machines, Naïve Bayes classification, and Deep Learning approaches. Multiple ongoing projects focus on constructing datasets tailored to specific information needs from scientific articles. Hence, this study covers the state of the art in information extraction from scientific articles. It also consolidates evolving datasets as well as various toolkits and code bases that can be used for information extraction from scientific articles.
Similar literature: As the most prolific journal of oncology and cancer research of the past decade, the open access journal Oncotarget reached more than 20,000 publications and a relatively high impact factor in past years. In 2018, the Journal Citation Reports withdrew the journal's impact factor status. Since this prompted extensive discussion in the scientific community and specific reasons for the withdrawal were not stated, this bibliometric analysis was performed to assess whether Oncotarget differs in its bibliometric structure from other journals. For this purpose, we used the “New Quality and Quantity Indices in Sciences” platform and analyzed 20,000 Oncotarget articles. Density equalizing mapping was used to construct maps of cancer research in Oncotarget. The maps show a unique global landscape that is not asymmetrically dominated by the Western hemisphere but exhibits a publishing architecture with a pronounced emphasis on Chinese articles.
Similar literature: The production of scientific knowledge is a complex social process in which many actors contribute through their publications to the disclosure of the hidden truth. However, due to different methods, analysed samples, and control variables, empirical findings from this process are often contradictory. Thus, quantitative sciences use meta-analyses to extract the likely truth from a corpus of publications about a given research question. Unfortunately, this procedure is often impaired by different forms of so-called publication bias: papers with null results are sometimes not published due to the publication policy of journal editors and their boards. Similarly, articles with a high news value may have a better chance of being published, even if their findings ultimately prove to be wrong. Thus the publications used for meta-analyses are often distorted and lead to wrong conclusions about the truth. For this reason, the present article develops a formal model of the effects of publication bias on the results of meta-analyses. It is successfully tested with empirical data and used to study the conditions under which meta-analyses disclose, obscure, or revert the underlying truth. A main result of the related computer simulations is that publication bias has different consequences for true zero relations than for true non-zero relations. Moreover, there are situations where certain forms of publication bias have unexpectedly favourable effects on the disclosure of the truth by meta-analyses.
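To make the mechanism described above concrete, here is a minimal simulation sketch of one simple form of publication bias. It is not the formal model developed in the article, whose details the abstract does not give. The assumptions are all hypothetical: every study has the same standard error, significant results (two-sided, |z| > 1.96) are always published, and non-significant results are published with probability `null_publish_prob`. With equal standard errors, the plain mean of published estimates equals the inverse-variance weighted meta-analytic mean.

```python
import random


def simulate_meta_analysis(true_effect, n_studies, std_error,
                           null_publish_prob, seed=0):
    """Return (mean of published estimates, number of published studies)."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        # Each study's estimate is the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, std_error)
        significant = abs(estimate / std_error) > 1.96
        # Significant results are always published; null results only sometimes.
        if significant or rng.random() < null_publish_prob:
            published.append(estimate)
    if not published:
        return float("nan"), 0
    return sum(published) / len(published), len(published)


# Hypothetical settings: a true effect of 0.2 measured with standard error 0.1.
unbiased_mean, unbiased_n = simulate_meta_analysis(0.2, 5000, 0.1, 1.0)
biased_mean, biased_n = simulate_meta_analysis(0.2, 5000, 0.1, 0.0)
print(f"all published:     mean={unbiased_mean:.3f} (n={unbiased_n})")
print(f"significant only:  mean={biased_mean:.3f} (n={biased_n})")
```

When every study is published, the pooled mean stays close to the true effect; when only significant results appear, the pooled mean for this non-zero effect is pushed upward. Setting `true_effect` to 0 in the same sketch is one way to explore how the zero-relation case can behave differently.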
Similar literature: This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010–2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allowed us to identify different patterns of OA adoption. Taking into account the particularities of the German research landscape, variations in productivity, OA uptake, and approaches to OA are examined at the meso-level, and possible explanations are discussed. The development of OA uptake is analysed for the different research sectors in Germany (universities, non-university research institutes of the Helmholtz Association, Fraunhofer Society, Max Planck Society, Leibniz Association, and government research agencies). Combining several data sources (incl. Web of Science, Unpaywall, an authority file of standardised German affiliation information, the ISSN-Gold-OA 3.0 list, and OpenDOAR), the study confirms the growth of the OA share, mirroring the international trend reported in related studies. We found that 45% of all considered articles during the observed period were openly available at the time of analysis. Our findings show that subject-specific repositories are the most prevalent type of OA. However, the percentages for publication in fully OA journals and OA via institutional repositories show similarly steep increases. By enabling data-driven decision-making on the implementation of OA in Germany at the institutional level, the results of this study can also serve as a baseline for assessing the impact that recent transformative agreements with major publishers will likely have on scholarly communication.