Similar Articles
20 similar articles found (search time: 78 ms)
1.
Knowing how records on a particular topic are distributed over databases is useful for both practical and theoretical reasons; however, little work in this area appears to have been done. This paper examines the distribution of records on the topic of “Fuzzy Set Theory” in over 100 bibliographic databases and determines whether the distribution of records over databases is similar to the traditional Bradford hyperbolic distribution of records over journals. Different methods for counting duplicate records between and within databases have been developed. A comparison of the various distributions based on these counting methods is presented, and the distributions are compared with the results of earlier studies. The results also indicate how many databases must be searched to cover a literature to specified percentages under the different counting techniques developed in this study.
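The final point, estimating how many databases must be searched for a given coverage level, can be sketched with a greedy cumulative count of unique records. The database names and record IDs below are invented for illustration; the paper's own duplicate-counting methods are more elaborate.

```python
# Sketch (hypothetical data): rank databases by unique-record yield and compute
# the cumulative percentage of unique records covered as more are searched.

def cumulative_coverage(db_records):
    """db_records: dict mapping database name -> set of record IDs.
    Returns (database, cumulative_percent) pairs in greedy order:
    at each step, pick the database adding the most unseen records."""
    all_records = set().union(*db_records.values())
    covered, remaining, order = set(), dict(db_records), []
    while remaining:
        # Greedily choose the database contributing the most new records.
        name = max(remaining, key=lambda n: len(remaining[n] - covered))
        covered |= remaining.pop(name)
        order.append((name, round(100 * len(covered) / len(all_records), 1)))
    return order

dbs = {
    "INSPEC":     {1, 2, 3, 4, 5, 6},
    "Scopus":     {4, 5, 6, 7},
    "MathSciNet": {1, 8},
}
coverage = cumulative_coverage(dbs)
```

With these toy sets, the single best database already covers 75% of the 8 unique records, mirroring how a few databases dominate coverage in Bradford-like distributions.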

2.
Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR based on mining per-word topic distribution over time. We propose that semantic word shifts occur not only over time, but also over topics. The shifts are examined from two perspectives, the topic level and the context level. According to the over-time word-topic distribution, stable words and unstable words are recognized. The diverging and converging trends in the unstable type reveal characteristics of the topic evolution process. The context-level shifts are further detected by similarities between word vectors. Our work associates semantic word shifts with the evolution of topics, which facilitates a better understanding of semantic word shifts from both topics and contexts.
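The context-level detection via similarities between word vectors can be sketched with a plain cosine similarity between a word's vectors from two time slices. The toy 3-dimensional vectors and the 0.7 threshold below are invented for illustration, not taken from the paper.

```python
import math

# Sketch: flag a context-level shift when a word's vectors trained on two
# time slices diverge. Vectors and threshold are invented toy values.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vec_1990s = {"ranking": [0.9, 0.1, 0.0]}   # hypothetical embedding, slice 1
vec_2010s = {"ranking": [0.2, 0.8, 0.3]}   # hypothetical embedding, slice 2

sim = cosine(vec_1990s["ranking"], vec_2010s["ranking"])
shifted = sim < 0.7   # arbitrary illustrative threshold
```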

3.
To address the phrase-sparseness problem caused by the exact-matching strategy in phrase-based statistical machine translation (SMT) models, a statistical machine translation model based on phrase similarity is proposed. The model introduces example-based translation methods into statistical machine translation. During translation, for phrases that do not appear in the training corpus, similar example phrases are retrieved from the phrase table using a fuzzy-matching strategy that computes similarities between source-language phrases, and translations are constructed for the unseen phrases from those examples. Compared with exact matching, similarity-based fuzzy matching makes fuller use of the phrase table and alleviates the phrase-sparseness problem. Experiments show that the model clearly improves the quality of statistical machine translation, outperforming Moses, the current best phrase-based system.
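The fuzzy-matching step can be sketched with a generic string-similarity measure standing in for the paper's phrase-similarity computation. The tiny phrase table and the `difflib`-based similarity below are illustrative assumptions, not the authors' actual model.

```python
from difflib import SequenceMatcher

# Sketch: when a source phrase is absent from the phrase table, find the most
# similar known phrase and reuse its translation as a starting point.
# The phrase table and threshold are invented.

phrase_table = {
    "open the door": "ouvre la porte",
    "close the window": "ferme la fenêtre",
}

def fuzzy_lookup(phrase, table, threshold=0.6):
    if phrase in table:                      # exact match first
        return phrase, table[phrase], 1.0
    best, best_sim = None, 0.0
    for candidate in table:
        sim = SequenceMatcher(None, phrase, candidate).ratio()
        if sim > best_sim:
            best, best_sim = candidate, sim
    if best_sim >= threshold:                # fuzzy match accepted
        return best, table[best], best_sim
    return None, None, best_sim              # no usable example phrase

match, translation, sim = fuzzy_lookup("open the doors", phrase_table)
```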

4.
Traditional topic models have been widely used for analyzing semantic topics in electronic documents. However, the topic words they produce are often poor in readability and consistency; usually only domain experts can guess their meaning. In practice, phrases are the main unit people use to express semantics. This paper presents Distributed Representation-Phrase Latent Dirichlet Allocation (DRPhrase LDA), a phrase topic model. Specifically, the model enhances the semantic information of phrases via distributed representations. The experimental results show that the topics acquired by our model are more readable and consistent than those of other similar topic models.
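One common way to promote phrases to tokens before topic modelling is a bigram-scoring heuristic (as popularized by word2vec's phrase detection). This is a stand-in sketch, not the DRPhrase LDA procedure itself, and the toy corpus, discount, and threshold are invented.

```python
from collections import Counter

# Sketch: score bigrams by co-occurrence strength; high-scoring bigrams are
# merged into phrase tokens before topic modelling. Corpus is invented.

docs = [
    "latent dirichlet allocation topic model",
    "latent dirichlet allocation for text",
    "a topic model for text",
]
tokens = [d.split() for d in docs]
uni = Counter(w for t in tokens for w in t)
bi = Counter(p for t in tokens for p in zip(t, t[1:]))

delta = 0.5   # discount against rare accidental pairs (invented value)
phrases = {p for p, c in bi.items()
           if (c - delta) / (uni[p[0]] * uni[p[1]]) >= 0.3}
```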

5.
A comparison of two bibliometric methods for mapping of the research front   (Total citations: 4; self-citations: 0; citations by others: 0)
This paper builds on previous research concerned with the classification and specialty mapping of research fields. Two methods are put to the test in order to decide whether they yield significantly different mappings of the research front of a science field. The first method was based on document co-citation analysis, where papers citing co-citation clusters were assumed to reflect the research front. The second method was bibliographic coupling, where citing papers were likewise assumed to reflect the research front. The application of these methods resulted in two different types of aggregations of papers: (1) groups of papers citing clusters of co-cited works and (2) clusters of bibliographically coupled papers. The comparison of the two methods' mapping results was pursued by matching word profiles of groups of papers citing a particular co-citation cluster with word profiles of clusters of bibliographically coupled papers. Findings suggested that the research front was portrayed in two considerably different ways by the methods applied. It was concluded that the results of this study would support a further comparative study of these methods on a more detailed and qualitative ground. The original data set encompassed 73,379 articles from the fifty most cited environmental science journals listed in the Journal Citation Report, science edition, downloaded from the Science Citation Index on CD-ROM.
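The two linkage measures compared above can be stated in a few lines: bibliographic coupling counts the references two citing papers share, while co-citation counts how many papers cite two works together. The citation lists below are invented.

```python
# Sketch: bibliographic coupling vs. co-citation on toy citation data.

refs = {                       # citing paper -> set of cited works (invented)
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"D", "E"},
}

def coupling(p, q, refs):
    """Bibliographic coupling strength: shared references of two papers."""
    return len(refs[p] & refs[q])

def cocitation(a, b, refs):
    """Co-citation strength: number of papers citing both works."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)
```

Note the duality: coupling links *citing* papers (fixed once published), while co-citation links *cited* works (and can grow over time), which is one reason the two methods portray the research front differently.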

6.
The aim of this paper is to identify the status quo of research on pervasive and ubiquitous computing via scientometric analysis. Information visualization and knowledge domain visualization techniques were adopted to determine how the study of pervasive and ubiquitous computing has evolved. A total of 5,914 papers published between 1995 and 2009 were retrieved from the Web of Science with a topic search of pervasive or ubiquitous computing. CiteSpace, a Java application for analyzing and visualizing a wide range of networks from bibliographic data, was used to generate the subject category network to identify the leading research fields; the research power network to find the most productive countries and institutes; the journal co-citation map to identify the distribution of core journals; the author co-citation map to identify key scholars and their co-citation patterns; and the document co-citation network to reveal the ground-breaking literature and detect co-citation clusters on pervasive and ubiquitous computing. We also depicted the hybrid network of keywords and noun phrases to explore research foci over the entire span 1995–2009.

7.
Literature-related discovery (LRD) is the linking of two or more literature concepts that have heretofore not been linked (i.e., disjoint), in order to produce novel, interesting, and intelligible knowledge (i.e., potential discovery). The mainstream software for assisting LRD is Arrowsmith. It uses text-based linkage to connect two disjoint literatures, generating intermediate linking literatures by matching title phrases from the two disjoint literatures (literatures that do not share common records); Arrowsmith then prioritizes these linking phrases through a series of text-based filters. The present study examines citation-based linkage, in addition to text-based linkage, to link disjoint literatures through a process called bibliographic coupling. Two disjoint literatures were selected for the demonstration: Parkinson’s Disease (PD) (neurodegeneration) and Crohn’s Disease (CD) (autoimmune). Three cases were examined: (1) matching phrases in records with no shared references (text-based linkage only); (2) shared references in records with no matching phrases (citation-based linkage only); (3) matching phrases in records with shared references (text-based and citation-based linkage). In addition, the main themes in the body of shared references were examined through grouping techniques to identify the common themes between the two literatures. All the high-level concepts in the Case 1 records could be found in the Case 3 records. Some new concepts (at the sub-set level of the main themes) not found in the Case 3 records were identified in the Case 2 records. The synergy of matching phrases and shared references provides a strong prioritization for the selection of promising matching phrases as discovery mechanisms. Three major themes unified the PD and CD literatures: genetics, neuroimmunology, and cell death. However, these themes are not completely independent. For example, there are genetic determinants of the inflammatory response: naturally occurring genetic variants in important inflammatory mediators such as TNF-alpha appear to alter inflammatory responses in numerous experimental and a few clinical models of inflammation. Additionally, there is a strong link between neuroimmunology and cell death. In PD, for example, neuroinflammatory processes that are mediated by activated glial and peripheral immune cells might eventually lead to dopaminergic cell death and subsequent disease progression.

8.
Müge Akbulut, Yaşar Tonta, Howard D. White. Scientometrics, 2020, 122(2): 957-987

The Related Records feature in the Web of Science retrieves records that share at least one item in their reference lists with the references of a seed record. This search method, known as bibliographic coupling, does not always yield topically relevant results. Our exploratory case study asks: How do retrievals of the type used in pennant diagrams compare with retrievals through Related Records? Pennants are two-dimensional visualizations of documents co-cited with a seed paper. In them, the well-known tf*idf (term frequency * inverse document frequency) formula is used to weight the co-citation counts. The weights have psychological interpretations from relevance theory: given the seed, tf predicts a co-cited document's cognitive effects on the user, and idf predicts the user's relative ease in relating its title to the seed's title. We chose two seed papers from information science, one with only two references and the other with 20, and used each to retrieve 50 documents per method in WoS. We illustrate with pennant diagrams. Pennant retrieval indeed produced more relevant documents, especially for the paper with only two references, and it produced mostly different ones. Related Records performed almost as well on the paper with the longer reference list, improving remarkably as the number of coupling units between the seed and other papers increased. We argue that relevance rankings based on co-citation, with pennant-style weighting as an option, would be a desirable addition to WoS and similar databases.

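The pennant weighting described above can be sketched as tf*idf over co-citation counts: tf rewards documents frequently co-cited with the seed, and idf discounts documents that are heavily cited overall. The counts, the universe size N, and the exact log form below are illustrative assumptions; the published pennant formula may differ in detail.

```python
import math

# Sketch (hypothetical counts): rank co-cited documents by a tf*idf-style
# pennant weight relative to a seed paper.

N = 1000                                   # assumed size of citing universe
cocited_with_seed = {"DocA": 30, "DocB": 30, "DocC": 5}   # invented tf inputs
total_citations  = {"DocA": 40, "DocB": 800, "DocC": 10}  # invented idf inputs

def pennant_weight(doc):
    tf = math.log(1 + cocited_with_seed[doc])   # cognitive-effects side
    idf = math.log(N / total_citations[doc])    # ease-of-relating side
    return tf * idf

ranked = sorted(cocited_with_seed, key=pennant_weight, reverse=True)
```

DocB is co-cited with the seed as often as DocA, but its huge overall citation count drags its idf down, so it ranks last; this is exactly the discounting of ubiquitous classics that the pennant weighting performs.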

9.
The study of science at the individual scholar level requires the disambiguation of author names. The creation of an author’s publication oeuvre involves matching a list of unique author names to the names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvres of a set of Dutch full professors during the period 1980–2011. In particular, we combine author records from the Dutch National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify ‘seed publications’ for each author using five different approaches. Subsequently, we ‘expand’ the set of publications using three different approaches. The approaches are compared and the resulting oeuvres are evaluated on precision and recall using a ‘gold standard’ dataset of authors for whom verified publications in the period 2001–2010 are available.

10.
A new semi-automatic method is presented to standardize or codify addresses in order to produce bibliometric indicators from bibliographic databases. The hypothesis is that this new method normalizes authors’ addresses reliably and can be applied easily and quickly. To test the method, a set of already hand-coded data was chosen to verify its reliability: 136,821 Spanish documents (2006–2008) previously downloaded from the Web of Science database. Unique addresses from this set were selected to produce a list of keywords representing various institutional sectors. Once the list of terms was obtained, addresses were standardized with this information and the result was compared to the previous hand-coded data. Tests were run to analyze the association between the two systems (automatic and hand-coded), calculating recall and precision along with several statistical directional and symmetric measures. The outcome shows good agreement between the two methods. Although these results are quite general, this overview of institutional sectors is a good basis for a second approach: the selection of particular centers. The system is novel in that it does not depend on pre-existing master lists or tables, and it automates part of the task. The validity of the hypothesis has been demonstrated considering not only the statistical measures, but also that obtaining general and detailed scientific output is less time-consuming, and will become even less so as the master tables are fed back and reused for the same kind of data. The same method could be used with any country and/or database by creating a new master list that takes their specific characteristics into account.

11.
European authorship trends in fifteen major scientific and technical bibliographic databases on the DIALOG information system are examined for works published between 1970 and 1990. The number of records with European authors increased in 21% of the data set; in 6%, an overall decline was found; in 52%, authorship increased into the 1980s and then declined. The most heavily represented countries were the former Soviet Union, the United Kingdom, Germany, and France. Overall, with the exception of MEDLINE, BIOSIS, and INSPEC, coverage of the works of European authors has been declining over the past twenty years, and particularly so in the last five.

12.
Lip-reading technologies are progressing rapidly following the breakthrough of deep learning, and they play a vital role in many applications, such as human-machine communication and security. In this paper, we develop an effective lip-reading recognition model for Arabic visual speech recognition by implementing deep learning algorithms. The Arabic visual datasets collected contain 2,400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model by enhancing the preprocessing phase. Firstly, we extract keyframes from our dataset. Secondly, we produce Concatenated Frame Images (CFIs) that represent the utterance sequence in one single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined different numbers of keyframes (10, 15, and 20) to compare two approaches: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model is superior to other models based on CFI input.
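The Concatenated Frame Images step can be sketched as tiling keyframes into a single image, so that a one-image CNN (such as VGG-19) can consume the whole utterance at once. The frame size, frame count, and grid layout below are invented for illustration.

```python
import numpy as np

# Sketch: build a Concatenated Frame Image (CFI) from a list of keyframes.
# Here each "keyframe" is a dummy 64x64 grayscale image filled with its index.

frames = [np.full((64, 64), i, dtype=np.uint8) for i in range(10)]  # 10 keyframes

def make_cfi(frames, cols=5):
    """Tile frames row by row into one image (cols frames per row)."""
    rows = [np.hstack(frames[i:i + cols]) for i in range(0, len(frames), cols)]
    return np.vstack(rows)

cfi = make_cfi(frames)   # 2 rows x 5 cols of 64x64 frames -> 128 x 320 image
```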

13.
Natural language processing is the study of computer programs that can understand and produce human language. An important goal in the research to produce such technology is identifying the right meaning of words and phrases. In this paper, we give an overview of current research in three areas: (i) inducing word meaning; (ii) distinguishing different meanings of words used in context; and (iii) determining when the meaning of a phrase cannot straightforwardly be obtained from its parts. Manual construction of resources is labour-intensive and costly, and furthermore may not reflect the meanings that are useful for the task or data at hand. For this reason, we focus particularly on systems that learn about meanings from samples of language data rather than from examples annotated by humans.

14.
V. Cano. Scientometrics, 1995, 34(1): 121-138
Bibliometric research can provide science policy makers with indicators of the capacity of a country's national scientific system to produce printed information. The capacity of the local publishing industry to produce scientific and technical periodical publications reflects the availability of outlets for the dissemination of scientific findings. The present research evaluates the role of the publishing industry in the level of bibliographic control, and the level of peer review, of periodical publications from Latin America. A random search was performed on the 1990 CD-ROM version of The Serials Directory, a commercially produced international reference source on periodical publications. A sample of 311 periodicals from Latin America was downloaded to a local database. A similar search was performed on publications from the United States and the United Kingdom for comparison purposes, and a random sample of 235 publications was downloaded into a local database. Publishers in both samples were classified into three types: academic, governmental, and commercial. Publications were sorted thematically, and indicators of bibliographic control and of peer review were recorded for both samples. Publications from Latin America showed a very low level of bibliographic control, particularly in the assignment of ISSN numbers: 58% of the sample studied was published without this element of bibliographic control. This contrasted sharply with the periodicals from the US and UK, where 83% (195) of the journals had an ISSN number assigned. Editorial boards were involved in the academic quality of only 21% of the Latin American sample. Periodicals from the US and UK reported an editor as responsible for the journal in 40% (93) of the cases, about double the proportion reported by Latin American publications. Latin American academic publishers are the most numerous publishers in the sample studied, accounting for 37% (114) of the journals; however, 68% (77) of them printed periodicals without a named editor. Governmental publishers are the second largest publisher type, producing 29% (89) of the journals in the sample, and commercial publishers are responsible for 26% (82) of the journals studied. Publications from the US and UK show a clear predominance of commercial publishers, accounting for 47% (111) of the journals; academic publishers produced only 29% (68) of the 235 journals in the sample. This clear dominance of the commercial sector shows that publishing in at least these two countries is practised as a business enterprise. This contrasts sharply with the publishing patterns exhibited in Latin America, where the academic sector is the most prominent one.

15.
We present a case study of how scientometric tools can reveal the structure of scientific theory in a discipline. Specifically, we analyze the patterns of word use in the discipline of cognitive science using latent semantic analysis, a well-known semantic model, applied to the abstracts of over a thousand academic papers relevant to its theories. Our results show that it is possible to link these theories with specific statistical distributions of words in the abstracts of papers that espouse them. We show that the theories have distinct patterns of word use, and that their similarity relationships with each other are intuitive and informative. Moreover, we show that it is possible to predict fairly accurately the theory of a paper by constructing a model of the theories based on their distributions of word use. These results may open new avenues for the application of scientometric tools to theoretical divides.
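The core of latent semantic analysis is a truncated SVD of a term-document matrix, after which documents (here standing in for abstracts) can be compared in the reduced space. The 5-term by 4-document counts below are invented; the paper works on over a thousand real abstracts.

```python
import numpy as np

# Minimal LSA sketch: SVD of a toy term-document count matrix, then cosine
# comparison of documents in a 2-d latent space. All counts are invented.

A = np.array([
    [2, 1, 0, 0],   # "neural"
    [1, 2, 0, 0],   # "network"
    [0, 0, 2, 1],   # "symbol"
    [0, 0, 1, 2],   # "rule"
    [1, 1, 1, 1],   # "cognition"
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_2d = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the 2-d latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Documents 0 and 1 (the "neural network" theme) end up nearly identical in the latent space, while documents from the opposing theme stay far apart, which is the mechanism behind predicting a paper's theory from its word distribution.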

16.
Technology analysis is a process which uses textual analysis to detect trends in technological innovation. Co-word analysis (CWA), a popular method for technology analysis, encompasses (1) defining a set of keyword or key-phrase patterns represented in technology-dependent terms, (2) generating a network that codifies the relations between occurrences of keywords or key phrases, and (3) identifying specific trends from the network. However, defining the set of keyword or key-phrase patterns relies heavily on the effort of experts, who may be expensive or unavailable. Furthermore, defining keyword or key-phrase patterns for new or emerging technology areas may be difficult even for experts. To address this limitation of CWA, this research adopts a property-function based approach. A property is a specific characteristic of a product, usually described using adjectives; a function is a useful action of a product, usually described using verbs. Properties and functions represent the innovation concepts of a system, so they show innovation directions in a given technology. The proposed methodology automatically extracts properties and functions from patents using natural language processing. Using properties and functions as nodes, and co-occurrences as links, an invention property-function network (IPFN) can be generated. Using social network analysis, the methodology analyzes the technological implications of indicators in the IPFN. Therefore, without predefining keyword or key-phrase patterns, the methodology helps experts concentrate on identifying trends in technological innovation from patents. The methodology is illustrated using a case study of patents related to silicon-based thin-film solar cells.
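Step (2) above, codifying the relations between term occurrences, can be sketched as a weighted co-occurrence network built from per-document keyword sets, with weighted degree as a simple centrality indicator. The keywords below are invented, not taken from the solar-cell case study.

```python
from collections import Counter
from itertools import combinations

# Sketch: build a keyword co-occurrence network from per-document keyword
# sets, then compute weighted degree as a basic network indicator.

doc_keywords = [                      # invented per-patent keyword sets
    {"thin film", "efficiency", "deposition"},
    {"thin film", "efficiency"},
    {"deposition", "substrate"},
]

edges = Counter()
for kws in doc_keywords:
    for a, b in combinations(sorted(kws), 2):   # sorted -> canonical edge key
        edges[(a, b)] += 1

degree = Counter()                    # weighted degree per node
for (a, b), w in edges.items():
    degree[a] += w
    degree[b] += w
```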

17.
This paper shows how Database Tomography can be used to derive technical intelligence from the published literature. Database Tomography is a patented system for analyzing large amounts of textual computerized material. It includes algorithms for extracting multi-word phrase frequencies and performing phrase proximity analyses. Phrase frequency analysis provides the pervasive themes of a database, and phrase proximity analysis provides the relationships among the pervasive themes, and between the pervasive themes and sub-themes. One potential application of Database Tomography is to obtain the thrusts and interrelationships of a technical field from papers published in the literature of that field. This paper applies Database Tomography to analyses of both the non-technical field of Research Impact Assessment (RIA) and the technical field of Chemistry. A database of relevant RIA articles was analyzed to produce characteristics and key features of the RIA field: the recent prolific RIA authors; the journals prolific in RIA papers; the prolific institutions in RIA; the prolific keywords specified by the authors; and the authors whose works are cited most prolifically, as well as the particular papers, journals, and institutions cited most prolifically. The pervasive themes of RIA are identified through multi-word phrase analysis of the database, and a phrase proximity analysis shows the relationships among the pervasive themes and between the pervasive themes and sub-themes. A similar process was applied to Chemistry, except that the database was limited to one year's issues of the Journal of the American Chemical Society. Wherever possible, the RIA and Chemistry results were compared. Finally, the conceptual use of Database Tomography to help identify promising research directions is discussed.

18.
This paper examines estimation of extremely low-frequency magnetic fields (MF) in a power substation. First, the results of previous relevant research studies and MF measurements in a sample power substation are presented. Then, a fuzzy logic model based on geometric definitions for estimating the MF distribution is explained. Visual software based on the fuzzy logic technique, with a three-dimensional screening unit, has been developed.

19.
Michael Nelson, J. Stephen Downie. Scientometrics, 2002, 54(2): 243-255
We analyse the statistical properties of a database of musical notes for the purpose of designing an information retrieval system as part of the Musifind project. To reduce the amount of musical information, we convert the database to the intervals between notes, which makes the database easier to search. We also investigate a further simplification: creating equivalence classes of musical intervals, which also increases the resilience of searches to errors in the query. The Zipf, Zipf-Mandelbrot, Generalized Waring (GW), and Generalized Inverse Gaussian-Poisson (GIGP) distributions are tested against these various representations, with the GIGP distribution providing the best overall fit for the data. There are many similarities with text databases, especially those with short bibliographic records. There are also some differences, particularly in the highest-frequency intervals, which occur with a much lower frequency than the highest-frequency “stopwords” in a text database. This provides evidence to support the hypothesis that traditional text retrieval methods will work for a music database. This revised version was published online in June 2006 with corrections to the Cover Date.
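The interval representation and its equivalence classes can be sketched directly: convert note numbers to successive differences, then collapse them into coarse direction/size buckets so small query errors still match. The melody and bucket boundaries below are invented, not the Musifind scheme itself.

```python
# Sketch: melody (MIDI note numbers, invented) -> intervals -> coarse
# equivalence classes, for error-tolerant music retrieval.

notes = [60, 62, 64, 62, 67]            # C D E D G
intervals = [b - a for a, b in zip(notes, notes[1:])]

def interval_class(i):
    """Collapse an exact interval into a direction + size bucket."""
    if i == 0:
        return "same"
    direction = "up" if i > 0 else "down"
    size = "step" if abs(i) <= 2 else "leap"   # bucket boundary is invented
    return f"{direction}-{size}"

classes = [interval_class(i) for i in intervals]
```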

20.
This paper presents a comprehensive framework for reliability prediction during the product development process. Early in the product development process, there is typically little or no quantitative evidence for predicting the reliability of the new concept beyond indirect or qualitative information. The proposed framework addresses the issue of utilizing qualitative information in the reliability analysis. The framework is based on the Bayesian approach, with fuzzy logic theory used to enhance the capability of the Bayesian approach to deal with qualitative information. This paper proposes to extract the information from various design tools and design review records and incorporate it into the Bayesian framework through a fuzzy inference system. The Weibull distribution is used as the failure/survival time distribution, with the assumption of a known value of the shape factor. Initial parameters of the Weibull distribution are estimated from warranty data of prior systems to estimate the initial Bayesian parameter (λt). The applicability of the framework is illustrated via an example.
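The Bayesian step with a known shape factor admits a closed form under one common parameterisation: writing the Weibull density as f(t) = k·λ·t^(k-1)·exp(-λ·t^k), a Gamma prior on λ is conjugate, so the posterior after observing failure times t_1..t_n is Gamma(a + n, b + Σ t_i^k). The prior values and failure times below are invented, and this rate parameterisation is an assumption on our part, not necessarily the paper's.

```python
# Sketch of a conjugate Bayesian update for a Weibull rate parameter with
# known shape factor k. All numbers are invented for illustration.

k = 1.5                        # shape factor, assumed known (e.g. from prior systems)
a, b = 2.0, 100.0              # Gamma(a, b) prior, e.g. fitted from warranty data

failures = [40.0, 55.0, 72.0]  # observed failure times (invented)

n = len(failures)
a_post = a + n                 # posterior shape of the Gamma
b_post = b + sum(t ** k for t in failures)   # posterior rate of the Gamma
lam_mean = a_post / b_post     # posterior mean of the Weibull rate parameter
```

The observed failure times pull the posterior mean of λ below the prior mean a/b, i.e. the data suggest the product fails less often than the prior assumed.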


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号