Similar Literature
20 similar documents found (search time: 46 ms)
1.
As an increasing number of scientific literature datasets become open access, keyword analysis has attracted growing attention in many scientific fields. Traditional keyword analyses include frequency-based and network-based methods, both providing efficient mining techniques for identifying representative keywords. The semantic meanings behind the keywords are important for understanding research content. However, traditional keyword analysis methods pay scant attention to semantic meanings; as traditionally used, network-based and frequency-based methods capture only limited semantic associations among keywords. Moreover, it is not clear how the semantic meanings behind the keywords are associated with citations. Thus, we use the Google Word2Vec model to build word vectors and reduce them to a two-dimensional plane, partitioned as a Voronoi diagram, using the t-SNE algorithm, in order to link meanings with citations. Distances between the semantic meanings of keywords in the two-dimensional plane resemble distances in geographical space, so we introduce a geographic metaphor, “Ghost City”, to describe the relationship between semantics and citations for topics that were once hot but have recently cooled. Along with “Ghost City” zones, “Always Hot”, “Newly Emerging Hot”, and “Always Silent” areas are classified and mapped, describing the spatial heterogeneity and homogeneity of the semantic distribution of keywords cited in a domain database. Using a collection of “geographical natural hazard” literature datasets, we demonstrate that the proposed method and classification scheme can efficiently provide a unique viewpoint for interpreting the interaction between semantics and citations through the “Ghost City”, “Always Hot”, “Newly Emerging Hot”, and “Always Silent” areas.
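A minimal sketch of the embed-and-project step described above, using gensim's Word2Vec and scikit-learn's t-SNE; the toy keyword corpus, keyword list, and hyperparameters are illustrative assumptions rather than the authors' settings (the resulting 2-D points could then be partitioned with scipy.spatial.Voronoi).

```python
# Sketch: embed keywords with Word2Vec, then project to 2-D with t-SNE.
# Corpus and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import numpy as np

# Toy corpus: each document is a list of author keywords (hypothetical data).
keyword_docs = [
    ["flood", "risk", "gis", "remote sensing"],
    ["landslide", "hazard", "machine learning", "gis"],
    ["earthquake", "vulnerability", "resilience"],
    ["flood", "hazard", "climate change", "risk"],
]

# Train word vectors on the keyword "sentences".
model = Word2Vec(keyword_docs, vector_size=50, window=5, min_count=1, sg=1, epochs=200)

keywords = sorted(model.wv.key_to_index)
vectors = np.array([model.wv[k] for k in keywords])

# Reduce to a two-dimensional plane; perplexity must be < number of points.
xy = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)

for kw, (x, y) in zip(keywords, xy):
    print(f"{kw:16s} {x:8.2f} {y:8.2f}")
```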

2.
3.
Companies should investigate possible patent infringement and cope with potential risks because patent litigation may have a tremendous financial impact. An important factor in identifying the possibility of patent infringement is the technological similarity among patents, so this paper considers technological similarity as a criterion for judging the possibility of infringement. Technological similarities can be measured by transforming patent documents into abstracted forms that contain the specific technological key findings and the structural relationships among technological components in the invention. Although keyword-based technological similarity has been widely adopted in patent analysis research, it is inadequate for identifying patent infringement because a keyword vector cannot reflect specific technological key findings and structural relationships among technological components. As a remedy, this paper exploits a subject–action–object (SAO) based semantic technological similarity. An SAO structure explicitly describes the structural relationships among technological components in the patent, and the set of SAO structures is considered a detailed picture of the inventor’s expertise, i.e. the specific key findings in the patent. Therefore, an SAO based semantic technological similarity can identify patent infringement. Semantic similarity between SAO structures is automatically measured with a WordNet-based SAO semantic similarity measurement method, and the technological relationships among patents are mapped onto a two-dimensional space using multidimensional scaling (MDS). Furthermore, a clustering algorithm is used to automatically suggest possible patent infringement cases, allowing large sets of patents to be handled with minimal effort by human experts. The proposed method is verified by detecting real patent infringement in prostate cancer treatment technology, and we expect it to reduce the work of human experts in identifying patent infringement.
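A hedged sketch of the core similarity-and-mapping idea: SAO triples compared slot by slot with WordNet path similarity, and the resulting dissimilarities mapped to 2-D with MDS. The SAO triples are hypothetical, and the slot-averaging scheme is a simplifying assumption rather than the paper's exact measurement method.

```python
# Sketch: WordNet-based similarity between SAO (subject-action-object) triples,
# then MDS to map patents onto a 2-D plane. SAO triples here are hypothetical.
from itertools import product
from nltk.corpus import wordnet as wn          # requires nltk's wordnet corpus
from sklearn.manifold import MDS
import numpy as np

def word_sim(w1, w2):
    """Best WordNet path similarity between any synsets of two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1, s2 in product(wn.synsets(w1), wn.synsets(w2))]
    return max(scores, default=0.0)

def sao_sim(sao1, sao2):
    """Average similarity over the subject, action, and object slots (assumed scheme)."""
    return sum(word_sim(a, b) for a, b in zip(sao1, sao2)) / 3.0

# Hypothetical SAO triples extracted from three patents.
patents = {
    "P1": ("laser", "ablate", "tissue"),
    "P2": ("beam", "remove", "tumor"),
    "P3": ("catheter", "deliver", "drug"),
}

names = list(patents)
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = 1.0 - sao_sim(patents[names[i]], patents[names[j]])

# Map technological relationships onto 2-D space from the dissimilarity matrix.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dist)
print(dict(zip(names, coords.round(2).tolist())))
```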

4.
Natural language semantic construction improves the natural language comprehension ability and analytical skills of the machine and is the basis for realizing information exchange in an intelligent cloud-computing environment. This paper proposes a natural language semantic construction method based on a cloud database, mainly comprising two parts: natural language cloud database construction and natural language semantic construction. The natural language cloud database is established on the CloudStack cloud-computing environment and is composed of a corpus, a thesaurus, a word vector library and an ontology knowledge base. In this part, we concentrate on the pretreatment of the corpus and the representation of the background knowledge ontology, and then put forward a TF-IDF and word vector distance based algorithm for duplicated webpages (TWDW), which raises the recognition efficiency of repeated web pages. The natural language semantic construction part mainly introduces the dynamic process of semantic construction and proposes a mapping algorithm based on semantic similarity (MBSS), which is a bridge between the Predicate-Argument (PA) structure and the background knowledge ontology. Experiments show that, compared with relevant algorithms, the precision and recall of both proposed algorithms are significantly improved. The work in this paper improves the understanding of natural language semantics and provides effective data support for the natural language interaction function of cloud services.
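A simplified sketch in the spirit of the duplicated-webpage step: pages whose TF-IDF cosine similarity exceeds a threshold are flagged as duplicates. The pages, the threshold, and the omission of the word-vector-distance component are assumptions; this is not the paper's TWDW algorithm.

```python
# Simplified duplicate-webpage check: high TF-IDF cosine similarity => duplicate.
# The word-vector-distance component of TWDW is omitted; threshold is assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "cloud computing platform for natural language semantic construction",
    "a cloud computing platform for natural-language semantic construction",
    "ontology knowledge base and word vector library for the corpus",
]

tfidf = TfidfVectorizer().fit_transform(pages)
sim = cosine_similarity(tfidf)

THRESHOLD = 0.8   # assumed cutoff for "duplicated" pages
for i in range(len(pages)):
    for j in range(i + 1, len(pages)):
        if sim[i, j] >= THRESHOLD:
            print(f"pages {i} and {j} look duplicated (cosine={sim[i, j]:.2f})")
```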

5.
Stochastic analysis of structures using probability methods requires statistical knowledge of uncertain material parameters. It is often easier to identify these statistics indirectly from the structural response by solving an inverse stochastic problem. In this paper, a robust and efficient inverse stochastic method based on the non-sampling generalized polynomial chaos method is presented for identifying uncertain elastic parameters from experimental modal data. A data set of natural frequencies is collected from experimental modal analysis of sample orthotropic plates. The Pearson model is used to identify the distribution functions of the measured natural frequencies. This realization is then employed to construct the random orthogonal basis for each vibration mode. The uncertain parameters are represented by polynomial chaos expansions with unknown coefficients and the same random orthogonal basis as the vibration modes. The coefficients are identified via a stochastic inverse problem. The results show good agreement with the experimental data.

6.
刘保旗, 林丽, 郭主恩. 《包装工程》 (Packaging Engineering), 2024, 45(2): 110-117
Objective: To address problems in traditional Kansei (affective) design research, such as the time cost of imagery experiments and the randomness of small samples, user imagery cognition is extracted from existing online review texts. Methods: First, large-scale car-exterior review texts are crawled, a vocabulary for semantic analysis is built, and a word2vec word-vector model is constructed; then, semantic relations within the vocabulary are obtained from the model, and the semantic dispersion among high-frequency key adjectives is computed to construct a representative imagery-word space; finally, reviews are mapped into the imagery-word space through quantitative semantic matching, yielding large-scale users' salient imagery representations of each car model and clarifying the car-exterior matching results for specified imagery words. Results: The salient imagery of car exteriors mined with this method shows no significant difference from, and a high correlation with, experimental results based on manual evaluation, demonstrating the method's effectiveness. Conclusion: Mining user imagery cognition in this way exploits existing large-scale user feedback, improves the efficiency of imagery analysis, and helps decision makers quickly understand consumers' affective knowledge of car exteriors, so that products better meet market expectations during design iteration. Compared with related studies, the quantitative-semantic-matching approach requires no dimensionality reduction or clustering of very high-dimensional vectors, avoiding the loss of semantic relations among word vectors that feature reduction may cause, and thus yields more accurate imagery-mining results.
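A minimal sketch of the quantitative-semantic-matching step: each review is assigned to the imagery adjective whose word2vec vector is closest to the review's mean vector. The review corpus, the imagery words, and all parameters are illustrative assumptions.

```python
# Sketch: map each review to the closest imagery adjective in word2vec space.
# Reviews, imagery words, and hyperparameters are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

reviews = [
    ["front", "grille", "looks", "sporty", "aggressive"],
    ["rounded", "body", "feels", "cute", "friendly"],
    ["long", "hood", "very", "elegant", "sleek"],
]
imagery_words = ["sporty", "cute", "elegant"]   # representative imagery-word space (assumed)

model = Word2Vec(reviews, vector_size=50, min_count=1, sg=1, epochs=300)

def mean_vec(tokens):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for tokens in reviews:
    rv = mean_vec(tokens)
    scores = {w: cosine(rv, model.wv[w]) for w in imagery_words}
    best = max(scores, key=scores.get)
    print(" ".join(tokens), "->", best)
```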

7.
As supply chains become ever more global and agile in the modern manufacturing era, enterprises increasingly depend on the efficient and effective discovery of shared manufacturing resources provided by their partners, wherever they are. Enterprises thus face growing challenges caused by technical difficulties and ontological issues in manufacturing interoperability and integration over heterogeneous computing platforms. This paper presents a prototype intelligent system, SWMRD (Semantic Web-based manufacturing resource discovery), for distributed manufacturing collaboration across ubiquitous virtual enterprises. Ontology-based annotation of the distributed manufacturing resources via a new, multidisciplinary manufacturing ontology is proposed on the semantic web to convert resources into machine-understandable knowledge, a prerequisite for meaningful resource discovery in cross-enterprise multidisciplinary collaboration. An ontology-based multi-level knowledge retrieval model is devised to extend traditional keyword-search information retrieval with integrated capabilities of graph search, semantic search, fuzzy search and automated reasoning, realising more flexible, meaningful, accurate and automated discovery of manufacturing resources. A case study of intelligent discovery of manufacturing resources demonstrates the practicality of the developed system.

8.
This study aims to map the content and structure of the knowledge base of research on intercultural relations as revealed in co-citation networks of 30 years of scholarly publications. Source records for extracting co-citation information are retrieved from Web of Science (1980–2010) through comprehensive keyword search and filtered by manual semantic coding. Exploratory network and content analysis is conducted (1) to discover the development of major research themes and the relations between them over time, and (2) to locate representative core publications: the stars, which are highly co-cited with others, and the bridges, which connect across rather than within subfields or disciplines. Structural analysis of the co-citation networks identifies a core cluster that contains the foundational knowledge of this domain; it is well connected to almost all the other clusters and covers a wide range of subject categories. The evolutionary path of research themes shows trends moving towards (e.g. psychology, business and economics) and away from (e.g. language education and communication) the core cluster over time. Based on the results, a structural framework of the knowledge domain of intercultural relations research is proposed to represent thematic relatedness between topical groups and their relations.
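A small sketch of the co-citation analysis: a weighted co-citation network is built from reference lists with networkx, then stars (high weighted degree), bridges (high betweenness), and clusters are extracted. The reference data are hypothetical stand-ins for Web of Science records.

```python
# Sketch: build a co-citation network from reference lists and look for a
# densely connected core. Reference data are hypothetical.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Each source paper's list of cited references (hypothetical).
references = [
    ["Berry1997", "Ward2001", "Hofstede1980"],
    ["Berry1997", "Hofstede1980", "Kim2005"],
    ["Ward2001", "Berry1997", "Searle1990"],
    ["Gudykunst1985", "Kim2005"],
]

G = nx.Graph()
for refs in references:
    for a, b in combinations(sorted(set(refs)), 2):
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

# Highly co-cited "stars": weighted degree; "bridges": high betweenness.
stars = sorted(G.degree(weight="weight"), key=lambda x: -x[1])[:3]
bridges = sorted(nx.betweenness_centrality(G).items(), key=lambda x: -x[1])[:3]
clusters = list(greedy_modularity_communities(G, weight="weight"))

print("stars:", stars)
print("bridges:", bridges)
print("clusters:", [sorted(c) for c in clusters])
```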

9.
10.
In order to stimulate innovation during the collaborative process of new product and production development, and especially to avoid duplicating existing techniques or infringing upon others’ patents and intellectual property rights, collaborative research and development teams and patent engineers must accurately identify relevant patent knowledge in a timely manner. This research develops a novel knowledge management approach that uses an ontology-based artificial neural network (ANN) algorithm to automatically classify and search knowledge documents stored in huge online patent corpuses. It focuses on smart, semantics-oriented classification and search over the most critical and well-structured knowledge publications, i.e. patents, to provide valuable and practical references for collaborative networks of technology-centric product and production development teams. The research uses a domain ontology schema created with Protégé and derives the semantic concept probabilities of key phrases that frequently occur in domain-relevant patent documents. By combining the term frequencies and the concept probabilities of key phrases as the ANN inputs, the method shows significant improvement in classification accuracy. In addition, this research provides an advanced semantics-oriented search algorithm to accurately identify related patent documents in the patent knowledge base. The case demonstration analyses sample sets of 343 chemical mechanical polishing and 150 radio-frequency identification patents to verify and measure the performance of the proposed approach. The results are compared with previous automatic classification methods, demonstrating much improved outcomes.
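A hedged sketch of the classification step: term-frequency features are concatenated with ontology-derived concept probabilities and fed to a small neural network. The documents, concept probabilities, labels, and network size are illustrative assumptions, not the paper's ontology or data.

```python
# Sketch: ANN inputs = term frequencies + concept probabilities (both assumed here).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

docs = [
    "slurry pad polishing wafer planarization",
    "polishing pad conditioner slurry flow",
    "rfid tag antenna reader frequency",
    "antenna reader tag collision protocol",
]
labels = ["CMP", "CMP", "RFID", "RFID"]

# Assumed concept probabilities of two ontology concepts per document,
# e.g. P(planarization concept | doc) and P(identification concept | doc).
concept_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])

tf = CountVectorizer().fit_transform(docs).toarray()
X = np.hstack([tf, concept_probs])          # ANN inputs: TF + concept probabilities

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))
```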

11.
12.
Conferences play a major role in the development of scientific domains. While journal and article contributions in the field of international business (IB) are a well-researched area of scientometric studies, conferences are not. The absence of a systematic assessment of international business conferences as a reference for the collective status of the Academy of International Business (AIB) community is astonishing. Whatever the reasons for that, this paper starts to fill the gap. It establishes a knowledge network composed of the last six years of AIB conferences. We collected all the contributions in full text with their abstracts and keywords from 2006 to 2011. All the data have been organized in a data system, and we used an information-theoretic clustering method that allows different analytical views through the entire knowledge corpus. The results indicate significant statistical differences between topic modules and keyword threads of the yearly conferences. Three keywords dominate as a leitmotif between 2006 and 2011, but the detailed structure changes significantly from conference to conference.

13.
Bibliometric analysis of publication metadata is an important tool for investigating emerging fields of technology. However, applying a fixed field definition to an emerging technology is complicated by ongoing and at times rapid change in the underlying technology itself. There is limited prior work on adapting the bibliometric definitions of emerging technologies as these technologies change over time; this paper addresses that gap. We draw on the example of the modular keyword nanotechnology search strategy developed at Georgia Institute of Technology in 2006. This search approach has seen extensive use in analyzing emerging trends in nanotechnology research and innovation, yet with the growth of the nanotechnology field, novel materials, particles, technologies, and tools have appeared. We report on the process and results of reviewing and updating this nanotechnology search strategy. By employing structured text-mining software to profile keyword terms, and by soliciting input from domain experts, we identify new nanotechnology-related keywords. We retroactively apply the revised, evolutionary lexical query to 20 years of publication data and analyze the results. Our findings indicate that the updated search approach offers an incremental improvement over the original strategy in terms of recall and precision. Additionally, the updated strategy reveals the importance for nanotechnology of several emerging cited-subject categories, particularly in the biomedical sciences, suggesting a further extension of the nanotechnology knowledge domain. The implications of the work for applying bibliometric definitions to emerging technologies are discussed.
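A toy sketch of how an updated keyword strategy can be evaluated: apply the old and new term lists to labelled records and compare recall and precision. The terms and records are illustrative, not the actual Georgia Tech nanotechnology query.

```python
# Sketch: compare an original and an updated keyword list against a gold set.
# Terms and records are illustrative assumptions.
records = [
    ("nanoparticle drug delivery in vivo", True),
    ("graphene quantum dot biosensor", True),
    ("bulk steel corrosion testing", False),
    ("nanowire field effect transistor", True),
    ("classical fluid dynamics of pipes", False),
]

old_terms = ["nanoparticle", "nanotube", "nanowire"]
new_terms = old_terms + ["quantum dot", "graphene"]     # assumed additions

def evaluate(terms):
    hits = [is_nano for text, is_nano in records
            if any(t in text for t in terms)]
    tp = sum(hits)
    relevant = sum(1 for _, is_nano in records if is_nano)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall

for name, terms in [("original", old_terms), ("updated", new_terms)]:
    p, r = evaluate(terms)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```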

14.
The meaning of a word includes a conceptual meaning and a distributional meaning. Word embeddings based on distribution suffer from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with the distributed word representation, so that both distributed information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlation between model results and human judgement is improved to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR represents words more informatively than distributed word embedding alone, demonstrating that conceptual semantics, especially hypernymous information, is a good complement to distributed word representation.
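A minimal sketch of the aggregation idea behind CEWR: a word's distributed vector is concatenated with a synset embedding (mean of synonym vectors) and a hypernym embedding. The synonym and hypernym lists and the random vectors are toy placeholders for the Tongyici Cilin resource and trained embeddings.

```python
# Sketch: combine a distributed vector with synset and hypernym embeddings.
# Thesaurus entries and vectors are toy placeholders, not Tongyici Cilin data.
import numpy as np

rng = np.random.default_rng(0)
dist_vec = {w: rng.normal(size=8) for w in
            ["dog", "puppy", "hound", "animal", "cat", "creature"]}

synsets   = {"dog": ["puppy", "hound"]}      # assumed synonym set
hypernyms = {"dog": ["animal", "creature"]}  # assumed hypernyms

def cewr(word):
    d = dist_vec[word]
    syn = np.mean([dist_vec[w] for w in synsets.get(word, [word])], axis=0)
    hyp = np.mean([dist_vec[w] for w in hypernyms.get(word, [word])], axis=0)
    # Concatenate distributed, synset, and hypernym embeddings.
    return np.concatenate([d, syn, hyp])

print(cewr("dog").shape)   # (24,) = 3 x 8 dimensions
```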

15.
We present an approach that uses semantic metrics to provide insight into software quality early in the design phase of software development by automatically analysing natural language (NL) design specifications for object-oriented systems. Semantic metrics are based on the meaning of software within the problem domain. In this paper, we extend semantic metrics to analyse design specifications. Since semantic metrics can now be calculated from early design through software maintenance, they provide a consistent and seamless type of metric that can be collected throughout the entire lifecycle. We discuss our semMet system, an NL-based program comprehension tool that we have expanded to calculate semantic metrics from design specifications. To validate semantic metrics from design specifications and to illustrate their seamless nature across the software lifecycle, we compare semantic metrics from different phases of the lifecycle, and we also compare them to syntactically oriented metrics calculated from the source code. Results indicate that semantic metrics calculated from design specifications can give insight into the quality of the source code based on that design, and they confirm that semantic metrics provide a consistent and seamless type of metric collectable across the entire lifecycle.

16.
Entity and relation extraction is an indispensable part of domain knowledge graph construction, which can serve knowledge needs in a specific domain, such as supporting product research, sales, risk control, and domain hotspot analysis. Existing entity and relation extraction methods that depend on pretrained models have shown promising performance on open datasets, but their performance degrades on domain-specific datasets. Entity extraction models treat characters as basic semantic units while ignoring known character dependencies in specific domains, and relation extraction is based on the hypothesis that the relations hidden in sentences are uniform, neglecting that relations may differ across entity tuples. To address these problems, this paper first introduces prior knowledge composed of domain dictionaries to strengthen character dependencies. Second, domain rules are built to eliminate noise in entity relations and promote the extraction of potential entity relations. Finally, experiments are designed to verify the effectiveness of the proposed methods. Experimental results on two domains, the laser industry and unmanned ships, show the superiority of our methods: the F1 value on the laser industry entity, unmanned ship entity, laser industry relation, and unmanned ship relation datasets is improved by +1%, +6%, +2%, and +1%, respectively. In addition, the extraction accuracy of entity relation triplets reaches 83% and 76% on the laser industry and unmanned ship entity pair datasets, respectively.
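A small sketch of injecting dictionary prior knowledge: spans matching a domain dictionary are marked with B/I/O flags that a tagger could consume as extra features. The dictionary entries and the sentence are illustrative assumptions.

```python
# Sketch: dictionary-match flags as prior-knowledge features for entity extraction.
# Dictionary entries and the sentence are illustrative assumptions.
def dictionary_features(text, domain_dict):
    """Return one B/I/O flag per character based on dictionary matches."""
    tags = ["O"] * len(text)
    for term in sorted(domain_dict, key=len, reverse=True):   # longest match first
        start = text.find(term)
        while start != -1:
            if all(t == "O" for t in tags[start:start + len(term)]):
                tags[start] = "B"
                tags[start + 1:start + len(term)] = ["I"] * (len(term) - 1)
            start = text.find(term, start + 1)
    return tags

domain_dict = {"fiber laser", "laser", "cutting head"}   # assumed laser-industry terms
sentence = "the fiber laser drives the cutting head"
print(sentence)
print("".join(dictionary_features(sentence, domain_dict)))
```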

17.
Yin Xicheng, Wang Hongwei, Yin Pei, Zhu Hengmin, Zhang Zhenyu. 《Scientometrics》, 2020, 124(3): 1885-1905
The performance of keyword expansion in prior methods is often enhanced by adopting external knowledge. Given a set of initial keywords, this paper is motivated to propose a novel...

18.
Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also over topics. The shifts are examined from two perspectives: the topic level and the context level. According to the over-time word-topic distributions, stable and unstable words are recognized; the diverging and converging trends among the unstable words reveal characteristics of the topic evolution process. The context-level shifts are further detected by similarities between word vectors. Our work associates semantic word shifts with the evolution of topics, which facilitates a better understanding of semantic word shifts from both topics and contexts.
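A hedged sketch of detecting context-level shifts: word2vec models are trained on two time slices and a word's nearest neighbours are compared across slices, an alignment-free proxy for comparing word vectors (not the paper's exact procedure). The corpora and parameters are illustrative assumptions.

```python
# Sketch: neighbour overlap across time slices as a proxy for semantic shift.
# Corpora and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

early = [["retrieval", "boolean", "index", "query"],
         ["index", "term", "weighting", "query"],
         ["retrieval", "relevance", "feedback", "query"]]
late  = [["retrieval", "neural", "embedding", "query"],
         ["embedding", "transformer", "ranking", "query"],
         ["retrieval", "dense", "embedding", "ranking"]]

def neighbours(corpus, word, k=3):
    model = Word2Vec(corpus, vector_size=30, min_count=1, sg=1, epochs=300, seed=1)
    return {w for w, _ in model.wv.most_similar(word, topn=k)}

n_early = neighbours(early, "retrieval")
n_late = neighbours(late, "retrieval")
overlap = len(n_early & n_late) / len(n_early | n_late)   # Jaccard overlap
print("early neighbours:", n_early)
print("late neighbours :", n_late)
print(f"stability (Jaccard): {overlap:.2f}")   # low overlap suggests a semantic shift
```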

19.
A strategy for solving word-sense problems based on the vector space model and the maximum entropy model (cited 2 times: 0 self-citations, 2 citations by others)
For the word-sense problem of monosemous words, a vector space model incorporating trigger pairs is constructed to compute word-sense similarity, and words are clustered on this basis; for polysemous words, a maximum entropy model incorporating long-distance context information is applied to supervised word sense disambiguation. To overcome the limited coverage and strong subjective influence caused by manually constructing sense-tagged test sentences in previous word-sense-disambiguation evaluations, the models are evaluated directly in two practical applications: word clustering and word segmentation ambiguity. The accuracy of resolving word segmentation ambiguity reaches 92%, and the word clustering results meet the needs of further applications.
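A minimal sketch of the supervised disambiguation step, using logistic regression over bag-of-context features as a maximum entropy classifier; the training contexts, senses, and test sentences are toy assumptions.

```python
# Sketch: supervised WSD with a maximum entropy classifier (logistic regression
# over bag-of-context features is equivalent to a maxent model). Toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Contexts of the ambiguous word "bank" with labelled senses.
contexts = [
    "deposited money at the bank downtown",
    "the bank approved the small loan",
    "fished from the muddy river bank",
    "sat on the grassy bank of the stream",
]
senses = ["finance", "finance", "river", "river"]

wsd = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
wsd.fit(contexts, senses)

print(wsd.predict(["deposited the loan money at the bank",
                   "walked along the muddy bank of the stream"]))
```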

20.
Document processing in natural language includes retrieval, sentiment analysis, theme extraction, and other tasks. Classical methods for handling these tasks are based on probability models, semantic models and network models for machine learning. Probability models essentially lose semantic information, which affects processing accuracy. Machine learning approaches include supervised, unsupervised, and semi-supervised approaches; labeled corpora are necessary for semantic models and supervised learning. Reliably labeled corpora are typically built manually, which is costly and time-consuming because people have to read each document and annotate its label. Recently, the continuous CBOW model has proven efficient for learning high-quality distributed vector representations and can capture a large number of precise syntactic and semantic word relationships; the model can be easily extended to learn paragraph vectors, but the result is not precise. To address these problems, this paper develops a new model for learning paragraph vectors by combining the CBOW model and CNNs into a new deep learning model. Experimental results show that the paragraph vectors generated by the new model are better than those generated by the CBOW model in semantic relatedness and accuracy.
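A minimal architecture sketch in the spirit of combining CBOW-style embeddings with a CNN: word vectors are convolved and max-pooled into a fixed-length paragraph vector. The dimensions, vocabulary, and random initialisation are illustrative assumptions, not the paper's model.

```python
# Sketch: word embeddings -> 1-D convolution -> max pooling -> paragraph vector.
# Dimensions, vocabulary, and initialisation are illustrative assumptions.
import torch
import torch.nn as nn

class ParagraphEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, out_dim=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)       # CBOW-style word vectors
        self.conv = nn.Conv1d(emb_dim, out_dim, kernel_size=kernel, padding=1)

    def forward(self, token_ids):                          # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)            # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))                       # (batch, out_dim, seq_len)
        return x.max(dim=2).values                         # paragraph vector (batch, out_dim)

vocab = {"<pad>": 0, "cloud": 1, "semantic": 2, "document": 3, "vector": 4}
encoder = ParagraphEncoder(vocab_size=len(vocab))
tokens = torch.tensor([[1, 2, 3, 4, 0]])                   # one toy paragraph, padded
print(encoder(tokens).shape)                               # torch.Size([1, 64])
```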
