Similar literature
 20 similar documents found (search time: 0 ms)
1.
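Using a clustering algorithm to improve the accuracy of content-based image retrieval   Total citations: 3; self-citations: 0; citations by others: 3
This paper presents a content-based image retrieval algorithm that uses the color histogram of an image as its retrieval feature and applies K-means clustering together with user relevance feedback to improve retrieval accuracy.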
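
The abstract above names two ingredients, color histograms as features and K-means clustering, without giving details. The following is only a minimal sketch of how they might fit together, not the authors' algorithm. The bin count, the "first k points" initialization, the squared Euclidean distance, and all function names are assumptions made for illustration, and the user relevance feedback step is left out.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 2) -> List[float]:
    """Normalized RGB histogram with `bins` levels per channel (assumed feature)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # r * bins // 256 maps 0..255 onto 0..bins-1 for any bin count.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty images
    return [h / total for h in hist]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20):
    """Plain K-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def retrieve(query: List[float], hists, labels, centroids, top_n: int = 3) -> List[int]:
    """Rank only the images in the cluster whose centroid is nearest to the query."""
    c = min(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [i for i, lab in enumerate(labels) if lab == c]
    return sorted(candidates, key=lambda i: sq_dist(query, hists[i]))[:top_n]
```

Restricting the ranking to one cluster is one plausible way clustering could help retrieval: images from unrelated clusters never enter the candidate list.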

2.
This paper analyzes differences between numeric and symbolic approaches to inductive inference. It shows the importance of existing structures in the acquisition of further knowledge, including statistical confirmation. We present a new way of looking at Hempel's paradox, in which both existing structures and statistical confirmation play a role in reducing the harm it does to learning. We point out some of the most important structures, and we illustrate how uncertainty blurs but does not destroy these structures. We conclude that neither purely symbolic nor purely statistical learning is realistic; integrating the two points of view is the key to future progress, although it is far from trivial. Our system KBG is a first-order logic conceptual clustering system; thus it builds knowledge structures out of unrelated examples. We describe the choices made in KBG in order to build these structures, using both numeric and symbolic types of knowledge. Our argument gives us firm grounds to contradict Carnap's view that induction is nothing but uncertain deduction, and to propose a refinement to Popper's purely deductive view of the growth of science. In our view, progressive organization of knowledge plays an essential role in the growth of new (inductive) scientific theories, which will be confirmed later, quite in the Popperian way.

3.
Background: The contribution of modeling in software development has been a subject of debate. The proponents of model-driven development argue that big upfront modeling requires substantial investment but will pay off later in the implementation phase in terms of increased productivity and quality. Other software engineers perceive modeling as a waste of time and money without any real contribution to the final software product. Considering the present advancement of model-based software development in the software industry, we are challenged to investigate the real contribution of modeling in software development. Objective: We analyze the impacts of UML modeling, specifically the production of class and sequence diagrams, on the quality of the code, as measured by defect density, and on defect resolution time. Method: Using data from a proprietary system, we conduct post-mortem analyses to test the difference in defect density between software modules that are modeled and not modeled. Similarly, we test the difference in resolution time between defects that are related to modeled and not modeled functionality. Result: We have found that the production of UML class diagrams and sequence diagrams reduces defect density in the code and the time required to fix defects. These results are obtained after controlling for the effects of co-factors such as code coupling and complexity. Conclusion: The results confirm that the production of UML class diagrams and sequence diagrams possibly helps improve not only the quality of software but also productivity in software maintenance.

4.
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

5.
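张晓博, 杨燕, 李天瑞, 陆凡, 彭莉兰. 《计算机应用》 (Journal of Computer Applications), 2020, 40(10): 3088-3094
To address the early intelligent diagnosis of Parkinson's disease (PD), which mostly affects the elderly, clustering techniques based on textual data from medical examinations are proposed to analyze and predict PD. First, the original dataset is preprocessed to obtain effective feature information, and principal component analysis (PCA) is used to reduce the original features to 8 feature spaces of different dimensionality. Then, 5 classical clustering models and 3 clustering ensemble methods are each applied to the data in the 8 feature spaces. Finally, 4 clustering performance indices are used to predict the PD patients with dopamine abnormality, the healthy controls, and the PD patients without dopamine deficit (SWEDD) in the dataset. Simulation results show that the clustering accuracy of the Gaussian mixture model (GMM) reaches 89.12% when the PCA feature dimension is 30, that of spectral clustering (SC) reaches 61.41% when the PCA feature dimension is 70, and that of the meta-clustering algorithm (MCLA) reaches 59.62% when the PCA feature dimension is 80. Comparative experiments show that, among the 5 classical clustering methods, GMM clusters best when the PCA feature dimension is below 40, and that, among the 3 clustering ensemble methods, MCLA performs well across all feature dimensions, thereby providing technical and theoretical support for early intelligent auxiliary diagnosis of PD.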

7.
8.
Among the various potential applications of neural networks, forecasting is considered to be a major application. Several researchers have reported their experiences with the use of neural networks in forecasting, and the evidence is inconclusive. This paper presents the results of a forecasting competition between a neural network model and a Box-Jenkins automatic forecasting expert system. Seventy-five series, a subset of data series which have been used for comparison of various forecasting techniques, were analysed using the Box-Jenkins approach and a neural network implementation. The results show that the simple neural net model tested on this set of time series could forecast about as well as the Box-Jenkins forecasting system.

9.
Adewoyin, Rilwan A.; Dueben, Peter; Watson, Peter; He, Yulan; Dutta, Ritabrata. Machine Learning, 2021, 110(8): 2035-2062
Climate models (CM) are used to evaluate the impact of climate change on the risk of floods and heavy precipitation events. However, these numerical simulators produce outputs...

10.
Computers & Education, 2005, 44(3): 257-283
In a classroom, a teacher attempts to convey his or her knowledge to the students, and thus it is important for the teacher to obtain formative feedback about how well students are understanding the new material. By gaining insight into the students' understanding and possible misconceptions, the teacher will be able to adjust the teaching and to supply more useful learning materials as necessary. Therefore, the diagnosis of formative student evaluations is critical for teachers and learners, as is the diagnosis of patterns in the overall learning by a class in order to inform a teacher about the efficacy of his or her teaching. This paper investigates what might be called the “class learning diagnosis problem” by embedding important concepts in a test and analyzing the results with a hierarchical coding scheme. Based on previous research, the part-of and type-of relationships among concepts are used to construct a concept hierarchy that may then be coded hierarchically. All concepts embedded in the test items then can be formulated into concept matrices, and the answer sheets of the learners in a class are then analyzed to indicate particular types of concept errors. The trajectories of concept errors are studied to identify both individual misconceptions students might have as well as patterns of misunderstanding in the overall class. In particular, a clustering algorithm is employed to distinguish student groups who might share similar misconceptions. These approaches are implemented as an integrated module in a previously developed system and applied to two real classroom data sets, the results of which show the practicability of this proposed method.

11.
This study applies clustering analysis for data mining and machine learning to predict trends in technology professional turnover rates, including the hybrid artificial neural network and clustering analysis known as the self-organizing map (SOM). This hybrid clustering method was used to study the individual characteristics of turnover trend clusters. Using a transaction questionnaire, we studied the period of peak turnover, which occurs after the Chinese New Year, for individuals divided into various age groups. The turnover trend of technology professionals was examined in well-known Taiwanese companies. The results indicate that the high outstanding turnover trend circle was primarily caused by a lack of inner fidelity identification, leadership and management. Based on cross-verification, the clustering accuracy rate was 92.7%. This study addressed problems related to the rapid loss of key human resources and should help organizations learn how to enhance competitiveness and efficiency.

12.
We introduce an efficient approach to mining multi-dimensional temporal streams of real-world data for ordered temporal motifs that can be used for prediction. Since many of the dimensions of the data are known or suspected to be irrelevant, our approach first identifies the salient dimensions of the data, then the key temporal motifs within each dimension, and finally the temporal ordering of the motifs necessary for prediction. For the prediction element, the data are assumed to be labeled. We tested the approach on two real-world data sets. To verify the generality of the approach, we validated the application on several subjects from the CMU Motion Capture database. Our main application uses several hundred numerically simulated supercell thunderstorms where the goal is to identify the most important features and feature interrelationships which herald the development of strong rotation in the lowest altitudes of a storm. We identified sets of precursors, in the form of meteorological quantities reaching extreme values in a particular temporal sequence, unique to storms producing strong low-altitude rotation. The eventual goal is to use this knowledge for future severe weather detection and prediction algorithms.

13.
Software effort estimation accuracy is a key factor in effective planning, controlling, and delivering a successful software project within budget and schedule. Both overestimation and underestimation are key challenges for future software development; hence there is a continuous need for accuracy in software effort estimation. Researchers and practitioners are striving to identify which machine learning estimation technique gives more accurate results based on evaluation measures, datasets and other relevant attributes. The authors of related research are generally not aware of previously published results of machine learning effort estimation techniques. The main aim of this study is to help researchers learn which machine learning technique yields promising effort estimation accuracy in software development. In this article, the performance of machine learning ensemble and solo techniques is investigated on public and non-public domain datasets based on the two most commonly used accuracy evaluation metrics. We used the systematic literature review methodology proposed by Kitchenham and Charters. This includes searching for the most relevant papers, applying quality assessment (QA) criteria, extracting data, and drawing results. We evaluated the state-of-the-art accuracy performance of 35 selected studies (17 ensemble, 18 solo) using mean magnitude of relative error (MMRE) and PRED(25) as a set of reliable accuracy metrics for comparing the two techniques and answering the research questions stated in this study. We found that machine learning techniques are the most frequently used in the construction of ensemble effort estimation (EEE) techniques. The results of this study revealed that EEE techniques usually yield more promising estimation accuracy than solo techniques.
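
The two accuracy metrics named in the abstract above can be made concrete with a short sketch. It uses the commonly cited definitions: the magnitude of relative error of one project is |actual - predicted| / actual, MMRE is its mean over all projects, and PRED(25) is the fraction of projects whose relative error is at most 25%. The function names and sample numbers are illustrative assumptions and do not come from the reviewed studies.

```python
from typing import Sequence


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for one project (actual effort must be > 0)."""
    return abs(actual - predicted) / actual


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean magnitude of relative error over all projects; lower is better."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], predictions: Sequence[float], level: float = 25.0) -> float:
    """PRED(level): fraction of projects whose MRE is at most level percent."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    hits = sum(1 for e in errors if e <= level / 100.0)
    return hits / len(errors)


# Hypothetical efforts in person-hours, for illustration only.
actual_effort = [100.0, 200.0, 400.0]
estimated_effort = [110.0, 140.0, 400.0]
print(mmre(actual_effort, estimated_effort))  # mean of 0.10, 0.30 and 0.00
print(pred(actual_effort, estimated_effort))  # two of three estimates are within 25%
```

The two metrics pull in different directions: a lower MMRE is better, while a higher PRED(25) is better, which is why studies often report both.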

14.
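Dependency syntactic networks and semantic networks were built from syntactic and semantic treebanks of 6 genres, and their global properties were compared: number of edges, number of nodes, average node degree, clustering coefficient, average shortest path length, network centralization, diameter, the exponent of the power-law degree distribution, and the coefficient of determination of the power-law fit to the degree distribution. Using these global properties as variables, different clustering methods were applied to the syntactic and semantic networks of the 6 genres. The results show that, although both are network structures built on linguistic principles, dependency syntactic networks and dependency semantic networks differ markedly: their parameters do not carry the same meaning, and clustering experiments based on those parameters yield different results. Combinations of the main semantic-network parameters give relatively reasonable clusterings but do not separate written from spoken genres well, whereas combinations of the main syntactic-network parameters separate texts of different genres well and give fairly reasonable text clusterings.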
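
Three of the global properties listed in the abstract above (average node degree, clustering coefficient, average shortest path length) can be computed from an edge list with the standard library alone. This is a sketch under stated assumptions: the network is treated as undirected and unweighted, the clustering coefficient is the average of local coefficients, and the toy edge list is hypothetical rather than taken from any treebank.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from (head, dependent) word pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def average_degree(graph: Graph) -> float:
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def clustering_coefficient(graph: Graph) -> float:
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count each linked pair of neighbors once (u < v).
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_shortest_path(graph: Graph) -> float:
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
toy = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(average_degree(toy), clustering_coefficient(toy), average_shortest_path(toy))
```

A vector of such values per network is the kind of input the clustering step described in the abstract would operate on.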

15.
This paper proposes three feature selection algorithms with a feature weight scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining; in this process, text documents are separated into several coherent clusters according to carefully selected informative features by using a proper evaluation function, which usually depends on term frequency. Informative features in each document are selected using feature selection methods. Genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO) are the most successful feature selection methods; here they are established using a novel weighting scheme, namely, length feature weight (LFW), which depends on term frequency and the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, which is a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the optimal outcomes for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features.

16.
Canada is considering the development of a new standard for infant/child life jackets. Eight currently available (approved and non-approved) infant/child life jackets were procured for evaluation. Fifty-six participants were chosen as a sample of convenience from the general public for testing. The life jackets were divided into two groups of four, which were donned on a soft infant manikin procured from the Red Cross. In 224 attempts at donning, only 43 (19%) attempts resulted in the life jacket being donned correctly in less than 1 min. Only one life jacket came close to a good design and passed the life jacket standard for donning time and accuracy. Failure rates were observed across all the participants irrespective of age, gender, experience with children and experience with recreational marine equipment. Accuracy and speed of donning the life jacket were hampered as the number of donning sub-tasks increased. It was concluded that it is possible to design a life jacket that can be donned correctly in under 1 min. The life jacket must be of simple, intuitive design and fall naturally into the anatomical shape of the child. A minimum number of ties, zips and clips should be used in the design, and if such connectors are used they should be color coded or of different shapes and sizes to avoid confusion.

17.
A robust system is proposed to automatically detect and extract text in images from different sources, including video, newspapers, advertisements, stock certificates, photographs, and checks. Text is first detected using multiscale texture segmentation and spatial cohesion constraints, then cleaned up and extracted using a histogram-based binarization algorithm. An automatic performance evaluation scheme is also proposed.

18.
The paper deals with the problem of prediction of time series with memory for which classical prediction methods are frequently inadequate. A method is proposed that is based on a model of cellular automata, classification methods, and fuzzy set theory. The accuracy of models based on this method is estimated. Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 43–54, November–December 2006.

19.
Clustering-based sentiment analysis is a novel approach for analyzing opinions expressed in reviews, comments or blogs. In contrast to the two traditional mainstream approaches (supervised learning and symbolic techniques), the clustering-based approach is able to produce basically accurate analysis results without any human participation, linguistic knowledge or training time. This paper introduces new techniques designed to extend the capability of the clustering-based sentiment analysis approach in two aspects: firstly by applying opposite opinion contents processing and non-opinion contents processing techniques to further enhance accuracy; and secondly by using a modified voting mechanism and distance measurement method to conduct fine-grained (three-class) sentiment analysis. According to the experimental results, the clustering-based approach proves useful for producing high-quality sentiment analysis results and suitable for recognizing neutral opinions.

20.
In this paper we propose a method for evaluating the performance of an evolutionary learning system aimed at producing the optimal set of prototypes to be used by a handwriting recognition system. The trade-off between generalization and specialization embedded into any learning process is managed by iteratively estimating both consistency and completeness of the prototypes, and by using such an estimate for tuning the learning parameters in order to achieve the best performance with the smallest set of prototypes. Such estimation is based on a characterization of the behavior of the learning system, and is accomplished by means of three performance indices. Neither the characterization nor the indices depend on the system implementation or the application, and therefore they allow for a truly black-box approach to the performance evaluation of any evolutionary learning system.
