Similar Literature
20 similar documents retrieved (search time: 15 ms)
1.
Knowledge and Information Systems - Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which...

2.

3.
Models of parallel computation: a survey and classification   (Total citations: 5; self-citations: 1; citations by others: 5)
This paper reviews the state of the art in parallel computational model research, introducing the various models developed over the past decades. Based on the features of their target architectures, especially memory organization, we classify these parallel computational models into three generations, and discuss the models and their characteristics within this classification. We believe that, with the ever-increasing speed gap between CPUs and memory systems, incorporating a non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms is growing more complicated, and describing this hierarchy in future computational models is becoming correspondingly more important. A semi-automatic toolkit that can extract model parameters and their values on real computers would reduce the complexity of model analysis, allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features to consider in future model design and research.

4.
Malware classification based on call graph clustering   (Total citations: 1; self-citations: 0; citations by others: 1)
Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, enabling the detection of structural similarities between samples. The ability to cluster similar samples together makes more generic detection techniques possible, targeting the commonalities of the samples within a cluster. To compare call graphs pairwise, we compute graph similarity scores via graph matchings that approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. The experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate the implementation of generic detection schemes.
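The clustering pipeline described above can be sketched in miniature. As a cheap stand-in for the paper's graph-edit-distance matching (which is considerably more involved), the sketch below compares call graphs by the Jaccard distance of their edge sets and clusters the resulting distance matrix with a minimal DBSCAN; the sample graphs, distance function, and parameter values are all illustrative assumptions, not the paper's method.

```python
def jaccard_distance(g1, g2):
    """Illustrative stand-in for graph edit distance: 1 - |E1 & E2| / |E1 | E2|."""
    e1, e2 = set(g1), set(g2)
    if not e1 and not e2:
        return 0.0
    return 1.0 - len(e1 & e2) / len(e1 | e2)

def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None = unvisited
    cid = 0
    neighbors = lambda i: [j for j in range(n) if dist[i][j] <= eps]
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:      # not a core point (for now)
            labels[i] = -1
            continue
        labels[i] = cid
        seeds = [j for j in nb if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:    # noise reachable from a core point: border point
                labels[j] = cid
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb2 = neighbors(j)
            if len(nb2) >= min_pts:
                seeds.extend(nb2)  # j is a core point: expand the cluster
        cid += 1
    return labels

# Hypothetical call graphs (edge sets): two variants per "family" plus one outlier.
samples = [
    {("main", "decrypt"), ("decrypt", "connect"), ("connect", "send")},
    {("main", "decrypt"), ("decrypt", "connect"), ("connect", "send"), ("main", "sleep")},
    {("start", "scan"), ("scan", "infect"), ("infect", "copy")},
    {("start", "scan"), ("scan", "infect"), ("infect", "copy"), ("start", "log")},
    {("x", "y")},
]
n = len(samples)
dist = [[jaccard_distance(samples[a], samples[b]) for b in range(n)] for a in range(n)]
labels = dbscan(dist, eps=0.3, min_pts=2)
```

With these toy graphs, the two variants of each family land in the same cluster and the unrelated sample is flagged as noise, which is the behavior the abstract describes at scale.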

5.
Information Retrieval (IR) systems assist users in finding information from the myriad of information resources available on the Web. A traditional characteristic of IR systems is that if different users submit the same query, the system would yield the same list of results, regardless of the user. Personalised Information Retrieval (PIR) systems take a step further to better satisfy the user’s specific information needs by providing search results that are not only of relevance to the query but are also of particular relevance to the user who submitted the query. PIR has thereby attracted increasing research and commercial attention as information portals aim at achieving user loyalty by improving their performance in terms of effectiveness and user satisfaction. In order to provide a personalised service, a PIR system maintains information about the users and the history of their interactions with the system. This information is then used to adapt the users’ queries or the results so that information that is more relevant to the users is retrieved and presented. This survey paper features a critical review of PIR systems, with a focus on personalised search. The survey provides an insight into the stages involved in building and evaluating PIR systems, namely: information gathering, information representation, personalisation execution, and system evaluation. Moreover, the survey provides an analysis of PIR systems with respect to the scope of personalisation addressed. The survey proposes a classification of PIR systems into three scopes: individualised systems, community-based systems, and aggregate-level systems. Based on the conducted survey, the paper concludes by highlighting challenges and future research directions in the field of PIR.

6.
H. D., Xiaopeng, Xiaowei, Liming, Xueling, Pattern Recognition, 2003, 36(12): 2967-2991
Breast cancer continues to be a significant public health problem in the world. Approximately 182,000 new cases of breast cancer are diagnosed and 46,000 women die of breast cancer each year in the United States. Even more disturbing is the fact that one out of eight women in the US will develop breast cancer at some point during her lifetime. Primary prevention seems impossible since the causes of this disease remain unknown. Early detection is the key to improving breast cancer prognosis, and mammography is one of the most reliable methods for early detection of breast carcinomas. There are limitations to human observers, and it is difficult for radiologists to provide both accurate and uniform evaluation for the enormous number of mammograms generated in widespread screening. The presence of microcalcification clusters (MCCs) is an important sign for the detection of early breast carcinoma: clusters of fine, granular microcalcifications are an early mammographic sign in 30–50% of detected breast cancers, and 60–80% of breast carcinomas reveal MCCs upon histological examination. The high correlation between the appearance of microcalcification clusters and the disease shows that computer-aided diagnosis (CAD) systems for automated detection and classification of MCCs will be very useful for breast cancer control. In this survey paper, we summarize and compare the methods used in the various stages of CAD systems. In particular, the enhancement and segmentation algorithms, mammographic features, classifiers, and their performances are studied and compared. Remaining challenges and future research directions are also discussed.

7.
Traffic classification groups similar or related traffic data, and is one mainstream technique of data fusion in the field of network management and security. With the rapid growth of network users and the emergence of new networking services, network traffic classification has attracted increasing attention, and many new traffic classification techniques have been developed and widely applied. However, the existing literature lacks a thorough survey that summarizes, compares, and analyzes the recent advances in network traffic classification to deliver a holistic perspective. This paper carefully reviews existing network traffic classification methods from a new and comprehensive perspective by classifying them into five categories based on their representative classification features: statistics-based, correlation-based, behavior-based, payload-based, and port-based classification. A series of criteria is proposed for evaluating the performance of existing traffic classification methods. For each category, we analyze and discuss the details, advantages, and disadvantages of its existing methods, and present the traffic features commonly used. Summaries of the investigation are offered to provide a holistic and specialized view of the state of the art. For convenience, the review also discusses the most commonly used datasets and the traffic features adopted for traffic classification. At the end, we identify a list of open issues and future directions in this research field.
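Of the five categories, port-based classification is the simplest to illustrate: a flow is labeled by looking up its well-known port number. The tiny port table and function below are an illustrative sketch, not a complete registry or the survey's own tooling.

```python
# Small excerpt of well-known ports; a real classifier would use the full IANA registry.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}

def classify_by_port(src_port, dst_port):
    """Port-based classification: label a flow by its first recognized port."""
    for port in (dst_port, src_port):  # the destination port is usually the service side
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"                   # e.g. dynamic ports or tunnelled traffic

print(classify_by_port(51514, 443))  # → https
```

The `"unknown"` branch is exactly where this category breaks down for peer-to-peer and port-hopping applications, which is what motivates the statistics-, correlation-, behavior-, and payload-based categories the survey covers.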

8.
Formalization is becoming more common in all stages of the development of information systems, as a better understanding of its benefits emerges. Classification systems are ubiquitous, no more so than in domain modeling. The classification pattern that underlies these systems provides a good case study of the move toward formalization in part because it illustrates some of the barriers to formalization, including the formal complexity of the pattern and the ontological issues surrounding the “one and the many.” Powersets are a way of characterizing the (complex) formal structure of the classification pattern, and their formalization has been extensively studied in mathematics since Cantor’s work in the late nineteenth century. One can use this formalization to develop a useful benchmark. There are various communities within information systems engineering (ISE) that are gradually working toward a formalization of the classification pattern. However, for most of these communities, this work is incomplete, in that they have not yet arrived at a solution with the expressiveness of the powerset benchmark. This contrasts with the early smooth adoption of powerset by other information systems communities to, for example, formalize relations. One way of understanding the varying rates of adoption is recognizing that the different communities have different historical baggage. Many conceptual modeling communities emerged from work done on database design, and this creates hurdles to the adoption of the high level of expressiveness of powersets. Another relevant factor is that these communities also often feel, particularly in the case of domain modeling, a responsibility to explain the semantics of whatever formal structures they adopt. 
This paper aims to make sense of the formalization of the classification pattern in ISE and surveys its history through the literature, starting from the relevant theoretical works of the mathematical literature and gradually shifting focus to the ISE literature. The literature survey follows the evolution of ISE's understanding of how to formalize the classification pattern. The various proposals are assessed using the classical example of classification, the Linnaean taxonomy, formalized using powersets as a benchmark for formal expressiveness. The broad conclusions of the survey are (1) that the ISE community is currently in the early stages of understanding how to formalize the classification pattern, particularly with respect to the requirements for expressiveness exemplified by powersets, and (2) that there is an opportunity to intervene and speed up the process of adoption by clarifying this expressiveness. Given the central place that the classification pattern has in domain modeling, this intervention has the potential to lead to significant improvements.

9.
Data mining with incomplete survey data is an immature subject area. When mining a database with incomplete data, the patterns of missing data, as well as their potential implications, constitute valuable knowledge. This paper presents the conceptual foundations of data mining with incomplete data through classification that is relevant to a specific decision-making problem. The proposed technique supposes that incomplete data and complete data may come from different sub-populations. Its major objective is to detect interesting patterns of data-missing behavior relevant to a specific decision, rather than to estimate individual missing values. Using this technique, a set of complete data is used to acquire a near-optimal classifier, which provides prediction reference information for analyzing the incomplete data. The data-missing behavior concealed in the missing data is then revealed. Using a real-world survey data set, the paper demonstrates the usefulness of this technique.

10.
This paper attempts to lay bare the underlying ideas used in various pattern classification algorithms reported in the literature. It is shown that these algorithms can be classified according to the type of input information required and that the techniques of estimation, decision, and optimization theory can be used effectively to derive known as well as new results.

11.
A review and categorization of electric load forecasting techniques is presented. A wide range of methodologies and models for forecasting are given in the literature. These techniques are classified here into nine categories: (1) multiple regression, (2) exponential smoothing, (3) iterative reweighted least-squares, (4) adaptive load forecasting, (5) stochastic time series, (6) ARMAX models based on genetic algorithms, (7) fuzzy logic, (8) neural networks and (9) expert systems. The methodology for each category is briefly described, the advantages and disadvantages discussed, and the pertinent literature reviewed. Conclusions and comments are made on future research directions.
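As a concrete taste of category (2), simple (single) exponential smoothing forecasts the next load value as an exponentially weighted average of past observations. The implementation below is a generic sketch of that textbook method; the load series and smoothing factor are invented for illustration.

```python
def exp_smoothing_forecast(series, alpha):
    """One-step-ahead forecast via simple exponential smoothing:
    level_t = alpha * x_t + (1 - alpha) * level_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level at the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hourly load in MW, invented for illustration.
load = [310.0, 325.0, 318.0, 330.0, 342.0, 338.0]
forecast = exp_smoothing_forecast(load, alpha=0.4)
```

Larger `alpha` tracks recent observations more closely; smaller `alpha` smooths more aggressively, which is the basic tuning trade-off in this category.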

12.
The application of the CD3 decision tree induction algorithm to telecommunications customer call data to obtain classification rules is described. CD3 is robust against drift in the underlying rules over time (concept drift): it both detects drift and protects the induction process from its effects. Specifically, the task is to mine customer details and call records to determine whether the profile of customers registering for a friends-and-family service is changing over time, and to maintain a rule set profiling such customers. CD3 and the rationale behind it are described, and experimental results on customer data are presented.

13.

Brain tumor segmentation is the process of separating a brain tumor from normal brain tissue. Segmenting tumors from MR images is very challenging, as brain tumors vary in shape and size. The task involves multiple phases: pre-processing, segmentation, feature extraction, feature reduction, and classification of the tumor as benign or malignant. In this paper, Otsu thresholding is used in the segmentation phase, the Discrete Wavelet Transform (DWT) in the feature extraction phase, Principal Component Analysis (PCA) in the feature reduction phase, and the Support Vector Machine (SVM), Least Squares Support Vector Machine (LS-SVM), Proximal Support Vector Machine (PSVM), and Twin Support Vector Machine (TWSVM) in the classification phase. We compare the performances of all these classifiers; TWSVM outperformed all others with 100% accuracy.
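The first stage of the pipeline above, Otsu thresholding, picks the gray level that maximizes the between-class variance of the resulting foreground/background split. The histogram-based implementation below is a generic sketch of that standard step, applied to invented pixel data; the DWT, PCA, and SVM stages would each need dedicated libraries.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance
    for the split {p <= t} (background) vs {p > t} (foreground)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    w_b = 0       # background weight (pixel count so far)
    sum_b = 0     # background intensity sum so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b              # background mean
        m_f = (sum_all - sum_b) / w_f  # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic bimodal "image": dark background around 20-30, bright region around 200-210.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50
t = otsu_threshold(pixels)
```

On clearly bimodal data like this, the threshold lands between the two modes, cleanly separating the simulated tumor region from the background.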

14.
Computational-mechanism design has an important role to play in developing complex distributed systems comprising multiple interacting agents. Game theory has developed powerful tools for analyzing, predicting, and controlling the behavior of self-interested agents and decision making in systems with multiple autonomous actors. These tools, when tailored to computational settings, provide a foundation for building multiagent software systems. This tailoring gives rise to the field of computational-mechanism design, which applies economic principles to computer systems design.

15.
Target tracking and classification are basic functions of modern tracking systems, but because the two rely on data with different characteristics, traditional research has often treated them separately and ignored the connection between them. Joint tracking and classification research instead exploits the coupling between the two tasks so that each complements the other, improving the accuracy of both. After analyzing the basic principles of joint tracking and classification algorithms, this paper classifies them by mechanism into algorithms based on Bayesian inference, on Dempster-Shafer theory, and on the Bayesian risk framework, and surveys and compares the performance of the algorithms in each of the three frameworks. Open problems in joint tracking and classification research and directions for further work are pointed out.

16.
A survey of research on imbalanced classification problems   (Total citations: 20; self-citations: 0; citations by others: 20)
Real-world classification problems are often imbalanced, and traditional classification methods struggle to achieve satisfactory results on them. Over the past decade or so, a variety of solutions have accordingly been proposed. This paper surveys domestic and international research on imbalanced classification in some detail, discusses the problems caused by data imbalance, and introduces the main solutions proposed to date. Through simulation experiments, we compare the classification performance of representative approaches, including resampling, cost-sensitive learning, training-set partitioning, and classifier ensembles, on three real-world imbalanced data sets, and find that training-set partitioning and classifier-ensemble methods handle imbalanced data sets comparatively well. Evaluation metrics for classifiers on imbalanced classification problems and directions for future work are also presented.
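One of the resampling approaches such surveys compare, random oversampling, can be sketched in a few lines: minority-class examples are duplicated at random until every class matches the majority count. The function and toy data below are an illustrative sketch, not the survey's experimental setup.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class examples until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # majority-class size
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - cnt):
            i = rng.choice(idx)    # resample an existing example of this class
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out

# 9:1 imbalanced toy data.
X = [[i] for i in range(10)]
y = [0] * 9 + [1]
X_bal, y_bal = random_oversample(X, y)
```

Because it only duplicates existing points, random oversampling risks overfitting the minority class, which is one reason the survey also compares cost-sensitive learning, training-set partitioning, and ensemble methods.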

17.
Cole, J., Computer, 2005, 38(9): 103-107
The present treatment of stored critical information, especially personal portable storage, provides inadequate protection. Through many means, the IEEE is examining the issue of security in storage and working to make technologies and standards available that advance the interests of society and protect its many critical functions.

18.
Breast cancer is the second leading cause of death for women all over the world. Since the cause of the disease remains unknown, early detection and diagnosis are the key to breast cancer control; they can increase the success of treatment, save lives, and reduce cost. Ultrasound imaging is one of the most frequently used diagnostic tools to detect and classify abnormalities of the breast. To eliminate operator dependency and improve diagnostic accuracy, a computer-aided diagnosis (CAD) system is a valuable and beneficial means for breast cancer detection and classification. Generally, a CAD system consists of four stages: preprocessing, segmentation, feature extraction and selection, and classification. In this paper, the approaches used in these stages are summarized and their advantages and disadvantages are discussed. The performance evaluation of CAD systems is investigated as well.

19.
20.
Knowledge and Information Systems - With increasingly more people using online services, purchasing products, and reviewing them, it becomes crucial to have a system that can provide a crisp...


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号