Data mining techniques have been successfully used in many applications across important fields, including medical research. Despite the wealth of data available in health-care systems, practical analysis tools for discovering hidden relationships and trends in these data are lacking. The complexity of medical data, which most models handle poorly, makes prediction considerably challenging. Because a model's ability to diagnose disease accurately and efficiently is critical, the model must be chosen to fit the data well, so that learning from previous data is efficient and diagnosis is highly accurate. This work is motivated by the limited number of regression analysis tools for multivariate counts in the literature. We propose two regression models for count data based on flexible distributions, namely the multinomial Beta-Liouville and the multinomial scaled Dirichlet, and evaluate them on the problem of disease diagnosis. Performance is measured by prediction accuracy, which depends on the nature and complexity of the dataset. Our results show the efficiency of the two proposed regression models: the prediction performance of both is competitive with previously used regression models for count data and with the best results reported in the literature.
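The abstract does not give enough detail to reproduce either proposed model, so the following is only a minimal sketch of what a regression model for multivariate counts looks like. It uses the simpler Dirichlet-multinomial distribution as a stand-in (both the Beta-Liouville and scaled Dirichlet families contain the Dirichlet as a special case). The log link α_d(z) = exp(w_d · z) and the names `dm_log_pmf` and `dm_regression_loglik` are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def dm_log_pmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-pmf of the Dirichlet-multinomial distribution.

    counts: observed count vector x_1..x_D (n = sum of counts)
    alpha:  positive Dirichlet parameters alpha_1..alpha_D
    """
    n = sum(counts)
    a_sum = sum(alpha)
    # log multinomial coefficient n! / prod(x_d!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # log of Gamma(A) / Gamma(n + A) * prod Gamma(x_d + alpha_d) / Gamma(alpha_d)
    log_ratio = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    log_ratio += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_ratio


def dm_regression_loglik(weights, features, counts) -> float:
    """Total log-likelihood of a (hypothetical) Dirichlet-multinomial regression.

    weights[d] is the coefficient vector of category d; each observation has a
    covariate vector z, and its Dirichlet parameters are alpha_d = exp(w_d . z).
    """
    total = 0.0
    for z, x in zip(features, counts):
        alpha = [math.exp(sum(w_j * z_j for w_j, z_j in zip(w, z))) for w in weights]
        total += dm_log_pmf(x, alpha)
    return total
```

Fitting would amount to maximizing `dm_regression_loglik` over `weights` with a generic numerical optimizer; replacing the Dirichlet-multinomial with a multinomial Beta-Liouville or multinomial scaled Dirichlet likelihood would change only `dm_log_pmf`.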
In this paper, we propose a Bayesian nonparametric approach for data modeling and model selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be seen as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by a variational inference method. Owing to the nature of the Bayesian nonparametric approach, overfitting and underfitting are avoided. Moreover, the difficulty of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Unlike other approximation techniques such as Markov chain Monte Carlo (MCMC), which are computationally expensive and whose convergence is difficult to diagnose, the entire inference process in the proposed variational learning framework is analytically tractable, with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only modest computational power, which makes it suitable for large-scale applications. The effectiveness of our model is investigated experimentally on both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
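The stick-breaking representation can be illustrated with a short sketch that draws mixing weights for a truncated Dirichlet process prior: each fraction v_k ~ Beta(1, α) is broken off the remaining stick. The truncation level and concentration value below are assumed for illustration; in a variational treatment the truncation applies to the approximating distribution, and the Dirichlet component parameters and the variational updates themselves are not shown.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Draw mixing weights from a truncated stick-breaking (GEM) prior.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j).
    The last fraction is fixed to 1 so that the K weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
pi = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
```

Smaller values of `alpha` put most of the mass on the first few sticks, which is how the model can use only as many clusters as the data support while nominally allowing infinitely many.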
With the exponential growth of online video creation platforms such as TikTok and Instagram, research on fast and effective action/gesture recognition remains crucial. This work addresses the classification of short video clips using a domain-specific feature design that performs well with as little as one training example per action. The method is based on Gunnar Farnebäck's dense optical flow (GF-OF) estimation, Gaussian mixture models, and information divergence. We first obtain accurate representations of human movements/actions by clustering the GF-OF output with K-means vector quantization. We then represent one instance of each action by a Gaussian mixture model. Next, using the Kullback-Leibler (KL) divergence, we measure the similarity between the trained actions and those in the test videos. Each test video is assigned to the trained action with the highest similarity (i.e., the lowest KL divergence). We performed experiments on the KTH and Weizmann human action datasets using one-shot and K-shot learning, and the results show the discriminative power of the proposed method compared with state-of-the-art techniques.
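The matching step can be sketched under simplifying assumptions: each action model here is a one-dimensional Gaussian mixture given as (weight, mean, std) triples, whereas the method itself models multidimensional optical-flow features. Because the KL divergence between two Gaussian mixtures has no closed form, this sketch uses a Monte Carlo estimate; the abstract does not say which approximation or which divergence direction the authors use, so both choices are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

# Assumed simplification: a 1-D Gaussian mixture as (weight, mean, std) triples.
GMM = List[Tuple[float, float, float]]


def gmm_logpdf(x: float, gmm: GMM) -> float:
    """Log-density of a 1-D Gaussian mixture, computed with log-sum-exp."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        for w, m, s in gmm
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def gmm_sample(gmm: GMM, rng: random.Random) -> float:
    """Pick a component by its weight, then sample from that Gaussian."""
    weights = [w for w, _, _ in gmm]
    _, m, s = rng.choices(gmm, weights=weights, k=1)[0]
    return rng.gauss(m, s)


def kl_monte_carlo(p: GMM, q: GMM, n_samples: int = 5000, seed: int = 0) -> float:
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] from samples of p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(p, rng)
        total += gmm_logpdf(x, p) - gmm_logpdf(x, q)
    return total / n_samples


def classify(test_model: GMM, trained: Dict[str, GMM]) -> str:
    """Return the trained action with the lowest estimated KL divergence."""
    return min(trained, key=lambda name: kl_monte_carlo(test_model, trained[name]))
```

In one-shot use, `trained` would hold a single mixture per action, each fitted to the quantized flow features of one example clip.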
Multimedia Tools and Applications - This study presents a novel unsupervised algorithm for color image segmentation, object detection and tracking based on an unsupervised learning step followed by...
The Quranic Arabic Corpus (http://corpus.quran.com) is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation (Dukes and Habash 2010) and syntactic analysis using dependency grammar (Dukes and Buckwalter 2010). The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old central religious text of Islam. This project contrasts with other Arabic treebanks by providing a deep linguistic model based on the historical traditional grammar known as iʿrāb (إعراب). Adapting this well-known canon of Quranic grammar into a familiar tagset makes it possible to encourage online annotation by Arabic linguists and Quranic experts. This article presents a new approach to the linguistic annotation of an Arabic corpus: online supervised collaboration using a multi-stage approach. The stages are automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading. A popular website attracting thousands of visitors per day, the Quranic Arabic Corpus has approximately 100 unpaid volunteer annotators, each suggesting corrections to the existing linguistic tagging. To ensure a high-quality resource, a small number of expert annotators are promoted to a supervisory role, allowing them to review or veto suggestions made by other collaborators. The Quran also benefits from a large body of existing historical grammatical analysis, which can be leveraged during this review. In this paper we evaluate and report on the effectiveness of the chosen annotation methodology. We also discuss the unique challenges of annotating Quranic Arabic online and describe the custom linguistic software used to aid collaborative annotation.
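The supervised collaboration stage can be pictured as a simple review queue in which volunteers propose tag changes and only supervisors may accept or veto them. The `Suggestion` and `AnnotationStore` classes, the chapter:verse:word identifiers and the tag values are hypothetical illustrations, not the custom software described in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Suggestion:
    """A volunteer's proposed correction to the tag of one word (hypothetical schema)."""
    word_id: str
    proposed_tag: str
    author: str
    status: str = "pending"  # pending -> accepted | vetoed


@dataclass
class AnnotationStore:
    tags: Dict[str, str]  # word_id -> current tag (initially from rule-based tagging)
    supervisors: Set[str]  # expert annotators promoted to a supervisory role
    queue: List[Suggestion] = field(default_factory=list)

    def suggest(self, word_id: str, tag: str, author: str) -> Suggestion:
        # Any volunteer may propose a change; nothing is applied yet.
        s = Suggestion(word_id, tag, author)
        self.queue.append(s)
        return s

    def review(self, s: Suggestion, reviewer: str, accept: bool) -> None:
        # Only supervisors may accept or veto a pending suggestion.
        if reviewer not in self.supervisors:
            raise PermissionError(f"{reviewer} is not a supervisor")
        if s.status != "pending":
            raise ValueError("suggestion already reviewed")
        s.status = "accepted" if accept else "vetoed"
        if accept:
            self.tags[s.word_id] = s.proposed_tag
```

Keeping suggestions separate from the live tags means a vetoed correction never touches the published annotation, which is the property the supervisory role is meant to guarantee.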
Finite mixture models have been applied to various computer vision, image processing and pattern recognition tasks. Most work on finite mixture models has focused on mixtures for continuous data. However, many applications involve and generate discrete data, for which discrete mixtures are better suited. In this paper, we investigate the problem of discrete data modeling using finite mixture models. We propose a novel, well-motivated mixture that we call the multinomial generalized Dirichlet mixture, and compare it with other discrete mixtures. We designed experiments involving the modeling and summarization of spatial color image databases, as well as text classification, to show the robustness, flexibility and merits of our approach.
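One way to see how such a compound distribution is built: write the generalized Dirichlet vector in stick-breaking form with independent fractions V_d ~ Beta(α_d, β_d), d = 1..D; integrating the multinomial likelihood over each V_d then yields a product of beta-function ratios. The sketch below evaluates the resulting probability mass function for one mixture component. It is a derivation under that standard parameterization, which may differ in notation from the paper's, and `mgd_log_pmf` is an illustrative name.

```python
import math
from typing import Sequence


def mgd_log_pmf(counts: Sequence[int], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Log-pmf of a multinomial compounded with a generalized Dirichlet prior.

    counts has D+1 entries; alpha and beta have D entries each.
    """
    D = len(alpha)
    assert len(counts) == D + 1 and len(beta) == D
    n = sum(counts)
    # log multinomial coefficient n! / prod(x_d!)
    logp = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    for d in range(D):
        tail = sum(counts[d + 1:])  # counts falling in categories after d
        a, b, x = alpha[d], beta[d], counts[d]
        # log of B(a + x, b + tail) / B(a, b), written with log-gamma functions
        logp += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + math.lgamma(a + x) + math.lgamma(b + tail)
                 - math.lgamma(a + b + x + tail))
    return logp
```

Setting β_d = α_{d+1} + β_{d+1} (with β_D = α_{D+1}) collapses the generalized Dirichlet to an ordinary Dirichlet, so this pmf then reduces to the Dirichlet-multinomial; the extra β parameters are what give the mixture its added flexibility.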
In this work we present an unsupervised algorithm for learning finite mixture models from multivariate positive data. This kind of data arises naturally in many applications, yet it has not been adequately addressed in the past. The mixture model is based on the inverted Dirichlet distribution, which offers a good representation of positive non-Gaussian data. The parameters of an inverted Dirichlet mixture are estimated by maximum likelihood (ML) using the Newton-Raphson method. We also develop an approach, based on the minimum message length (MML) criterion, to select the optimal number of clusters for representing the data with such a mixture. Experimental results are presented using artificial histograms and real data sets. The challenging problem of software module classification is also investigated within the proposed statistical framework.
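A minimal sketch of the inverted Dirichlet log-density, p(x | α) = Γ(|α|) / ∏ Γ(α_d) · ∏ x_d^(α_d − 1) · (1 + Σ x_d)^(−|α|) with |α| the sum of all D+1 parameters, and of the E-step responsibilities of a mixture built from it. The Newton-Raphson M-step and the MML model-selection criterion described above are not shown, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def inv_dirichlet_logpdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log-density of the inverted Dirichlet distribution for positive x_1..x_D.

    alpha has D+1 positive entries; the extra alpha_{D+1} shapes the tail.
    """
    assert len(alpha) == len(x) + 1 and all(v > 0 for v in x)
    a_sum = sum(alpha)
    logp = math.lgamma(a_sum) - sum(math.lgamma(a) for a in alpha)
    # zip stops after D terms, so only alpha_1..alpha_D multiply log(x_d)
    logp += sum((a - 1) * math.log(v) for a, v in zip(alpha, x))
    logp -= a_sum * math.log1p(sum(x))
    return logp


def responsibilities(x: Sequence[float], weights: Sequence[float],
                     alphas: Sequence[Sequence[float]]) -> List[float]:
    """E-step: posterior probability that x came from each mixture component."""
    logs = [math.log(w) + inv_dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)]
    top = max(logs)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

An ML fit alternates this E-step with parameter updates; the mixing weights update as averaged responsibilities, while the α vectors have no closed-form update, which is why an iterative method such as Newton-Raphson is needed.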