Similar Articles
20 similar articles found (search time: 0 ms)
1.
Much of the work on statistical machine translation (SMT) from morphologically rich languages has shown that morphological tokenization and orthographic normalization help improve SMT quality because of the sparsity reduction they contribute. In this article, we study the effect of these processes on SMT when translating into a morphologically rich language, namely Arabic. We explore a space of tokenization schemes and normalization options. We also examine a set of six detokenization techniques and evaluate on detokenized and orthographically correct (enriched) output. Our results show that the best performing tokenization scheme is that of the Penn Arabic Treebank. Additionally, training on orthographically normalized (reduced) text then jointly enriching and detokenizing the output outperforms training on enriched text.
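The sparsity-reduction effect of tokenization, and the detokenization step, can be illustrated with a toy scheme on transliterated text. This is a hypothetical single-letter proclitic splitter for illustration only, not the Penn Arabic Treebank scheme used in the article:

```python
def tokenize(word, prefixes=("w", "b", "l")):
    """Split one hypothetical single-letter proclitic off a word (toy scheme)."""
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p) + 1:
            return [p + "+", word[len(p):]]
    return [word]

def detokenize(tokens):
    """Re-attach '+'-marked proclitics to the following token."""
    out, buf = [], ""
    for t in tokens:
        if t.endswith("+"):
            buf += t[:-1]
        else:
            out.append(buf + t)
            buf = ""
    return " ".join(out)

# toy transliterated corpus: clitic splitting shrinks the vocabulary
corpus = "wktb ktb wqal qal bqal".split()
raw_vocab = set(corpus)                                # 5 surface forms
tok_vocab = {t for w in corpus for t in tokenize(w)}   # 4 tokens after splitting
```

Here five surface forms reduce to four token types, which is the sparsity reduction the abstract refers to; detokenization inverts the split on the output side.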

2.
Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb–Subject–Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data, and to collect statistics about verb movements. From this analysis we build specific verb reordering lattices on the test sentences before decoding, and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. The application of our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic–English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) measured with the Kendall Reordering Score.
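The Kendall Reordering Score itself is defined over word alignments; as a rough sketch, a Kendall-tau-style score comparing two orderings of the same (assumed unique) tokens might look like:

```python
from itertools import combinations

def kendall_reordering_score(ref, hyp):
    """1 minus the normalized count of discordant position pairs between a
    reference word order and a hypothesis word order (unique tokens assumed)."""
    pos = {w: i for i, w in enumerate(hyp)}
    seq = [pos[w] for w in ref]                      # reference order in hyp positions
    pairs = list(combinations(range(len(seq)), 2))
    discordant = sum(seq[i] > seq[j] for i, j in pairs)
    return 1.0 - discordant / len(pairs)
```

An identical ordering scores 1.0, a fully reversed one 0.0, so a +0.85% gain means measurably fewer crossed word pairs relative to the reference order.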

3.
We discuss the possibility of using multiple shift–invert Lanczos and contour integral based spectral projection method to compute a relatively large number of eigenvalues of a large sparse and symmetric matrix on distributed memory parallel computers. The key to achieving high parallel efficiency in this type of computation is to divide the spectrum into several intervals in a way that leads to optimal use of computational resources. We discuss strategies for dividing the spectrum. Our strategies make use of an eigenvalue distribution profile that can be estimated through inertial counts and cubic spline fitting. Parallel sparse direct methods are used in both approaches. We use a simple cost model that describes the cost of computing k eigenvalues within a single interval in terms of the asymptotic cost of sparse matrix factorization and triangular substitutions. Several computational experiments are performed to demonstrate the effect of different spectrum division strategies on the overall performance of both multiple shift–invert Lanczos and the contour integral based method. We also show the parallel scalability of both approaches in the strong and weak scaling sense. In addition, we compare the performance of multiple shift–invert Lanczos and the contour integral based spectral projection method on a set of problems from density functional theory (DFT).
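The basic spectrum-division idea can be sketched as follows: given an estimated eigenvalue distribution profile (here replaced by the exact eigenvalues of a small random symmetric matrix, standing in for the inertia-count-plus-spline estimate), place interval endpoints at quantiles so each slice carries a similar number of eigenvalues:

```python
import numpy as np

def slice_spectrum(profile, n_intervals):
    """Choose interval endpoints at quantiles of the estimated eigenvalue
    distribution so each slice holds roughly the same number of eigenvalues."""
    edges = np.quantile(profile, np.linspace(0.0, 1.0, n_intervals + 1))
    return list(zip(edges[:-1], edges[1:]))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2                        # small symmetric test matrix
profile = np.linalg.eigvalsh(A)          # stand-in for the estimated profile
slices = slice_spectrum(profile, 4)
counts = [int(np.sum((lo <= profile) & (profile <= hi))) for lo, hi in slices]
```

Balanced counts are the goal because each slice is handled by an independent group of processes; the paper's cost model refines this by also weighting the factorization and triangular-solve costs per interval.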

4.

Anomaly detection (AD) has been one of the most active topics in hyperspectral imagery (HSI) over the past decade. The goal of AD is to label as targets those pixels that differ significantly from their neighbours spectrally or spatially. In this paper, we propose a method that uses both the spectral and spatial information of HSI, inspired by the human visual system (HVS). Mimicking the functionality of the retina and the visual cortex, multiscale multiresolution analysis is applied to several principal components of the hyperspectral data to extract features from different spatial levels of the image. The global and local relations between features are then modelled on the visual attention mechanism and the inferotemporal (IT) part of the visual cortex. The effect of the attention mechanism is implemented using a logarithmic function, which highlights small variations in pixel grey levels in the global features, while a maximum operation over the local features imitates the function of IT. Finally, an information-theoretic weighting of the global and local detection maps produces the final anomaly map. The proposed method is compared with state-of-the-art methods such as SSRAD, FLD, PCA, RX, KPCA, and AED on two well-known real hyperspectral datasets, the San Diego airport and Pavia city scenes, and on a synthetic hyperspectral dataset. The results demonstrate that the proposed method effectively improves AD performance, increasing the detection rate while reducing the false alarm rate and the computational complexity.
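The HVS-based pipeline itself is too involved to reproduce here, but the classic global RX detector the method is compared against is simple to sketch: score each pixel by the Mahalanobis distance of its spectrum to the scene's background statistics (synthetic data below, for illustration):

```python
import numpy as np

def rx_detector(cube):
    """Global RX anomaly detector: Mahalanobis distance of every pixel's
    spectrum to the scene mean and covariance."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(b)   # light regularization
    d = X - mu
    scores = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
    return scores.reshape(h, w)

rng = np.random.default_rng(1)
cube = rng.normal(0.0, 1.0, (20, 20, 5))   # synthetic 5-band background
cube[10, 10] += 8.0                        # implanted anomalous pixel
scores = rx_detector(cube)
```

The implanted pixel dominates the score map; the paper's contribution is to improve on exactly this kind of purely spectral, globally modelled baseline by adding multiscale spatial and attention-driven cues.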

5.
A computer program based on a molecular dynamics–continuum hybrid method has been developed in which the Navier–Stokes equations are solved in the continuum region and molecular dynamics is applied in the atomistic region. The coupling between the atomistic and continuum regions is constructed through constrained dynamics within an overlap region where both the molecular and continuum equations are solved simultaneously. The simulation geometries are solved in three dimensions, and an overlap region is introduced in two directions so that the molecular region can be confined to smaller areas. The proposed method is used to simulate steady and start-up Couette flow, showing quantitative agreement with analytical solutions and full molecular dynamics simulations. The resulting algorithm and computer code are capable of modeling fluid flows in micro- and nano-scale geometries.
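For the start-up Couette validation case, the analytical solution such hybrid results are compared against is a standard Fourier series. The sketch below assumes an impulsively started bottom wall and illustrative values of U, h, and ν (not taken from the paper):

```python
import numpy as np

def couette_startup(y, t, U=1.0, h=1.0, nu=1.0, n_terms=200):
    """Series solution for start-up Couette flow: the bottom wall (y = 0) is
    impulsively set to speed U, the top wall (y = h) is held fixed."""
    y = np.asarray(y, dtype=float)
    steady = U * (1.0 - y / h)                       # long-time linear profile
    n = np.arange(1, n_terms + 1)[:, None]
    transient = (2.0 * U / np.pi) * np.sum(
        np.sin(n * np.pi * y / h) / n * np.exp(-((n * np.pi / h) ** 2) * nu * t),
        axis=0,
    )
    return steady - transient

y = np.linspace(0.0, 1.0, 11)
u_late = couette_startup(y, t=2.0)   # long after start-up: linear profile
```

At large t the transient decays and the profile relaxes to the linear steady Couette solution, which is the quantitative benchmark for both the hybrid and full-MD results.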

6.
Identifying learners’ behaviors and learning preferences or styles in a Web-based learning environment is crucial for organizing the tracking and specifying how and when assistance is needed. Moreover, it helps online course designers to adapt the learning material in a way that guarantees individualized learning, and helps learners to acquire meta-cognitive knowledge. The goal of this research is to identify learners’ behaviors and learning styles automatically during training sessions, based on trace analysis. In this paper, we focus on the identification of learners’ behaviors through our system: Indicators for the Deduction of Learning Styles. We shall first present our trace analysis approach. Then, we shall propose a ‘navigation type’ indicator to analyze learners’ behaviors and we shall define a method for calculating it. To this end, we shall build a decision tree based on semantic assumptions and tests. To validate our approach, and improve the proposed calculation method, we shall present and discuss the results of two experiments that we conducted.

7.
Gears are among the most common and important components in rotary machinery transmissions, and vibration monitoring is the usual approach to gear feature extraction and fault diagnosis. However, gear vibration signals collected at run time are often non-Gaussian and nonlinear, which makes pure time-domain or frequency-domain analysis difficult. This paper proposes a novel gear fault feature extraction method based on hybrid time–frequency analysis, combining, for the first time, Mexican hat wavelet filter de-noising with the auto-term window method. The method not only suppresses noise in the raw vibration signal but also extracts gear fault features effectively. Experimental analysis confirms the feasibility and effectiveness of the new method.
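A minimal sketch of the Mexican hat (Ricker) wavelet de-noising step, assuming a plain convolution filter and an illustrative sinusoidal "gear tone" buried in heavy white noise (the auto-term window step is omitted, and the kernel width is an assumption):

```python
import numpy as np

def ricker(points, a):
    """Mexican hat (Ricker) wavelet: negative second derivative of a Gaussian."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def mexican_hat_filter(signal, width):
    """Band-pass the signal by convolving with a Ricker wavelet kernel."""
    kernel = ricker(8 * width + 1, width)
    return np.convolve(signal, kernel, mode="same")

t = np.linspace(0.0, 1.0, 1000)
clean = np.sin(2.0 * np.pi * 25.0 * t)        # illustrative "meshing tone"
rng = np.random.default_rng(2)
noisy = clean + rng.normal(0.0, 1.0, t.size)  # heavy white noise
filtered = mexican_hat_filter(noisy, width=6)
```

Because the Ricker kernel is a zero-phase band-pass filter, white noise outside its passband is attenuated while the tonal component near its centre frequency survives, which is what makes it a reasonable de-noising front end for the feature extraction stage.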

8.
Ecological restoration measures have been undertaken in loess hilly and gully regions since the 1970s to prevent soil loss and to improve the ecological environment in those regions. Orchard construction was the main ecological measure undertaken in the Luo-Yu-Gou watershed, and in this article we propose a coupled maximum a posteriori decision rule and Markov random field (MAP-MRF) framework for orchard identification based on landform and landscape factors. Support vector machine (SVM) classification was first performed to obtain initial classification results for the years 2003 and 2008. A series of factors, including a landform factor, a landscape factor, and a spatial–temporal neighbourhood factor, is used to obtain land-cover change information, including change in the orchard class. Finally, field experiments were carried out in the case study region of the Luo-Yu-Gou watershed; based on the experimental results, the quantity error and allocation error of the 2008 classification results were 0.0441 and 0.1037, respectively.

9.
This paper proposes a novel approach for identification of Takagi–Sugeno (T–S) fuzzy models based on a new fuzzy c-regression model (FCRM) clustering algorithm. Because the clustering prototype in FCRM is a hyperplane rather than a point, FCRM clustering is particularly well suited to identifying the premise parameters of a T–S fuzzy model. The new FCRM clustering algorithm (NFCRMA) is derived from the FCRM clustering objective function via the Lagrange multiplier rule and has a concise, unified structure. The proposed approach consists of two main steps: premise parameter identification and consequent parameter identification. NFCRMA is used to partition the input–output data and identify the premise parameters, which reveals the real structure of the training data; orthogonal least squares is then used to identify the consequent parameters. Finally, several examples verify the validity of the proposed modeling approach, and the results show that the new approach is efficient and highly accurate.
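A compact sketch of FCRM-style clustering: alternate membership-weighted least-squares line fits with membership updates driven by squared regression residuals. This is the generic FCRM iteration, not NFCRMA itself, and it is initialized here from a known split to keep the demo deterministic (the OLS consequent step is omitted):

```python
import numpy as np

def fcrm(X, y, U, m=2.0, iters=20):
    """Fuzzy c-regression: alternate membership-weighted least-squares fits
    with membership updates driven by squared regression residuals."""
    n, c = U.shape
    Phi = np.column_stack([X, np.ones(n)])        # affine regressors [x, 1]
    for _ in range(iters):
        thetas = []
        for k in range(c):
            W = U[:, k] ** m                      # fuzzified weights
            A = Phi * W[:, None]
            thetas.append(np.linalg.solve(A.T @ Phi, A.T @ y))
        E = np.column_stack([(y - Phi @ th) ** 2 for th in thetas]) + 1e-12
        U = E ** (-1.0 / (m - 1.0))               # standard FCM-style update
        U /= U.sum(axis=1, keepdims=True)
    return thetas, U

# two noiseless lines: y = 2x on the first half, y = -x + 3 on the second
x = np.linspace(0.0, 1.0, 40)
y = np.concatenate([2.0 * x[:20], -x[20:] + 3.0])
U0 = np.zeros((40, 2))
U0[:20, 0] = 1.0                                  # deterministic init from a
U0[20:, 1] = 1.0                                  # known split (demo only)
thetas, U = fcrm(x[:, None], y, U0)
```

Each cluster converges to one of the generating lines, which illustrates why hyperplane prototypes match T–S premise identification: every cluster is itself a local linear model.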

10.
Pre-processing is one of the vital steps in developing a robust and efficient recognition system. Better pre-processing not only aids data selection but also significantly reduces computational complexity, and an efficient frame selection technique can improve the overall performance of the system. Pre-quantization (PQ) is the technique of selecting a reduced number of frames at the pre-processing stage so as to lighten the computational burden in the later stages of speaker identification (SI). In this paper, we develop PQ techniques based on spectral entropy and spectral shape to pick suitable frames containing speaker-specific information, which varies from frame to frame depending on the spoken text and environmental conditions. The idea is to exploit the statistical properties of the distributions of speech frames at the pre-processing stage of speaker recognition. Our aim is not only to reduce the frame rate but also to keep identification accuracy reasonably high. We also analyze the robustness of the proposed techniques on noisy utterances. To establish the efficacy of the proposed methods, we use two different databases: POLYCOST (telephone speech) and YOHO (microphone speech).
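A spectral-entropy-based PQ step can be sketched as follows. The selection rule here, keeping the lowest-entropy (most harmonic) frames, is an assumption for illustration; the paper's actual criterion and its spectral-shape variant may differ:

```python
import numpy as np

def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized magnitude spectrum."""
    spec = np.abs(np.fft.rfft(frame)) + 1e-12
    p = spec / spec.sum()
    return float(-np.sum(p * np.log2(p)))

def prequantize(frames, keep=0.5):
    """Keep the fraction of frames with the lowest spectral entropy
    (the most tonal / structured frames under this toy criterion)."""
    ents = np.array([spectral_entropy(f) for f in frames])
    k = max(1, int(keep * len(frames)))
    return np.argsort(ents)[:k]

n = np.arange(256)
rng = np.random.default_rng(3)
frames = [np.sin(2.0 * np.pi * 16.0 * n / 256.0) for _ in range(5)]  # tonal
frames += [rng.normal(size=256) for _ in range(5)]                   # noise-like
selected = prequantize(frames, keep=0.5)
```

Tonal frames concentrate their energy in a few spectral bins and so score low entropy, while noise-like frames spread energy across the spectrum; ranking by entropy therefore separates the two groups cleanly.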

11.
In this paper, by analyzing the worm propagation model, we propose a new worm warning system based on system identification, using the recursive least squares algorithm to estimate the worm's infection rate. Simulation results show that the adopted method is an efficient way to provide early warning of Internet worms.
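As an illustration, assume a simple early-stage propagation model I[k+1] = I[k] + β·I[k]·Δt (a discretized exponential-growth model, not necessarily the paper's); the infection rate β is then linear in the regressor I[k]·Δt and can be estimated online by scalar recursive least squares:

```python
import numpy as np

def rls_scalar(xs, ys, lam=1.0):
    """Recursive least squares for a single parameter in y = theta * x."""
    theta, P = 0.0, 1e6                       # large P: uninformative prior
    for x, y in zip(xs, ys):
        K = P * x / (lam + x * P * x)         # gain
        theta = theta + K * (y - x * theta)   # innovation update
        P = (P - K * x * P) / lam             # covariance update
    return theta

# simulated early-stage spread: I[k+1] = I[k] + beta * I[k] * dt (+ small noise)
beta_true, dt = 0.8, 0.1
rng = np.random.default_rng(4)
I = [10.0]
for _ in range(100):
    I.append(I[-1] + beta_true * I[-1] * dt + rng.normal(0.0, 0.01))
I = np.asarray(I)
beta_hat = rls_scalar(I[:-1] * dt, np.diff(I))
```

The recursive form updates the estimate at every new measurement, which is what makes RLS suitable for an online warning system: an anomalously high estimated β can trigger an alert before the worm saturates.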

12.
This paper provides a novel and efficient method for extracting exact textual answers from the documents retrieved by a traditional IR system over a large-scale text collection. Its main intended contribution is the System Similarity Model (SSM), which can be considered an extension of the vector space model (VSM) for ranking passages. The paper also presents a formalized answer extraction method based on pattern learning, and applies a binary logistic regression model (LRM), seldom used in information extraction, to extract specific information from candidate data sets. Because parameter estimation suffers from severe data sparsity, we adopt a stratified sampling method and improve the traditional parameter estimation procedure for the logistic regression model. A series of experimental results shows that the overall performance of our system is good and that our approach is effective. Our system, lnsun05QAl, obtained excellent results in the QA track of TREC 2005.
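A minimal sketch of the binary logistic regression step on hypothetical candidate-answer features (the feature set, the SSM ranking, and the stratified-sampling improvements are all omitted or invented here):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression trained by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient of the log-loss
    return w

# hypothetical candidate-answer features: [pattern-match score, bias term]
X = np.array([[0.9, 1.0], [0.8, 1.0], [0.2, 1.0], [0.1, 1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])            # 1 = correct answer candidate
w = fit_logistic(X, y)
probs = 1.0 / (1.0 + np.exp(-X @ w))
```

Candidates whose predicted probability exceeds a threshold would be accepted as answers; the paper's modification addresses estimating w reliably when positive examples are sparse.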

13.
This paper implements an artificial neural network (ANN) on a field programmable gate array (FPGA) chip for speaker-independent Mandarin speech measurement and recognition. A three-layer hybrid learning algorithm (HLA), which combines a genetic algorithm (GA) with the steepest descent method, is proposed to achieve a faster global search for the optimal weights in the ANN. Several other popular evolutionary algorithms, such as differential evolution, particle swarm optimization, and an improved GA, are compared with the proposed HLA, which is shown to outperform them. Finally, the designed system is implemented on an FPGA chip with a system-on-chip (SoC) architecture to measure and recognize speech signals.
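The GA-plus-steepest-descent hybrid can be sketched on a toy objective: each generation applies a few gradient steps to every individual before selection and blend crossover. The hyper-parameters and operators below are illustrative assumptions, not those of the paper's HLA:

```python
import numpy as np

def hybrid_ga_sd(f, grad, dim=2, pop=20, gens=30, sd_steps=5, lr=0.1, seed=5):
    """GA global search with a few steepest-descent steps applied to every
    individual each generation (Lamarckian local refinement)."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(-5.0, 5.0, (pop, dim))
    for _ in range(gens):
        for _ in range(sd_steps):                 # local steepest descent
            P = P - lr * np.array([grad(x) for x in P])
        order = np.argsort([f(x) for x in P])
        elite = P[order[: pop // 2]]              # selection: keep best half
        pairs = elite[rng.integers(0, len(elite), (pop - len(elite), 2))]
        children = pairs.mean(axis=1)             # blend crossover
        children += rng.normal(0.0, 0.1, children.shape)  # mutation
        P = np.vstack([elite, children])
    return P[int(np.argmin([f(x) for x in P]))]

f = lambda x: float(np.sum((x - 1.0) ** 2))       # toy objective, optimum at 1
grad = lambda x: 2.0 * (x - 1.0)
best = hybrid_ga_sd(f, grad)
```

The GA part keeps the population spread out to escape poor basins, while the gradient steps speed up convergence inside each basin, which is the rationale for hybridizing the two in weight training.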

14.
Incomplete data are often encountered in data sets used in clustering problems, and inappropriate treatment of incomplete data can significantly degrade the clustering performance. In view of the uncertainty of missing attributes, we put forward an interval representation of missing attributes based on nearest-neighbor information, named nearest-neighbor interval, and a hybrid approach utilizing genetic algorithm and fuzzy c-means is presented for incomplete data clustering. The overall algorithm is within the genetic algorithm framework, which searches for appropriate imputations of missing attributes in corresponding nearest-neighbor intervals to recover the incomplete data set, and hybridizes fuzzy c-means to perform clustering analysis and provide fitness metric for genetic optimization simultaneously. Several experimental results on a set of real-life data sets are presented to demonstrate the better clustering performance of our hybrid approach over the compared methods.
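The nearest-neighbor interval idea can be sketched directly: for a missing attribute, take the [min, max] of that attribute over the q nearest complete samples, with distance measured only on the attributes the incomplete sample actually has (the GA search within the interval and the FCM step are omitted; q is an assumed parameter):

```python
import numpy as np

def nn_interval(X, row, col, q=3):
    """Nearest-neighbor interval [min, max] for missing attribute `col` of
    sample `row`, taken over the q nearest complete samples; distance is
    computed on the observed attributes only."""
    obs = [j for j in range(X.shape[1]) if j != col]
    complete = X[~np.isnan(X).any(axis=1)]
    d = np.linalg.norm(complete[:, obs] - X[row, obs], axis=1)
    nn = complete[np.argsort(d)[:q], col]
    return float(nn.min()), float(nn.max())

X = np.array([
    [1.0, 1.1], [1.2, 0.9], [1.1, 1.0],   # cluster near (1, 1)
    [5.0, 5.2], [5.1, 4.9],               # cluster near (5, 5)
    [1.05, np.nan],                       # incomplete sample
])
lo, hi = nn_interval(X, row=5, col=1)
```

The interval confines the GA's imputation search to the locally plausible range rather than the attribute's full span, which is what keeps imputations consistent with the cluster structure.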

15.
Remote sensing image fusion is considered a cost-effective method for handling the tradeoff between the spatial, temporal and spectral resolutions of current satellite systems. However, most current fusion methods concentrate on fusing images across only two of the spatial, temporal and spectral domains, and few efforts have been made to comprehensively explore the relationships of spatio-temporal–spectral features. In this study, we propose a novel integrated spatio-temporal–spectral fusion framework based on semicoupled sparse tensor factorization to generate synthesized frequent high-spectral and high-spatial resolution images by blending multisource observations. Specifically, the proposed method regards the desired high spatio-temporal–spectral resolution images as a four-dimensional tensor and formulates the integrated fusion problem as the estimation of the core tensor and the dictionary along each mode. The high spectral correlation across the spectral domain and the high self-similarity (redundancy) in the spatial and temporal domains are jointly exploited using the low-dimensional and sparse core tensors. In addition, assuming that the sparse coefficients in the core tensors across the observed and desired image spaces are not strictly the same, we formulate the estimation of the core tensor and the dictionaries as a semicoupled sparse tensor factorization of the available heterogeneous spatial, spectral and temporal remote sensing observations. Finally, the proposed method can exploit the multicomplementary spatial, temporal and spectral information of any combination of remote sensing data within this single unified model. Experiments on multiple data types, including spatio-spectral, spatio-temporal, and spatio-temporal–spectral data fusion, demonstrate the effectiveness and efficiency of the proposed method.

16.
Wavelet based non-parametric additive NARX models are proposed for nonlinear input–output system identification. By expanding each functional component of the non-parametric NARX model into wavelet multiresolution expansions, the non-parametric estimation problem becomes a linear-in-the-parameters problem, and least-squares-based methods such as the orthogonal forward regression (OFR) approach, coupled with model size determination criteria, can be used to select the model terms and estimate the parameters. Wavelet based additive models, combined with model order determination and variable selection approaches, are capable of handling problems of high dimensionality.
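True OFR orthogonalizes the candidate regressors incrementally; as a simplified stand-in, greedy forward selection of model terms by residual reduction captures the same term-selection idea (the wavelet expansion itself is replaced by random candidate columns here):

```python
import numpy as np

def forward_regression(Phi, y, n_terms):
    """Greedy forward selection: repeatedly add the candidate column that
    gives the largest drop in residual sum of squares."""
    selected = []
    for _ in range(n_terms):
        best_j, best_err = None, np.inf
        for j in range(Phi.shape[1]):
            if j in selected:
                continue
            cols = Phi[:, selected + [j]]
            theta = np.linalg.lstsq(cols, y, rcond=None)[0]
            err = float(np.sum((y - cols @ theta) ** 2))
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

rng = np.random.default_rng(6)
Phi = rng.standard_normal((200, 8))            # stand-in candidate model terms
y = 3.0 * Phi[:, 2] - 2.0 * Phi[:, 5] + 0.01 * rng.standard_normal(200)
terms = forward_regression(Phi, y, 2)
```

Only the two truly active terms are selected; in the wavelet-NARX setting the candidate columns are multiresolution basis functions and a model-size criterion (e.g. an information criterion) decides when to stop adding terms.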

17.
In this study, an improved blind image identification algorithm based on inconsistency in light source direction is proposed, together with a new method, termed the “neighborhood method,” for calculating the surface normal matrix of an image within the blind identification algorithm. For an image, there is an error function between its actual light intensity and the calculated light intensity, and different light source models impose different constraint functions on the light. The light source direction we seek is the one that minimizes both the error function and the constraint function. On this basis, according to the error function and the corresponding constraint function, a search procedure and the Hestenes–Powell multiplier method are used in the improved algorithm to calculate the light source direction for local and infinite light source images, respectively. The authenticity of an image can then be determined from the inconsistency in light source direction across different areas of the image. Experimental results show that the light source direction of different areas in an image can be calculated accurately, and image tampering can therefore be detected effectively by the improved algorithm. Moreover, the improved blind identification algorithm is superior to the existing one in terms of detection rate and time complexity.

18.
This paper uses an estimated noise transfer function to filter the input–output data and presents a filtering-based recursive least squares algorithm (F-RLS) for controlled autoregressive autoregressive moving average (CARARMA) systems. Through data filtering, we obtain two identification models, one containing the parameters of the system model and the other containing the parameters of the noise model. The recursive least squares method can thus be used to estimate the parameters of these two identification models separately, replacing the unmeasurable variables in the information vectors with their estimates. The proposed F-RLS algorithm has high computational efficiency because the dimensions of its covariance matrices become small, and it generates more accurate parameter estimates than other existing algorithms.

19.
Parallel Computing, 2014, 40(5–6): 70–85
QR factorization is a computational kernel of scientific computing. How can the latest computers be used to accelerate this task? We investigate this topic by proposing a dense QR factorization algorithm with adaptive block sizes on a hybrid system that contains a central processing unit (CPU) and a graphics processing unit (GPU). To maximize the use of the CPU and GPU, we develop an adaptive scheme that chooses the block size at each iteration. The decision is based on statistical surrogate models of performance and an online monitor, which avoids unexpected occasional performance drops. We modify the highly optimized CPU–GPU based QR factorization in MAGMA to implement the proposed schemes. Numerical results suggest that our approaches are efficient and can lead to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky factorizations.
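The adaptive block-size choice can be sketched with a statistical surrogate: fit a simple model to observed runtimes and pick the candidate block size the model predicts to be fastest. The quadratic form of the surrogate and the timings below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def best_block_size(sizes, times, candidates):
    """Fit a quadratic surrogate t(b) to measured runtimes and return the
    candidate block size the surrogate predicts to be fastest."""
    coeffs = np.polyfit(sizes, times, 2)
    return int(candidates[np.argmin(np.polyval(coeffs, candidates))])

# hypothetical per-iteration timings with a sweet spot near b = 128
sizes = np.array([32.0, 64.0, 96.0, 160.0, 224.0, 256.0])
times = 0.0005 * (sizes - 128.0) ** 2 + 10.0
b = best_block_size(sizes, times, np.arange(32, 257, 32))
```

Note the surrogate interpolates to a block size (128) that was never measured directly; the paper's online monitor additionally guards against the surrogate being misled by occasional timing outliers.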

20.
This paper presents an in-depth analytical and empirical assessment of the performance of DoubleBee, a novel hybrid aerial–ground robot. In particular, the dynamic model of the robot with ground contact is analyzed, and the unknown parameters in the model are identified. We apply an unscented Kalman filter based approach and a least squares based approach to estimate the parameters from the measurements and inputs at every time step. Real data are collected and used to estimate the parameters; test data verify that the values obtained model the rotation of the robot accurately. A gain-scheduled feedback controller is proposed that leverages the identified model to generate accurate control inputs to drive the system to the desired states, and the system is proven to track a constant-velocity reference signal with bounded error. Simulations and real-world experiments show that the proposed controller outperforms a PID-based controller in tracking step commands and maintaining attitude during robot movement.
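The least-squares identification step can be sketched on a hypothetical first-order rotational model ω[k+1] = a·ω[k] + c·τ[k], which stands in for (and is much simpler than) DoubleBee's actual contact dynamics:

```python
import numpy as np

# simulate a first-order rotational model w[k+1] = a*w[k] + c*tau[k] (+ noise)
a_true, c_true = 0.95, 0.2
rng = np.random.default_rng(7)
tau = rng.uniform(-1.0, 1.0, 200)             # exciting torque inputs
w = np.zeros(201)
for k in range(200):
    w[k + 1] = a_true * w[k] + c_true * tau[k] + rng.normal(0.0, 1e-3)

# batch least squares on the regressor matrix [w[k], tau[k]]
Phi = np.column_stack([w[:-1], tau])
a_hat, c_hat = np.linalg.lstsq(Phi, w[1:], rcond=None)[0]
```

With persistently exciting inputs the two parameters are recovered accurately; the identified model can then be plugged into a model-based (here, gain-scheduled) controller in place of hand-tuned PID gains.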


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)