Similar literature
20 similar documents found
1.
To address the problem that Chinese vector font libraries are large and therefore inconvenient to use on embedded devices, a new automatic compression method for vector Chinese fonts is proposed. Based on the idea of component composition and reuse, the method first splits the glyphs in the font into components using a traditional computer-graphics technique, then computes the component-reuse relations of each glyph, and finally uses simulated annealing to iteratively optimize the composed glyphs and generate the compressed font. Experimental results show that, while preserving the original font style and glyph shapes, the method can...
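The optimization step above relies on simulated annealing. As an illustrative sketch only (the cost function here is a toy scalar objective, not the paper's glyph-assembly objective; `simulated_annealing`, `cost` and `neighbor` are our own names), the generic accept/cool loop looks like this:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.995, steps=2000, seed=0):
    """Generic simulated-annealing loop: accept worse moves with
    probability exp(-delta/T), and cool the temperature each step."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = cost(y)
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling
    return best, fbest

# Toy stand-in for the glyph-assembly cost: minimize (x - 3)^2.
best, fbest = simulated_annealing(lambda v: (v - 3.0) ** 2,
                                  lambda v, r: v + r.uniform(-0.5, 0.5),
                                  x0=0.0)
```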

2.
Based on a pattern recognition study of human chromosomes a model is proposed for selection of attributes for automatic pattern recognition. The basic concept is a stepwise data compression, each step being monitored for loss of significant information by visual classification of the compressed pattern. The attributes resulting from a number of data compression steps will retain the information allowing the patterns to be classified automatically.

3.
Incremental learning techniques have been used extensively to address the data stream classification problem. The most important issue is to maintain a balance between accuracy and efficiency, i.e., the algorithm should provide good classification performance with a reasonable time response. This work introduces a new technique, named Similarity-based Data Stream Classifier (SimC), which achieves good performance by introducing a novel insertion/removal policy that adapts quickly to the data tendency and maintains a representative, small set of examples and estimators that guarantees good classification rates. The method is also able to detect novel classes/labels during the running phase and to remove useless ones that do not add any value to the classification process. Statistical tests were used to evaluate the model performance from two points of view: efficacy (classification rate) and efficiency (online response time). Five well-known techniques and sixteen data streams were compared using Friedman's test, and the Nemenyi, Holm, and Shaffer post-hoc tests were applied to find out which schemes differed significantly. The results show that SimC is very competitive in terms of (absolute and streaming) accuracy and classification/updating time, in comparison to several of the most popular methods in the literature.
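SimC's exact insertion/removal policy is not given in the abstract; a much-simplified stand-in, keeping only a bounded window of recent labelled examples and predicting with the nearest neighbour, conveys the instance-based streaming idea (the class `WindowNN` and its parameters are our own invention, not the SimC algorithm):

```python
from collections import deque

class WindowNN:
    """Minimal nearest-neighbour stream classifier: keep a bounded
    window of recent labelled examples; predict with the closest one."""
    def __init__(self, capacity=50):
        self.window = deque(maxlen=capacity)  # oldest examples drop out automatically

    def predict(self, x):
        if not self.window:
            return None
        # Return the label of the example with smallest squared distance to x.
        return min(self.window,
                   key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))[1]

    def learn(self, x, y):
        self.window.append((x, y))

# Simulate a drifting stream: the label flips from 0 to 1 halfway through.
clf = WindowNN(capacity=10)
for i in range(20):
    clf.learn((float(i), 0.0), 0 if i < 10 else 1)
pred = clf.predict((19.0, 0.0))
```

Because the window is bounded, the classifier forgets the pre-drift examples and tracks the current concept.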

4.
One of the simplest, and yet most consistently well-performing, classifiers is the naïve Bayes model (a special class of Bayesian network models). However, these models rely on the (naïve) assumption that all the attributes used to describe an instance are conditionally independent given the class of that instance. To relax this independence assumption, we have in previous work proposed a family of models, called latent classification models (LCMs). LCMs are defined for continuous domains and generalize the naïve Bayes model by using latent variables to model class-conditional dependencies between the attributes. In addition to providing good classification accuracy, the LCM has several appealing properties, including a relatively small parameter space making it less susceptible to over-fitting. In this paper we take a first step towards generalizing LCMs to hybrid domains by proposing an LCM for domains with binary attributes. We present algorithms for learning the proposed model, and we describe a variational approximation-based inference procedure. Finally, we empirically compare the accuracy of the proposed model to that of other classifiers for a number of different domains, including the problem of recognizing symbols in black and white images.

5.
LI Tianzheng, WANG Chuntao. Journal of Computer Applications (计算机应用), 2020, 40(5): 1354-1363
Although many compression methods exist for binary images, they cannot be applied directly to the compression of encrypted binary images. In scenarios such as cloud computing and distributed processing, efficiently performing lossy compression of encrypted binary images remains a challenge, and little work has addressed it. To tackle this problem, a lossy compression algorithm for encrypted binary images based on Markov random fields (MRF) is proposed. The algorithm characterizes the spatial statistics of a binary image with an MRF, and uses the MRF together with the decompressed, restored pixels to infer the pixels discarded during compression of the encrypted image. The sender encrypts the binary image with a stream cipher; the cloud compresses the encrypted image by first downsampling it uniformly across blocks but randomly within each block, and then applying low-density parity-check (LDPC) coding; the receiver reconstructs the binary image lossily by building a joint factor graph that combines decoding, decryption, and MRF-based reconstruction. Experimental results show that the algorithm achieves good compression efficiency: at compression rates of 0.2-0.4 bpp, the bit error rate (BER) of the lossily reconstructed image does not exceed 5%, and the compression efficiency is comparable to that of JBIG2, the international compression standard, applied to the original unencrypted binary image. These results demonstrate the feasibility and effectiveness of the proposed algorithm.
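The sender- and cloud-side steps (stream-cipher encryption, then block-uniform but within-block-random downsampling) can be sketched as follows. This is a toy illustration only: the keystream below is a seeded PRNG rather than a real stream cipher, and the LDPC coding and MRF reconstruction stages are omitted:

```python
import random

def xor_stream_encrypt(bits, key_seed):
    """Stream-cipher step: XOR each pixel bit with a keystream bit.
    (Toy keystream; a real system would use a cryptographic cipher.)"""
    rng = random.Random(key_seed)
    return [b ^ rng.getrandbits(1) for b in bits]

def block_downsample(bits, width, block=4, keep=4, seed=1):
    """Compression step: within every block x block tile keep `keep`
    randomly chosen pixels (uniform across blocks, random inside each);
    the discarded pixels are later inferred via the MRF prior."""
    rng = random.Random(seed)
    height = len(bits) // width
    kept = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            coords = [(y, x) for y in range(by, by + block)
                             for x in range(bx, bx + block)]
            for (y, x) in rng.sample(coords, keep):
                kept.append(((y, x), bits[y * width + x]))
    return kept

image = [0, 1] * 32                       # an 8x8 binary image
cipher = xor_stream_encrypt(image, key_seed=42)
kept = block_downsample(cipher, width=8)  # 4 pixels kept per 4x4 block -> 0.25 bpp
```

Keeping 4 of every 16 pixels corresponds to the 0.2-0.4 bpp range quoted in the abstract; XORing again with the same keystream recovers the plaintext bits.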

6.
In the era of big data, column-store databases are used in ever more scenarios, driving research progress in related areas. Existing compression strategies for column stores suffer from highly dispersed data, fine-grained classification, and flaws in the accompanying classification algorithms, which lead to high learning cost and hard-to-guarantee compression efficiency. To address this, a sort-based hybrid column/region compression strategy is proposed. First, a sorting method designed around HBase's characteristics orders the data in each column to increase data locality; then, according to the data characteristics, a mixed-level region compression strategy and a mixed-level column compression strategy are used to recommend a compression scheme. Compared with previous strategies on the TPC-DS benchmark data set, the experimental results show that the proposed method performs well in both compression ratio and compression/decompression time, demonstrating its effectiveness.
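The benefit of the sorting step is easy to demonstrate: sorting a low-cardinality column groups identical values into long runs, which a general-purpose compressor exploits. A minimal sketch (zlib standing in for HBase's block compressors; the column values are invented):

```python
import random
import zlib

random.seed(0)
# A column of low-cardinality values, as often found in real tables.
column = [random.choice([b"beijing", b"shanghai", b"shenzhen", b"guangzhou"])
          for _ in range(5000)]

raw = b"\n".join(column)                 # original column order
sorted_raw = b"\n".join(sorted(column))  # column after the sorting step

unsorted_size = len(zlib.compress(raw, 9))
sorted_size = len(zlib.compress(sorted_raw, 9))
```

After sorting, the compressor sees four long runs of identical strings instead of an interleaved mix, so `sorted_size` comes out markedly smaller.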

7.
This paper presents a direct command generation technique for digital motion control systems. In this paradigm, higher-order differences of a given trajectory (i.e. position) are calculated and the resulting sequence is compacted via data compression techniques. The overall method is capable of generating trajectory data at variable rates in forward and reverse directions with the utilization of a linear interpolator. As a part of the command generation scheme, the paper also proposes a new data compression technique titled ΔY10. Apart from this new method, the performances of the proposed generator employing different compression algorithms (such as Huffman coding, arithmetic coding, LZW, and run-length encoding) are also evaluated through three test cases. The paper illustrates that the ΔY10 technique, which is suitable for real-time hardware implementation, exhibits satisfactory performance in terms of the data compaction achieved in the test cases considered.
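The higher-order-difference idea can be sketched in a few lines: for a constant-acceleration trajectory, the second differences collapse to a constant run that run-length encoding compacts to a single pair. This is an illustrative reduction only, not the actual ΔY10 format:

```python
def diff(seq):
    """First-order differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

def run_length_encode(seq):
    """Collapse runs of equal values into [value, count] pairs."""
    out = []
    for v in seq:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

# Constant-acceleration position trajectory: s = t^2.
position = [t * t for t in range(100)]
first = diff(position)        # velocities: 1, 3, 5, ...
second = diff(first)          # accelerations: all equal to 2
compact = run_length_encode(second)
```

One hundred position samples reduce to a single run-length pair plus two initial values, which is the kind of compaction the command generator exploits.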

8.
Proteins have complicated spatial structures, and their chemical and physical functions originate from these structures. It is important to predict the structure and function of proteins from a DNA sequence or amino acid sequence from the viewpoint of biology, medical science, protein engineering, etc. However, to date there is no way to predict them accurately from these sequences. Instead, some approaches attempt to estimate the functions based on approximate similarity in the retrieval of sequences. We propose a new method for the similarity retrieval of an amino acid sequence based on the concept of homology retrieval using data compression. The introduction of compression by a dictionary technique enables us to describe the text data as an n-dimensional vector using n dictionaries, generated by compressing n typical texts, and to classify the proteins based on their similarity. We examined the effectiveness of our proposal using real genome data. This work was presented in part at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, January 15-17, 2001.
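The dictionary-compression vector can be sketched with zlib's preset-dictionary support: compressing a query sequence against n reference dictionaries yields an n-dimensional vector of compressed sizes, smallest along the dimension whose "typical text" it resembles. The sequences below are invented toy data, not real genome data:

```python
import zlib

def compressed_size(text, zdict):
    """Size of `text` after DEFLATE compression primed with a preset
    dictionary: the better the dictionary matches, the smaller the output."""
    c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 9,
                         zlib.Z_DEFAULT_STRATEGY, zdict)
    return len(c.compress(text) + c.flush())

def similarity_vector(text, dictionaries):
    """Describe a sequence as an n-dimensional vector of compressed
    sizes under n reference dictionaries (one per protein family)."""
    return [compressed_size(text, d) for d in dictionaries]

family_a = b"MKTAYIAKQR" * 20   # toy 'typical text' for family A
family_b = b"GAVLIPFMWS" * 20   # toy 'typical text' for family B
query = b"MKTAYIAKQR" * 5       # sequence resembling family A
vec = similarity_vector(query, [family_a, family_b])
```

The query compresses smallest under the family-A dictionary, so nearest-vector classification would assign it to family A.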

9.
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving business intelligence. Databases store large volumes of structured data, and the amount of data keeps increasing with advances in database technology and the extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient techniques for database compression. This paper develops a database compression method that eliminates the redundant data which often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database; redundant data are then replaced by means of compression rules, and a heuristic method is designed to resolve conflicts among the compression rules. To prove its efficiency and effectiveness, the proposed approach is compared with two other database compression methods. Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques. S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering at the State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems. Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan in 1988.
From 1988 to 1994, he led the software group at the Institute of Aeronautics, Chung-Shan Institute of Science and Technology. He is currently an associate professor in the Department of Information Management at National Chung Hsing University, Taichung. His research areas focus on data engineering, database techniques, digital multimedia and information security. Wei-Tse Wang received the B.A. (2001) and M.B.A. (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.

10.
11.
Data classification is usually based on measurements recorded at the same time. This paper considers temporal data classification where the input is a temporal database that describes measurements over a period of time in history while the predicted class is expected to occur in the future. We describe a new temporal classification method that improves the accuracy of standard classification methods. The benefits of the method are tested on weather forecasting using the meteorological database from the Texas Commission on Environmental Quality and on influenza using the Google Flu Trends database.

12.
Practical data compression in wireless sensor networks: A survey
Power consumption is a critical problem affecting the lifetime of wireless sensor networks. A number of techniques have been proposed to solve this issue, such as energy-efficient medium access control or routing protocols. Among those proposed techniques, the data compression scheme is one that can be used to reduce transmitted data over wireless channels. This technique leads to a reduction in the required inter-node communication, which is the main power consumer in wireless sensor networks. In this article, a comprehensive review of existing data compression approaches in wireless sensor networks is provided. First, suitable sets of criteria are defined to classify existing techniques as well as to determine what practical data compression in wireless sensor networks should be. Next, the details of each classified compression category are described. Finally, their performance, open issues, limitations and suitable applications are analyzed and compared based on the criteria of practical data compression in wireless sensor networks.

13.
Identification of relevant genes from microarray data is an apparent need in many applications. Different ranking techniques with different evaluation criteria are used for such identification, and they usually assign different ranks to the same gene. As a result, different techniques identify different gene subsets, which may not be the set of significant genes. To overcome this problem, this study suggests pipelining the ranking techniques: in each stage of the pipeline a few of the lowest-ranked features are eliminated, so that a relatively good subset of features is preserved at the end. However, the order in which the ranking techniques are used in the pipeline is important to ensure that the significant genes are preserved in the final subset. For this experimental study, twenty-four unique pipeline models are generated out of four gene ranking strategies. These pipelines are tested on seven different microarray databases to find the pipelines best suited to the task. The resulting gene subsets are further tested with four classifiers and evaluated on four performance metrics. No single pipeline dominates the others, so a grading system is applied to the results of these pipelines to identify a consistent model. The grading system's finding that one pipeline model is significant is also confirmed by the Nemenyi post-hoc test. The performance of this pipeline model is compared with the four ranking techniques; although it is not always superior, it yields better results the majority of the time and can be suggested as a consistent model, at the cost of more computational time than a single ranking technique.
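A pipeline of rankers can be sketched as follows; the two rankers (variance and class correlation) and the drop-half-per-stage rule are illustrative choices of ours, not the four strategies used in the study:

```python
import math

def variance_rank(X, y):
    """Score each feature by its variance (higher is better)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        m = sum(col) / n
        scores.append(sum((v - m) ** 2 for v in col) / n)
    return scores

def correlation_rank(X, y):
    """Score each feature by |Pearson correlation| with the class label."""
    n = len(X)
    my = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in col))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        scores.append(abs(cov / (sx * sy)) if sx and sy else 0.0)
    return scores

def pipeline_select(X, y, rankers, drop=0.5):
    """At each stage, rank the surviving features and eliminate the
    lowest-ranked fraction; the order of rankers matters."""
    features = list(range(len(X[0])))
    for rank in rankers:
        sub = [[row[j] for j in features] for row in X]
        scores = rank(sub, y)
        keep = max(1, int(len(features) * (1 - drop)))
        order = sorted(range(len(features)), key=lambda i: -scores[i])
        features = sorted(features[i] for i in order[:keep])
    return features

# Toy data: feature 0 tracks the class; features 1-3 are constant noise.
X = [[i % 2, 0.0, 0.1, 0.1] for i in range(20)]
y = [i % 2 for i in range(20)]
selected = pipeline_select(X, y, [variance_rank, correlation_rank])
```

Each stage halves the surviving feature set, so the informative feature is the one preserved at the end.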

14.
In this paper, a multi-agent classifier system with Q-learning is proposed for tackling data classification problems. A trust measurement using a combination of Q-learning and Bayesian formalism is formulated. Specifically, a number of learning agents comprising hybrid neural networks with Q-learning, which we have formulated in our previous work, are devised to form the proposed Q-learning Multi-Agent Classifier System (QMACS). The time complexity of QMACS is analyzed using the big O-notation method. In addition, a number of benchmark problems are employed to evaluate the effectiveness of QMACS, which include small and large data sets with and without noise. To analyze the QMACS performance statistically, the bootstrap method with 95% confidence interval is used. The results from QMACS are compared with those from its constituents and other models reported in the literature. The outcome indicates the effectiveness of QMACS in combining the predictions from its learning agents to improve the overall classification performance.

15.
A lossless compression method for medical images based on DPCM and the Hilbert curve is proposed. Differential pulse-code modulation (DPCM) is applied to predict the image and obtain a residual image, the Hilbert curve is then used to scan the image pixels into a one-dimensional sequence, and the one-dimensional data are compressed with Huffman coding, run-length coding, and dictionary coding respectively. Experimental results show that the Hilbert scan increases pixel correlation and contributes to a higher compression ratio.
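The Hilbert-scan-plus-DPCM front end can be sketched directly: on a smooth image, scanning along a Hilbert curve keeps successive pixels spatially adjacent, so DPCM residuals are smaller than under row-major scanning. The `d2xy` mapping below is the classic bit-manipulation construction for 2^k x 2^k grids:

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid
    (n must be a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate the quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def dpcm_residuals(values):
    """DPCM: predict each sample by its predecessor, keep the residual."""
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

n = 8
image = [[x for x in range(n)] for _ in range(n)]   # smooth horizontal ramp
hilbert_scan = [image[y][x] for (x, y) in (d2xy(n, d) for d in range(n * n))]
row_major = [image[y][x] for y in range(n) for x in range(n)]

h_cost = sum(abs(r) for r in dpcm_residuals(hilbert_scan))
r_cost = sum(abs(r) for r in dpcm_residuals(row_major))
```

The row-major scan pays a large residual at every row wrap, while Hilbert-curve neighbours differ by at most one grey level here, so the entropy coder downstream sees a more compressible residual stream.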

16.
This paper evaluates the predictive power of innovative and more conventional statistical classification techniques. We use Landsat 7 Enhanced Thematic Mapper Plus (ETM+), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and airborne imaging spectrometer (HyMap) images to classify Mediterranean vegetation types, with and without inclusion of ancillary data (geology, soil classes and digital elevation model derivatives). When the number of classes is low, both conventional and innovative techniques perform well. For larger numbers of classes the innovative techniques of random forests and support vector machines outperform the other techniques. Compared to conventional techniques, classification trees, random forests and support vector machines proved to be better suited for the incorporation of continuous and categorical ancillary data: overall accuracies and accuracies for individual classes improve significantly when many, difficult to separate, classes are taken into account. Therefore, these techniques are definitely worth including in common image analysis software packages.

17.
This paper proposes a high capacity data hiding scheme for binary images based on block patterns, which can facilitate the authentication and annotation of scanned images. The scheme proposes block patterns for a 2 × 2 block to enforce specific block-based relationship in order to embed a significant amount of data without causing noticeable artifacts. In addition, two kinds of matching pair (MP) methods, internal adjustment MP and external adjustment MP, are designed to decrease the embedding changes. Shuffling is applied before embedding to reduce the distortion and improve the security. Experimental results show that the proposed scheme gives a significantly improved embedding capacity than previous approaches in the same level of embedding distortion. We also analyze the perceptual impact and discuss the robustness and security issues.

18.
Phil Vines, Justin Zobel. Software, 1998, 28(12): 1299-1314
With the growth of digital libraries and the internet, large volumes of text are available in electronic form. The majority of this text is English, but other languages are increasingly well represented, including large-alphabet languages such as Chinese. It is thus attractive to compress text written in large-alphabet languages, but general-purpose compression utilities are not particularly effective for this application. In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the prediction by partial matching (PPM) compression technique. We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly improve compression performance while using only a limited volume of memory. © 1998 John Wiley & Sons, Ltd.
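PPM blends several context orders with escape probabilities; as a much-simplified cousin, an adaptive order-1 model with Laplace smoothing already shows why context modelling pays off on repetitive multi-byte text such as encoded Chinese (the byte sequences below are invented test data):

```python
import math
import random
from collections import defaultdict

def order1_code_length(data):
    """Ideal code length, in bits, under an adaptive order-1 context
    model with Laplace smoothing over a 256-symbol alphabet.
    (A much-simplified cousin of PPM, which blends several context
    orders and uses escape probabilities.)"""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    prev = None
    for sym in data:
        p = (counts[prev][sym] + 1) / (totals[prev] + 256)
        bits += -math.log2(p)          # ideal arithmetic-coding cost
        counts[prev][sym] += 1
        totals[prev] += 1
        prev = sym
    return bits

repetitive = b"\xd6\xd0\xce\xc4" * 200   # GBK bytes for "中文", repeated
random.seed(0)
noise = bytes(random.getrandbits(8) for _ in range(800))

rep_bits = order1_code_length(repetitive)
noise_bits = order1_code_length(noise)
```

Once the byte-pair transitions of the multi-byte codes are learned, each symbol costs far less than the eight bits needed for incompressible noise.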

19.
Encryption techniques ensure the security of data during transmission. However, in most cases they increase the length of the data, and thus the cost. When data must be transmitted over an insecure, bandwidth-constrained channel, it is customary to compress the data first and then encrypt it. In this paper, a novel algorithm, compression with encryption and compression (CEC), is proposed to secure and compress data. The algorithm first compresses the data to reduce its length; the compressed data is then encrypted and further compressed using a new encryption algorithm, without compromising the compression efficiency or the information security. The CEC algorithm provides a higher compression ratio, enhanced data security, and more confidentiality and authentication between two communication systems.
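The compress-first ordering can be sketched as follows; the SHA-256-based keystream is purely illustrative (a real system would use an authenticated cipher), and this is not the paper's CEC algorithm:

```python
import zlib
from hashlib import sha256

def keystream(key, n):
    """Toy keystream from iterated SHA-256 (illustrative only)."""
    out = b""
    block = key
    while len(out) < n:
        block = sha256(block).digest()
        out += block
    return out[:n]

def compress_then_encrypt(data, key):
    """Compress first, then encrypt: ciphertext has near-random
    statistics, so compressing *after* encryption would gain nothing."""
    compressed = zlib.compress(data, 9)
    ks = keystream(key, len(compressed))
    return bytes(a ^ b for a, b in zip(compressed, ks))

def decrypt_then_decompress(blob, key):
    ks = keystream(key, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))

message = b"attack at dawn " * 100
blob = compress_then_encrypt(message, b"secret-key")
```

The transmitted blob is both shorter than the plaintext and unreadable without the key, which is the property the compress-then-encrypt ordering is meant to deliver.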

20.
All data hiding methods can be divided into two types: (1) the extracted important data are lossy, or (2) the extracted important data are lossless. The proposed method belongs to the second type. In this paper, a module-based substitution method with a lossless secret-data compression function is used to conceal smoother areas of the secret image by modifying fewer pixels in the generated stego-image. Compared with previous data hiding methods that extract lossless data, the stego-image generated by the proposed method always has better quality, unless the hidden image has very strong randomness.
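A minimal modulus-based substitution scheme illustrates the lossless-extraction property: each pixel's value mod m is overwritten with a secret base-m digit, so extraction is exact while the per-pixel change stays below m. This is a generic sketch of ours, not the paper's module-based method:

```python
def embed(pixels, digits, m=4):
    """Modulus-based substitution: replace each pixel's value mod m
    with a secret base-m digit; the change per pixel is at most m-1."""
    stego = list(pixels)
    for i, d in enumerate(digits):
        stego[i] = stego[i] - (stego[i] % m) + d
    return stego

def extract(stego, count, m=4):
    """Lossless extraction: the hidden digits are exactly pixel mod m."""
    return [stego[i] % m for i in range(count)]

cover = [120, 97, 200, 33, 64, 250, 18, 77]   # grey-level cover pixels
secret = [3, 0, 2, 1, 1, 3, 0, 2]             # base-4 digits of the payload
stego = embed(cover, secret)
recovered = extract(stego, len(secret))
```

Smoother secret regions compress to fewer digits before embedding, which is why fewer stego pixels need to be modified for them.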
