Parallel cross deep convolutional neural network model
Cite this article: Tang Pengjie, Wang Hanli, Zuo Lingxuan. Parallel cross deep convolutional neural network model[J]. Journal of Image and Graphics, 2016, 21(3): 339-347.
Authors: Tang Pengjie  Wang Hanli  Zuo Lingxuan
Affiliations: Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China; College of Mathematical and Physical Science, Jinggangshan University, Ji'an 343009, China
Funding: National Natural Science Foundation of China (61472281); Shanghai Shuguang Program (12SG23)
Abstract: Objective Image classification and recognition are classic problems in computer vision and form the basis of techniques such as image retrieval, object recognition, and video analysis and understanding. Models based on deep convolutional neural networks (CNN) have achieved major breakthroughs in this field, far surpassing traditional models based on hand-crafted features. However, many deep models have huge numbers of neurons and parameters and are difficult to train. Therefore, drawing on deep CNN models and the mechanism of human vision, this paper proposes and designs a deep parallel cross CNN model (the PCCNN model). Method Building on Alex-Net, the model extracts two groups of deep CNN features through two deep CNN data-transformation streams; at the top of the model, two rounds of cross fusion produce a 1024-dimensional image feature vector, and Softmax regression is finally applied for image classification and recognition. Result Compared with similar models, the features extracted by this model are more discriminative and yield better classification and recognition performance; on Caltech101 the top-1 accuracy reaches about 63%, nearly 5% higher than VGG16 and nearly 10% higher than GoogLeNet; on Caltech256 the top-1 accuracy exceeds 46%, nearly 5% higher than VGG16 and 2.6% higher than GoogLeNet. Conclusion The PCCNN model is effective for image classification and recognition, outperforming comparable models on medium-scale datasets; its performance on large-scale datasets remains to be verified. The model also offers a new idea for designing deep CNN models: extract more feature information while controlling depth, thereby improving model performance.

Keywords: image classification  recognition  deep CNN  Alex-Net  parallel cross  human vision
Received: 2015-07-28
Revised: 2015-08-28

Parallel cross deep convolutional neural network model
Tang Pengjie, Wang Hanli, and Zuo Lingxuan. Parallel cross deep convolutional neural network model[J]. Journal of Image and Graphics, 2016, 21(3): 339-347.
Authors: Tang Pengjie, Wang Hanli, and Zuo Lingxuan
Affiliation: Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China; College of Mathematical and Physical Science, Jinggangshan University, Ji'an 343009, China
Abstract: Objective The classification and recognition of images play an important role in a number of applications, such as image retrieval, object detection, and video content analysis. In recent years, a major breakthrough has been achieved with deep convolutional neural network (CNN) models, which have surpassed previous state-of-the-art methods for image classification and recognition, because the features extracted by CNN models are more discriminative and contain more semantic information than traditional hand-crafted approaches. However, CNN models such as Alex-Net and ZF-Net are relatively simple and cannot extract enough information to represent images well, while other models such as VGG16/VGG19 and GoogLeNet have huge numbers of neurons and parameters. Method In this work, a novel model named deep parallel cross CNN (PCCNN) is proposed, which extracts more effective information from images while using fewer neurons and parameters than the larger models. Inspired by the mechanism of human vision, which has two visual pathways joined at the optic chiasma, the proposed PCCNN is designed on the basis of Alex-Net: it extracts two groups of CNN features in parallel through a pair of deep CNN data-transformation streams. After the first fully connected layer of each stream, the information from the two streams is fused; the fused information is forwarded to the next two fully connected layers, and the output information is then fused again to obtain more powerful representative features. Finally, for image classification, Softmax regression is applied to the 1024-D image feature vector obtained from the fusion of the two feature groups. Note that Alex-Net is used as the base model because of its simple architecture and relatively small number of neurons. In the PCCNN model, the first stream is the original Alex-Net; in the second stream, a stride of 6 instead of 4 is used in the first convolutional layer.
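The effect of this stride choice on feature-map size can be sketched in a few lines. This is an illustration rather than the paper's code; the 227-pixel input and 11x11 kernel are Alex-Net's standard first-layer settings (not stated in this abstract), and zero padding is assumed.

```python
# Sketch: output spatial size of a convolutional layer, illustrating why a
# stride of 6 in the first layer yields smaller feature maps than Alex-Net's
# stride of 4. Input 227 and kernel 11 follow the standard Alex-Net setup.
def conv_output_size(input_size, kernel, stride, padding=0):
    """Spatial size of a convolution output (floor division)."""
    return (input_size + 2 * padding - kernel) // stride + 1

for stride in (4, 6):
    size = conv_output_size(227, 11, stride)
    print(f"stride {stride}: {size}x{size} feature map")
# stride 4: 55x55 feature map
# stride 6: 37x37 feature map
```

A 37x37 map has less than half the activations of a 55x55 map, which is why the second stream adds comparatively few neurons and parameters.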
A larger stride in the convolutional layer gives worse performance when only a single stream is used, because more information is lost. When the two streams are combined, however, the proposed model outperforms all the other compared models. In addition, because a larger stride is used in the second stream, its feature maps are smaller, so the number of neurons and parameters is not greatly increased. Result Several popular public datasets, namely Caltech101, Caltech256, and Scene15, are selected to evaluate the performance of the model, and several state-of-the-art models are implemented with the same settings for comparison. Experimental results demonstrate that the proposed PCCNN model achieves better image classification performance than these models, indicating that the features extracted by the PCCNN model are more discriminative and have stronger representation ability. On the Caltech101 dataset, the top-1 accuracy of the PCCNN model reaches approximately 63%, exceeding that of VGG16 by about 5% and that of GoogLeNet by about 10%. On the Caltech256 dataset, the model also performs best, with a top-1 accuracy of 46.4%, surpassing VGG16 and GoogLeNet by 5% and 2.6%, respectively. On the Scene15 dataset, however, the model performs worse than GoogLeNet, though it still achieves higher accuracy than a single Alex-Net. Conclusion The proposed PCCNN model outperforms several state-of-the-art CNN models on image classification and recognition, particularly on medium-scale datasets, but it does not show an advantage on the small-scale dataset. Hence, the model should be further tested on large-scale vision tasks such as the ImageNet or SUN datasets, which the authors plan to do next.
In fact, the PCCNN model is not only applicable to image classification and recognition; it also offers a new way of thinking about deep CNN model design. In a deep CNN, the deeper the architecture, the more neurons and parameters it has, and the more its complexity grows. Instead of adding depth, the width of the model can be increased to extract more features and obtain better performance. Although this approach also increases the number of neurons and parameters, it does so more slowly than adding layers to a single model; furthermore, the resulting model is more in line with the physiology of human vision. Finally, the PCCNN model is readily extensible.
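The two-stream cross fusion described in the abstract can be sketched as follows. This is a minimal shape-level illustration, not the authors' implementation: the per-stream dimensions (2048 in, 512 per fully connected layer) are assumptions chosen so that the final fused vector is 1024-D as in the abstract, fusion is modeled as simple concatenation (the abstract does not specify the operator), and all weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def fc(x, out_dim):
    # Hypothetical fully connected layer with random weights (untrained sketch).
    W = rng.standard_normal((x.size, out_dim)) * 0.01
    return np.maximum(x @ W, 0.0)  # ReLU activation, as in Alex-Net

# Stand-ins for the two streams' outputs after their convolutional stages.
stream1 = rng.standard_normal(2048)
stream2 = rng.standard_normal(2048)

# First cross: fuse after the first fully connected layer of each stream.
crossed = np.concatenate([fc(stream1, 512), fc(stream2, 512)])

# Each stream's next fully connected layers receive the fused information.
g1, g2 = fc(crossed, 512), fc(crossed, 512)

# Second cross: fuse again into the final 1024-D image feature vector.
feature = np.concatenate([g1, g2])

# Softmax regression on top (101 classes as in Caltech101).
probs = softmax(feature @ (rng.standard_normal((1024, 101)) * 0.01))
print(probs.shape)  # (101,)
```

The key design point the sketch captures is that each stream's later layers see the other stream's features after the first cross, mimicking the optic chiasma analogy in the abstract.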
Keywords: image classification  recognition  Alex-Net  deep CNN  parallel cross  human vision