PMID- 32998366
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20201030
IS  - 1424-8220 (Electronic)
IS  - 1424-8220 (Linking)
VI  - 20
IP  - 19
DP  - 2020 Sep 28
TI  - An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs.
LID - 10.3390/s20195558 [doi]
LID - 5558
AB  - Due to the high throughput and high computing capability of convolutional
      neural networks (CNNs), researchers are paying increasing attention to the
      design of CNN hardware accelerator architectures. Accordingly, in this paper,
      we propose a block parallel computing algorithm based on the matrix
      transformation computing algorithm (MTCA) to realize the convolution expansion
      and resolve the block problem of the intermediate matrix. This enables a
      highly parallel hardware implementation. We also provide a specific
      calculation method for the optimal partition of matrix multiplication to
      optimize performance. In our evaluation, the proposed method saves more than
      60% of hardware storage space compared with the im2col (image to column)
      approach. More specifically, in the case of large-scale convolutions, it saves
      nearly 82% of storage space. Under the accelerator architecture framework
      designed in this paper, we achieve 26.7-33.4 GFLOPS (depending on convolution
      type) on an FPGA (field-programmable gate array) by reducing bandwidth and
      improving data reusability. It is 1.2x-4.0x faster than memory-efficient
      convolution (MEC) and im2col, respectively, and represents an effective
      solution for a large-scale convolution accelerator.
FAU - Zhao, Yunping
AU  - Zhao Y
AUID- ORCID: 0000-0002-5600-3740
AD  - College of Computer, National University of Defense Technology, Changsha
      410073, China.
FAU - Lu, Jianzhuang
AU  - Lu J
AD  - College of Computer, National University of Defense Technology, Changsha
      410073, China.
FAU - Chen, Xiaowen
AU  - Chen X
AD  - College of Computer, National University of Defense Technology, Changsha
      410073, China.
LA  - eng
GR  - 2018XK2102/Hunan Provincial Science and Technology Plan Project/
PT  - Journal Article
DEP - 20200928
PL  - Switzerland
TA  - Sensors (Basel)
JT  - Sensors (Basel, Switzerland)
JID - 101204366
SB  - IM
PMC - PMC7583864
OTO - NOTNLM
OT  - CNNs accelerator
OT  - hardware architecture
OT  - parallel computing algorithm
COIS- The authors declare no conflict of interest. The funders had no role in the
      design of the study; in the collection, analyses, or interpretation of data;
      in the writing of the manuscript, or in the decision to publish the results.
EDAT- 2020/10/02 06:00
MHDA- 2020/10/02 06:01
PMCR- 2020/10/01
CRDT- 2020/10/01 01:01
PHST- 2020/08/24 00:00 [received]
PHST- 2020/09/17 00:00 [revised]
PHST- 2020/09/26 00:00 [accepted]
PHST- 2020/10/01 01:01 [entrez]
PHST- 2020/10/02 06:00 [pubmed]
PHST- 2020/10/02 06:01 [medline]
PHST- 2020/10/01 00:00 [pmc-release]
AID - s20195558 [pii]
AID - sensors-20-05558 [pii]
AID - 10.3390/s20195558 [doi]
PST - epublish
SO  - Sensors (Basel). 2020 Sep 28;20(19):5558. doi: 10.3390/s20195558.