PMID- 36772658
OWN - NLM
STAT- PubMed-not-MEDLINE
DCOM- 20230213
LR  - 20230214
IS  - 1424-8220 (Electronic)
IS  - 1424-8220 (Linking)
VI  - 23
IP  - 3
DP  - 2023 Feb 2
TI  - WLiT: Windows and Linear Transformer for Video Action Recognition.
LID - 10.3390/s23031616 [doi]
LID - 1616
AB  - The emergence of the Transformer has driven rapid progress in video
      understanding, but it also brings high computational complexity.
      Some previous methods divide the feature maps into windows along the
      spatiotemporal dimensions and then compute attention within each window.
      Others down-sample features during attention computation to reduce their
      spatiotemporal resolution. Although these approaches effectively reduce
      complexity, there is still room for further optimization. We therefore
      present the Windows and Linear Transformer (WLiT) for efficient video action
      recognition, which combines Spatial-Windows attention with Linear attention.
      We first divide the feature maps into multiple windows along the spatial
      dimensions and compute attention separately within each window, which
      further reduces the computational complexity compared with previous methods.
      However, the receptive field of Spatial-Windows attention is small, so
      global spatiotemporal information cannot be captured. To address this
      problem, we then compute Linear attention along the channel dimension so
      that the model can capture complete spatiotemporal information. Through this
      mechanism, our method achieves higher recognition accuracy at lower
      computational complexity. We conduct extensive experiments on four public
      datasets: Something-Something V2 (SSV2), Kinetics400 (K400), UCF101, and
      HMDB51. On SSV2, our method reduces the computational complexity by 28% and
      improves the recognition accuracy by 1.6% compared with the
      state-of-the-art (SOTA) method. On K400 and the other two datasets, our
      method achieves SOTA-level accuracy while reducing the complexity by about
      49%.
FAU - Sun, Ruoxi
AU  - Sun R
AUID- ORCID: 0000-0002-4430-3279
AD  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 
      201210, China.
AD  - School of Information Science and Technology, ShanghaiTech University, Shanghai 
      201210, China.
FAU - Zhang, Tianzhao
AU  - Zhang T
AD  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 
      201210, China.
AD  - School of Electronic, Electrical and Communication Engineering, University of 
      Chinese Academy of Sciences, Beijing 100049, China.
FAU - Wan, Yong
AU  - Wan Y
AD  - State Key Laboratory of Geomechanics and Geotechnical Engineering, Institute of 
      Rock and Soil Mechanics, Chinese Academy of Sciences, Wuhan 430071, China.
FAU - Zhang, Fuping
AU  - Zhang F
AUID- ORCID: 0000-0002-8042-4609
AD  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 
      201210, China.
FAU - Wei, Jianming
AU  - Wei J
AD  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 
      201210, China.
LA  - eng
GR  - 51827814/National Natural Science Foundation of China/
GR  - 2021289/Youth Innovation Promotion Association/
PT  - Journal Article
DEP - 20230202
PL  - Switzerland
TA  - Sensors (Basel)
JT  - Sensors (Basel, Switzerland)
JID - 101204366
SB  - IM
PMC - PMC9919352
OTO - NOTNLM
OT  - Spatial-Windows attention
OT  - action recognition
OT  - linear attention
OT  - self-attention
OT  - transformer
COIS- The authors declare no conflict of interest.
EDAT- 2023/02/12 06:00
MHDA- 2023/02/12 06:01
PMCR- 2023/02/02
CRDT- 2023/02/11 01:45
PHST- 2022/12/13 00:00 [received]
PHST- 2023/01/28 00:00 [revised]
PHST- 2023/01/29 00:00 [accepted]
PHST- 2023/02/11 01:45 [entrez]
PHST- 2023/02/12 06:00 [pubmed]
PHST- 2023/02/12 06:01 [medline]
PHST- 2023/02/02 00:00 [pmc-release]
AID - s23031616 [pii]
AID - sensors-23-01616 [pii]
AID - 10.3390/s23031616 [doi]
PST - epublish
SO  - Sensors (Basel). 2023 Feb 2;23(3):1616. doi: 10.3390/s23031616.