Facial expression is a form of non-verbal communication that conveys people's mental states and emotions and plays a crucial role in daily communication. Facial expressions are usually divided into six categories: happiness, sadness, fear, anger, disgust, and surprise. Researchers have achieved excellent recognition performance on macro-expressions: numerous expression recognition systems [1] have been developed, and they can reach more than 95% classification accuracy [2, 3]. Compared with the study of macro-expressions, research on micro-expressions has a shorter history. Micro-expressions were first reported by Haggard et al. in 1966 [4], who argued that they are related to ego defense mechanisms and express repressed emotions. Ekman and Friesen also reported micro-expressions in 1969 [5].
A micro-expression is a rapid, unconscious, spontaneous facial movement that occurs when a person is experiencing strong emotions. It can be neither faked nor suppressed, and it is produced when people try to hide their inner emotions [4]. Micro-expressions are characterized by a short duration, typically 1/25 to 1/3 s [6]. They are also low in intensity, and the movement does not occur simultaneously in the upper and lower parts of the face.
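To make the duration range concrete, the short sketch below converts the 1/25 to 1/3 s interval into the number of whole video frames a micro-expression would span. The frame rates used here (25 and 200 frames per second) are illustrative assumptions, not values taken from this paper, and `frame_span` is a hypothetical helper name.

```python
from fractions import Fraction
import math

# Duration range of a micro-expression quoted in the text, in seconds.
MIN_DURATION = Fraction(1, 25)
MAX_DURATION = Fraction(1, 3)


def frame_span(fps):
    """Return (fewest, most) whole frames covered at the given frame rate.

    `fps` is an assumed camera frame rate; exact fractions avoid
    floating-point rounding at the boundaries.
    """
    return math.floor(MIN_DURATION * fps), math.floor(MAX_DURATION * fps)


# Assumed frame rates, chosen only for illustration.
for fps in (25, 200):
    shortest, longest = frame_span(fps)
    print(f"{fps} fps: {shortest} to {longest} frames")
```

Under these assumptions, an ordinary 25 fps camera may capture the shortest micro-expressions in a single frame, which gives a sense of how little temporal evidence a recognition model may have to work with.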
Micro-expression analysis is commonly applied in clinical diagnosis, emotional intelligence, judicial investigation, and other fields. Although the Micro-Expression Training Tool (METT) [7] has been developed to train professionals, human recognition performance is still far from ideal, with a recognition rate of only 47% reported in the literature [8]. Hence, micro-expression recognition (MER) needs to be performed automatically by a computer, which can handle large-scale MER tasks at low cost once an efficient and stable model has been trained [8].
MER research mainly involves the construction of micro-expression datasets, preprocessing techniques, and recognition algorithms. To date, only a few micro-expression datasets are available; according to the elicitation method, they are classified into two categories: posed and spontaneous. The work in this paper is conducted entirely on spontaneous micro-expression datasets. The main publicly available spontaneous micro-expression datasets are SMIC [9], CASME [10], CASME II [11], CAS(ME)² [12], and SAMM [13].
Significant progress has been made in MER since the release of these datasets. However, current methods still suffer from an excessive number of model parameters and insufficient extraction of micro-expression features. To address these deficiencies, we propose a novel network named MFE-Net, which significantly reduces the number of model parameters, extracts more critical and essential features, and suppresses useless information. Results on public benchmarks demonstrate that MFE-Net is viable for MER. The contributions of this paper are summarized as follows:
(1) We propose a novel MER network with three branches that have different convolution kernels to extract multi-scale micro-expression features.
(2) Squeeze-and-excitation (SE) channel attention and multi-head self-attention (MHSA) are embedded in the residual blocks (Res-blocks) to focus on the most informative channels and extract valuable features.
(3) Extensive experiments are conducted on multiple micro-expression datasets, and the results show that the proposed method outperforms or is comparable to state-of-the-art methods on public and composite datasets.
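As background for contribution (2), the following is a minimal sketch of the general squeeze-and-excitation idea in plain Python: each channel is averaged to a single value, passed through a small bottleneck, and turned into a gate between 0 and 1 that rescales the channel. It is not the MFE-Net implementation. The function name `se_reweight`, the toy feature map, the weight values, and the omission of bias terms are all assumptions made for illustration.

```python
import math


def se_reweight(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation style channel attention.

    feature_map: list of C channels, each an H x W list of floats.
    w_reduce:    R x C weight matrix (hypothetical values) for the bottleneck.
    w_expand:    C x R weight matrix (hypothetical values) restoring C channels.
    Returns (reweighted feature map, per-channel gates). Biases are omitted.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: two 2x2 channels; all weights are made up for illustration.
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
w_reduce = [[1.0, 1.0]]        # C = 2 -> R = 1
w_expand = [[2.0], [-2.0]]     # R = 1 -> C = 2
scaled, gates = se_reweight(features, w_reduce, w_expand)
print(gates)
```

With these made-up weights the first channel receives the larger gate, so it is largely preserved while the second is attenuated. In a trained network the weights are learned, so the gates reflect which channels are informative for the task.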
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 details the proposed approach and its theoretical derivation. Section 4 describes the experiments in detail. Finally, Section 5 concludes the paper.