The main goal of Pedestrian Attribute Recognition (PAR) is to identify various attributes of pedestrians captured in video surveillance. PAR is a challenging task due to the large number of pedestrian attribute categories and the complex, easily overlooked correlations among attributes. Traditional methods usually treat each attribute independently, ignoring the possible intrinsic correlations between attributes. We design a pedestrian attribute recognition network, ACMFNet, which fuses attribute-specific features with attribute correlation features. Specifically, we propose an attribute correlation query module (ACQM), which is used to learn discriminative attribute features. We then construct a mask fusion module (MFM) to automatically learn the relative importance of the image features and the attribute correlation features. To better distinguish the modality differences between images and attribute texts, we propose a modality prompt. Experimental results show that our method significantly enhances the network's ability to recognize pedestrian attributes. On three pedestrian attribute recognition datasets, PA100K, PETA, and UAV-Human, our proposed method shows competitive performance compared to state-of-the-art methods. Our source code is available at \url{https://github.com/luffy-op/ACMFNet}.