FocusNet is a new multi-scale parallel CNN. The model pairs a parallel backbone architecture with a self-attention mechanism, and its attention-based feature-fusion technique averages the feature maps produced by the parallel backbones. This design brings the desirable properties of ViTs into a CNN architecture. We validate FocusNet through extensive experiments on the CIFAR-10 dataset and show that introducing self-attention into the model's backbone, combined with our attention-based feature-fusion technique, dramatically improves the model's accuracy.
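The exact fusion rule is not spelled out here, but an attention-based average of parallel feature maps can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the per-branch scoring via a query vector `w_q` and global average pooling are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(feature_maps, w_q):
    """Fuse parallel-branch feature maps by an attention-weighted average.

    feature_maps: list of arrays, one per branch, each of shape (C, H, W)
    w_q: hypothetical learned query vector of shape (C,) used to score branches
    Returns the fused map (C, H, W) and the branch attention weights.
    """
    maps = np.stack(feature_maps)              # (B, C, H, W)
    desc = maps.mean(axis=(2, 3))              # (B, C) pooled branch descriptors
    scores = desc @ w_q                        # (B,) one score per branch
    weights = softmax(scores)                  # attention weights, sum to 1
    fused = (weights[:, None, None, None] * maps).sum(axis=0)
    return fused, weights

# toy usage: three parallel branches with 8-channel 4x4 feature maps
rng = np.random.default_rng(0)
branches = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
fused, w = attention_fuse(branches, rng.standard_normal(8))
```

With uniform weights this reduces to a plain average; the attention scores let the network emphasize whichever scale is most informative per input.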