Scene Text Recognition (STR) has greatly improved the efficiency of information acquisition and interaction in natural environments. However, it also introduces security risks, such as the unauthorized recognition and extraction of sensitive text from the environment, including personal identification numbers, license plates, and other confidential data. To protect privacy in scene text, recent research has proposed adding small pixel perturbations to images so that the text remains readable to humans but is difficult for recognition models to extract accurately. However, the perturbations produced by existing methods are often noticeable to the human eye, which allows adversaries to counteract them with defensive measures such as filtering out small perturbations. Existing methods therefore fail to preserve high visual quality and keep the perturbation imperceptible at the same time. In this study, we propose a novel adversarial sample generation method for scene text that incorporates up-sampling. The method achieves a high attack success rate while increasing the payload applied to the image, preserving the quality of the perturbed image, and improving the stealthiness of the adversarial samples. To further enhance the quality of the perturbed images, we introduce the Adaptive Local Search Attack (ALSA), which adapts the perturbation according to visual quality and perceptual loss so that the perturbed image remains as similar as possible to the original in human vision. This further improves the stealthiness of the adversarial samples and makes the perturbation difficult to detect. Experimental results show that the proposed method maintains high visual quality while achieving a higher protection success rate than existing methods across various text recognition models.