Although deep learning can automatically extract features in relatively simple tasks such as image analysis, the construction of appropriate representations remains essential for molecular property prediction because of the intricate complexity of molecules. Additionally, generating labeled data for supervised learning in the molecular sciences is often expensive, time-consuming, and ethically constrained, resulting in small, diverse datasets that are challenging to learn from. In this work, we develop a self-supervised learning approach that uses a masking strategy to pre-train transformer models on more than 700 million unlabeled molecules drawn from multiple databases. The intrinsic chemical logic learned in this way enables the extraction of predictive representations from task-specific molecular sequences through a fine-tuning process. To understand the importance of self-supervised learning from unlabeled molecules, we assemble three models pre-trained on different combinations of these databases. Moreover, we propose a new protocol that uses data traits to automatically select the optimal model for a given predictive task. To validate the proposed representation and protocol, we consider 10 benchmark datasets in addition to 38 ligand-based virtual screening datasets. Extensive validation indicates that the proposed representation and selection protocol achieve superb performance.
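
To make the masking strategy concrete, the sketch below corrupts a tokenized SMILES string in the BERT style: a random subset of tokens is replaced with a mask symbol, and the original tokens become the targets the transformer must recover during pre-training. This is a minimal illustration under stated assumptions, not the exact implementation used in this work; the regex tokenizer, the 15% masking ratio, and the names tokenize, mask_tokens, and MASK are all illustrative.

```python
import random
import re

# Simplified SMILES tokenizer: bracket atoms, common two-letter elements,
# single characters. The tokenizer used in this work may differ.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z]|\d|[=#()+\-/\\%]")
MASK = "[MASK]"


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for masked pre-training."""
    return TOKEN_RE.findall(smiles)


def mask_tokens(tokens: list[str], ratio: float = 0.15, seed: int | None = None):
    """Randomly replace a fraction of tokens with MASK and record the original
    tokens as prediction targets (BERT-style masked-token objective)."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < ratio:
            inputs.append(MASK)
            labels.append(tok)    # the pre-trained model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)   # position excluded from the loss
    return inputs, labels


if __name__ == "__main__":
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used here only as example input
    corrupted, targets = mask_tokens(tokenize(smiles), seed=0)
    print(corrupted)
    print(targets)
```

Feeding such (corrupted sequence, target) pairs to a transformer encoder with a cross-entropy loss over the masked positions gives a self-supervised pre-training objective of this kind; fine-tuning then replaces the masked-token head with a task-specific predictor.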