While intense attention clouded the early stages of the coronavirus spread, the actual pathogenicity and origin of possible sub-strains remained unclear. We analyzed a total of 157 complete human SARS-CoV-2 genome sequences, harvested from the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/) between December 2019 and August 20, 2020, and processed by gender across 6 continents. We hypothesized that the data speak for themselves and can reveal true and explainable patterns of the disease. Analyses of identical genome diversity and pattern correlates, performed using a hybrid of biotechnology and machine learning methods, corroborate the multiple emergence of SARS-CoV-2 sub-strains and explain the diversity of SARS-CoV-2. Interestingly, some viral sub-strains progressively transformed into new sub-strain clusters, indicating varying amino acid and strong nucleotide associations derived from the same origin. A novel approach to cognitive knowledge mining from enriched genome datasets, together with output target labeling, supported intelligent prediction of emerging or new viral sub-strains.