Figure S1. (A) Experimental design of MMTV-Neu bulk and single-cell RNA sequencing. (B) Enrichr analysis (refs. 29,30) of differentially expressed genes (DEGs) in MMTV-HER2 early lesion (EL) and primary tumor (PT) 7-day spheres bulk RNAseq. Full table in STable 2. Orange, terms mentioned in (C) Biological negative controls used for FACS gating strategy. FvB mammary gland (MG) was used to set the EL and PT gate and FvB lungs for eL and LL DCCs (see Figure 1F). (D) Percentage of epithelial (EpCAM+Eng-), hybrid (EpCAM+Eng+) and mesenchymal (EpCAM-Eng+) populations in CD45-HER2+ MMTV-HER2 EL, PT and eL (early lungs) and LL (late lungs) DCCs after tissue dissociation (representative FACS plots in Figure 1F).

Figure S2. (A) Distribution of Epithelial (Ep) and Mesenchymal (M) scores (gene lists in STable 4, shown in Figure 2A) in MMTV-HER2 lung DCC clusters. Cell clusters were sub-grouped as M-like (1-4, higher M-like score), Hybrid (5-8) and Ep-like (9-15). (B) Normal lung cells (grey), eL (early lungs, blue) and LL (late lungs, orange) DCCs single-cell RNAseq sample distribution per cluster. Unsupervised clustering on the DEGs was performed using a previously described batch-aware algorithm (ref. 34). (C) Heatmap of UMI counts of selected genes (gene lists in STable 4) in MMTV-HER2 normal lung cells and eL and LL DCCs single-cell RNAseq. N1-10 are clusters enriched in non-cancer (HER2-) lung cells and were excluded from the analysis. Clusters 1-15 contain less than 16% non-cancer (HER2-) lung cells; these non-cancer lung cells were excluded from further analysis, but the clusters were considered cancer cell clusters. (D) Distribution of gene modules B and D (M-like) in all DCC clusters. Dots represent single cells color-coded by cluster (left), sample origin (eL or LL, middle) and sub-group (Ep-like, hybrid, M-like, right). Gene module lists in STable 4. (E) Distribution of gene modules I (Ep-like) and D (M-like) in all DCC clusters.
Dots represent single cells color-coded by cluster (left), sample origin (eL or LL, middle) and sub-group (Ep-like, hybrid, M-like, right). Gene module lists in STable 4. (F) Heatmap of UMI counts of selected genes (gene lists in STable 4) in MMTV-HER2 eL (early lungs) and LL (late lungs) DCCs single-cell RNAseq after unsupervised clustering on the DEGs and down-sampling to 500 UMI per cell. ‘Per cell’ representation of the Figure 2B heatmap, which shows UMI averages.

Figure S3. (A) mRNA expression of ZFP281, its predicted targets (Figure 3A) and EMT genes in EL vs. PT cells, EL shCt, EL shZFP281 and PT ZFP281-OE. Red, upregulated genes; Blue, downregulated genes; *p-value < 0.05. (B) Representative images of ZFP281 (1st column, green), E-cadherin (2nd column, green) and Twist1 (3rd column, green) protein expression in consecutive sections of FvB mammary gland (FvB MG, biological negative control) and MMTV-HER2 EL and PT tissues. HER2 expression in red. Arrows point to ZFP281+EcadlowTwist1+ cells in EL. Dashed arrow points to ZFP281+ adipocytes (internal control). Scale bars, 20 µm.

Figure S4. (A) Heatmap of combined RNAseq and ChIPseq data from EL/PT cells. 504 genes show higher ZFP281 binding and higher expression in EL vs PT cells; 118 genes show higher ZFP281 binding and higher expression in PT vs EL cells; 41 genes show higher expression but lower binding in PT vs EL cells; 63 genes show higher expression but lower binding in EL vs PT cells. Gene lists in STable 10. (B) Venn diagram of EL/PT RNAseq (Figure 1A), ZFP281 node (Figure 3A) and ChIPseq (Figure 4B) data. Targets of ZFP281 in EL cells and EpiSCs were identified from ChIP-seq data and further compared with EMT, Wnt, FGFR, and cell cycle arrest genes. (C) Representative tracks of EL/PT ChIPseq (Figure 4B-C). Example genes (Snai1, Tgfbr1, Vim, Zeb1, Cdk2, and Cdkn1a) illustrate the differential binding between EL and PT cells.
(D) Frequency of ZFP281 target (ChIP) score, summarizing the averaged expression of ZFP281 targets, in all cells analyzed by scRNAseq (Figure 2). (E) Distribution of gene modules I (Ep-like) and B (M-like) in all DCC clusters. Dots represent single cells color-coded by ZFP281 target scores (low, red, to high, green).

Figure S5. (A) Column of representative images of the mammosphere phenotype of EL, PT and EL shControl±DOX cells. Scale bar, 50 µm. (B) Quantification of mammosphere (MS) frequency of EL, PT and EL shControl±DOX cells. Graph shows n=3, mean, SEM and 2-tailed Mann-Whitney test. (C) Quantification of mammosphere (MS) size, as number of cells per sphere after dissociation of EL, PT and EL shControl±DOX spheres. Graph shows n=3, mean, SEM and 2-tailed Mann-Whitney test. (D) EpCAM (epithelial marker) and Eng/CD105 (mesenchymal marker) expression in EL, PT and EL shControl±DOX cells. Representative experiment of n=3 biological replicates. (E) Fold-change of Ep-like (EpCAM+Eng-), hybrid (EpCAM+Eng+) and M-like (EpCAM-Eng+) populations in EL over PT and EL shControl±DOX spheres. Graph shows n=3, mean, SEM and 2-tailed Mann-Whitney test. (F) Column of representative images of 3D-Matrigel organoids and invasive phenotype of EL, PT and EL shControl±DOX cells. Scale bar, 50 µm. (G) Quantification of the percentage of 3D-Matrigel spheroids with invasive protrusions per condition. Graph shows n=4, mean, SEM and 2-tailed Mann-Whitney test.

Figure S6. (A) Quantification of Ki67+ cells in lung metastasis 5 months after EL shZFP281 sphere injections. Graph shows n=5 mice/condition, median and 2-tailed Mann-Whitney test. (B) Quantification of Ki67+ cells in lung metastasis 5 months after PT Control or PT ZFP281-OE sphere injections. Graph shows n=5 mice/condition, median and 2-tailed Mann-Whitney test.