General genome features
The genome of P. vexans isolate PV4 was sequenced by nanopore strand sequencing. Genome sequencing and assembly statistics are shown in Tables 1 and 2. A total of 7,557,554,305 bp of raw data was obtained, of which 7,284,861,206 bp of clean data remained after low-quality data were filtered out. The complete assembled genome was about 59.78 Mb, with a G+C content of 51.24% and 121.85× coverage. The contig N50 was 5,171,527 bp.
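For readers unfamiliar with the assembly statistics quoted above, the following minimal Python sketch shows how contig N50, G+C content and mean coverage are conventionally computed. It is an illustration only, not the pipeline used in this study; the function names and the toy contig lengths are hypothetical, and the assembly size is the rounded 59.78 Mb figure, so the computed coverage differs slightly from the reported 121.85×.

```python
def n50(lengths):
    """Return the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


def gc_percent(seq):
    """Return the G+C percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def mean_coverage(clean_bases, assembly_size):
    """Return mean sequencing depth: clean bases divided by assembly size."""
    return clean_bases / assembly_size


# Toy contig lengths (hypothetical, not from the PV4 assembly).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Clean data from the text; assembly size rounded to 59.78 Mb (assumption).
depth = mean_coverage(7_284_861_206, 59_780_000)
print(f"toy N50 = {toy_n50}, approximate depth = {depth:.2f}x")
```

With the rounded assembly size the depth comes out near 121.86×, in line with the reported value once the exact assembly length is used.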
Identification of repetitive elements
A total of 3,552 annotated repetitive elements were identified in the PV4 genome (Table 3). Their annotation source, type, loci and attributes are listed in Table S1. They were classified as Class I (retrotransposons, 738), Class II (DNA transposons, 369), potential host genes (455) and SSRs (1,990). The major types in Class I and Class II were LTR/Copia (258) and MITE (218), respectively. There were also 4,931 unknown repetitive elements.
Gene prediction and annotation
As shown in Table 4, 15,034 genes were predicted, with a total length of 26,919,018 bp and an average length of 1,790 bp. The total numbers of exons, CDSs and introns were 42,753, 27,719 and 42,689, respectively. The average numbers of exons, CDSs and introns per gene were 2.84, 1.84 and 2.84, respectively.
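The per-gene averages above follow directly from the totals. A short Python sketch of the arithmetic, using only the numbers quoted in the text (the variable names are illustrative):

```python
# Totals quoted in the text (Table 4).
gene_count = 15_034
total_gene_length_bp = 26_919_018
feature_counts = {"exon": 42_753, "CDS": 27_719, "intron": 42_689}

# Mean gene length in bp; the text reports the whole-number part.
mean_gene_length = total_gene_length_bp / gene_count

# Mean number of each feature per predicted gene, to two decimals.
per_gene = {
    name: round(count / gene_count, 2)
    for name, count in feature_counts.items()
}

print(f"mean gene length: {mean_gene_length:.1f} bp")
for name, value in per_gene.items():
    print(f"mean {name}s per gene: {value}")
```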
A total of 14,181 predicted genes were annotated using NCBI nr (14,116), GO (5,841), KEGG (3,729) and four other databases (Table S2). Moreover, 95 predicted genes did not significantly match any known genes.
Analysis of the species distribution of nr homologs showed that PV4 shared the most homologous genes with Togninia minima (21.97%), Pestalotiopsis fici (7.45%) and Colletotrichum gloeosporioides (6.36%) (Fig. 1). The functional categorization and distribution of predicted genes by GO annotation are shown in Fig. 2. The distribution of annotated genes in the KEGG database is shown in Fig. 3. Biosynthesis of amino acids (145), Carbon metabolism (124) and Ribosome (102) were the pathways with the most annotated genes.
The Carbohydrate-Active enZymes (CAZy) database was used to identify genes encoding carbohydrate-active enzymes, which may include plant cell wall-degrading enzymes. In total, 1,206 genes were annotated and assigned to six CAZy classes: Auxiliary Activities (AAs, 253), Glycoside Hydrolases (GHs, 483), Glycosyl Transferases (GTs, 128), Polysaccharide Lyases (PLs, 43), Carbohydrate Esterases (CEs, 197) and Carbohydrate-Binding Modules (CBMs, 102) (Table 5).
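To put the CAZy class counts in proportion, the short Python sketch below tallies the six classes and computes each class's share of the annotated genes. It uses only the counts quoted above; the variable names are illustrative, and the percentages are derived here rather than reported in the study.

```python
# CAZy class counts quoted in the text (Table 5).
cazy_counts = {
    "AA": 253,
    "GH": 483,
    "GT": 128,
    "PL": 43,
    "CE": 197,
    "CBM": 102,
}

# Sum of the six classes; should equal the reported total of annotated genes.
total = sum(cazy_counts.values())

# Percentage share of each class among the annotated genes.
shares = {name: 100.0 * count / total for name, count in cazy_counts.items()}

# List classes from largest to smallest.
for name, count in sorted(cazy_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>3}: {count:>4} genes ({shares[name]:.1f}%)")
```

The class counts sum to the reported 1,206 genes, with Glycoside Hydrolases forming the largest class.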
The Pathogen-Host Interactions (PHI) database was used to identify genes related to pathogen-host interactions. The results are shown in Table S3. Protein subcellular localization analysis predicted 1,786 proteins with signal peptides, 3,223 transmembrane proteins, 1,394 secreted proteins and 134 effector proteins (Table S4).