Fungal culture preparation
The pathogen was isolated from shot hole-infected leaves of stone fruits, viz., plum, peach, apricot and cherry, and of almond among nut crops, grown in the University orchard of SKUAST-K, Shalimar, Srinagar (J&K). The purified fungal culture was maintained on Asthana and Hawker's medium and on potato dextrose agar (PDA) 10,14. On the basis of morpho-cultural characteristics, the pathogen was identified as Wilsonomyces carpophilus (synonym Thyrostroma carpophilum) Nabi 1,10. The pathogenicity of these isolates was tested by the detached leaf technique on their respective hosts 15, followed by cross-infectivity tests on different stone fruits, including almond.
DNA isolation for whole genome sequencing
The most virulent isolate of the pathogen, selected on the basis of minimum incubation time and symptom development, was used for whole genome sequencing. DNA was extracted from this isolate using the XcelGen DNA isolation Kit (Xceleris, Ahmedabad, India) according to the manufacturer's instructions. The quality and quantity of the extracted DNA were checked using a Qubit 2.0 Fluorometer (Life Technologies Ltd., Paisley, UK), and DNA integrity (DNA integrity number, DIN) was assessed using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA).
Library preparation and genome sequencing
The DNA library was prepared using the NEBNext Ultra DNA Library Prep Kit (New England Biolabs), starting with 200 ng of DNA. Adapters were ligated to both ends of the DNA fragments; these adapters contain the sequences required for binding dual-barcoded libraries to the flow cell, for sequencing and for PCR amplification. To maximize yield from limited amounts of starting material, a high-fidelity PCR amplification step was performed using PCR Master Mix.
The whole genome of the plant pathogenic fungus W. carpophilus was sequenced using Illumina HiSeq and PacBio sequencing technologies. De novo assembly of high-quality paired-end reads was performed using Velvet v1.2.10, and the assembly was optimized at a k-mer size of 79 (Supplementary Table 2; Fig. 7). The pre-assembled contigs were then scaffolded with PacBio long reads using SSPACE-LongRead v1.1. Finally, in a hybrid approach, we aligned Illumina short reads to PacBio long reads and used PBJelly and GapCloser v1.12 to fill gaps and improve base-calling accuracy.
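Optimizing an assembly across k-mer sizes is usually judged by a contiguity statistic such as N50. The sketch below assumes that criterion (the text does not state which metric was used to settle on k = 79): it computes N50 for several Velvet runs and picks the k with the largest value. The contig lengths are invented toy numbers, and `n50` and `best_kmer` are hypothetical helper names.

```python
def n50(contig_lengths):
    """Return N50: the contig length at which the cumulative length of
    contigs, sorted longest first, first reaches half of the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_kmer(assemblies):
    """Return the k whose assembly has the largest N50.

    `assemblies` maps k-mer size -> list of contig lengths (hypothetical
    input, e.g. lengths parsed from each Velvet run's contigs file).
    """
    return max(assemblies, key=lambda k: n50(assemblies[k]))


if __name__ == "__main__":
    # Invented contig lengths for three hypothetical Velvet runs.
    runs = {
        63: [1000, 800, 500, 300, 200],
        79: [3000, 1500, 400, 100],
        95: [2000, 1800, 900, 600],
    }
    for k, lengths in sorted(runs.items()):
        print(f"k={k}: N50={n50(lengths)}")
    print("best k:", best_kmer(runs))
```

In practice, other metrics (total assembly length, number of contigs, BUSCO completeness) are often weighed alongside N50, since N50 alone can reward misjoins.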
Gene prediction and annotation
The assembled genome was subjected to gene prediction using Augustus v2.5.5 to identify coding sequences. The predicted protein-coding genes were searched with the BLASTP algorithm (e-value threshold of 1e-5) against NCBI's non-redundant (nr) database and against the UniProt, KOG and Pfam databases. The overlap of gene annotations among the different databases was compared using InteractiVenn (http://www.interactivenn.net/). Gene Ontology (GO) annotation was obtained from the nr search results using the Blast2GO command line v1.4.1. The GO sequence distribution specifies all annotated nodes that make up the GO functional groups; genes associated with similar functions were assigned to the same GO functional group. The GO sequence distribution was analyzed for all three GO domains, i.e., biological process, molecular function and cellular component.
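The e-value cut-off and the cross-database comparison can be illustrated by filtering BLAST+ tabular output. This is a minimal sketch, assuming each search was saved in the default 12-column `-outfmt 6` layout (e-value in column 11); `annotated_genes` and `overlap_counts` are hypothetical helpers, and the pairwise counts only approximate what a Venn tool such as InteractiVenn displays.

```python
import io
from itertools import combinations

EVALUE_CUTOFF = 1e-5  # e-value threshold stated in the text


def annotated_genes(blast_tab, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} holding the best hit per query.

    `blast_tab` is any iterable of lines in BLAST+ tabular format
    (-outfmt 6, default 12 columns, e-value in column 11). Hits with an
    e-value above `cutoff` are discarded; blank and comment lines are skipped.
    """
    best = {}
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def overlap_counts(gene_sets):
    """Number of genes annotated in both databases, for every database pair."""
    return {
        (a, b): len(gene_sets[a] & gene_sets[b])
        for a, b in combinations(sorted(gene_sets), 2)
    }


if __name__ == "__main__":
    # Invented tabular output for two databases.
    nr_tab = io.StringIO(
        "g1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
        "g2\tsubjB\t40.0\t80\t48\t2\t1\t80\t5\t84\t0.01\t30\n"
        "g3\tsubjC\t75.0\t120\t30\t1\t1\t120\t1\t120\t2e-12\t150\n"
    )
    uniprot_tab = io.StringIO(
        "g1\tsubjD\t60.0\t90\t36\t0\t5\t94\t1\t90\t1e-8\t80\n"
        "g4\tsubjE\t55.0\t70\t31\t1\t1\t70\t2\t71\t3e-6\t60\n"
    )
    sets = {
        "nr": set(annotated_genes(nr_tab)),
        "UniProt": set(annotated_genes(uniprot_tab)),
    }
    print(sets)                  # g2 is dropped: its e-value (0.01) exceeds 1e-5
    print(overlap_counts(sets))  # only g1 is annotated in both
```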
Secretome mining
SECRETOOL (http://genomics.cicbiogune.es/SECRETOOL/Secretool.php), which predicts secretomes from amino acid sequence files, was used to predict the W. carpophilus secretome. SignalP v4.1 and WoLF PSORT v0.2 were used to identify signal peptides and extracellular localization among a total of 10,901 protein-coding genes. Sequences with transmembrane domains, ER-retention signals and GPI (glycosylphosphatidylinositol) anchors were then eliminated using TMHMM v2.0 and PredGPI (http://gpcr.biocomp.unibo.it/predgpi/pred.htm) (Fig. 8).
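The secretome filter amounts to a chain of keep/discard checks on each protein. The sketch below assumes the outputs of SignalP, WoLF PSORT, TMHMM and PredGPI have already been parsed into a hypothetical `ProteinPrediction` record; it also assumes a C-terminal KDEL/HDEL motif as the ER-retention signal and `"extr"` as the WoLF PSORT extracellular label. It illustrates the filtering logic and is not SECRETOOL's own implementation.

```python
from dataclasses import dataclass

# Canonical C-terminal ER-retention tetrapeptides (assumed signal definition).
ER_RETENTION_MOTIFS = ("KDEL", "HDEL")


@dataclass
class ProteinPrediction:
    """Hypothetical per-protein record assembled from the tools' outputs."""
    protein_id: str
    sequence: str
    has_signal_peptide: bool  # SignalP call
    localization: str         # top WoLF PSORT class; "extr" assumed = extracellular
    tm_helices: int           # number of TMHMM-predicted helices
    gpi_anchored: bool        # PredGPI call


def has_er_retention(sequence: str) -> bool:
    """True if the protein ends in a KDEL/HDEL motif (a trailing '*' is ignored)."""
    return sequence.rstrip("*").upper().endswith(ER_RETENTION_MOTIFS)


def is_secreted(p: ProteinPrediction) -> bool:
    """Keep proteins with a signal peptide and extracellular localization;
    drop any with TM helices, an ER-retention signal or a GPI anchor."""
    return (
        p.has_signal_peptide
        and p.localization == "extr"
        and p.tm_helices == 0
        and not has_er_retention(p.sequence)
        and not p.gpi_anchored
    )


def secretome(predictions):
    """IDs of the proteins that pass every filter, in input order."""
    return [p.protein_id for p in predictions if is_secreted(p)]


if __name__ == "__main__":
    # Invented toy records: only p1 passes every filter.
    toy = [
        ProteinPrediction("p1", "MKFLSALLLAGCAQAWS", True, "extr", 0, False),
        ProteinPrediction("p2", "MKFLSALLLAGCAQAHDEL", True, "extr", 0, False),
        ProteinPrediction("p3", "MKFLSALLLAGCAQAWS", True, "extr", 1, False),
        ProteinPrediction("p4", "MSTNPKPQRKTKRNTNR", False, "cyto", 0, False),
        ProteinPrediction("p5", "MKFLSALLLAGCAQAWS", True, "extr", 0, True),
    ]
    print(secretome(toy))
```

Requiring exactly zero TMHMM helices is a strict choice; some pipelines tolerate a single predicted helix that overlaps the signal peptide, since TMHMM often mistakes a hydrophobic signal peptide for a transmembrane segment.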
Simple sequence repeats
A high-throughput search for mono- to hexanucleotide SSR motifs was performed using the MIcroSAtellite identification tool (MISA; http://pgrc.ipk-gatersleben.de/misa/download/misa.pl) with default parameters. Under these defaults, a dinucleotide motif must be repeated at least six times, and tri-, tetra-, penta- and hexanucleotide motifs at least five times.
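A perfect-repeat search with these thresholds can be sketched with back-referencing regular expressions. This is a simplified illustration, not MISA itself: the mononucleotide threshold of 10 is assumed from MISA's usual defaults (the text does not state it), motifs that are themselves repeats of a shorter unit (e.g. `AA`, `ATAT`) are skipped, and compound SSRs are not merged. `find_ssrs` and `is_primitive` are hypothetical names.

```python
import re

# Minimum repeat number per motif length (unit size -> minimum repeats).
# Units 2-6 follow the thresholds given in the text; the mononucleotide
# value of 10 is an assumption taken from MISA's usual default.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k)
    )


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (1-based start, motif, repeat count) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for unit, n in sorted(min_repeats.items()):
        # A unit of `unit` bases followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start() + 1, motif, len(m.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GC" + "A" * 10 + "GC" + "AG" * 6 + "G" + "ATC" * 5 + "G"
    # Expected hits: (A)10 at 3, (AG)6 at 15, (ATC)5 at 28
    for start, motif, repeats in find_ssrs(toy):
        print(f"({motif}){repeats} at position {start}")
```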
Pathway analysis
Pathway analysis, ortholog assignment and mapping of genes to biological pathways were performed using the KEGG Automatic Annotation Server (KAAS). All gene sequences were compared against the KEGG database using BLASTP with a bit-score threshold of 60 (the default).
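The bit-score cut-off can be illustrated with a simplified KEGG Orthology (KO) assignment. KAAS itself assigns KOs with best-hit methods (bi-directional or single-directional) against reference gene sets; this sketch keeps only the single highest-scoring hit per gene at or above a bit score of 60 and then groups genes by pathway. The hit list and the KO-to-pathway mapping are toy inputs chosen for illustration, and `assign_kos` and `genes_per_pathway` are hypothetical helpers.

```python
from collections import defaultdict

BIT_SCORE_CUTOFF = 60.0  # KAAS default bit-score threshold stated in the text


def assign_kos(hits, cutoff=BIT_SCORE_CUTOFF):
    """Map each gene to the KO of its highest-scoring hit with bit score >= cutoff.

    `hits` is an iterable of (gene_id, ko_id, bit_score) tuples.
    Genes whose hits all score below the cutoff receive no KO.
    """
    best = {}
    for gene, ko, score in hits:
        if score < cutoff:
            continue
        if gene not in best or score > best[gene][1]:
            best[gene] = (ko, score)
    return {gene: ko for gene, (ko, _) in best.items()}


def genes_per_pathway(assignments, ko_to_pathways):
    """Group annotated genes by the KEGG pathway maps their KO belongs to."""
    pathways = defaultdict(set)
    for gene, ko in assignments.items():
        for pathway in ko_to_pathways.get(ko, ()):
            pathways[pathway].add(gene)
    return dict(pathways)


if __name__ == "__main__":
    # Toy hits: (gene, KO, bit score).
    hits = [
        ("g1", "K00844", 250.0),
        ("g1", "K01810", 90.0),
        ("g2", "K01810", 55.0),  # below the cutoff, so g2 stays unassigned
        ("g3", "K01810", 120.0),
    ]
    # Toy mapping; both KOs are placed in map00010 (Glycolysis / Gluconeogenesis).
    ko_to_pathways = {"K00844": ["map00010"], "K01810": ["map00010"]}
    kos = assign_kos(hits)
    print(kos)
    print(genes_per_pathway(kos, ko_to_pathways))
```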
Identification of tRNAs and rRNAs in the genome
To identify probable tRNA genes, we used tRNAscan-SE, which detects unusual tRNA species and accurately predicts their secondary structures; its search covers both prokaryotic and eukaryotic selenocysteine tRNA genes, tRNA-derived repetitive elements and pseudogenes. RNAmmer 1.2 was used to identify rRNA genes.
Collection of diseased planting material / samples
Necessary permissions were obtained where required, and all relevant guidelines and legislation were followed for the collection of diseased planting material or samples from the University orchard, SKUAST-K, Shalimar, Srinagar (J&K), India.