The Cucurbitaceae is one of the largest and most economically and nutritionally important plant families, comprising approximately 100 genera and 1,000 species [1][2]. This family includes cucumber, melon, watermelon, pumpkins, and squashes [3]. The genus Cucurbita contains some of the earliest domesticated plant species [4], such as C. moschata Duchesne, C. maxima Duchesne, and C. pepo L. These species are valued for their nutritional and medicinal properties, as they can help reduce the risk of coronary heart disease as well as blood glucose and serum cholesterol levels [5]. These effects are attributed to phytoconstituents such as tannins, flavonoids, and terpenoid cucurbitacins, which confer antioxidant, anti-inflammatory, antidiabetic, and other properties [6]. Although the mature fruit is the primary and most consumed part of the plant, the leaves, flowers, and shoots are also used for culinary purposes [7]. Additionally, the seeds are an important source of protein and essential fatty acids such as linoleic acid [8]. Moreover, plants of the Cucurbitaceae are used as nutritional supplements in livestock, poultry, and aquaculture [9].
Cucurbita moschata is an important vegetable in the Americas. It has been cultivated since pre-Columbian times and was probably domesticated independently in Mexico and South America [10]. Today, it is an important component of traditional polyculture systems in Mexico and Peru, where it is grown together with maize and beans, among other crops. In Peru, there is a squash landrace called “loche”. This landrace has been cultivated at least since the time of the Chimú culture; it is currently grown exclusively and traditionally on the north coast of Peru, in the department of Lambayeque, and is unknown elsewhere (Andres et al., 2006). Interestingly, this squash is vegetatively propagated and has low genetic diversity, and its fruits are mostly seedless and have warty skin [12]. In addition, loche is an important component of northern Peruvian gastronomy and is part of the economy and culture of the region.
The chloroplast plays an important role in vital metabolic processes, including photosynthesis and the synthesis of lipids and amino acids. Angiosperm plastid genomes exist in linear and, most commonly, circular forms [13]. Their size ranges from 120 to 150 kb, and they have a quadripartite structure in which a large single-copy (LSC) region and a small single-copy (SSC) region are separated by two inverted repeat regions (IRa and IRb). The chloroplast genome contains 110 to 130 genes involved in photosynthesis, transcription, and translation [14]. Its conserved gene content, organization, and order make the chloroplast (cp) genome well suited for evolutionary studies [15], phylogenetic analyses [16], studies of population structure [17], structural rearrangements, pseudogenes, and other mutation events [15], and genetic engineering [18].
The cp genomes of most economically important Cucurbitaceae species have now been sequenced. Cong et al. [19] reported that the C. ficifolia cp genome is 157,533 bp long, with a pair of 25,639 bp inverted repeat regions (IRs) that separate the LSC (88,112 bp) and SSC (18,143 bp) regions. This cp genome encodes 86 protein-coding genes, eight rRNA genes, and 36 tRNA genes. Their maximum likelihood phylogenetic analysis placed C. ficifolia as a basal clade of the genus Cucurbita, most closely related to C. maxima. The cp genome of Cucumis melo ‘Shengkaihua’ has also been sequenced. Assembled from next-generation sequencing data, it is 156,017 bp long and has the typical quadripartite structure: an LSC (86,335 bp) and an SSC (18,088 bp) separated by a pair of 25,797 bp IR regions. It contains 133 genes, including 88 protein-coding genes, 37 tRNA genes, and eight rRNA genes, and its GC content is 36.9%. A phylogenetic tree reconstructed from 24 chloroplast genomes showed that C. melo ‘Shengkaihua’ is most closely related to Cucumis melo var. inodorus.
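To make the quadripartite model concrete, the sketch below checks that the region lengths quoted above add up to the reported genome sizes, using total length = LSC + SSC + 2 × IR, since the two inverted repeats are identical copies. It is a minimal Python illustration, not the method of the cited studies: the class CpGenome and the function gc_content are hypothetical names introduced here, and the GC calculation (G + C over unambiguous A/C/G/T bases) is one common convention assumed for illustration.

```python
"""Consistency check of the quadripartite chloroplast (cp) genome model.

Illustrative sketch only: region lengths are the values quoted in the text;
CpGenome and gc_content are hypothetical names, not a published pipeline.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGenome:
    name: str
    lsc: int  # large single-copy region (bp)
    ssc: int  # small single-copy region (bp)
    ir: int   # length of ONE inverted repeat; IRa and IRb are equal (bp)

    def total_length(self) -> int:
        # The two inverted repeats are identical copies, so IR counts twice.
        return self.lsc + self.ssc + 2 * self.ir


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T).

    Assumed convention: ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


genomes = [
    CpGenome("Cucurbita ficifolia", lsc=88_112, ssc=18_143, ir=25_639),
    CpGenome("Cucumis melo 'Shengkaihua'", lsc=86_335, ssc=18_088, ir=25_797),
]

if __name__ == "__main__":
    for g in genomes:
        print(f"{g.name}: {g.total_length():,} bp")
```

Run as a script, it prints 157,533 bp for C. ficifolia and 156,017 bp for C. melo ‘Shengkaihua’, matching the totals reported above. The gc_content helper would be applied to an assembled cp sequence (not included here) to obtain a value comparable to the 36.9% reported for melon.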
The increasing use of next-generation sequencing technologies has made large amounts of nucleotide data available, facilitating comparative studies that help test phylogenetic hypotheses [20]. Zhang et al. [21] noted that the Cucurbitaceae is the fourth most economically important plant family and is mainly distributed in tropical and subtropical regions. They compared and described the complete cp genome sequences of ten representative Cucurbitaceae species, which ranged in size from 155,293 bp (Cucumis sativus) to 158,844 bp (Momordica charantia). Their phylogenetic analysis strongly supported Gomphogyne, Hemsleya, and Gynostemma as a relatively early-diverging lineage in the Cucurbitaceae. In another study, a comparative analysis was carried out on the cp genomes of nine Cucumis melo varieties representing the morphological diversity of two subspecies, C. melo ssp. melo and C. melo ssp. agrestis [22]. This study demonstrated that the melon cp genome is relatively conserved, and the phylogenetic results indicated that ssp. melo and ssp. agrestis formed monophyletic groups, providing a quick and simple way to identify and differentiate them. Therefore, large-scale comparisons of cp sequences are of great interest, as they provide solid evidence for taxonomic studies and species identification and help elucidate the mechanisms underlying evolution in the Cucurbitaceae.
To date, C. moschata is still considered a neglected crop [23], as its agricultural, economic, and biological importance remains largely unknown. Here, we report the first complete cp genome sequence of this Peruvian orphan crop and compare it with those of eight other important species of the Cucurbitaceae. This work adds valuable information to the chloroplast genomics of the Cucurbitaceae and provides a solid foundation for developing species-level DNA barcodes and polymorphic microsatellite (SSR) markers, as well as for studies on the evolution and molecular identification of C. moschata cultivars.