Development and validation of microsatellite markers for an endangered dragonfly, Libellula angelina (Odonata: Libellulidae), with notes on population structures and genetic diversity

The Bekko Tombo, Libellula angelina Selys, 1883 (Odonata: Libellulidae), is listed as an endangered species in South Korea, and is classified as a critically endangered species by the International Union for Conservation of Nature (IUCN). An assessment of the genetic diversity and population relationships of the species by molecular markers can provide the information necessary to establish effective conservation strategies. In this study, we developed 10 microsatellite markers specific to L. angelina using the Illumina NextSeq 500 platform. Forty-three samples of L. angelina collected from three localities in South Korea were genotyped to validate these markers and to preliminarily assess the population genetic characteristics. The 10 markers revealed 4–11 alleles, 0.211–0.950 observed heterozygosity (H O), and 0.659–0.871 expected heterozygosity (H E) in the population with the largest sample size (n = 20), thereby validating the suitability of these markers for population analyses. Our preliminary assessment of the population genetic characteristics appears to indicate the following: presence of inbreeding in all populations, an isolation of the most geographically distant population (Seocheon), and a lower H O than H E. The microsatellite markers developed in this study will be useful for studying the population genetics of L. angelina collected from additional sites in South Korea and from other regions.


Introduction
The Bekko Tombo (Libellula angelina Selys, 1883; Odonata: Libellulidae), distributed throughout northern China, Japan, and Korea (Inoue, 2004;Jung, 2012), is listed as an endangered species in South Korea. It occurs in several areas in the western region of South Korea (Figure 1; e.g. Incheon, Gwangmyeong, Ansan, Yongin, Anseong, Seocheon, Gimje, and Jeonju); however, because of the reduction in population and extinction, there are only a limited number of stable populations of L. angelina at present (Shin et al., 2012). Natural ponds and swamps rich in organic matter are the main habitats for Bekko Tombo species in South Korea (National Institute Figure 1. The distribution and sampling localities of Libellula angelina in South Korea, with the pairwise results of genetic distance (F ST ) in a geographical context. The distribution localities are marked in dark grey, and sampling localities are marked in green. General locality names are as follows: 1, Incheon Metropolitan City; 2, Gwangmyoung City, Gyeonggi-do Province; 3, Ansan City, Gyeonggi-do Province; 4, Yongin City, Gyeonggi-do Province; 5, Anseong City, Gyeonggi-do Province; 6, Seocheon City, Chungcheongnam-do Province; 7, Gimje City, Jeollabuk-do Province; and 8, Jeonju City, Jeollabuk-do Province. This map was acquired from the Korea National Spatial Data Infrastructure Portal. The asterisk indicates statistical significance, (p < 0.05).
of Biological Resources, 2013), but such habitats are declining because of urban development and expansion, which are accompanied by reclamation and contamination.
At the international level, L. angelina has been classified as critically endangered since 1986 by the International Union for Conservation of Nature (IUCN). According to Inoue (2004), who designated the species as endangered in Japan, the prime habitats necessary for the maintenance of sustainable L. angelina populations in the country are old and stable ponds, with moderate growth of emergent vegetation and an area of open water, in lowland hill areas.
Populations decline, however, as a consequence of anthropic developments that destroy and degrade such habitats, besides introducing predators that threaten native species (Inoue, 2006). Furthermore, several dragonfly species including L. angelina, which were once common in lentic habitats, are now reported as endangered, because of rapid and extensive changes and degradation of agricultural habitats in Japan (Kadoya, Suda, & Washitani, 2009).
Estimation of population genetic characteristics such as population structures, genetic diversity, and genetic isolation is important for establishing a conservation strategy for endangered species (Moritz, 2002;Palsbøll, Berube, & Allendorf, 2007). The genetic diversity and differentiation of L. angelina in the Okegayanuma area of Japan was previously determined using random amplified polymorphic DNA analyses; the species was found to present low genetic diversity among and within populations and exhibited no significant genetic difference among populations (Takahashi, Fukcui, & Tubaki, 2009). However, the South Korean populations of L. angelina have never been subjected to population genetic analyses.
In this study, we developed 19 new microsatellite markers from L. angelina. To our knowledge, the present study is the first of its type for this species and other congeneric species. Given the limited access to this endangered species and its rarity, population genetic analyses were performed based on the examination of a limited number of samples from three South Korean localities, and the validity of the markers was tested using the largest population.

Sampling and DNA extraction
Adult L. angelina were sampled from three localities in the western region of South Korea in June 2016 ( Figure 1). Seocheon (36.027328°N, 126.717994°E; n = 13) is the southernmost collection locality, and is situated approximately 135 km south of Ansan (37.272664°N, 126.581689°E; n = 20) and approximately 160 km south of Gwangmyoung (37.458503°N, 126.869561°E; n = 10). Gwangmyoung and Ansan are located approximately 35 km from each other. The latter locality is a small offshore islet (34.39 km 2 in area and less than 1 km from the nearest mainland in South Korea). For each location, we obtained the necessary collection permits from the respective authorized environmental offices.
Total DNA was extracted from the hind legs of the collected L. angelina using a Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA) as per the manufacturer's instructions and was stored at -20°C until use. Voucher specimens were deposited at the National Institute of Biological Resources, Incheon, South Korea (accession numbers NIBRGR0000412973-NIBRGR0000413014, NIBRGR0000413016).

Development of microsatellite markers and genotyping
For the construction of a DNA library, one specimen collected from Ansan was used. DNA quality and concentration were assessed using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Genomic DNA (200 ng) was sheared into fragments of approximately 550 bp using a Covaris S220 ultrasonicator (Covaris, Woburn, MA, USA) and then processed to produce an Illumina paired-end library using a TruSeq Nano DNA Library Kit (Illumina, San Diego, CA, USA). Size and concentration of the prepared library were confirmed using an Agilent 2100 Bioanalyzer system (Agilent Technologies, Santa Clara, CA, USA) and a quantitative PCR-based KAPA library quantification kit (KAPA Biosystems, Wilmington, MA, USA), respectively. The library was sequenced on an Illumina NextSeq 500 system (San Diego, CA, USA) using 150 base-length read chemistry in a paired-end mode.
For assembly, sequencing errors were discarded using the error correction module of ALLPATHS-LG (Gnerre et al., 2011). Then, genome assembly was performed using IDBA_UD  (Peng, Leung, Yiu, & Chin, 2012) with the pre-correction option. To identify reliable assemblies, short reads were remapped to assembled sequences using Bowtie2 (Langmead & Salzberg, 2012); only assembled scaffolds with a depth of ≥ 10 × and coverage ≥ 95% were retained for microsatellite marker identification.

Data analyses
To validate the markers, selected genetic diversity measures, such as number of alleles, observed heterozygosity (H O ; Weir, 1990), and expected heterozygosity (H E ; Weir, 1990), were calculated per population for each locus using GenAlEx ver. 6.5 (Peakall & Smouse, 2012). F IS (Hartl & Clark, 1997), which is a measure of the deficiency in heterozygosity resulting from non-random mating, was estimated per population for each locus and also for each population for all loci using GenAlEx ver. 6.5 (Peakall & Smouse, 2012). Allelic richness (AR)-standardized for variation in sample size-was calculated per population for each locus using FSTAT 2.9.3.2 (Goudet, 2001). Genotypic linkage disequilibrium (LD) between all pairs of loci, as well as deviation of genotypic frequencies from the Hardy-Weinberg equilibrium (HWE) per population per locus, were tested using GENEPOP Web ver. 4.2 (Raymond & Rousset, 1995;Rousset, 2008) with the Markov-chain approach modified from Guo and Thompson (1992) using 10,000 steps of dememorization and iteration. The 95% significance levels for HWE and linkage disequilibrium (LD) tests were adjusted using a Bonferroni correction (Rice, 1989). The fixation index (F ST ) (Weir & Cockerham, 1984) between all pairs of populations was estimated based on the infinite allele model of mutation using Arlequin v. 3.5 (Excoffier & Lischer, 2010). The significance of the F ST between all pairs of populations was calculated using Fisher's exact test based on 10,000 permutations. Principal coordinates analysis (PCoA) via covariance with standardization of the population genetic distances was performed to detect and plot the relationships between populations using GenAlEx ver. 6.5 with default parameters (Peakall & Smouse, 2012). STRUCTURE ver. 2.3.3 (Pritchard, Stephens, & Donnelly, 2000) was employed to identify the true number of subgroups (K) using the method described by Evanno, Regnaut, & Goudet (2005). An admixture model with correlated allele frequencies was used, with the K-value ranging from 1 to 10. Ten independent runs were performed for each K-value, with a burn-in period of 10,000 iterations, followed by 50,000 iterations for data collection. The structure result was visualized using the web-based tool, STRUCTURE HARVESTER ver. 0.6.8 (Earl & von Holdt, 2012).

Sequencing and microsatellite selection
The sequencing of 150 bp paired-end reads from the Illumina library resulted in a total of 160,044,678 reads (Table 2). A total of 251,215 assembled scaffolds with an average length of 3,192.18 bp were retained for microsatellite marker identification (Table 2). Trinucleotide repeats were the most abundant class of microsatellites (45,160 regions) in the L. angelina genome, followed by dinucleotide (11,536 regions), tetranucleotide (5089 regions), pentanucleotide (201 regions), and hexanucleotide (94 regions) repeats (Table 2). Sequencing data were deposited in the Sequence Read Archive of GenBank (accession number SRR8432568). After testing the candidate microsatellites for the availability of primer sites, amplification efficiency, degree of polymorphism, and specificity for target loci, 10 markers were eventually selected for use in subsequent genotyping (Table 1).
The analysis of 43 genotyped L. angelina samples from three South Korean populations (20, 10, and 13 L. angelina from Ansan, Gwangmyoung, and Seocheon, respectively) revealed that the observed number of alleles at each locus ranged from 6 to 18, and availability ranged from 0.93 to 1 ( Table 1). Tests of genotypic LD showed no significant association of alleles among the 10 loci after applying Bonferroni correction, indicating that all loci can be considered as independent markers. The GenBank accession numbers of the 10 loci are listed in Table 1. At each locus level, the observed number of alleles, H O , and H E were 4-13, 0.211-0.950, and 0.659-0.871, respectively, in the L. angelina samples at Ansan, which had the largest sample size (20 L. angelina samples; Table 3), thereby validating the suitability of the markers for population analyses. In the samples from Ansan, six of the 10 loci showed significant deviation from the Hardy-Weinberg equilibrium.

Population genetic analysis
The allelic patterns across populations indicated an absence of obvious differences among populations in terms of mean total number of alleles per population, number of effective alleles,  (Figure 2). Within-population gene diversity, which corresponds to H E in diploid data, ranged from 0.784 to 0.815 ( Figure 2). Lower H O than H E and a positive estimate of F IS , which is an evidence for the existence of inbreeding, were detected in all populations (Figure 2). The PCoA based on the first principal coordinate showed that the L. angelina population at Seocheon (the most distantly located region-at least 135 km-from the other two populations) showed a substantial divergence that accounted for 94.79% of the variation (Figure 3). Furthermore, the remaining two populations at Ansan and Gwangmyoung (located only 35 km from each other), did not form an immediately close group based on the second principal component, which accounted for 5.21% of the variation (Figure 3). Pairwise F ST estimates also supported the result of PCoA analysis, highlighting the significant genetic differentiation of the L. angelina population at Seocheon from that at Ansan and Gwangmyoung, whereas no significant genetic difference was found between the L. angelina populations at Ansan and Gwangmyoung (Figure 1). An examination of the likelihood scores from 10 replicate runs across K-values from 1 to 10 indicated that the optimal K-value was 2, suggesting the presence of two genetic groups (Figure 4). The assignment results from K = 2 showed that both gene pools were found in all populations, although the frequency of each gene pool in each population differed. This result indicates that the three populations of L. angelina shared the same gene pools, although the PCoA and F ST supported the isolation of L. angelina population at Seocheon from the two remaining populations.  In conclusion, a suite of polymorphic microsatellite markers specific to L. angelina was developed using a next-generation sequencing technique. Considering the results presented in this study, these markers will be useful for studies on the genetic structure of undiscovered South Korean and Asian populations of L. angelina. This is particularly relevant considering the limited number of previous population genetic studies for this endangered species (e.g. Takahashi et al., 2009). Although the results of this study are based on a limited number of samples, lower H O than H E , positive estimates of F IS , and the substantial isolation of the L. angelina population at Seocheon from the other two populations (besides the lesser distance between the two remaining populations) collectively suggest that the South Korean populations of L. angelina consist of small, more or less isolated, and inbreeding populations. This result is consistent with field observations based on which the species was classified as endangered. Nevertheless, the shared gene pools in all populations indicates that the isolation of the L. angelina population at Seocheon from the other populations may not be sufficient for creating an independent gene pool. As more samples are being collected from different regions of Asian countries, including South Korea, more thorough data analyses of population genetics will be possible.