Molecular genetic variation in sheep of the central Volga area inhabited by Finno-Ugric peoples

Miika Tapio, Mikhail Ozerov MTT Agrifood Research Finland, Biotechnology and Food Research, FI-31600 Jokioinen, Finland Haldja Viinalass Institute of Veterinary Medicine and Animal Sciences of the Estonian University of Life Sciences, 51014 Tartu, Estonia Tatyana Kiseliova All-Russian Research Institute of Animal Genetics and Breeding, Russian Academy of Agricultural Sciences, 189620 St. Petersburg-Pushkin, Russia Juha Kantanen MTT Agrifood Research Finland, Biotechnology and Food Research, FI-31600 Jokioinen, Finland, email: juha.kantanen@mtt.fi


Introduction
The northern European area, extending from Iceland and Fennoscandia to the Urals, has ancient indigenous sheep that belong to the northern short-tailed breed group. In contrast, the majority of sheep in other parts of Europe belong to thin-tailed breed groups with long or medium-long tails (e.g. Ryder 1983). The short-tailed breeds in the Nordic and Baltic countries are well documented (e.g. FAO 2006) and an extensive set of these breeds was recently characterized molecularly and assessed for conservation prioritisation (Tapio et al. 2005b, Tapio 2006). For the Russian short-tailed sheep such characterization is lacking, and the Romanov breed is the only short-tailed sheep type that has been documented in detail (Semyonov and Selkin 1989). In the recent molecular analysis (Tapio et al. 2005b, Tapio 2006) this breed was, however, represented by an imported Romanov population in Lithuania. That study also assessed one local non-institutionalised sheep population, the Viena sheep, from Russian Karelia. In our terminology, 'an institutionalised breed' has a herd book or other pedigree recording system and its breed characters are determined and controlled by a breeding society. The Romanov and the Viena sheep have been suggested to be of high conservation value (Tapio et al. 2005b). The study, however, had two limitations. Firstly, the sheep in the vast area lying between Moscow and the Urals were not included. Secondly, the Romanov sample was taken from outside the country of origin and the population might have experienced a genetic bottleneck, resulting in decreased variation and, possibly, in a wrongly inferred conservation value. For these reasons, there is a need to characterize Russian short-tailed sheep more extensively.
The Romanov breed is the only Russian institutionalised sheep breed in the northern short-tailed breed group. The main area of distribution of the breed has been the Yaroslavl Region north of Moscow. Earlier, there was another short-tailed breed, the Nolinsk, in the Kirov Region further to the east (Litovchenko and Yesaulova 1972). Although the Nolinsk became extinct, the central Volga area has several non-institutionalised sheep types without herd books. The non-institutionalised sheep types could be very important for the maintenance of Russian sheep genetic resources as the Nolinsk became extinct and the number of pure-bred Romanovs in Russia has sharply decreased (Marzanov and Samorukov 2006).
The central Volga area contains Finno-Ugric republics. Although the Finno-Ugric peoples in the area are linguistically close to other Finno-Ugric peoples, such as the Finns and the Estonians, genetic studies support genetic propinquity with the geographically neighbouring (Finno-Ugric or non-Finno-Ugric) peoples (Rosser et al. 2000, Bermisheva et al. 2002). This does not support the idea that Finno-Ugric peoples would have remained as completely isolated islands separated from the surrounding Slavic populations. Thus, the Finno-Ugric and Russian areas in the north should form a continuum where genetic and cultural links have crossed the ethnic barriers. Consequently, the genetic variation in native sheep may have been shaped by isolation by distance, and the sheep owned by Finno-Ugric peoples should be analyzed together with the Romanov sheep.
Our study focuses on non-institutionalised central Volgaic sheep from the rural areas of four autonomous republics of the Russian Federation (Komi, Mari El, Mordovia and Udmurt) (Table 1, Fig. 1). The analysis aims to determine whether the sheep populations are relatively purebred, old indigenous populations or highly crossbred with some exotic breeds. Phenotypic features of the sampled animals (including semi-long tails, Table 1) suggest some influence from exotic breeds. However, since it is not known if the influence is from long-tailed or from semi-long tailed breeds, and since artificial selection may have operated, these observations are not sufficient for inferring the degree of admixing. For comparison, the characterization of the four central Volgaic varieties was done together with four additional non-institutionalised sheep varieties (the Russian Viena sheep, the Saaremaa and Ruhnu in Estonia, and the Grey Finnish Landrace), as well as with populations of four institutionalised breeds as reference breeds (the Estonian Whitehead and Blackhead Sheep, the Finnsheep, and two samples of the originally central Russian Romanov breed) (Table 1, Fig. 1). The Estonian reference breeds are partly of local origin, but have been graded to British breeds and later crossed with some other common west European breeds. They represent "cosmopolitan western breeds" in the current study. The Finnsheep and the Romanov are predominant native breeds of their regions and are included as purebred local reference breeds. The currently described Russian Romanov population is the first sample of the breed from the site of origin analyzed for its microsatellite variation.

Material and methods
Samples of the four central Volgaic sheep populations (Table 1, Fig. 1) were taken from sheep from several flocks and villages, and sampling within a flock was random. In addition, 27 Romanov sheep were sampled in the Yaroslavl region in central Russia, which constitutes the geographical origin of the breed. The remaining data for the 225 individuals from 8 populations (Estonian Whitehead, Estonian Blackhead, Estonian Ruhnu and Estonian Saaremaa, Finnsheep, Grey Finnish Landrace, and Romanov in Lithuania) and the sample preparation and semiautomatic fragment typing methods were described by Tapio et al. (2003, 2005a, 2005b).

Genetic variation at 20 microsatellite loci (
Genetic variability was quantified as the total number of alleles, unbiased expected heterozygosity or gene diversity (Nei 1987), and mean allelic richness (El Mousadik and Petit 1996) corresponding to the mean of the expected numbers of alleles over loci for a sample size of 20 diploid individuals or 40 chromosomes (i.e. r[40]). These estimates were calculated using FSTAT 2.932 (Goudet 1995). The same software was used to measure and test among-breed and within-breed fixation indices, θ and f of Weir and Cockerham (1984), corresponding to Wright's FST and FIS, respectively. Theta (θ) quantifies the excess of homozygotes in the total population due to population differentiation, and small-f (f) quantifies the excess of homozygotes within each population or the average over the populations.
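The two within-population diversity measures can be illustrated for a single locus. The following is a minimal sketch and not the FSTAT implementation: it assumes that allele counts are given as numbers of gene copies, and the function names and the example counts are hypothetical.

```python
from math import comb


def gene_diversity(counts):
    """Unbiased expected heterozygosity at one locus from allele counts (gene copies)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_sq = sum((c / n) ** 2 for c in counts)
    # Small-sample correction n / (n - 1), with n the number of gene copies
    return n / (n - 1) * (1.0 - sum_sq)


def allelic_richness(counts, g=40):
    """Expected number of alleles in a random subsample of g gene copies (rarefaction)."""
    n = sum(counts)
    if g > n:
        raise ValueError("subsample is larger than the sample")
    total = comb(n, g)
    # For each allele: 1 - P(allele is absent from the subsample)
    return sum(1.0 - comb(n - c, g) / total for c in counts)


# Hypothetical counts for one locus: 60 gene copies, i.e. 30 diploid sheep
example_counts = [30, 20, 8, 2]
print(round(gene_diversity(example_counts), 3))
print(round(allelic_richness(example_counts, g=40), 3))
```

Averaging `allelic_richness` over loci with `g=40` would correspond to the r[40] statistic described above; rare alleles contribute less than one expected allele each, which is what makes samples of different sizes comparable.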
The structure of genetic relationships among populations was investigated to discover whether some local sheep varieties appear as intermediates of the reference breeds. The Volgaic region is located east of the geographical origin of any of the reference breeds, including the Romanov (Fig. 1). Assuming isolation by distance, the purebred native sheep populations should form a genetically remote group, while an intermediate position would suggest them to be crossbreds. Relationship structure was explored using principal components analysis (PCA), NeighborNet (Bryant and Moulton 2004) and the Population Graph method (Dyer and Nason 2004). The Population Graph method explores the covariation in the whole population set. The method initially considers all n(n - 1)/2 links between n populations and it subsequently removes the connections whose exclusion does not significantly reduce the fit between the population network model and the data. Thus, the remaining links among the populations are necessary to explain the data, while a missing bond between a pair of populations indicates that the potential similarities in their gene pools are caused indirectly through other connecting populations as described in the network. Network construction was done using an online population graph server (available online at http://dyerlab.bio.vcu.edu/). The network was added on to the PCA plot, with PCA results determining the coordinates for the population nodes.
Graph structure was tested using the binomial testing described by Dyer and Nason (2004). Graph size and node centrality were explored using Agna 2.1.1 (Benta 2003, available online at http://www.geocities.com/imbenta/agna/). The size of the graph is the largest value among the shortest paths between every pair of populations. The closeness of a node to the centre of the graph was evaluated using the Bavelas-Leavitt centrality measure (Bavelas 1948). For each node, this measure is the ratio of the sum of all of the shortest paths in the entire graph to the sum of the shortest paths to and from the node, so that larger values indicate more central nodes.
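The two graph statistics can be sketched with a breadth-first search on an unweighted, undirected network. This is an illustration and not the Agna implementation: the function names and the toy network are hypothetical, and counting every shortest path in both directions (ordered pairs) is an assumption about the normalisation.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def graph_size_and_centrality(edges):
    """Return (graph size, Bavelas-Leavitt centrality per node) for a connected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {node: shortest_path_lengths(adj, node) for node in adj}
    if any(len(d) != len(adj) for d in dist.values()):
        raise ValueError("graph must be connected")
    # Graph size: the longest of the shortest paths between any pair of nodes
    size = max(max(d.values()) for d in dist.values())
    # Sum of all shortest paths over ordered pairs (assumed normalisation)
    total = sum(sum(d.values()) for d in dist.values())
    # Paths to and from a node: twice the sum of its distances in an undirected graph
    centrality = {node: total / (2 * sum(d.values())) for node, d in dist.items()}
    return size, centrality


# Hypothetical three-population chain A - B - C
size, centrality = graph_size_and_centrality([("A", "B"), ("B", "C")])
print(size, centrality["B"])  # 2 2.0
```

In the toy chain the middle node has the largest value, in line with the interpretation that larger values mark populations closer to the centre of the network.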
The allele frequency data of each non-institutionalised sheep type was fitted separately to an admixture model using three reference breeds (the Estonian Whitehead, the Finnsheep and the Russian

Results
In the 336 sheep analyzed, 259 alleles were detected at the 20 microsatellite loci (Table 2), averaging 13.0 alleles per locus. The total gene diversity was on average 0.80 over loci. The Estonian Ruhnu was the least variable and the local Udmurtian sheep was the most variable population irrespective of the diversity measure used (Table 3). The within-population gene diversity and allelic richness (r[40]) ranged from 0.52 to 0.79 and from 3.21 to 6.40, respectively. Significant positive f values for the Estonian Saaremaa local (0.177), Komi local (0.231), Udmurtian local (0.097) and Viena local (0.218) indicate that these populations are subdivided (Table 3). Over the populations, some locus-wise f estimates (Table 3) and the combined f (0.063, P < 0.05) were also significantly positive.
The overall differentiation was substantial: the θ of Weir and Cockerham (1984) indicated that 9.4% of the variation stems from among-population variation. The pairwise differentiation estimates varied from 0.018 (between the Whitehead and Blackhead Sheep in Estonia) to 0.249 (between the Estonian Ruhnu and the Komi local sheep). All pairwise θ values, except the one between the Udmurtian and the Mordovian sheep (0.037), were significantly different from zero. The Ruhnu and the Komi local were highly diverged from all other populations (Table 4).
In the PCA plot, the local sheep populations, except the Grey Finnish Landrace, had a more central location than the institutionalised breeds, while the configuration in general revealed three groupings (Fig. 2). The first axis (explaining 17.2% of variance between populations) separated a Volgaic/central Russian group from the breeds in the Baltic-Finnic region. The two Romanov populations were the furthest from the Baltic-Finnic sheep. Within the Baltic-Finnic sheep, the second PCA axis (12.5%) separated an Estonian and a Finnish group from each other. There were two populations that did not fall into any of the three groups (central Russian/Volgaic, Estonian or Finnish). The Viena sheep from Russian Karelia, bordering Finland, appeared intermediate between the Finnish and central Russian/Volgaic groups and neighboured the Finnish group in the plot. Furthermore, the Estonian Ruhnu sheep in the Baltic-Finnic group was equidistant from the Estonian and Finnish clusters.

Table 3. Sample sizes (N), gene diversity (He), allelic richness corresponding to the expected number of alleles in a sample of 20 diploid individuals (r[40]). Within-population fixation index (f) for testing excess of homozygotes in the sample. Based on population graph analysis (Fig. 2), the table indicates the number of links to other breeds (Degree) and Bavelas-Leavitt Centrality (Centrality) measuring connectedness of a breed and the closeness of a breed to the network centre, respectively. Three last columns indicate mY admixture coefficients (bootstrap average coefficients ± standard deviation) assuming local varieties being formed by admixture of the three indicated breeds. For Komi and Ruhnu populations, coefficients were recalculated without the parental population making a negative contribution. Standard deviation over loci for the diversity estimates is in parentheses.
If the external branches are ignored, the constructed NeighborNet graph (Fig. 3) was remarkably similar to the PCA plot. A notable difference was that the cluster of three central Volgaic varieties, Mari, Mordovian and Udmurtian local, was located slightly closer to the Estonian Whitehead Sheep than to the Russian Romanov.
The Population Graph network (Fig. 2) regularly connected breeds that neighboured each other in the PCA plot. Analysis of the network structure revealed that the Udmurtian, Mari and Komi local sheep in the centre of the PCA plot also had the largest network centrality values (7.16, Table 3). The network contained 28 links between populations, which constituted 36% of the 78 possible links. A population was connected to 2-5 other populations with an average degree of 4.3 (Table 3). At the maximum, the shortest path between a pair of populations included going through two other populations, which meant using three links (i.e. graph size = 3). The binomial network structure test agreed with the main PCA observation, indicating that the Russian populations are separated from the Baltic-Finnic ones. There was a significant lack of links crossing the border of the Russian Federation (test-wise P = 0.017), while there was no significant network subdivision into Finnish (P = 0.45) or Estonian (P = 0.18) sub-networks.
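The link counts reported above can be checked with a few lines of arithmetic. In this sketch the number of populations (13) is not stated explicitly in the text; it is inferred from the 78 possible links, since 13 × 12 / 2 = 78. The function name is hypothetical.

```python
def network_summary(n_populations, n_links):
    """Return (possible links, share of links retained, mean degree) for an undirected network."""
    possible = n_populations * (n_populations - 1) // 2
    density = n_links / possible
    # Every link contributes to the degree of two populations
    mean_degree = 2 * n_links / n_populations
    return possible, density, mean_degree


possible, density, mean_degree = network_summary(13, 28)
print(possible, round(100 * density), round(mean_degree, 1))  # 78 36 4.3
```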
The Volgaic local sheep populations were geographical outliers, but were located centrally in the PCA plot, the NeighborNet graph and the network. This differed from the expectation based on isolation by distance, and suggested that the local populations have an admixed origin. Admixture estimates assumed each local sheep type to have originated as a mixture of three breeds in the region: the Estonian Whitehead, the Finnsheep and the Russian Romanov. It should be emphasised that this analysis was descriptive and did not aim to prove that the local populations originated from these three specific breeds. Rather, the breeds were used to represent three wider diverged gene pools and, in addition, the Finnsheep and the Romanov represent purebred native breeds in their original breeding regions. The three breeds fitted well as hypothetical parental populations because they were located on the rim of the PCA plot (Fig. 2). In Russia, the estimated contribution of Estonian Whitehead type ancestry to the local sheep type was 34-70%, with the 34% contribution to Viena local being noticeably lower than that to the other Russian populations (Table 3). The Romanov type influence varied from 0 to 56% and the Finnsheep type influence varied from 0 to 34%. The Romanov type contribution to the Komi local was noticeably high, suggesting a higher proportion of indigenous Russian ancestry than in other central Volgaic populations. In Estonia, the Estonian Whitehead influence on the local sheep types was estimated to be very large, and the values ranged from 61 to 73%, while the remaining ancestry was attributed to Finnsheep. The Romanov type ancestry had a negligible influence on the studied Estonian sheep. In Finland, the admixture analysis suggested the Finnsheep type ancestry to be the most important one (56%) for the Finnish Grey Landrace.

Discussion
There were three different types of evidence suggesting Volgaic varieties to have been influenced by common western breeds. First, this was suggested by the appearance of the sheep (Table 1). The most obvious sign of crossbreeding was the tail-length, which was at least twice that of purebred short-tailed breeds in the Nordic countries. As in an earlier study assessing northern European sheep diversity (Tapio et al. 2005b), a long tail appears to be a good indicator of crossbreeding.
Second, the analysis of relationships among the sheep varieties revealed a central, rather than a peripheral, genetic position for the Volgaic sheep populations. This deviation from the pattern expected under isolation by distance can be interpreted as a sign of crossbreeding. However, results of an earlier study on the variation of sheep mitochondrial DNA in these populations (Tapio et al. 2006), as well as mitochondrial and Y-chromosomal studies in humans (Bermisheva et al. 2002, Rosser et al. 2000), support the idea that geographical and genetic affinities co-occur in the area. Since the phenotypic traits already indicate some level of crossbreeding due to the influence from exotic breeds (likely from western Europe, Semyonov and Selkin 1989), the relationship pattern can be considered as evidence for substantial non-native ancestry.
Finally, the levels of within-population variability, and fitting of the admixture model with three reference breeds as parental populations, supported the idea concerning the influence of cosmopolitan breeds: the Udmurtian and Mari local populations, together with the synthetic Estonian Whitehead Sheep, were the most variable populations. They were even more variable than the Finnsheep, which was earlier shown to be the most variable one among 32 northern European breeds (Tapio et al. 2005b). The variable Estonian breed is known to have an admixed ancestry. In the case of the variable Volgaic varieties (Mari, Mordovian and Udmurtian local), the fitting of the admixture model, considering the Estonian Whitehead as one ancestor, suggested that over 60% of the ancestry comes from this "cosmopolitan western breed". A similar proportion was obtained for the long-tailed Estonian Saaremaa sheep and for the Estonian Ruhnu sheep with variable tail-length, while the estimated Whitehead contribution for the presumably purebred, short-tailed Finnish and Russian-Karelian varieties was approximately half of this (Table 3). Within-breed subdivision may elevate the diversity estimates if subpopulations maintain distinct samples of ancestry. However, several of the studied Volgaic varieties did not show significant subdivision, and comparison to the Viena sheep suggests that subdivision is not sufficient to explain the observation. The frequency-based admixture estimates are likely to be biased towards equal contribution from the suggested parental populations (Bertorelle and Excoffier 1998, Dupanloup and Bertorelle 2001), but the estimates suggesting high Whitehead and low Finnsheep contribution for the three highly variable Volgaic varieties clearly deviate from this symmetrical pattern and support major "western" influence in the populations. Notably, the indications for such a crossbred ancestry are weaker for the Komi variety and this result is supported by the NeighborNet graph.
The admixture result suggests primarily a western (i.e. Estonian Whitehead) origin for the central Russian varieties. This agrees with the NeighborNet graph (Fig. 3), where three out of four varieties are located closer to the Estonian Whitehead than to the Russian Romanov. On the other hand, it disagrees with the PCA plot configuration (Fig. 2), where the varieties are proximal to the Romanov sheep. Non-neutrality or non-amplifying microsatellite alleles could be an explanation for the disagreement, but, for example, excluding the four loci with the largest excess of homozygotes within populations (INRA23, MAF48, OarCP34 and OarFCB304; Table 1) had only a minor influence on admixture estimates and the PCA plot. The disagreement between the admixture estimates and the PCA plot rather stems from the plotting method used, which is highly influenced by sharing of rare alleles but less by frequency differences in the common alleles. In addition to the NeighborNet graph, a principal coordinates plot based on pairwise θ values (Table 4) would be consistent with the admixture estimates, but this would require excluding the two highly diverged populations, Ruhnu and Komi local (not presented).
Two cases of negative contributions were observed (Table 3): Finnsheep to Russian Komi local, and Romanov to Estonian Ruhnu. Negative estimates might occur for a variety of reasons (Bertorelle and Excoffier 1998, Alvarez et al. 2004). In our case, a probable cause for negative estimates is that the given parental population has not in reality contributed to the admixed population (Bertorelle and Excoffier 1998) or that a derived hybrid population is treated as a parental population (Dupanloup et al. 2004). Negative estimates might also indicate violation of the assumed model and suggest e.g. reciprocal gene flow (Bertorelle and Excoffier 1998). Since the negative estimates did not differ significantly from zero, we here considered negative estimates as evidence of zero contribution, and recalculated the estimates without the respective parental population. This differs from the interpretation by Alvarez et al. (2004), who suggested that negative contributions are indicative of ongoing admixture. Our decision to exclude the parental population showing a negative contribution was supported by model-based clustering results (Pritchard et al. 2000), assuming three populations and updating allele frequencies based on the three reference populations (data not reported in detail): the contribution estimates of Dupanloup and Bertorelle (2001) and those of Pritchard et al. (2000) demonstrated a strong linear relationship (r = 0.87). The two methods indicated otherwise matching estimates but the model-based clustering suggested more equal contributions from the parental populations: the respective ranges were 0-0.73 and 0.10-0.59, and zero ancestral contribution sensu Dupanloup and Bertorelle (2001) corresponded to a contribution of 0.18 sensu Pritchard et al. (2000).
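The handling of negative contributions can be illustrated with a deliberately simple frequency-based model. The sketch below is not the mY estimator of Dupanloup and Bertorelle (2001) used in the study; it substitutes an ordinary least-squares fit of allele frequencies with shares constrained to sum to one. Unlike the study, it drops a parent on the sign of the estimate alone, without testing whether the estimate differs from zero. All function names and frequencies are hypothetical.

```python
def fit_two_parents(hybrid, p1, p2):
    """Least-squares shares when hybrid = m * p1 + (1 - m) * p2."""
    num = sum((h - b) * (a - b) for h, a, b in zip(hybrid, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    m = num / den
    return [m, 1.0 - m]


def fit_three_parents(hybrid, p1, p2, p3):
    """Least-squares shares that sum to one; individual shares may be negative."""
    # Substitute m3 = 1 - m1 - m2 and solve the 2 x 2 normal equations
    x1 = [a - c for a, c in zip(p1, p3)]
    x2 = [b - c for b, c in zip(p2, p3)]
    y = [h - c for h, c in zip(hybrid, p3)]
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    t1 = sum(u * v for u, v in zip(x1, y))
    t2 = sum(u * v for u, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    m1 = (t1 * s22 - t2 * s12) / det
    m2 = (t2 * s11 - t1 * s12) / det
    return [m1, m2, 1.0 - m1 - m2]


def refit_without_negative(hybrid, parents):
    """Fit three parents; if the smallest share is negative, set it to zero and refit."""
    shares = fit_three_parents(hybrid, *parents)
    worst = min(range(3), key=lambda i: shares[i])
    if shares[worst] >= 0:
        return shares
    kept = [i for i in range(3) if i != worst]
    two = fit_two_parents(hybrid, parents[kept[0]], parents[kept[1]])
    result = [0.0, 0.0, 0.0]
    for i, m in zip(kept, two):
        result[i] = m
    return result


# Hypothetical allele frequencies (two loci, two alleles each, concatenated)
parents = [[0.8, 0.2, 0.1, 0.9], [0.2, 0.8, 0.5, 0.5], [0.5, 0.5, 0.9, 0.1]]
hybrid = [0.59, 0.41, 0.18, 0.82]
print([round(m, 2) for m in fit_three_parents(hybrid, *parents)])
print([round(m, 2) for m in refit_without_negative(hybrid, parents)])
```

With these toy frequencies the three-parent fit assigns a negative share to the third parent, and the refit distributes the ancestry between the remaining two, mirroring the recalculation described for the Komi and Ruhnu populations.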
The Lithuanian and Russian Romanov populations were significantly differentiated but resembled each other more than any other breed. A relatively closer relationship between an Egyptian Romanov and the currently included Lithuanian Romanov was observed in the previous study by Tapio et al. (2003). Blott et al. (1998) demonstrated a similar genetic pattern among the different national populations of the widely spread Hereford cattle breed. Furthermore, the three Romanov populations (Egyptian, Lithuanian and Russian) have all shown remarkably similar levels of molecular variability. This suggests that breed comparisons based on molecular genetic variation are not necessarily always sensitive to the population samples studied.
Our results together with the observed phenotypes indicate that the non-institutionalised sheep varieties in the central Volga area have been influenced by exotic breeds. It seems that pure ancient varieties can be found only in the most peripheral regions, such as the Viena Karelia in the present study area. The apparent crossbred ancestry makes the central Volgaic varieties less interesting for a conservation programme focussed on northern short-tailed sheep, though the breeds may still harbour ancient alleles not present in other breeds. Extensive crossbreeding highlights the importance of the purebred Romanov and Viena sheep. Among the central Volgaic populations, the less admixed Komi variety appears as the most interesting candidate for a conservation programme.

Fig. 1. Geographical locations of the studied north-eastern European sheep populations.

Fig. 2. Two first principal components (PCs) describing relationship structure among the studied north-eastern European sheep populations. Ellipses represent three clear clusters: Russian, Estonian and Finnish groups. Links between populations are the population graph network explaining the covariation structure among the studied populations.

Table 4. Theta (θ) estimates between population pairs are given below the diagonal and the results of statistical testing above the diagonal, with an asterisk indicating significant differentiation and NS indicating no significant differentiation. Results are based on 1560 permutations.

Fig. 3. NeighborNet graph based on Chord distances describing relationship structure among the studied north-eastern European sheep populations.

Table 1. The phenotypic characters of the studied sheep.
When applied to allele frequency data, each successive principal component (PC) aims at explaining the maximum variance in the allele frequency table. The PCs are uncorrelated. Analysis was performed for correlations of standardized allele frequencies according to Cavalli-Sforza et al. (1994) using Ade-4 (Thioulouse et al. 1997). NeighborNet is similar to the common Neighbor-Joining method, but by showing reticulations it can represent alternative trees in the presence of distinct phylogenetic signals, which may arise, for instance, from gene flow between populations (see Bryant and Moulton 2004 for details). NeighborNet was constructed using SplitsTree 4 (Huson and Bryant 2006) and was based on Chord distance (Cavalli-Sforza and Edwards 1967).
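The Chord distance underlying the NeighborNet graph can be sketched for two populations as follows. This is an illustrative calculation and not the SplitsTree implementation: the scaling by 2/π and the averaging of per-locus distances over loci are assumptions, since several multi-locus variants of the distance are in use, and the function name and frequencies are hypothetical.

```python
from math import pi, sqrt


def chord_distance(pop_x, pop_y):
    """Chord distance between two populations.

    pop_x and pop_y are lists of loci; each locus is a list of allele
    frequencies given in the same allele order for both populations.
    """
    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        # Cosine of the angle between the square-root frequency vectors
        cos_theta = sum(sqrt(a * b) for a, b in zip(freqs_x, freqs_y))
        # Guard against tiny negative values caused by rounding
        total += sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return (2.0 / pi) * total / len(pop_x)


# Hypothetical frequencies at two loci for two populations
population_a = [[0.5, 0.5, 0.0], [0.9, 0.1]]
population_b = [[0.2, 0.3, 0.5], [0.6, 0.4]]
print(round(chord_distance(population_a, population_b), 3))
```

Because the distance depends on the square roots of the frequencies, populations that share no alleles at a locus reach the maximum per-locus value, while identical frequency vectors give zero.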
We studied four non-institutionalised local central Volgaic sheep types from the rural areas of Russian Finno-Ugric republics and compared them with institutionalised and non-institutionalised sheep types from the Nordic-Baltic and central Russian area. Based on the peripheral geographical location compared with other included populations, central Volgaic local sheep populations were anticipated to form a diverged group of short-tailed sheep populations, which would be very important for conservation of northern short-tailed sheep genetic resources. The present results show that the Volgaic sheep populations are highly variable and that this diversity is likely to have resulted from a mixture of local and exotic ancestry. Noticeable exotic influence makes the central Volgaic populations less attractive for conservation programmes for northern short-tailed sheep diversity. However, they may still harbour ancient unique alleles. For characterization and maintenance of this genetic diversity, the Komi local can be considered as the most interesting central Volgaic population due to its less extensive exotic origin. The evident crossbreeding highlights the importance of coordinated conservation programmes, which should at least consider the Romanov, the northern Karelian non-institutionalised Viena sheep population and the Komi local.