Assessing the genetic differentiation of Holstein cattle herds in the Leningrad region using Fst statistics

Holstein bulls and semen have been imported to Russia from Western countries since the 1970s. The objective of our study was to examine the effect of this introgression on genetic diversity between commercial Holstein herds in the Leningrad region. A total of 803 Holstein cows from 13 herds were genotyped using the Illumina BovineSNP50 v.2 array. Pairwise Fst values between the 13 herds, obtained with Hudson's estimator, varied from 0.002 to 0.015, which is less than the values usually obtained between dairy cattle breeds (> 0.1). The means of these pairwise Fst values revealed differences between herds that depended mainly on the proportion of common sires shared between the herds. In addition, we investigated the cause of negative Fst values. Based on our results, these negative values can be interpreted as an excess of within-herd genetic diversity over between-herd genetic diversity. Our results show that introgression of Holstein genes into Russian Black and White cattle of the Leningrad region has created genetic separation between herds similar to that reported for Jersey cows in the USA, Australia and New Zealand.


Introduction
Minimizing the rate of inbreeding and maintaining genetic diversity are important goals of all successful dairy cattle breeding operations. Pedigree information is very helpful in the management of population genetic diversity. With the new single nucleotide polymorphism (SNP) array technology, massive SNP data can be used for more accurate estimation of genetic diversity both within and between herds.
The Leningrad region is the leading dairy farming region in Russia, with approximately 60 000 cows of holsteinized Black and White cattle. Dutch, Danish, and Swedish Black and White bulls and heifers were imported to Russia during the 1930s. The Black and White breed was officially approved in Russia in 1959. To improve the genetic potential of Black and White cattle, local breeders have used Holstein bulls and semen imported from the USA (since 1978) and the Netherlands (since 2002). Because of the importation of Holstein semen, the current commercial Russian Black and White cattle population can be considered Holstein. It is therefore of interest to understand, at the herd level, how the introgression of Holstein genetics into Russian Black and White cattle has affected genetic diversity.
Several tools have been developed to provide estimates of genetic diversity among populations. One of them, Wright's Fst fixation index, is a widely used measure of population differentiation. The obtained degree of differentiation within and between populations depends on the power of the Fst statistics. The methods and issues associated with the estimation of genetic diversity using Fst statistics have been covered in many excellent reviews (e.g. Weir and Cockerham 1984, Weir and Hill 2002, Holsinger and Weir 2009, Meirmans and Hedrick 2011, Leinonen et al. 2013). Several Fst estimators are currently available, making the choice difficult (Bhatia et al. 2013). The most commonly used estimators are those presented by Weir and Cockerham (1984) and Nei (1986). The first is sensitive to sample size and the second consistently overestimates Fst (Bhatia et al. 2013). An advantage of the above-mentioned Fst estimators is that they are easy to calculate and do not require any assumptions concerning the shape of the distribution, other than mean and variance. Another estimator, Hudson's estimator (Hudson et al. 1992), is not sensitive to sample size ratios, does not systematically overestimate Fst, and is accurate and stable under various ascertainment schemes (Hudson et al. 1992, Bhatia et al. 2013). Hudson's estimator of Fst and the corresponding block-jackknife estimator for the standard error (SE) of Fst have been implemented in the EIGENSOFT 6.0.1 software package (Patterson et al. 2006).
Our objective was to examine the effect of the introgression of Holstein genes into Russian Black and White cattle on genetic diversity, measured as Fst, between commercial dairy herds within the Leningrad region.

Animals and SNP genotyping
Thirteen dairy farms with Russian Holstein cattle from the Leningrad region with the highest milk production (milk yield from 6700 kg to 11300 kg) were selected for our study. From each herd, 44 to 85 randomly selected cows were genotyped with either the Illumina BovineSNP50 v.2 chip (526 cows born in 2013) or a custom-made Illumina BovineSNPIDB v.3 chip (303 cows born in 2011-2013). In total, 8.5 to 15% of the cows in each herd were genotyped. All genotyping was performed in Ireland (Weatherbys Co. UK). PLINK 1.9 software (Purcell et al. 2007) was used for quality control. Imputation of the BovineSNP50 v.2 chip data to BovineSNPIDB v.3 chip data was carried out with the Beagle software (Browning and Browning 2016). In the first quality control step, SNPs with quality scores (GenCall in Illumina's GenomeStudio) less than 0.7 were removed. Next, all SNPs with more than 5% missing genotypes, SNPs that deviated from Hardy-Weinberg equilibrium at p < 0.0005, and SNPs with a minor allele frequency less than 0.01 were removed from the data. Only autosomal SNPs were considered. After imputation, 41 593 SNPs remained for the analysis. The required call rate per sample (genotyped cow) was 0.995, which resulted in the rejection of 26 samples; the final number of genotyped cows in the data was therefore 803.
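The SNP and sample filters described above can be sketched in NumPy as follows. This is only an illustration of the thresholds named in the text, not the PLINK procedure used in the study: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with NaN for missing calls, and the Hardy-Weinberg check uses a simple chi-square test with one degree of freedom, which is an assumption of this sketch.

```python
import math
import numpy as np

def snp_qc_mask(genotypes, max_missing=0.05, min_maf=0.01, hwe_p=0.0005):
    """Boolean mask of SNPs that pass the missingness, MAF and HWE filters.

    genotypes: (animals x SNPs) array of 0/1/2 allele counts, NaN = missing.
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)
    missing_rate = 1.0 - n_called / g.shape[0]

    # Frequency of the counted allele among called genotypes.
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.nansum(g, axis=0) / (2.0 * n_called)
    maf = np.minimum(p, 1.0 - p)

    # Chi-square HWE test (assumption of this sketch, not PLINK's own test).
    obs = np.stack([(g == k).sum(axis=0) for k in (0, 1, 2)]).astype(float)
    exp = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2]) * n_called
    with np.errstate(invalid="ignore", divide="ignore"):
        chi2 = np.where(exp > 0, (obs - exp) ** 2 / exp, 0.0).sum(axis=0)
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x/2)).
    p_hwe = np.array([math.erfc(math.sqrt(x / 2.0)) for x in chi2])

    return (missing_rate <= max_missing) & (maf >= min_maf) & (p_hwe >= hwe_p)

def sample_call_rate(genotypes):
    """Fraction of non-missing genotypes per animal (compare with 0.995)."""
    g = np.asarray(genotypes, dtype=float)
    return 1.0 - np.isnan(g).mean(axis=1)
```

The mask would be applied to the columns of the genotype matrix, and animals whose call rate falls below the chosen threshold would be dropped from its rows.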
The full data, hereafter called Sample 1, included all 803 cows from the 13 herds. The proportion of cows with a common sire varied from 0% to 35% in the set of 78 pairwise comparisons (hereafter called the Pairwise set) made between these 13 herds. In Sample 2, cows with a common sire were removed from the Pairwise set. Comparing the results from Samples 1 and 2 allowed us to determine the sensitivity of Hudson's Fst statistics. Moreover, the construction of these two sample sets enabled us to assess the effect that cows with common sires had on between-herd Fst values.
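The construction of the Pairwise set and of Sample 2 can be illustrated with a short sketch. The herd and sire identifiers are invented, the function names are hypothetical, and the definition used here (the share of cows in the two herds whose sire has daughters in both herds) is an assumption, because the text does not spell out the denominator.

```python
from itertools import combinations

def shared_sire_proportion(sires_a, sires_b):
    """Proportion of cows in two herds whose sire has daughters in both herds."""
    common = set(sires_a) & set(sires_b)
    n_shared = sum(s in common for s in sires_a) + sum(s in common for s in sires_b)
    return n_shared / (len(sires_a) + len(sires_b))

def drop_common_sire_cows(sires_a, sires_b):
    """Indices of cows kept in each herd after removing daughters of shared sires."""
    common = set(sires_a) & set(sires_b)
    keep_a = [i for i, s in enumerate(sires_a) if s not in common]
    keep_b = [i for i, s in enumerate(sires_b) if s not in common]
    return keep_a, keep_b

# Invented example: one list of sire IDs per herd, one entry per genotyped cow.
herds = {
    "herd_1": ["S1", "S1", "S2", "S3"],
    "herd_2": ["S1", "S4", "S4", "S5"],
    "herd_3": ["S6", "S7", "S7", "S8"],
}
pairs = list(combinations(herds, 2))  # 13 herds would give 78 pairs
proportions = {(a, b): shared_sire_proportion(herds[a], herds[b]) for a, b in pairs}
```

Because the removal is done per herd pair, the cows retained for a given herd can differ from one pairwise comparison to the next.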
Fst values were estimated with EIGENSOFT 6.0.1 software (Patterson et al. 2006). Hudson's estimate of Fst (Bhatia et al. 2013) for a single SNP is:

$$F_{st} = \frac{(p_1 - p_2)^2 - \dfrac{p_1(1 - p_1)}{n_1 - 1} - \dfrac{p_2(1 - p_2)}{n_2 - 1}}{p_1(1 - p_2) + p_2(1 - p_1)} \qquad \text{(Equation 1)}$$

where n_i is the sample size and p_i is the sample allele frequency in herd i (i ∈ {1, 2}). The final pairwise Fst estimates between herds were averaged across the 41 593 SNPs. The standard errors of Hudson's Fst estimate were calculated using the block-jackknife approach implemented in EIGENSOFT. The standard error of the mean for herd i was calculated as [formula missing in the source], where n = 13, and [remainder of the definition missing in the source]
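As a minimal sketch of the calculation, the per-SNP form of Hudson's estimator given by Bhatia et al. (2013) and a simple block jackknife can be written in NumPy as below. This is not the EIGENSOFT implementation: the function names are hypothetical, the SNPs are combined by averaging the per-SNP ratios as described in the text, and the jackknife is an unweighted delete-one-block version over contiguous SNP blocks, which is an assumption of this sketch.

```python
import numpy as np

def hudson_fst_per_snp(p1, p2, n1, n2):
    """Numerator and denominator of the per-SNP Hudson Fst estimate."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def hudson_fst(p1, p2, n1, n2):
    """Mean of the per-SNP estimates, skipping SNPs with a zero denominator."""
    num, den = hudson_fst_per_snp(p1, p2, n1, n2)
    ok = den > 0
    return float(np.mean(num[ok] / den[ok]))

def block_jackknife_se(p1, p2, n1, n2, n_blocks=10):
    """Unweighted delete-one-block jackknife SE over contiguous SNP blocks."""
    num, den = hudson_fst_per_snp(p1, p2, n1, n2)
    ok = den > 0
    ratios = num[ok] / den[ok]
    blocks = np.array_split(np.arange(ratios.size), n_blocks)
    # Re-estimate the mean with each block of SNPs left out in turn.
    loo = np.array([np.delete(ratios, b).mean() for b in blocks])
    g = len(blocks)
    return float(np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2)))
```

Bhatia et al. (2013) also discuss combining SNPs as a ratio of averaged numerators and denominators; the helper that returns both parts separately makes that variant a one-line change.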

Pairwise comparisons between herds
The pairwise Fst values calculated between the 13 herds for Samples 1 and 2 are shown in Table 1, and a summary of these comparisons is given in Table 2. Pairwise Fst values between the herds varied from 0.002 to 0.011 in Sample 1 and from 0.002 to 0.015 in Sample 2. On average, the Fst values were 0.0014 higher in Sample 2 (mean Fst = 0.0066) than in Sample 1 (mean Fst = 0.0052), that is, after the exclusion of cows with a common sire in both herds (Table 2). As expected, the removal of these cows increased the between-herd genetic differences, by up to 0.005 points.
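The per-herd means summarised in Table 2 can be obtained from a matrix laid out like Table 1, with Sample 1 above the diagonal and Sample 2 below it. The sketch below assumes that layout; the function name is hypothetical and the three-herd matrix contains invented values, not data from the study.

```python
import numpy as np

def herd_means(table):
    """Mean pairwise Fst per herd for Sample 1 (upper) and Sample 2 (lower)."""
    t = np.asarray(table, dtype=float)
    k = t.shape[0]
    upper = np.triu(t, 1)    # Sample 1 values, above the diagonal
    lower = np.tril(t, -1)   # Sample 2 values, below the diagonal
    s1 = upper + upper.T     # mirror each triangle into a symmetric matrix
    s2 = lower + lower.T
    return s1.sum(axis=1) / (k - 1), s2.sum(axis=1) / (k - 1)

# Invented values for three herds, for illustration only.
example = [
    [0.000, 0.004, 0.010],
    [0.005, 0.000, 0.008],
    [0.012, 0.009, 0.000],
]
sample1_means, sample2_means = herd_means(example)
difference = sample2_means - sample1_means  # effect of removing common-sire cows
```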
The pairwise comparisons of herd 4 with the other 12 herds showed the largest Fst values of all herd comparisons (Table 1). We assume that this is caused by the heavy use of bulls from the Netherlands in herd 4 between 2000 and 2007, while bulls with more recent ancestry, imported mainly from the USA and Canada, were used in the other 12 herds (Table 2). On the other hand, the pairwise comparisons of herd 3 (in Sample 1) with all other herds revealed the smallest Fst values (Table 1). This relates to the association between the proportion of cows with the same sire in different herds and the Fst values, and is confirmed by the strong negative correlation (-0.64) between the proportion of cows with the same sire and the pairwise herd Fst values for herd 3 (Table 2).
Herd 7 is characterized by a very small proportion of cows with common sires in other herds (the proportion was zero in five of the comparisons; Table 1). Herd 7 therefore had the smallest difference between the mean pairwise Fst values of Samples 1 and 2 (0.0003) among the studied herds (Table 2). In contrast, herd 8 had a large number of cows with common sires in other herds (Table 1). However, excluding these cows led to only a moderate increase in the mean Fst value (0.0008) (Table 2). In Sample 1, herd 8 is the second most similar to the other 12 herds after herd 3 (mean Fst = 0.0038 for herd 3 vs. mean Fst = 0.0039 for herd 8), and it is the most similar to the other herds after the exclusion of cows with common sires (mean Fst = 0.0047, Sample 2, Table 2).
In Sample 1, the differences in mean Fst were statistically significant (p ≤ 0.05) for all of the 78 pairwise combinations of the 13 herds, except between herd 1 and herds 2, 6, and 12, between herd 2 and herds 6 and 12, between herds 3 and 8, between herd 5 and herds 10 and 11, between herds 6 and 12, and between herds 10 and 11 (Table 2). The non-significant comparisons correspond well to the proportion of common sires shared between the herds (compare the non-significant results for Sample 1 and Sample 2).

Limits related to Fst estimates
In theory, the Fst value should be zero if a population (here a herd) is compared to itself. However, all the Fst values obtained in Sample 1 when a herd was compared to itself were negative (Table 3). To find an explanation for the negative values, let us assume that p_1 = p_2 and n_1 = n_2 = n in Equation 1; then Fst = -1/(n - 1) (Equation 2). In Equation 2, Fst depends only on the number of animals n, and its limit is 0 as n approaches infinity. Thus, when Fst is calculated for a single population, Hudson's estimator is not independent of sample size: it returns a negative value whose magnitude decreases as the sample grows.
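The step leading to Equation 2 can be written out explicitly. The derivation below assumes the per-SNP form of Hudson's estimator given by Bhatia et al. (2013); the symbol p for the common allele frequency is introduced here only for brevity, and the result holds for 0 < p < 1.

```latex
F_{st}
  = \frac{(p_1 - p_2)^2 - \dfrac{p_1(1 - p_1)}{n_1 - 1} - \dfrac{p_2(1 - p_2)}{n_2 - 1}}
         {p_1(1 - p_2) + p_2(1 - p_1)}
  \;\;\xrightarrow{\;p_1 = p_2 = p,\;\; n_1 = n_2 = n\;}\;\;
  \frac{0 - \dfrac{2\,p(1 - p)}{n - 1}}{2\,p(1 - p)}
  = -\frac{1}{n - 1}
```

If n is taken as the number of animals, as in the text, a herd of 85 cows compared with itself would give -1/84 ≈ -0.012 and a herd of 44 cows would give -1/43 ≈ -0.023.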
To further investigate the cause of the negative Fst values, we carried out a hypothetical simulation in which two herds with differing proportions of common cows were compared. We selected herd 2 as one of the herds because it had the largest number of cows (85 cows). Herd 4 was selected as the other herd because it differed genetically the most from the other herds (Table 2). First, we calculated the Fst value of herd 2 with itself (Table 3). Then, step by step, we replaced 17.6%, 35.3%, 53.0%, 71.0%, and finally 88.2% of the cows in herd 2 with cows from herd 4, which resulted in five artificial herds (4a-4e in Table 3). Each replacement was performed with the same sample of 15 cows, i.e. 17.6% corresponds to one set of 15 cows, 35.3% corresponds to two sets of the same 15 cows, etc. Recall that Hudson's estimator can be written as Fst = 1 - (H_w / H_b), where H_w is the mean number of differences within populations and H_b is the mean number of differences between populations (Bhatia et al. 2013). Firstly, if a population (here a herd) is compared to itself, then Fst = 1 - (H_w / 0), i.e. Fst is undefined. Secondly, the result of the simulation can be expressed mathematically by Equation 3.
(Equation 3) [equation missing in the source], where the two within-herd terms are the residual genetic difference of herd 2 in artificial herds 4a-4e and the genetic difference added by herd 4 in artificial herds 4a-4e, and H_b is the genetic difference between herd 2 and artificial herds 4a-4e. Based on the results in Table 3, we observed a threshold below which Hudson's estimator yields negative Fst values. In our case, the threshold was reached when both herds shared approximately 50% of the same cows. Below this threshold, the estimated within-herd genetic difference is larger than the between-herd genetic difference, and the Fst values become negative. In other words, the estimated Fst values depend on both the within-herd and the between-herd differences: if the average difference within a population is greater than the average difference between populations, Fst is negative, and when H_w equals H_b, which is the threshold, Fst is zero. In our case, all pairwise Fst values between herds were positive, which means that within-herd genetic differences were smaller than the differences between herds.
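The replacement experiment can be mimicked on synthetic data. The sketch below does not use the study's genotypes: the allele frequencies, the random seed and all variable names are invented, the sample size n is counted in animals as in the text, and per-SNP estimates are averaged, so the numbers it prints are not comparable with Table 3. It only shows the qualitative pattern of a negative self-comparison that rises as more cows are replaced.

```python
import numpy as np

def hudson_fst(g1, g2):
    """Mean per-SNP Hudson Fst from two (cows x SNPs) 0/1/2 genotype arrays.

    n is counted in animals, following the text (an assumption of this sketch).
    """
    n1, n2 = g1.shape[0], g2.shape[0]
    p1, p2 = g1.mean(axis=0) / 2.0, g2.mean(axis=0) / 2.0
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    ok = den > 0  # skip SNPs fixed for the same allele in both herds
    return float(np.mean(num[ok] / den[ok]))

rng = np.random.default_rng(0)
n_snps, n_cows, set_size = 2000, 85, 15

# Synthetic allele frequencies: "herd 4" is a noisy copy of "herd 2".
freq2 = rng.uniform(0.1, 0.9, n_snps)
freq4 = np.clip(freq2 + rng.normal(0.0, 0.1, n_snps), 0.05, 0.95)
herd2 = rng.binomial(2, freq2, size=(n_cows, n_snps))
donors = rng.binomial(2, freq4, size=(set_size, n_snps))  # the same 15 cows are reused

results = []
for k in range(6):  # k = 0 compares herd 2 with itself
    artificial = herd2.copy()
    for j in range(k):
        artificial[j * set_size:(j + 1) * set_size] = donors
    results.append((100.0 * k * set_size / n_cows, hudson_fst(herd2, artificial)))

for pct, fst in results:
    print(f"{pct:5.1f}% replaced: Fst = {fst:+.4f}")
```

With no cows replaced, every SNP contributes exactly -1/(n - 1), so the first value equals -1/84 regardless of the simulated frequencies.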

Discussion
Numerous studies have been carried out to estimate the genetic diversity between cattle breeds and populations using Fst (e.g. McKay et al. 2008, Gautier et al. 2010, Wilkinson et al. 2011, Edea et al. 2013, Rothammer et al. 2013, Cañas-Álvarez et al. 2015, Kelleher et al. 2016). These studies have observed varying Fst values, from very small values between different populations of the same breed, e.g. the Fst value between Guernsey bulls and cows from the UK and the islands was 0.006 (Cooper et al. 2016), to relatively large values between dairy and beef breeds, e.g. the Fst value between the Braunvieh and Galloway breeds was 0.16 (Rothammer et al. 2013). Jersey cattle are another example of genetic differences between different populations of the same breed (Howard et al. 2015). The mean Fst value for Australian Jersey cows versus US Jersey cows was 0.008, and the mean Fst values for US Jersey cows versus New Zealand Jersey cows and for Australian Jersey cows versus New Zealand Jersey cows were 0.029 and 0.009, respectively.
In our study, between-herd Fst values (on average 0.006) were lower than the values between different populations of Jersey cows (Howard et al. 2015) and considerably lower than the values between different breeds. Thus, we can conclude that herds in the Leningrad region are less diverse than various populations of the same breed globally. This is a consequence of using the same bulls, or bulls with the same origin, in different herds.
Negative Fst values are obtained occasionally, and certain authors do not report such values. However, Wilkinson et al. (2011), for example, stated that the Fst estimator of Weir and Cockerham (1984) can take negative values if alleles drawn at random from within a population are less similar to one another than those drawn from different populations. Negative Fst values were reported in our study, and based on Equation 2 and a simple simulation (Table 3) we show that there is a limit value, depending on the proportion of shared genetics, beyond which negative Fst values can be expected.

Conclusions
Fst statistics were used to assess the genetic diversity of Holstein cows between herds in the Leningrad region of Russia. The observed between-herd Fst values ranged from 0.002 to 0.015. These estimates are smaller than those obtained between various populations of the same breed, and far smaller than those obtained between breeds. In addition, the absence of negative Fst values indicates that within-herd genetic differences are smaller than the differences between herds. The exclusion of cows with common sires increased the pairwise Fst values only slightly. We believe that the methodology can be used to monitor the between-herd genetic diversity of cattle populations within and across countries. This approach can also be applied to other farm animals and is useful in breed conservation.

Table 1. Estimates of Fst statistics between herds a
a Fst values for Sample 1 are above the diagonal and Fst values for Sample 2 are below the diagonal. SE of the estimates varied from 0.0002 to 0.0005; b Proportions of cows born from common bulls are in parentheses.

Table 2. Mean of Fst values between a particular herd and other herds using either Sample 1 or Sample 2
a Country of origin of the sires of the genotyped cows, NL = the Netherlands; b MSE varied from 0.00006 to 0.0001 depending on herd; c Pearson's correlation between the proportion of cows with the same sire and pairwise herd Fst values for Sample 1; * p ≤ 0.001, other differences between Fst values had p ≤ 0.0001

Table 3. Fst values when a herd is compared to itself and when herd 4 is compared to herd 2 with different proportions of cows from herd 4
a Standard error; b Percent of herd 4 cows in herd 2