Bias in genetic evaluation using random regression test-day model

The effect of an upgraded Finnish evaluation model on bias in estimated breeding values for protein yield was investigated. Evaluations based on repeatability animal model and on random regression test-day model without and with heterogeneous variance adjustment were compared. Comparisons were based on the average difference between pedigree indices and the future estimated breeding values, based on own or on daughter performance records. This was defined as empirical bias. The pedigree indices were computed from reduced data sets where four years of the most recent data were excluded. Results showed an upward bias in the protein yield pedigree indices for Ayrshire young sires of 2.2 kg, 2.5 kg and 1.8 kg from the repeatability animal model, random regression test-day model and random regression test-day model with heterogeneous variance adjustment, respectively. Pedigree indices for daughters of young sires were upward biased, whereas pedigree indices for daughters of proven sires were slightly underestimated when heterogeneous variance was not accounted. Inclusion of test-day yields from the fourth lactation onwards increased the bias. Moving from repeatability animal model to random regression test-day model did not reduce the bias, whereas adjustment of heterogeneous variance reduced bias.


Introduction
Genetic evaluation for production traits in Finnish dairy cattle has always relied on the most advanced methods available.In 1990, the sire model evaluation was replaced by a single trait repeatability animal model evaluation (Strandén and Mäntysaari 1992).Ten years later, a multiple-trait random regression (RR) test-day (TD) model (Lidauer et al. 2000) was implemented and this model was replaced by the joint Nordic test-day model in 2006 (Mäntysaari et al. 2006, Lidauer et al. 2006).The TD model was adopted because it was considered advantageous over the repeatability animal model (RPAM) for several reasons: first and later lactations are considered as two different traits and the breeding value for each trait is presented by a RR function, which accommodates breeding values for part lactations, 305-day yield, and persistency.Further, it accounts for the stage of pregnancy and allows better modelling of the herd environment.The joint Nordic test-day model added at least two more improvements to the Finnish dairy cattle yield evaluation.It incorporates additional sire information through modelling their daughter's performances in the neighbouring countries and it accounts for heterogeneous variance.Better modelling of the environmental effects in the TD model and adjustment for heterogeneous variance should reduce a possible bias in the breeding values.Uimari and Mäntysaari (1993) studied the reliability of the Finnish repeatability animal model evaluation by comparing pedigree indices (PI), i.e., the average of the parents estimated breeding values (EBV), with final proofs and found a potential bias in PI for young sires.They explained that the bias in PI was caused by a bias in EBV for bull dams.Mäntysaari and Sillanpää (1993) showed that a redefinition of the herd effect in the model to be a herd × parity interaction was the most efficient way of reducing the bias.This was not practical because of the small herd sizes.About 20% of the cows would have been alone in their cow comparison group.Consequently, the herd effect was modelled with two components: a fixed herd × period of five calving years × parity group effect, and a random herd × year × parity group effect, where parity group classes were two, one for first and another for second and third lactation.Lidauer and Mäntysaari (1996) reported that this redefinition reduced the bias considerably.
The use of TD yields instead of 305-d lactation yields increases the amount of observations about tenfold.However, this does not ease the modelling of the herd environment when herd size is small.In typical TD models the herd environment is preferably modelled by a herd × TD × parity interaction effect, but this is not possible for small herds.Modelling the herd effect without a parity interaction is one possibility.It increases the accuracy of EBV for cows from small herds as found by Emmerling (2001), but it may cause potential bias as found by Uimari and Mäntysaari (1993).Hence, the same concept of random subclasses within fixed main effects as was used for modelling the herd environment in the Finnish RPAM was adopted for the Finnish TD model.
The objective of the study was to quantify bias in PI's when firstly upgrading the Finnish RPAM to a reduced rank RR TD model (RRM) and secondly upgrading the RRM to a RRM that accounts also for heterogeneous variance.PI's of young sires, daughters of young sires, and daughters of proven sires were investigated.

Data
The TD and 305-d lactation yield records from the three dairy cattle breeds Ayrshire, Holstein-Friesian, and Finncattle were obtained from the national milk recording of Finland.The TD data consisted of records from all lactations of cows that calved the first time within the period January 1988 through June 2000.A TD record comprised of milk, protein and fat yield, where the milk yield was mandatory.The TD yields recorded before days in milk (DIM) 8 and after 365 were excluded.All cows with observations were required to have the first lactation records to avoid bias in the genetic trend.There were a total of 1,113,629 cows, which had on average 9.4 [14.3], 4.4 [6.7], and 4.4 [6.7] TD yields on milk, protein, and fat yield in the first [later] lactations, respectively, giving a total of 26,399,237 TD records (Table 1).The 305-d data set with protein yield records from the first three lactations contained 1,956,007 observations of all those cows which had at least four TD observations on milk yield in the TD data set.The pedigree data comprised of 1,558,850 cows and 11,825 bulls of the three breeds Ayrshire (79%), Holstein-Friesian (20%), and Finncattle (1%).The genetic differences between unknown parents were described by phantom parent groups categorized by breed, time period, and selection path.

Repeatability animal model
The 305-d protein yields were modelled by a singletrait RPAM including the first three lactations.The model from the national dairy cattle evaluation used until 1999 (Strandén and Mäntysaari 1992) was used, and was the reference model in this work: where y is a 305-d protein yield.The fixed effects were calving age × days open × parity (CADP), calving year × calving season (CYS), and herd × period of five years of calving × parity group (H5YP).The random effects were a random herd × year of calving × parity group (hyp), a genetic animal effect (α), a permanent environmental effect (π) and a measurement error (ε).There were 6 calving age and 6 days open classes per parity, and 6 calving season classes per year (February-March, April-May…).Parity group classes were two, one for the first lactation and another for the second and third lactation.The heritability was 0.3, the repeatability 0.5, and the variance ratio between residual variance and herd-year variance was 1.82.

Random regression test-day model
The TD yields (y) were described by a multiple-trait multiple-lactation reduced rank RRM as was used until 2006 in the national dairy cattle evaluation (Lidauer et al. 2000) and for all the later lactations as, where trait F is the first lactation milk, protein or fat yield and trait L is the later lactation milk, protein, or fat yield.Thus, there were 6 traits in the statistical model.The model for later lactation TD yields described observations from different lactations as repeated observations.The fixed effects were calving age × parity (CAP), days carried calf × parity group (DCCP), production year × month (YM), herd × production year (HY) and the stage of lactation.The latter was described by a regression function of DIM d, nested within calving year × calving season × parity group (CYSP).Corresponding covariables for DIM d were φ(d)' = [ c 1 c 2 c 3 c 4 c 5 ], where c 1 , c 2 , and c3 represent a second order Legendre polynomial at DIM d, and c 4 and c 5 are exponential terms exp(-p 1 d) and exp(-p 2 d), where p 1 was 0.05 for milk yield and 0.10 for protein and fat yield, and p 2 was 0.06, 0.01, and 0.35 [0.04, 0.20, and 0.35] for milk, protein, and fat yield of first [later] lactation, respectively.There were 9 calving age classes within each of the first four parities and one calving age class for each of the remaining parities to account for the differences in the intercepts.Further, there were 5 parity group classes: first, second, third, fourth and fifth plus later parities; 5 days carried calf classes; and 3 calving season classes: November to February, March to June, and July to October.
The random effects were herd × TD (htd), RR function for additive genetic animal effect (a), RR function for animal environmental effects (p and w) and measurement error (e).The RR functions were defined across the traits and within the lactations.

The daily breeding value of a cow o on a DIM d in the first lactation was
where the covariables in s(d) F were defined within the trait F. Similarly, the animal environmental RR function within the first lactation was The breeding values and animal environmental effects for the later lactations had corresponding covariables and coefficients of their own.In addition, each later lactation had a within lactation specific RR function which modelled the lactation specific animal environmental effects.The covariables in s(d) and t(d) were derived from the eigenfunctions representing the dominant eigenvalues in full fit covariance functions (CF) (Kirkpatrick et al. 1990) applied to all traits.Thus the genetic value of an animal was described by a vector of 12 RR coefficients and the animal environmental effects across all traits and within lactations were described by a vector of 12 RR coefficients plus 6 RR coefficients for each later lactation.
Let c be the vector of all htd effects, a the vector of all additive genetic animal effects, p the vector of all animal environmental effects across traits and within lactations, w the vector of all animal environmental effects within each later lactation, and e the vector of the measurement errors.The covariance matrix of these effects was assumed to be: where A is the numerator relationship matrix.The (co)variance matrices D a , D p , D w and R, as well   The applied method was the same as explained for the first lactation traits by Mäntysaari (1999) and Lidauer et al. (2003).An extension of the method to all lactations is described by Emmerling et al. (2002).The used variances for the random htd and residual effects are presented in

Random regression test-day model using data from first three lactations only
In contrast to the RPAM all lactations are described by the RRM.TD observations from the fourth lactation onwards were excluded in an additional evaluation (RRM-3) to quantify how much the TD observations from the fourth lactation onwards affect bias.

Analysis of bias
The empirical bias was defined to be the difference of earlier EBV and more recent EBV, where additional data has been accumulated.Two evaluations were carried out for each model.For the earlier evaluation, all 305-d observations from calvings after February 1996 and all TD observations after June 1996 were excluded.The 305-d yield data were cut four months earlier to make the information in both reduced data sets comparable (Table 1).The number of TD observations increased from the earlier to the recent test-day model evaluation by 56% and 75% for the first and the later lactations, respectively.The larger increase for the later lactations was because the cows with observations were required to have first lactation observations to avoid bias in genetic trend due to selection.
From the TD model evaluation, 305-d equivalent breeding values were calculated as a sum of all daily breeding values from DIM 8 to 312.The EBV from the earlier and recent evaluation were standardized to yield same mean and standard deviation for cows born in 1987.The EBV for the first and later lactations were equally weighted to get a combined breeding value.The bias was averaged over Ayrshire animals of three groups: young sires, daughters of young sires, and daughters of proven sires.The groups were defined as follows: a young sire was born within 1993 to 1995, had a Finnish herd book sire, had no daughters with observations in the earlier data set, and received an EBV with a reliability of over 0.9 in the recent evaluation; a daughter of a young sire was born within 1995 to 1997, had no observation in the earlier data but had at least four TD records in the recent data and was a daughter of a bull from the young sire group; a daughter of a proven sire was born within 1995 to 1997, had no observations in the earlier data but at least four TD records in the recent data and was a daughter of a Finnish herd book sire born within 1986 to 1990 that received an EBV with a reliability of over 0.9 in the earlier evaluation, and had at least 100 more daughters in the recent evaluation.The early EBV for target animals were based on pedigree information only, and therefore they are called PI.By definition the expected value of a PI is equal to the expected value of the final proof.From here onwards the term bias is used for the deviation of the PI mean from its final proof.The bias in PI was studied for first lactation protein yield, later lactation protein yield, and for the combined protein yield, calculated as an average of first and later lactation protein yield.

Results
Overall, better modelling of the environmental effects by the RRM did not reduce bias compared to the RPAM.However, the observed bias was significantly smaller when adjusting for heterogeneous variance (Table 4).PI's were upward biased for young sires and daughters of young sires, whereas PI's were downward biased for daughters of proven sires.When comparing the results from the RRM-3 evaluation with that one based on RPAM, only small differences in the bias was found.PI's of young sires were biased upwards by 2.5 kg and 2.2 kg in the RRM-3 and RPAM evaluations, respectively.The same holds for the daughters of young sires, for which bias in evaluation RRM-3 and RPAM was similar for all birth year groups.Both models (RRM-3 and RPAM) underestimated the PI of daughters of proven sires but here the differences between models were more apparent (Table 4).The inclusion of all lactations into the breeding value estimation (RRM) increased the bias in young sires about 0.2 kg.The increase was found in both, PI's of first lactation as well as in PI's of later lactations.
For the daughters of proven sires the PI's were about 0.2 kg less underestimated when including later lactations.
Accounting for heterogeneous variance (M-RRM) reduced bias significantly.Bias in PI's of young sires reduced by 35% compared to the RRM evaluation and was the smallest when comparing all four evaluations.PI's of daughters of young sires were less upward biased in the oldest year group and slightly upward and downward biased in the younger year groups.PI's of daughters of proven sires showed no bias or only small bias in either one or the other direction.
Later lactation PI's of young sires and of daughters of young sires were more upward biased than their first lactation PI's.The 30% larger standard deviation in breeding values for the later lactation protein yield may explain only a part of the difference.PI's for young sires were 1.24 kg (57%) more upward biased than their PI's for first lactation.For PI's of daughters of young sires the difference was even larger (Table 4).However, using either first three lactations (RRM-3) or all lactations (RRM) did not affect the difference in the bias between the PI's for first and the PI's for the later lactations.On the contrary, adjustment for heterogeneous variance resulted similar bias across lactations.Heterogeneous variance adjustment reduced the bias in first lactation young sire PI's by 23% and in later lactation young sire PI's by 42%.
The size of the bias in the PI's for daughters of young sires was best expressed in daughters with full records in the recent data.The largest bias was found for daughters born in 1995, which had on average 22 TD observations, whereas the bias was considerable smaller for daughters born in 1997, which had on average 11 TD-observations.For instance, from the RRM evaluation with all data, the combined PI's for daughters of young sires were overestimated by 2.4 kg in the birth group 1995 but only by 0.9 kg in the birth group 1997.Estimates for the birth year 1995 are closer to true bias since dams EBV from both evaluations were more reliable.Dams of the daughters had the possibility to have at least the first lactation completed in the earlier evaluation and at least the fifth lactation completed in the recent evaluation.

Discussion
The use of TD-yields in the breeding value estimation yielded about the same size of bias in PI's as when 305-d yields were used.This was contrary to our expectation that a better modelling of the herd environment by the TD model would remove some sources of the bias.Apparently, the seasonal changes in the herd environment, which were modelled in the RRM but not in the RPAM, had no effect on the magnitude of the bias, unless the effect was equalled out by other effects of the RRM.Lidauer et al. (2003) found a 4 to 9% increase in the standard deviation of cow EBV's when using a TD model rather than a 305-d yield model for the breeding value estimation.They used the same heritability for both models and argued that the increase in the standard deviation was due to the better modelling of the environment, which increases the reliabilities of the EBV's.In our study the heritability in RPAM was higher than the one used in RRM.However, the standard deviations of young sires' EBV's were of the same magnitude in both, RRM and RPAM evaluation; 12.4 kg and 11.7 kg, respectively.Thus, the RRM must have compensated the lower heritability by a better modelling of the environment.However, this did not reduce bias.A reason might be that the herd TD was modelled as random effect in RRM, which follows the same concept as in RPAM of modelling random subclasses (herd-year) within fixed main classes (herd-5-year periods).Another reason might be that the RRM gives more weights to own performances as was shown by Mrode et al. (2006).This will put more emphasize on Mendelian sampling deviations, which increases bias in case of preferential treatment.
The bias in 305-d yield PI's found here was less than reported by Uimari andMäntysaari (1993 and1995).They found a 5.2 kg and 13.6 kg upward bias in PI's of Ayrshire young sires in their first and later investigation, respectively.They argued that the difference was a result of the higher heritability used in the later investigation (0.25 versus 0.30) and the different set of bulls used.Based on a follow-up study (Mäntysaari and Sillanpää 1993) the model was modified to RPAM used in this study.Lidauer and Mäntysaari (1996) used the same modified model and heritability as used here (RPAM), to investigate bias in PI.They reported an upward bias of 2.2 kg in the PI's for daughters of Ayrshire young sires, which is in agreement with our result of a 2.1 kg upward bias for the birth year group 1995 when applying RPAM.Further, they found that the PI's for daughters of proven sires were almost unbiased (0.2 kg).The corresponding value from our study for the birth year group 1995 was -0.6 kg for RPAM but was -1.5 kg for RRM.The found downward bias can have several reasons.The breeding values of proven sires from the earlier evaluation were required to have an accuracy of over 0.9 and therefore should not be biased.Also the dams of the daughter group born in 1995 received from the earlier evaluation breeding values based on own first and later lactation information.However, in progeny testing it is common that young sires are mated with cows that perform below the herd average.If these cows and their daughters face a below average herd environment their sire's first proof might be undervalued.In the case that the progeny tested sire is selected to produce a second crop of daughters, it will be mated with cows that perform mainly above the herd average, and the daughters might enjoy a more preferable herd environment, which in turn would rise the sires breeding value.Similarly, Uimari and Mäntysaari (1993) found that the EBV's for proven sires increased by 1.4 kg when adding data with the second crop daughters.This would explain to some extent the downward bias in the PI's but it does not explain the difference in bias of 0.9 kg found between RPAM and RRM.One reason could be the better ability of RRM to model lactation in progress, which yields a faster increase in standard deviation of EBV when information accumulates (Lidauer et al. 2003).
The inclusion of TD observations from the fourth lactation onwards increased the bias in the PI's of young sires between 5 and 11%.The inclusion of the later lactations increased the amount of observations for the young sires' bull dams, but not for the young sires' daughters.Thus, the increase in bias was caused by an increased upward bias in the EBV's of bull dams.Whenever a young cow is selected to become a bull dam, it will belong to the most valuable cows of the herd.Naturally, the farmer will provide best management to this cow group.Consequently, the deviation of the milk yield of a bull dam from the herd mean is not only affected by the cow's genetic potential.This is referred as preferential treatment (e.g.Kuhn et al. 1994), which is one source of bias in EBV's.An estimation of breeding values based on observations from the first lactation only would diminish this problem but at the cost of discarding most of the available records.The better management for bull dams explains to some extent the higher bias in the later lactation PI.A bias in PI caused by preferential treatment will be more pronounced with RRM, since RRM puts more weight on yield and less on pedigree infor-mation compared to RPAM.Mrode et al. (2006) showed that upgrading a single trait animal model to a single trait RRM increases significantly the contribution of the cow's own performance to the cow's EBV.Applying a heritability of 0.57 for both models, they found for all cows without progenies an average increase in the contribution of the yield deviation from 69% to 86%, whereas the contribution of pedigree information decreased from 31% to 14%.
A significant source of the found bias was caused by heterogeneous variance across herds.Accounting for heterogeneous variance reduced bias by 35%.Similar, Meuwissen et al. (1996) reported a 38% reduction in bias of repeatability animal model PI's when accounting for heterogeneous variance.In our study bias was larger in later lactation PI's but also reduction of bias due to heterogeneous variance adjustment was larger in later lactation PI's.This implies that there is a larger source for heterogeneous variance in the later lactations, which agrees with preferential treatment of bull dams in their later lactations.The effect of heterogeneous variance found here is in agreement with the finding by Uimari and Mäntysaari (1995) who found that PI's were highly biased for bulls coming from small herds with a high within-herd variance.Contrary to our study, Van Steenbergen et al. (2006) reported a very low benefit of heterogeneous variance adjustment to reduce bias.They found a bias in bull dam EBV's for protein yield of 18.1 kg when evaluation was based on a RR TD model without adjustment for heterogeneous variance.When adjusting for heterogeneous variance, the bias was reduced by 4% only.They suggested normalizing the distribution of the residuals and Mendelian sampling deviations by a Gaussian transformation, which yielded a reduction in bias of 33%.
Too high heritabilities may be another source of bias in EBV's was found by Uimari and Mäntysaari (1993).However, the heritabilities used in this study are moderate compared to heritabilities often reported for other RRM (De Roos et al. 2004, Thompson et al. 2005).

Conclusions
The use of TD observation instead of 305-d yield observations for the breeding value estimation did not reduce a bias in pedigree indices.The better modelling of the herd environment increased the standard deviation of the cows' EBV's, which in turn also increased the size of the bias.Including the TD observations from the fourth lactation onwards yielded a moderate increase in the bias for young sires.This was expected because later lactation yields of bull dams are often inflated due to a more preferable environment.Bias in bull dam EBV's was significantly reduced when accounting for heterogeneous variance, which in turn reduced bias in PI's of young sires.
s(d) F , s(d) L , t(d) F and t(d) L were derived using a two step approach (van derWerf et al. 1998) that derived CF based on multiple-trait estimates.For the multiple-trait variance component estimation TD observations from five different DIM windows along the course of the first and of the second lactation were defined as ten different traits.The DIM windows within a lactation were: DIM5-20, 31-60, 121-150, 211-240, and 301-330.
Variance in kg 2 (diagonal) and correlation between traits (upper triangle) for first lactation milk (M F ), protein (P F ) and fat (F F ), and later lactations' milk (random herd × test-day and random residual effects used in the test-day model.

F
day yields by a sample of days in milk (DIM).

F
not significant; ** P < 0.01; for all other values P < 0.001 Table4.Mean difference between breeding values for protein yield of earlier evaluation (PI) and recent evaluation (EBV recent ) for different groups of Ayrshire by different models, traits, and data sets.Single trait repeatability animal model (RPAM) including 305-d lactation yields from first three lactations; random regression testday model when including test-day (TD) observations from first three lactations (RRM-3); from all lactations (RRM); and multiplicative RRM including observations from all lactations (M-RRM).Traits in test-day models were: first lactation protein (P

Table 2
. Daily heritabilities, and genetic and phenotypic correlations across lactations and traits for a sample of DIM are given in Table3.Heritabilities for 305-d lactation yields constructed from daily variances (DIM = 15, 45, …, 285) were 0.42, 0.28, and 0.29 [0.34, 0.27, and 0.30] for milk, protein, and fat yield of first [later] lactations, respectively.The "combined heritability" for the average of ten TD observations from the first plus ten TD observations from the second lactation was 0.44, 0.33, and 0.35 for milk, protein, and fat, respectively.