Transcriptome profiling of Finnsheep ovaries during out-of-season breeding period

1Green Technology, Natural Resources Institute Finland (Luke), Myllytie 1, FI-31600 Jokioinen, Finland 2Department of Biology, University of Eastern Finland, Yliopistonranta 1 E, FI-70211 Kuopio, Finland 3Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Gerda Ulls Väg 26, 75007 Uppsala, Sweden 4CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beichen West Road No. 1–5, Chaoyang District, 100101 Beijing, China email: menghua.li@ioz.ac.cn


Introduction
The native Finnsheep breed demonstrates several economically important and physiologically interesting traits such as large litter size (i.e.multiple births) and out-of-season breeding and, thus, has been exported to more than 40 countries (Maijala 1988).The large litter size in Finnsheep (on average 2.4-2.9 lambs) is most likely associated with increased ovulation rate during the oestrus cycle and successful subsequent embryonic and foetal development (Maijala and Österberg 1977, Fahmy and Dufour 1988, Li et al. 2011, Vinet et al. 2012).Mutations in functional genes such as bone morphogenetic protein 15 (e.g.BMP15 on OARX), bone morphogenetic protein receptor, type 1B (BMPR1B on OAR6) and growth differentiation factor 9 (GDF9 on OAR5) have been shown to account for the increase in ovulation rate and litter size in sheep (McNatty et al. 2005, Vinet et al. 2012, Mullen et al. 2013, Våge et al. 2013).
The majority of Finnsheep ewes also have an exceptional ability to breed out-of-season, i.e. they can conceive all the year around instead of only once in autumn, typical for most breeds, allowing year-round breeding (Sormunen-Cristian and Suvela 1999).Functional genes accounting for the off-season oestrus in Finnsheep have not yet been identified although a few candidate genes for the trait have been found in some other prolific sheep breeds.For example, Chen et al. (2012) identified a set of candidate genes for out-of-season breeding in the Hu sheep (an indigenous Chinese breed), such as neurotrophic tyrosine receptor type 2 (NTRK2), nidogen 1 (NID1) and serine protease inhibitor clade E member 2 (SERPINE2).Moreover, Notter et al. (2003) and Mateescu and Thonney (2010) found another candidate gene, melatonin 1A receptor (MTNR1A), for out-of-season breeding in the Dorset sheep.
Earlier investigations showed that functional genomics elements underlying complex traits can be efficiently explored by whole-transcriptome analysis, using next-generation sequencing technologies (Twine et al. 2011, Husseneder et al. 2012, Ramayo-Caldas et al. 2012, Xu et al. 2013).We have commenced a sheep fertility study which is based on whole-genome transcriptome and single nucleotide polymorphism (SNP) marker analyses.In the present study, we present our first results on transcriptome profiling of Finnsheep ovaries and test the applicability of experimental design and methods prior to analyses of a larger, main data set.The ovaries were collected during the out-of-season breeding period (June) from two Finnsheep ewes.To the best of our knowledge, this is the first transcriptome profiling study on the exceptionally prolific Finnsheep breed, which represents a valuable animal genetic resource for the global sheep industry.

Sample collection
Animal handling procedures and sample collection were conducted following both the Finnish laws and EU directives regarding animal experimentation and occupational health.Ovarian mRNA expression profiles were investigated in two Finnsheep ewes.The two ewes were born in January 2011 on the same farm located in western Finland and were closely related (coefficient of relation between the ewes was 0.375 as estimated from the pedigree records).One ovary from each of the ewes was collected after slaughter during June 2012.The samples were decoded and hereafter are referred as S48 and S50.

RNA isolation and sequencing
Ovaries were dissected into 4 sections and stored in RNAlater reagent (Qiagen, Valencia, CA, USA) at -20 °C until use.Total RNA was extracted using the RNeasy plus mini kit (Qiagen, Valencia, CA, USA).RNA concentration and RIN (RNA Integrity Number) values were checked using a Bioanalyzer 2100 (Agilent Technologies).cDNA libraries were prepared and sequenced using Illumina HiSeq 2000 (Illumina, San Diego, CA) at the Institute of Molecular Medicine Finland as detailed in Koskela et al. (2012), generating 100-bp paired-end reads.The RNA-Seq sequencing data have been deposited in the Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra/) with the accession number SRX791949.

Transcriptome assembly and annotation
The used bioinformatics pipeline is illustrated in Figure 1.The quality of reads was assessed using FastQC version 0.10.1 (Simon Andrews).The sheep reference genome (Oarv 3.1) and reference genes were downloaded from Ensembl database (Ensembl Release 74).We mapped the obtained reads to the ovine reference genome and gene sequences using Tophat v2.0.9 (Kim et al. 2013).Cufflinks v2.1.1 (Trapnell et al. 2010) was used to assemble transcripts from the RNA-Seq alignments.In addition, two different assemblies for each sample were compared and combined with reference transcriptome to identify unknown transcripts in the data using the Cuffcompare program of Cufflinks.All reads that exclusively mapped to ovine genes were quantified using HTSeq-count software package (Anders et al. 2015).Gene expression levels of read counts obtained from HTSeq-count were measured in the R statistical environment using the DESeq Bioconductor package (Anders and Huber 2010).We used Blast2GO (Conesa et al. 2005) for retrieving Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and analyzing annotation data.Ensembl BioMart (Kinsella et al. 2011) was used to retrieve Gene Ontology (GO) terms associated with all expressed genes and updated annotation for Ensembl IDs with missing information.We annotated unknown transcripts, i.e. all transcripts that did not match to sheep genes, by implementing BLAST search and InterPro query in Blast2GO.Genes associated with fertility and out-of-season breeding were analyzed in detail.Both the nucleotide and amino acid variants in three major candidate genes (GDF9, BMP15 and BMPR1B) were identified and assessed visually using the program Integrative Genomics Viewer (IGV) version 2.3.26(Thorvaldsdóttir et al. 2013).

Transcriptome sequencing and assembly
A total of 192554342 RNA-Seq reads representing the two studied samples, S48 (86966348 reads) and S50 (105587994 reads), were generated by Illumina sequencing.The overall quality of the reads was very good with an average quality score across all bases greater than 28.Thus, we did not perform any filtration and trimming of the data.GC contents for S48 and S50 were 54% and 55%, respectively.High quality reads were aligned to the ovine genome using Tophat v2.0.9.Without any filtering for base quality scores, 68.6% and 68.2% of the RNA-seq reads from the S50 and S48, respectively, were mapped to the ovine reference genome.

Gene expression profiling
The recent ovine reference genome (Oarv3.1,December 2013) of Ensembl was used for annotating and counting reads that mapped to genomic features.Altogether 14875 genes were expressed in both samples with the base mean expression values of >10.The ten most highly expressed genes (COL1A1, COL1A2, COX1, ENSOARG00000000004, ENSOARG00000006149, COL3A1, EEF2, ACTB, INHA and SERPINE2) contributed to 9.1% (2272788) of the reads in the S48 and 11.5% (3691546) of the reads in the S50, respectively.INHA, a candidate gene for fertility and SERPINE2, a proposed candidate gene for out-of-season breeding, were the two exceptional highly expressed genes in S50.
Out of 14875 Ensembl genes expressed in this study 2261 did not have genes names.We converted IDs with missing gene names into RefSeq protein ID using BioMart and extracted the corresponding gene names from the Ref-Seq database.We were able to find gene names for 1465 Ensembl IDs including 562 genes with uncertain functions (those starting with LOC and 202 duplicated IDs).

Gene annotation
Genes expressed in the data were associated with 80261 GO terms which can be classified into three categories biological processes, cellular component and molecular functions (Fig. 2).The majority of these terms (41663) were associated with biological processes whereas 19895 terms were related to cellular components and 18703 to molecular function.We further expanded three main categories into lower level classification to understand what kind of processes and functions were linked with the genes expressed in the data.From the list of top ten terms in each category, it was found that protein binding is the most common ontology term which is represented by 5572 genes (Table 1).The analysis revealed that expressed genes were mainly involved in differentially regulating transcription processes followed by developmental processes such as cell proliferation and embryonic development.Similarly, nucleus is the most frequent cellular component term annotated to expressed genes.The obtained transcripts and genes were related to 129 KEGG pathways.Table 2 shows the ten most abundant pathways with corresponding number of sequences and enzymes.The results indicated that the purine metabolism pathway comprised the highest number of sequences (566 in the S48; 531 in the S50) and enzymes (58 in the S48; 54 in the S50) in both samples.The number of enzymes in most of the pathways was same although the number of sequences was different.Altogether 5606 unknown transcripts were present in the data with 3060 sequences known in other species and the rest 2546 did not match any sequences in the non-redundant (nr) sequence database.Out of 3060 transcripts that shared sequence similarity with genes found in other species, gene ontology terms were available for 374 transcripts.Interestingly, BLAST searching showed that as many as 225 transcripts were part of the Orf virus (ORFV) genome (Fig. 2).In the list of transcripts, the ORFV was the third most common genome after bovine and yak.Presence of the viral genes was verified using BLAST search for each assembled transcriptome (S48 and S50) separately.
Fig. 3. Distribution of unknown transcripts that did not align to the ovine genes in Ensembl database.

Analysis of candidate genes
We further examined the expression level of the 17 putative candidate genes, ten for fertility and seven for out-of-season breeding (Table 3).Genes such as INHA, INHBA and SERPINE2 have differential mean values and seem to be up-regulated in S50.However, we were unable to generate any conclusions regarding these genes due to the limited number of samples.B4GALNT2 one of the candidate genes for high litter size, and SERPINE2, another candidate gene for out-of-season breeding were not expressed in the data.Mutations in GDF9, B4GALNT2, BMPR1B and BMP15, have been shown to affect litter size and species-specific rates of ovulation (Davis 2005, Fogarty 2009, Demars et al. 2013, Våge et al. 2013).Compared to the ovine reference genome, five SNP mutations were present in GDF9.At the amino acid level, instead of the V371M mutation recently reported by Våge et al. (2013), we found the K241E substitution.Similarly, five SNPs were detected in the BMPR1B gene with three non-synonymous mutations (Q249R, K299E and E461G).However, the BMP15 gene was found to be monomorphic and no causative mutations were detected in the data.

Discussion
The ovary is one of the key tissues in a complex regulatory network affecting fertility and breeding in mammalian species.We profiled ovarian transcriptomes using biopsies of two ewes of the native Finnsheep, one of the most fertile sheep breeds in the world.Using the recent Ensembl annotation, we identified a total of 14875 genes expressed in the data.The genes expressed in the data represented the majority of the genes found in ovaries of other mammals such as human and mouse.This result is in good agreement with the recent study by Bonnet et al. (2013), where 15349 genes were identified in the sheep juvenile ovary by applying a laser capture dissection approach.Due to smaller sample size, although we found genes which were differentially expressed between two samples, no further analysis were done.However, these findings will be considered in future experiments.
The samples were collected during the out-of-season breeding period.All six candidate genes -NTRK2, NID1, FOXO1, HTRA1, SERPINE2, PPAP2B -which were putatively associated with out-of-season breeding capability (Chen et al. 2012) were expressed in the samples.These putative genes associated with out-of-season breeding have not yet been well supported by experimental validations and thus the results need to be supplemented with more samples and experimental studies.In addition to the genes associated with out-of-season breeding, we were interested in the expression patterns of fertility genes and their polymorphisms in the Finnsheep.Nine of the ten putative genes (GDF9, BMP15, BMPR1B, INHA, INHBA, PRLR, FSHR, ESR1, ESR2) associated with fertility were expressed in both ewes, but the recently suggested candidate gene B4GALNT2 was not expressed.In the two samples, the causative mutations at the fertility genes GDF9, BMP15 and BMPR1B affecting ovulation rate and litter size (Vinet et al. 2012, Våge et al. 2013) were not found.This observation needs to be verified with a larger dataset, but the results support the hypothesis that the mutations at major genes may not explain the exceptional prolificacy of the Finnsheep.
In addition to the candidate genes, the data provide insights into biological pathways in the sheep ovary during the out-of-season breeding period.Follicles, the functional units of the ovary, are made up of an oocyte surrounded by one or more layers of somatic cells, namely theca cells and granulosa cells.Early stage development of follicles is mediated by two members of the transforming growth factor β (TGF-β) family, GDF9 and BMP15 (Dong et al. 1996, Carabatsos et al. 1998, Galloway et al. 2000).
In the data, 5606 unknown transcripts were also expressed (Fig. 2).These matched to genomic sequences of related ruminant species (taurine cattle, yak and goat), but also to domestic pig and ORFV.ORFV is one of the four viruses in the genus Parapoxvirus (PPV) of Poxviridae family, causing contagious ecthyma (orf), primarily in sheep and goat (Crumbie 1998).Recent studies have shown the importance of retrovirus in sheep reproduction and development (Dunlap et al. 2006).In addition, endogenous retroviruses have been used as genetic markers to explore sheep domestication history (Chessa et al. 2009).Thus, further characterization of those transcripts might provide valuable insights on adaptation of Finnsheep in extreme climate or its exceptional reproductive performance compared to other breeds.

Conclusions
Here we present the first transcriptome profiling of the Finnsheep, which is a valuable sheep breed for global sheep farming.In the two studied ovaries from Finnsheep ewes we identified transcripts of 14875 genes.Also, majority of the putative candidate genes related to fertility and out-of-season reproduction were present in both samples.Presence of the majority of putative candidate genes associated with out-of-season breeding in our data not only strengthens previous findings, but also enables further research in this area.Presence of ORFV genes in the data suggests its retro-viral existence in the genome of sheep.Our results showed that ovaries can be a very good candidate tissue for reproductive genomics studies utilizing high throughput sequencing techniques and bioinformatics methods.

Fig. 1 .
Fig. 1.RNA-Seq pipeline.Flowchart illustrates the bioinformatics pipeline used to analyse the RNA-Seq data.Arrows indicate data flow; analysis steps are marked by hexagons; rectangles indicate output data and rounded rectangles indicate tools and programs.
Fig. 2. Gene Ontology categories.The figure represents the pie chart distribution of three GO categories, namely, biological processes, cellular components and molecular function.All 14875 genes expressed in the data were used for the ontology analysis.

Table 1 .
Gene Ontology analysis.The table represents top ten GO terms for biological processes, cellular component and molecular function.

Table 2 .
KEGG pathway analysis.The table shows top ten KEGG pathways and corresponding number of gene sequences (#Seqs) and enzymes (#Enzs) in the data.

Table 3 .
Expression levels of the candidate genes.The first nine genes are associated with fertility and the rest with outof-season breeding.Base mean values are the weighted mean values computed by DESeq and p adj is the adjusted p value.