A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 297 Snp_blup_rel: software for calculating individual animal SNP-BLUP model reliabilities Hafedh Ben Zaabza, Esa A. Mäntysaari and Ismo Strandén Natural Resources Institute Finland (Luke), FI-31600 Jokioinen, Finland e-mail: hafedh.benzaabza@luke.fi The snp_blup_rel program computes model reliabilities for genomic breeding values. The program assumes a sin- gle trait SNP-BLUP model where the breeding value can include a residual polygenic (RPG) effect. The reliability calculation requires elements of the inverse of the mixed model equations (MME). The calculation has three steps: 1) MME calculation, 2) MME coefficient matrix inversion, and 3) reliability computation. When needed, the inverted matrix can be saved after step 2. Step 3 can be used separately to new genotypes which do not contribute informa- tion to Step 2. When an RPG effect is included, an approximate method based on Monte Carlo sampling is applied. This reduces the MME matrix size and allows including many genotyped individuals. The program is written in For- tran 90/95, and uses LAPACK subroutines which enable multi-threaded parallel computing. The program is efficient in terms of computing time and memory requirements, and can be used to analyze even large genomic data. Thus, the program can be used in calculating model reliabilities for large national genomic evaluations. Key words: genomic evaluation, model reliability, SNP markers Introduction The dairy cattle evaluation community is increasingly relying on estimated breeding values (EBV) based on genom- ic information. Two equivalent genomic models (Strandén and Garrick 2009) are available for calculating genomic EBV and their reliabilities: genomic best linear unbiased prediction (GBLUP) and marker effects models. Currently the most common genomic markers are single nucleotide polymorphisms (SNP), and the marker effects models are, thus, generally used to predict SNP effects (SNP-BLUP). Calculation of genomic reliability for individual EBV by GBLUP requires inverting the coefficient matrix of the mixed model equations (MME) that include the inverse of the genomic relationship matrix. These matrix inversions become infeasible as the number of genotyped animals increases (Fernando et al. 2016). In contrast, the MME matrix size of SNP-BLUP is bounded by the number of SNP markers. Thus, SNP-BLUP can scale to large numbers of genotyped animals better than GBLUP. In SNP-BLUP, the inverse of the MME coefficient matrix provides the prediction error (co)variances (PEV) of the SNP solutions, which can thereafter be used to derive PEV for the EBV. While the computation of reliabilities with GBLUP requires the inclusion of all animals into the genomic relationship matrix, the SNP-BLUP model involves two stages: 1) PEV for the SNP effects are estimated using the genotypes of phenotyped animals, and 2) reliabil- ity of the EBV of each genotyped animal is computed using the PEV of the SNP effects without the need to con- sider its phenotype. Typically, genetic variation cannot be completely detected by SNP markers because of the incomplete linkage dis- equilibrium between the used SNP markers and the quantitative trait loci of the evaluated trait. A residual poly- genic (RPG) effect is often included to account for the variance not captured by SNP markers. However, including an RPG effect into the SNP-BLUP model leads to an increase in the size of the MME by the number of genotyped animals (Liu et al. 2016). This can be avoided by an approximation based on Monte Carlo (MC) sampling of pseu- do markers that construct the relationships among animals (Ben Zaabza et al. 2020). In this paper we will describe an efficient program, called snp_blup_rel, which allows the calculation of reliabili- ties of EBV by SNP-BLUP, both with and without an RPG effect. A single trait SNP-BLUP with a general mean is as- sumed. If multiple traits (with different training population and/or variance components) need to be analyzed, then each trait will require its own inverse of the coefficient matrix of the MME. The snp_blup_rel program has been used in earlier studies (Liu et al. 2017, Ben Zaabza et al. 2020), but here we concentrate on describing the program itself. Firstly, we present some theoretical background and the algorithm used in the implementation. Then, we describe the most important options, and give examples of genomic reliability calculations. Manuscript received June 2020 A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 298 Theoretical background EBV reliability in a genomic model Consider a GBLUP model (1) where y is an n×1 data vector, 1 n is a vector of n ones, µ is the unknown overall mean, u G is a n×1 vector of addi- tive genetic effects; e is a n×1 vector of residuals. Assume that and where is the addi- tive genetic variance, and is the residual variance. The R matrix can be assumed to be diagonal with elements , where wi is the weight for observation i. When both the marker and the pedigree information influence the genomic relationship matrix G w , it can be written G w =(1-w)ZZ´+wA 22 , where w is the RPG proportion, Z is an n by m marker matrix of centered and scaled genotypes, and A 22 is the pedigree relationship matrix among the gen- otyped animals. Common centering and scaling approaches for the Z matrix are called VanRaden (2008) methods 1 and 2. In the Z matrix, column k for a marker has value , , and for genotypes BB, AB, and AA, respectively, where p k is the (base population) allele frequency of marker k. In method 1, , and in method 2, . An equivalent SNP-BLUP model to the GBLUP model (1) is (Ben Zaabza et al. 2020) (2) where , and , u is a vector of the unknown pseudo marker effects, g is a vector of the SNP marker effects, and a is a vector of the effects for the RPG based breeding values. Thus, u G =Wu. It is as- sumed that and . MME for model [2] can be written as (3) where and . Denote the inverse of the coefficient matrix in MME (3) by where Cuu is the submatrix associated with the genetic effects û. Then, the estimated genomic breeding values are . Reliability for the estimated genomic breeding value of genotyped animal i is (4) where {G w } ii is diagonal element i in matrix G w , and w i is the row in the W matrix corresponding to the i-th geno- typed animal. Note that the dimension of MME is m+n, i.e., size of this matrix increases by the size of number of genotyped animals. Residual polygenic effect by Monte Carlo The SNP-BLUP model (2) can be approximated with the aid of Monte Carlo (MC) sampling (Ben Zaabza et al. 2020): (5) where , U is an n×n MC matrix of the MC samples for genotyped animals, is a vector of (m+n MC ) unknown (pseudo) marker effects, of which m are due to the SNP markers and n MC approximate the RPG breeding values. Breeding values are u G ≈Sv. It is assumed that . We use MC samples y=1nµ+uG+e, σu 2 σe 2 1 sk �1-2pk� 1 sk �0-2pk� 1 sk �2-2pk� sk=�� 2pl�1-pl� m l=1 sk=�m2pk�1-pk� y=1nµ+Wu+e, � 1n ' R-11n 1n ' R-1W W'R-11n W 'R-1W+λΩ-1 ��µ� u� � = � 1n ' R-1y W'R-1y �, ri 2=1-λ wiC uuwi ' {Gw}ii , y=1nµ+Sv+e, S=��(1-w)Z √wU� v'=�g' gp ' � W=[�(1-w)Z √wIn] g~N(0, Imσu 2) a~N(0, A22σu 2) λ= σe 2 σu2 Ω=� Im 0 0 A22 � �C μμ Cμu Cuμ Cuu � uG�=Wu� v~N(0, Im+nMCσu 2) uG~N(0,Gwσu 2) e~N(0,Rσe 2) Rii = 1 wi u' = [g' a'] A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 299 , where a i~N(0,A22), i=1,…,nMC, and nMC is the number of MC samples. Note that the variance of a is A 22 =Var(a)=E(aa’)-E(a)E(a)’=E(aa’), which is equal to UU’, where the additive genetic variance is assumed to be equal to 1 (see Ben Zaabza et al. 2020). The MME for the MC-SNP-BLUP model (5) is (6) Denote the coefficient matrix of this MME by C s and its inverse matrix elements as . Approximate re- liability for EBV of animal i is , (7) where s i is row i in S, and {G s } ii is diagonal element i in the genomic relationship matrix G=SS´=(1-w)ZZ´+wUU´. When n MC is large, UU’ tends towards A 22 . For more details on this model, see Ben Zaabza et al. (2020). Algorithm and implementation In the snp_blup_rel program, centering of the Z matrix can be based either on the current genotypes or on given allele frequencies, e.g., base population allele frequencies estimated by a method like the GLS model (McPeek et al. 2004, Strandén et al. 2017). Ideally, the base population allele frequencies are used in the genomic relation- ship matrix (VanRaden 2008, Liu et al. 2017). The RPG effect can be included into the model by two alternative approaches. In the exact approach, the pre- calculated A 22 matrix is read into memory, inverted, and used in the MME (3). In the other approach, MC sampling is used to approximate the RPG effect as in model (5) and MME (6). When size of the A 22 matrix is large, comput- ing time by the exact approach can be substantially longer than in the approximate MC-SNP-BLUP approach, be- cause the computing time depends on the size of the MME coefficient matrix. However, in contrast to the exact approach, the Monte Carlo-based sampling method to estimate the reliability of the SNP-BLUP model including the RPG effect is very fast, because the sampling can be implemented efficiently by using the full pedigree and retaining only samples pertaining to the genotyped animals (Ben Zaabza et al. 2020). A disadvantage of MC-SNP-BLUP is that the computed reliabilities are inflated when the number of MC samples is too low. The inflation is tolerable when the number of MC samples is about the number of SNP markers and the proportion of RPG effect w is about 20% or less. In addition, the minimum number of MC samples depends on the number of genotyped animals and on the population structure (Ben Zaabza et al. 2020). When the RPG pro- portion w is high, the reliability calculations depend more on MC sampling. Thus, a large RPG effect proportion w needs many MC samples to reduce the inflation. However, the number of MC samples needed to give a suffi- ciently accurate genomic reliability approximation is typically much less than the number of genotyped animals (Ben Zaabza et al. 2020). In snp_blup_rel, all matrix inversions, such as the inversion of the MME coefficient matrix and the A 22 matrix, use LAPACK (Anderson et al. 1999) subroutines DPOTRF and DPOTRI. When computing the MME, the W matrix in (3) or S matrix in (6) can be stored in memory which allows the use of DGEMM subroutine in making the matrix prod- uct W´R-1W or S´R-1S efficiently. The snp_blup_rel implementation employs LAPACK subroutines from the Intel Math Kernel Library (Intel Math Kernel Library Reference Manual 2014). These allow taking advantage of parallel computing on a multi-core computer. Input files A short description of the input files and some options are in Tables 1 and 2. The snp_blup_rel program can be given six different input files depending on the model and computational approach: 1) A genotype data file. This always needs to be given. In the file, the first column is the individual ID code, fol- lowed by the marker genotypes. The file format is described later. 2) An optional (-a) file having the allele frequencies can be given to center the genotypes. The file format is de- scribed later. � 1n ' R-11n 1n ' R-1S S'R-11n S 'R-1S+λIm+nMC ��µ� v� � = � 1n ' R-1y S'R-1y � ri 2=1-λ siCs vvsi ' {Gs}ii Cs -1= � Cs µµ Cs µv Cs vµ Cs vv� U= 1 �nMC [a1 ⋯ anMC ] A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 300 3) An optional (-o) data file is needed when weights or observations are specified. If this file is not given, all geno- typed animals are assumed to have an observation. The data file must include at least the individual ID code. The default column for the ID code is one. When no column for observation weight is given, default weight is one. Thus, when a data file is given, genotyped animals that appear in the data file can be considered reference ani- mals and those without observation can be considered candidate animals. 4) An optional (-A) file having A 22 matrix is needed when the RPG effect is accounted exactly in the model reliabil- ity calculations. The file format is described later. 5) An optional (-ped) pedigree file is needed when an RPG effect is accounted by the MC-SNP-BLUP approach. The file has three columns having integer numbers for individual, sire, and dam ID codes. 6) An optional (-F) file having inbreeding coefficients can be given to account for inbreeding coefficients (Mende- lian sampling term) when simulating the MC samples in MC-SNP-BLUP. In this file, the first column has the indi- vidual ID code, and the inbreeding coefficient is by default in the third column. The marker data in the genotype file is used to build the Z matrix. By default, the file is assumed to have the allele counts: zero, one and two for each genotype. There is some flexibility in the genotype file format. However, for the option “-m raw” the genotypes are floating point numbers which allows reading a pre-calculated user speci- fied Z matrix. By default, the genotype file is assumed to have space delimited numbers for the genotypes of an individual on the same line. The first column in the genotype file is always the ID code number. By default, the genotypes are assumed to be space separated floating point numbers. The option “-int” informs that the genotypes are space separated inte- ger numbers. However, the option “-nospace” assumes genotypes to be single digit integer numbers without the space separator. Note that this option assumes that the first column has the ID code number which is separated by at least one space from the marker data. Alternatively, when all columns in the file are integers, the space re- quirement can be relaxed by giving a Fortran type format. For example, “-FMT ‘(i7,1x,50000i1)’” assumes a sev- en-digit ID code, one space, and 50000 single digit genotypes. Giving the right number of markers, here 50000, is not critical. In fact, when a too large number is given, the program calculates and uses the number of markers on the first row in the genotyped file. The option “-s first last jump” allows selecting a subset of markers in the specific region to be included in the analysis from the genotype file. The ‘-s’ option allows omitting any non-gen- otype columns before the first genotype column. For example, giving “-s 4 0 0” will start from column 5 onwards all columns as markers when the zeros are used for default values (i.e. “last” replaced by the number of markers, and “jump” by one). The pedigree relationship matrix A 22 has to be provided in a file if the RPG effect is accounted exactly (model [2]). Two matrix formats are supported. In the co-ordinate format (“-A file”), the file has non-zero elements of the up- per or lower triangle of the matrix but not both. Each line has three space separated values. The first two are the integer valued ID codes to identify the individuals which the relationship coefficient concerns. The third column has the relationship value. Table 1. Main options for snp_blup_rel describing the data format and desired models Options Description -nthr n number of threads. Default from variable MKL_NUM_THREADS -o data_file name of the data file. Each line has an ID number of animal with observation -wt col_num column number of weights in the data_file -e col_num column number of effective daughter contributions (EDC) for weight=EDC/λ, -h2 v value of animal model heritability. Assumes -h2s v value of sire model heritability. Assumes -m method genomic data centering/scaling method (PvR1, PvR2, raw …) -a afreq_file afreq_file is file for allele frequencies by markers -s fst lst jmp subsetting by every jmpth marker in the genotype file starting from first (fst) to last (lst) marker column λ= σe 2 σu2 λ= 1-h2 h2 λ= 4-h2 h2 A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 301 Note that each value given in the relationship matrix is added to the upper and lower triangular matrix in memory. So, if the same position or its transpose, e.g., (1,2) and (2,1), is referred to on two different lines, the sum of their values is used. In the lower triangle dense format (“-Alower file”), the first row has size of the matrix. The second row has the ID code numbers. The rest of the file has the lower triangle of the A 22 matrix. An example of this file format is given in Supplementary Materials. Matrix A 22 can be computed by any means that the user desires, e.g., RelaX2 program (Strandén and Vuori 2006) computes the A 22 matrix and allows storing it in either of the formats. Output files The snp_blup_rel program provides an output file which has several columns for each individual. The first column has the individual ID code, the second has the weight of the observation, the third has the PEV, the fourth has the diagonal of the G matrix, and the fifth has the calculated reliability. The user must provide the name of this output file. In addition, the program provides a detailed summary of the analysis to standard output when the program is executed. When the screen output is directed to a file, it is easy to obtain data and model information such as the number of SNP markers and genotyped individuals. Options The snp_blup_rel program offers several options (Tables 1, 2 and 3). The list of options is printed to standard out- put when the program is started without any arguments, or if it is started with the option “-info”. Because the list of options including the data files and result files can become long on a command line, it may be convenient to store the instructions in a file. By choosing the option “-f o_file”, snp_blup_rel reads the options from file “o_file”. Table 2 presents the different options commonly used in both SNP-BLUP and MC-SNP-BLUP models. These options can generally be divided into three types: phenotype input file, genotype input file and method options, and output options. The option “-o” specifies the name of the phenotype data file in which the program finds the ID code numbers of animals with observations. When no phenotype data file is given, each genotype record is assumed to have one observation. When a data file is provided using the “-o” option, the file must have at least a column having the ID codes of animals with data. The default value for the ID code column number is 1, but the option “-id column” can be used to change this. Optional columns include weight of observation (option “-wt column”) and, alterna- tively, effective daughter contribution (EDC, option “-e column”). Generally, the weight or EDC of an observation in the data file should be a positive number. If this is not the case, the weight value is set to zero, i.e., the obser- vation is deleted. The marker data are used to form the marker matrix Z required in the MME and calculation of the model reli- abilities. The option “-m method” allows specifying the available centering and scaling methods. The option “-m PvR1” refers to VanRaden method 1, and “-m PvR2” to VanRaden method 2, which were described in the Mate- rial and Methods section. As already mentioned, the option “-m raw” means using the genotype values stored in the genotype data file. The option “-m 101” means centering by assuming all allele frequencies to be 0.5, and “-m center” means centering by VanRaden method 1. Note that both VanRaden method options (“-m PvR1” and “-m PvR2”) lead to scaling, whereas the use of the other methods by “-m” does not lead to scaling unless specifically requested by the “-c“ option (Table 3). Table 2. Snp_blup_rel options when including RPG effect using exact SNP-BLUP or MC-SNP-BLUP model Options Description -ped file pedigree file when residual polygenic effect is included in the model -F file inbreeding coefficient file for the Mendelian term in MC sampling -Fcol fc define the inbreeding coefficient column in the –F file. Default is 3. -w define the proportion of the RPG in SNP-BLUP model. Default is zero. -MC n define the Monte Carlo sampling size -A file file having pedigree-based relationship matrix A 22 in co-ordinate format. Row has format: -Alower file file having pedigree-based relationship matrix A 22 . Matrix is in lower triangle dense format (see example). A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 302 Scaling factors in the VanRaden methods (“-m PvR1” and “-m PvR2”) are based on allele frequencies which are calculated from the data by default or read from a file by the option “-a afreq_file”. The “afreq_file” should have two columns (marker number, allele frequency) where the allele frequency must be between zero and one. The marker numbers should be consecutive from one to the number of markers. Scaling factor can be changed from the default with the option “-c kval”, Table 3. The “kval” has six alternatives: 1) “2pq” means dividing by as in VanRaden method 1, 2) “m” means dividing by the square root of the number of markers as in VanRaden method 2, 3) “m2” means that the division of markers by , 4) “dA” allows multiplication by the square root of the trace of A 22 divided by the trace of ZZ´, i.e., , 5) “one” means that the diagonals of ZZ´ will be scaled to be on average one (Forni et al. 2011), and 6) “no” means no scaling. Note that when “-m PvR2” is used, each marker is scaled by but the default marker scaling by can be changed using the “-c” option. The RPG effect is included into the model by the option “-w w”, where w is the RPG proportion and must be be- tween zero and one. In order to include the RPG effect, two approaches have been implemented in snp_blup_rel. In the exact approach based on model (2), the A 22 matrix has to be provided and is read from an external file (“-A Afile” or “-Alower Afile”). In the MC based approach by MC-SNP-BLUP model (5), the pedigree information has to be given and is read from a pedigree file using the option “-ped file”. The default number of MC samples is (number of markers)/2, which can be optimal for analysis of a single breed with animals genotyped using, or imputed to, a 50K SNP chip (see Ben Zaabza et al. 2020). However, the number of Monte Carlo samples can be specified by the option “-MC n”, where n is a positive number. The RPG effect always requires specification of either the pedigree or A 22 -file, but can be further specified by a few other options, which are provided in Table 2. For example, in or- der to account for inbreeding coefficients in MC sampling, the inbreeding coefficients need to be given in a file, and the file is named by the option “-F Ffile”. In the “Ffile”, the first column must have the ID code of individual. By default, the inbreeding coefficient is in the third column which can be changed by the option “-Fcol column”. Note that when a pedigree file is given, the inbreeding coefficients file also has to be given. The snp_blup_rel program has been designed to be flexible in terms of memory need. First, there are the options to use approximations, e.g., “-s” and the MC-SNP-BLUP approach. Second, there are three options that control the amount of used memory. The option “-memhigh” stores the Z marker matrix and the MC samples into memory in matrix S (or W in case of model [2]) and uses the efficient DGEMM subroutine in LAPACK to calculate in (7) as a matrix by matrix product . This is the default option. The option “-mem” also has the S matrix in mem- ory but computes by individual such that there is no need to store the result from product . Finally, the option “-memlow N” reads the Z matrix in blocks of N individuals into memory and computes in (4) by in- dividual. Note that this option is currently limited to pure SNP-BLUP and to SNP-BLUP with an exactly accounted RPG effect. Naturally, when model (2) is used, only the Z matrix part in W is in memory and the identity matrix part does not use additional memory. A useful feature of the snp_blup_rel program is its ability to reuse the once calculated inverse of the MME coef- ficient matrix in the calculation of reliabilities for candidate animals. The PEV matrix of SNP effects is included in the inverse of MME and can be written into a file using option “-iCout inv_MME_file”. Subsequently, snp_blup_ rel allows reading the submatrix of the inverse of MME corresponding to the SNP-markers by the option “-iCin inv_MME_file”. Hence, the calculation of the PEV of genomic breeding values for a new set of genotyped animals can be done quickly without the need to invert the coefficient matrix of the MME when no new phenotypes are Table 3. Choices for the scaling of marker columns by option -c kval in the Z matrix Value kval Description 2pq divide by , default for VanRaden method 1 (-m PvR1) m divide by the number of markers or , default for VanRaden method 2 (-m PvR2) m2 divide by dA scale Z such that the average diagonal of ZZ’ will be equal to the average diagonal of A 22 one scale Z such that the average diagonal of ZZ’ will be one no no scaling, default for “-m raw” and “-m 101” �� 2pl�1-pl� m l=1 �m/2 �� 2pl�1-pl� m l=1 �m/2 �𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡(A22) 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡(ZZʹ)⁄ �2pk�1-pk� siCs vv SCs vv wiC uu SCs vv siCs vv √m √m √m A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 303 available. The default format of the inverse MME matrix file is binary. However, when the file name ends in “.txt”, the matrix is written in coordinate or ijv format as a regular text file. This is slower to write and read than the bi- nary format but allows importing the matrix to other programs. Examples We illustrate the use of the snp_blup_rel program in calculating genomic reliabilities in SNP-BLUP and MC-SNP- BLUP models by a small example, see Supplementary Materials for the data. In the first example, the individual reliabilities are calculated by a SNP-BLUP model. The genotypes are in the file markers.snp. We use VanRaden method 1, and a heritability of 40%. This analysis is carried out by the command: Reliabilities and other information are written to file First_rel.out. The file has five columns (Table 4). The first col- umn has the ID code, and the last column has the reliability. In practice, the genotypes of candidate animals can become available after genomic breeding values and their re- liabilities have been calculated for the animals with phenotypes. In that case, the heavy computations have been already done, and the calculation of candidate animal breeding values can be based on already calculated infor- mation when it has been stored to a file. The following program calls describe a simple set up for an evaluation system where the second call for the candidate animals is done later. The first call calculates the reliabilities as in the first example, but requests storing of the inverse of the MME coefficient matrix: snp_blup_rel -iCout First_iMME.out -m PvR1 -h2 0.4 markers.snp First_rel.out In the second call, it is possible to use this inverse matrix for the candidate animals, which in this simple example are the reference animals, without the need to build and invert the MME matrix again: snp_blup_rel -iCin First_iMME.out -m PvR1 markers.snp First_rel_again.out Note that now the program has to be informed that the centering and scaling is by VanRaden method 1 (“-m PvR1”) as it was the case in first stage call. In practice, a different set of genotyped animals is given. This step can be done quickly multiple times for new sets of candidate animals. The inverse of the MME coefficient matrix is stored by default in binary format in order to allow a fast read of this file. Table 4. Output file from the snp_blup_rel program for the example case. The columns are animal ID code (ID), weight of observation (obs.wt), prediction error variance (PEV), diagonal of the genomic relationship matrix (diag(G)), and model reliability (r2). ID obs.wt PEV diag(G) r2 11 1.00 0.31 1.01 0.54 12 1.00 0.21 0.67 0.54 13 1.00 0.31 0.96 0.51 14 1.00 0.19 0.79 0.63 15 1.00 0.30 1.16 0.62 16 1.00 0.31 1.24 0.63 17 1.00 0.18 0.60 0.54 18 1.00 0.15 0.62 0.63 19 1.00 0.29 1.03 0.58 20 1.00 0.30 1.13 0.61 21 1.00 0.15 0.50 0.56 snp_blup_rel -m PvR1 -h2 0.4 markers.snp First_rel.out A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 304 In the second example, individual reliabilities are calculated by a SNP-BLUP model having an RPG effect with 10% weight. The pedigree relationship matrix A 22 is in the lower triangle dense format in file A22L.mat (see Supplemen- tary Materials). The command line has two additional options (“-w 0.1” and “-Alower A22L.mat”) to indicate the RPG effect in the model and the use of the exact RPG effect approach. This analysis is carried out by the command: snp_blup_rel -m PvR1 -h2 0.4 -w 0.1 -Alower A22L.mat markers.snp Second_rel_ w10.out In the third example, the RPG effect is approximated by MC sampling. In addition to the options of the first exam- ple, the number of MC samples (-MC) is set to 10000, pedigree (-ped) is in example.ped, and inbreeding coeffi- cients (-F) are in the file example.inbr. In this example, all animals have inbreeding coefficient of zero. This analy- sis is carried out by the command: snp_blup_rel -m PvR1 -h2 0.4 -ped example.ped -w 0.1 -MC 10000 -F example. inbr markers.snp Third_rel_MC_w10.out In practice, number of MC samples is often less than the number of SNP markers but in this small example, accu- rate MC approximation requires a high number of MC samples. Results from the second and third example have been combined in Table 5. RAM use Table 6 presents the computer memory requirements with different numbers of SNP markers, defined as observed peak RAM need. A dataset of 19757 genotyped animals was used where every animal had one observation. The number of SNP markers was increased from 5000 to 46914. Table 7 shows peak RAM for 46914 SNP markers when the number of genotyped individuals was increased from 5000 to 19757. In nearly all scenarios of the number of individuals, the required amount of peak RAM when using the total number of SNP markers (46914), was larger than the peak RAM used with a fixed number of genotyped animals (19757) and smaller number of SNP markers. Table 5. Results from snp_blup_rel analysis with a 10% RPG effect for the example case. Columns are for two approaches: exact (e) and MC sampling (MC). The columns are animal ID code (id), weight of observation (obs.wt), prediction error variance (PEV), diagonal of the genomic relationship matrix (diag(G)), and model reliability (r2). id obs.wt PEV e PEV MC diag(G) e diag(G) MC r2 e r2 MC 11 1.00 0.33 0.33 1.01 1.01 0.52 0.52 12 1.00 0.23 0.23 0.70 0.70 0.50 0.50 13 1.00 0.33 0.33 0.96 0.96 0.49 0.49 14 1.00 0.22 0.22 0.81 0.81 0.59 0.59 15 1.00 0.32 0.32 1.15 1.15 0.58 0.58 16 1.00 0.33 0.33 1.22 1.22 0.60 0.60 17 1.00 0.22 0.22 0.64 0.64 0.50 0.49 18 1.00 0.20 0.19 0.65 0.65 0.56 0.56 19 1.00 0.31 0.31 1.03 1.03 0.54 0.55 20 1.00 0.32 0.32 1.11 1.11 0.57 0.57 21 1.00 0.19 0.19 0.55 0.55 0.49 0.49 Table 6. Peak RAM use (in GB) for 19757 individuals and an increasing number of SNPs (5000 to 46914) Number of SNPs 5000 10000 20000 30000 40000 46914 RAM (GB) 1.77 3.34 8.14 15.21 24.51 32.24 Table 7. Peak RAM use (in GB) for 46914 SNP markers and an increasing number of individuals from 5000 to 19757 Number of individuals 5000 10000 15000 19757 RAM (GB) 27. 13 28.83 30.58 32.24 A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 305 Computing time The computing times for a fixed number of SNP markers were linearly related to the number of genotyped animals (Fig. 1). The computing times for a fixed number of individuals increased almost quadratically when the number of SNP markers increased (Fig. 2). The timings are based on a multi-core computer with 2 Intel Xeon E5-2680 v. 2 processors (2.8 GHZ; Intel Corp., Santa Clara, CA) for the computations and are limited to use at most 10 threads. Conclusions The snp_blup_rel is a user-friendly program for calculating model reliabilities for simple SNP-BLUP models. All computations in the program are optimized for speed under different memory requirements. Parallel computing is available through multithreading. The snp_blup_rel program integrates efficient and fast algorithms, thus lead- ing to high computing performance. Fig. 1. Computing time (in seconds) for 46914 SNP markers and an increasing number of individuals from 5000 to 19757 Fig. 2. Computing time (in seconds) for 19757 individuals and an increasing number of SNPs 5000 to 46914 A G R I C U LT U R A L A N D F O O D S C I E N C E H. Ben Zaabza et al. (2020) 29: 297–306 306 Availability and requirements The snp_blup_rel program comes free of charge for the scientific community and for Interbull member organi- zations. Users are required to credit its use in any publication. Commercial users must contact the authors. Snp_ blup_rel executable is available for Linux upon request from Luke (contact: ismo.stranden@luke.fi). Note that the snp_blup_rel program is under ongoing development, and due to the number of features, some combinations of options may not have been tested thoroughly. Thus, users use the program at their own risk. The simulated data that support the findings of this study are available in the Supplementary materials of this article. Acknowledgements Viking Genetics (Randers, Denmark) and Nordic Cattle Genetic Evaluation (Aarhus, Denmark) are acknowledged for providing the Finnish dairy cattle genotype data. References Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., Mckenny, A. & Sorensen, D. 1999. LAPACK Users’ Guide, 3rd edn. Philadelphia, PA, USA: SIAM. https://doi.org/10.1137/1.9780898719604 Ben Zaabza, H., Mäntysaari, E.A. & Strandén, I. 2020. Using Monte Carlo method to include polygenic effects in calculation of SNP- BLUP model reliability. Journal of Dairy Science 103: 51705182. https://doi.org/10.3168/jds.2019-17255 Fernando, R., Cheng, H. & Garrick, D.J. 2016. An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular. Genetic Selection Evolution 48: 1–12. https://doi.org/10.1186/s12711-016-0260-7 Forni, S., Aguilar, I. & Misztal, I. 2011. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetic Selection Evolution 43: 1–7. https://doi.org/10.1186/1297-9686-43-1 Intel 2014. Intel Math Kernel Library reference manual. Accessed 11 July 2017. https://software.intel.com/en-us/mkl-reference- manual-for-fortran Liu, Z., Goddard, M.E., Hayes, B.J., Reinhardt, F. & Reents, R. 2016. Technical note: Equivalent genomic models with a residual polygenic effect. Journal of Dairy Science 99: 20162–015. https://doi.org/10.3168/jds.2015-10394 Liu, Z., VanRaden, P.M., Lidauer, M., Calus, M.P., Benhajali, H., Jorjani, H. & Ducrocq, V. 2017. Approximating genomic reliabilities for national genomic evaluation. Interbull Bulletin 51: 75–85 McPeek, M.S., Wu, X. & Ober, C. 2004. Best Linear unbiased allele-frequency estimation in complex pedigrees. Biometrics. 60: 359–367. https://doi.org/10.1111/j.0006-341X.2004.00180.x Strandén, I. & Vuori, K. 2006. RelaX2: pedigree analysis program. Proc. 8th WCGALP, Belo Horizonte, Brazil Strandén, I. & Garrick, D.J. 2009. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reli- abilities of animal merit. Journal of Dairy Science 92: 2971–2975. https://doi.org/10.3168/jds.2008-1929 Strandén, I., Matilainen, K., Aamand, G.P. & Mäntysaari, E.A. 2017. Solving efficiently large single-step genomic best linear unbi- ased prediction models. Journal of Animal Breeding and Genetics 134: 264–274. https://doi.org/10.1111/jbg.12257 VanRaden, P.M. 2008. Efficient methods to compute genomic predictions. Journal of Dairy Science 91: 4414–4423. https://doi.org/10.3168/jds.2007-0980 Snp_blup_rel: software for calculating individual animalSNP-BLUP model reliabilities Introduction Theoretical background EVB reliability of a genomic model Residual polygenic effect by Monte Carlo Algorithm and implementation Input files Output files Options Examples RAM use Computing time Conclusions Availability and requirements Acknowledgements References