FOCUS

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

 Statistical genomics and bioinformatics

Prem Narain
29278 Glen Oaks Blvd. W.

Farmington Hills, MI 48334 USA
E-mail: narainprem@hotmail.com

ABSTRACT

Some important and interesting topics in the newly emerging disciplines of Statistical genomics and bioinformatics
have been discussed briefly in relation to plants with possible references to fruit crops. This paper is therefore
divided into two parts relating to the two disciplines, respectively. In the first part, mapping of quantitative trait loci
(QTL), association mapping, mapping of gene expression transcripts (eQTL), marker-assisted selection, and a
systems approach to quantitative genetics have been dealt with. In the second part, generation of databases, annotation,
annotated sequence databases, and sequence similarity search have been described.

Key words: Statistical genomics, bioinformatics, fruit crops, eQTL, annotated sequence databases, sequence
similarity search

I. STATISTICAL GENOMICS

INTRODUCTION

Most of the traits of economic importance in plants
have an underlying genetic basis involving several genes,
and, are subject to modification by environmental factors.
Statistical considerations have been predominant in dissecting
such complex traits into estimable components (Narain,
1990). Heritability of a trait, as a proportion of the phenotypic
variation that is attributed to genetic causes, has been a
prime indicator helpful in taking decisions for genetic
improvement of economic traits. Prediction of response to
artificial selection (based on intensity and accuracy of
selection) and the existence of genetic variability have been
successful across several crop plants. However, relationship
between the phenotype and the genotype has been like a
black box where inferential approach has been the only way
to look into it. This scenario is now changing with advent of
the modern technologies of gene sequencing, microarray
experiments and the enormous advances made in attempts
to understand gene and protein expression within the cell of
an organism. In this context, information on molecular
markers has been extremely helpful in identifying regions
on chromosomes (QTL) that bring about variation in a trait,
thereby providing tools that can lead to far more accurate
selection procedures for genetic improvement. Saturated
genetic maps of markers, giving their order along a
chromosome and relative distances between them, have

been developed. Gene transcript data from microarray
experiments can be integrated with molecular marker
information to map expression traits (eQTL) that can possibly
lead to causal networks. The network approach connecting
data on genes, transcripts, proteins, metabolites, etc.
indicates emergence of a systems quantitative genetics
(Narain, 2009, 2010).

Mapping of Quantitative Trait Loci (QTL)

Genomic techniques like restriction  fragment
length  polymorphism (RFLP), random amplified
polymorphic DNA (RAPD), amplified  fragment  length
polymorphism (AFLP), variable number of tandem
repeats (VNTR) - that consist of micro satellites (short
sequences) termed as short tandem repeats (STR) or
simple sequence repeats (SSR) and mini satellites
(long sequences) - and single nucleotide polymorphisms
(SNP) have been developed that help in identification of
QTLs by correlation between a trait and its specific DNA
markers (Narain, 2000). The first problem is, therefore, to
construct a linkage map that indicates the position and relative
genetic distances between markers along the chromosomes.
Map distance is based on the total number of cross-overs
between the two markers, whereas, physical distance
between them is denoted in terms of nucleotide base pairs
(bp). A centi-Morgan (cM), corresponding to a cross-over
of 1%, may span 10 kbs to 1,000 kbs and can vary across
species.

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



86

Since marker genotypes can be followed for their
inheritance through generations, these can serve as
molecular tags for following the QTL, provided they are
linked to the QTL. This requires detecting the marker-QTL
linkage and, if established, estimating the QTL map position
on the chromosome. However, these problems depend on
whether we have data on experimental populations obtained
from controlled crosses, as in plants, or on natural populations
like humans where controlled crosses cannot be made.

The most popular method, given by Lander and
Botstein (1989), is that of simple interval mapping (SIM).
It involves formation of intervals by pairing of adjacent
markers and treating them as a single unit of analysis for
detection and estimation purposes. It is based on joint
frequencies of a pair of adjacent markers and a putative
QTL flanked by the two markers. Suppose markers A and
B are linked with recombination fraction r and QTL Q is
located between them with r

1
 recombination from A and r

2

from B. Then, r = r
1
+r

2 
-2r

1
r

2 
≅

 
r

1
+r

2
, on the assumption of

no interference and r so small that no double cross-overs
can be assumed. In the classical back-cross design with
three loci each with two alleles, A-a, B-b, and Q-q, the
expected frequencies for the eight marker-QTL genotypes
can be used to obtain conditional probabilities of the QTL
genotypes, given the marker genotypes. By setting up a
linear regression model between the trait (Y) and the
indicator variable (X) taking the value 1 if the QTL is QQ
and –1 if it is Qq, one can estimate a regression coefficient
that defines the allelic substitution effect of this QTL. In
such a model, the QTL genotype for a given individual is
unknown. X is then a random indicator variable with
conditional probabilities of obtaining QQ or Qq at the QTL.
This means the observed value is modelled as a mixture-
distribution with mixture ratios as the conditional
probabilities. We have, therefore, a situation often referred
to as a linear regression with missing data. The problem of
estimation then involves the use of EM algorithm. By
assuming that the character is normally distributed within
each of the eight marker-QTL classes with equal variance
σ2, one can set up a likelihood function in terms of unknown
parameters, and develop a log likelihood ratio ( Λ ) for testing
the hypothesis that the QTL is not located in the interval
where the log likelihoods are evaluated using the maximum
likelihood estimates of the genotypic values for the two QTL
genotypes, the variance σ2 and the recombination fraction
r

1
 between marker A and the putative QTL using iterative

procedures based on EM algorithm. This statistic is
distributed as χ2 with 1 d.f. The associated lod score for the

interval mapping is then (½) (log
10

e) Λ. This statistic is
evaluated at regularly-spaced points; say 1 or 2 cm distance,
covering the interval as a function of the presumed QTL
position. Repeating this procedure for each interval along
the chromosome and plotting the lod score curve against
the interval gives a QTL likelihood map that presents
evidence for the QTL at any position in the genome.
Presence of a putative QTL is assumed if lod score exceeds
a certain threshold T and the maximum of the lod score
function in the map gives an estimate of the QTL position
and gene effects. Mapping of QTL by interval method is
widely used in practice. Analysis is done through the software
MAPMAKER/QTL.

Although SIM is the method for QTL mapping most
widely used with advantage in several practical situations,
it ignores the fact that most quantitative traits are influenced
by numerous QTLs. This is overcome either by adopting a
model of Multiple QTL Mapping (MQM) or by combining
SIM with the method of multiple linear regression, a
procedure known as composite interval mapping (CIM).
In all these methods, one uses the approach of maximum
likelihood that produces only point estimates of the
parameters such as the number of QTLs, their location, and
effects. The corresponding confidence intervals are required
to be determined separately by re-sampling methods.
Further, the correct number of QTLs is difficult to determine
using traditional methods. Their incorrect specification leads
to distortion of the estimates of locations and effects of
QTLs. To address these problems, a Bayesian approach is
often adopted wherein the joint posterior distribution of all
the unknown parameters given their prior distributions and
the observed data is computed. For details of these various
aspects, one can refer Narain (2003a, 2005).

The first application of interval mapping in plant
breeding was to an inter-specific backcross in tomato. The
parents for the back-cross were the domestic tomato
Lycopersicon esculentum (E) with fruit mass 65 g and a
wild South American green-fruited tomoto L. chmielewskii
(CL) with fruit mass 5 g. A total of 237 back-cross plants
were assayed for continuously varying characters like fruit
mass, soluble-solids concentration and pH, and, 63 RFLP
and 20 isozyme markers spaced at approximately 20 cM
intervals were selected for QTL mapping. A threshold T=2.4,
giving a probability of under 5% that even a single false-
positive will occur anywhere in the genome, was used. This
corresponds approximately to significance level for any
single test as 0.001. The resulting QTL likelihood maps
revealed multiple QTLs for each trait (6 for fruit weight, 4

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

Prem Narain

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



87

for concentration of soluble solids and 5 for fruit pH) and
estimated their location to within 20-30 cm.

Fruit crops

Fruit crops differ from most of the agronomic/forest
crops in that they have large plant size, long intergeneration
period due to their extended juvenile phase, asexual
propagation, high heterozygosity and polyploidy. These
practice outcrossing and have a long life. They are mostly
woody perennials and their products are usually perishable.
The major temperate fruit crops belong to Rosaceae family.
The most important genera of this family are Prunus, Malus,
Pyrus, Fragaria, and Rosa. Important members of the
genus prunus are peach, cherry, plum, apricot, almond and
of the genus Malus is apple. They have been slow to respond
to new technologies in breeding, until recently. Characters
like yield, blooming, harvesting time and fruit quality have
been studied with the help of molecular markers in several
fruit crops. Long period from seed to fruiting in such crops
is a major problem in breeding studies involving crosses.
Vegetative reproduction, on the other hand, allows every
population to be immortalized and one can study a given
character for as many years and in as many different
environments as one wants. Interspecies crosses are
possible and most of them have small genomes. For instance
peach, the best characterized among Prunus species, has
a haploid genome size of 164 Mbp only. Most of the Prunus
species are diploid, with 8 pairs of chromosomes whereas,
apple and pear are allotetraploid with 17 pairs of
chromosomes.

Saturated linkage maps with transferable markers,
RFLPs, and microsatellites have been developed to provide
basic tools for studies on QTLs and marker-assisted selection
in fruit tree breeding. As a result of a European project, a
saturated linkage map of 246 markers (235 RFLPs and 11
isozymes) constructed from an F

2
 progeny derived from

almond (cv. Texas) x peach (cv. Earlygold) cross – termed
TxE map- indicated 8 linkage groups (G1 to G8) with a total
distance of 491 cm. This led to a Prunus reference map
with 652 markers and a further set of 13 maps constructed
with a sub-set of these markers has enabled genome
comparisons among seven Prunus diploid species (almond,
peach, apricot, cherry, Prunus ferganensis, Prunus
davidiana, and Prunus cerasifera). These have helped
establish the position of 28 major genes affecting various
agronomic characters in different species of Prunus crops
(Dirlewanger et al., 2004).

The first linkage map in apples was constructed by

a European Consortium based on F
1
 progeny derived from

the cross cv. Prima x cv. Fiesta (FxF map). There were a
total of 290 markers consisting of RFLPs, SSRs, isozymes,
RAPD etc., distributed over 17 linkage groups.  A more
saturated map was constructed with the F

1
 progeny derived

from the cross cv. Fiesta x cv. Discovery (FxD map) using
840 markers that included 129 SSRs. These maps have been
helpful in QTL studies on apple. A comparison between
apple and Prunus maps suggests a high degree of synteny
between these two genera. QTLs for blooming, ripening
and fruit quality have been found in peach and apple. Some
of these QTLs were found to be located in regions of the
genome where major genes were earlier mapped. For
instance, in peach a major gene responsible for low fruit
acidity was in the same region as QTLs affecting fruit quality,
a quantitative trait. In apple too, a major gene coding for
malic acid content is located in the same region as QTLs
for fruit quality.

Various populations of peach x Prunus davidiana
crosses with different levels of introgression of the Prunus
davidiana genome into the cultivated peach viz. F

1
, F

2
 or

BC2 were used to discover the positions of respective QTLs.
About 13 QTLs explained up to 65% of the total phenotypic
variation for powdery mildew resistance in plants exposed
to the disease in different times and environments.

Candidate gene approaches have been adopted for
finding associations between genes involved in relevant
metabolic pathways and major genes or QTLs in fruit trees.
Several resistance gene analogs (RGAs) were mapped in
Prunus that are at similar genomic positions as genes or
QTLs which determine ‘sharka‘ resistance in apricot or root-
knot nematode resistance in peach and plum.

Linkage Disequilibrium or Association Mapping

The mapping of QTLs in plants based on data
collected from pedigrees of populations formed by crossing
inbred lines is on a coarser scale, so that a QTL detected is
likely to refer to several genes in a chromosomal region.
The approach of population-based association mapping that
involves linkage disequilibrium (LD) between markers and
the genes underlying complex traits leads, on the other hand,
to more accurate mapping of genes. The key idea is that a
disease mutation assumed to have arisen once on the
ancestral haplotype of a single chromosome in past history
of the population of interest is passed on from generation to
generation, together with markers at tightly linked loci,
resulting in LD. The use of this approach in horticultural
crops, though widely prevalent in human genetics, is limited.

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

 Statistical genomics and bioinformatics

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



88

Advantages of the two approaches can be combined by
detecting QTL initially using linkage mapping with moderate
number of markers, followed by a second-stage of high-
resolution association mapping in QTL regions that
capitalizes on a high-density marker map.

Benefits of linkage and association mapping have
recently been combined in a single population of maize by
adopting a nested association mapping (NAM) approach.
The maize NAM population was derived by crossing a
common reference sequence strain to 25 different maize
lines. Individuals resulting from each of the 25 crosses were
self-fertilized for four further generations to produce 5,000
NAM recombinant inbred lines (RILs). This population was
first used for initial detection of QTL using the linkage
mapping approach. Subsequently, within each diverse strain,
high-resolution association mapping was adopted with a
high-density marker map. It is significant to note that within
each RIL, all individuals are genetically nearly identical. This
means we can estimate true breeding value of each line far
more accurately by averaging phenotypic measurements of
a given trait taken on several individuals with the same
genotype.

In a recent experiment, genetic architecture of
flowering time in Zea mays (maize) was dissected using
NAM. About 1 million plants were assayed in eight
environments to map the QTLs. About 29 to 56 QTLs were
found to affect flowering time. These were small-effect
QTLs shared among the diverse families. The analysis
showed, surprisingly, absence of any single large-effect
QTL. Moreover, no evidence was found of epistasis or
environmental interactions. Flowering time controls
adaptation of plants to their local environment in an out-
crossing species like Zea mays. A simple, additive genetic
model predicting accurately flowering time in this species
is, thus, in sharp contrast to that observed in several plant
species which practice self-fertilization (Buckler et al.,
2009).

Mapping QTLs for Gene Expression profile (eQTL)

The advent of DNA chip technology in the form of
cDNA and oligonucleotide microarrays has provided huge
and complex data-sets on gene expression profiles of
different cell lines from various organisms. Such gene
expression profiles have recently been combined with linkage
analysis, based on QTL mapping, through molecular markers
in what has been termed ‘genetical genomics’ (Jansen and
Nap, 2001). Gene expression, in terms of transcript levels,
for each individual of a segregating population are

phenotypes that are correlated with markers, genotyped for
that individual, to identify QTLs and their location on the
genome to which the expression trait is linked. Such
expression quantitative trait loci (eQTL) studies are similar
to traditional multi-trait QTL studies, but with thousands of
phenotypes. It is also important to note that, underlying the
gene expression differences, there are two types of
regulatory sequence variation. One is cis-regulatory that
affects its own expression and the other is trans-acting or
protein coding that affects expression of other genes. The
first attempt where transcript abundance was used to study
the linkage with QTLs was on budding yeast (Brem et al,
2002) based on a cross between a laboratory strain and a
wild strain, the parents being haploid derivatives. Heritability
estimation was based on haploid segregants and the linkage
with a marker was tested by partitioning the segregants into
two groups, according to marker genotypes, and comparing
expression levels between groups, with Wilcoxon-Mann-
Whitney test. They found 8 trans-acting loci, each affecting
expression of a group of 7 to 94 genes of related function.
Since then, several eQTL studies have been published in
species like mice, maize, humans, rats and Arabidopsis
thaliana.

Apart from study of the eQTL in yeast, Foss et al.
(2007) investigated protein QTL in the same population of
the yeast using mass spectrometry. Comparison between
genetic regulation of proteins and that of the transcripts
revealed that loci that influenced protein abundance differed
from those that influenced transcript levels, much against
expectations.

Marker–Assisted Selection (MAS)

Molecular markers such as those provided by RFLP
have not only made it possible to detect and estimate effects
of QTLs,  but can also be used as a criterion of indirect
selection for genetic improvement of a given quantitative
trait – a procedure of selection which has come to be known
as marker-assisted selection (MAS). The underlying basis
of MAS is the correlation between a trait and the marker
genotype, which gets generated due to linkage disequilibria
between the QTL and marker loci. The fact that such
information can be integrated with those of artificial selection
on individual and/or collateral basis, to increase the efficiency
of selection, was demonstrated by the work of Lande and
Thompson (1990). They showed that relative efficiency of
the selection index, combining phenotypic and molecular
information optimally, is a function of heritability (h2) of the
trait and the proportion (p) of additive genetic variance of

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

Prem Narain

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



89

the trait that is associated with marker loci. This efficiency
is always one when h2=1, the phenotype being a perfect
indicator of its breeding value. But, for a character with
low heritability, the efficiency can be substantially high,
provided p is high. This means the value of maker information
can be very great if a larger proportion of additive genetic
variance is associated with the markers. Efficiency is
maximum when p=1 and is (1/h), that becomes infinitely
large for extremely small h. In that case, all of the weight in
selection index is put on molecular information. If we select
only on the basis of marker information, the efficiency,
relative to individual selection with the same intensity, would
be. This shows that when p>h2, selection based on marker
information alone would be more efficient than individual
phenotypic selection.

Increased efficiency of MAS, however, is
accompanied by increased cost involved in sample collection,
DNA extraction and typing of individuals in the sample,
compared to that involved in taking simple measurements
of the trait. Cost reduction for MAS can be achieved in
several ways. Marker technologies such as those based on
polymerase chain reaction (PCR) may reduce the cost of
MAS. Selective genotyping of the extreme progeny, as
advocated by Lander and Botstein (1989), is another way.
Yet another way could be to bring in auxiliary information
from other traits that are correlated with the main trait, and
are cheaper to measure. This idea has been used in the
past by several workers to increase the efficiency of
individual and family selection itself, by including in the index
one or more auxiliary traits in conjunction with the main
trait. As a matter of fact, molecular information in MAS is
itself a sort of auxiliary information, but obtained at a higher
cost. Narain (2003b), therefore, showed how the efficiency
of MAS behaved if information on one or more auxiliary
traits with the corresponding molecular scores was combined
with that on the main trait, in an optimal manner.

Fruit crops

In fruit crops, molecular markers are used for
screening and selecting the best seedlings several years
before the characters are evaluated in the field. It saves
space and time so important in woody perennials. Marker-
assisted selection in such crops is, however, mostly based
on major genes, since several characters like disease
resistance, flower/fruit/nut quality are found to be controlled
by major genes that follow a simple inheritance pattern.
Markers tightly linked to such genes are searched for early
selection. They are primarily used for characters that cannot

be evaluated till the plant has reached the adult stage, such
as fruit characters or self-incompatible genotypes. For
instance, gametophytic self-incompatibility in almond, apricot
and cherry is one such trait that is encoded by a highly
polymorphic locus (S/s) located in the distal part of G6 linkage
group. With determination of the sequences of the
polymorphic S-RNase gene at this locus, a number of
species-specific and allele-specific DNA markers were
discovered that were used for early and more accurate
selection of self-incompatibility or self-compatibility alleles.
Markers close to the two genes of resistance to root-knot
nematodes are used for selection of resistant Prunus
rootstocks. The resistance gene Ma/ma from Myrobalan
plum and located on G7 linkage group, and another one from
peach cv. Nemared (Mi/mi) located on G2 linkage group,
have been screened with markers in a search for rootstocks
that pyramid both resistance genes in a three-way progeny
obtained from peach, almond and Myrobalan plum.

Marker-assisted selection for disease resistance is
quite widespread in apple as a means of early selection,
and, to pyramid resistance genes.

Systems approach

As we know, the central dogma of molecular
biology stipulates that sequence information flows from DNA
to RNA to protein but not in the reverse direction. But,
Kimchi-Sarfaty et al (2007) reported data that indicate that
a protein’s three-dimensional structure is not necessarily
determined by its amino acid sequence that has been
specified by the DNA sequence. An mRNA, if subjected to
translational braking, can generate a protein with a structure
different from that specified by the DNA sequence. This
has been termed ‘translation-dependent folding’ (TDF)
hypothesis (Newman and Bhat, 2007). Differential gene
expression resulting in transcripts as sub-phenotypes could,
then, lead to different proteins and give results similar to
those obtained in the yeast experiment, as reported by Foss
et al (2007). Genes and proteins are, therefore, required to
be considered simultaneously to unravel the complex
molecular circuitry operating within a cell. One has to have
a global perspective of genotype-phenotype relationship,
instead of individual components like DNA or protein in a
cellular system.

It seems the interplay of genotype-phenotype
relationship for quantitative variation is not only complex
but also needs a closer look at how we view this relationship
– whether purely at the DNA-RNA level (as in the
reductionist approach) or at the level of cell as a whole

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

 Statistical genomics and bioinformatics

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



90

(where DNA-RNA are just parts of the cellular system with
other contextual forces present in the micro-environments
of the cell, also playing their own important roles). Such
situations have also been noticed in agricultural
experimentation where a dialectical approach has been
advocated (Narain, 2006, 2008). In the grain production
process, it is also important to study how this process affects
soil health and the ecosystem surrounding the plant, as is
studying the effect of inputs on production. In the dialectical
approach, this relationship between the plant and its
environment is studied both ways – input to output as well
as output to input, a sort of feedback. A similar possibility
seems to exist in the genotype and phenotype relationship
within a cell. The protein as a phenotype is determined by a
DNA sequence as the genotype, but the reverse
phenomenon of protein affecting the DNA could also take
place at the expense of violating central dogma. In fact,
studies are on to explore biochemical signaling pathways
that regulate function of living cells through regulatory
networks having positive and negative feedback loops, though
it is unclear how genetics can be incorporated into it. These
feedback loops are basically cybernetic concepts that are
inherent in the dialectical approach. This approach takes
into account dynamics of the system over time as well, in
which, development is a consequence of opposing forces.
This is based on the concept of contradiction inherent in
the meaning of dialectics. Things change because of the
action of opposing forces on them, and things remain what
they are because of temporary balance of the opposing
forces. Opposing forces are seen as contradictory in the
sense that each taken separately would have an opposite
effect, but their joint action may be different from result of
either acting alone. These forces are, however, part of self-
regulation and development of the object is regarded as a
network of positive and negative feedback loops,
incorporation of which (in the genetic context) would violate
the central dogma. Genes, transcripts, proteins, metabolites,
physical components, etc., can be regarded as ‘parts’ of
the cellular system and the ‘whole’ is regarded as a relation
of these parts that acquire properties by virtue of being parts
of a particular whole. As soon as the parts acquire properties
by being together, they impart to the whole new properties
that are, in turn, reflected in changes in the parts, and so on.
Parts and whole, therefore, evolve as a consequence of
their relationship, and the relationship itself evolves. Genes
are fixed, but their expression-the transcript-is not. At any
given moment of time, genes are expressed as per
requirement of the cell and through information contained
in its DNA. At this moment of time, the cellular system is

said to have a particular state of the system. At the next
moment of time, the same genes may be expressed, but
differently, depending upon the then requirement of the cell
and based on the feedback, if any, from the system’s state
at the previous time point, assuming that the process is
Markovian. This gives the next state of the system, which
might or might not be different from the previous state. And,
the process goes on continually, modifying the relationship
between different parts of the system based on interactions
and feedbacks. It seems that a dialectical approach could
provide the clue for understanding how ‘parts’ of a system
and the ‘whole’ system behave in the context of genetics.

II. BIOINFORMATICS

INTRODUCTION

Genomic research is creating quantities of data at
unprecedented scales by looking at either all genes in a
genome, or all transcripts in a cell, or else all metabolic
processes in a tissue in several species, in general, and in
agriculture in particular. Very soon new genomic technologies
will enable individual laboratories to generate terabyte or
even petabyte scales of data. To handle these data, to make
sense of them and render them accessible to biologists, is
the task of a newly emerging field of bioinformatics existing
at the interface of biological and computational sciences -
computer based analysis of large biological data sets. The
data sets usually pertain to macromolecular sequences
(DNA, RNA and protein sequences), protein structures, gene
expression profiles and biochemical pathways. It has three
components. Firstly, it involves development of databases
to store and search data. Secondly, it deals with statistical
tools and algorithms to analyze and determine relationships
between data sets. Lastly, it involves application of the tools
for analysis and interpretation of various types of genomic
data. For a brief discussion on these aspects, reference may
be made to Narain (2005). Here, we discuss primarily those
aspects that relate to plant genomes.

Generation of Databases

DNA sequences stored in databases are of three
types: genomic DNA, cDNA and recombinant DNA.
Genomic DNA, taken directly from the genome, contains
genes in their natural state which, in eukaryotes, include
introns, regulatory elements and a large amount of
surrounding inter-genic DNA. cDNA is reverse-transcribed
from mRNA and corresponds to only expressed parts of
the genome, there being no introns. It gives direct access to
genes that represent only a small percentage of the entire
sequence.  Recombinant DNA comes from the laboratory,

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

Prem Narain

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



91

being artificial DNA molecules – sequence of vectors such
as plasmids, modified viruses and other genetic elements
used in the laboratory.

High quality sequence data is generated by
performing multiple reads on both DNA strands. Sequence
data of lower quality can, however, be generated by single
reads – single pass sequencing on a much larger scale,
quickly and cheaply. Expressed sequence tags (ESTs) are
generated by single-pass sequencing of random clones from
cDNA libraries and are used to identify genes in genomic
DNA as well as to prepare large clone sets for DNA
microarrays. Most RNA sequences are deduced from the
corresponding DNA sequences, or, from a cDNA sequence.
The latter is more informative due to it being extensively
processed during synthesis. For example, introns are spliced
out of a primary transcript to generate mature mRNA.

Plant sequence data are generated through (i) whole
genome sequencing, (ii) sample sequencing of bacterial
artificial chromosomes (BACs), (iii) genome survey
sequencing (GSS), and (iv) sequencing of expressed
sequence tags (ESTs). An integrated database and suite of
analytical tools to organize and interpret these data, has been
developed and is known as PlantGDB (vide the website
http://www.plantgdb.org/).

Annotation

Annotation means obtaining useful biological
information (structure and function of genes and other
genetic elements) from raw sequence data. Since
prokaryotes and eukaryotes differ in their structure and
genome organization, their annotations involve different
problems. Prokaryotes have high gene-density with virtually
no introns, but in eukaryotes, gene-density is low and the
genome has greater complexity.

We have two groups of annotation - structural
annotation and functional annotation. In the former, we are
concerned with finding genes and other genetic elements in
genomic DNA. In the latter, we assign functions to the
discovered sequences.

Annotated Sequence Databases

The following three repositories and resources for
primary sequence data are available where each entry is
extensively annotated. They can be accessed freely over
the World Wide Web (www).

(i) Gene Bank of the National Centre for Biotechnology
Information (NCBI)

(ii) Nucleotide Sequence Database of European
Molecular Biology Laboratory (EMBL)

(iii) DNA Databank of Japan (DDBJ).

New sequences can be deposited in any of the
databases, since, these exchange data on a daily basis. The
main sequence databases have a number of subsidiaries
for storage of particular types of sequence data. For example,
dbEST is a division of Gen Bank which is used to store
expressed sequence tags (ESTs). Other divisions of Gen
Bank include dbGSS, dbSTS - used to store sequence
tagged sites (STSs) - and several others.

These large database providers, however, do not
give non-redundant and curated records, so that detailed
analysis cannot be performed at the resource site by the
user. A data- base like PlantGDB, which downloads raw
plant genomic data from Gen Bank, overcomes such
difficulties and provides curated records with detailed and
updated information. It organizes EST sequences into
contigs that represent tentative unique genes. They are duly
annotated and linked to their respective genomic DNA. The
data-base gives the basis for identifying genes common to
particular species by integrating a number of bioinformatics
tools that help in gene prediction and cross-species
comparison - the goal of comparative genomics.

Besides PlantGDB database, there are species-
specific databases like The Arabidopsis Information
Resource (TAIR), MaizeGDB, Gramene, a tool for grass
genomics, and the Stanford Microarray Database. The
PlantGDB genome browsing capabilities for Arabidopsis
are made possible by A. thaliana Genome Database
(AtGDB; http://www.plantgdb.org/AtGDB/). This database
stores EST and cDNA spliced alignments along with current
Arabidopsis genome annotation.

As we know Arabidopsis thaliana, which is a
small mustard species – eukaryotic and self-pollinating –
is already playing an important role as a model organism in
development of plant molecular biology, by way of providing
increased knowledge and understanding of the plant’s
functional and developmental processes. It has a rapid life
cycle and can be easily grown in laboratory in large numbers.
Its entire genome, that is highly compact and consists of
about 130 Mb with little interspersed repetitive DNA, has
been sequenced. Many thousands of Arabidopsis plants
can be grown on a bench to search for particular mutants
which can then be isolated and genes cloned for use in other
crops. It is related to many food plants like rice, wheat,
maize, sorghum, millets, etc., and can, therefore, provide a

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

 Statistical genomics and bioinformatics

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



92

focus from which genome content of other higher plants
can be extrapolated.

Fruit crops

In regard to horticultural crops, an international
consortium led by Albert Abbott at Clemson University
(Clemson, SC), developed databases on Prunus genome.
Using RFLPs on the TxE map and a BAC library of peach
cv. Nemared, a physical map was assembled. A growing
collection of ESTs from peach and almond, based on cDNA
libraries, was released to public databases and more than
3,800 peach putative unigenes were detected. About 2,000
of these unigenes were assigned to specific BAC that
contain them. Recently, a Rosaceae database
(www.genome.clemson.edu/gdr) has been developed that
includes apple, peach, cherry, plum, apricot, pear, etc.

Sequence Similarity Searches
Due to molecular evolution, macromolecule

sequences share a common ancestor resulting in similarity
in their sequences, structure and biological functions. On
the other hand, any pair of sequences will share a certain
degree of similarity, due to chance alone. For example, DNA
sequences are constructed from an alphabet of only four
letters, viz., A, T G and C. Any sequence that consists of a
mixture of these letters will show some similarity to any
other similarly-constructed sequence. We need to distinguish
between such a chance similarity and similarity resulting
from real evolutionary and/or functional relationship. This
requires use of appropriate statistical methods.

Sequences are first aligned in terms of their letters.
When identical letters get aligned, we say that these letters
were part of the ancestral sequence and have remained
unchanged. When non-identical letters get aligned, we say
that a mutation has occurred in one of the sequences. It
may also happen that some letters in a particular sequence
lack an equivalent in the other sequence, resulting in a gap.
This could be due to insertion or deletion of letter/s in one of
the sequences, with respect to the ancestral sequence.
Dynamic programming algorithms – computational methods
- can calculate the best alignment of two sequences. The
algorithm takes two input sequences and produces the best
alignment between them as the output. Well-known
algorithms are Smith-Waterman algorithm (local alignment)
and Needleman-Wunsch algorithm (global alignment).

To quantify similarity, a simple alignment score
measures the number or proportion of identically matching
residues. Gap penalties are subtracted from such scores to

ensure that alignment algorithms produce biologically
sensible alignments, without too many gaps. Gap penalties
may be constant, i.e., independent of the length of the gap
or be proportional to the length of the gap, or else may be
affine, i.e., containing gap-opening and gap-extension
contributions.

We have often a query sequence about which we
need to predict the structure and/or the function. We perform
sequence similarity searches of databases in which the query
sequence is aligned (compared) to each database sequence
in turn and then rank the database sequences with the highest
scoring (most similar) at the top. This can be achieved by
the dynamic programming method with Smith-Waterman
algorithm but the procedure is very slow, taking hours, for
searching large databases. On the other hand, algorithms
like BLAST (Best Local Alignment Search Tool) and
FASTA provide very fast (about five to fifty times faster)
searches of sequence databases. They are however less
accurate than the dynamic programming method which
provides the best possible alignment to each database
sequence. Each of the BLAST and FASTA operates by
first locating short stretches of identically or near-identically
matching letters (words) –assumed to lead to high scoring
alignment - that are eventually extended into longer
alignments.

Acknowledgements

This work was supported by the Indian National
Science Academy, New Delhi, under their programme
“INSA Honorary Scientist”.

REFERENCES

Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. 2002.
Genetic dissection of transcriptional regulation in
budding yeast. Science, 296: 752-755

Buckler, E.S., Holland, J.B., Bradbury, P.J., Acharya, C.B.,
Brown, P.J., Browne, C., Ersoz, E., Flint-Garcia, S.,
Garcia, A., Glaubitz, J.C., Goodman, M.M., Harjes,
C., Guill, K., Kroon, D.E., Larsson, S., Lepak, N.K.,
Li, H., Mitchell, S.E., Pressoir, G., Peiffer, J.A., Rosas,
M.O., Rocheford, T.R., Romaij, M.C., Romero, S.,
Salvo, S., Villeda, H.S., da Silva, H.S., Sun, Q., Tian,
F., Upadyayula, N., Ware, D., Yates, H., Yu, J., Zhang,
Z., Kresovich, S. and McMullen, M.D. 2009. The
genetic architecture of maize flowering time. Science,
325: 714-718

Dirlewanger, E., Graziano, E., Joobeur, T., Garriga-Caldere,
F., Cosson, P., Howad, W. and Arus, P. 2004.
Comparative mapping and marker-assisted selection

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

Prem Narain

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no



93

in Rosaceae fruit crops. PNAS, 101: 9891-9896
Foss, E.J., Radulovic, D., Shaffer, S.A., Ruderfer, D.M.,

Bedalov, A., Goodlett, D.R., and Kruglyak, L. 2007.
Genetic basis of proteome variation in yeast. Nat.
Genet., 39: 1369-1375

Jansen, R.C. and Nap, Jan-Peter. 2001. Genetical genomics:
the added value from segregation. Trends in Genetics,
17: 388-391

Kimchi-Sarfaty, C., Oh, J.M., Kim, I.W., Sauna, Z.E.,
Calcagno, A.M., Ambudkar, S.V. and Gottesman,
M.M. 2007. A “silent” polymorphism in the MDRI
gene changes substrate specificity. Science, 315: 525-
528

 Lande, R. and Thompson, R. 1990. Efficiency of marker-
assisted selection in the improvement of quantitative
traits. Genetics, 124: 743-756

Lander, E.S. and Botstein, D. 1989. Mapping Mendelian
factors underlying quantitative traits using RFLP
linkage maps. Genetics, 121: 185-199

Narain, P. 1990. Statistical Genetics. New York: John Wiley
and Wiley Eastern Ltd., New Delhi. Reprinted in
1993. Published by the New Age International Pvt.

Ltd., New Delhi in 1999. Reprinted in 2008
Narain, P. 2000. Genetic diversity – conservation and

assessment. Curr. Sci., 79:170-175
Narain, P. 2003a. Evolutionary genetics and statistical

genomics of quantitative characters. Proc. Ind. Natl.
Sci. Acad., B69:273-352

Narain, P. 2003b. Accuracy of marker-assisted selection
with auxiliary traits. J. Biosci., 28:569-579

Narain, P. 2005. Mapping of Quantitative Trait Loci. The
Mathematics Student, 74:7-18, Printed in 2007

Narain, P. 2006. Statistical Tools in Bioinformatics. The
Mathematics Student, 75:17-27, Printed in 2007

Narain, P. 2006. Dialectical agriculture. Natl. Acad. Sci.
Lett., 29:253-260

Narain, P. 2008. Dialectical approach to agriculture. Proc.
Indian Natn. Sci. Acad., 74:61-66

Narain, P. 2009. The Genetic Architecture of Quantitative
Variation. Natl. Acad. Sci. Lett., 32:135-1437

Narain, P. 2010. Quantitative Genetics: past and present.
Mol. Breeding, 26:135-143

Newman, S. A. and Bhat, R. 2007. Genes and proteins:
Dogmas in decline. J.Biosci., 32:1041-1043

J. Hortl. Sci.
Vol. 5 (2): 85-93, 2010

 Statistical genomics and bioinformatics

Prinect Color Editor
Page is color controlled with Prinect Color Editor 4.0.70
Copyright 2008 Heidelberger Druckmaschinen AG
http://www.heidelberg.com

You can view actual document colors and color spaces, with the free Color Editor (Viewer), a Plug-In from the Prinect PDF Toolbox. Please request a PDF Toolbox CD from your local Heidelberg office in order to install it on your computer.

Applied Color Management Settings:
Output Intent (Press Profile): ISOcoated_v2_eci.icc

RGB Image:
Profile: eciRGB.icc
Rendering Intent: Perceptual
Black Point Compensation: no

RGB Graphic:
Profile: RGB2CMYK.icc
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent RGB/Lab Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Image:
Rendering Intent: Perceptual
Black Point Compensation: no

Device Independent CMYK/Gray Graphic:
Rendering Intent: Perceptual
Black Point Compensation: no

Turn R=G=B (Tolerance 0.5%) Graphic into Gray: yes

Turn C=M=Y,K=0 (Tolerance 0.1%) Graphic into Gray: no
CMM for overprinting CMYK graphic: yes
Gray Image: Apply CMYK Profile: no
Gray Graphic: Apply CMYK Profile: no
Treat Calibrated RGB as Device RGB: no
Treat Calibrated Gray as Device Gray: yes
Remove embedded non-CMYK Profiles: no
Remove embedded CMYK Profiles: yes

Applied Miscellaneous Settings:
Colors to knockout: no
Gray to knockout: no
Pure black to overprint: no
Turn Overprint CMYK White to Knockout: yes
Turn Overprinting Device Gray to K: yes
CMYK Overprint mode: set to OPM1 if not set
Create "All" from 4x100% CMYK: yes
Delete "All" Colors: no
Convert "All" to K: no