ABSTRACT


30 

 
The Jolly-Seber-Tag-Loss Model with Group Heterogeneity 

 
Selina Gonzalez and Dr. Laura Cowen

 
Abstract: Mark-recapture experiments are performed to estimate population parameters such 
as survival probabilities. Animals are captured, tagged, released, and recaptured at subsequent 
time periods in order to obtain parameter information. The Jolly-Seber-Tag-Loss (JSTL) model 
(Cowen & Schwarz, 2006) requires some individuals to be double tagged in order to account for 
the possibility of animals losing their tags. The Jolly-Seber-Tag-Loss model does not, however, 
consider the possibility of parameters being different among different groups of individuals, that 
is, group heterogeneity (for example, males may have higher capture probabilities than 
females).  Our research extends the Jolly-Seber-Tag-Loss model to account for this possibility of 
group heterogeneity among parameters. We use a Newton-Raphson method to obtain 
maximum likelihood estimators and R software to create a program that estimates population 
parameters from tag histories. Our simulation study concludes that when group heterogeneity 
exists, accounting for this group heterogeneity results in more accurate parameter estimates 
than the original JSTL model. We present the group heterogeneous JSTL (g-hJSTL) for this 
purpose. 

 
Key Terms: capture-recapture, tag loss, Jolly-Seber, heterogeneity, double tagging 

 
Introduction 
 

Mark-recapture experiments are designed to study animal populations and, in particular, to 
estimate their parameters, such as survival rates. Early tagging studies of waterfowl resulted in the 
development of the Lincoln-Peterson estimator used to estimate population size. Limitations of the 
Lincoln-Peterson estimator led to the development of models that dealt with studies with multiple 
sample times, dealt with death and immigration, and allowed for flexibility in model parameters (e.g. 
survival varying over time or age) (Cormack, 1964; Jolly, 1965; Seber 1965). In these studies, animals are 
marked with unique tags, released, and recaptured at discrete sampling times (Williams, Nichols, & 
Conroy, 2002). If tags are lost during such experiments, bias may occur in parameter estimates. Cowen 
and Schwarz (2006) developed the Jolly-Seber-Tag-Loss (JSTL) model to account for the possibility of tag 
loss, using double tagging experiments in open populations. 

 
 One of the important assumptions of the JSTL model is that there is a single population group; 
that is, all the population parameters are homogeneous across all individuals of the population. The 
limitations of the JSTL model are thus particularly significant when different population groups (for 
example, males and females) are heterogeneous in terms of their parameters. Otis, Burnham, White, 
and Anderson (1978) discuss statistical inference of biological hypotheses such as capture 
heterogeneity. Here, they found that mice that had been caught one or more times were “trap happy,” 
thus the assumption of homogeneity among capture probabilities was violated, and so behaviour of the 
animals amounts to heterogeneity in population capture rates. The same is true for different groups of 
animals within a population, where, for example, females are more trap happy than males (for example, 
Chinook salmon in Cowen et al. (2007)). We extend the JSTL model to allow for heterogeneity of 


31 

 
parameters across multiple population groups. The group-heterogeneous JSTL (g-hJSTL) model allows 
for more accurate parameter estimation when these parameters vary across groups. These model 
improvements will benefit researchers for whom high precision regarding population information is 
essential. For example, if a disease affects only one gender of a species, it would be very important to be 
able to observe that gender’s survival rates and other attributes vis-à-vis the other gender’s parallel 
attributes.  

 
Experimental Design 
 

The study design is comparable to that outlined by Williams, Nichols, and Conroy (2002) and by 
Lebreton, Burnham, Clobert, and Anderson (1992). Briefly, animals are captured at an initial sample 
time, marked (some with single tags and some with double tags), released, and recaptured at 
subsequent sampling times. Untagged individuals may be captured, tagged, and released at any 
sampling time. The primary study area is chosen to cover most of the population’s range, and sampling 
times cover the whole period of interest. Furthermore, the population group to which each observed 
animal belongs is identified and recorded. 

 
 Data are in the form of tag histories, a series of 1s and 0s, where a 1 indicates the tag was 
observed on a particular animal and a 0 indicates otherwise. For example, in a four-sample-time 
experiment, the tag history for a double-tagged individual with no tag loss, that is, captured only at the 
first and at the last sample times (but not in either the second or third sample time), is {11 00 00 11}. For 
a single-tagged individual with the same capture history and tag loss, the tag history would be {10 00 00 
10}; the double identifier helps specify individual tags. An example of a double-tagged individual who 
loses a tag after the second sample time is {11 11 01 01}. Different tag histories may correspond to the 
same capture history. The capture history of an animal records a 1 for those sample times when the 
animal was observed and a 0 otherwise.  Capture histories can be produced from the tag histories, but 
not vice versa. Finally, the number of animals with the same tag history and their group membership is 
required. Table 1 shows an example of data from a four-sample time experiment. 

 
Table 1:  Sample Input Data: An example of data from a four-sample-time experiment 

including the tag and capture histories, the number of animals observed with each history 

(frequency), and an indicator variable for group membership when there are two possible 

groups. 

 
Tag History Capture History Frequency Group 

11   11   11   11 1  1  1  1 12 1 

10   00   10   00 1  0  1  0 107 2 

00   00   01   01 0  0  1  1 119 2 

11   01   00   01  1  1  0  1  28 1 

11   11   00   10 1  1  0  1 82 1 

11   01   00   00 1  1  0  0 97 2 

    
 

32 

 
Model 

 Model Design  
 

The model is based on the proposal made by Schwarz and Arnason (1996) to consider a super-
population, which includes all individuals that enter the population of interest during the experiment. 
We extend this model to more than one group by assuming G independent super-populations, where 
the super-population of Ng animals is assumed to enter the population through a multinomial 
distribution. 

 
 Model Assumptions 
 

Important assumptions of this model are similar to those described by McDonald et al. (2003) 
but are modified to incorporate tag loss and heterogeneity of parameters across groups. They are 
outlined as follows: 

 
 All animals in population group g enter the system between sample times j and j+1 with 
equal probability, but entry probabilities may vary across groups and/or across sample 
times. 

 All animals in population group g are captured at sample time j with equal probability, 
but capture probabilities may vary across groups and/or sample times. 

 All animals in population group g survive between sample times j and j+1 with equal 
probability, but survival probabilities may vary across groups and/or sample times. 

 All animals in population group g that were first captured at sample time f retain their 
tag(s) between sample times j and j+1 with equal probability, but tag-retention 
probabilities may vary across groups and/or sample times. 

 For double-tagged individuals, tag loss is independent between tags, and both tags have 
the same probability of being lost. 

 Individuals who lose all their tags and are recaptured are either recognized when 
recaptured, or there are so few of these that even if they are mistaken as new 
individuals, they will have a small effect on overall estimates. 

 The survival probabilities of any animal are independent of whether or not that animal 
has been previously captured. 

 There is independence across animals and across groups of animals. 

 Sampling occurs instantaneously. In practice, sampling occurs over a short period of 
time (a day for example) compared with the intervening sampling time (every month for 
example). This is a standard assumption for the Jolly-Seber type models as it discretizes 
the problem, simplifying the modelling process. 

 
 Notation 
 
Statistics: 

k number of sample times for the experiment 

m number of unique tag histories 


33 

 
G number of population groups 

i index for a tag history; i = 0, 1, …, m; i = 0 represents tag history for animals  

never caught, {00 00 … 00} 

ntg,j number of tags on the individual i in group g at sample time j, j = 1, 2, …, k 

ng,j number of individuals in group g captured at time j 

  total number of animals in group g observed during the experiment 

ωgi
★
 capture history vector ωgi

★
 = (ωgi1

★
, ωgi2

★
,…, ωgik

★
), where 

  
 ωgij
� = 

 
ωgi tag history vector ωgi = (ωg,i,1,1, ωg,i,1,2, ωg,i,2,1, ωg,i,2,2, …, ωg,i,k,2), i = 0, 1, …, m, where 

 
 ωgijd =  

 
Referring to Table 1 for an example, we look at the fourth entry of the table. The individuals are from 

group g = 2, thus for each individual i which was observed with this tag history, we have the following 

capture history: ω2i
★
= (ω2i1

★
=1, ω2i2

★
=1, ω2i3

★
=0, ω2i4

★
=1), which is written in Table 1 as {1 1 0 1}. The tag 

history {11 01 00 01} in Table 1 is as follows:  

ω2i = (ω2,i,1,1=1, ω2,i,1,2=1, ω2,i,2,1=0, ω2,i,2,2=1, ω2,i,3,1=0, ω2,i,3,2=0, ω2,i,4,1=0, ω2,i,4,2=1).  

 
 number of ωgi tag histories 

fgi first sample time where ωgij
★
 = 1  

lgi last sample time where ωgij
★
 = 1  

lgid last sample time where tag d was present 

1    if individual i from group g was captured for the first time at sample time j = fi; 
1    if individual i from group g was captured at sample time j > fi  
      with at least one tag present; 
0    if individual i from group g was not captured at sample time j. 

{ 

1    if individual i from group g was captured and tag d was present at time j; 
0    if individual i from group g was captured at sample time j without tag d present; 
0    if individual i from group g was not captured at sample time j. { 


34 

 
qgid first sample time where tag d was known to be missing for individual i in  

group g; for example, for history {11 00 00 01}, q111 = 4 because tag 1 is not known to be missing 

at sample time 2 or at sample time 3 

 
Parameters: 

pg,j the probability that an individual in group g is captured at sample time j, given the  

individual is alive at that time, j = 1, 2, …, k  

ϕg,j the probability that an individual in group g survives and remains in the population

 between sample times j and j+1, given that it was alive and in the population at time j,  

 j = 2, 3, …, k-1 

bg,j the probability that an individual in group g enters the system between sample 

times j and j+1, j = 0, 1, …, k-1. Note that bg,0 is the expected fraction of animals in group g alive 

just prior to the first sample time and that  

 
  the probability that an individual in group g first tagged at sample time fgi retains its tag 

between sample times j and j+1 given that it is alive 

Ng super-population size for group g (all individuals in group g that ever enter the population of 

interest during the study) 

 
Functions of Parameters: 

the expected fraction of the population in group g remaining to enter the 

population that enters between sample times j and j+1, j = 0, 1, … , k-1.  Note that this function is 

defined as follows: 


35 

 
The probability that individual i in group g, first seen at fgi, is not seen after sample time lgi with nt tags.  

This  function is a recursive function of ϕ, p, and Λ.  If fgi = 0, this indicates individual i in group g has 

not yet been captured but is alive at time lgi. 

 
    1           if j = k 

 
    1         if j = k 

         
      1       if j  = k 

 
the probability that an individual enters the population, is alive and is not seen before time 

j; j = 1, 2, …, k. 

   
Bg,j  the number of individuals in group g who enter the population after sample time j and 

survive to sample time j+1; j = 0, 1, …, k-1. Bg,0 is the number of individuals alive just before 

{ 

{ 

{ 
 

36 

 
the first sample time.  E[Bg,j|Ng] = Ngbg,j . 

Ng,j total population size for group g at time j.  Note that 

 
 Likelihood Development  

 Using a similar approach to Cowen and Schwarz (2006), we partition the likelihood by group as 
follows.  The first part of the likelihood models the number of observed tag histories in group g as 

 
where          

 
   and             . 

 
Here,  is the probability that a group g member is observed.  The second part of the likelihood 
models the number of observed tag histories in each group conditional on being seen as  
 

where 

 
and   


37 

 
The complete likelihood for the model, therefore, is the product over all groups: 

     
Note that in this model development, we have not included the possibility of loss on capture; this could 
also be developed following Cowen and Schwarz (2006) or Schwarz and Arnason (1996). 

 
 Parameter Estimation 

We use a Newton-Raphson type estimator for estimating the model’s parameters. Similar to 
Lebreton et al. (1992), we use a logit-link function (because it constrains the probabilities between 0 and 

1) to transform the unknown probabilities pg,j, ϕg,j,  and    to a reduced parameter set of β-
parameters, and design matrices are used with this reduced parameter set to implement different, 
flexible models (for example, one design could have all parameters constant across sample times but 
varied across groups; another design may have all parameters varying across sample times and across 
groups). 

 
The parameters are modeled as logistic regressions on appropriate covariates; for example, 

 
where X is the appropriate design matrix. The delta method is used to compute standard error estimates 

for the standard (p, ϕ, b★, and Λ) parameters. Note that the b★s are used in place of the b parameters 
because the former do not have the difficult constraint of the latter of adding up to 1 within a group. R 
2.11.0 (R Development Core Team, 2009) was used in developing a program to obtain these results and 
is available from the authors. 

 We estimate the group g super-population size Ng as    .  Using this estimated super-

population size, we are able to estimate Ng,j – the size of group g population at sample time j – using the 

formulae described above when we defined Ng,j and replacing parameters with their estimates.  


38 

 
 Goodness-of-fit 

In testing the goodness-of-fit for a particular model, we calculate the log-likelihood, conditional 
on being observed,   

 
where l
*
is the log-likelihood for the model of interest. For each group and each sample time, we 

calculate the probability that an individual is double tagged (versus single tagged) upon release, given 

that it is in a particular group (P(d|g,j)).  We then adjust the probability of occurrence for each tag 

history for each group (P(g|j)). We then normalize the adjusted tag history probabilities by the 

probability of being observed at any time during the experiment (1 – P0g). Thus we obtain the fitted 

model. We also calculate the log-likelihood for the saturated model, which has a multinomial 

distribution for each group. Every tag history ωgi in the multinomial distribution has probability 

. The test statistic is the difference between the conditional log-likelihood and the saturated 

log-likelihood; it has a 
2
 distribution asymptotically. The degrees of freedom for this statistic are 

calculated as the difference in the number of parameters for each model. The saturated model has 

 parameters. The number of parameters for the conditional log-likelihood includes 

catchability, survival, new entrants, and tag retention parameters plus (G x k) parameters for the double 

tagging probabilities (P(d|g,j)), plus k(G-1) parameters for the group membership (P(g|j)) probabilities. 

 Following Burnham and Anderson (2002), a variance expansion factor may be determined from 

the goodness-of-fit statistic if overdispersion is deemed present. 

 Model Selection 
 

Model design possibilities vary extensively for different sets of parameters. In accordance with 
the design specifications described by Lebreton et al. (1992), tag retention parameters may vary across 
release groups, any parameter may vary across sample times, and any or all parameters may vary across 
population groups. These different sets of parameters allow for extensive variability of model designs. 
  

We follow Burnham and Anderson (2002) in applying the Akaike information criterion for model 
selection. 

 
39 

 
Simulation Study  
 

In testing our g-hJSTL model, we generated a number of different datasets of various different 
population sizes and different parameter sets. Using notation akin to that of Lebreton et al. (1992), the 
following are some of the different designs we generated and estimated using our g-hJSTL model (and 
compared with the estimates of the JSTL model). 

 
 =  entry varies over time, but capture, survival, and tag retention are constant over 

time; no group heterogeneity. 

  =   entry varies over time but not across groups; capture, survival, and tag retention 

vary across groups but are constant across time. 

 =  entry varies over time but not over groups; capture, survival, and tag retention vary 

across groups and over time. 

 For the more complicated models, we must take care to deal with non-identifiability of certain 

parameters. For instance, only the products  can be estimated. These parameters cannot be 

estimated individually. Similarly, b0 p1 can be estimated only as a product. 

 Table 2 shows the results for one dataset using the g-hJSTL model and also using the JSTL model 

for the model . Percent bias of the estimates is calculated as . 

Table 2: Simulation results to compare the JSTL model with and without group 

heterogeneity (g-hJSTL). Actual parameter values, parameter estimates (estimated 

standard errors) from the two models, and percent bias of the estimates are provided. 

 
Actual 

values 

g-hJSTL model 

estimates 

JSTL model 

estimates  

Percent Bias: 

g-hJSTL 

model 

(%) 

Percent Bias: 

JSTL model 

(%) 

p 
0.75 

0.90 

0.72 (0.03) 

0.91 (0.01) 
0.84 (0.02) 

–4.00 

1.11 

12.00 

–6.67 

 
0.80 

0.70 

0.82 (0.04) 

0.70 (0.01) 
0.72 (0.02) 

2.50 

0.00 

–10.00 

2.86 

 
0.70 

0.90 

0.69 (0.02) 

0.91 (0.01) 
0.83 (0.01) 

–1.42 

1.11 

18.57 

– 0.08 

b* 

0.200 

0.375 

1.000 

0.20 (0.005) 

0.36 (0.007) 

1.00 (0.000) 

0.20 (0.005) 

0.36 (0.006) 

1.00 (0.000) 

0.00 

–2.93 

0.00 

0.00 

–3.20 

0.00 


40 

 
 The simulation dataset used to produce the estimation results tabulated in Table 2 included 
10,000 individuals in the population, with 5,000 individuals in each of two groups in the population. 

Table 3 shows each ˆ N 
gj

, which estimates the number of group g individuals in the entire population 

(observed and unobserved) at sample time j. Results show that these estimates are reasonably close to 
the actual total population numbers. Table 4 shows results calculated using the standard JSTL model. 

The super-population size is estimated ( ), and then the population size at each sample time j is 

calculated ( ˆ N 
j
). Results show that percent bias is smaller for all population-size related estimates when 

using the g-hJSTL model compared with the JSTL model.  

 
Table 3: The g-hJSTL model estimates of number of group g individuals in population at 

sample time j, ˆ N 
gj

. Actual group g super-population size (Ng), estimated group g 

population size ( ), and percent bias of estimates are shown. 

 
ˆ N 
gj

 
Sample Time 

 Ng 
Percent 

Bias (%) j = 1 j = 2 j = 3 

g = 1 1,014.86 2,334.30 4,542.92 5,137.05 5,000 2.74 

g = 2 982.22 2,134.80 4,042.03 4,971.82 5,000 –0.56 

Total 1,997.08 4,469.10 8,566.94 10,108.87 10,000 1.09 

  
Table 4: The JSTL model estimates of number of individuals in population and sample 

time j.  Actual super-population size (N), estimated population size ( ), and percent bias 

of estimates are shown. 

 
Sample Time 

 N 
Percent 

Bias (%) j = 1 j = 2 j = 3 

ˆ N 
j
 1,815.33 4,034.71 7,679.94 9325.73 10,000 –6.74 

 
Discussion  
 

Heterogeneity often causes over-dispersion in data. For example, males in a population 
behaving differently than females would cause more variability in the data than a model ignoring sex 
differences accounts for; this extra variability is termed over-dispersion. Over-dispersion can be 
estimated, and estimated standard errors can be adjusted for over-dispersion; however, a more 
satisfactory approach would be to incorporate the source of over-dispersion directly in the model. One 
source of heterogeneity is the difference in behaviour of various groups.   

 
41 

 
 By allowing parameters in the JSTL model to vary by group, we have extended the JSTL model to 
account for differences in group membership. Accounting for group heterogeneity, when it exists, 
improves the accuracy of parameter estimates. As observed in Table 2, percent bias is notably lower for 
the g-hJSTL model compared with the JSTL model. Similarly, derived estimates of population size are 
more accurate. These findings were confirmed with multiple simulated datasets.  
 
 Future applied work will involve using our group-heterogeneous JSTL model with real population 
data and estimating the population parameters using different model designs. Model selection will be 
determined using the Akaike Information Criterion. Further, we could extend our model to account for 
loss on capture (due to a fishery harvest for example), similar to Cowen and Schwarz (2006) and Schwarz 
and Arnason (1996).  We could also relax the assumption of independent tag loss; that is, we could take 
into consideration the possibility that one tag’s retention rate depends on the other tag’s retention rate 
for double-tagged animals. For instance, tag loss may be due to a particular aspect of the individual – 
such as higher mobility leading to higher probability of tag loss – in which case, loss of one tag may 
suggest a higher probability of loss of a second tag as is the case with southern elephant seals (Pistorius, 
Bester, Kirkman, and Boveng, 2000). Finally, we could extend the model to work with more than two 
tags (for example, include a possibility of triple tagging certain individuals).  


42 

 
References 
 

Burnham, K.P. & Anderson, D.R. (2002). Model selection and multi-model inference: A practical  
 information-theoretic approach, 2nd Edition. New York: Springer-Verlag. 
 
Cormack, R.M. (1964). Estimates of survival from the sighting of marked animals. Biometrics, 51, 429- 
 438. 
 
Cowen, L., & Schwarz, C.J. (2006). The Jolly-Seber model with tag loss. Biometrics, 62, 677-705. 
 
Cowen, L., Trouton, N., & Bailey, R. E. (2007). Effects of angling on Chinook salmon for the Nicola River,  
 British Columbia, 1996-2002. North American Journal of Fisheries Management, 27, 256-267.  
 
Jolly, G.M. (1965). Explicit estimates from capture-recapture data with both death and immigration.  
 Biometrika, 52, 225-247. 
 
Lebreton, J. D., Burnham K.P., Clobert J., & Anderson D.R. (1992). Modeling survival and testing  
 biological hypotheses using marked animals. A unified approach with case studies. Ecological  
 Monographs, 62, 67-118. 
 
Otis, D.L., Burnham, K.P., White, G.C., &Anderson, D.R. (1978). Statistical inference from capture data on  
 closed animal populations. Wildlife Monographs, 62. 
 
Pistorius, P.A., Bester, M.N., Kirkman, S.P., & Boveng, P.L. (2000). Evaluation of age- and sex-dependent  
 rates of tag loss in southern elephant seals. Journal of Wildlife Management 64, 373-380. 
 
R Development Core Team (2010). R: A language and environment for statistical computing. R  
 Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL  

http://www.R- project.org  
 
Seber, G.A.F. (1965). A note on the multiple recapture census. Biometrika 52: 249-259. 
 
Schwarz, C.J., & Arnason, A.N. (1996). A general methodology for the analysis of capture-recapture  
 experiments in open populations. Biometrics 52(3), 860-873. 
 
Williams, B.K., Nichols, J.D., & Conroy, M.J. (2002). Analysis and management of animal populations. San  
 Diego: Academic Press. 
 

Contact Information 

 
Corresponding author: Selina Gonzalez, Department of Mathematics and Statistics, University of 
Victoria, Victoria, British Columbia, Canada, V8W 3R4. Email address: selinago@uvic.ca. Dr. Laura Cowen 
can be contacted at lcowen@uvic.ca. 
 
 
mailto:selinago@uvic.ca


43 

 
Acknowledgements 
 

This research was supported by an Undergraduate Student Research Award from the Natural 
Sciences and Engineering Research Council of Canada and also by the University of Victoria’s 
Undergraduate Research Scholarship to SG. The University of Victoria Mathematics and Statistics 
Department is also gratefully acknowledged for its help and accommodations throughout this project.