Meta-Psychology, 2021, vol 5, MP.2019.2071
https://doi.org/10.15626/MP.2019.2071
Article type: Commentary
Published under the CC-BY4.0 license

Open data: Not applicable
Open materials: Yes

Open and reproducible analysis: Yes
Open reviews and editorial process: Yes

Preregistration: No

Edited by: Felix D. Schönbrodt
Reviewed by: Fried, E., Althouse, A.

Analysis reproduced by: Alexey Guzey
All supplementary files can be accessed at OSF:

https://doi.org/10.17605/OSF.IO/FQJZM

A Reproduction of the Results of Onyike et al.
(2003)

Nicholas J. L. Brown
University of Groningen

Jakob van de Velde
Ghent University

Jan van Rongen
Independent consultant

Matt Williams
Massey University

Abstract
Onyike et al. (2003) analyzed data from a large-scale US-American data set, the Third National Health and Nutrition
Examination Survey (NHANES-III), and reported an association between obesity and major depression, especially
among people with severe obesity. Here, we report the results of a detailed reproduction of Onyike et al.’s analyses.
While we were able to reproduce the majority of these authors’ descriptive statistics, this took a substantial amount
of time and effort, and we found several minor errors in the univariate descriptive statistics reported in their Tables
1 and 2. We were able to reproduce most of Onyike et al.’s bivariate findings regarding the relationship between
obesity and depression (Tables 3 and 4), albeit with some small discrepancies (e.g., with respect to the magnitudes
of standard errors). On the other hand, we were unable to reproduce Table 5, containing Onyike et al.’s findings
with respect to the relationship between obesity and depression when controlling for plausible confounding vari-
ables—arguably the paper’s most important results—because some of the included predictor variables appear to be
either unavailable, or not coded in the way reported by Onyike et al., in the public NHANES-III data sets. We discuss
the implications of our findings for the transparency of reporting and the reproducibility of published results.

Keywords: Body mass index, Body Weight, Depression, Obesity, Weighted surveys

Background

In the spring of 2016, the first author (Nick Brown)
had a plan, as part of his PhD studies, to perform some
analyses using the U.S. American Third National Health
and Nutrition Examination Survey (NHANES-III) data
set. To improve his understanding of the NHANES data
and code books for this project, Nick decided to down-
load an article that was based on the same data set and
attempt to reproduce the results. Somewhat arbitrarily,
he chose the widely cited article on the topic of the re-
lation between obesity and depression by Onyike et al.
(2003).

Onyike et al. (2003) reported two major find-
ings. First, obesity—defined as a body mass index
(BMI) above 30—was associated with past-month de-
pression in women (but not men). Second, severe obe-
sity—defined as a BMI above 40—was associated with
past-month depression in men and women combined.
At the time, Nick found that he could not reproduce the
results from this article, other than some of the most
basic descriptives. He wrote to the lead/corresponding
author of the Onyike et al. article, Dr Chaidi Onyike, but
after a brief exchange the correspondence ceased, and
he decided to put the exercise of reproducing Onyike et
al.’s results to one side for the time being.1

In March 2018 Nick resurrected the project with
a blog post (Brown, 2018) asking for volunteers to
help with independent reanalyses. Several people re-
sponded, and three stayed on board long enough to
contribute substantial amounts of code and insights.
This article, of which those three people are the second
through fourth authors, is the result of that exercise.

At an early stage of our reanalyses, it became clear
that one of the main reasons why Nick had not been
able to reproduce Onyike et al.’s (2003) results was his
failure to notice that the survey results were weighted
to make the sample as representative as possible of the
U.S. population. However, once that elementary prob-
lem had been overcome, several other issues emerged
with the design choices of Onyike et al.’s study as well
as each of the individual tables of results that could not
be so easily explained. We discuss these issues in the
following sections. Further details, in particular con-
cerning the different ways available to calculate stan-
dard errors, can be found in a technical note by Jan van
Rongen, which we have made available, along with our
analysis code, at https://osf.io/j32yw.

In this article, our primary focus is on reproducing
the results reported by Onyike et al. (2003), rather than
evaluating the appropriateness of their analyses or the
validity of the study as a whole. Nevertheless, we have
included some brief commentary at a few points where
the analyses conducted by Onyike et al. seem to have
had clear problems.

Data processing

The NHANES-III survey upon which Onyike et al.’s
(2003) paper is based was conducted in the United
States in two phases between 1988 and 1994. Data
sets and documentation for NHANES-III are openly ac-
cessible at https://wwwn.cdc.gov/nchs/nhanes/nhanes3/DataFiles.aspx.
It appears that Onyike et al.
used data from three of the four main data sets pro-
duced by the survey: “Household Adult”, “Household
Youth”, and “Examination”. Our initial data process-
ing steps involved downloading and merging these data
sets, applying the inclusion and exclusion criteria spec-
ified in Onyike et al. (see p. 1141), and creating de-
rived variables (e.g., age and BMI categories). Most of
this process was relatively straightforward and we do
not describe the steps in detail here; see the analysis
script in our OSF project for further information. How-
ever, it is worth describing how we created the principal
outcome variables (prevalence of depression at various
time points). Onyike et al. classified participants as
having had a diagnosis of depression (a) at any point in
their lifetime, (b) in the past month, (c) in the past year,
and (d) recurrently. The NHANES-III examination data
contains several depression-related variables. We took
a value of 3 for the variable MQPDEP (any lifetime di-
agnosis of major depressive disorder, except following a
bereavement) to indicate “lifetime depression.” We as-
sumed that past-month and past-year depression were
defined by the variable MQPLDDP, which measures how
long ago the last episode was diagnosed, with the two
values 51 (“within the last 2 weeks”) and 52 (“between
2 weeks and 1 month ago”) indicating past-month de-
pression, and these two values together with 53 (“1 to
6 months ago”) and 54 (“6 months to 1 year ago”) indi-
cating past-year depression. Recurrent depression was
taken from the variable MQPDEPRT, with values of 2
or 3 indicating two or more lifetime diagnoses of ma-
jor depression with any level of severity (NHANES-III,
1996b). The remainder of our manuscript is organized
according to the five tables reported in Onyike et al.
(2003). While a small quantity of additional statistical
information was provided in their main text (which we
have not discussed below), these tables appear to con-
tain all of the most important results of their study.
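The derivation of the four outcome variables described above can be sketched in Python as follows. This is a minimal illustration, not the analysis script from the OSF project: the function name, the dictionary-based participant record, and the choice to treat every other code as "no" are assumptions made for the example. Only the variable names and code values (MQPDEP = 3; MQPLDDP = 51 to 54; MQPDEPRT = 2 or 3) come from the text.

```python
# Code values taken from the description in the text.
PAST_MONTH_CODES = {51, 52}                      # last 2 weeks; 2 weeks to 1 month
PAST_YEAR_CODES = PAST_MONTH_CODES | {53, 54}    # plus 1-6 months; 6 months-1 year
RECURRENT_CODES = {2, 3}


def depression_flags(record):
    """Return the four outcome indicators for one participant.

    `record` is a hypothetical dict keyed by NHANES-III variable name.
    Any value not listed above (including a missing key) counts as False,
    which is an assumption of this sketch.
    """
    return {
        "lifetime": record.get("MQPDEP") == 3,
        "past_month": record.get("MQPLDDP") in PAST_MONTH_CODES,
        "past_year": record.get("MQPLDDP") in PAST_YEAR_CODES,
        "recurrent": record.get("MQPDEPRT") in RECURRENT_CODES,
    }


# Made-up participant: lifetime diagnosis, last episode 1 to 6 months ago.
example = {"MQPDEP": 3, "MQPLDDP": 53, "MQPDEPRT": 1}
flags = depression_flags(example)
print(flags)
```

Because the past-year code set contains the past-month codes, anyone flagged for past-month depression is also flagged for past-year depression.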

Table 1

We were able to reproduce Onyike et al.’s (2003) Ta-
ble 1 exactly, apart from the column labeled “β”. Onyike
et al. did not provide a legend for this column, but
we assume, from their remark that “There was suffi-
cient statistical power to test the study hypotheses” (p.
1141), that it is a form of post hoc power calculation (in
which case the column label should arguably have been
“1−β”). Numerous authors (e.g., Lakens, 2014) have
pointed out that post hoc power calculations amount
to little more than a transformation of the p value, so
the utility of this column is perhaps questionable. We
were unable to reproduce the reported “β” figures, be-
cause we do not know how Onyike et al. dealt with the
issues of (a) sample weighting and (b) the design ef-
fect (i.e., the correlations among clustered observations
due to the non-random survey design; cf. NHANES-III,
1996c, pp. 25–27) in their analyses. We refer interested
readers to Jan van Rongen’s technical note in our OSF
repository for more details on this.
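To make concrete why weighting and the design effect matter here, the following Python sketch computes a naive post hoc power for comparing two proportions with a normal approximation. It is not the method used by Onyike et al. (which is unknown to us) and is not expected to reproduce the “β” column. The function name and the `deff` argument, which simply divides both group sizes by a hypothetical design effect, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def naive_power(n1, n2, p1, p2, alpha=0.05, deff=1.0):
    """Approximate power of a two-sided z-test for two proportions.

    Ignores sample weights. `deff` is a hypothetical design-effect
    multiplier: each group size is divided by it to mimic the loss of
    precision caused by clustered sampling. The negligible probability
    of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    m1, m2 = n1 / deff, n2 / deff
    p_pool = (m1 * p1 + m2 * p2) / (m1 + m2)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / m1 + 1 / m2))
    se_alt = sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)


# Hypothesis A, all respondents (n1, n2, p1, p2 as listed in Table 1).
power_srs = naive_power(4154, 1658, 0.028, 0.051)
# Same inputs with an arbitrary, purely illustrative design effect of 2.
power_deff2 = naive_power(4154, 1658, 0.028, 0.051, deff=2.0)
print(power_srs, power_deff2)
```

The two calls differ only in the assumed design effect, yet they give noticeably different answers, which is why the reported figures cannot be checked without knowing how these issues were handled.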

Table 2

Table 2 in Onyike et al. (2003) contains demographic
information (stratified by gender), with standard er-
rors for means and percentages falling in various groups
(e.g., age and ethnicity).

1On July 11, 2020, we wrote again to Dr Onyike, enclos-
ing a copy of the preprint version of the present article, and
inviting him to comment. However, as of August 28, 2021, we
have received no reply of any kind.

NHANES-III data sets and documentation: https://wwwn.cdc.gov/nchs/nhanes/nhanes3/DataFiles.aspx

Table 1
Reproduction of Onyike et al.’s (2003) Table 1.

Hypothesis and sample          n1       n2       p1       p2       n2/n1    β

Hypothesis A: Obesity (body mass index ≥30) is associated with depression.
  All respondents              4,154    1,658    0.028    0.051    0.40     0.985
  Females                      2,180    1,084    0.038    0.067    0.50     0.943
  Males                        1,974    574      0.017    0.029    0.29     0.449

Hypothesis B: Class 3 (severe) obesity (body mass index ≥40) is associated with depression.
  All respondents              4,154    267      0.028    0.125    0.06     1.000
  Females                      2,180    202      0.038    0.130    0.09     0.995
  Males                        1,974    65       0.017    0.115    0.03     0.947

Note. Underscored values are different from those reported in Onyike et al.’s (2003) Table 1. See Onyike et al.
(2003, p. 1142) for column label legends.


 
Somewhat confusingly, Onyike et al.’s (2003) Table
2 indicates a sample size of 8,773 (4,745 female, 4,028
male), whereas all of their other results were based on a
sample size of 8,410. The difference between these two
sets of participants is explained by the fact that in the
original sample of 8,773 participants aged 15–39 years
who underwent the medical examination and interview
(thus meeting the inclusion criteria), the data (height
and/or weight) needed to calculate body mass index
(BMI) were missing for 25 people. Similarly, sufficient
interview data to make a diagnosis of (non)depression
were missing for 14 people, and both of these elements
were missing for 324 people. Hence, Onyike et al. ex-
cluded a total of (25 + 14 + 324) = 363 people from
the final sample (8,773 – 363 = 8,410). It is not clear
to us why the larger data set was used for describing de-
mographic characteristics, given that it was apparently
not used for the rest of the analyses in Onyike et al.’s
article.
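The arithmetic behind the two sample sizes can be checked in a few lines of Python. The counts are those given in the paragraph above; the female and male subtotals of the final sample (4,561 and 3,849) are taken from the row counts in our reproduction of Table 4. The variable names are illustrative only.

```python
# Counts taken from the text.
initial_sample = 8773            # aged 15-39, examined and interviewed
missing_bmi_only = 25            # height and/or weight missing
missing_depression_only = 14     # insufficient interview data
missing_both = 324

excluded = missing_bmi_only + missing_depression_only + missing_both
analytic_sample = initial_sample - excluded

# Consistency check against the female and male counts in Table 4.
females_final, males_final = 4561, 3849

print(excluded, analytic_sample, females_final + males_final == analytic_sample)
```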

Although Table 2 is described by Onyike et al. (2003)
in their text as displaying “The demographic character-
istics of the respondents” (p. 1141), our attempts to
reproduce this table suggest that this is not the case:
The estimates provided in the table appear to have been
produced with sample weighting applied, meaning that
they are actually estimates of the demographic charac-
teristics of the U.S. population (and, consequently, sub-
stantially different from the demographic characteristics
of the respondents)2. This also means that Table 2 con-
tains a discrepancy between the Ns reported at the top
of each column (which show that the sample was 54.1%
female) and the percentages reported in the “Gender”
breakdown (50.7% female), which refer to the popula-
tion.
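The difference between a sample percentage and a weighted population estimate can be illustrated with a short Python sketch. The first figure uses the Ns from Table 2; the five-person data set and its weights are invented purely for illustration, and `weighted_share` is a hypothetical helper, not a function from any survey library.

```python
def weighted_share(flags, weights):
    """Weighted proportion of records whose flag is True."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


# Unweighted share of females among the 8,773 respondents in Table 2.
sample_share_female = 4745 / 8773

# Toy data (hypothetical): three women and two men with unequal weights.
is_female = [True, True, True, False, False]
weights = [1000.0, 1200.0, 800.0, 1500.0, 1500.0]

toy_unweighted = sum(is_female) / len(is_female)   # describes the sample
toy_weighted = weighted_share(is_female, weights)  # estimates the population

print(round(100 * sample_share_female, 1), toy_unweighted, toy_weighted)
```

In the toy data the women make up 60% of the sample but, because each man stands for more people, only half of the weighted total. A gap of this kind separates the 54.1% implied by the column Ns from the weighted percentage in the “Gender” row.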

In applying weighting in these and all subsequent
analyses we assume that Onyike et al. (2003) used the
sample weighting variable WTPFEX6 (“examined sam-
ple final weight”). This is not the only weighting vari-
able available in the NHANES-III datasets, but the use
of other available weighting options (e.g., WTPFQX6,
“interviewed sample final results”) gives results that do
not match Onyike et al.’s Table 2.

Having identified the weighting variable applied by
Onyike et al. (2003), three of us, working indepen-
dently, found three different ways of calculating the
reported standard errors. We established that the JKn
jackknife method in the R function as.svrepdesign from
the survey package (Lumley, 2019) produces almost ex-
actly the same standard errors as those reported by
Onyike et al. We tentatively assume that the version
of Stata used by Onyike et al. produced standard errors
for descriptives in the same way. Any remaining minor
differences between our values and those of Onyike et
al. might stem from the choice of underlying replicate
designs (see the accompanying Technical Note).

2The caption of the table, in contrast, refers to the “char-
acteristics of the study population”, so is more consistent with
the statistics provided.

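For readers unfamiliar with the JKn approach mentioned above, the following Python sketch shows the underlying idea for a weighted proportion: each primary sampling unit (PSU) is dropped in turn, the remaining PSUs in the same stratum are scaled up by n_h / (n_h - 1), and the squared deviations of the replicate estimates are summed with a factor of (n_h - 1) / n_h. This is a simplified sketch under stated assumptions, not the code of the survey package or of Stata; the tuple layout, the function names, and the toy data are hypothetical, and implementation details in real software may differ.

```python
from collections import defaultdict


def weighted_proportion(rows, multipliers=None):
    """Weighted share of rows whose flag is 1.

    rows: iterable of (stratum, psu, weight, flag) tuples.
    multipliers: optional dict mapping (stratum, psu) to a weight multiplier.
    """
    multipliers = multipliers or {}
    numerator = denominator = 0.0
    for stratum, psu, weight, flag in rows:
        w = weight * multipliers.get((stratum, psu), 1.0)
        numerator += w * flag
        denominator += w
    return numerator / denominator


def jkn_standard_error(rows):
    """Delete-one-PSU jackknife within strata; returns (estimate, SE)."""
    psus_by_stratum = defaultdict(set)
    for stratum, psu, _weight, _flag in rows:
        psus_by_stratum[stratum].add(psu)

    estimate = weighted_proportion(rows)
    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        for dropped in psus:
            # Drop one PSU and scale up the others in the same stratum.
            multipliers = {
                (stratum, psu): 0.0 if psu == dropped else n_h / (n_h - 1)
                for psu in psus
            }
            replicate = weighted_proportion(rows, multipliers)
            variance += (n_h - 1) / n_h * (replicate - estimate) ** 2
    return estimate, variance ** 0.5


# Hypothetical toy design: two strata with two PSUs each.
toy_rows = [
    (1, "a", 100.0, 1), (1, "a", 120.0, 0),
    (1, "b", 90.0, 0), (1, "b", 110.0, 0),
    (2, "c", 80.0, 1), (2, "c", 150.0, 0),
    (2, "d", 130.0, 0), (2, "d", 70.0, 1),
]
toy_estimate, toy_se = jkn_standard_error(toy_rows)
print(toy_estimate, toy_se)
```

With one stratum, two PSUs, and one equally weighted observation in each, the sketch reduces to the familiar standard error of a mean of two values, which is a quick sanity check on the scaling factors.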
In addition to the general issues outlined above, we
had some specific difficulty in reproducing Onyike et
al.’s (2003) percentages for ethnicity by gender. After
some experimentation, we established that their exact
numbers could be reproduced if we used the N = 8,773
data set for females, and the N = 8,410 data set for
males (in which there were 3,849 male participants).
In other words, our earlier note about the use of data
prior to exclusions for this table does not apply to the
specific case of the ethnicity of males.

A final issue in Table 2 relates to the Education fre-
quencies. Our results for the percentages of participants
with more than 12 years of education are slightly lower
than those reported by Onyike et al. (2003). On closer
examination of the data set, it appears that Onyike et al.
counted any numerical value above 12 in the NHANES-
III variable HFA8R as representing more than 12 years
of education. However, as the NHANES-III Adult data
file code book makes clear (NHANES-III, 1996a, p. 96),
some participants have the value 88 (“Blank but appli-
cable”—which, confusingly, appears to mean “impossi-
ble value”, cf. NHANES-III, 1996, p. 21) or 98 (“Don’t
know”) for this variable, which corresponds to missing
data. Hence, we believe that our numbers are the cor-
rect ones here. Our accompanying code also calculates
the education results treating these “missing” values as
indicating more than 12 years of education; the results
in that case correspond exactly to those reported by
Onyike et al.
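The two ways of handling the HFA8R codes can be contrasted in a small Python sketch. The codes 88 and 98 come from the code book as cited above; the function name, the toy values, and the decision to keep participants with missing education in the denominator are assumptions of this illustration rather than a description of either analysis.

```python
MISSING_CODES = {88, 98}  # "Blank but applicable" and "Don't know"


def share_more_than_12(values, weights, treat_missing_as_high=False):
    """Weighted share of participants with more than 12 years of education.

    With treat_missing_as_high=True, the codes 88 and 98 are counted as
    "more than 12 years", mimicking a plain `value > 12` comparison.
    Missing cases stay in the denominator (an assumption of this sketch).
    """
    numerator = denominator = 0.0
    for value, weight in zip(values, weights):
        denominator += weight
        if value in MISSING_CODES:
            if treat_missing_as_high:
                numerator += weight
        elif value > 12:
            numerator += weight
    return numerator / denominator


# Hypothetical toy data: years of education, equal weights.
years = [10, 12, 14, 16, 88, 98]
weights = [1.0] * len(years)

correct = share_more_than_12(years, weights)
inflated = share_more_than_12(years, weights, treat_missing_as_high=True)
print(correct, inflated)
```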

In sum, while we were able to reproduce most of the
numbers reported in Table 2, doing so involved applying
analysis decisions that seem at odds with the written de-
scription in Onyike et al. (2003), and at least two errors
appear to have been made in producing the table. On
the other hand, this Table primarily contains descriptive
information, so it is not critical to the conclusions of the
study.

Table 3

Onyike et al.’s (2003) Table 3 displays how the
estimated prevalence of past-month depression varies
across BMI categories (with stratification by gender).
It provides only point estimates, and no indicators of
uncertainty (e.g., confidence intervals, p values). We
were able to reproduce this table in its entirety, with the
exception of a discrepancy of 0.01 in one percentage
(possibly due to different rounding between Stata and
R), and an apparent transcription error in the number
of people in Obesity class 1, where Onyike et al. re-
ported 910 instead of 981 (a value that Onyike et al.
themselves reported correctly in their Table 4).

We disagree, however, with Onyike et al.’s (2003)
choice of label for the third column of this table, “All
respondents” (and, by implication, the fourth and fifth
columns also, assuming that “Females” and “Males” im-
plicitly carry over the term “respondents” for each sex).
The word “respondents” suggests that, looking for ex-
ample at the third column of the first line of Table 3,
2.79% of people who responded to the survey had normal
body weight and met the criteria for a past-month diag-
nosis of depression, whereas in fact this figure of 2.79%
represents the estimate for the total population of the
US based upon the weights and the survey design.

Table 4

Table 4 in Onyike et al. (2003) displays differences
in the estimated odds of past-month depression across
BMI categories. These differences are expressed in the
form of odds ratios comparing the odds of depression
in various BMI categories to the odds of depression in
participants of normal weight.
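As a worked example of this comparison, the following Python sketch converts prevalences into odds and divides by the odds in the normal-weight reference group. The input percentages are the “All respondents” values from our reproduction of Table 3. Because they are already rounded, the results should be read as approximate crude odds ratios, to be compared with the past-month column of Table 4 rather than treated as a substitute for the survey-weighted models. The function and variable names are illustrative.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Odds in a group divided by odds in the reference group."""
    return odds(p_group) / odds(p_reference)


# Past-month depression, all respondents, in percent (Table 3).
reference_pct = 2.79  # normal weight (BMI 18.5-24.9)
prevalence_pct = {
    "Underweight": 3.24,
    "Overweight": 2.42,
    "Obese": 5.13,
    "Obesity class 1": 3.55,
    "Obesity class 2": 4.80,
    "Obesity class 3": 12.51,
}

crude_or = {
    name: round(odds_ratio(pct / 100.0, reference_pct / 100.0), 2)
    for name, pct in prevalence_pct.items()
}
for name, value in crude_or.items():
    print(f"{name}: {value:.2f}")
```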

We exactly reproduced almost all of the point esti-
mates of the odds ratios in Table 4, with three excep-
tions: All respondents, Past-year, Obese, where we ob-
tained a result of 1.42, versus 1.41 in Onyike et al.’s
(2003) article; Females, Past-month, Obesity class 1
(1.32 versus 1.28); and Females, Past-month, Obesity
class 2 (1.84 versus 1.75). The first of these discrepan-
cies might be due to rounding, but it is not clear what
could have caused the other two. Overall, however, the
level of agreement between our table and the original
gives us confidence that our derivation of the four de-
pression category variables from the NHANES-III mea-
sures was faithful to that of Onyike et al.

The majority of the confidence interval boundaries in
our reproduction of Table 4 were also close to those of
Onyike et al. (2003) within a margin of 0.01 or 0.02,
suggesting that the method that we chose for determin-
ing the standard errors of the odds ratios among the
numerous options that the survey package makes avail-
able, namely JKn (jackknife for stratified designs; Lum-
ley, 2019), was the one that most closely matches that
applied by Onyike et al. However, for the three lines of
Table 4 with the smallest sample sizes—the BMI classes
“Underweight,” “Obesity class 2,” and “Obesity class 3”
for males, with sample sizes of 99, 125, and 65, respec-
tively—our CIs were even wider than those of Onyike et
al., in some cases by a considerable margin. The total
number of male participants in those three BMI classes
who reported ever being depressed in their lifetime was
5, 3, and 7 respectively (and these numbers were, natu-
rally, even lower for recurrent, past-year, or past-month
depression, with just one male participant in the BMI
35–39.9 category having recurrent or past-month de-
pression).

https://osf.io/74sn9/


Table 2
Reproduction of Onyike et al.’s (2003) Table 2.

                                  Females (n=4,745)     Males (n=4,028)
Characteristic                    %        SE           %        SE

Gender                            50.6     0.6          49.4     0.6
Age (years)
  15-19                           17.2     1.2          17.8     0.9
  20-24                           20.3     1.3          19.2     1.0
  25-29                           19.6     1.2          21.0     1.1
  30-34                           21.5     1.4          22.3     1.3
  35-39                           21.4     1.2          19.7     1.2
Race/ethnicity
  White                           70.0     1.6          71.2     1.7
  Black                           14.0     1.0          12.1     0.7
  Hispanic                        12.0     1.2          12.4     1.1
  Other                           4.1      0.6          4.3      0.6
Education (years)
  0–8                             6.8      0.8          7.4      0.6
  9–11                            19.7     1.1          22.3     1.2
  12                              33.6     1.1          31.4     1.2
  >12                             39.4     1.8          38.6     1.7
Marital status
  Married                         52.2     1.4          50.9     1.6
  Separated/divorced/widowed      11.4     0.7          4.8      0.5
  Never married                   36.2     1.5          44.1     1.7
Area of residence
  Urban                           49.6     5.0          50.4     4.9
  Rural                           50.4     5.0          49.6     4.9

Notes. Underscored values are different from those reported in Onyike et al.’s (2003) Table 2.
For “Race/ethnicity”, the number of males is 3,849; see discussion in the main text.


Table 3
Reproduction of Onyike et al.’s (2003) Table 3.

                                                           % with DIS/DSM-III depression
Relative body weight               No. of participants     All respondents   Females   Males

Normal weight (BMI 18.5–24.9)      4,154                   2.79              3.82      1.67
Underweight (BMI <18.5)            301                     3.24              3.82      1.82
Overweight (BMI 25.0–29.9)         2,297                   2.42              4.01      1.37
Obese (BMI ≥30)                    1,658                   5.13              6.74      2.85
  Obesity class 1 (BMI 30–34.9)    981                     3.55              4.97      1.88
  Obesity class 2 (BMI 35–39.9)    410                     4.80              6.79      0.83
  Obesity class 3 (BMI ≥40)        267                     12.51             13.03     11.54

Note. Underscored values are different from those reported in Onyike et al.’s (2003) Table 3.
Numbers in parentheses represent the standard error of the corresponding percentage estimate.

In most cases, the wider confidence intervals in our
reproduction do not affect whether the odds ratios re-
ported in Table 4 are statistically significant at the .05
level, with two exceptions:

• The odds ratio for the relationship between BMI
(treated as a continuous variable) and past-month
depression in females is statistically significant in
Onyike et al.’s (2003) Table 4, 95% CI [1.03,
1.06], but not in our reproduction, 95% CI [0.99,
1.04].

• The odds ratio for the comparison of the preva-
lence of past-month major depression between
obesity class 3 and normal weight participants in
the male subsample is statistically significant in
Onyike et al.’s Table 4, 95% CI [1.03, 57.26], but
not in our reproduction, 95% CI [0.12, 486.2].
As mentioned above, there were only three male
participants in obesity class 3 for this compari-
son; it does not seem implausible that minor vari-
ations in calculation methods between statistical
software packages could cause substantial differ-
ences in their outputs for such small subsamples.

These two discrepancies nevertheless relate to rela-
tively ancillary findings that were not emphasized in
Onyike et al.’s (2003) abstract or discussion.

Table 4
Reproduction of Onyike et al.’s (2003) Table 4.

                                    No. of        Past-month              Past-year               Lifetime                Recurrent
                                    participants  major depression        major depression        major depression        major depression
Population and BMI category                       OR     95% CI           OR     95% CI           OR     95% CI           OR     95% CI

All respondents
  BMI (continuous variable)         8,410         1.05   1.01, 1.09       1.03   0.99, 1.06       1.02   0.99, 1.05       1.01   0.97, 1.05
  Normal weight (BMI 18.5–24.9)     4,154         1.00§                   1.00§                   1.00§                   1.00§
  Underweight (BMI <18.5)           301           1.17   0.49, 2.77       1.39   0.66, 2.92       1.35   0.74, 2.45       1.21   0.54, 2.69
  Overweight (BMI 25.0–29.9)        2,297         0.86   0.53, 1.40       0.84   0.53, 1.32       0.93   0.65, 1.33       0.84   0.54, 1.28
  Obese (BMI ≥30)                   1,658         1.88   1.03, 3.43       1.42   0.86, 2.33       1.22   0.82, 1.81       1.13   0.73, 1.76
    Obesity class 1 (BMI 30–34.9)   981           1.28   0.65, 2.53       1.01   0.55, 1.84       0.87   0.55, 1.38       0.78   0.47, 1.29
    Obesity class 2 (BMI 35–39.9)   410           1.76   0.78, 3.97       1.67   0.92, 3.06       1.39   0.78, 2.46       1.41   0.72, 2.77
    Obesity class 3 (BMI ≥40)       267           4.98   2.07, 11.98      2.92   1.28, 6.63       2.60   1.39, 4.86       2.28   0.92, 5.67

Females
  BMI (continuous variable)         4,561         1.05   1.01, 1.08       1.02   0.99, 1.05       1.02   0.99, 1.04       1.00   0.97, 1.03
  Normal weight (BMI 18.5–24.9)     2,180         1.00§                   1.00§                   1.00§                   1.00§
  Underweight (BMI <18.5)           202           1.00   0.38, 2.62       1.36   0.61, 3.02       1.20   0.59, 2.43       1.03   0.40, 2.62
  Overweight (BMI 25.0–29.9)        1,095         1.05   0.65, 1.72       0.81   0.54, 1.21       0.94   0.66, 1.34       0.71   0.45, 1.12
  Obese (BMI ≥30)                   1,084         1.82   1.02, 3.25       1.29   0.81, 2.07       1.12   0.78, 1.61       0.97   0.64, 1.49
    Obesity class 1 (BMI 30–34.9)   597           1.32   0.61, 2.86       0.90   0.45, 1.80       0.74   0.43, 1.28       0.68   0.37, 1.25
    Obesity class 2 (BMI 35–39.9)   285           1.84   0.71, 4.75       1.66   0.79, 3.46       1.41   0.74, 2.70       1.40   0.67, 2.94
    Obesity class 3 (BMI ≥40)       202           3.78   1.67, 8.55       2.19   0.97, 4.87       2.15   1.19, 3.87       1.36   0.60, 3.13

Males
  BMI (continuous variable)         3,849         1.06   0.97, 1.16       1.04   0.98, 1.10       1.02   0.97, 1.07       1.03   0.96, 1.10
  Normal weight (BMI 18.5–24.9)     1,974         1.00§                   1.00§                   1.00§                   1.00§
  Underweight (BMI <18.5)           99            1.09   0.13, 9.28       0.57   0.07, 4.95       1.06   0.23, 4.90       1.12   0.04, 29.83
  Overweight (BMI 25.0–29.9)        1,202         0.82   0.35, 1.94       1.08   0.56, 2.07       1.16   0.65, 2.06       1.25   0.64, 2.45
  Obese (BMI ≥30)                   574           1.73   0.52, 5.71       1.54   0.71, 3.36       1.28   0.67, 2.47       1.40   0.57, 3.45
    Obesity class 1 (BMI 30–34.9)   384           1.13   0.40, 3.17       1.22   0.59, 2.54       1.14   0.61, 2.13       1.00   0.40, 2.47
    Obesity class 2 (BMI 35–39.9)   125           0.49   0.00, 3.1e8      0.99   0.16, 5.97       0.66   0.11, 3.94       0.71   0.00, 1.9e9
    Obesity class 3 (BMI ≥40)       65            7.68   0.12, 486.2      4.53   0.41, 50.07      3.26   0.40, 26.54      5.15   0.53, 50.23

Note. Underscored values are different from those reported in Onyike et al.’s (2003) Table 4. §: Reference category.

Table 5

We were unable to reproduce Onyike et al.’s (2003)
Table 5 because several of the covariates that these au-
thors claimed to have included were either not avail-
able in the NHANES-III data set that we downloaded,
or were calculated in an unclear way. Specifically:

• We were unable to find any measure of the use of
psychiatric medicine in the NHANES-III data set
or code books.

• We have no way to determine the criteria used by
Onyike et al. to categorize participants’ alcohol
use as None, Moderate, and Abuse, based on the
six variables (MYPF1, MYPF2, MYPF3S, MYPF4,
MYPF5S, and MYPF6S) that correspond to partici-
pants’ responses to questions that were about their
alcohol consumption in the NHANES-III interview.

• Onyike et al. classified participants as (a) current
smokers, (b) former smokers, or (c) those who
had never smoked. The NHANES-III survey and
examination data sets contain a number of items
related to the smoking of cigarettes, cigars, and
pipes; it is not clear how these were combined to
arrive at Onyike et al.’s three-way classification.

• We do not understand why the five categories (Ex-
cellent, Very good, Good, Fair, Poor) for physician’s
health rating from the NHANES-III examination
(variable PEP13A) were collapsed into just three
categories (Excellent, Good, Fair/poor).

• We do not understand why the four race/ethnic
categories from Table 2 were collapsed into three
in Table 5, with “Hispanic/other” apparently be-
ing used as an omnibus category for anyone who
was not classed as “White” or “African-American”
(this last category apparently being a synonym for
“Black” from Table 2).

We could, of course, have reproduced the table with
these covariates either omitted or guessed at, but a com-
parison of the results with the published table would
probably not have been very meaningful.

Discussion

Our efforts to reproduce Onyike et al.’s (2003) anal-
yses were made easier by the fact that the underly-
ing data set was openly accessible and extensively doc-
umented (the NHANES-III documentation consists of
many hundreds of pages for each data set). This is in
contrast to the situation facing researchers who wish to
reproduce articles for which the data are less thoroughly
documented or simply not available for re-analysis at
all. Despite this, however, it was difficult for us to re-
produce many of Onyike et al.’s tables, because we did
not know all of the choices that these authors made
in analyzing the data. The fact that our reanalysis was
so challenging even in this seemingly favorable scenario
speaks to the importance of sharing not only data and
descriptions of analyses, but also the original code (typi-
cally in the form of scripts in the language of a statistical
software package) that was used to process and analyze
the data. It is only with access to this code that readers
and reviewers can obtain full insight into how the data
were actually analyzed.

A number of positive changes in the process of an-
alyzing scientific data and publishing the results of
those analyses have taken place in the 18 years since
Onyike et al.’s (2003) article was published. First, the
widespread dissemination and adoption of free software
such as R (R Core Team, 2018) and its associated pack-
ages has made powerful computing tools and associated
support resources available at essentially no cost to any-
one with access to a quite modest desktop or laptop
computer. Second, platforms such as the Open Sci-
ence Framework (https://osf.io/) now make it easy for
authors to share their analysis code and (depending on
licensing arrangements and confidentiality issues) data.
Third, helped by the improvements mentioned in the
two previous points, the sharing of code and data so
that other researchers may reproduce and possibly ex-
tend one’s results is rapidly becoming a standard part
of publishing a scientific article (e.g., Lindsay, 2017).
All of those developments have played their part in our
replication efforts and the writing of the current article.

In summary, we were able to reproduce most
of the figures in Onyike et al.’s (2003) Tables 1 and
2, although the analyses necessary to reproduce Table
2 are somewhat inconsistent with the written descrip-
tion in the article (cf. the issue of the “demographic
characteristics of the respondents”), and include what
appear to be at least two data processing errors. Nev-
ertheless, Tables 1 and 2 represent primarily descrip-
tive information rather than statistics bearing on Onyike
et al.’s research questions. For Tables 3 and 4 (repre-
senting bivariate relationships between BMI and vari-
ous operationalizations of depression), we were able to
reproduce the reported statistics, albeit with some mi-
nor discrepancies. On the other hand, we were com-
pletely unable to reproduce Table 5. This table rep-
resents arguably the most crucial statistical output of
the study, in that it presents information about the re-
lationship between BMI and depression while control-
ling for the variables that Onyike et al. considered to
be plausible confounds. Our inability to reproduce the
statistics in this table does not mean that Onyike et al.’s
results are invalid—indeed, they are entirely congruent
with the findings of subsequent systematic reviews and
meta-analyses, such as those by Luppino et al. (2010)
and Pereira-Miranda et al. (2017)—but it does suggest
that they were presented without sufficient information
to permit direct replication.

Despite the issues we have raised in the present arti-
cle, we do not believe that Onyike et al.’s (2003) arti-
cle is severely flawed; certainly we do not think that it
is atypical of the research that was being published at
the time. Nor do we think that an extensive corrigen-
dum is required, although perhaps a brief note could
be added to the published article to correct the most
obvious errors that we have identified and add suffi-
cient information about the data preparation and analy-
sis process to allow the reproduction of the reported re-
sults. Our take-home message for researchers is, rather,
a more general one: Even with a carefully curated data
set such as NHANES-III, the process of data analysis re-
quires precision and care, preferably with multiple sets
of eyes and the sharing of code (and, where they are not
already public, data) to allow for computational reproducibility
(Donoho, 2010) of the reported findings. We believe
that the time needed for the reader of an article to re-
produce the calculations in a published paper ought to
be measurable in minutes, not months.

Author Contact

Corresponding author is Nicholas J. L. Brown. Author
contact: nicholasjlbrown@gmail.com

Conflict of Interest and Funding

The authors declare that no conflict of interest exists.
No funding was involved in this research.

Author Contributions

All four authors analyzed the data independently.
Nicholas J. L. Brown wrote the paper and the other au-
thors provided critical revisions. All authors approved
the final version of the manuscript.

Open Science Practices

This article earned the Open Materials badge for
making the materials available. This is a commentary
that focused on reproducing the findings of a published
article, and as such there are no (new) data. It was
not pre-registered. It has been verified that the analy-
sis reproduced the results presented in the article. The
entire editorial process, including the open reviews, is
published in the online supplement.


References

Brown, N. (2018, March 13). Announcing a crowdsourced reanalysis project [Weblog post]. Retrieved August 28,
2021 from https://steamtraen.blogspot.com/2018/03/announcing-crowdsourced-reanalysis.html

Donoho, D. L. (2010). An invitation to reproducible computational research. Biostatistics, 11(3), 385–388.
https://doi.org/10.1093/biostatistics/kxq028

Lakens, D. (2014, December 19). Observed power, and what to do if your editor asks for post-hoc power analyses
[Weblog post]. Retrieved August 28, 2021 from
https://daniellakens.blogspot.com/2014/12/observed-power-and-what-to-do-if-your.html

Lindsay, D. S. (2017). Sharing data and materials in Psychological Science. Psychological Science, 28(6), 699–702.
https://doi.org/10.1177/0956797617704015

Lumley, T. (2019). Package ‘survey’, v. 3.35-1. https://cran.r-project.org/web/packages/survey/survey.pdf

Luppino, F. S., de Wit, L. M., Bouvy, P. F., Stijnen, T., Cuijpers, P., Penninx, B. W. J. H., & Zitman, F. G. (2010).
Overweight, obesity, and depression: A systematic review and meta-analysis of longitudinal studies. Archives of
General Psychiatry, 67(3), 220–229. https://doi.org/10.1001/archgenpsychiatry.2010.2

NHANES-III. (1996a). Third National Health and Nutrition Examination Survey (NHANES III), 1988–94: NHANES
III household adult data file documentation.
http://www.nber.org/nhanes/ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/nhanes3/1A/adult-acc.pdf

NHANES-III. (1996b). Third National Health and Nutrition Examination Survey (NHANES III), 1988–94: NHANES
III examination data file documentation.
http://www.nber.org/nhanes/ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/nhanes3/1A/exam-acc.pdf

NHANES-III. (1996c). Third National Health and Nutrition Examination Survey (NHANES III), 1988–94: Analytic
and reporting guidelines.
https://wwwn.cdc.gov/nchs/data/nhanes/analyticguidelines/88-94-analytic-reporting-guidelines.pdf

Onyike, C. U., Crum, R. M., Lee, H. B., Lyketsos, C. G., & Eaton, W. W. (2003). Is obesity associated with major
depression? Results from the Third National Health and Nutrition Examination Survey. American Journal of
Epidemiology, 158(12), 1139–1147. https://doi.org/10.1093/aje/kwg275

Pereira-Miranda, E., Costa, P. R. F., Queiroz, V. A. O., Pereira-Santos, M., & Santana, M. L. P. (2017). Overweight
and obesity associated with higher depression prevalence in adults: A systematic review and meta-analysis.
Journal of the American College of Nutrition, 36(3), 223–233.
https://doi.org/10.1080/07315724.2016.1261053

R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. https://www.R-project.org/

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data
collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
https://doi.org/10.1177/0956797611417632

