S TRATEG
 

Y 
JOURNAL OF SMALL BUSINESS 

APPROACHING THE PSED: “SOME ASSEMBLY REQUIRED” 
 
 
Kelly G. Shaver 
College of Charleston 
sharverk@cofc.edu 

 
Amy E. Davis 

College of Charleston 
davisae@cofc.edu 

 
Mark S. Kindy 

Medical University of South Carolina 
Ralph H. Johnson VA Medical Center 

kindyms@musc.edu 
 

Carrie A. Blair 
College of Charleston 

messalc@cofc.edu 
 
 
ABSTRACT 
 
The Panel Studies of Entrepreneurial Dynamics (PSED I and PSED II) are nationally 
representative longitudinal surveys of individuals in the United States who are in the process 
of starting businesses.  These nascent entrepreneurs have been followed for three to four 
years (PSED I, N = 1,261, over 6,000 variables), or for six years (PSED II, N = 1,214, over 
8,000 variables).  As of this writing there are over 150 publications based on the PSED, but 
there could be even more if some of the critical data cleaning and data combining instructions 
were widely available.  This article presents code (both SPSS and STATA) that can be used to 
check on the inclusion criteria, to renormalize weights for subgroup analysis, and to combine 
the data for PSED I with those for PSED II. 
 
Keywords: PSED, longitudinal research, nascent entrepreneurship, syntax codes 
 
Editor’s Note (G. Hills): As noted in the conclusion, the PSED data set is the only 
representative national sample reflecting the firm creation process.  Commenting as a member 
of the ‘start up’ PSED I team, there was an entrepreneurial spirit at the inception. There were 
20 universities (ultimately growing to 34). Each pledged $20,000, which provided the initial 
funding.  Paul Reynolds, Kelly Shaver, and many others deserve great credit for advancing 
scholarship and knowledge in the entrepreneurship field. 
 

99 

mailto:sharverk@cofc.edu
mailto:davisae@cofc.edu
mailto:kindyms@musc.edu
mailto:messalc@cofc.edu


Journal of Small Business Strategy                                                          Volume 22, Number 1 
 

100 

Preparation of this article was supported by the National Science Foundation Partnerships for 
Innovation (PFI) Program under Grant # IIP-0917987, Kelly G. Shaver, PI. Any opinions, 
findings, conclusions, or recommendations expressed in this material are those of the authors 
and do not necessarily reflect the views of the National Science Foundation. 
 
 
INTRODUCTION 
 

The two Panel Studies of Entrepreneurial 
Dynamics, PSED I and PSED II, provide 
entrepreneurship researchers with an 
extremely valuable resource for 
examining the process of creating a new 
business venture.  Each of these datasets is 
a nationally representative sample of 
people who are in the process of creating 
new businesses (PSED I also includes data 
from a comparison group of individuals 
who were not starting businesses).  Each 
dataset is both wide (over 6,000 variables 
for PSED I and over 8,000 variables for 
PSED II) and long, with a sizable array of 
questions asked of respondents who were 
then followed for four years (PSED I) or 
six years (PSED II).  The longitudinal 
design allows researchers to identify the 
characteristics of start-up efforts that have 
(and have not) succeeded (Reynolds & 
Curtin, 2011).  
 
As of this writing 13 books, over 50 book 
chapters, and more than 90 peer-reviewed 
publications based on the PSED studies 
have appeared in the literature (Frid, 
Gordon, & Davidsson, 2011).  PSED I has 
been described in detail in a book edited 
by Gartner, Shaver, Carter, and Reynolds 
(2004) with chapters written by members 
of the research teams who contributed the 
variables included.  A comprehensive 
description of the outcomes of PSED I 
was written by Reynolds (2007).  PSED II 
has been described in a book edited by 

Reynolds and Curtin (2009) with chapters 
written by researchers interested in 
particular topics included in the data.  A 
comprehensive description of the 
outcomes of PSED II has been written by 
Reynolds and Curtin (2008).  Codebooks 
and interview schedules for both datasets 
are publicly available from the Institute of 
Social Research (ISR) at the University of 
Michigan, http://www.psed.isr.umich 
.edu/psed/home.   

 
Details of the construction and use of each 
dataset (and of the combined or 
“harmonized” dataset created by Reynolds 
& Curtin, 2011) have been published (e.g., 
Appendices A-C in the Gartner, et al. 
book; the Reynolds & Curtin, 2011 
paper).  Together these resources are 
excellent references for researchers who 
have some prior experience with the data.  
But for many entrepreneurship researchers 
who have not yet dipped a toe into the 
PSED ocean, the technical details can 
appear overwhelming.  There is a good 
reason that for every parent, the three 
most dreaded words at holiday time are 
“some assembly required.”  Where the 
PSED is concerned, there is a great deal of 
assembly required.  In fact, there is 
enough so that with funding from the 
Ewing Marion Kauffman Foundation the 
first two authors of this paper have for 
four years taught a three-day course for 
doctoral students and faculty called 
“PSED 101.”  The present article presents 
some of the principles developed in that 

http://www.psed.isr.umich.edu/psed/home
http://www.psed.isr.umich.edu/psed/home


Journal of Small Business Strategy                                                          Volume 22, Number 1 

course and includes parallel SPSS and 
STATA syntax required to accomplish the 
purposes described.     
 

SCREENING TO IDENTIFY 
POTENTIAL RESPONDENTS 

 
Collection of each dataset began with a 
“screener” that was embedded in a larger 
market research process.  For PSED I, 
64,622 individuals were reached by 
Market Facts (now Synovate) from July 
1998 to January 2000 through random 
digit dialing and asked two screening 
questions: 
 
1. Are you, alone or with others, currently 
trying to start a new business, including 
any form of self-employment? 
 
2. Are you, alone or with others, now 
trying to start a new business for your 
employer? An effort that is part of your 
job assignment? 

 
Respondents who answered affirmatively 
to either question (or to both) were then 
asked if they expected to be owners of the 
new firm and whether they had been 
active in the past 12 months in trying to 
establish the firm.  Those who expected to 
be owners (in whole or in part) and who 
had been active were then asked if they 
could be contacted by a university-based 
survey research laboratory that was 
conducting research on the creation of 
new businesses in the United States. Early 
in the project the interviews were done by 
the University of Wisconsin Survey 
Research Laboratory.  A third screening 
criterion – whether the business 
organizing effort is still in the start-up 
phase – was asked early in the University-
based interviews. When the UWSRL 
closed, the PSED I interviews were 
completed by the University of 

Michigan’s Survey Research Center 
(SRC).  
 
PSED I consisted of two phases, an initial 
telephone interview (1,261 people) 
followed by a mail survey returned by 871 
of the 1,261. Of the 1,261, 830 were 
nascent entrepreneurs, 431 were in a 
comparison group composed of people 
who had initially said they were not 
organizing a business. It is important for 
the data analysis to note that because of a 
gap in funding, roughly half of those 
screened for PSED I were reached a year 
later than the beginning of the first 
screening. 
 
For PSED II, 31,845 individuals were 
reached by ORC International from 
October 2005 to January 2006 and asked 
three screening questions, rather than two, 
on the basis of what had been learned 
about inclusion criteria from the Global 
Entrepreneurship Monitor (GEM).  The 
three PSED II screening questions were: 

 
1. Are you, alone or with others, currently 
trying to start a new business, including 
any self-employment or selling goods and 
services to others?  
 
2. Are you, alone or with others, currently 
trying to start a new business or new 
venture for your employer, an effort that is 
part of your normal work? 
 
3. Are you, alone or with others, currently 
the owner of a business you help manage, 
including self-employment or selling any 
goods or services to others? 

 
Respondents who answered one or more 
of these questions affirmatively were then 
asked additional questions to determine 
whether they had been active in the past 
12 months, whether they personally would 

101 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

102 

own all, part, or none of the new business, 
and whether the business had received 
revenues sufficient to cover expenses 
including salaries or wages of the owners 
(this last point was actually three separate 
questions).  Respondents who had 
engaged some start-up activity in the 
preceding 12 months, expected to own all 
or the major part of the new firm, and had 
not achieved revenues sufficient to be 
classified as an ongoing new firm were 
asked if they would consent to a telephone 
interview by the University of Michigan’s 
Survey Research Center.  Interviews were 
completed with 1,214 respondents (there 
was no mail survey, and no comparison 
group).  
 

CHECKING THE INCLUSION 
CRITERIA 

 
Researchers interested in firm-level issues 
typically include all 2,475 individuals, but 
researchers focusing on person-level 
variables often elect to remove the 53 
people (45 in PSED I and 7 in PSED II) 
who did not meet a strict definition of the 
inclusion criteria. In PSED I, the SPSS 
code to accomplish these reductions was 
written by Paul Reynolds and modified by 
Kelly Shaver.  The corresponding STATA 
code was written by Amy Davis. In the 
Tables that follow, the SPSS code 
precedes the corresponding STATA code 
(descriptions of what is being 
accomplished obviously apply in both 
cases). 
 
In PSED I, 6 people had achieved positive 
cash flow for more than 90 days, so were 
by definition no longer in the organizing 
phase. An additional 7 individuals 
expected that some institution 
(technically, ownership by an entity that 
was not a person) would own more than 

50% of the business.  It was later 
discovered that 32 members of what was 
supposed to be the comparison group had 
actually been organizing a business at the 
time of the interview.   The syntax to 
accomplish this data cleaning is contained 
in Table 1.  NOTE: The syntax to be used 
appears in Courier type. When 
respondents who do not meet the strict 
definitions of the inclusion criteria are 
eliminated, there are 1,216 people left in 
PSED I (817 of whom are nascent 
entrepreneurs, 399 of whom are in the 
comparison group).  
 
WEIGHTS AND RENORMALIZING 

 
One of the major advantages of using 
PSED studies is that the data can be made 
to be nationally representative.  In any 
survey research there is the possibility that 
biases will be introduced when contacting 
potential respondents. For the PSED 
studies the primary biases are differential 
selection probabilities and differential 
rates of non-response.  For example, in 
PSED I five subsamples of data were 
collected. These were the initial sample 
(known as the “mixed gender sample,” 
identified by the variable RTYPE with a 
score of 10), an oversample of women 
collected with funding from the National 
Science Foundation (RTYPE = 11), an 
oversample of minorities also collected 
with funding from the National Science 
Foundation (RTYPE = 12), a comparison 
group (RTYPE = 20) collected 
contemporaneously with the mixed gender  
sample, and a second comparison group 
(RTYPE = 21) collected 
contemporaneously with the minority 
oversample. The clearest conceptual 
indication of the need for weighting the 
data is provided by, for example, the 
oversample of women:  

 
Journal of Small Business Strategy                                                          Volume 22, Number 1 

 
Table 1: Checking the Inclusion Criteria 

STEP 1.  
Eliminate 6 infant businesses that should have been screened out because they had positive 
cash flow including owner salary for more than 3 months prior to the date of interview.  The 
cash flow variable is CFPHLAG (for Cash Flow PHone LAG). Eliminating these 6 infant 
businesses reduces the sample to 1255.  (Individual respondents can be identified by sorting 
the data in descending order on the variable of interest.  This puts the problematic 
respondents at the top of the data list.) RESPIDs for these 6 cases are 328100601, 37800137, 
328100395, 328100124, 328100541, and 328100145. 

 
SPSS CODE 

FILTER OFF. 
USE ALL. 
SELECT IF (sysmis(cfphlag=1) or (cfphlag < 90)). 
EXECUTE. 
 
FREQ cfphlag. 
 

STATA CODE 
gen cfphlag1=cfphlag 
recode cfphlag1 .=0 
keep if cfphlag1<90 
 
STEP 2.  
Eliminate cases in which institutional ownership will exceed 50%.  NPOWNPC was created 
by Paul Reynolds on the basis of Q217 (who will own?) answered as "not a person" and 
percentage of ownership (Q207C).  This variable identifies 18 people (out of the total of 830 
nascents) who expect that non-persons will own some percentage of the business.  Of the 18, 
7 show an expected non-person ownership greater than 50% (one at 66%, 1 at 82%, 1 at 
85%, four at 100%).  Delete these cases eliminates RESPIDs 328100020, 328100183, 
328100255, 328100267, 328100443, 328100572, and 337800154, reducing the sample to 
1248. 

 
SPSS CODE 
FILTER OFF. 
USE ALL. 
SELECT IF (sysmis(npownpc=1) or (npownpc LE 50)). 
EXECUTE. 
 
FREQ npownpc. 
 
STATA CODE 
drop if autonsu==5 
 

103 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

 
Table 1 Continued 

STEP 3.  
Minority oversample Comparison Group participants, who are RTYPE 21, were asked the 
screening questions about start-up activities.  Any who answered affirmatively to the 
question about start-up involvement, SUINVOL should be deleted from the Comparison 
Group.  This represents a total of 28 people, 14 females and 14 males, leaving an overall 
total of 1220. (The total number of respondents can be seen by viewing the end of the listing 
in the “Data View” of the data file.) 

 
SPSS CODE 
DO IF (RTYPE = 21). 
SELECT IF (SUINVOL = 1). 
END IF. 
EXECUTE. 
 
FREQ SUINVOL. 
 
STATA CODE 
gen cgbiz=rtype 
recode cgbiz 21=1 else=0 
replace cgbiz=0 if suinvol==1 
drop if cgbiz==1 
 
STEP 4.  
Respondents targeted for the ERC Mixed Gender and NSF Women (all of whom are RTYPE 
20) comparison group were not asked about their start-up involvement.  In the one-year 
follow-up, these respondents were asked about their start-up activities. (They should have 
had none.)  The variable representing involvement is CGSUACT.  This variable identifies 
four individuals who should be removed.  RESPID numbers are 328200046, 328200059, 
328200084, and 328200115, reducing the sample to 1,216.  

 
SPSS CODE 
DO IF (RTYPE = 20). 
SELECT IF (CGSUACT NE 1). 
END IF. 
EXECUTE. 
 
FREQ RTYPE. 
STATA CODE 
gen cgbiz1=rtype 
recode cgbiz1 20=1 else=0 
replace cgbiz1=0 if cgsuact==0 
drop if cgbiz==1 
 
 
104 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

105 

Table 1 Continued:  

STEP 5.  
Finally, there is an error in one value assignment for AUTONSU.  A frequency count of 
AUTONSU will show a total of 716 individuals, one of whose ventures is said to be 1-50% 
owned by a nonperson.  Any such ownership, of course, means that the person is not fully 
autonomous.  The problem is identified by a crosstab between AUTONSU and AUTONSU4 
(AUTONSU4 will at this point have only three categories, as the fourth – institutional 
ownership > 50% has been eliminated). 

 
SPSS CODE 
CROSSTABS 
   /TABLES= AUTONSU BY AUTONSU4 
   /FORMAT= AVALUE TABLES 
   /CELLS= COUNT. 
 
STATA CODE 
ta autonsu autonsu4 
 

STEP 6.  
In this crosstab the column totals, which represent AUTONSU4, are correct (715, 102, 817).  
The RESPID in error is 337800099.  The correct value, determined by comparison to the 
frequencies in Q190 is a score of 3.  Correct the value for this person then check to ensure 
that the column and row totals agree.   

 
SPSS CODE 
IF (RESPID = 337800099) AUTONSU = 3. 
EXECUTE. 
 
CROSSTABS 
   /TABLES= AUTONSU BY AUTONSU4 
   /FORMAT= AVALUE TABLES 
   /CELLS= COUNT. 
 
STATA CODE 
 
replace autonsu=3 if respid==337800099 
ta autonsu autonsu4 
 

Journal of Small Business Strategy                                                          Volume 22, Number 1 

If a male (potential) respondent answered 
the telephone during the collection of this 
oversample, that male’s probability of 
inclusion was zero. 

 
Whether the initial screening was done by 
Market Facts (PSED I) or ORC (PSED II) 
the screening organization conducted 
interviews in replicated waves of 1000 
people per wave.  As part of their services 
to clients, these organizations provided a 
separate weight for every sample of 1000.  
Once all of the screening had been 
accomplished, the staff at SRC reconfigured 
the weights based on the total sample, a 
change that substantially reduced the 
variance in the weights (e.g., in PSED I  
from a range of nearly 10 points to a range 
of 1.7 points) according to Curtin (2004).  A 
similar procedure was followed for PSED 
II, with comparable results.  In each case 
the weight created is for the entire sample 
screened (64,622 or 31,845).   
 
The general procedure for creating weights 
is to compare the percentage of respondents 
in a particular demographic group (e.g., 
white women aged 18-29 with incomes of 
$40,000 to $60,000) to the proportion of 
that same group in the total population of 
the United States, according to data from 
the Current Population Survey (CPS) of the 
U.S. Department of the Census.  People in 
the specified demographic group would 
then be weighted to bring the weighted 
proportion into line with the proportion 
shown by the CPS.  For example, if white 
women aged 18-29 with incomes of 
$40,000 to $60,000 were 15% of the CPS 
population but only 7.5% of the PSED 
sample, each respondent would be given a 
weight of 2.  The demographic 
characteristics actually used to cross-
classify the cells to be compared to the CPS 
were different from PSED I to PSED II 
because there were too many missing values 

in the income data.  For PSED I the cells 
were the cross-classification of Gender X 
Ethnic Background X Age X Educational 
Attainment.  For PSED II the cells were the 
cross-classification of Gender X Ethnic 
Background X Age X Income.   Across all 
individuals in the screener, the weights 
were then centered to equal the total 
number of individuals screened.  In both 
PSED I and PSED II the resulting weight is 
the variable WT_SCRN. 
 
Although the screener samples are quite 
useful for estimating such things as the 
proportion of business creation activity 
among individuals with different gender, 
ethnic characteristics, and educational 
attainment, both screeners were limited to a 
very few questions.  For detailed 
consideration of the factors involved in 
start-up, one needs the interview datasets.  
This means, of course, that the sums of 
weights need to be 1,261 and 1,214, not a 
number in the thousands.  The process of 
creating weights for the detailed datasets 
began with using the weighted screener 
results to generate the demographic 
characteristics of nascent entrepreneurs.  
Following the logic outlined above for 
producing the screener weights, post-
stratification weights were then created for 
the two detailed datasets.  In PSED I the 
screener weights were adjusted by the SRC 
to produce one normalized weight for 
members of the comparison group (WTCG) 
and one for the nascent entrepreneurs in 
Wave 1 (WTW1).  In PSED II, where there 
was no comparison group, the initial weight 
for the detailed dataset was WT_WAVEA.  
In each dataset there are weights for 
subsequent waves, but for present purposes 
we will restrict the discussion to the Wave 1 
weights in both datasets.   
 
If one is interested only in all nascents, or 
all nascents compared to members of the 

106 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

107 

comparison group, the weights given 
(WTW1, WTCG; WT_WAVEA) are 
sufficient.  On the other hand, many 
investigators are interested in gender 
differences, differences between nascents 
who are fully autonomous (no financial 
support from any “nonperson”), or variables 
that appear only in the mail survey.  In any 
or all of these cases, the overall weights will 

need to be renormalized so that the sum of 
the weights equals the number of 
individuals in the particular subsample of 
interest.  An example of the problem is 
shown in Table 2. 

 
Table 2: Example of Need for Recentering of Weights. 

Gender 
(NCGENDER) 

 
Number of People 

 
Sum of WTW1 

 
Mean of WTW1 

Females 403 305.25 .76 
Males 427 524.75 1.23 
Total 830 830 1.00 

 
Table 2 is based on the original dataset for 
PSED I (ERCW14Q, N = 1,261) 
downloaded from the ISR website. As noted 
above, in that original dataset there are 830 
nascent entrepreneurs and 431 members of 
the comparison group.  WTW1  
was computed so that the total of this 
weight would equal the number of nascent 
entrepreneurs (830), and the bottom row in  
Table 2 shows that this is the case.  The 
problem arises in the other two rows.  When 
the sample of entrepreneurs is split into 
females and males (using NCGENDER, the 
only gender variable recommended for  
general use) the sum of weights for females 
is too small, whereas the sum of weights for 
the males is too large.  This imbalance can 
be corrected by multiplying the value for 
WTW1 by a fraction consisting of (the 
number of individuals)/(the total weight for 
that class of individuals).  Specifically, the 
new weights are: 
 
For females, WTW1 * (403/305.25), sum of 
which is 403; 
For males, WTW1 * (427/524.75), sum of 
which is 427. 
 
 
The renormalizing of weights becomes a bit 
more complicated when the sample is cut  
 
two or more times.  For this reason, Table 3 
contains the syntax necessary to  
renormalize weights when the data of 
interest have been split on two dimensions.   
This procedure can simply be generalized to 
as many different splits as are needed for 
the particular research question.  One note 
of caution:  the SPSS command “MEANS 
TABLES,” is not available through the 
menu system in some older versions of 
SPSS.  All versions, however, recognize the 
command when it is written out into a 
syntax file. 
 
COMBINING PSED I AND PSED II 
 
As valuable as PSED I and PSED II are 
separately, they allow researchers to answer 
even more questions if they are combined 
into a single dataset.  For sophisticated 
users of SPSS or STATA, accomplishing 
this task is a relatively simple matter.  On 
the other hand, for those of us who are 
accustomed to working with one dataset at a 
time, putting the two together can be a 
challenge in at least four ways.   


Journal of Small Business Strategy                                                          Volume 22, Number 1 

 
First, depending on the active memory of 
the computer you use, combining a 1261 
person x 6000+ variable dataset with one 
that is 1214 person x 8000+ variables, it is 
prudent to be prepared for a crash.  
Minimize the number of other applications 
that are open, and save your work early and 
often.  Some university email systems will 
not accept a file as large as the resulting  
combined dataset, so if you are working 
with colleagues it may be necessary to 
compress or zip the file. 

 
Second, there is the need to have in the 
combined file some variable that indicates 
the source of the data (PSED I or PSED II).  
There are several ways to accomplish this, 
one of which is to add a variable called 
PSED (or SOURCE, or whatever variable 
name makes the most sense to you) to each 
dataset before the two are combined.   
 

Table 3.  Syntax for Renormalizing Case Weights, Example for PSED I Mail 

Questionnaire 

STEP 1.  
When the overall sample is reduced by eliminating people who did not return the mail 
questionnaire, the weights will need to be renormalized.  The 871 respondents who completed 
the mail questionnaire will have a valid (not missing) value for return year (MAILQYR).  
Then retain only those respondents with a valid MAILQYR.  
 
The counts should be as follows:  
Full autonomy (245 females, 235 males, total of 480). 
Partial autonomy (41 females, 32 males, total of 73). 
Comparison group (173 females, 145 males, total of 318). 
 
SPSS CODE 
FREQ mailqyr. 
 
FILTER OFF. 
USE ALL. 
SELECT IF(SYSMIS(mailqyr) NE  1). 
EXECUTE. 
 
CROSSTABS 
  /TABLES=autonsu4 BY ncgender 
  /FORMAT= AVALUE TABLES 
  /CELLS= COUNT. 
STATA CODE 
drop if mailqyr ==. 
ta autonsu4 ncgender 
 
 
108 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

Table 3 Continued: 

STEP 2.  
Next, compute a weight that for nascent entrepreneur respondents will be WTW1 but for 
comparison group respondents will be WTCG. 

 
SPSS CODE 
COMPUTE weight = 0. 
EXECUTE. 
IF (rtype = 10) weight = wtw1. 
IF (rtype = 11) weight = wtw1. 
IF (rtype = 12) weight = wtw1. 
IF (rtype = 20) weight = wtcg. 
IF (rtype = 21) weight = wtcg. 
EXECUTE. 
 
STATA CODE 
gen weight=0 
replace weight=wtw1 if rtype==10 
replace weight=wtw1 if rtype==11 
replace weight=wtw1 if rtype==12 
replace weight=wtcg if rtype==20 
replace weight=wtcg if rtype==21 
 

STEP 3.  
Next, check the weights for a Gender x Autonomy split (which will have six cells).  The result 
will show the numbers to use as divisors in the fractions to renormalize. 

 
SPSS CODE 
MEANS TABLES= weight BY autonsu4 BY ncgender  
  /CELLS SUM MEAN COUNT STDDEV. 
 
STATA CODE 
sort autonsu4 
by autonsu4: su weight if ncgender==1 
by autonsu4: su weight if ncgender==2 
 

109 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

110 

Table 3 Continued: 

STEP 4.  
Finally, renormalize the weights for each of these six cells. At this point the sum of the 
weights for a cell should agree with the number of individual respondents in that cell. 

SPSS CODE 
COMPUTE RENORMWT = 99. 
EXECUTE. 
IF ((ncgender = 2) and (autonsu4 = 100)) RENORMWT = 
weight*(245/184.87). 
IF ((ncgender = 2) and (autonsu4 = 200)) RENORMWT = 
weight*(41/33.95). 
IF ((ncgender = 2) and (autonsu4 = 400)) RENORMWT = 
weight*(173/175.06). 
IF ((ncgender = 1) and (autonsu4 = 100)) RENORMWT = 
weight*(235/291.68). 
IF ((ncgender = 1) and (autonsu4 = 200)) RENORMWT = 
weight*(32/40.12). 
IF ((ncgender = 1) and (autonsu4 = 400)) RENORMWT = 
weight*(145/160.14). 
EXECUTE. 
 
MEANS TABLES= renormwt BY autonsu4 BY ncgender  
  /CELLS SUM MEAN COUNT STDDEV. 
 
STATA CODE 
gen renormwt=99 
replace renormwt=weight*(245/184.87) if ncgender==2 & 
autonsu4==100 
replace renormwt=weight*(41/33.95) if ncgender==2 & 
autonsu4==200 
replace renormwt=weight*(173/175.06) if ncgender==2 & 
autonsu4==400 
replace renormwt=weight*(235/291.68) if ncgender==1 & 
autonsu4==100 
replace renormwt=weight*(32/40.12) if ncgender==1 & 
autonsu4==200 
replace renormwt=weight*(145/160.15) if ncgender ==1 & 
autonsu4==400 
 
sort autonsu4 
by autonsu4: su renormwt if ncgender==1 
by autonsu4: su renormwt if ncgender==2 
 
This variable would be given a value of 1 if 
the source dataset was PSED I, and a value 
of 2 if the source dataset was PSED II.  This 
method is accomplished by the SPSS syntax 
in Table 4.  Another way is to combine the  
datasets and then create a variable 
representing the source dataset.  This  

 
method is accomplished in the STATA 
syntax in Table 4.  NOTE: to make the 
combining as widely useful as possible, we 
show how to combine the two original 
datasets (with no elimination of respondents 
from either dataset). 
 

Journal of Small Business Strategy                                                          Volume 22, Number 1 

Table 4. Syntax for Combining PSED I and PSED II 

STEP 1. 
This syntax contains pathnames where datasets are to be found.  Those pathnames will differ 
from user to user depending on where both files are stored.  First, download ERCW14Q.sav 
from the ISR website (this will be data file 2b under the PSED I heading). Save the file onto 
your desktop.  Next, download psedii_scrn_ABCDEF.sav from the ISR website.  Also save 
this file to your desktop. 

STEP 2. 
Next, retrieve the ercw14q.sav dataset from your desktop and create a variable called PSED to 
identify the source dataset.  Make all values of PSED = 1. Then save the file back to your 
desktop with a new name (the example uses “psed1.sav”). 

SPSS CODE 
GET   FILE='/Users/kellyshaver/Desktop/ERCW14Q.sav'.  
DATASET NAME DataSet1 WINDOW=FRONT. 
 
COMPUTE psed = 1. 
EXECUTE. 
VARIABLE label psed 'source dataset'. 
VALUE labels psed 
 1 'from psed1' 
 2 'from psed2'. 
EXECUTE. 
FREQ psed. 
 
SAVE OUTFILE='/Users/kellyshaver/Desktop/psed1.sav'  
  /COMPRESSED. 

STEP 3. 
Next, retrieve the psedii_scrn_ABCDEF dataset from your desktop, create a variable called 
PSED and make all values of PSED = 2.  Then save the file back to your desktop with a new 
name (the example uses “psed2.sav”). 

SPSS CODE 
GET FILE='/Users/kellyshaver/Desktop/psedii_scrn_ABCDEF.sav'. 
DATASET NAME DataSet2 WINDOW=FRONT. 
 
COMPUTE psed = 2. 
EXECUTE. 
VARIABLE label psed 'source dataset'. 
VALUE labels psed 
 1 'from psed1' 
 2 'from psed2'. 
EXECUTE. 
FREQ psed. 
 
SAVE OUTFILE='/Users/kellyshaver/Desktop/psed2.sav'  
  /COMPRESSED. 
 
NOW CLOSE psed2. 
 
 
111 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

112 

Table 4 Continued: 
 

STEP 4.  
Finally, with psed1.sav open, add psed2 to it.  When the two files are combined, there should 
be a total of 2,475 people, which can be confirmed by checking the frequency of PSED.  Of 
course, if you have previously eliminated respondents who did not return the PSED I mail 
survey, the total number will be (871+1214) = 2,085.   
 
You will probably want to save this combined file so that you do not have to do the 
combining every time you care to do an analysis. 

SPSS CODE 
ADD files  
  FILE='/Users/kellyshaver/Desktop/psed2.sav'  
  FILE=*. 
EXECUTE. 
 
FREQ psed. 
 
STATA CODE FOR THE OTHER METHOD OF COMBINING 
format respid %20.0f 
gen sampid=respid 
sort sampid 
format sampid %20.0f 
 
append using "C:\Documents and Settings\davisae\My 
Documents\research\PSEDIIhandbook\psedii_scrn_ABCDEF.dta" 
gen psed=sampid 
recode psed 328100000/537800160=1 50001/60000=2 
 
 
Third, once the datasets have been 
combined, it is essential to check all 
variables of interest using the data and the 
relevant codebook (codebooks for both 
datasets are also available as PDF files 
from the ISR website).  Not all of the 
items included in PSED I are present in 
PSED II, and the latter contains variables 
not present in the former.  Even when the 
variables are identical across datasets, 
their names will not be.  Variables in 
PSED I have their waves identified by a 
leading capital letter (Q for wave 1, R for 
wave 2, S for wave 3, and T for wave 4).  
In PSED II, by contrast, waves 1-6 are 
identified by the leading capital letters A-
F.  In the mail questionnaire for PSED I, 
different conceptual variables appear 
together, based on the nature of their 

response scales (e.g.,most variables with 
5-point scales were grouped together, 
whether or not they were conceptually 
related).  In PSED II the variables are 
grouped in “modules,” but the placement 
of items into modules would not be done 
the same way by each of a dozen 
researchers interested in the topics.  So if a 
variable of interest to you does not appear 
in the module where you expect it, don’t 
stop looking.  It could simply be 
somewhere else.  
 
Fourth, check both the codebook and the 
variable listing to make sure that a 
particular variable of interest (a) had the 
same stem and response scale from PSED 
I to PSED II, and (b) that the numbers 
assigned to response alternatives were 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

identical from one dataset to the other 
(this is not always true).  For example, in 
PSED I the conceptual variable of 
entrepreneurial intensity was assessed 
with four items in the mail questionnaire.  
These are ql1d (q-ell-one-d) to ql1g: 
 
d. I would rather have my own business 
than pursue another promising career. 
 
e. There is no limit to how long I would 
give maximum effort to establish my 
business. 
 
f. My personal philosophy is to “do 
whatever it takes” to establish my own 
business. 
 
g. Owning my own business is more 
important than spending time with my 
family. 
 
For each item there was a response scale 
with five alternatives: completely untrue 
(1), mostly untrue (2), it depends (3), 
mostly true (4), completely true (5) such 
that higher numbers represent greater 
levels of intensity.  In PSED II, however, 
only two of the items were repeated (e and 
f) appearing as AY9 and AY10.  Here the 
response scale has six alternatives: 
strongly agree (1), agree (2), neither (3), 
disagree (4), strongly disagree (5) and not 
relevant (6).  Ignoring the last alternative, 
it is clear that higher numbers represent 
lesser levels of intensity.  Thus, in the 
combined dataset, a researcher would 
have only two items available and would 
have to reverse score those two. 
 

STARTUP TEAMS 
 

A distinctive feature of the PSED is its 
inclusion of secondary founders in its 
surveys. Most large-scale surveys of 
entrepreneurs and business owners only 

seek information from the primary owner 
of each business. Thus, many 
entrepreneurs remain “hidden” from 
scholarly inquiry.  By contrast, in the 
PSED, the use of household telephone 
numbers as the sampling frame means that 
the originator of the entrepreneurial 
concept is just as likely to be interviewed 
as the fourth team member that he or she 
recruited to the startup. Indeed, in PSED 
II, more than 200 respondents listed their 
primary role as being something other 
than “general management” or 
“everything” and more than 100 
respondents reported that someone else on 
the team was in charge of daily operations 
in the business (Davis, Longest, Kim, & 
Aldrich 2009). Therefore, researchers 
must be mindful that although the PSED 
is richer for its inclusion of secondary 
entrepreneurs, those studying individual 
differences or personality should control 
for team characteristics because the 
attitudes and behaviors of an individual 
who initiated the startup process may be 
considerably different from an individual 
who was recruited into an ongoing nascent 
venture.  
 
PSED I and II contain information about 
team members’ demographic 
characteristics, human capital 
characteristics, contributions to the 
startup, and relationships among team 
members. All of this information is 
reported from the point of view and 
recollection of the respondent. In PSED I, 
respondents were asked about their 
occupation, industry experience, 
entrepreneurial experience, and amounts 
of money and time invested in the 
business in different places depending on 
whether they were starting their business 
by themselves or as members of teams. 
For example, if someone new to the PSED 
ran an analysis of q197 (the amount of 

113 


Journal of Small Business Strategy                                                          Volume 22, Number 1 

money a respondent has invested in the 
startup), he or she would find only 376 
responses out of the 830 potential 
answers. Therefore, for anyone interested 
in studying teams or human capital and 
startup investments in the PSED I, the 
most important variables are q210b_1 
through q210b_5. These variables indicate 
whether the person about whom other 
questions are asked is the respondent or 
not. Note that these variables do not 
capture human capital and startup 
investments across team members but 
simply restore missing values for 
respondents on teams. 

 
CONCLUDING REMARKS 

 
The two PSED datasets are important 
resources for researchers who seek to 
examine the early stages of new business 
formation.  Indeed, Reynolds and Curtin 
(2008) identified 26 separate datasets 
related to business creation, including 
those from the Bureau of Labor Statistics, 
the Census Bureau, Dun & Bradstreet, the 
Internal Revenue Service, the Kauffman 
Foundation, the National Opinion 
Research Center, the National Science 
Foundation, the Small Business 
Administration, and the University of 
Michigan.  In their words, “Only one 
extant research program, the Panel Study 
of Entrepreneurial Dynamics, provides 
detailed information on a representative 
national sample reflecting the firm 
creation process” (p. 162).  The purpose of 
this article is to make the PSED data more 
approachable by newcomers.  In short, we 
hope we have provided an abbreviated 
diagram to help reduce the frustration of 
using the PSED given that there is “some 
assembly required.”  

 
REFERENCES 
 

Davis, A. E., & Shaver, K. G. (2009). 
Social motives in the PSED II. In P. 
D. Reynolds & R. T. Curtin (Eds.), 
New firm creation in the United 
States: Initial explorations with the 
PSED II data set (pp.19-34). 
Dordrecht, Germany: Springer. 

 
Davis, A. E., Longest, K. C., Kim, P. H., 

& Aldrich, H. E. (2009).  Owner 
contributions and equity. In P. D. 
Reynolds & R. T. Curtin (Eds.), New 
firm creation in the United States: 
Initial explorations with the PSED II 
data set (pp.71-94). Dordrecht, 
Germany: Springer. 

 
Frid, C., Gordon, S., & Davidsson, P. 

(2011).  Publications based on the 
Panel Study of Entrepreneurial 
Dynamics.  Downloaded from 
http://www.psed.isr.umich.edu/psed/d
ocumentation, May 24, 2012. 

 
Gartner, W. B., Shaver, K. G., Carter, N. 

M., & Reynolds, P. D.  (Eds.) (2004).  
The handbook of entrepreneurial 
dynamics:  The process of business 
creation.  Thousand Oaks, CA:  Sage 
Publications. 

 
Reynolds, P. D. (2007).  New firm 
creation in the U.S.: A PSED I 
overview. .  Foundations and 
Trends® in Entrepreneurship, 3 (1), 
1-149.  

 
Reynolds, P. D., & Curtin, R. T. (2008). 

Business creation in the United 
States: Panel Study of Entrepreneurial 
Dynamics II Initial Assessment.  
Foundations and Trends® in 

114 

http://www.psed.isr.umich.edu/psed/documentation
http://www.psed.isr.umich.edu/psed/documentation


Journal of Small Business Strategy                                                          Volume 22, Number 1 

115 

Entrepreneurship, 4 (3), 155-307. 
DOI 10.1561/0300000022. 

Reynolds, P. D., & Curtin, R. T. (2011). 
PSED I, II Harmonized transitions, 
outcomes data set.  Downloaded from 
http://www.psed.isr.umich.edu/psed/d
ocumentation, May 24, 2012. 

 
Reynolds, P. D., & Curtin, R. T. (Eds.), 

New firm creation in the United 
States: Initial explorations with the 
PSED II data set.  Dordrecht, 
Germany: Springer. 

 
Kelly G. Shaver is Professor of 
Entrepreneurial Studies in the School of 
Business at the College of Charleston.  
His prior appointments include the 
College of William & Mary, the National 
Science Foundation, and the 
Entrepreneurship and Small Business 
Research Institute (ESBRI) in Stockholm, 
Sweden.  Shaver served as a member of 
the PSED I Executive Committee and the 
PSED II Advisory Committee.  His highly 
cited research has been supported by the 
Ewing Marion Kauffman Foundation, the 
National Institute of Mental Health, and 
the National Science Foundation. 
 
Amy E. Davis is Assistant Professor of 
Entrepreneurship at the College of 
Charleston.  Her research interests include 
social networks, startup teams, biomedical 
entrepreneurship, and gender.  She has 
published in Entrepreneurship Theory and 
Practice, Frontiers of Entrepreneurship 
Research, and Work and Occupations. 
 
 
Mark S. Kindy is the Admiral Pihl 
Professor of Neurosciences in the College 
of Medicine at the Medical University of 
South Carolina and Career Research 
Scientist at the Ralph H. Johnson VA 
Medical Center.  Dr. Kindy was on the 
faculty at the University of Kentucky 
School of Medicine, is a member of 
several societies and has served on 
numerous editorial boards and review 
committees.  Dr. Kindy has been 
supported by the National Institutes of 
Health, National Science Foundation, 
Veterans Administration, Department of 
Defense, and the American Heart 
Association. 
 
Carrie Blair Messal is an Assistant 
Professor of Management in the School of 
Business at the College of Charleston.  
Her passion is leader development.  She 
has designed and executed components of 
the Executive Education Leader 
Development Program at the University of 
Tennessee-Knoxville.  She is the Founder 
and Director of the Schottland Scholars 
Program, a leader development program 
for School of Business undergraduates at 
the College of Charleston. 
 
 
http://www.psed.isr.umich.edu/psed/documentation
http://www.psed.isr.umich.edu/psed/documentation


Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.