DEVELOPMENT AND INTERPRETATION OF A
CREDIT RISK EVALUATION INSTRUMENT

T. Hillman Willi,
College of Business Administration

Louisiana Tech University
Ruston, Louisiana

Banks and other commercial lending institutions frequently use a credit
"score card" instrument to evaluate the credit worthiness of potential borrow-
ers. Regulation B of the Code of Federal Regulations (12 CFR 202), effective
1977, stipulates the requirements pertaining to the Credit Opportunity Law. In
essence, the regulation prohibits the denial of credit on the basis of age, marital
status, location of residence, race, or sex. Further, the regulation requires that
every commercial credit lending institution develop an objective credit scoring
system on a statistically derived set of criteria stemming from empirical data
pertaining to the credit history of loan applicants of that particular institution.

Therefore, the end product of a credit scoring system is a score card instru-
ment that furnishes a numerical evaluation of the criteria for assessing credit
worthiness. The purpose of this paper is to explain a statistical procedure for
developing a score card and how the use of a score card can improve the loan ap-
proval decision. The improvement is demonstrated using a Bayesian approach.

Point allocation systems that measure the risk of a loan applicant
have been used for several years and have been much discussed ([2], [7], [9], [10],
[12]). A point score system can lend a certain amount of objectivity to what
would otherwise be a personal judgment decision. Many businesses use a point
system that is simply an assignment of quantitative values for various personal
data traits that are based on the subjective assessment of a "credit expert."
These point systems could be considered to be based on empirical data since
the "expert" is using his or her experience with the credit history of borrowers.
However, the development of a credit scoring system using statistical analysis
and validation, as required by Regulation B, is not very common in the lending
industry.

Durand [5] wrote the first significant and comprehensive work on credit
scoring. His system was based on a univariate test for significance which is now
outdated because of the availability of the computer to perform more
sophisticated analyses. Management Decision Systems, Inc. of Georgia [4] presents
a more current procedure for developing a credit scoring system, but crucial
details are left to the reader's imagination. This paper attempts to shed some
light on the gray areas that are usually glossed over by the industry due to their
confidential nature.

Journal of Business Strategies, Volume 5, Number 1 (Spring 1988)

Credit Risk Variables

The first step involves the selection of the variables included on the score
card. A typical credit score card would contain several variables with a certain
number of points allocated for each separate classification for each variable.
Such variables typically include years at present address, years at present job,
occupation classification, annual salary, monthly housing payment, number of
people in household, housing classification (own, rent, other), etc. For example,
the variable, time at present address, may be allocated 12 points for six months
or less, 16 points for over six months to one year, 23 points for over one year to
two years, etc.

The points allocated to each category are not arbitrary values. They are
computed from the discriminant function as explained in the next section. A
higher total score reflects a better credit risk. Therefore, the points computed for
each characteristic represent the relative importance of a particular categorical
response to account for a good credit risk. For example, if living at the present
address for over ten years is worth 32 points as compared to only 16 points
for one year residence, a person with the longer time at the same residence is
deemed the better risk by a factor of 2 to 1.

A typical loan application normally includes an extensive list of variables.
On the basis of the responses provided on the loan application, certain variables
are scored and totaled. If the total exceeds a statistically derived cutoff value
the loan would qualify for approval; otherwise, it would not. An alternative
procedure would be to include another category, one for which a minimum and
a maximum score is established. Any score between these limits would be sent
to the loan supervisor for further deliberation.

An example of a credit score card developed for a medium-sized bank located
in the midsouth is shown in Appendix A. Suppose an applicant has the following
characteristics: (1) located at the present address for 2 years; (2) buying the
residence; (3) monthly payment is $600; (4) four people in the household; (5)
working at the present job for 5 years; (6) office worker; (7) no savings account;
(8) paid out a bank loan; and (9) monthly income of $1,800. Using the score
card in Appendix A, the point total is 240, which means that the applicant
would automatically qualify for approval if the suggested cutoff guide is used.
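
The total for this applicant can be checked with a short calculation. The sketch below is illustrative only: the point values are transcribed from the score card in Appendix A, while the dictionary layout and key labels are assumptions made for the example and are not part of the original instrument.

```python
# Point values transcribed from the score card in Appendix A.
# Only the categories that apply to the example applicant are listed;
# the key labels are illustrative, not taken from the paper.
applicant_points = {
    "time at present address: 1 yr. 1 mo. to 2 yrs.": 23,
    "housing classification: buys or owns": 32,
    "monthly rent or payment: $401 and over": 19,
    "number in household: 4": 36,
    "time at present job: 4 yrs. 1 mo. to 6 yrs.": 9,
    "occupation: office/staff": 25,
    "savings account: none": 22,
    "paid out bank loan: yes": 60,
    "total monthly income: $1501 - 2000": 14,
}

# The score card total is simply the sum of the points for each characteristic.
total_score = sum(applicant_points.values())
print(total_score)  # 240
```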

The process of selecting the particular variables to be used and allocating the
point values to the various intervals is the critical part of the process. First, an
appropriate data base must be established. A sufficient history of loan activity
must be available to establish a block of "good" and "bad" loans. It is desirable
to have several hundred loans of both types and preferably on computer tape
or disk for processing convenience.

The sample size of "good" and "bad" loans that is sufficient for statistical
analysis depends upon the number of credit characteristics evaluated. However,
a sample of 100 loans of each type would be sufficient in most cases. Actually,
two samples are needed. An analysis sample is needed to construct the score
card. A second sample, called the "holdout" or "validation" sample, is used
to validate both the discriminant function and the score card. This process is
discussed later.

Of course, the larger the sample the better. The lending institution illus-
trated in this paper had a loan population of sufficient size to permit a random
sample of 500 good and 500 bad loans. Half of these were used for analysis and
half for validation. To get a sample this large requires that the loan application
be standardized and used for a substantial period of time. In fact, the loan
application should be used long enough for the loans to be either paid out (or
a certain history of good credit to be built up) or defaulted (or excessive late
payments to be recorded to result in a "bad" loan).

The definition of a "bad" loan may vary among loan institutions, but it is
up to the lender to make that determination. In most instances a loan default is
not all that is required to result in a "bad" loan. A record of one payment over
90 days late, or two occasions of over 60 days late, or three or more occasions
of being over 30 days late may be considered "bad." There appears to be no
standard definition.

The next step is to determine which variables are potential predictors of
credit worthiness. This can be done with various procedures, the most popular
of which is the chi-square test. The chi-square distribution is used to perform
a contingency table test. The table consists of the number of loans that were
found to be good or bad and classified according to an appropriate variable
scale. An example of a chi-square contingency table is shown in Table 1.

Table 1

Contingency Table for Housing Classification Variable
(number of loans)

Housing Classification     Good Loan     Bad Loan
Buying or Own                 351           286
Rent                           92           121
Other                          57            93
-------------------------------------------------
Total                         500           500

χ² = 19.22 with 2 degrees of freedom
p-value < 0.005

Table 1 shows that more of the good loan applicants are buying or own their
residence. On the other hand, more of the bad loan applicants either rent or
fit some other classification, such as living with relatives. The p-value less than
0.005 indicates that this particular variable is very useful in predicting whether
the applicant is a good credit risk.
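
The chi-square statistic reported for Table 1 can be reproduced from the observed counts. The following is a minimal sketch, using only the standard library; the variable names are assumptions for illustration. It relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2).

```python
import math

# Observed counts from Table 1: (good loans, bad loans) per housing class.
observed = {
    "Buying or Own": (351, 286),
    "Rent": (92, 121),
    "Other": (57, 93),
}

# Column totals (good, bad) and the grand total.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_square = 0.0
for good, bad in observed.values():
    row_total = good + bad
    for j, count in enumerate((good, bad)):
        expected = row_total * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (2 - 1)

# Closed-form upper-tail probability, valid only for 2 degrees of freedom.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 2), degrees_of_freedom, p_value < 0.005)
```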

If the variable is dichotomous, such as the presence of a current checking
account or savings account, or discrete, such as the housing classification as
shown in Table 1, the classes are obvious. But if the variable is continuous,
the classes are not as obvious. For example, annual income, monthly housing
payment, time at present job, and time at present address are variables that
require class intervals to be specified.

To determine the appropriate intervals, a sorted listing of the data for each
variable is helpful. From this, the range and any natural groupings of the data
can be identified. Then it is a matter of testing various group intervals to
determine the particular classifications that give the largest chi-square value.
Of course this extra effort is only justified after the variable has been identified
as significant in distinguishing between good and bad loans. It is also possible to
use a series of discriminant analyses to locate the most discriminating interval
breaks, but it is recommended only when computer time is not a constraint.

Those variables that have significant relationships are selected for further
consideration. A significance level should be selected for this decision. Although
significance levels of 0.01 or 0.05 are common, the researcher used a cutoff
p-value of 0.10. This level eliminates those variables showing no significant
relationship while retaining marginal variables, which are tried because
they could prove useful in combination with other variables.

Linear Discriminant Analysis

Those variables that have statistically significant relationships to credit
worthiness are submitted to a multiple linear discriminant analysis program. This
procedure is well documented ([1], [3], [6], [8]) and is available on several
computer packages. The most common packages are SAS and SPSS. A loan
application may have thirty or forty data values that could be useful in predicting
credit worthiness. After employing chi-square tests, as many as twenty vari-
ables may appear to be significant. The discriminant analysis program is then
run on these variables to determine which variables should be included on the
score card. As evidenced in Appendix A, the typical score card usually contains
ten or fewer evaluation criteria.

A discriminant analysis is appropriate when the population of interest
consists of two distinct groups: good and bad credit risks. An F-ratio is calculated
from the analysis sample to measure whether the two loan type groups have been
significantly separated on the basis of the sample loans. The necessary assump-
tions for the multiple linear discriminant function are equal dispersion matrices
for each group and a multivariate normal distribution of the population.

Determining the best set of explanatory variables is a decision based on
the results of the discriminant analysis. This is accomplished by evaluating
the F statistic, the standard error of estimate, the multiple coefficient of
determination, and the partial correlation coefficients. The discriminant analysis
accomplishes two things. First, it identifies those variables having the greatest
explanatory value and, second, it provides discriminant linear coefficients which
are used to calculate the points for the score card.

The procedure for determining the actual points uses the ratio of good-to-
bad loans for each class interval for each variable. Each variable will have an
interval either established statistically for the data (e.g., monthly income:
$0-499, $500-999, $1000-1499, etc.) or the interval will be natural (e.g., housing
classification: renting, buying, or other). The discriminant analysis coefficients
for each variable are then multiplied by the ratio of good-to-bad loans that
applies to each interval. The resulting values are the scores for each category.
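
The point calculation described above can be sketched as follows. The good and bad loan counts are those of Table 1; the coefficient value is purely hypothetical, since the paper does not report its discriminant coefficients or any rescaling, so the resulting points are not expected to match Appendix A. The function name `class_points` is likewise an assumption for illustration.

```python
# Good and bad loan counts per class, taken from Table 1.
housing_counts = {
    "Buying or Own": (351, 286),
    "Rent": (92, 121),
    "Other": (57, 93),
}

# HYPOTHETICAL discriminant coefficient for the housing variable.
# The paper does not publish its coefficients; this value is illustrative only.
HYPOTHETICAL_COEFFICIENT = 20.0


def class_points(counts, coefficient):
    """Multiply the coefficient by the good-to-bad ratio of each class."""
    return {
        name: round(coefficient * good / bad)
        for name, (good, bad) in counts.items()
    }


points = class_points(housing_counts, HYPOTHETICAL_COEFFICIENT)
for name, value in points.items():
    print(name, value)
```

Classes with a higher good-to-bad ratio receive more points, which is why a higher total score reflects a better credit risk.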

Validation of the discriminant function is accomplished using the holdout
sample to construct a confusion (cross-classification) matrix and computing
the F statistic level and Wilks' lambda to determine the significance of these
characteristics. The confusion matrix gives the percentage of good and bad
loans that are correctly or incorrectly classified and provides insight regarding
the performance of the discriminant equation.
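
A confusion matrix of this kind can be tabulated as in the sketch below. The holdout outcomes and predictions shown are hypothetical placeholders, not data from the paper, and the function name `confusion_matrix` is an assumption; in practice the predictions would come from applying the discriminant function to the holdout sample.

```python
def confusion_matrix(actual, predicted):
    """Return percentages keyed by (actual, predicted), computed within each actual class."""
    classes = ("good", "bad")
    counts = {(a, p): 0 for a in classes for p in classes}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1

    percentages = {}
    for a in classes:
        row_total = sum(counts[(a, p)] for p in classes)
        for p in classes:
            percentages[(a, p)] = (
                100.0 * counts[(a, p)] / row_total if row_total else 0.0
            )
    return percentages


# HYPOTHETICAL holdout outcomes and model predictions (illustration only).
actual = ["good"] * 5 + ["bad"] * 5
predicted = ["good", "good", "good", "good", "bad",
             "bad", "bad", "bad", "good", "good"]

matrix = confusion_matrix(actual, predicted)
print(matrix[("good", "good")], matrix[("bad", "bad")])
```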

Score Card Validation

Another important use of the "holdout" sample is to test the effectiveness
of the score card and validate it. This is done by
first computing a total score for each loan known to be good or bad. When
these total scores are ranked, it is possible to estimate the percentage of good
and bad loans that would be accepted at various cutoff scores. This allows the
lending institution to predict the effect of selecting a particular cutoff score. A
typical validation sample prediction table is given in Table 2.

To illustrate what the percentages in Table 2 represent, suppose a credit
score of 110 is considered. The table shows that no loans (good or bad) scored
below this value. At a cutoff of 170, 89 percent of the good loans and 58
percent of the bad loans scored at or above the cutoff. In fact, none of the loans that turned out to be bad
scored higher than 270 whereas the good loans scored as high as 310. Thus,
as the score gets higher, the percentage of bad loans drops faster than the
percentage of good loans.

As shown in Table 2, the cutoff score resulting in the greatest differentiation
between a good and bad loan is 210. In other words, if a loan scoring below 210
is rejected as too risky and one scoring 210 or above is accepted, then the result
is to accept two-thirds of the potentially good loans and only one-fourth of the
potentially bad loans. In actual practice, a minimum and a maximum cutoff
are usually established. For example, a loan scoring below 170 is automatically
rejected and one scoring 240 or above is automatically approved. If a loan scores
between these two limits, then the loan supervisor can make the final decision.
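
The cutoff selection and the two-limit decision rule can be sketched as follows. The percentages are those of Table 2, and the limits of 170 and 240 are the ones used in the example above; the function name `decide` and the returned labels are assumptions for illustration.

```python
# (cutoff score, % of good loans accepted, % of bad loans accepted) from Table 2.
TABLE_2 = [
    (110, 100, 100), (130, 99, 94), (150, 97, 82), (170, 89, 58),
    (190, 80, 39), (210, 67, 25), (230, 52, 15), (250, 32, 6),
    (270, 15, 1), (290, 6, 0), (310, 2, 0), (330, 0, 0),
]

# The single cutoff with the greatest differentiation maximizes
# (% good accepted) - (% bad accepted).
best_cutoff, best_good, best_bad = max(TABLE_2, key=lambda row: row[1] - row[2])
print(best_cutoff, best_good - best_bad)  # 210 42


def decide(score, reject_below=170, approve_at=240):
    """Apply a minimum and a maximum cutoff; scores in between are referred."""
    if score < reject_below:
        return "reject"
    if score >= approve_at:
        return "approve"
    return "refer to loan supervisor"


print(decide(240))  # approve
```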

One of the difficulties in developing a credit scoring system is that the sam-
ple should be representative of the entire population which includes accepted
and rejected applicants. There are several ways to deal with the problem of
including rejected applicants. One is to simply ignore them and this is often
the case. Another is to assume that the loan officer made the correct decision
and, therefore, consider a rejected applicant a bad loan and include them in
the "bad" population. A more valid approach is to score each rejected loan as
either a good or a bad loan and use them to augment the population. In that
case it will be necessary to take a new sample from this adjusted population
and rerun the discriminant analysis and recompute the points for the score card.
Experience has shown that including the rejected loans only slightly affects the
results.

Table 2

Validation Sample Prediction Table

             Percentage         Percentage
Cutoff       of "Good"          of "Bad"           Percentage
Score        Loans Accepted     Loans Accepted     Difference

110              100                100                 0
130               99                 94                 6
150               97                 82                15
170               89                 58                31
190               80                 39                41
210               67                 25                42*
230               52                 15                37
250               32                  6                26
270               15                  1                14
290                6                  0                 6
310                2                  0                 2
330                0                  0                 0

* = Largest percentage difference in sample

Bayesian Interpretation

The next step is to determine the effectiveness of the credit scoring instru-
ment. A Bayesian procedure can be used to predict the performance of the score
card. Assume that the minimum cutoff score of 170 is set. As shown in Table
2, this score means that 89 percent of the good loans and 58 percent of the
bad loans are accepted. Next, suppose that the loan history for this particular
institution prior to the use of a score card is as follows:

% of total applicants granted and good   =  49%
% of total applicants granted and bad    =   9%
% of total applicants rejected           =  42%
                                            ----
                                            100%

Notice that the population should also include the rejected loans. Fifty-
eight percent were approved and 42 percent were rejected. Of the granted loans
in the past, 84 percent (0.49/0.58 = 0.84) were good and 16 percent (0.09/0.58
= 0.16) were bad. Multiplying the historical percentages by the probabilities
that would apply when using the credit scoring model with a cutoff of 170, gives
the following joint probability table.

                             Good     Bad     Reject     Total
Accepted by the model         .44     .05      .24        .73
Turned down by the model      .05     .04      .18        .27
Total                         .49     .09      .42       1.00

The table shows the breakdown of how each type of loan would be treated
by the scoring system. For example, of the 49 percent "good" loans, 44 percent
(0.49 x 0.89 = 0.44) would be accepted by the model and 5 percent (0.49 - 0.44
= 0.05) would be turned down. Of the 9 percent "bad" loans, 5 percent (0.09
x 0.58 = 0.05) would be accepted by the model and of the 42 percent rejected
loans, 24 percent (0.42 x 0.58 = 0.24) would be accepted by the model. Overall,
the credit scoring model would accept 73 percent of the loan applicants and turn
down 27 percent.
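
The joint probabilities can be reproduced with the arithmetic described above. This is a minimal sketch with assumed variable names; following the paper, the 58 percent acceptance rate for bad loans is also applied to the previously rejected applicants.

```python
# Historical shares of all applicants before the score card was used.
prior = {"good": 0.49, "bad": 0.09, "rejected": 0.42}

# Share of each group scoring at or above the cutoff of 170 (Table 2).
# As in the paper, previously rejected applicants are treated like bad loans.
accept_rate = {"good": 0.89, "bad": 0.58, "rejected": 0.58}

accepted = {group: prior[group] * accept_rate[group] for group in prior}
turned_down = {group: prior[group] - accepted[group] for group in prior}

total_accepted = sum(accepted.values())
total_turned_down = sum(turned_down.values())

for group in prior:
    print(group, round(accepted[group], 2), round(turned_down[group], 2))
print("total", round(total_accepted, 2), round(total_turned_down, 2))
```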

The Bayes formula used to revise the probability of a good loan is:

$$
P(A_1 \mid \text{A.C.}) = \frac{P(\text{A.C.} \mid A_1)\, P(A_1)}{P(\text{A.C.} \mid A_1)\, P(A_1) + P(\text{A.C.} \mid A_2)\, P(A_2)} \tag{1}
$$

where A.C. = "above cutoff," $A_1$ = good loan, and $A_2$ = bad loan.

The Bayes revision of the prior probabilities using the score card gives the
following projection of the percentages of good and bad accounts that would be
accepted by the model.

Percentage of Accepted that are Good = 90% (0.44/(0.44 + 0.05))
Percentage of Accepted that are Bad = 10% (0.05/(0.05 + 0.44))
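
The Bayes revision can be sketched in a few lines. The function name `posterior_good` is an assumption for illustration. Note that the 90 percent figure results from rounding the joint probabilities to two decimals (.44 and .05) before dividing; carrying full precision gives approximately 0.89.

```python
def posterior_good(p_good, p_bad, p_ac_given_good, p_ac_given_bad):
    """Bayes formula: P(good | above cutoff)."""
    numerator = p_ac_given_good * p_good
    return numerator / (numerator + p_ac_given_bad * p_bad)


# Priors from the loan history and acceptance rates at a cutoff of 170.
unrounded = posterior_good(0.49, 0.09, 0.89, 0.58)

# Same calculation using the joint probabilities rounded to two decimals,
# as done in the text.
rounded = 0.44 / (0.44 + 0.05)

print(round(unrounded, 3), round(rounded, 2))
```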

These probabilities can be summarized as follows:

                P(Ai)  x  P(A.C.|Ai)  =  P(A.C. and Ai)     P(Ai|A.C.)

A1 - "Good"      .49   x     .89      =       .44           .44/.49 = .90
A2 - "Bad"       .09   x     .58      =       .05           .05/.49 = .10
                                              ---                     ----
                                              .49                     1.00

Based on these figures, the forecasted percentages for a cutoff score of 170 can
be calculated.

Percent above cutoff and "good" = .90 x .73 = .66
Percent above cutoff and "bad" = .10 x .73 = .07

The performance of the credit scoring instrument can be illustrated by contrast-
ing the probabilities associated with using a score card with those associated
with not using a score card. This is shown in Table 3.

The result is that the scoring system is estimated to increase the percentage
of good loans that are accepted from 49 to 66 percent, which is a 17 percentage
point improvement. The percentage of bad loans that are accepted is decreased by
two percentage points. In addition, the percentage of loans that need to be
rejected is reduced by 15 percentage points.

Table 3

Performance Comparison
Using a Credit Scoring Instrument

                              Score Card
                              (Cutoff = 170)     No Score Card

% of total applicants
accepted and "good"                66%                49%

% of total applicants
accepted and "bad"                  7%                 9%

% of total applicants
"rejected"                         27%                42%
                                  ----               ----
                                  100%               100%

Remember that this example is intended to merely illustrate the performance
of the scoring instrument. Experience has been that greater improvements in
decision making are possible depending on the particular loan history of the
lending institution.

Conclusions

The use of an empirically derived credit scoring system has two advantages.
First, it enables the lending institution to be in compliance with
federal regulations and furnishes a valid defense in case of litigation. Second,
when the score card is based on sound statistical practices, its use can often
improve the lending decisions of the institution as compared with strictly sub-
jective assessments. A credit scoring instrument can play an important role in
the overall lending strategy of financial institutions.

References

1. Berenson, M. L., D. M. Levine, and M. Goldstein. Intermediate Statistical
Methods and Applications: A Computer Package Approach. Englewood
Cliffs, NJ: Prentice Hall, Inc. (1983).

2. Bierman, H. Jr. and Hausman, W. H. "The Credit Granting Decision."
Management Science, Vol. 16 (April 1970), pp. B519-B532.

3. Cooley, W. W. and Lohnes, P. R. Multivariate Data Analysis. New York,
NY: John Wiley and Sons, Inc. (1971).

4. Credit Scoring Systems: A Detailed Analysis. Atlanta, GA: Management
Decision Systems, Inc. (1977).

5. Durand, D. Risk Elements in Consumer Installment Financing. New
York, NY: National Bureau of Economic Research (1941).

6. Goldstein, M. and Dillon, W. R. Discrete Discriminant Analysis. New
York, NY: John Wiley and Sons, Inc. (1978).

7. Johnson, N. "How Point Scoring Can Do More Than Help Make Loan
Decisions." Banking, Vol. 62 (August 1971), pp. 36-42.

8. Morrison, D. G. "On the Interpretation of Discriminant Analysis." Jour-
nal of Marketing Research, Vol. 6 (May 1969), pp. 156-163.

9. Myers, J. H. "Predicting Credit Risk with a Numerical Scoring System."
Journal of Applied Psychology, Vol. 47 (October 1963), pp. 348-352.

10. Orgler, Y. E. "Evaluation of Bank Consumer Loans with Credit Scoring
Models." Journal of Bank Research, Vol. 2 (Spring 1971), pp. 31-37.

11. Rock, A. "Sure Ways to Score with Lenders." Money, September 1984,
pp. 121-126.

12. Roy, H. J. H. and Lewis, E. M. "Credit Scoring as a Management Tool."
Consumer Credit Leader, Vol. 1 (November 1971), pp. 10-13.

Appendix A

Credit Score Card

Evaluation Characteristics          Qualities and Point Values

Time at Present Address
    6 mos. or less                      12
    7 mos. to 1 yr.                     16
    1 yr. 1 mo. to 2 yrs.               23
    2 yrs. 1 mo. to 6 yrs.              28
    6 yrs. 1 mo. to 10 yrs.             38
    Over 10 years                       32

Housing Classification
    Rents                               10
    Buys or owns                        32
    Other                               14

Monthly Rent or Payment
    $0                                   5
    $1 - 200                             6
    $201 - 300                           8
    $301 - 400                          14
    $401 and over                       19
    Owns free and clear                 10

Number in Household
    1                                   38
    2                                   52
    3                                   41
    4                                   36
    5 or more                           26

Time at Present Job
    6 mos. or less                       4
    7 mos. to 2 yrs.                     6
    2 yrs. 1 mo. to 4 yrs.               8
    4 yrs. 1 mo. to 6 yrs.               9
    Over 6 yrs. and retired             13

Occupation
    Prof./Exec./Mgr./Ret.               38
    Semi-prof./Sales                    35
    Office/Staff                        25
    Lab./Service/Prod./Driver           14
    Other                               16

Savings Account
    None                                22
    Yes                                 43

Paid Out Bank Loan
    None                                14
    Yes                                 60

Total Monthly Application Income
    $0 - 750                             7
    $751 - 1000                          9
    $1001 - 1500                        10
    $1501 - 2000                        14
    $2001 - 3000                        21
    $3001 and over                      22