URLs : Uniform Resource Locators or Unreliable Resource Locators  359

 

URLs: Uniform Resource Locators or 
Unreliable Resource Locators 

Carol Anne Germain 

As the practice of citing World Wide Web sites grows, the question
arises whether it has scholarly limitations, because uniform resource
locators (URLs) often become inaccessible. This research studies the
accessibility of sixty-four URLs cited in thirty-one academic journal
articles. The longitudinal study found a continuing decline in the
availability of the cited URLs.

Ten years ago, most people had
no idea that the Internet existed.
Today, it is used daily by
millions of people who access
it for a variety of reasons. Some use it to
connect with friends and family; others 
use it for entertainment purposes (jokes, 
sports and freebies); and still others use 
it for research. Students approaching a 
library reference desk often insist that the 
Internet be used to locate information for 
papers, projects, and other academic as­
signments. Many journal articles, includ­
ing refereed articles, contain citations to 
Internet sources. Despite the popularity 
of Internet citations, we still may ques­
tion the integrity of this practice. How 
often have we tried to link to uniform or 
universal resource locators (URLs) only 
to find a “404 NOT FOUND” or other 
messages denying access? These warn­
ings let us know that the information we 
came to access is no longer accessible at 
this site. The information may have been 
moved to another site, equipment may be 
down, or the information may have been 
removed completely. This is frustrating 
because cited references need to be accessible
and persistent. Citations provide the
reader with an outline of the works an 
author has consulted to develop an article, 
conference paper, monograph, or other 
scholarly study. After a review of the im­
portance of permanence as a feature of 
academic citation, this paper presents 
evidence of the impermanence of actual 
URL citations. 

The Role of the Citation 
What is the purpose of a citation? Why is 
this erudite mechanism so important? The 
Oxford English Dictionary defines the verb 
to cite as “to quote (a passage, book, or 
author) generally with implication of ad­
ducing as an authority.”1 Authority fur­
nishes credibility to the written piece. Ci­
tation allows the reader to reference other 
works the author has cited. The reader 
then has the ability to verify a quotation, 
check the semantic connection, or confirm 
whether the author has included all of the 
materials and statistics of a study. In a 
sense, “citation keeps you honest.”2 It is 
essential that the academic community be 
able to rely on and utilize the studies, ar­
guments, and findings of other scholars. 

Carol Anne Germain is the Networked Resources Education Librarian at the State University of New
York at Albany (SUNY); e-mail: cg219@csc.albany.edu.

360 College & Research Libraries July 2000 

Citation also provides the ability to 
acknowledge the works of others that 
support a piece of research. When using 
the materials of others, citation offers the 
opportunity to recognize the cited author. 
“A paper that conforms to the norms of 
scholarly perfection would explicitly cite 
every past publication to which it owes 
an intellectual debt.”3 

One of the most important functions 
of the citation is that it links the written 
work into a much larger community. 
When the novice physicist uses Einstein’s 
theories to uphold an argument, a con­
nection is established between significant 
works of the past and works of the 
present. Other physicists will evaluate 
this work and reach conclusions as to 
whether it is an addition to the field. 

Every time a scholar presents a re­
view of the literature in her area of 
inquiry, or writes a bibliographic es­
say, or incorporates another writer’s 
words or ideas to advance her own 
thesis, she maps the field of her dis­
cipline. She draws the boundaries, 
circumscribes the territory of her 
field of discourse, and determines 
who else is within and who is with­
out.4 

In other words, “she” makes herself a 
part of a much larger community. This 
community promotes intellectual growth 
that may, in turn, stimulate the develop­
ment of new medicines and cures, novel 
writing techniques, or breakthroughs in 
technology. The dialogue that citation
fosters encourages a learned fellowship.

Henry Small, when describing why an 
author or scientist cites another text, re­
ferred to the citation as a “symbol.” These 
“symbols of concepts or methods” function
as connections to earlier works that
an author-researcher has embedded as a
reference in his or her writings. “This 
leads to the citing of works which embody 
ideas the author is discussing. The cited 
documents become, then, in a general 
sense, ‘symbols’ for these ideas.”5 Blaise 
Cronin summed up the need for a theory 
of citing very eloquently: 

Metaphorically speaking, citations are 
frozen footprints on the landscape of 
scholarly achievement; footprints that 
bear witness to the passage of ideas. From 
footprints it is possible to deduce direc­
tion; from the configuration and depth of 
the imprints it should be possible to con­
struct a picture of those who have passed 
by, whilst the distribution and variety 
furnish clues as to whether the advance 
was orderly and purposive.6 

The Persistence of Citations 
An important feature of scholarly links is 
that they are available indefinitely. It is 
imperative that cited materials be acces­
sible and not ephemeral. Phyllis Franklin, 
executive director of the Modern Lan­
guage Association, stated that “the M.L.A. 
has concluded that scholarship depends 
on getting back to a source.”7 The
researcher depends on cited work as a
collaboration of ideas. If the locations of
ideas that substantiate the author’s work
no longer exist, the foundation of that
work is in question.

To assume that all cited works are eas­
ily obtainable is naive.8 Fugitive material 
and grey literature are found in written 
works, the former being pamphlets, pro­
grams, and other literature published (not 
always officially) in small quantities and 
often produced for one-time use.9 Mate­
rials such as these are almost impossible 
to retrieve and thus are generally not 
cited. Grey literature is literature that can­
not normally be purchased through book­
sellers. Examples of these types of mate­
rials include conference proceedings, 
trade brochures, preprints, technical re­
ports, dissertations, and government 
agency publications. It is often difficult 
to acquire these materials and frequently 
takes some skill to do so. The National
Technical Information Service (NTIS)
provides access to technology reports, while
Bell and Howell, formerly University 
Microfilms International (UMI), places 
dissertations on microform and numer­
ous trade associations archive their pro­
fessions’ literature. Although it is difficult 
to work through resources such as NTIS 
and Bell and Howell, one is assured that 
their materials are retrievable. One of the 
reasons for this assurance is that various 
institutions and organizations, such as 
government agencies, union affiliations, 
and academic institutions, have respon­
sibility for maintaining and preserving 
the materials. 

With the emergence of the Internet and 
Internet publishing, individuals and in­
stitutions in increasing numbers are 
authoring and posting papers and stud­
ies on this electronic medium. One of the 
complications with this type of publica­
tion is that there is no guarantee that these 
works will be perpetually available. “Es­
timates put the average lifetime for a URL 
(the Web site location) at forty-four 
days.”10 A longitudinal study undertaken 
by Wallace Koehler reviewed the persis­
tence of 361 randomly chosen Web sites 
and Web pages over one year. Results of 
this study found that 110 (31%) of the Web 
sites and Web pages failed to respond at 
the final test.11 This electronic environ­
ment, though very exciting and stimulat­
ing, also is quite volatile. 

The academic world should be con­
cerned about the citation of documents 
that are located on the Internet. When 
users try to retrieve electronic sources 
listed in the citing publication, they often 
do not find the references but, instead, are 
faced with an “error” message. It is un­
fortunate, but documents found within 
the electronic setting have the character­
istic of lacking permanency.12 “URLs 
change at the whim of hardware 
reconfiguration, file system reorganiza­
tion, or changes in organizational struc­
ture, leaving users in 404 Limbo.”13 “The 
Internet’s holdings change every minute 
of the day.” Students and researchers find
that materials on the information
superhighway can disappear “with the touch
of a Webmaster’s delete key.”14 In its style
manual, the American Psychological As­
sociation warns those who use online in­
formation: 

The researcher has immediate ac­
cess to a wealth of information but 
must consider the reader’s access to
that material: Will the information 
be available to the reader even if the 
reader follows a given retrieval 
path, or will the material soon be 
archived to tape and difficult to ob­
tain? Is the information widely ac­
cessible or accessible only on a 
campus’s local network? This pub­
lication recommends that if the 
same data is available in both print 
and electronic formats then the 
writer should use the “preferred 
print version.”15 

Methodology 
The following study was undertaken to 
investigate the reliability of URLs in aca­
demic citation. Thirty-one randomly cho­
sen academic journal articles, containing 
sixty-four citations with URLs, were re­
viewed. The academic journals used were 
from a variety of disciplines. Thirteen ci­
tations were from information and library 
science, ten from the hard sciences, sev­
enteen from computer science, eleven 
from the humanities, and thirteen from 
the social sciences. The printed journals 
were published between 1995 and 1997. 

To verify the persistence of the URL 
citations, each address was accessed to see 
if the site was currently active. Using a 
Netscape browser, the URL address was 
logged into the Netscape “open” window. 
Over a three-year period (1997–1999), this 
procedure was conducted once a month 
for three consecutive months (February, 
March, and April). This was to determine 
if each cited site still existed. Three dif­
ferent access days were used each year to 
insure against temporary interruptions. 
Reasons for denied access might include
that the URL’s host computer was down,
that a Web site was being worked on
and unreachable, or that too much traffic
on the Internet caused a time-out. Each
of the nine tests was conducted between
8 a.m. and 9 p.m.

TABLE 1
Availability of Cited URLs

        Not Accessible   Accessible   % Unavailable
1997    17               47           26.5
1998    24               40           37.5
1999    31               33           48.4

The content of the Web site, update
information, and style format were not
reviewed. In certain circumstances, some
effort was made to access a site if a
spelling error or misprint seemed to be within
the URL. This included omitting periods
where the publisher added them as style;
omitting hyphens at the end of a line,
within a URL; and adding a top-level
domain, such as edu, to the domain name
where it seemed to be absent. Only direct
URL searching was done; no attempt was
made to use Internet search engines to
find the cited materials.

In this paper, persistence of a URL
citation is understood as the ability to
access a cited URL containing the Web site
with the identical title of the cited work.
If an index or search tool was retrieved
that linked to the cited work, the URL
citation also was considered persistent.
Citations containing URLs that accessed a
host site, but not the cited file, were not
regarded as persistent. URL citations that
had moved to a new URL and contained
the same title/author were appraised as
persistent.

When an Internet site cannot be
accessed, a variety of error messages may
appear. The error message “404 Not
Found” appears when Netscape cannot
locate the specified Web site. This is due
to either relocation or removal
of the site.16 “File Not Found”
is similar in nature and means
that the user has reached the
host computer, but the host cannot
find the requested Web site
file. The “Not Found” error
message gives the user a variety
of reasons for not being able
to connect to the desired document.
“Unable to Locate Server,” “Socket
Error,” and “No Response” are error
messages resulting from not being able to
connect to the remote server. This may occur
when the remote server is either too busy
or no longer in existence. Generally,
remote computers only send error
messages. Unless instructed, they give no
forwarding address or other indication of the
material’s location.
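
The access test and error categories described in this section could be automated roughly as follows. This is a minimal sketch, not the procedure used in the study, which entered each URL by hand in a Netscape browser. The function names, the thirty-second time-out, the grouping of every status other than success and 404 under server errors, and the rule that one successful monthly check counts for the whole year are all assumptions made for illustration. A status code shows only that something answered at the address; deciding whether the retrieved page carries the same title as the cited work still requires a human reader.

```python
import urllib.error
import urllib.request
from typing import Iterable, Optional

ACCESSIBLE = "accessible"
NOT_FOUND = "not found"
SERVER_ERROR = "server error"


def classify(status: Optional[int]) -> str:
    """Map one access attempt onto the error categories described above.

    `status` is the HTTP status code, or None when no response arrived
    (host down, unknown host, or time-out).
    """
    if status is None:
        return SERVER_ERROR
    if 200 <= status < 300:
        return ACCESSIBLE
    if status == 404:
        return NOT_FOUND
    # Assumption: every other status is grouped with server errors.
    return SERVER_ERROR


def check_url(url: str, timeout: float = 30.0) -> str:
    """Try to open `url` once and return its category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # The host answered, but with an error status such as 404.
        return classify(err.code)
    except (urllib.error.URLError, OSError):
        # No usable response at all: unknown host, refused, or timed out.
        return classify(None)


def accessible_in_year(outcomes: Iterable[str]) -> bool:
    """Assumed rule: one successful monthly check counts for the year."""
    return any(outcome == ACCESSIBLE for outcome in outcomes)
```

Calling `check_url` on each cited address in February, March, and April, then passing the three results to `accessible_in_year`, would mirror the three access days used to guard against temporary interruptions.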

Results 
It is assumed that all of the URLs found 
in the cited works were active Internet 
sites when they were cited originally. 
Within each test year, the results did not 
vary significantly over the three monthly 
samples; however, results of annual com­
parisons did produce variability. After 
checking for persistence of the sixty-four 
citations, seventeen (26.5%) could not be 
accessed in 1997. In 1998, twenty-four 
(37.5%) could not be accessed and thirty-
one (48.4%) could not be reached in 1999. 
As table 1 shows, availability of cited
URLs declined about 11 percentage points annually.
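
The percentages and the annual change can be recomputed from the counts reported above. The sketch below uses only figures given in this section; the variable and function names are illustrative. Note that 17 of 64 is 26.56 percent, which the article reports as 26.5.

```python
TOTAL_CITATIONS = 64

# Citations that could not be accessed in each test year (from the text).
not_accessible = {1997: 17, 1998: 24, 1999: 31}


def percent_unavailable(count: int, total: int = TOTAL_CITATIONS) -> float:
    """Share of cited URLs that could not be accessed, in percent."""
    return 100.0 * count / total


rates = {year: percent_unavailable(n) for year, n in not_accessible.items()}
years = sorted(rates)

# Year-over-year change, expressed in percentage points.
changes = [rates[later] - rates[earlier] for earlier, later in zip(years, years[1:])]

for year in years:
    print(f"{year}: {rates[year]:.2f}% unavailable")
print("annual change in percentage points:", changes)
```

Seven more citations failed in each successive year, so the change is the same for both intervals.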

A review of the error messages shows 
that “Not Found” notices appeared nine 
times in 1997 and 1998 and thirteen times 
in the final test. Server errors were re­
trieved five times in 1997 and twelve 
times in both 1998 and 1999. Messages in­
dicating relocation appeared three times 
in the first two years and six times in the 
final year (see table 2). 

TABLE 2
Review of Error Messages

                         1997   1998   1999
"Not Found"                 9      9     13
Server Errors               5     12     12
Relocated/Unavailable       3      3      6

This decline in availability of cited
URLs had a dramatic impact on the original
articles from which these citations
were drawn. Of the thirty-one original
source articles, in 1997, twelve (38.7%)
contained inaccessible citations; in 1998,
seventeen (54%) had citations that could
not be retrieved; and in the last
year, twenty-one (67.7%) contained
citations that could not be
found (see table 3).

Conclusion
After a three-year period, almost
50 percent of the URL citations
could not be accessed and two-thirds
of the journal articles contained
corroded citations. How can this
profound loss of academic citation be
explained?
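
The headline figures can be cross-checked against the counts reported in tables 1 through 3. The following is a small consistency check with illustrative names; all counts are taken from the tables. It confirms that the error-message tallies add up to the number of inaccessible citations in each year, and it suggests that 17 of 31 articles works out to roughly 54.8 percent, slightly above the 54 percent reported for 1998.

```python
TOTAL_CITATIONS = 64
TOTAL_ARTICLES = 31

# Counts as reported in tables 1, 2, and 3.
inaccessible_citations = {1997: 17, 1998: 24, 1999: 31}
error_messages = {
    1997: {"not_found": 9, "server_errors": 5, "relocated_unavailable": 3},
    1998: {"not_found": 9, "server_errors": 12, "relocated_unavailable": 3},
    1999: {"not_found": 13, "server_errors": 12, "relocated_unavailable": 6},
}
articles_affected = {1997: 12, 1998: 17, 1999: 21}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Each year's error messages should sum to that year's inaccessible citations.
tallies_match = all(
    sum(error_messages[year].values()) == inaccessible_citations[year]
    for year in inaccessible_citations
)

article_shares = {
    year: round(share(n, TOTAL_ARTICLES), 1) for year, n in articles_affected.items()
}
final_citation_share = round(share(inaccessible_citations[1999], TOTAL_CITATIONS), 1)

print("error tallies match table 1:", tallies_match)
print("percent of articles affected:", article_shares)
print("percent of citations lost by 1999:", final_citation_share)
```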

Originally, some of the URL citations 
may have contained misspellings, incor­
rect domain names, or punctuation errors. 
Computer software requires meticulous 
input and is unforgiving when encoun­
tering any text or syntax error. Further 
decline in the accessibility of the tracked 
URL citations may be attributed to the 
vast changes in computer and institu­
tional infrastructures. A researcher mov­
ing to another job, the purchase of a new 
server, or the restructuring of an academic 
department may change the location of a 
computer file and its URL. Thus, the cited 
URL is rendered inaccessible. 

Print resources have authoritative in­
dexes and finding aids to locate hard-to­
find citations. When an author cites the 
incorrect volume number, the correct one 
can be found in a variety of sources. The 
Internet does not have comparable tools.
Some may say that Internet search en­
gines provide help with locating sites, but 
these tools are neither authoritative nor 
exhaustive. 

TABLE 3
Annual Comparison of Journal Articles
Containing Inaccessible URL Citations

        # Articles Containing     % Containing Inaccessible
        Inaccessible Citations    URL Citations
1997    12                        38.7
1998    17                        54.0
1999    21                        67.7

An assortment of solutions for preserving
Internet materials has been initiated.
In the United States, Brewster Kahle and
a small group of technical professionals
have started a project called the Internet
Archive. Over a number of years, they
have taken a “snapshot” of Web pages
found on the Internet.17 Although this is
a noteworthy project, there is no assurance
that these records will be maintained
in the future. Without adequate finding
aids, it will be impossible to access
information from a snapshot.

Other efforts to preserve materials
found on the Internet are being developed
by OCLC. This vast library consortium is
working on numerous projects that involve
the cataloging and archiving of resources
found on the Internet. InterCAT, a
project funded by the U.S. Department of
Education, is one such endeavor. With the
effort of libraries and institutions of higher
education, the creation, implementation,
testing, and evaluation of a searchable
database of USMARC records that contain
electronic location and access information has
been initiated.18 “This is the most traditional
library-type approach to finding
material on the web.”19 InterCAT uses
volunteers to catalog electronic sites found on
the Internet. To date, this catalog contains
more than 70,000 records.20 Another project
undertaken by OCLC is implementation
of persistent uniform resource locators
(PURLs). A PURL is a record of URL sites
that individuals or institutions have
registered with OCLC. “Instead of pointing
directly to an Internet location, a PURL
points to an intermediate resolution service
that maintains a database linking the
PURL to its current URL and returning
that URL to the user.”21 This trial, though,
may be a “short-term experiment or a
long-term solution.”22
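
The indirection that a PURL provides can be pictured with a small lookup table. The sketch below is purely illustrative and does not represent OCLC's software or interface; the class name, method names, and example addresses are all hypothetical. It shows only the idea quoted above: the citation carries a stable identifier, and a resolution service maps that identifier to wherever the document currently lives.

```python
from typing import Dict


class PurlResolver:
    """Toy resolution service: persistent identifier -> current URL."""

    def __init__(self) -> None:
        self._current_url: Dict[str, str] = {}

    def register(self, purl: str, url: str) -> None:
        """Record the location of a newly registered resource."""
        self._current_url[purl] = url

    def relocate(self, purl: str, new_url: str) -> None:
        """Update the stored location when the resource moves."""
        if purl not in self._current_url:
            raise KeyError(f"unregistered PURL: {purl}")
        self._current_url[purl] = new_url

    def resolve(self, purl: str) -> str:
        """Return the current URL for a registered identifier."""
        return self._current_url[purl]


# Hypothetical example: the cited identifier never changes, the target does.
resolver = PurlResolver()
resolver.register("purl.example.org/study", "http://old-host.example.edu/study.html")
before_move = resolver.resolve("purl.example.org/study")

resolver.relocate("purl.example.org/study", "http://new-host.example.edu/archive/study.html")
after_move = resolver.resolve("purl.example.org/study")
```

A citation that records the persistent identifier keeps working after the move, but only for as long as someone maintains the resolver's table.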

It is ironic that a utility called
“persistent” could be part of a “short-term
experiment.” Persistence implies endurance.
Endurance is essential when discussing
materials that are to be cited. For
the scholarly community to retain its
integrity, standards must be set to ensure
that cited works are retrievable. In the
past, this has not been such a consequential
issue. Printed materials have been
bought, stored, and archived in libraries
for hundreds of years. It seems unlikely
that a published book or journal could not
be found in some library or archive in the
world. Electronic data hold no such promise.
With the average life span of an
Internet file being less than two months,
how many data and materials already
have been lost?

Not one of the sixty-four citations re­
viewed in this study was a PURL. All of 
the articles were published in academic 

journals and written by members of the 
scholarly community. At the final testing, 
twenty-one of the thirty-one articles con­
tained citations that could not be accessed. 
Whether this information is available in 
parallel print sources is unknown. None­
theless, it is frightening to think that the 
substructure of the intellectual commu­
nity is relying on a medium that is so vola­
tile. 

The Internet is a very provocative en­
vironment. It provides the ability to con­
nect, communicate, and share with mem­
bers of many disciplines. However, this 
useful tool needs, at this point, to be 
viewed as a medium for exchange rather 
than as a library. Until there is some
secure means of accessing data continuously
from this resource, using the
Internet as a virtual depository of cited 
materials is indefensible. Academic cita­
tions need to be reliable and accessible, 
and URL citations are not. Students and 
scholars should proceed with caution and 
utilize sources that endure. 

Notes 

1. The Oxford English Dictionary, vol. 3 (New York: Clarendon Pr., 1989), 248.
2. Mary-Claire Van Leunen, A Handbook for Scholars, rev. ed. (New York: Oxford University
Pr., 1992), 9.
3. Manfred Kochen, “How Well Do We Acknowledge Intellectual Debts?” Journal of
Documentation 43 (Mar. 1987): 54–64.
4. Shirley Rose, “Citation Rituals in Academic Cultures” (paper presented at the annual
meeting of the Conference on College Composition and Communication, Seattle, Mar. 16–18, 1989),
ERIC ED 309 434, microfiche.
5. Henry Small, “Cited Documents as Concept Symbols,” in Social Studies of Science, vol. 8
(Beverly Hills, Calif.: SAGE, 1978), 327–40.
6. Blaise Cronin, The Citation Process: The Role and Significance of Citations in Scientific
Communication (London: Taylor, 1984), 25.
7. Lisa Guernsey, “Cyberspace Citations,” Chronicle of Higher Education 42 (Jan. 12, 1996):
A18–21.
8. Charles Auger, Information Sources in Grey Literature (New Providence, N.J.: Bowker-Saur,
1994), 3.
9. Leonard Montague Harrod, Harrod’s Librarians’ Glossary of Terms Used in Librarianship,
Documentation and the Book Crafts and Reference Book, comp. Ray Prytherch (Brookfield, Vt.: Gower
Publishing, 1990), 263.
10. Brewster Kahle, “Preserving the Internet,” Scientific American 276 (Mar. 1997): 82–83.
11. Wallace Koehler, “An Analysis of Web Page and Web Site Constancy and Permanence,”
Journal of the American Society for Information Science 50 (Feb. 1999): 162–80.
12. Corrinne Jorgensen and Peter Jorgensen, “Citations in Hypermedia: Maintaining Critical
Links,” College & Research Libraries 52 (Nov. 1991): 528–36.
13. K. E. Shafer, S. L. Weible, and E. Jul, “The PURL Project,” Annual Review of OCLC Research
(1996): 25–26.
14. Michael A. Arnzen, “Cyber Citations: Documenting Internet Sources Presents Some Thorny
Problems,” Internet World 7 (Sept. 1996): 2–4.
15. Publication Manual of the American Psychological Association (Washington, D.C.: American
Psychological Association, 1994), 218.
16. Netscape 2 Simplified (Foster City, Calif.: IDG Books Worldwide, 1996), 35.
17. Kahle, “Preserving the Internet,” 82.
18. Jeanette Woodward, “Cataloging and Classifying Information Resources on the Internet,”
Annual Review of Information Science and Technology 31 (1996): 189–220.
19. Pat L. Ensor, “Libraryland Organizes the Web: An Unnatural Process?” Technicalities 15
(Nov. 1995): 9–11.
20. Norm Medeiros, “Making Room for MARC in a Dublin Core World,” Online 23 (Nov./Dec.
1999): 57–60.
21. Jennifer L. Marill, “A Survey of Standards for Identifying Serial Items on the Internet,”
Acquisitions Librarian 21 (1999): 83–91.
22. Karen Schneider, “Cataloging Internet Resources: Concerns and Caveats,” American
Libraries 28 (Mar. 1997): 57.