Background: For years, Uniform Resource Locator (URL) decay, or "link rot", has been a growing concern in the biomedical sciences. This paper addresses the issue by examining the status of URLs published in MEDLINE abstracts from 1994 to 2006, establishing their current availability and estimating URL decay in these records. We also reviewed the content at each URL to determine whether the information the authors cited when writing the paper is the same as the information presently available at that URL. Lastly, from among the documented methods recommended for preserving URL links, we determined which have gained acceptance among authors and publishers.
Methods: MEDLINE records from 1994 to 2006, obtained from the National Library of Medicine in Extensible Markup Language (XML) format, were processed, yielding 10,208 URL addresses. Each URL was accessed once daily, at a random time, for 30 days. Titles and abstracts were also searched for mentions of archival tools such as WebCite, Persistent URL (PURL) and Digital Object Identifier (DOI).
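The extraction step can be pictured with a minimal Python sketch that uses only the standard library. It walks each `MedlineCitation` in a MEDLINE/PubMed XML file, collects URLs from the article title and abstract, and flags mentions of WebCite, PURL or DOI. The element names follow the public MEDLINE/PubMed XML layout; the URL regular expression, the marker patterns for each archival tool and the function names are illustrative assumptions, not the rules the authors used.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical URL pattern (an assumption, not the study's rule): an http/https/ftp
# scheme or a bare "www." prefix, followed by non-space characters.
URL_PATTERN = re.compile(r"(?:https?://|ftp://|www\.)[^\s<>\"]+", re.IGNORECASE)

# Hypothetical markers used to flag each archival tool in the text.
ARCHIVAL_MARKERS = {
    "WebCite": re.compile(r"webcitation\.org", re.IGNORECASE),
    "PURL": re.compile(r"purl\.org", re.IGNORECASE),
    "DOI": re.compile(r"\bdoi\b|\b10\.\d{4,9}/\S+", re.IGNORECASE),
}


def _all_text(node):
    """Concatenate an element's text, including inline markup such as <i>."""
    return "" if node is None else "".join(node.itertext())


def citation_text(citation):
    """Title plus every AbstractText section of one MedlineCitation."""
    parts = [_all_text(citation.find("Article/ArticleTitle"))]
    parts += [_all_text(n) for n in citation.iterfind("Article/Abstract/AbstractText")]
    return " ".join(parts)


def scan_medline_xml(xml_text):
    """Return (PMID, URLs, archival tools found) for each MedlineCitation."""
    root = ET.fromstring(xml_text)
    results = []
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID", default="").strip()
        text = citation_text(citation)
        # Drop trailing punctuation that usually ends the sentence, not the URL.
        urls = [u.rstrip(".,;:)") for u in URL_PATTERN.findall(text)]
        tools = sorted(name for name, pat in ARCHIVAL_MARKERS.items() if pat.search(text))
        results.append((pmid, urls, tools))
    return results
```

Stripping trailing punctuation is a trade-off: it repairs URLs that end a sentence but would truncate the rare URL that legitimately ends in one of those characters.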
Results: URL length ranged from 13 to 425 characters, with a mean length of 35 characters [Standard Deviation (SD) = 13.51; 95% confidence interval (CI) 13.25 to 13.77]. The most common top-level domains were ".org" and ".edu", each accounting for 34% of the URLs. About 81% of the URL pool was available 90% to 100% of the time, but only 78% of these URLs contained the information mentioned in the MEDLINE record. "Dead" URLs constituted 16% of the total. Finally, a survey of archival tool usage showed that, since the introduction of the DOI in 1998, only 519 of all abstracts reviewed had incorporated DOI addresses.
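The availability and length figures can be reproduced from per-URL check logs with a short Python sketch. It assumes each of the 30 daily checks has already been reduced to a reachable/unreachable boolean (the abstract does not state which responses counted as available), and the names `UrlRecord`, `availability`, `share_highly_available` and `length_summary` are hypothetical. The confidence interval uses the normal approximation mean ± 1.96 × SD / √n.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class UrlRecord:
    url: str
    # One boolean per daily check; True means the URL was judged reachable
    # (the success criterion itself is an assumption left to the caller).
    checks: list[bool]


def availability(record):
    """Fraction of checks in which the URL was reachable."""
    return sum(record.checks) / len(record.checks)


def share_highly_available(records, threshold=0.9):
    """Share of URLs reachable in at least `threshold` of their checks."""
    return sum(availability(r) >= threshold for r in records) / len(records)


def length_summary(urls, z=1.96):
    """Mean, sample SD and normal-approximation CI for the mean URL length."""
    lengths = [len(u) for u in urls]
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)
    half_width = z * sd / math.sqrt(len(lengths))
    return mean, sd, (mean - half_width, mean + half_width)
```

For scale, plugging the reported SD (13.51) and n = 10,208 into the half-width formula gives 1.96 × 13.51 / √10,208, or about 0.26 characters.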
Conclusion: URL persistence in this study, with approximately 81% general availability during the 1-month study period, parallels the findings of previous studies. As peer-reviewed literature remains the main source of information in biomedicine, we need to ensure the accuracy and preservation of these links.