J Am Med Inform Assoc. Sep-Oct 2005;12(5):576-86.
doi: 10.1197/jamia.M1757. Epub 2005 May 19.

ALICE: An Algorithm to Extract Abbreviations From MEDLINE

Free PMC article

Hiroko Ao et al. J Am Med Inform Assoc.

Abstract

Objective: To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly.

Methods: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and can potentially identify 320 valid abbreviation-expansion patterns as combinations of these rules.

Results: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database.

Conclusion: ALICE extracted abbreviations and their expansions from the literature efficiently. Its carefully compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of an undefined abbreviation in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate. This system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.

Figures

Figure 1.
Definitions of special expressions used in this paper. An inner is a string inside a pair of parentheses, a left-chunk is the string before the left parenthesis, and an outer is the string extracted from the left-chunk that pairs with the inner as an abbreviation and its expansion.
Figure 2.
ALICE overview. In the Inner Search (IS) phase, ALICE searches for a pair of parentheses and identifies an inner. Once the inner is identified, its left-chunk is also determined. Then, in the Outer Extraction (OE) phase, its outer is extracted from the left-chunk. Finally, in the Validity Judgment (VJ) phase, ALICE judges whether the inner and its outer form a valid abbreviation-expansion pair. If the inner is an abbreviation, the outer is its expansion. Conversely, if the inner is an expansion, the outer is its abbreviation.
Figure 3.
Definitions of special expressions used for stop word lists. An inner front word is the word preceding a left parenthesis, an inner first word is the first inner word, and an outer first word is the first outer word. This figure shows an example in which the outer and the inner consist of three and two words, respectively.
Figure 4.
ALICE flowchart. ALICE scans each sentence four times in the Inner Search (IS) phase (1 ≤ n ≤ 4), and it checks each string before the left parenthesis up to 16 times in the Outer Extraction (OE) phase (1 ≤ m ≤ 16). The extracted inner-outer pair is evaluated with the five acceptance conditions in the Validity Judgment (VJ) phase.
Figure 5
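
The three phases described in the captions for Figures 2 and 4 can be illustrated with a minimal Python sketch. This is not the ALICE implementation: the stop word list, the initial-letter matching rule, the acceptance checks, and all function names are hypothetical stand-ins for the paper's heuristic rules, and the 16-word window is a simplification loosely modeled on the m ≤ 16 bound. It only handles the case where the inner is the abbreviation.

```python
import re

# Hypothetical stop words; the actual ALICE stop word lists are not given here.
STOP_WORDS = {"the", "a", "an", "of", "in", "and"}


def inner_search(sentence):
    """Inner Search (IS): yield (left_chunk, inner) for each parenthesized string."""
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        left_chunk = sentence[:match.start()].strip()
        yield left_chunk, match.group(1).strip()


def outer_extraction(left_chunk, inner, max_words=16):
    """Outer Extraction (OE): grow a window of words leftward from the parenthesis.

    Assumption: an outer matches when the initials of its non-stop words
    spell the inner. ALICE itself combines many more pattern-matching rules.
    """
    words = left_chunk.split()
    for m in range(1, min(max_words, len(words)) + 1):
        candidate = words[-m:]
        initials = "".join(w[0] for w in candidate if w.lower() not in STOP_WORDS)
        if initials.lower() == inner.lower():
            return " ".join(candidate)
    return None


def validity_judgment(inner, outer):
    """Validity Judgment (VJ): toy acceptance checks, not the paper's five conditions."""
    return (
        outer is not None
        and inner.isalnum()
        and len(inner) >= 2
        and outer.split()[0].lower() not in STOP_WORDS
    )


def extract_pairs(sentence):
    """Run IS, OE, and VJ in order and collect accepted (inner, outer) pairs."""
    pairs = []
    for left_chunk, inner in inner_search(sentence):
        outer = outer_extraction(left_chunk, inner)
        if validity_judgment(inner, outer):
            pairs.append((inner, outer))
    return pairs
```

For example, `extract_pairs("We studied tumor necrosis factor (TNF) in mice.")` pairs the inner `TNF` with the outer `tumor necrosis factor`, while a parenthetical remark such as `(see below)` finds no matching outer and is rejected.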

Cited by 12 articles
