Dr. Tom's Taxonomy Guide

Thomas D. Wason, Ph.D. (aka Dr. Tom)
http://www.tomwason.com
wason@mindspring.com



One of the Dr. Tom Guides

Purpose of Document

The purpose of this document is to define taxonomies and their uses. A selection of useful taxonomies is provided.

Document Information

Title: Dr. Tom's Taxonomy Guide: Description, Use and Selections
Author(s): Thomas D. Wason (Initial selection of taxonomies by Dave McArthur)
Version Date: 15 February 2006
Current version: 1.02
Copyright: Copyright © 2000 IMS Global Learning Consortium, Inc. Used by permission.

Contents

  1. Description of a Taxonomy
  2. Uses of Taxonomies
  3. A Selection of Useful Taxonomies

1. Description of a Taxonomy

A taxonomy is a knowledge map of a topic, typically realized as a controlled vocabulary of terms and/or phrases. A taxonomy is an orderly classification of information according to presumed natural relationships. Denise A. D. Bedford, Ph.D., of the World Bank enumerates four types of taxonomies: flat, hierarchical, faceted and network (Taxonomies for Information & Knowledge Management Architectures, URL: http://www.sla.org/chapter/cdc/presentations/20030204_taxonomies.ppt). Hierarchical is the most common form. A vocabulary is the simplest form of taxonomy. It has only one level, and comprises a list of allowable terms or phrases. The terms may have identifiers such as numbers and/or letters. For example, the IMS meta-data field general.difficulty has a vocabulary of levels that are normally indicated by the numeric values 0-4.

The most typical form of a taxonomy is a hierarchy. At the top level, general terms or descriptive phrases are used. Each of the general terms has beneath it a set of terms that refine the top-level term. Each of these second-level terms may in turn have a set of refining terms beneath it. Frequently each term has an alphanumeric identifier. As an example, the top-level terms from the Library of Congress Classification Outline are:

  • A -- GENERAL WORKS
  • B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
  • C -- AUXILIARY SCIENCES OF HISTORY
  • D -- HISTORY: GENERAL AND OLD WORLD
  • E -- HISTORY: AMERICA
  • F -- HISTORY: AMERICA
  • G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
  • H -- SOCIAL SCIENCES
  • J -- POLITICAL SCIENCE
  • K -- LAW
  • L -- EDUCATION
  • M -- MUSIC AND BOOKS ON MUSIC
  • N -- FINE ARTS
  • P -- LANGUAGE AND LITERATURE
  • Q -- SCIENCE
  • R -- MEDICINE
  • S -- AGRICULTURE
  • T -- TECHNOLOGY
  • U -- MILITARY SCIENCE
  • V -- NAVAL SCIENCE
  • Z -- LIBRARY SCIENCE

The category "B -- PHILOSOPHY. PSYCHOLOGY. RELIGION" has an extensive set of sub-categories. Among them is F, Psychology. Psychology is further divided, and includes 180-198.7, Experimental psychology. These levels can be shown in outline form:

  B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
    F, Psychology
      180, Experimental psychology 
      

Those familiar with the "Sniffyv1p1.xml" meta-data record example will recognize this classification. Note that each level includes both an index, e.g., B, and a term or phrase, e.g., PHILOSOPHY. PSYCHOLOGY. RELIGION. Not all taxonomies contain both.
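The three levels in the outline above can be sketched as a nested structure in which every node carries both an index and a term. This is only an illustration: the key names "id", "entry" and "children" and the function name outline are assumptions made for this example, not part of the LCC or of any IMS specification.

```python
# A fragment of the LCC hierarchy, using only the levels quoted in this guide.
# The key names ("id", "entry", "children") are illustrative assumptions.
LCC_FRAGMENT = {
    "id": "B",
    "entry": "PHILOSOPHY. PSYCHOLOGY. RELIGION",
    "children": [
        {
            "id": "F",
            "entry": "Psychology",
            "children": [
                {"id": "180", "entry": "Experimental psychology", "children": []},
            ],
        },
    ],
}


def outline(node, depth=0):
    """Return the node and its descendants as indented outline lines."""
    lines = ["  " * depth + f"{node['id']}, {node['entry']}"]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(LCC_FRAGMENT)))
```

A taxonomy that has terms but no indexes could use the same shape with the "id" value left empty.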

2. Uses of Taxonomies

A taxonomy provides a controlled vocabulary for populating fields or taxonpaths. A taxonomy may be used for two major reasons: 1) to limit the choices of field values to a controlled set; and 2) to use terms that are defined by a known source. As noted above, the simplest taxonomy is a controlled vocabulary. For example, the IMS Meta-Data general.structure field (1.8) has a restricted vocabulary (i.e., Collection, Mixed, Linear, Hierarchical, Networked, Branched, Parceled, Atomic) from which the single field value can be drawn. The source of this vocabulary is the IEEE LTSC LOM; IMS maintains a mirror of that vocabulary. The Taxonomy and Vocabulary Guide section of the IMS Meta-Data Best Practices Guide (mdbestv1p1.html) provides a description of the use of taxonomies, which this document serves to supplement.
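The first use, limiting a field to a controlled set, could be enforced with a check along the following lines. This is a minimal sketch: the terms are the ones listed above for general.structure, while the names STRUCTURE_VOCABULARY and check_structure, and the choice to compare case-sensitively, are assumptions made for this example.

```python
# Vocabulary for general.structure (1.8) as listed in this guide.
STRUCTURE_VOCABULARY = (
    "Collection", "Mixed", "Linear", "Hierarchical",
    "Networked", "Branched", "Parceled", "Atomic",
)


def check_structure(value):
    """Return the value if it is in the vocabulary, else raise ValueError.

    The comparison is case-sensitive; that is an assumption of this sketch.
    """
    if value not in STRUCTURE_VOCABULARY:
        raise ValueError(f"{value!r} is not in the general.structure vocabulary")
    return value


print(check_structure("Linear"))
```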

A more complete example of the use of taxonomies is in the IMS Meta-Data Classification category. For review, the organization of the Classification category (9) is:

        classification
          purpose
            taxonpath
            description
            keywords
      

"purpose" refers to the purpose of the classification, not the purpose of the resource. For example, a purpose of "discipline" means that this instance of classification describes the discipline, or subject area, of the resource. A cataloger might select the Library of Congress Classification (LCC) to describe the discipline (or subject; the name was selected to reduce confusion internationally) of the resource. The LCC is a taxonomy; thus it can be used to populate a taxonpath. The structure of an IMS meta-data taxonpath is:

      taxonpath
        source
        taxon
          id
          entry
          taxon
            id
            entry
            taxon
              id
              entry 
      

The source specifies the controlling authority for the taxonomy used; in this example, it is the LCC. The multiple taxons comprise an ordered list. The number of taxons in the list can range from 1 upward; the specification states that at least 16 should be supported (the minimum maxima). Each taxon in the list contains one value, and each sub-term refines the description of its parent term. Each taxon has an id and an entry. The id is an alphanumeric reference. Each taxon node in a taxonomy has a descriptive term, which is contained in the entry. Beneath each taxon node is a selection of sub-terms from which a value can be selected. Each sub-term is also a taxon node. Within the IMS Meta-Data taxonpath, only one value (taxon) can be selected at each level; thus the list of taxons is a specific pathway down through the source taxonomy.

Each classification instance may contain multiple taxonpaths. For example, a cataloger may choose either to use several discipline taxonomies to describe a resource, or to describe the resource's discipline through several taxonpaths within the same taxonomy if the resource covers more than one discipline or sub-discipline. Continuing with our Sniffy example, a classification taxonpath describing Sniffy could appear as follows:

       taxonpath
         source: LCC
         taxon
           id: B 
           entry: PHILOSOPHY. PSYCHOLOGY. RELIGION
           taxon
             id: F 
             entry: Psychology
             taxon
               id: 180 
               entry: Experimental psychology
     

The concatenated id is: BF 180.
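The Sniffy taxonpath and its concatenated id can be sketched in code as follows. This is an illustration under stated assumptions, not the IMS XML binding: the class names Taxon and TaxonPath and the function lcc_concatenated_id are invented for this example, and the joining rule (letter ids run together, numeric ids set off by a space) is simply inferred from the "BF 180" result above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taxon:
    id: str
    entry: str
    # At most one nested taxon: only one value is selected at each level.
    taxon: Optional["Taxon"] = None


@dataclass
class TaxonPath:
    source: str
    taxon: Taxon

    def levels(self):
        """Return the taxons as an ordered list, top level first."""
        result = []
        node = self.taxon
        while node is not None:
            result.append(node)
            node = node.taxon
        return result


def lcc_concatenated_id(path):
    """Join letter ids directly and set numeric ids off with a space.

    This rule is an assumption inferred from the single example "BF 180".
    """
    letters = "".join(t.id for t in path.levels() if t.id.isalpha())
    numbers = " ".join(t.id for t in path.levels() if not t.id.isalpha())
    return f"{letters} {numbers}".strip()


sniffy = TaxonPath(
    source="LCC",
    taxon=Taxon("B", "PHILOSOPHY. PSYCHOLOGY. RELIGION",
                Taxon("F", "Psychology",
                      Taxon("180", "Experimental psychology"))),
)

print(lcc_concatenated_id(sniffy))
```

Giving each Taxon a single optional child, rather than a list of children, mirrors the rule that a taxonpath is one specific pathway through the source taxonomy.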

A resource may be classified several ways by repeating the classification category with several purposes. For example, a resource may have a classification with a purpose of "discipline" to describe its subject area using the LCC, and may also have a purpose of "Educational Objective" to describe the educational objectives of the resource using the McRel taxonomy (see below).
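A resource classified under two purposes might be held in a structure like the one below. This is a hedged sketch only: the dictionary keys and the function sources_for are assumptions made for this example, and the McRel taxon is a placeholder rather than a real McRel entry.

```python
# The classification category repeated once per purpose.
# The McRel taxon is a PLACEHOLDER, not an actual McRel id or entry.
classifications = [
    {
        "purpose": "discipline",
        "taxonpaths": [
            {"source": "LCC",
             "taxons": [("B", "PHILOSOPHY. PSYCHOLOGY. RELIGION"),
                        ("F", "Psychology"),
                        ("180", "Experimental psychology")]},
        ],
    },
    {
        "purpose": "Educational Objective",
        "taxonpaths": [
            {"source": "McRel",
             "taxons": [("PLACEHOLDER-ID", "placeholder entry")]},
        ],
    },
]


def sources_for(purpose):
    """List the taxonomy sources used under one classification purpose."""
    return [path["source"]
            for c in classifications if c["purpose"] == purpose
            for path in c["taxonpaths"]]


print(sources_for("discipline"))
```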

Taxonomies may also be the sources of structured controlled vocabularies for other IMS implementations, such as the educational level, skills and goals within the Learner Profile.

3. A Selection of Useful Taxonomies

IMS does not endorse any particular taxonomy or set of taxonomies. Nor do I. A selection of useful taxonomies is provided below. This selection is not exhaustive. Use of taxonomies from this selection is not required. Organizations may choose to create their own taxonomies.

LCSH: Library of Congress Subject Headings

(Introduction: http://www.tlcdelivers.com/tlc/crs/shed0014.htm) There are no online versions of the LCSH. It is a set of 5 volumes.

"The Library of Congress subject headings system was originally designed as a controlled vocabulary for representing the subject and form of the books and serials in the Library of Congress collection, with the purpose of providing subject access points to the bibliographic records contained in the Library of Congress catalogs."

"As an increasing number of other libraries have adopted the Library of Congress subject headings system, it has become a tool for subject indexing of library catalogs in general. In recent years, it has also been used as a tool in a number of online bibliographic databases outside of the Library of Congress."

LCC: Library of Congress Classification Outline

(http://pharos.alexandria.ucsb.edu/demos/lcc.html, http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html, http://geography.about.com/science/geography/library/congress/bllc.htm)
The LCC is a subject taxonomy maintained by the US Library of Congress. It is a good subject taxonomy, but is US-centric. Its widespread use makes it an attractive choice.

GEM: Gateway to Educational Materials

(http://www.geminfo.org/Workbench/Workbench_vocabularies.html)
Subject is a mandatory element for any GEM resource. GEM also offers controlled vocabularies (the equivalent of a taxonomy) for this element; two levels of GEM controlled vocabularies are offered, the first approximating a top-level discipline taxonomy. The second level provides more detailed descriptions and is not, technically, a taxonomy. GEM also permits the use of other controlled vocabularies, such as ERIC and NICEM. These are optional, in contrast to GEM's. The level-one GEM controlled vocabulary is mainly oriented to K12, although GEM does recognize the following "grade levels" (another element) of educational materials: K12, Adult/continuing education, Higher education, Preschool education, Vocational education.

YAHOO

(http://www.yahoo.com/Education/By_Subject/)
Under Education, this portal site uses just a few top-level educational subjects. Each has a rich decomposition, although not necessarily in terms of sub-disciplines.

McRel

(http://www.mcrel.org/standards-benchmarks/)
McRel provides databases (and services) to help users access information about educational materials (primarily K12). They organize their content knowledge primarily in terms of standards (educational objectives; over 250) and benchmarks (specific grade-indexed skills; almost 4000). At the top level, however, they also use a subject taxonomy (14 terms); these are, in effect, mandatory elements.

CIP: Classifications of Instructional Programs (DOL/CVU)

(http://nces.ed.gov/npec/papers/cipPreface.html)
The US Department of Labor, Employment and Training Administration uses CIP (Classifications of Instructional Programs) codes in their ALMIS (America's Labor Market Information System) database. The CIP codes are also used by NCES. In addition, the codes have been adopted by the CVU (see http://www.california.edu/catalogs_prog_cat.asp). This is not the same classification used in another DoL database, America's Job Bank. The subject list in the table reflects the first level (2 digits of 6) of the CIP codes. Many databases use the sub-discipline categorizations as well as the first-level terms.
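Taking the first level from a six-digit code can be sketched as below. The sketch assumes codes written in the dotted form NN.NNNN; the names CIP_PATTERN, cip_first_level and group_by_first_level are invented for this example, and the sample codes are used only to show the format.

```python
import re
from collections import defaultdict

# Six digits written as NN.NNNN; the dotted form is an assumption of this sketch.
CIP_PATTERN = re.compile(r"^(\d{2})\.(\d{2})(\d{2})$")


def cip_first_level(code):
    """Return the two-digit first-level part of a six-digit CIP-style code."""
    match = CIP_PATTERN.match(code)
    if match is None:
        raise ValueError(f"expected a code like 'NN.NNNN', got {code!r}")
    return match.group(1)


def group_by_first_level(codes):
    """Group full codes under their two-digit first-level prefix."""
    groups = defaultdict(list)
    for code in codes:
        groups[cip_first_level(code)].append(code)
    return dict(groups)


print(group_by_first_level(["11.0701", "11.0101", "52.0201"]))
```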

Career Resource Library, America's Job Bank, US Department of Labor

(http://www.acinet.org/acinet/resource/occup/occup.htm)
America's Career InfoNet site displays occupational information with a two-level occupational taxonomy. The resource library taxonomy contains online career information arranged under broad subject categories.

Taxonomy of Educational Technology

(http://www.lis.uiuc.edu/~chip/pubs/taxonomy/index.html)
A Taxonomy of Media for Inquiry, Communication, Construction, and Expression from the College of Education, University of Illinois at Urbana-Champaign.

2000 Mathematics Subject Classification

(http://www.ams.org/msc/)
"The Mathematics Subject Classification (MSC) is used to categorize items covered by the two reviewing databases, Mathematical Reviews (MR) and Zentralblatt MATH (Zbl). The MSC is broken down into over 5,000 two-, three-, and five-digit classifications, each corresponding to a discipline of mathematics (e.g., 11 = Number theory; 11B = Sequences and sets; 11B05 = Density, gaps, topology).

"The current classification system, 2000 Mathematics Subject Classification (MSC2000), is a revision of the 1991 Mathematics Subject Classification, which is the classification that has been used by MR and Zbl since the beginning of 1991. MSC2000 is the result of a collaborative effort by the editors of MR and Zbl to update the classification."
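The two-, three-, and five-digit levels in the quoted example (11, 11B, 11B05) can be recovered from a single code with a small helper. This is a minimal sketch that handles only the three forms shown in the quotation; the names MSC_PATTERN and msc_levels are assumptions made for this example, and other notations the MSC may use are not covered.

```python
import re

# Two digits, then an optional capital letter, then an optional two-digit suffix.
# Only the forms quoted above (11, 11B, 11B05) are handled.
MSC_PATTERN = re.compile(r"^(\d{2})(?:([A-Z])(\d{2})?)?$")


def msc_levels(code):
    """Return the code's broader levels and the code itself, broadest first."""
    match = MSC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a code of the form 11, 11B or 11B05: {code!r}")
    top, letter, suffix = match.groups()
    levels = [top]
    if letter:
        levels.append(top + letter)
    if suffix:
        levels.append(top + letter + suffix)
    return levels


print(msc_levels("11B05"))
```

The returned list maps directly onto a taxonpath: each element would become the id of one taxon, in order.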

American Mathematics Metadata Task Force

(http://mathmetadata.org/ammtf/taxonomies/)
Proposed subject classifications for school and college mathematics.

Medical Subject Headings

(http://www.nlm.nih.gov/mesh/filelist.html)
"The Medical Subject Headings comprise the National Library of Medicine's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts." This is a large taxonomy (21MB).

 

Many of the terms in this guide are defined in the Glossary.
 

Author:

Thomas D. Wason, Ph.D. (aka Dr. Tom)
wason@mindspring.com
http://www.tomwason.com
+1 919.602.6370
