Taxonomy

Purpose

The purpose of this article is to describe and explain taxonomies so that a reader will appreciate how a taxonomy, as a type of information structure, can help bring order to a body of knowledge.

The article also introduces the basic uses of taxonomies. After reading it, a reader should understand the problem that taxonomies solve and the prominent features of a taxonomy. Along the way, we provide some readily recognizable examples.

Taxonomy Background

Taxonomies are important because they bring order to a body of information or a collection of objects. Greater orderliness very often enhances one's ability to understand complex subject matter.

Neither the term “taxonomy” nor the concept is new. Probably the most famous taxonomy is the one the Swedish scientist Linnaeus used to classify the biological world. It was one of the early successful attempts to bring order to an otherwise disorganized body of knowledge.

Even today taxonomies are very useful for breaking down a large entity into constituent parts that, when individually understood, help make sense of the larger whole. Quite often the “large body” that is broken down is knowledge about some thing or some subject. Taxonomies of this sort include glossaries, dictionaries, and encyclopedias. Taxonomies need not embrace knowledge exclusively; they may also bring order to objects or collections of objects. For example, a listing of rifle parts categorized according to the major parts of a rifle is a taxonomy. Likewise, a listing of rifles categorized according to caliber classifications constitutes a taxonomy.

Virtually everybody uses taxonomies, often without even thinking about it.

Precisely what are taxonomies and what are they used for?

Explanation

The following definitions and examples will expand our understanding of taxonomies.

1.  Definitions

The following defines and also explains the origin of the term “taxonomy”:

Taxonomy (from Greek taxis meaning arrangement or division and nomos meaning law) is the science of classification according to a pre-determined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval. In theory, the development of a good taxonomy takes into account the importance of separating elements of a group (taxon) into subgroups (taxa) that are mutually exclusive, unambiguous, and taken together, include all possibilities. In practice, a good taxonomy should be simple, easy to remember and easy to use. 
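The requirement that subgroups be mutually exclusive and, taken together, include all possibilities can be checked mechanically. The following is a minimal sketch in Python; the function name `is_valid_partition` and the vehicle data are hypothetical and chosen only for illustration.

```python
def is_valid_partition(group, subgroups):
    """Return True if the subgroups are mutually exclusive and jointly cover the group."""
    seen = set()
    for members in subgroups.values():
        if seen & members:  # some element already appeared in another subgroup
            return False
        seen |= members
    return seen == set(group)  # nothing left out, nothing extra


# Hypothetical data, used only to illustrate the check.
vehicles = {"sedan", "SUV", "pickup", "motorcycle"}

by_wheels = {
    "two-wheeled": {"motorcycle"},
    "four-wheeled": {"sedan", "SUV", "pickup"},
}

overlapping = {
    "family cars": {"sedan", "SUV"},
    "off-road capable": {"SUV", "pickup", "motorcycle"},
}

print(is_valid_partition(vehicles, by_wheels))    # True
print(is_valid_partition(vehicles, overlapping))  # False: "SUV" is in two subgroups
```

A grouping that fails this check is not necessarily useless, but it is harder to use for classification because an item may belong in more than one place or in none.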

The function of a taxonomy is to classify or categorize a body of knowledge or collection of objects in a hierarchical manner. Each tier of the hierarchy “inherits”, or possesses, all attributes of the one immediately above, whatever those attributes might be. When one views a hierarchy from top to bottom, the matter becomes more particular and more specific the lower one goes in the hierarchy.

Advanced practitioners of information technology or knowledge representation usually refer to a body of knowledge as a “knowledge domain”.

Taxonomies are especially useful for searching for specific objects in a large collection (such as a Sears catalog) as well as for understanding concepts (such as those found in a glossary or a thesaurus).

Taxonomies may also be used to:

· Provide a controlled vocabulary for search engines.

· Delineate hierarchical relationships by means of an “IS A” relationship (e.g., a sedan IS AN automobile and an SUV IS AN automobile).

· Delineate types of hierarchical relationships such as classes, subclasses, and superclasses (e.g., the class “automobile” is a subclass of “motor vehicle”. In turn, “motor vehicle” is a superclass of “automobile”).

· Outline, catalog, or structure a given knowledge domain.
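The “IS A” and subclass/superclass relationships listed above can be sketched in a few lines of Python. This is only one possible representation, not a standard one: the `PARENT` mapping and the helper names `superclasses` and `is_a` are assumptions made for illustration, using the automobile terms from the list.

```python
# Each term maps to its immediate superclass; None marks the top of the hierarchy.
PARENT = {
    "motor vehicle": None,
    "automobile": "motor vehicle",
    "sedan": "automobile",
    "SUV": "automobile",
}


def superclasses(term):
    """Return the chain of superclasses of a term, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term, category):
    """True if term is the category itself or a (possibly indirect) subclass of it."""
    return term == category or category in superclasses(term)


print(is_a("sedan", "automobile"))     # True
print(is_a("sedan", "motor vehicle"))  # True, via "automobile"
print(is_a("automobile", "sedan"))     # False: the relationship runs one way
print(superclasses("SUV"))             # ['automobile', 'motor vehicle']
```

The keys of `PARENT` also act as a small controlled vocabulary: a search tool built on it could reject or remap any term that is not one of the keys.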

2.  Examples of Taxonomies

The following examples of taxonomies are discussed in order of increasing complexity and/or size:

    A.  Visualizing a Basic Example

To visualize a taxonomy, consider how a listing of American cities might appear when categorized according to a hierarchy of regions and states in the United States.  The following lists just a few cities for illustration purposes.

Region         State            City

Midwest        Illinois         Chicago
                                Urbana
                                Springfield
               Indiana          Ft. Wayne
                                West Lafayette

New England    Vermont          Montpelier
                                Burlington
                                Middlebury
               Maine            Portland
                                Freeport
               Massachusetts    Boston
                                Springfield

Southwest      California       Los Angeles
                                San Diego
               Arizona          Tempe
                                Phoenix

One can readily visualize the above as a hierarchical categorization where each tier goes into increasing specificity from top to bottom.

Upper tiers pass on characteristics to lower tiers. Seen from another direction, members of lower tiers inherit characteristics from upper tiers.

Suppose we were to characterize the weather of each region as follows: Midwest, hot summers; New England, cold winters; Southwest, dry air year-round. Doing so would cause each of the constituent states and cities to inherit the same characteristics. To further differentiate states, however, we could conceivably use soil characteristics as an attribute, and to further differentiate among cities we might use relative population sizes.

By using attributes at each tier as we did above, taxonomies help us to “compare and contrast” between objects and concepts at different positions in the hierarchy. Very often that knowledge helps us to understand complex concepts or to identify a few particular objects among many in a large collection.
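The inheritance of attributes from upper tiers to lower tiers can be made concrete with a small Python sketch. The `Node` class and its `inherited_attributes` method are hypothetical names introduced here; the weather value comes from the example above, while the soil and population values are placeholders invented for illustration.

```python
class Node:
    """One entry in the hierarchy, holding only the attributes defined at its own tier."""

    def __init__(self, name, parent=None, **attributes):
        self.name = name
        self.parent = parent
        self.attributes = attributes

    def inherited_attributes(self):
        """Merge attributes from the top tier down; lower tiers add further detail."""
        merged = self.parent.inherited_attributes() if self.parent else {}
        merged.update(self.attributes)
        return merged


midwest = Node("Midwest", weather="hot summers")
illinois = Node("Illinois", parent=midwest, soil="prairie loam")  # illustrative value
chicago = Node("Chicago", parent=illinois, population="large")    # illustrative value

print(illinois.inherited_attributes())
# {'weather': 'hot summers', 'soil': 'prairie loam'}
print(chicago.inherited_attributes())
# {'weather': 'hot summers', 'soil': 'prairie loam', 'population': 'large'}
```

Each node stores only what distinguishes it at its own tier; everything else is looked up through its parent, which is what makes comparing and contrasting entries at different levels straightforward.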

    B.  The Dewey Decimal System

The Dewey Decimal System, also known as the Dewey Decimal Classification System (DDC), is one of the most widely used taxonomies in the world. Its function is to categorize all areas of knowledge so that books and pamphlets may be assigned a classification number appropriate to a particular subject.

Melvil Dewey devised it in 1873 while working in the library at Amherst College. It was first published in 1876 and was quickly adopted for general use by libraries and schools across the United States.

The system is now maintained by classification specialists at the Library of Congress, even though the Library of Congress has its own classification system. The DDC has had 21 major revisions since its inception. More than 100,000 new numbers are assigned annually.

The system consists of a hierarchy of numbers arranged in classes, divisions, and sections, each level being listed in a table called a “summary”. The word “decimal” is included in the name of the system in recognition of the role the number ten plays as a primary organizational feature: there are ten main classes, each main class has ten divisions, and each division has ten sections. However, not all divisions and sections are in use.

Numbers belonging to the system must be at least three digits in length. Further digits can be appended after a decimal point, as many as are needed to reach the desired degree of classification.

The following shows how the number “636.8” is assigned to the classification of cats in the section of Animal Husbandry, in the division of Agriculture and Related Technologies, in the class of Technology:

600 Number from (first) summary of ten classes (i.e., the class named Technology)
  630 Number from (second) summary of ten divisions (i.e., the division under class Technology named Agriculture)
     636 Number from (third) summary of ten sections (i.e., the section named Animal Husbandry)
        636.7 Number representing dogs
        636.8 Number representing cats

Thus, from the bottom up, the numbers for dogs and cats were created as (1) extensions of “636”, the classification for Animal Husbandry, which in turn was (2) an extension of “630”, the classification for Agriculture, which in turn was (3) an extension of “600”, the classification for Technology.
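Because each tier is encoded in the digits themselves, the class, division, and section that a number extends can be recovered directly from the number. The following Python sketch illustrates the idea; the function name `dewey_levels` is hypothetical, and the sketch assumes a number written with exactly three digits before any decimal point.

```python
def dewey_levels(number):
    """Return the (class, division, section) numbers that a Dewey number extends."""
    whole = number.split(".")[0]  # digits before the decimal point
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError("expected three digits before the decimal point")
    main_class = whole[0] + "00"  # first digit selects one of the ten classes
    division = whole[:2] + "0"    # second digit selects a division within the class
    section = whole               # third digit selects a section within the division
    return main_class, division, section


print(dewey_levels("636.8"))  # ('600', '630', '636')
print(dewey_levels("636.7"))  # ('600', '630', '636')
```

Cats (636.8) and dogs (636.7) resolve to the same class, division, and section, which is exactly the shared ancestry shown in the listing above.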

Descriptions of the ten classes in the First Summary, of the Second Summary (The Hundred Divisions), and of the Third Summary (The Thousand Sections) are available on the DDC web site, which also has an excellent tutorial.

    C.  Other Examples of Taxonomies

The Taxonomy Warehouse web site is a portal to many other taxonomies. One can investigate numerous examples by picking and choosing from those listed in the site's main interface.  This is a good illustration of using a taxonomy for database search purposes.

One can also reach the Army's CALL (Center for Army Lessons Learned) web site through the Taxonomy Warehouse. This site is home to a system that catalogues more than 18,000 U.S. Army terms, acronyms, and other expressions.

If other military uses of taxonomies are of interest, search Google for the term “military taxonomy”. At the time of writing, that search produced over 32,000 listings.

3.  Limitations of Taxonomies

Taxonomies are a promising but incomplete means of structuring information.

More precisely, one can say that they are a necessary but not sufficient condition for adequate understanding of precise meaning. While the taxonomic categorizations and sub-categorizations enhance understanding, the lack of detail in describing objects at each level of the taxonomy still leaves room for ambiguity as to the precise meaning of any particular term or object.

As we will further see, combining a taxonomy with additional detail about the data (metadata, as such information is called) greatly helps to reduce ambiguity and has other benefits as well.

4.  Conclusion

Taxonomies may not be “the ultimate” contemporary information structure, but they certainly can help reduce chaos when attempting to make order out of a body of knowledge or a collection of objects. They are simple and almost intuitive since one is encouraged early in school to use outlines (perhaps the briefest of taxonomies) when writing papers.

The taxonomy's greatest future use may be in providing foundations for other structures. Two information structures offering the greatest promise for reduced ambiguity, machine readability, and automated transfer of information are Extensible Markup Language (XML) schemas and structures that use ontologies in their composition.
