Methods of Classification

I. Types of Characters and Character States

A. Qualitative vs. Quantitative Characters

Qualitative = nonnumeric, such as shapes, form (ovate, elliptical)

Quantitative = numeric (some measurement, e.g. leaf 2.5 cm long)

B. Discrete (Discontinuous) vs. Continuous

1. Discrete (no "1/2-way" states possible) = meristic, e.g. four stamens.

a. Binary Characters = 2 states (e.g. 0, 1). Leaves present or absent

b. Multistate = greater than 2 states (e.g. 0,1,2). Flowers red, blue, purple. Note that most multistate characters can be recoded into a series of binary characters

2. Continuous (an infinite number of states between any two states) e.g. any numeric measure (1.2 cm, 1.238 cm). These types of characters are more useful at the lower levels of the taxonomic hierarchy.
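
The recoding of a multistate character into a series of binary characters (noted under B.1.b above) can be sketched in a few lines of Python. This is only an illustration: the taxon names (sp1 to sp4) and the function name recode_multistate are hypothetical, and the approach shown (one presence/absence character per state) is just one of several possible recoding schemes.

```python
def recode_multistate(states):
    """Recode one multistate character as presence/absence characters.

    `states` maps each taxon to its state label. Returns a tuple of
    (binary, labels): `binary` maps each taxon to a list of 0/1 values,
    one per distinct state, in the order given by `labels`.
    """
    labels = sorted(set(states.values()))
    binary = {
        taxon: [1 if state == label else 0 for label in labels]
        for taxon, state in states.items()
    }
    return binary, labels


# Hypothetical taxa scored for flower color (a three-state character).
flower_color = {"sp1": "red", "sp2": "blue", "sp3": "purple", "sp4": "red"}
binary, labels = recode_multistate(flower_color)

print(labels)         # ['blue', 'purple', 'red']
print(binary["sp1"])  # [0, 0, 1]
print(binary["sp2"])  # [1, 0, 0]
```

Each new binary character asks a yes/no question ("is the flower red?"), so one three-state character becomes three two-state characters.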

II. Phenetic Approaches to Classification

A. History

1. Michel Adanson (French, 1727-1806). Worked in Africa and used 65 different characters to classify plant families. His work was the philosophical (if not actual) basis for the numerical taxonomic methods that followed.

2. R. R. Sokal and Peter H. A. Sneath. In the 1950s, Peter Sneath (England) was working on bacterial classification systems using computers. Sokal and Sneath teamed up in 1963 to produce Principles of Numerical Taxonomy. Later, Sneath and Sokal (1973) wrote Numerical Taxonomy: The Principles and Practice of Numerical Classification. This became a standard work and started a bandwagon rush by taxonomists to employ these methods in classification. A more recent book that is a great reference for many numerical/statistical methods in biology (and is used in biostatistics courses) is:

Sokal, R. R. and F. J. Rohlf. 1995. Biometry: the Principles and Practice of Statistics in Biological Research. W. H. Freeman and Co., New York, NY.

Although the phenetic approach was adopted by many then, and it has relevance to certain types of study even today, it is not necessarily appropriate if one wishes to look at phylogenetic relationships. 

B. Features of Phenetic Classification

1. Phenetics seeks to express natural relationships among organisms by analyzing large numbers of equally weighted, noncorrelated characters. 

2. Phenetic classification makes no assumptions about phylogeny, no implications on ancestry, no statements on evolution of the group. But, evolutionary considerations DO enter into phenetic analyses. One must compare homologous features. For example, what purpose would it serve to compare cladodes (flattened, modified stems) in one group with phyllodes (flattened, modified petioles) in another? I will further discuss the topic of homology later in this lecture.

C. Phenetics in Practice

1. Steps in a phenetic analysis

a. Select the organisms under study. Taxa are called OTUs (Operational Taxonomic Units).
b. Choose the characters that will be scored for each OTU.
c. Construct an OTU by character matrix.
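d. Use some mathematical formula to describe the degree of similarity or distance for each pairwise comparison of OTUs. For example, the Simple Matching Coefficient (S) is:

\[ S = \frac{\text{number of matches between two OTUs}}{\text{total number of characters}} \times 100 \]

where matches are 1,1 and 0,0. Multiplying by 100 expresses S as a percentage; the example below reports S as a proportion (0 to 1) instead.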
 
e. Construct another matrix that gives all pairwise S values
f. Use a clustering technique (more on this below) to produce a dendrogram. A commonly used one is UPGMA (Unweighted Pair Group Method with Arithmetic Averaging).

2. Example:

Taxon by Character Matrix

Taxon/char.   1  2  3  4  5  6  7  8  9  10
A             0  1  1  0  0  0  1  1  1  0
B             0  0  0  1  1  1  0  1  1  1
C             0  0  1  0  0  1  0  0  0  1
D             1  1  0  0  0  1  1  1  1  0

Similarity Matrix

     A     B     C     D
A    -     0.3   0.4   0.7
B          -     0.5   0.4
C                -     0.3
D                      -
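
As a check on the similarity matrix, the pairwise values can be recomputed from the taxon by character matrix. The sketch below is a minimal Python illustration; the function name simple_matching is my own, and S is returned as a proportion (not multiplied by 100) to match the values in the matrix.

```python
from itertools import combinations

# Taxon by character matrix from the example (taxa A-D, characters 1-10).
matrix = {
    "A": [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    "B": [0, 0, 0, 1, 1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "D": [1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
}


def simple_matching(x, y):
    """Proportion of characters in which two OTUs match (1,1 or 0,0)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# One S value for every pair of OTUs.
similarity = {
    (p, q): simple_matching(matrix[p], matrix[q])
    for p, q in combinations(sorted(matrix), 2)
}

for (p, q), s in similarity.items():
    print(p, q, s)
```

The six values printed agree with the matrix: A-D is the most similar pair (0.7), followed by B-C (0.5).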

So, A is paired with D and B is paired with C. At what similarity value are these two clusters joined?

(A to B) + (A to C) + (D to B) + (D to C) = 0.3 + 0.4 + 0.4 + 0.3 = 1.4
1.4 / 4 = 0.35
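
The whole clustering sequence can be sketched in Python. This is a bare-bones illustration of UPGMA applied to similarities (join the most similar pair of clusters, then average over all between-cluster OTU pairs), not the code of any particular package; the function name upgma and the tuple representation of clusters are my own choices.

```python
from itertools import combinations

# Pairwise simple matching coefficients from the similarity matrix.
similarity = {
    ("A", "B"): 0.3, ("A", "C"): 0.4, ("A", "D"): 0.7,
    ("B", "C"): 0.5, ("B", "D"): 0.4, ("C", "D"): 0.3,
}


def upgma(similarity, taxa):
    """Return the joins as (cluster_1, cluster_2, similarity level)."""

    def sim(a, b):
        # The matrix is symmetric, so look the pair up in either order.
        return similarity[(a, b)] if (a, b) in similarity else similarity[(b, a)]

    def average(c1, c2):
        # Arithmetic average over every OTU pair spanning the two clusters.
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(t,) for t in taxa]
    joins = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        joins.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return joins


joins = upgma(similarity, ["A", "B", "C", "D"])
for c1, c2, level in joins:
    print("+".join(c1), "joins", "+".join(c2), "at", round(level, 2))
```

Run on the example, this joins A with D at 0.7, B with C at 0.5, and the two pairs at 0.35, which are the three levels drawn on the phenogram.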

UPGMA Phenogram

[Figure: UPGMA phenogram]

D. Clustering Techniques

1.  Figure 7.2 from Stuessy (1990) showing that there are three basic kinds of clustering: agglomerative, divisive, and ordination (simultaneous estimation of group structure).

2.  Treatment of cluster analysis from Wikipedia (if you want to learn more about this!).

III. Cladistic Approaches to Classification

A. Warren H. Wagner, Jr. (1920-2000, U of Michigan, Ann Arbor). Biography by D. Farar.  

1. While working on Hawaiian ferns, Wagner wanted to depict the amount and pattern of phylogenetic divergence. He developed the Groundplan/Divergence Method (Syst. Bot. 5:173-193; 1980), which was published by his student J. Hardin (1957) and later computerized. The method, termed Wagner Parsimony, is similar to that of Hennig (next), but differs in the way it deals with primitive character states.

2. Relationships between taxa are determined by the number of shared derived features (synapomorphies), but this cladogram (Figure from Stuessy 1990) is superimposed on an advancement index which is really a distance measure. Taxa are plotted on semicircles the correct distance from the ancestor which is assumed to be all 0 states.
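
To make the idea of an advancement index concrete, here is a small Python sketch. It rests on assumptions that are mine, not Wagner's: it borrows the example matrix from the phenetics part of the lecture, treats every character as binary and equally weighted, and takes the index to be simply the number of derived (1) states, i.e. the distance from an all-0 ancestor.

```python
# Example matrix from the phenetics section; 0 is assumed ancestral.
matrix = {
    "A": [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    "B": [0, 0, 0, 1, 1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "D": [1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
}


def advancement_index(states):
    """Distance from an all-0 ancestor: the count of derived (1) states."""
    return sum(states)


index = {taxon: advancement_index(states) for taxon, states in matrix.items()}
print(index)  # {'A': 5, 'B': 6, 'C': 3, 'D': 6}
```

Under these assumptions, taxon C would be plotted on the semicircle closest to the ancestor, and B and D on the one farthest from it.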

B. Wilhelm (Willi) Hennig (1913-1976). A German entomologist who worked on flies (Diptera). In 1950 he published his ideas on phylogenetic classification in a book entitled Basic Outline of a Theory of Phylogenetic Systematics (in German). He was the first to fully explain the philosophy and methodology of the cladistic approach. His work was not widely known until it appeared in English (U of I Press) in 1966. In 1980, zealous cladists "split" from the pheneticists and formed the Willi Hennig Society; their journal is called Cladistics.

C. The term cladistics derives from the word "clade," which is from the Latin for "branch." This refers to the branching nature of phylogenetic trees called "cladograms." Unlike phenetics, which uses overall similarity, cladistics uses nested sets of shared derived characters called synapomorphies (see below) to determine relationships among the taxa. Using the above example matrix (phenetics part of the lecture), only the shared "1" character states are used. The shared "0" states (symplesiomorphies) have no bearing on determining relationships. See below.
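
The difference from the simple matching coefficient can be seen by listing, for each pair of taxa in the example matrix, only the characters in which both taxa show state 1. The Python sketch below assumes that 0 is the ancestral state for every character; it is a pairwise tally meant to show which matches count, not a cladistic analysis in itself, and the function name shared_derived is my own.

```python
from itertools import combinations

# Example matrix from the phenetics section; 0 is assumed ancestral.
matrix = {
    "A": [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    "B": [0, 0, 0, 1, 1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "D": [1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
}


def shared_derived(x, y):
    """Character numbers (1-based) in which both taxa have state 1."""
    return [i + 1 for i, (a, b) in enumerate(zip(x, y)) if a == 1 and b == 1]


shared = {
    (p, q): shared_derived(matrix[p], matrix[q])
    for p, q in combinations(sorted(matrix), 2)
}

for pair, chars in shared.items():
    print(pair, chars)
```

A and D share four derived states (characters 2, 7, 8, 9), while A and C share only one (character 3), even though A and C match in four characters overall; three of those matches are shared 0 states and are ignored here.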

IV. Cladistic Terminology

A. Cladistic terminology (modified from Kenrick and Crane 1997, p. 8). These terms are best learned by reference to the following tree:

[Figure: cladogram]

1. Ingroup. Group being studied.

2. Outgroup. Group outside (but not too distantly related to) the one being studied. Used to root the tree (set character state polarities). See below.

3. Plesiomorphy. The original (ancestral) character state. Shared plesiomorphies are called symplesiomorphies - uninformative similarity. Two kinds are shown on the tree: a) character 1, which is shared by the ingroup and outgroup, and b) characters 3, 4, and 5, which are shared by the ingroup taxa only.

4. Apomorphy. A derived character or character state. Shared apomorphies are called synapomorphies - informative similarity. The only synapomorphic characters on the tree are 6, 7, and 8. These provide information on branching relationships within the ingroup.

5. Autapomorphy. Uninformative differences unique to particular ingroup taxa. Characters 9, 10, and 11 are all autapomorphies for their respective taxa. Even though these characters are different between the taxa, they provide no cladistically useful information. Bear in mind though that they do provide information on branch length. A cladogram that shows branch length proportional to the amount of character change (including autapomorphic changes) is called a phylogram.

6. Homoplasy. Uninformative similarity, i.e. similarity due to convergence or parallelism. Character 12 is an example: it is present in only some of the ingroup taxa. Clade A, B is supported by character 12 alone, whereas clade B, C is supported by characters 6, 7, and 8. So, by the principle of parsimony, clade B, C is favored. Character 12 can then be interpreted either as convergent in A and B or as holapomorphic (i.e. present in the ingroup and their ancestor) but lost in taxon C. Parsimony simply means "economy in reasoning": if all things are equal, the simplest explanation is the preferred one (Occam's Razor).

V. Cladistics in Practice

A. User-defined character polarization. Requires knowledge of the primitive and derived states of all or some of the characters being used.

Example from "Evolutionary Trends in Angiosperms" (from Radford et al. 1974).

Character 2, "habit": state 1 = "woody" (coded 0) and state 2 = "climbing or herbaceous" (coded 1).

In general, woodiness is considered primitive, thus climbing would be considered advanced (derived). But how do we really know what is primitive and advanced without having a known phylogeny in hand?  If we are using these data to reconstruct a phylogeny, do the above assumptions cause us to run the risk of using circular reasoning?

B. Polarization based upon the outgroup criterion. Given that we do not always know whether one character state is primitive and another advanced, another method is available for cladistic analyses. Here, one assigns character states (e.g. 0, 1) but without any presumption about which of the two is primitive. These are then entered into a matrix for all ingroup taxa. Next, an outgroup is chosen that is clearly outside the group being studied, but not too distant. The analysis is conducted and the tree is rooted based upon the known outgroup. After the fact, one can then go back and look at all characters to determine which of the two states is primitive and advanced based upon the way they plot on the tree.
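
A simplified version of outgroup polarization can be sketched in Python. Note the assumption: instead of building and rooting a tree and then reading polarities from it, as described above, this sketch uses direct outgroup comparison, treating whatever state the outgroup shows as primitive (0) and any other state as derived (1). The taxon names, data, and function name polarize are all hypothetical.

```python
def polarize(matrix, outgroup):
    """Recode each character so the outgroup's state becomes 0 (primitive)."""
    reference = matrix[outgroup]
    return {
        taxon: [0 if state == ref else 1 for state, ref in zip(states, reference)]
        for taxon, states in matrix.items()
    }


# Hypothetical matrix: states coded 0/1 with no presumption of polarity.
unpolarized = {
    "outgroup": [1, 0, 1, 0],
    "X":        [1, 1, 0, 0],
    "Y":        [0, 1, 0, 0],
    "Z":        [0, 0, 1, 0],
}

polarized = polarize(unpolarized, "outgroup")
for taxon, states in polarized.items():
    print(taxon, states)
```

After recoding, the outgroup is all 0s and every 1 in the ingroup marks a state inferred to be derived, so characters 1 and 3 have had their original codes reversed.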

C. Computerized methods. Although the Wagner Groundplan Divergence method can be conducted manually, most people these days use computer programs to implement cladistic analyses. One program in especially wide use is PAUP* (Phylogenetic Analysis Using Parsimony), written by David Swofford. Another program useful in exploring character changes on a tree is MacClade, written by David and Wayne Maddison. PAUP* infers phylogenies by selecting the trees that minimize tree length (number of steps) and thereby minimize homoplasy. This is the parsimony criterion. A variety of algorithms are available to satisfy this criterion. Imposing no constraints on character state transformations is called Fitch Parsimony (Fitch 1971), whereas imposing minimal constraints is called Wagner Parsimony (formalized by Kluge and Farris 1969).
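
To show what "tree length" means, the Python sketch below counts steps under Fitch parsimony for the example matrix from the phenetics part of the lecture, on each of the three ways of pairing four taxa. This is a teaching sketch, not how PAUP* is implemented: trees are written as nested tuples (my own convention), no outgroup is included, and the function names are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one character on a binary tree (Fitch 1971)."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # terminal taxon
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return steps


def tree_length(tree, matrix):
    """Total steps over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


matrix = {
    "A": [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    "B": [0, 0, 0, 1, 1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "D": [1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
}

trees = [(("A", "D"), ("B", "C")), (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
lengths = [tree_length(tree, matrix) for tree in trees]
print(lengths)  # [11, 14, 13]
```

The tree grouping A with D and B with C requires the fewest steps (11), so it is the one the parsimony criterion selects; the extra steps on the other two trees are homoplasy.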

VI. Relationships of Taxa Derived from Cladistic Analyses

A. Monophyly. A monophyletic group is composed of a common ancestor and all of its descendants. Using the "cut" method: one cut below the group, no cuts above this point. For the most part, your book strives to classify only monophyletic groups.

B. Paraphyly. A paraphyletic group contains some but not all of the descendants of a common ancestor. It is a group defined by symplesiomorphic and apomorphic features (Hennig 1966). Using the "cut" method: one cut below the group, one cut above this point.

C. Polyphyly. A polyphyletic group does not contain an ancestor common to all of its taxa. It is a group defined by convergent similarity (Hennig 1966). Using the "cut" method: one cut below the group, two or more cuts above this point.
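
The "one cut below, no cuts above" test for monophyly can be expressed as a short Python check: a group is monophyletic if some single branch of the tree carries exactly those taxa and no others. The sketch is hypothetical throughout (nested-tuple trees, made-up taxon names, my own function names), and it only separates monophyletic from non-monophyletic groups; telling paraphyly from polyphyly needs information about the characters defining the group that is not modeled here.

```python
def leaves(node):
    """Set of terminal taxa found at or above a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(tree, group):
    """True if one cut of the tree separates exactly the taxa in `group`."""
    group = set(group)

    def search(node):
        tips = leaves(node)
        if tips == group:
            return True
        # Stop at terminal taxa, or when this branch cannot hold the whole group.
        if isinstance(node, str) or not group <= tips:
            return False
        return any(search(child) for child in node)

    return search(tree)


# Hypothetical rooted tree: B and C are sister taxa, and A is sister to them.
tree = ("outgroup", ("A", ("B", "C")))

print(is_monophyletic(tree, {"B", "C"}))       # True
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
print(is_monophyletic(tree, {"A", "B"}))       # False
```

The group A + B fails because the smallest branch containing both taxa also contains C, so removing it takes a second cut above the first.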

[Figure: monophyletic group]


SIUC / College of Science / Elements of Plant Systematics
URL: http://www.plantbiology.siu.edu/PLB304/PhenClads.html
Last updated: 02-Feb-09 / dln

 
