Alpha Diversity

An overview of common alpha diversity metrics for assessing within-sample community diversity.

Written by Denise Lynch
Updated over 2 months ago

Alpha diversity describes "within-sample" diversity: how diverse a single sample is, usually based on the number of different species observed. Many alpha diversity metrics are also weighted by the abundances at which the individual microbes are observed. There are a number of commonly used metrics for measuring alpha diversity. We'll use these mock samples to describe some of them.

Alpha Diversity Metrics

Observed Taxa (Richness)

This is the simplest alpha diversity metric. It counts the number of different taxa you observe in a sample at a given taxonomic level, usually species. You may also see "observed species" or "observed OTUs" in some analyses, or counts at other taxonomic levels. In our example above, sample A has 3 species, whereas samples B and C each have 5 species.
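As a minimal sketch of this count, assuming a sample is represented as a dictionary of read counts per species (the species names and counts below are hypothetical, not the mock samples from this article):

```python
# Hypothetical read counts per species for a single sample.
counts = {"species_1": 120, "species_2": 30, "species_3": 0, "species_4": 5}


def observed_taxa(counts):
    """Count the taxa that have at least one observation."""
    return sum(1 for n in counts.values() if n > 0)


print(observed_taxa(counts))  # 3
```

Taxa with a count of zero are ignored, so only taxa that were actually seen in the sample contribute to richness.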

Shannon Index (H)

The Shannon Index (a.k.a. Shannon's diversity index, the Shannon–Wiener index, the Shannon–Weaver index, the Shannon entropy) reflects both species richness and evenness, with more weight on richness. The idea behind this metric is that the more species you observe, and the more even their abundances are, the higher the entropy. In other words, it becomes harder to predict which species you would see next if you were to look at another read from this sample.

Note that we use scikit-bio to calculate our Shannon diversity. Historically, this uses a logarithm with base 2 in the formula, whereas other Shannon calculators often use base e (natural log). Keep this in mind when comparing Shannon Index values from different sources.
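For reference, the Shannon index is commonly written as follows, where S is the number of observed taxa, p_i is the relative abundance of taxon i, and b is the base of the logarithm (2 or e, as discussed above):

```latex
H = -\sum_{i=1}^{S} p_i \log_b p_i
```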

From this, we find the Shannon index for our above samples:

Shannon for sample A = 1.005

Shannon for sample B = 1.606

Shannon for sample C = 1.212

You can see that the evenness of the abundances of the microbes in sample B has led to an increased diversity compared to sample C, even though they have the same number of species. Sample C is dominated by one species in particular (blue), making it less diverse.
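The effect of evenness and of the logarithm base can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the scikit-bio implementation: the `shannon` helper and the read counts are hypothetical.

```python
import math


def shannon(counts, base=math.e):
    """Shannon index from raw counts; zero counts are skipped."""
    total = sum(counts)
    props = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p, base) for p in props)


# Hypothetical sample: five species at perfectly even abundance.
even = [20, 20, 20, 20, 20]

h_nat = shannon(even)           # natural log
h_bits = shannon(even, base=2)  # log base 2

print(round(h_nat, 3))   # 1.609
print(round(h_bits, 3))  # 2.322
```

The two values differ only by a constant factor (the base-2 value equals the natural-log value divided by ln 2), so rankings of samples are unchanged, but the raw numbers are not comparable across bases.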

Simpson Index

There are a number of different formulae that are derived from, and named, the Simpson Index. The most commonly used one, shown below, is the Gini-Simpson Index. The Gini-Simpson Index (or Simpson Index) follows a similar idea to the Shannon Index. It is based on the probability that two entities (microbes, or reads) taken from the sample at random are of different types (e.g. species). As this is a probability, the resulting scores range from 0 to 1.
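The Gini-Simpson Index is commonly written as follows, where S is the number of observed taxa and p_i is the relative abundance of taxon i:

```latex
D = 1 - \sum_{i=1}^{S} p_i^{2}
```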

In our example dataset:

Simpson for sample A = 0.601

Simpson for sample B = 0.799

Simpson for sample C = 0.608

Inverse Simpson Index

The Inverse Simpson Index is another derivation of the above Simpson index, calculated as:
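With S the number of observed taxa and p_i the relative abundance of taxon i, the Inverse Simpson Index is commonly written as:

```latex
\frac{1}{\sum_{i=1}^{S} p_i^{2}}
```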

In this case, our examples show:

Inverse Simpson for sample A = 2.504

Inverse Simpson for sample B = 4.966

Inverse Simpson for sample C = 2.549
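Both Simpson variants can be sketched from the same sum of squared relative abundances. The helper names and the read counts below are hypothetical and chosen only to contrast an even sample with one dominated by a single species:

```python
def gini_simpson(counts):
    """Probability that two random draws belong to different taxa."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)


def inverse_simpson(counts):
    """Reciprocal of the sum of squared relative abundances."""
    total = sum(counts)
    return 1.0 / sum((n / total) ** 2 for n in counts)


# Hypothetical samples, each with five species.
even = [20, 20, 20, 20, 20]
skewed = [80, 5, 5, 5, 5]

print(round(gini_simpson(even), 3), round(inverse_simpson(even), 3))      # 0.8 5.0
print(round(gini_simpson(skewed), 3), round(inverse_simpson(skewed), 3))  # 0.35 1.538
```

For a perfectly even sample, the Inverse Simpson Index equals the number of species, which makes it easy to read as an "effective number of species".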

Phylogenetic Diversity

The above metrics consider the number of different species observed, and some also consider the abundances at which those species are found in the sample. They do not consider how closely related the species are. If all of the species you have observed are from the same genus, for instance, your sample will not have as diverse metagenomic content (that is, as many different genes) as another sample with the same number and abundances of species that come from many different genera, or from different branches at any other taxonomic level. Phylogenetic Diversity (Faith's Phylogenetic Diversity; PD) accounts for this: it is the sum of the lengths of the branches connecting the observed species on a phylogenetic tree.
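The idea can be sketched on a toy tree. This assumes a rooted tree stored as a mapping from each node to its parent and branch length, and it counts branches back to the root, which is one common convention. The tree, tip names, and `faith_pd` helper are all hypothetical.

```python
# Hypothetical rooted tree: node -> (parent, branch length to parent).
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "n1": ("root", 0.5),
    "n2": ("root", 0.5),
}


def faith_pd(tree, observed):
    """Sum each branch once along the paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed:
        node = tip
        # Walk upward until the root or an already-counted branch is reached.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


print(faith_pd(tree, ["A", "B"]))  # 2.5
print(faith_pd(tree, ["A", "C"]))  # 4.0
```

Both samples contain two species, but the pair drawn from different branches of the tree (A and C) has the higher phylogenetic diversity.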

Uses for Alpha Diversity

Rarefaction Curves

Usually, the deeper you sequence or sample your data, the more species you'll observe, until such a point as you have observed all of the species in your sample. We often see researchers use this observation to give themselves an idea of whether or not they have sequenced their samples deeply enough. Rarefaction curves help with this. Researchers rarefy (sub-sample) their sequence data or reads for a sample to multiple different depths, calculate the alpha diversity, and plot how the alpha diversity changes with increasing number of reads. Using this approach, you should see the diversity level off eventually. One caveat of rarefaction curves is that, due to sequencing error, you can see artificial inflation of alpha diversity with sequencing depth. Our Targeted Loci Database includes post-classification filtering to correct for this issue.
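A rarefaction curve for observed richness can be sketched with the standard library alone. This is a simplified illustration, assuming reads are sub-sampled without replacement once per depth. The counts, depths, and `rarefaction_curve` helper are hypothetical, and real analyses typically average over repeated sub-samples.

```python
import random


def rarefaction_curve(counts, depths, seed=0):
    """Return (depth, observed richness) pairs from random sub-samples."""
    rng = random.Random(seed)
    # Expand the count table into one taxon label per read.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        # Sample without replacement; depth must not exceed the read total.
        subsample = rng.sample(reads, depth)
        curve.append((depth, len(set(subsample))))
    return curve


# Hypothetical sample with 1,000 reads across five species.
counts = {"sp1": 500, "sp2": 300, "sp3": 150, "sp4": 45, "sp5": 5}
curve = rarefaction_curve(counts, [10, 100, 500, 1000])

for depth, richness in curve:
    print(depth, richness)
```

Rare taxa such as `sp5` tend to appear only at the deeper sub-samples, which is why the curve rises before it levels off.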

Group Comparisons

Most researchers will examine alpha diversity to determine if there are major differences between two populations or groups in their data set, or if there have been major changes within a group over time. One example of this, which you can see in our blog post, examines the microbiota of infants from three countries. We plotted the species richness ("Chao1"), Shannon, and Simpson diversities for these three groups, below.

The Chao1 Index indicates that the three groups have very similar numbers of observed species. However, both Simpson and Shannon indices hint that the microbiota of the infants from Finland may have a less even spread of abundances than the two other countries.

This is a common "first look" at the microbiome, giving an idea that there may be some differences between these populations. These differences are quite small, with large overlaps. Statistical tests would give further insight on these data. These tests could be as simple as a t-test if the data come from a normally-distributed population, or a Mann-Whitney test if not. Microbiota samples often show differences due to the age of the participant, or other confounding factors, which could be accounted for by using linear modelling.
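To make the Mann-Whitney comparison concrete, here is a pure-Python sketch of the U statistic with an exact two-sided permutation p-value. It is only practical for small groups, and the Shannon values are hypothetical. In practice a statistics library (for example `scipy.stats.mannwhitneyu`) would normally be used instead.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y scores 1, ties score 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact permutation p-value by enumerating every possible group split."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical Shannon values for two groups of four samples each.
group_1 = [1.2, 1.4, 1.5, 1.7]
group_2 = [0.8, 0.9, 1.0, 1.1]

u_stat = mann_whitney_u(group_1, group_2)
p_value = exact_two_sided_p(group_1, group_2)
print(u_stat, round(p_value, 4))  # 16.0 0.0286
```

Here every value in the first group exceeds every value in the second, so U takes its maximum of 16 and only 2 of the 70 possible splits are as extreme.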

Alpha Diversity through One Codex

We've made it easier to examine alpha diversity within the One Codex platform. Using the Compare Analyses tab, you will find both the Simpson Index and Shannon Index under the bar plot for each sample. These are calculated on the taxonomic rank you have selected to plot, so if you choose to change the plotted rank, you will see these indices change accordingly.

Our new Custom Plots tool lets you compare and visualize a handful of different alpha diversity metrics between groups of samples. You can use the Metadata associated with your samples to identify differences in alpha diversity between groups or within a group over time.

For our users who are comfortable with scientific computing in Python, we have also made it possible to calculate alpha diversity using the onecodex Python library in our Notebooks application.

You can calculate the alpha diversity for your samples with this command:

my_samples.alpha_diversity(metric='shannon')

The metrics we've made available for you include shannon, simpson, and observed_taxa. By default, these metrics are calculated on species-level abundance data. However, you can choose to calculate at a different taxonomic rank, such as:

my_samples.alpha_diversity(metric='shannon', rank='genus')

The available ranks include kingdom, phylum, class, order, family, genus, and species.

You can also generate box plots for your metadata variable of interest, like those shown above, using this command:

my_samples.plot_metadata(
    vaxis='shannon', haxis=<your_metadata_variable>,
    return_chart=True
)

Next Steps

Take a look at our Beta Diversity article to see some of the other commonly-used microbiome analyses and metrics.
