Organizing the sampling – OpenWings Project

Although it’s been a little over a year since our last post, we’ve been busy. As I noted in the introductory post, the goal of OpenWings is to generate a time-calibrated phylogeny for all bird species using vouchered museum specimens. The important word here is “vouchered”.

A voucher or voucher specimen is a preserved organism that represents the animal used in a study (or studies). The voucher can be a source of phenotypic or genetic data (or both). Because vouchers are housed in museums, they provide a resilient, tangible reference that can be used to confirm (or define) the identity of a species, subspecies, or population. Vouchered specimens are important because they allow researchers to deal with problems like misidentification, incorrect or changing taxonomy, and differences in species concepts. Vouchered specimens can also provide a temporal record of how species, subspecies, and populations have changed over time — provided sufficient specimens have been collected over the intervals of interest. Vouchered specimens can also be type specimens.

One of the hurdles for a project like OpenWings is identifying vouchered specimens from which we can collect genetic data, and doing that across a number of ornithological collections throughout the United States and the rest of the world. Luckily, we have some excellent collaborating institutions who have provided their specimen databases, and we’ve integrated the information in these databases with publicly available sources of specimen information like those accumulated through VertNet and The Atlas of Living Australia. The goal of all this integration is to provide OpenWings with the best possible way of: (1) dealing with taxonomic issues that make identifying species difficult, (2) choosing tissue sources for genetic data collection that have specimen vouchers, and (3) where we cannot identify tissues for a species, selecting specimens from which we may be able to collect toepads for genetic data collection. This post lays out some of the challenges of selecting these specimens.

The Problem of Taxonomy

Our project seeks to infer a phylogeny of “all bird species”, but exactly what constitutes a species is a matter of some debate (commonly referred to as “the species problem”). With birds, “the species problem” also causes some difficulty because there are a number of different systems or “taxonomies” used to name, count, and organize the world’s bird species.

Birders may be familiar with some of these taxonomies, which can be downloaded as “species checklists” or are available, more formally, as published volumes. Well-known “species checklists” for birds include the Clements Checklist, the IOC World Bird List, and the Howard and Moore Complete Checklist of the Birds of the World. Regardless of which checklist or taxonomy one selects, there are differences among them. For example, the IOC World Bird List recognizes the Andean Goose as Chloephaga melanoptera, while the Clements Checklist recognizes the same species as Oressochen melanopterus.

These differences may arise as a function of age (one checklist is older than another and the newer checklist incorporates new information) or from disagreements among the checklists’ authors over how, why, and when new bird species are defined. Generally, there are three sources of taxonomic disagreement among checklists. They may disagree about: (1) newly discovered species previously unknown to science, (2) newly defined species, which can result from “splitting” an older species in two, and (3) newly collapsed species, which can result from “lumping” two older species into one.

These taxonomic differences are problematic for a study like OpenWings for a couple of reasons. First, the taxonomies may disagree about the total number of bird species that exist, because one taxonomy may treat a newly split species as two species while another taxonomy continues to refer to it using a single species name. The second problem arises because different institutions may use different taxonomies to organize their specimen collections. For example, Institution 1 may use the Clements Checklist to name and identify species while Institution 2 may use the Howard and Moore Complete Checklist of the Birds of the World. When you are trying to generate a phylogeny of all bird species, these two problems make it hard to determine the number and names of all bird species, as well as which institutions have tissues or specimens of which particular species.

Why We Are Using the IOC World Bird List

For OpenWings, we are trying to work through these problems by selecting a single, recently updated taxonomy that, if anything, is more liberal with its definition of species (meaning it tends towards including more recent “splits” and fewer recent “lumps”). We also wanted a taxonomy that might, at least, ease the burden of making some comparisons among all of the different taxonomies by providing a table of how species names in the taxonomy we chose map onto species names of other taxonomies.

After comparing a number of examples, we decided to use the IOC World Bird List as the base taxonomy for OpenWings because it met these requirements: it’s continuously updated while also providing a very nice comparison of the differences and similarities between itself and other taxonomies.

Computation to the Rescue

Now, to work through the larger problem of which institutional holdings contain which bird specimens (and/or which institutions have tissues from bird specimens), we generated a back-end database that contains much of the content of the taxonomic comparison table from the IOC World Bird List. This database also has a number of tables that contain the holdings of some of our collaborating institutions, as well as tables detailing the holdings from all institutions in VertNet and The Atlas of Living Australia.

To identify which institutions have which specimens, we use some computer code to iterate across all possible names one species could have (e.g. from several different checklists) and we check for all of those entries in all possible specimen tables. We then do this across all of the named bird species in the IOC World Bird List.
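To make the idea concrete, here is a minimal sketch of that kind of name-matching loop in Python. It is not our actual code: the `SYNONYMS` and `HOLDINGS` structures, the field names, the catalog numbers, and the `find_specimens` function are all made up for illustration, and the real data live in database tables rather than in dictionaries.

```python
from collections import defaultdict

# Hypothetical synonym table: IOC name -> names used by any checklist.
SYNONYMS = {
    "Chloephaga melanoptera": {"Chloephaga melanoptera", "Oressochen melanopterus"},
}

# Hypothetical specimen tables, one list of records per institution.
HOLDINGS = {
    "Institution 1": [
        {"catalog": "A-001", "name": "Oressochen melanopterus", "material": "tissue"},
    ],
    "Institution 2": [
        {"catalog": "B-114", "name": "Chloephaga melanoptera", "material": "toepad"},
        {"catalog": "B-200", "name": "Anas platyrhynchos", "material": "tissue"},
    ],
}


def find_specimens(synonyms, holdings):
    """Map each IOC name to every specimen filed under any of its names."""
    matches = defaultdict(list)
    for ioc_name, names in synonyms.items():
        # Compare case-insensitively and always include the IOC name itself.
        wanted = {n.lower() for n in names} | {ioc_name.lower()}
        for institution, records in holdings.items():
            for record in records:
                if record["name"].strip().lower() in wanted:
                    matches[ioc_name].append({**record, "institution": institution})
    return dict(matches)


result = find_specimens(SYNONYMS, HOLDINGS)
for ioc_name, specimens in result.items():
    for s in specimens:
        print(ioc_name, s["institution"], s["catalog"], s["material"])
```

In this toy example, both the tissue filed under the Clements name and the toepad filed under the IOC name are gathered under the single IOC name for the Andean Goose.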

What this produces is effectively a spreadsheet of which institutions have specimens of which species and what types of material are present for each particular specimen. From this list, we hand-curate which individual specimen we want to sample for each species according to a hierarchy of rules that (1) prefer tissues to toepads, (2) prefer collaborating institutions to other institutions, and (3) prefer younger to older specimens, among others.
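A hierarchy of rules like this can be expressed as a sort key, which is a convenient way to produce a first-pass ordering before any hand curation. The sketch below is only an illustration under a few assumptions: the record fields (`material`, `collaborator`, `year_collected`), the catalog numbers, and the function names are hypothetical, and “younger” is taken to mean “collected more recently”.

```python
def rank_key(specimen):
    """Lower tuples sort first; each element encodes one rule, in priority order."""
    material_rank = 0 if specimen["material"] == "tissue" else 1  # (1) tissue over toepad
    institution_rank = 0 if specimen["collaborator"] else 1       # (2) collaborators first
    age_rank = -specimen["year_collected"]                        # (3) more recent first
    return (material_rank, institution_rank, age_rank)


def choose_specimen(candidates):
    """Return the top-ranked candidate for one species."""
    return min(candidates, key=rank_key)


# Hypothetical candidates for a single species.
candidates = [
    {"catalog": "C-1", "material": "toepad", "collaborator": True, "year_collected": 2015},
    {"catalog": "C-2", "material": "tissue", "collaborator": False, "year_collected": 2010},
    {"catalog": "C-3", "material": "tissue", "collaborator": True, "year_collected": 1995},
    {"catalog": "C-4", "material": "tissue", "collaborator": True, "year_collected": 2008},
]

best = choose_specimen(candidates)
print(best["catalog"])
```

Because Python compares tuples element by element, a later rule only matters when all earlier rules tie, which mirrors the idea of a hierarchy. A ranking like this can only suggest candidates; the final choice is still made by hand.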