
Digitization workflows for paleontology collections

Talia S. Karim, Roger Burkhalter, Úna C. Farrell, Ann Molineux, Gil Nelson, Jessica Utrup, and Susan H. Butts

Plain Language Abstract

Since the launch of the U.S. National Science Foundation's (NSF) Advancing the Digitization of Biodiversity Collections program in 2011, the conversion of fossil museum specimens and their associated data to digital format has made great progress. With the help of NSF funding, several networks of paleontological museums and academic collections have joined forces to create an important and rapidly growing dataset of fossil occurrence records. Working with Integrated Digitized Biocollections (iDigBio), the U.S. national coordinating center for natural history collections digitization, participating institutions have also made strides in developing, implementing, and documenting digitization workflows applicable to a wide array of fossil collections. The nine workflow modules comprise a well-vetted set of recommended practices that reflect current practice in fossil digitization. They are designed to guide institutions just beginning formalized digitization programs as well as those looking to enhance and optimize existing digitization efforts. Each module offers a series of tasks to be accomplished, detailed explanations of what each task entails, and a set of recommended resources to ensure success. Readers are encouraged to customize these workflows for specific implementations and to achieve the best match between institutional infrastructure and module implementation.

Spanish Abstract (English translation)

Digitization workflows for paleontology collections

The development of digitization workflows is an essential part of any formalized large-scale digitization program. The paleontological collections literature has noted the need for and utility of digitized collections for almost four decades, but no modern, community-vetted set of digitization workflows has been widely adopted for this purpose. With the advent in 2011 of the U.S. National Science Foundation's (NSF) Advancing the Digitization of Biodiversity Collections (ADBC) program, iDigBio, the NSF's national coordinating center for facilitating digitization, in collaboration with broad community representation from different institutions, launched a series of working groups to address workflow development across the major preparation types. Workflow modules have been developed for pre-digitization curation, data entry, imaging of objects (catalogs, field notebooks, and other materials not stored with the specimens; labels; two- and three-dimensionally preserved specimens), image processing, and proactive digitization. The modules and the tasks they include can be implemented in any order and customized for specific configurations and institutional parameters. The workflows are publicly available for download and customization on GitHub and via the iDigBio documentation pages. A review of platforms for electronic data publication through online aggregators, a crucial step in any digitization program, is also provided.

Keywords: workflow; digitization; imaging; data publication; paleontology; collections

Spanish translation: Diana Elizabeth Fernández

Résumé en Français

in progress

Translator: Kenny J. Travouillon or Antoine Souron

German Abstract (English translation)

Digitization workflows for paleontological collections

The development of digitization workflows is an important part of any formalized digitization program. The paleontological collections literature has addressed the need for digitized collections, and their usefulness, for almost 40 years, but so far no modern, community-vetted set of digitization workflows has gained broad acceptance. With the founding in 2011 of the U.S. National Science Foundation's (NSF) Advancing the Digitization of Biodiversity Collections (ADBC) program, iDigBio, the NSF's national coordinating center for facilitating digitization, in collaboration with broad community representation from numerous institutions, established a series of working groups to address workflow development across all major preparation types.

Workflow modules were developed for pre-digitization curation, data entry, imaging of objects (catalogs, field notes, and other materials not stored with the specimens; labels; two- and three-dimensionally preserved specimens), image processing, and proactive digitization. The modules and the tasks they contain can be implemented in any order and adapted to specific configurations and institutional parameters. The workflows are made publicly available for download via GitHub and the iDigBio documentation pages. A review of platforms for publishing electronic data through online aggregators, a crucial step in any digitization program, is also provided.

Keywords: workflow; digitization; imaging; data publication; paleontology; collections

German translation: Eva Gebauer


Arabic Abstract

[Abstract image not reproduced]

Translator: Ashraf M.T. Elewa



Talia S. Karim. University of Colorado Museum of Natural History, Boulder, Colorado 80503, USA.

Talia Karim is the Collection Manager for Invertebrate Paleontology and Paleobotany at the University of Colorado Museum of Natural History. She is also an instructor for the Museum and Field Studies graduate program at the University of Colorado. Talia received a B.S. in Geology and a B.A. in Classical Culture from the University of Oklahoma in 2001. She attended Oxford University on a Marshall Scholarship and earned an MSc in Earth Sciences in 2004. She went on to complete a PhD at the University of Iowa in 2009. Talia's main area of research is trilobite systematics and biostratigraphy, focused mainly on the Ordovician of Laurentia. She is also interested in museum collections care and management, digitization of collections, and cyber infrastructure as related to sharing museum data.


Roger Burkhalter. Sam Noble Museum, University of Oklahoma, 2401 Chautauqua Avenue, Norman, Oklahoma 73072, USA.

Roger Burkhalter is Collections Manager in the Department of Invertebrate Paleontology at the Sam Noble Museum. He attended the University of Oklahoma before receiving a B.S. in Geology from the University of Maryland and completing graduate coursework at Stanford. He has been identifying and digitizing specimens in the collections for the last eighteen years. His main interests are museum database management, digital imaging, museum collections care and management, and the digitization of collections.


Úna C. Farrell. Department of Geological Sciences, Stanford University, 450 Serra Mall, Stanford, California 94305, USA.

Úna Farrell is a lab and data manager for the Historical Geobiology research group at Stanford University. Prior to that she was the Invertebrate Paleontology Collection Manager at the University of Kansas Biodiversity Institute. She received a B.A. in Geology from Trinity College Dublin and a PhD from Yale University, where she worked on the exceptional preservation of fossils in pyrite. Currently, her primary interests are in database design and management, digitization of paleontological and geological samples, and data accessibility.


Ann Molineux. Non-vertebrate Paleontology Lab, University of Texas, 10100 Burnet Road, Austin, Texas 78758, USA.

Ann Molineux is the Curator and Collections Manager of the Non-vertebrate Paleontology Collections (NPL) at the Jackson School of Geosciences, The University of Texas at Austin (UT). She earned her BA and MA degrees at Cambridge University and her PhD in Geology, specializing in Paleontology, at UT, with a major focus on the paleoenvironment and paleoecology of the later Paleozoic. Her current research projects are linked to specimens within the collections and concern reef-forming organisms such as corals, sponges, and rudist bivalves. Her recent focus has been the development of digital access to large collections and the management of a team upgrading the NPL collections. These improvements, while aimed at long-term archival conservation of specimens and related data, are expanding research and public access to these resources in a local and shared environment. She is active in various NSF-sponsored digital programs and several scholarly organizations.


Gil Nelson. Department of Biological Sciences, Florida State University, Tallahassee, Florida 32303, USA.

Gil Nelson is Assistant Professor in Research and Courtesy Professor in Biology at Florida State University. He also holds a Beadel Fellowship in Botany at Tall Timbers Research Station and Land Conservancy. Gil specializes in research and implementation of biodiversity specimen digitization and data mobilization for Integrated Digitized Biocollections (iDigBio), the U.S. National Science Foundation's national coordinating center for the Advancing the Digitization of Biodiversity Collections program. His botanical research emphases include the biogeography of pteridophytes in the southeastern United States and the systematics of temperate and tropical trees native to eastern North America.


Jessica Utrup. Yale University, Peabody Museum of Natural History, Division of Invertebrate Paleontology, 170 Whitney Avenue, PO Box 208118, New Haven, Connecticut 06520, USA.

Jessica Utrup is a museum assistant in the Division of Invertebrate Paleontology at the Yale Peabody Museum of Natural History. She has been identifying and digitizing specimens in this division for the last ten years. She completed coursework towards an MSc in Paleontology at the University of Cincinnati and holds an MLIS from the University of Kentucky.


Susan H. Butts. Yale University, Peabody Museum of Natural History, Division of Invertebrate Paleontology, 170 Whitney Avenue, PO Box 208118, New Haven, Connecticut 06520, USA.

Susan Butts is the Senior Collections Manager for Invertebrate Paleontology at the Yale Peabody Museum and the Executive Editor for the Bulletin of the Peabody Museum. She holds a B.A. in Geoscience from Hobart and William Smith Colleges and a Ph.D. in Geological Science, and was a Postdoctoral Fellow at Yale University (Geology and Geophysics and Yale Peabody Museum). Her scientific research focuses on taphonomy and taphonomic bias and on using brachiopod paleoecology to interpret climate change. She is involved with national digitization efforts supported by the NSF ADBC program, with supporting cyber infrastructure and exchange in the geological and paleontological community, and with increasing public awareness of museum collections.



TABLE 1. Summary of iDigBio Paleontology Digitization Working Group workshops and symposia contributing to workflow development.


TABLE 2. Example of modular workflow format from Module 0: Pre-Digitization Curation and Setup.



FIGURE 1. List of workflow modules developed by the iDigBio DROID and Paleontology working groups.


FIGURE 2. Example implementation of the workflow modules described herein.