IDEC Poster at eScience 2008
Johann van Reenen¹, Diana E. Northup¹, Linn Marks Collins², Mark L.B. Martinez², M. Alex Baker¹, Christy Crowley¹, James E. Powell², Brian Freels-Stendel¹
¹University of New Mexico, Albuquerque, NM 87131 USA
²Los Alamos National Laboratory, Los Alamos, NM 87545 USA
Contact: Johann van Reenen <jreenen@unm.edu>

Imagery Data Mining: The IDEC Experiment
Scientific fields of study such as nanotechnology, astrobiology, and cave and karst science contain large collections of images and their associated biological, physicochemical, and geological datasets. The vast majority of these images are never shared with, or examined by, more than a handful of scientists. Being able to mine these kinds of data to make new linkages and discoveries becomes ever more important as multidisciplinary and interdisciplinary studies expand. The goal of our project is to design and implement an online workspace that allows scientists to collaboratively view, analyze, and annotate visual datasets, and to train future scientists in the power of Internet collaborative workspaces. Discussion will center on the content of geomicrobiological images as a way to explore unresolved scientific questions. We are creating a series of electronic scenarios that address unresolved questions in geomicrobiology, such as which materials are biologically created and which are abiological in origin. These questions have important implications for the detection of life on other planets (astrobiology) and in the subterranean worlds of caves, which are a key feature of karst terrains. Our initial albums focus on the geomicrobiology of caves and karst. The target users are the interdisciplinary community of scientists who study karst samples to learn more about critical biological and geological processes and the microbial communities often found in karst terrain.
Karst is defined as “a type of topography that is formed over limestone, dolomite or gypsum by solution of the rock and is characterized by closed depressions or sinkholes, caves and underground drainage” (www.epa.gov/region5/water/uic/glossary.htm). The drinking water supply of about 1.6 billion people globally depends upon the health of karst terrains and aquifers (Ford and Williams, 1989; Smith, 1993; Williams, 1993). Humans are increasingly moving into formerly unoccupied or lightly occupied karst lands, intensifying pressure on karst systems and leading to a variety of impacts, including sinkhole collapses that swallow homes. Dealing with such geologic hazards in karst costs billions of dollars each year. Additionally, caves and karst host remarkable, but poorly studied, biodiversity and contain endemic, rare, and endangered species (Christman and Culver, 2001; Culver et al., 2000, 2001; Northup et al., 2003). Spectacular cave formations, other geological features, and significant archaeological and paleontological resources contribute to scientific, aesthetic, cultural, and economic value (LaMoreaux, 2005). Karst is the least understood and most vulnerable type of terrestrial landscape (Veni et al., 2001; Williams, 1993). A significant challenge to effective scientific work is that karst scientists are geographically dispersed across the globe and poorly integrated. Effective linkages among these scientists around their data can promote innovation, creative solutions to growing problems in karst terrains, and multidisciplinary knowledge discovery. Astrobiology, a much better connected science, still faces significant challenges in knowledge discovery from imagery. Sherry Cady and colleagues (Cady et al., 2003) state:
“At the same time, millions of images and spectra from dubiofossils (of unknown origin) and pseudofossils (abiotic mimics) will also continue to accumulate, yet they will rarely appear in publication….Clearly there is a need for a common database of bona fide biosignatures, dubio-biosignatures, and pseudobiosignatures.”
Our pilot collaborative workspace is IDEC (Imagery Data Extraction Collaborative), which our group built by integrating three open-source tools: Drupal, Gallery, and DSpace. Drupal is a content management system that has been rated best-in-class by the IBM Internet Technologies group. Gallery is an image management system that can be integrated directly into Drupal. DSpace is a digital object repository platform (http://www.dspace.org/) that houses the repository of available scanning electron microscopy (SEM) images. A commenting function has been implemented in DSpace so that viewers can record their insights about the images. All three tools are relatively easy to install and configure, and are widely used globally. We are using this configuration as a base for developing our broader collaborative workspaces for knowledge discovery, with weblogs, forums, feeds, and image functionality enabled. Our current prototype has been explored by a group of students in the Cave and Karst Program at New Mexico Tech (Socorro, NM). These students reported that the site was both useful and easy to use.
To help structure our collaborative data mining experiment, we created scenarios of how karst scientists might use the images and associated datasets. Students and scientists provided input to refine these scenarios to be congruent with their view of future collaborative e-Science.
Scenario 1: To answer the questions:
Are the reticulated filaments found just in cave pool precipitates or also in other cave settings? Are they found just in limestone caves or in other types of caves? Are all reticulated filaments the same or are there variations? Have other groups found these morphologies?
To pursue these questions a scientist would want to:
- Pull up a variety of images with this morphotype.
- Be able to measure the width and length, or find data on this.
- Find data on cave setting and type of cave for each occurrence.
- Determine whether the reticulated filaments occurred on the outside or inside.
- Find information about the mineralogical setting in which filaments occurred.
- Create a dataset of these measurements and associated mineralogical data.
- Run statistical tests on the dataset.
- Access contact information of scientists who have entered images.
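The steps in Scenario 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IDEC implementation: the `ImageRecord` fields, the function names, and every sample value are hypothetical placeholders invented for the example, and Welch's t statistic stands in for whatever statistical test a scientist would actually choose.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical metadata for one SEM image (not the IDEC schema)."""
    image_id: str
    morphotype: str
    cave_type: str      # e.g. "limestone", "lava tube"
    setting: str        # e.g. "pool precipitate", "wall crust"
    width_um: float     # filament width in micrometres
    contributor: str    # contact for the scientist who entered the image


def select_morphotype(records, morphotype):
    """Return the records whose morphotype matches (case-insensitive)."""
    wanted = morphotype.lower()
    return [r for r in records if r.morphotype.lower() == wanted]


def widths_by(records, attribute):
    """Group filament widths by a metadata attribute such as 'cave_type'."""
    groups = {}
    for r in records:
        groups.setdefault(getattr(r, attribute), []).append(r.width_um)
    return groups


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Placeholder values invented for illustration only; these are NOT measurements.
RF = "reticulated filament"
records = [
    ImageRecord("img-001", RF, "limestone", "pool precipitate", 0.50, "a@example.org"),
    ImageRecord("img-002", RF, "limestone", "pool precipitate", 0.60, "a@example.org"),
    ImageRecord("img-003", RF, "limestone", "wall crust", 0.70, "b@example.org"),
    ImageRecord("img-004", RF, "lava tube", "wall crust", 0.80, "c@example.org"),
    ImageRecord("img-005", RF, "lava tube", "wall crust", 0.90, "c@example.org"),
    ImageRecord("img-006", RF, "lava tube", "floor deposit", 1.00, "b@example.org"),
    ImageRecord("img-007", "rod", "limestone", "wall crust", 0.30, "a@example.org"),
]

# Pull up images with the morphotype, build the dataset, and run a test.
filaments = select_morphotype(records, RF)
groups = widths_by(filaments, "cave_type")
t_stat = welch_t(groups["limestone"], groups["lava tube"])
contacts = sorted({r.contributor for r in filaments})

print(f"{len(filaments)} images, t = {t_stat:.2f}, contacts: {contacts}")
```

Grouping by `"setting"` instead of `"cave_type"` would address the first question in the scenario (pool precipitates versus other cave settings) with the same code.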
These scenarios take the collaborative workspace beyond an environment in which to discuss images to an environment in which scientists can ask higher-level questions. We have used these scenarios to design our first prototype and will use them to design more sophisticated tools in the future.
Cady SL, Farmer JD, Grotzinger JP, Schopf JW. 2003. Astrobiology 3: 351-368.
Culver DC, Deharveng L, Gibert J, Sasowsky ID (eds.). 2001. Karst Waters Institute Special Publication 6, Charles Town, WV, 82 p.
Culver DC, Master LL, Christman MC, Hobbs HH III. 2000. Conservation Biology 14: 386-401.
Ford DC, Williams PW. 1989. Karst Geomorphology and Hydrology. Unwin Hyman, London, 601 p.
LaMoreaux P. 2005. Foreword, in Culver DC, White WB, Encyclopedia of Caves. Elsevier Academic Press, Amsterdam, p. xvii-xviii.
Northup DE, et al. 2003. Environmental Microbiology 5: 1071-1086.
Smith DI. 1993. In: Williams PW (ed.). 1993. Karst Terrains: Environmental Change and Human Impact. Catena Verlag, Cremlingen-Destedt, Germany, 268 p.
Veni G, et al. 2001. Living with Karst: A Fragile Foundation. AGI Environmental Awareness Series 4. American Geological Institute, Alexandria (VA), 64 p.
Williams PW (ed.). 1993. Karst Terrains: Environmental Change and Human Impact. Catena Verlag, Cremlingen-Destedt, Germany, 268 p.
