AISTI Emerging Research Summit 2007 Questions

AISTI EMERGING RESEARCH SUMMIT
Santa Fe, NM - May 16-17 2007

QUESTION:
In the area of creating and managing data in your organization or for the greater community,
what keeps you up at night and what excites you?

Margaret Alexander

What worries me is that libraries will become marginalized, simply irrelevant to most college students. Older scholars use our library a lot, but will younger ones? Are libraries graying?

Sometimes it feels as though libraries are surrounded by systems people who think they know how to organize information, jumping in and coming up with products of marginal value. I worry the mediocre (and worse) search engines we find on many web sites will discourage users from the carefully crafted tools libraries can provide.

Just today an email arrived from Canada, inviting all Canadian research libraries to join in discussing their collective digital future. It's hard to imagine US research libraries doing the same. In the paper era, CISTI provided (and provides) an effective collaborative. Will they do the same electronically? Can we emulate their cohesiveness?

What excites me is the vision that libraries are the repositories of knowledge in all of its print and electronic forms. Free from propaganda, advertising, profit, and ulterior motives, libraries are research centers of quality, reliability, and comprehensiveness.


Miriam Blake

The vision to re-create the mission of the digital library - to understand how information professionals can be immersed far enough in disciplines to embed ourselves in the appropriate, relevant places in the lifecycle of information - is a huge cultural shift for libraries. We have to remake ourselves AND remake how others understand us (the relevance thing...). Issues of scale, of metadata, of preservation, of interoperability, are somewhat "known" unknowns, but how the library itself emerges on another side of the information services paradigm and continues as a valuable part of the scholarly communication chain is really uncharted. What kind of libraries/preservation centers will we need (and how many)? How will the business models work? New collaborative models will emerge as we deal with the fundamental need to manage exponential growth of information. As a Research Library at a National Laboratory, there is huge potential to look at the new paradigm of supporting science through supporting scientists and their needs as a "national center" and this is very exciting.


Larry Carver

"Imagination is more important than knowledge" - Albert Einstein

Einstein suggested a requirement for moving forward in the new information age. His assertion helps us plan a path for building digital libraries: plot a course, leverage what we've learned, and be open to new ideas, sensitive to cultural directions, and mobile enough for quick changes.

Even though we have been on this course (analog to digital, deskbound to instant gratification) for more than 10 years, we are still struggling to find a new model for long-term information collection and dissemination. One lesson we have learned is that we must work together. Given the magnitude and variety of information being generated daily, no single overall solution will be possible that is based on the old stovepipe model. As with any new structure, an architecture followed closely by a flexible and imaginative infrastructure must come first.

"It is now possible to capture, curate and archive the evolving research continuum that includes distributed joint research projects and resulting datasets...." Best practices for long-term preservation of, and access to, digital material are not well understood. Many believe that federated content sharing would be ideal, but no infrastructure is in place to connect heterogeneous collections. Our current state of understanding also lacks guidelines for building useful collections for the long term. "...Facilitate distributed use and re-use of information generated..." Rights management is proving to be a major obstacle: the Section 108 Study Group convened by the Library of Congress is working on proposals that include educational sharing of digital content, but few systems are designed to implement responsible management. "We also can facilitate e-collaboration, capture interaction patterns, mine relationships between ideas and people, and finally add value through, e.g. social software." Few processes are available to facilitate transparent and easy ingest of non-text content, even though a vast amount of the digital data being generated is graphical, and few tools exist for easy processing, documenting and use of this material. Developing the technologies, finding the resources, and identifying the practices to meet these goals require great imagination. With any imaginative effort comes risk - and risks must be taken to move forward.


Sayeed Choudhury

"The same topic keeps me up at night and excites me: data curation. As science and engineering (and increasingly the social sciences and humanities) become increasingly engaged in data intensive and collaborative research, there is a challenge and opportunity for libraries. Scholars will need archiving and data management services that ensure long-term access to their datasets in a manner that supports citation and querying. As projects move from initial active phases, researchers become less interested in managing the datasets from specific projects. During this transition time, datasets could be moved into libraries, which would act as long-term stewards and curators for these data. Over the long-term, it is entirely probable that datasets will become "dormant" but eventually interest may rise again. During this stage of renewed interest, data curation practices will be tested to the limit.

On a more immediate note, I wonder about how many libraries might be prepared, or even able to prepare, for the challenges that lie ahead in the next few years (much less decades from now). The sheer storage alone seems overwhelming. When one considers the additional layers of technology and expertise necessary for robust data curation activities, it seems rather daunting. It's worthwhile to think about whether it will be easier to make technological--or human--adjustments in this regard. Data curation represents one of the most exciting and important opportunities for libraries as they seek to define their role in the digital age."


Paul Courant

My biggest worry is that for any of a number of reasons, ranging from legal interpretations, to an emerging culture of ownership that throws sand in the gears of collaboration, to the simple fact that most working scientists in most contexts don't put a high priority on the reliable retrieval and reuse of data, we will never get our act together enough so that folks like me can actually lose sleep around the question of whether we are doing the job well. What excites me are the possibilities implicit and explicit in the premise to this question. With enough interest and the solution to the problems above, we can imagine building an infrastructure that makes it possible for distributed and organized groups of people to solve problems that used to be too hard to solve, and to engage colleagues who used to be too hard to find. And we can get much better at teaching both novices and experts, in part because participation is very good for learning.


Brad Eden

What keeps me up at night is how to help move my people (i.e., the human element) through this time of intense change. What excites me is the same thing, but also the challenge of finding solutions to managing the data mentioned below.


James Frew

Much of my research involves information provenance (aka lineage, pedigree, etc.): how it's captured, stored, queried, and communicated. A lot of people are working on this problem but I don't think we yet have a general solution, by which I mean an unambiguous way to determine, for any arbitrary piece of online information, who created it, how it was derived, and what's been done to it. In the short run, lack of universal provenance means that information reliability is determined largely anecdotally (credibility of the source, community endorsement, etc. -- the "well known to those that know it well" phenomenon), by mechanisms familiar (indeed, fundamental) to traditional libraries, but not scalable to a digital universe. In the long run, a general, evolvable solution for information provenance is critical to both the management and use of long-term archives, wherein stewardship decisions are made on behalf of providers long dead and users yet unborn.

What excites me is that at least some research libraries are reinventing themselves as digital information stewards, to accommodate the primarily digital research products of the 21st century. Libraries at major research universities are particularly well positioned both to take advantage of cutting-edge research in information management and curation, and to add their unique institutional perspectives (preservation, open access, user service). I believe the most interesting, and ultimately useful, approaches to digital stewardship will arise from these university-level collaborations between researchers and libraries.


Chuck Henry


James L. Hilton, Ph.D.

Two related issues keep me awake. Whether it is fear or excitement at any given moment is hard to tell. First, one of the grand and obvious challenges is how to create the workflow and infrastructure to aggregate, distribute, and archive objects (data) in a born digital and data intensive world. With analog works, we have had hundreds of years to develop a workflow that is effective in moving an idea/discovery/dataset all the way from its inception in the mind of an individual to its final resting place in a publicly accessible archive. We know how to do that in common and predictable ways. In the networked world, where ideas are as likely collaborative as individual, and where the data are born digital, we have no agreed upon system to move things from inception to archive. We desperately need a workflow in which things transparently end up in publicly accessible archives that can be sustained for the long haul. Second, I worry about the emerging "intellectual property" climate. Driven largely by the concerns of the entertainment industry, our legal and cultural frameworks increasingly treat ideas, data, and expressions as pure property with monopolistic rights accruing to the owner. What happens to scholarship and innovation as scholars begin to assume that their first priority is protecting their intellectual property?


Greg Janee

You've probably heard of the adage "good, fast, cheap, pick any two" as applied to engineering and manufacturing. I believe we are in a similar situation with respect to long-term preservation of digital information. We would like to preserve information for a long period of time, 100 years or more; we would like to preserve lots of information, perhaps most of it; and we would like to do so cheaply. The adage for digital preservation might be stated: "longevity, scale, economy, pick any two."

Why is this the case? First, the size of the preservation problem is frighteningly large in every measurement dimension, whether counting bytes, providers, or information types. Second, if computer technology continues to evolve as it has in the past (and there is no reason to think that it won't), then digital information will continue to effectively degrade over time. But the effort required to preserve information does not grow linearly with the degradation: conversions upon conversions and emulations upon emulations encounter exponentially more problems and require exponentially more effort. Third, "preservation" rarely refers to just the avoidance of bit lossage; the term is inevitably encumbered with other desired features such as discoverability in contemporary search systems and direct usability by applications du jour. That is, "preservation" usually means that information is kept as usable in the future as it is today, an increasingly difficult goal to maintain over time. And fourth, there are no funding structures in place to pay for information upkeep. A tax on current information providers seems unlikely to be broadly applied, and even then, such a tax doesn't cover the cost of keeping older information held in archives up-to-date.

So, if "longevity, scale, economy, pick any two" does indeed describe the problem of digital preservation... which two?


Ron Larsen

The character of the transformation in scholarly communication that has been anticipated for nearly fifteen years is finally beginning to take shape. The advent of the Web, projects like arXiv, the National Virtual Observatory, and the initiation of institutional repositories all signaled an impending change in scholarly research and dissemination, but the full character of that change is only beginning to be apparent. The arXiv e-print project provided early evidence of the transformative potential of timeliness to scholarly communication among physicists, suggesting a broad potential for change in other disciplines, as well. NVO demonstrated the potential for data-driven research across vast data resources accumulated by multiple observatories. These powerful examples of new forms of scholarly communication and scientific discovery suggest the emergence of "cyberscholarship" built upon the growing cyberinfrastructure, in which content becomes as vital a component of the infrastructure as are the computing and network facilities. The NSF/JISC repositories workshop (http://www.sis.pitt.edu/~repwkshop/) held in Phoenix in April proposed an international goal of ensuring that all (publicly-funded) research products and primary resources be readily available, accessible, and usable via common infrastructure and tools through space, time, and across disciplines, stages of research, and modes of human expression. Through an early series of pilot experiments followed by a more directed implementation phase, a new era of cyberscholarship could be achievable as early as 2015, but a coordinated effort comparable to that mounted in the 1990s for the Federal High Performance Computing and Communications Program will be required.


David Lewis

What excites me about the potential for library engagement in the management of the digital assets created by the research and teaching activities of my campus is that it appears doable and is an important future role for the library. It seems to me that much of what is required is understood. Libraries know about metadata and strategies exist for keeping bits for decades (though we still have to figure out how to keep them for centuries). Systems with the required capacities, for example DSpace, exist and are not overly complex to manage. The strategies that are available appear to be scalable and data storage technologies appear to be keeping up with the increases in digital data. Faculty and campus administrators appear to recognize the significance of the issue, even though to date there has not been much decisive action. It seems to me that those libraries that move assertively will be able to develop both the human and technical capacity to stay ahead of campus demand. It also seems likely that the growth in open access publishing will allow libraries to reallocate some resources from the purchasing of materials to the curation of digital assets. Because of this it may be possible to support the curation function without the influx of significant new monies.

What keeps me up at night is the concern that my rather optimistic view is simply naive, that the problems are much more complex and intractable than I imagine, and that my library and libraries in general have nowhere near the skills or resources to manage the problem.


Rick Luce

Keeping me awake: scale and taking on digital data curation; the need to organizationally move faster; the knowledge and understanding gap between digital library staff and the traditional library staff; where's the next generation of library leadership?

Exciting: a great time to pioneer digital scholarship; building collaborations around the campus and beyond; morphing the library into a 21C knowledge center.


Carol Mandel

What keeps me up at night is the challenge of assembling and organizing the human resource, the expertise, to: partner with researchers in a variety of data-dependent disciplines, bring a substantial added value to their work, and do so in a way that ensures the values of community, shared access and secure longevity in the management of their data. That human resource is not currently available; we have to create it, and we don't have much time.

What excites me is the opportunity for (and, in the insomnia department, the necessity of) creating entirely new instantiations of "library." It was exciting and interesting to shape what we described as the transition from print to electronic information at the turn of the century, but what we are doing now is much harder and requires deeper thought. The external infrastructures and expectations have now changed, not just the forms of documentary content. A few years ago, we were the same organizations managing information in more and new formats. Now our very nature and our infrastructure are up for grabs. What remains the same is our mission and values, but we have to start over and figure out how to deliver them. Meeting the challenge of managing research data is a key part of this bigger picture.


Tom Moritz

Many things:

a) In the arts and humanities, I am greatly concerned with the establishment of "empirically responsible philosophy" - testing conventional approaches to the humanities by the standards of science - challenging the humanities in particular to produce impacts comparable to atomic/genomic/cognitive/silicon science... I believe this will involve a deeper testing of the potential contributions of logic and rhetoric but only in combination with a strong grasp of cognitive neuroscience... Operationally, it will also mean that the humanities must rise to the challenges posed by radical transformations of collaborative practices in the sciences

At GRI we are just beginning to implement truly collaborative environments...

b) "Ambient knowledge capture" - seeking solutions to the problem of on-demand, spontaneous "knowledge capture" in any context related to the scholarly work of the GRI - this obviously has implications for infrastructure, programming and cultural adoption. The concept is that the entire GRI should be understood as a rich knowledge environment. This has implications for development of the concepts around scholarly discourse both as synchronous and asynchronous dialog and involves a thorough analysis and taxonomy of the knowledge periphery - including marginalia, graffiti, conventions around authorship/citation/attributions, etc.

c) Bringing archives into the present (related to #1) - transforming our vision of archives from mausoleums for documents into vibrant, contemporaneous partners in knowledge capture - one interesting effort we have undertaken is the videotaping of California artist George Herms as he works through his own archives (a meta-archive project) - we now have about 80 hours of digital video...

d) In the conservation realm we have been making preliminary steps at mining and recombining the collective knowledge of protected areas managers - virtually all protected areas (there are ca. 117,00 in the World Database on Protected Areas) have Protected Areas Management Plans - and it is possible to extract from those plans both common denominator identification of key problems in PA management as well as emergent problems and unique approaches... We have made a few preliminary steps in that direction as a part of the "PALNet" (PA Learning Network) project of the WCPA (funded by the GEF - Global Environment Facility).

e) Visualization tools: I am frustrated by the apparent lack of adequate tool sets to depict complex/multi-dimensional, multi-variate relations - this has become particularly clear to me in working with ontologies and complex ontological relationships...


Susan Nutter


Karin Wittenborg

What excites Karin? "The vast potential of these library activities to transform scholarship and learning."


Johann van Reenen

I worry about:

  • Losing the opportunity to capture the March of Science (capturing, archiving and curating the evolving research continuum) before it is done commercially outside universities
  • Creating the urgency needed for my own university to start evaluating its investment in data and information creation and prepare to *own* it

I am excited about:

  • Evolving opportunities to handle large data sets and the interest shown in this by funding organizations
  • The opportunities to form partnerships to deal with all the above
  • The evolving virtual *communities of scientists* and of social software to use in adding value to joint work and to shared information

Terry Yates