The Collections Interoperability (CI) group of Project Bamboo is working to define and build shared infrastructure that will support collections interoperability and enable scholars to locate and work with collections and content dispersed across multiple repositories. During this initial phase of Project Bamboo, the CI group is leveraging and extending existing and emerging community standards to provide seamless access from Project Bamboo Work Spaces and Services applications to a distinguished set of digital libraries and repositories of particular interest to scholars, including:
- the inter-university HathiTrust repository (more than 2.5 million digitized texts in the public domain);
- selected content from the Perseus Digital Library;
- selected subsets of the 400 years of English texts transcribed by the Text Creation Partnership (TCP);
- selected content from AustLit, the online Australian Literature Resource.
Collaborating with other Bamboo working groups and with external initiatives and experts, the CI team is identifying the most appropriate implementation standards for both content access and structural interoperability. For example, to facilitate easy, cross-collection use of the “target” digital libraries listed above within the Bamboo environment, the CI group is building an initial suite of adapters and connectors conforming to the OASIS Content Management Interoperability Services (CMIS) standard. These adapters and connectors allow Bamboo tools to interact with disparate content through the same standardized protocol, avoiding the need to customize each tool for each collection. Northwestern University and the University of Illinois, Urbana-Champaign are leading this work area. Contributing institutions include Australian National University; Indiana University; Tufts University; University of Chicago; and University of Wisconsin, Madison.
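The benefit of a shared protocol can be illustrated with a minimal sketch. This is not the CMIS API or the CI group's code. The names `ContentItem`, `CollectionConnector`, `DictBackedConnector`, and `citation_line` are hypothetical, and the sample records are made up. The point is only that a tool written once against a common interface works unchanged with every collection that has a connector.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """Minimal, hypothetical view of an object exposed by a connector."""
    object_id: str
    title: str
    repository: str


class CollectionConnector(ABC):
    """Hypothetical common interface; stands in for the shared protocol."""

    @abstractmethod
    def get_item(self, object_id: str) -> ContentItem:
        """Return the item with the given identifier, or raise KeyError."""


class DictBackedConnector(CollectionConnector):
    """Toy connector backed by a dict, used in place of a real repository."""

    def __init__(self, repository: str, titles: dict):
        self._repository = repository
        self._titles = dict(titles)

    def get_item(self, object_id: str) -> ContentItem:
        return ContentItem(object_id, self._titles[object_id], self._repository)


def citation_line(connector: CollectionConnector, object_id: str) -> str:
    """A 'tool' written once against the interface, not against a collection."""
    item = connector.get_item(object_id)
    return f"{item.title} [{item.repository}:{item.object_id}]"


# Made-up sample records for two collections.
hathi = DictBackedConnector("HathiTrust", {"vol-1": "Sample Volume"})
perseus = DictBackedConnector("Perseus", {"text-9": "Sample Text"})

print(citation_line(hathi, "vol-1"))
print(citation_line(perseus, "text-9"))
```

Supporting a further collection in this sketch means writing one more connector class; `citation_line` itself does not change.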
During the first nine months of Project Bamboo, the CI Hub Team based at Northwestern has developed and implemented a modified version of the Apache OpenCMIS server that supports an extensible plug-in model for developing connectors to Bamboo target collections and repositories. The team has developed a connector that accesses HathiTrust content through its published Data API (application programming interface), and has also finished a generic Fedora repository connector that provides the core code base for specific CMIS adapters. The generic Fedora connector is being specialized this summer for Fedora-based collections of Perseus Digital Library content (completed July 15) and the ECCO subset of Text Creation Partnership content (anticipated early September 2011). The CMIS connector for TCP-ECCO will be implemented initially at University of Illinois, Urbana-Champaign, and will provide access to sub-component objects (e.g., page images, page text, multiple TEI transcripts) served from separate repositories (at Illinois and at Michigan), including TEI created at Nebraska from TCP SGML and morphadorned TEI created at Northwestern. Design discussions and preliminary implementation work are also underway for a CMIS connector to a subset of AustLit content (anticipated October 2011).
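As a rough illustration of a plug-in model in which a generic connector is specialized per collection, consider the sketch below. It does not reproduce the OpenCMIS server or the team's Fedora connector. The class names, the `repository_id` and `pid_namespace` attributes, and the namespace values are all assumptions made for the example. It relies only on the general convention that Fedora identifiers take the form `namespace:id`.

```python
class GenericFedoraConnector:
    """Shared logic; subclasses supply only collection-specific settings."""

    repository_id = "fedora-generic"  # hypothetical plug-in key
    pid_namespace = "demo"            # hypothetical default namespace

    def to_pid(self, local_id: str) -> str:
        """Build a namespace-qualified identifier of the form 'namespace:id'."""
        return f"{self.pid_namespace}:{local_id}"


class PerseusConnector(GenericFedoraConnector):
    """Specialization: overrides settings, inherits all behavior."""

    repository_id = "perseus"
    pid_namespace = "perseus-sample"  # made-up namespace for illustration


class ConnectorRegistry:
    """Hypothetical plug-in registry keyed by repository identifier."""

    def __init__(self):
        self._plugins = {}

    def register(self, connector_class):
        key = connector_class.repository_id
        if key in self._plugins:
            raise ValueError(f"connector already registered: {key}")
        self._plugins[key] = connector_class

    def connect(self, repository_id: str):
        if repository_id not in self._plugins:
            raise LookupError(f"no connector for: {repository_id}")
        # Instantiate the connector class registered under this key.
        return self._plugins[repository_id]()


registry = ConnectorRegistry()
registry.register(GenericFedoraConnector)
registry.register(PerseusConnector)

print(registry.connect("perseus").to_pid("42"))
```

In this arrangement the server core never names a specific collection; a new collection is supported by registering one more subclass.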
CI librarians and technical staff at Northwestern, Illinois, Indiana, Australian National University, and Wisconsin have collaborated to define standard structural and descriptive metadata models for digitized text volumes and collections of digitized texts such as those found in the repositories and digital libraries mentioned above. These models facilitate presentation, navigation, and retrieval of text collections and content by Bamboo client applications such as those being developed by Work Spaces. The metadata models are also designed to support basic discovery, and to identify and differentiate digitized content and collection-level services with a fidelity sufficient to support high-level literary and linguistic scholarship.
The Bamboo CI group is also working to support the broad range of ways in which scholars want to access digitized collections and content. To help Bamboo users harvest and collect content, the CI team has developed a rudimentary Zotero RDF parser that will process Zotero bookmark files uploaded to the CI hub (via a CMIS client). The parser will match the resource links listed in a Zotero bookmark file against known repositories and content locator objects in order to connect users and software tools directly to the specified full-text content.
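The link-matching step can be sketched as follows. This is a simplified illustration, not the CI team's parser. It assumes the uploaded file is RDF/XML in which resource links appear as `rdf:about` or `rdf:resource` attributes or as element text; real Zotero exports may be structured differently. The `KNOWN_REPOSITORIES` table, the function names, and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative host-to-repository table; not the CI hub's actual configuration.
KNOWN_REPOSITORIES = {
    "babel.hathitrust.org": "HathiTrust",
    "www.perseus.tufts.edu": "Perseus",
}


def extract_links(rdf_xml):
    """Collect distinct http(s) links from an RDF/XML string, in document order."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter():
        candidates = (
            element.get(f"{{{RDF_NS}}}about"),
            element.get(f"{{{RDF_NS}}}resource"),
            (element.text or "").strip(),
        )
        for value in candidates:
            if value and value.startswith(("http://", "https://")) and value not in links:
                links.append(value)
    return links


def match_repositories(links):
    """Map each link whose host is a known repository to that repository's name."""
    matches = {}
    for link in links:
        repository = KNOWN_REPOSITORIES.get(urlparse(link).hostname)
        if repository is not None:
            matches[link] = repository
    return matches


# Made-up sample input; the identifiers do not refer to real volumes.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://babel.hathitrust.org/cgi/pt?id=example.001">
    <dc:title>Sample volume</dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/unrelated">
    <dc:title>Unrelated page</dc:title>
  </rdf:Description>
</rdf:RDF>"""

for link, repository in match_repositories(extract_links(SAMPLE)).items():
    print(repository, link)
```

Links that do not belong to a known repository are simply skipped here; a fuller implementation would presumably report them back to the user.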
In collaboration with librarians and scholars at Indiana University, University of Chicago, Penn State University, Michigan State University, University of Iowa, and University of Nebraska, the CI team has prepared a survey to learn more about scholars' collection and content priorities. The collections and content it identifies are of potential interest for subsequent phases of Bamboo and are a possible target of CI investigation toward the end of Phase I (Fall 2011).
As always, please visit the Collections Interoperability area of the wiki for detailed activities.