There are a large number of metadata standards and initiatives that have relevance to digital preservation, e.g. those designed to support the work of national and research libraries, archives and digitization initiatives. This paper introduces some of these, noting that the developers of some have acknowledged the importance of maintaining or re-using existing metadata. It is argued here that the implementation of metadata registries as part of a digital preservation system may assist repositories in enabling the management and re-use of this metadata and may also help interoperability, namely the exchange of metadata and information packages between repositories.
Publisher
2003 Dublin Core Conference: Supporting Communities of Discourse and Practice-Metadata Research & Applications
Publication Location
Seattle, WA
Critical Arguments
CA "This paper will introduce a range of preservation metadata initiatives including the influential Open Archival Information System (OAIS) reference model and a number of other initiatives originating from national and research libraries, digitization projects and the archives community. It will then comment on the need for interoperability between these specifications and propose that the implementation of metadata registries as part of a digital preservation system may help repositories manage diverse metadata and facilitate the exchange of metadata or information packages between repositories."
Conclusions
RQ "The plethora of metadata standards and formats that have been developed to support the management and preservation of digital objects leaves us with several questions about interoperability. For example, will repositories be able to cope with the wide range of standards and formats that exist? Will they be able to transfer metadata or information packages containing metadata to other repositories? Will they be able to make use of the 'recombinant potential' of existing metadata?" ... "A great deal of work needs to be done before this registry-based approach can be proved to be useful. While it would undoubtedly be useful to have registries of the main metadata standards developed to support preservation, it is less clear how mapping-based conversions between them would work in practice. Metadata specifications are based on a range of different models and conversions often lead to data loss. Also, much more consideration needs to be given to the practical issues of implementation." 
SOW
DC Michael Day is a research officer at UKOLN, which is based at the University of Bath. He belongs to UKOLN's research and development team, and works primarily on projects concerning metadata, interoperability and digital preservation. 
Type
Conference Proceedings
Title
Practical experiences of the Digital Preservation Testbed
CA "The Digital Preservation Testbed is researching three different approaches to long-term digital preservation: migration, emulation and XML. Not only will the effectiveness of each approach be evaluated, but also their limits, costs and application potential. Experiments are taking place on text documents, spreadsheets, emails and databases of different size, complexity and nature."
Conclusions
RQ "New experiments expected in 2002 are the migration of spreadsheets, conversion of spreadsheets and databases into XML and a proof of concept with the UVC for text documents and spreadsheets. ... Eventually at the end of 2003 the Testbed project will provide: advice on how to deal with current digital records; recommendations for an appropriate preservation strategy or a combination ofstrategies; functional requirements for a preservation function; cost models of the various preservation strategies; a decision model for preservation strategy; recommendations concerning guidelines and regulations."
SOW
DC "The Digital Preservation Testbed is part of the non-profit organisation ICTU. ICTU isthe Dutch organisation for ICT and government. ICTU's goal is to contribute to the structural development of e-government. This will result in improving the work processes of government organisations, their service to the community and interaction with the citizens. ... In case of the Digital Preservation Testbed the principals are the Ministry of the Interior, Jan Lintsen and the Dutch National Archives, Maarten van Boven. Together with Public Key Infrastructure, Digital Longevity is the fundament of the ELO-house."
Type
Journal
Title
When Documents Deceive: Trust and Provenance as New Factors for Information Retrieval in a Tangled Web
Journal of the American Society for Information Science and Technology
Periodical Abbreviation
JASIST
Publication Year
2001
Volume
52
Issue
1
Pages
12
Publisher
John Wiley & Sons
Critical Arguments
"This brief and somewhat informal article outlines a personal view of the changing framework for information retrieval suggested by the Web environment, and then goes on to speculate about how some of these changes may manifest in upcoming generations of information retrieval systems. It also sketches some ideas about the broader context of trust management infrastructure that will be needed to support these developments, and it points towards a number of new research agendas that will be critical during this decade. The pursuit of these agendas is going to call for new collaborations between information scientists and a wide range of other disciplines." (p. 12) Discusses public key infrastructure (PKI) and Pretty Good Practice (PGP) systems as steps toward ensuring the trustworthiness of metadata online, but explains their limitations. Makes a distinction between the identify of providers of metadata and their behavior, arguing that it is the latter we need to be concerned with.
Phrases
<P1> Surrogates are assumed to be accurate because they are produced by trusted parties, who are the only parties allowed to contribute records to these databases. Documents (full documents or surrogate records) are viewed as passive; they do not actively deceive the IR system.... Compare this to the realities of the Web environment. Anyone can create any metadata they want about any object on the net, with any motivation. (p. 13) <P2> Sites interested in manipulating the results of the indexing process rapidly began to exploit the difference between the document as viewed by the user and the document as analyzed by the indexing crawler through a set of techniques broadly called "index spamming." <P3> Pagejacking might be defined generally as providing arbitrary documents with independent arbitrary index entries. Clearly, building information retrieval systems to cope with this environment is a huge problem. (p. 14) <P4> [T]he tools are coming into place that let one determine the source of a metadata assertion (or, more precisely and more generally) the identity of the person or organization that stands behind the assertion, and to establish a level of trust in this identity. (p. 16) <P5> It is essential to recognize that in the information retrieval context one is not concerned so much with identity as with behavior. ... This distinction is often overlooked or misunderstood in discussions about what problems PKI is likely to solve: identity alone does not necessarily solve the problem of whether to trust information provided by, or warranted by, that identity. ... And all of the technology for propagating trust, either in hierarchical (PKI) or web-of-trust identity management, is purely about trust in identity. (p. 16) <P6> The question of formalizing and recording expectations about behavior, or trust in behavior, are extraordinarily complex, and as far as I know, very poorly explored. (p. 16) <P7> [A]n appeal to certification or rating services simply shifts the problem: how are these services going to track, evaluate, and rate behavior, or certify skills and behavior? (p. 16) <P8> An individual should be able to decide how he or she is willing to have identity established, and when to believe information created by or associated with such an identity. Further, each individual should be able to have this personal database evolve over time based on experience and changing beliefs. (p. 16) <P9> [T]he ability to scale and to respond to a dynamic environment in which new information sources are constantly emerging is also vital. <P10> In determining what data a user (or an indexing system, which may make global policy decisions) is going to consider in matching a set of search criteria, a way of defining the acceptable level of trust in the identity of the source of the data will be needed. (p. 16) <P11> Only if the data is supported by both sufficient trust in the identity of the source and the behavior of that identity will it be considered eligible for comparison to the search criteria. Alternatively, just as ranking of result sets provided a more flexible model of retrieval than just deciding whether documents or surrogates did or did not match a group of search criteria, one can imagine developing systems that integrate confidence in the data source (both identity and behavior, or perhaps only behavior, with trust in identity having some absolute minimum value) into ranking algorithms. (p. 17) <P12> As we integrate trust and provenance into the next generations of information retrieval systems we must recognize that system designers face a heavy burden of responsibility. ... New design goals will need to include making users aware of defaults; encouraging personalization; and helping users to understand the behavior of retrieval systems <warrant> (p. 18) <P13> Powerful paternalistic systems that simply set up trust-related parameters as part of the indexing process and thus automatically apply a fixed set of such parameters to each search submitted to the retrieval system will be a real danger. (p. 17)
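To make the ranking model of <P10> and <P11> concrete, here is a minimal Python sketch in which trust in the identity of a source acts as an absolute floor and trust in its behavior feeds the ranking score. All names, weights, and thresholds are illustrative assumptions, not drawn from the article.

    # Hypothetical sketch of trust-aware ranking (names and weights invented).
    MIN_IDENTITY_TRUST = 0.5  # absolute minimum trust in the source's identity

    def trusted_rank(relevance, identity_trust, behavior_trust,
                     w_relevance=0.7, w_behavior=0.3):
        """Return a ranking score, or None if the source is ineligible."""
        if identity_trust < MIN_IDENTITY_TRUST:
            return None  # fails the identity floor; excluded from matching
        return w_relevance * relevance + w_behavior * behavior_trust

    # A relevant document from a source with solid identity but a mediocre
    # behavioral record still ranks, just lower.
    print(trusted_rank(relevance=0.9, identity_trust=0.8, behavior_trust=0.4))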
Conclusions
RQ "These developments suggest a research agenda that addresses indexing countermeasures and counter-countermeasures; ways of anonymously or pseudononymously spot-checking the results of Web-crawling software, and of identifying, filtering out, and punishing attempts to manipulate the indexing process such as query-source-sensitive responses or deceptively structured pages that exploit the gap between presentation and content." (p. 14) "Obviously, there are numerous open research problems in designing such systems: how can the user express these confidence or trust constraints; how should the system integrate them into ranking techniques; how can efficient index structures and query evaluation algorithms be designed that integrate these factors. ... The integration of trust and provenance into information retrieval systems is clearly going to be necessary and, I believe, inevitable. If done properly, this will inform and empower users; if done incorrectly, it threatens to be a tremendously powerful engine of censorship and control over information access. (p. 17)
Type
Electronic Journal
Title
ARTISTE: An integrated Art Analysis and Navigation Environment
This article describes the objectives of the ARTISTE project ("An integrated Art Analysis and Navigation Environment"), which aims to build a tool for the intelligent retrieval and indexing of high-resolution images. The ARTISTE project will address professional users in the fine arts as the primary end-user base. These users provide services for the ultimate end-user, the citizen.
Critical Arguments
CA "European museums and galleries are rich in cultural treasures but public access has not reached its full potential. Digital multimedia can address these issues and expand the accessible collections. However, there is a lack of systems and techniques to support both professional and citizen access to these collections."
Phrases
<P1> New technology is now being developed that will transform that situation. A European consortium, partly funded by the EU under the fifth R&D framework, is working to produce a new management system for visual information. <P2> Four major European galleries (The Uffizi in Florence, The National Gallery and the Victoria and Albert Museum in London and the Louvre related restoration centre, Centre de Recherche et de Restauration des Musées de France) are involved in the project. They will be joining forces with NCR, a leading player in database and Data Warehouse technology; Interactive Labs, the new media design and development facility of Italy's leading art publishing group, Giunti; IT Innovation, Web-based system developers; and the Department of Electronics and Computer Science at the University of Southampton. Together they will create web based applications and tools for the automatic indexing and retrieval of high-resolution art images by pictorial content and information. <P3> The areas of innovation in this project are as follows: Using image content analysis to automatically extract metadata based on iconography, painting style etc; Use of high quality images (with data from several spectral bands and shadow data) for image content analysis of art; Use of distributed metadata using RDF to build on existing standards; Content-based navigation for art documents separating links from content and applying links according to context at presentation time; Distributed linking and searching across multiple archives allowing ownership of data to be retained; Storage of art images using large (>1TeraByte) multimedia object relational databases. <P4> The ARTISTE approach will use the power of object-related databases and content-retrieval to enable indexing to be made dynamically, by non-experts. <P5> In other words ARTISTE would aim to give searchers tools which hint at links due to say colour or brush-stroke texture rather than saying "this is the automatically classified data". <P6> The ARTISTE project will build on and exploit the indexing scheme proposed by the AQUARELLE consortia. The ARTISTE project solution will have a core component that is compatible with existing standards such as Z39.50. The solution will make use of emerging technical standards XML, RDF and X-Link to extend existing library standards to a more dynamic and flexible metadata system. The ARTISTE project will actively track and make use of existing terminology resources such as the Getty "Art and Architecture Thesaurus" (AAT) and the "Union List of Artist Names" (ULAN). <P7> Metadata will also be stored in a database. This may be stored in the same object-relational database, or in a separate database, according to the incumbent systems at the user partners. <P8> RDF provides for metadata definition through the use of schemas. Schemas define the relevant metadata terms (the namespace) and the associated semantics. Individual RDF queries and statements may use multiple schemas. The system will make use of existing schemas such as the Dublin Core schema and will provide wrappers for existing resources such as the Art and Architecture thesaurus in a RDF schema wrapper. <P9> The Distributed Query and Metadata Layer will also provide facilities to enable queries to be directed towards multiple distributed databases. The end user will be able to seamlessly search the combined art collection.
This layer will adhere to worldwide digital library standards such as Z39.50, augmenting and extending as necessary to allow the richness of metadata enabled by the RDF standard.
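As a concrete illustration of the RDF-wrapped Dublin Core metadata described in <P8>, here is a short Python sketch using the third-party rdflib package; the painting URI and the metadata values are invented for illustration.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC  # Dublin Core elements namespace

    g = Graph()
    painting = URIRef("http://example.org/uffizi/painting/1234")  # hypothetical
    g.add((painting, DC.title, Literal("Primavera")))
    g.add((painting, DC.creator, Literal("Botticelli, Sandro")))
    g.add((painting, DC.type, Literal("Image")))

    # Serialize as RDF/XML, the encoding the project builds on.
    print(g.serialize(format="xml"))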
Conclusions
RQ "In conclusion the Artiste project will result into an interesting and innovative system for the art analysis, indexing storage and navigation. The actual state of the art of content-based retrieval systems will be positively influenced by the development of the Artiste project, which will pursue the following goals: A solution which can be replicated to European galleries, museums, etc.; Deep-content analysis software based on object relational database technology.; Distributed links server software, user interfaces, and content-based navigation software.; A fully integrated prototype analysis environment.; Recommendations for the exploitation of the project solution by European museums and galleries. ; Recommendations for the exploitation of the technology in other sectors.; "Impact on standards" report detailing augmentations of Z39.50 with RDF." ... ""Not much research has been carried out worldwide on new algorithms for style-matching in art. This is probably not a major aim in Artiste but could be a spin-off if the algorithms made for specific author search requirements happen to provide data which can be combined with other data to help classify styles." >
SOW
DC "Four major European galleries (The Uffizi in Florence, The National Gallery and the Victoria and Albert Museum in London and the Louvre related restoration centre, Centre de Recherche et de Restauration des Mus├®es de France) are involved in the project. They will be joining forces with NCR, a leading player in database and Data Warehouse technology; Interactive Labs, the new media design and development facility of Italy's leading art publishing group, Giunti; IT Innovation, Web-based system developers; and the Department of Electronics and Computer Science at the University of Southampton. Together they will create web based applications and tools for the automatic indexing and retrieval of high-resolution art images by pictorial content and information."
Type
Electronic Journal
Title
A Spectrum of Interoperability: The Site for Science Prototype for the NSDL
"Currently, NSF is funding 64 projects, each making its own contribution to the library, with a total annual budget of about $24 million. Many projects are building collections; others are developing services; a few are carrying out targeted research.The NSDL is a broad program to build a digital library for education in science, mathematics, engineering and technology. It is funded by the National Science Foundation (NSF) Division of Undergraduate Education. . . . The Core Integration task is to ensure that the NSDL is a single coherent library, not simply a set of unrelated activities. In summer 2000, the NSF funded six Core Integration demonstration projects, each lasting a year. One of these grants was to Cornell University and our demonstration is known as Site for Science. It is at http://www.siteforscience.org/ [Site for Science]. In late 2001, the NSF consolidated the Core Integration funding into a single grant for the production release of the NSDL. This grant was made to a collaboration of the University Corporation for Atmospheric Research (UCAR), Columbia University and Cornell University. The technical approach being followed is based heavily on our experience with Site for Science. Therefore this article is both a description of the strategy for interoperability that was developed for Site for Science and an introduction to the architecture being used by the NSDL production team."
ISBN
1082-9873
Critical Arguments
CA "[T]his article is both a description of the strategy for interoperability that was developed for the [Cornell University's NSF-funded] Site for Science and an introduction to the architecture being used by the NSDL production team."
Phrases
<P1> The grand vision is that the NSDL become a comprehensive library of every digital resource that could conceivably be of value to any aspect of education in any branch of science and engineering, both defined very broadly. <P2> Interoperability among heterogeneous collections is a central theme of the Core Integration. The potential collections have a wide variety of data types, metadata standards, protocols, authentication schemes, and business models. <P3> The goal of interoperability is to build coherent services for users, from components that are technically different and managed by different organizations. This requires agreements to cooperate at three levels: technical, content and organizational. <P4> Much of the research of the authors of this paper aims at . . . looking for approaches to interoperability that have low cost of adoption, yet provide substantial functionality. One of these approaches is the metadata harvesting protocol of the Open Archives Initiative (OAI) . . . <P5> For Site for Science, we identified three levels of digital library interoperability: Federation; Harvesting; Gathering. In this list, the top level provides the strongest form of interoperability, but places the greatest burden on participants. The bottom level requires essentially no effort by the participants, but provides a poorer level of interoperability. The Site for Science demonstration concentrated on the harvesting and gathering, because other projects were exploring federation. <P6> In an ideal world all the collections and services that the NSDL wishes to encompass would support an agreed set of standard metadata. The real world is less simple. . . . However, the NSDL does have influence. We can attempt to persuade collections to move along the interoperability curve. <warrant> <P7> The Site for Science metadata strategy is based on two principles. The first is that metadata is too expensive for the Core Integration team to create much of it. Hence, the NSDL has to rely on existing metadata or metadata that can be generated automatically. The second is to make use of as much of the metadata available from collections as possible, knowing that it varies greatly from none to extensive. Based on these principles, Site for Science, and subsequently the entire NSDL, developed the following metadata strategy: Support eight standard formats; Collect all existing metadata in these formats; Provide crosswalks to Dublin Core; Assemble all metadata in a central metadata repository; Expose all metadata records in the repository for service providers to harvest; Concentrate limited human effort on collection-level metadata; Use automatic generation to augment item-level metadata. <P8> The strategy developed by Site for Science and now adopted by the NSDL is to accumulate metadata in the native formats provided by the collections . . . If a collection supports the protocols of the Open Archives Initiative, it must be able to supply unqualified Dublin Core (which is required by the OAI) as well as the native metadata format. <P9> From a computing viewpoint, the metadata repository is the key component of the Site for Science system. The repository can be thought of as a modern variant of the traditional library union catalog, a catalog that holds comprehensive catalog records from a group of libraries. . . . Metadata from all the collections is stored in the repository and made available to providers of NSDL service.
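As a sketch of the harvesting level in practice, the following Python fragment issues an OAI-PMH ListRecords request for unqualified Dublin Core and pulls out the titles. The repository base URL is hypothetical; the verb, metadataPrefix, and XML namespaces are those defined by the OAI-PMH and Dublin Core specifications.

    import urllib.request
    import xml.etree.ElementTree as ET

    BASE = "http://example.org/oai"  # hypothetical repository endpoint
    url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"

    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"
    for record in tree.iter(OAI + "record"):
        titles = [t.text for t in record.iter(DC + "title")]
        print(titles)  # one list of dc:title values per harvested record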
Conclusions
RQ 1 "Can a small team of librarians manage the collection development and metadata strategies for a very large library?" RQ 2 "Can the NSDL actually build services that are significantly more useful than the general web search services?"
The Semantic Web activity is a W3C project whose goal is to enable a 'cooperative' Web where machines and humans can exchange electronic content that has clear-cut, unambiguous meaning. This vision is based on the automated sharing of metadata terms across Web applications. The declaration of schemas in metadata registries advances this vision by providing a common approach for the discovery, understanding, and exchange of semantics. However, many of the issues regarding registries are not clear, and ideas vary regarding their scope and purpose. Additionally, registry issues are often difficult to describe and comprehend without a working example.
ISBN
1082-9873
Critical Arguments
CA "This article will explore the role of metadata registries and will describe three prototypes, written by the Dublin Core Metadata Initiative. The article will outline how the prototypes are being used to demonstrate and evaluate application scope, functional requirements, and technology solutions for metadata registries."
Phrases
<P1> Establishing a common approach for the exchange and re-use of data across the Web would be a major step towards achieving the vision of the Semantic Web. <warrant> <P2> The Semantic Web Activity statement articulates this vision as: 'having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration, and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be shared and processed by automated tools as well as by people.' <P3> In parallel with the growth of content on the Web, there have been increases in the amount and variety of metadata to manipulate this content. An inordinate amount of standards-making activity focuses on metadata schemas (also referred to as vocabularies or data element sets), and yet significant differences in schemas remain. <P4> Different domains typically require differentiation in the complexity and semantics of the schemas they use. Indeed, individual implementations often specify local usage, thereby introducing local terms to metadata schemas specified by standards-making bodies. Such differentiation undermines interoperability between systems. <P5> This situation highlights a growing need for access by users to in-depth information about metadata schemas and particular extensions or variations to schemas. Currently, these 'users' are human -- people requesting information. <warrant> <P6> It would be helpful to make available easy access to schemas already in use to provide both humans and software with comprehensive, accurate and authoritative information. <warrant> <P7> The W3C Resource Description Framework (RDF) has provided the basis for a common approach to declaring schemas in use. At present the RDF Schema (RDFS) specification offers the basis for a simple declaration of schema. <P8> Even as it stands, an increasing number of initiatives are using RDFS to 'publish' their schemas. <P9> Registries provide 'added value' to users by indexing schemas relevant to a particular 'domain' or 'community of use' and by simplifying the navigation of terms by enabling multiple schemas to be accessed from one view. <warrant> <P10> Additionally, the establishment of registries to index terms actively being used in local implementations facilitates the metadata standards activity by providing implementation experience transferable to the standards-making process. <warrant> <P11> The overriding goal has been the development of a generic registry tool useful for registry applications in general, not just useful for the DCMI. <P12> The formulation of a 'definitive' set of RDF schemas within the DCMI that can serve as the recommended, comprehensive and accurate expression of the DCMI vocabulary has hindered the development of the DCMI registry. To some extent, this has been due to the changing nature of the RDF Schema specification and its W3C candidate recommendation status. However, it should be recognized that the lack of consensus within the DCMI community regarding the RDF schemas has proven to be equally as impeding. <P13> The automated sharing of metadata across applications is an important part of realizing the goal of the Semantic Web. Users and applications need practical solutions for discovering and sharing semantics. Schema registries provide a viable means of achieving this. <warrant>
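For readers unfamiliar with what 'declaring a schema' in RDFS looks like, here is a Python sketch (using the third-party rdflib package) of the kind of declaration a registry would index per <P7> and <P8>. The label and comment strings approximate the published DCMI documentation for dc:creator and should be read as indicative rather than authoritative.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS

    DCE = Namespace("http://purl.org/dc/elements/1.1/")
    g = Graph()
    creator = DCE["creator"]
    g.add((creator, RDF.type, RDF.Property))
    g.add((creator, RDFS.label, Literal("Creator", lang="en")))
    g.add((creator, RDFS.comment,
           Literal("An entity primarily responsible for making the resource.",
                   lang="en")))
    g.add((creator, RDFS.isDefinedBy, URIRef(str(DCE))))

    print(g.serialize(format="turtle"))  # human-readable view of the schema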
Conclusions
RQ "Many of the issues regarding metadata registries are unclear and ideas regarding their scope and purpose vary. Additionally, registry issues are often difficult to describe and comprehend without a working example. The DCMI makes use of rapid prototyping to help solve these problems. Prototyping is a process of quickly developing sample applications that can then be used to demonstrate and evaluate functionality and technology."
SOW
DC "New impetus for the development of registries has come with the development activities surrounding creation of the Semantic Web. The motivation for establishing registries arises from domain and standardization communities, and from the knowledge management community." ... "The original charter for the DCMI Registry Working Group was to establish a metadata registry to support the activity of the DCMI. The aim was to enable the registration, discovery, and navigation of semantics defined by the DCMI, in order to provide an authoritative source of information regarding the DCMI vocabulary. Emphasis was placed on promoting the use of the Dublin Core and supporting the management of change and evolution of the DCMI vocabulary." ... "Discussions within the DCMI Registry Working Group (held primarily on the group's mailing list) have produced draft documents regarding application scope and functionality. These discussions and draft documents have been the basis for the development of registry prototypes and continue to play a central role in the iterative process of prototyping and feedback." ... The overall goal of the DCMI Registry Working Group (WG) is to provide a focus for continued development of the DCMI Metadata Registry. The WG will provide a forum for discussing registry-related activities and facilitating cooperation with the ISO 11179 community, the Semantic Web, and other related initiatives on issues of common interest and relevance.
Type
Electronic Journal
Title
Collection-Based Persistent Digital Archives - Part 1
The preservation of digital information for long periods of time is becoming feasible through the integration of archival storage technology from supercomputer centers, data grid technology from the computer science community, information models from the digital library community, and preservation models from the archivist's community. The supercomputer centers provide the technology needed to store the immense amounts of digital data that are being created, while the digital library community provides the mechanisms to define the context needed to interpret the data. The coordination of these technologies with preservation and management policies defines the infrastructure for a collection-based persistent archive. This paper defines an approach for maintaining digital data for hundreds of years through development of an environment that supports migration of collections onto new software systems.
ISBN
1082-9873
Critical Arguments
CA "Supercomputer centers, digital libraries, and archival storage communities have common persistent archival storage requirements. Each of these communities is building software infrastructure to organize and store large collections of data. An emerging common requirement is the ability to maintain data collections for long periods of time. The challenge is to maintain the ability to discover, access, and display digital objects that are stored within an archive, while the technology used to manage the archive evolves. We have implemented an approach based upon the storage of the digital objects that comprise the collection, augmented with the meta-data attributes needed to dynamically recreate the data collection. This approach builds upon the technology needed to support extensible database schema, which in turn enables the creation of data handling systems that interconnect legacy storage systems."
Phrases
<P1> The ultimate goal is to preserve not only the bits associated with the original data, but also the context that permits the data to be interpreted. <warrant> <P2> We rely on the use of collections to define the context to associate with digital data. The context is defined through the creation of semi-structured representations for both the digital objects and the associated data collection. <P3> A collection-based persistent archive is therefore one in which the organization of the collection is archived simultaneously with the digital objects that comprise the collection. <P4> The goal is to preserve digital information for at least 400 years. This paper examines the technical issues that must be addressed and presents a prototype implementation. <P5> Digital object representation. Every digital object has attributes that define its structure, physical context, and provenance, and annotations that describe features of interest within the object. Since the set of attributes (such as annotations) will vary across all objects within a collection, a semi-structured representation is needed. Not all digital objects will have the same set of associated attributes. <P6> If possible, a common information model should be used to reference the attributes associated with the digital objects, the collection organization, and the presentation interface. An emerging standard for a uniform data exchange model is the eXtensible Markup Language (XML). <P7> A particular example of an information model is the XML Document Type Definition (DTD) which provides a description for the allowed nesting structure of XML elements. Richer information models are emerging such as XSchema (which provides data types, inheritance, and more powerful linking mechanisms) and XMI (which provides models for multiple levels of data abstraction). <P8> Although XML DTDs were originally applied to documents only, they are now being applied to arbitrary digital objects, including the collections themselves. More generally, OSDs can be used to define the structure of digital objects, specify inheritance properties of digital objects, and define the collection organization and user interface structure. <P9> A persistent collection therefore needs the following components of an OSD to completely define the collection context: Data dictionary for collection semantics; Digital object structure; Collection structure; and User interface structure. <P10> The re-creation or instantiation of the data collection is done with a software program that uses the schema descriptions that define the digital object and collection structure to generate the collection. The goal is to build a generic program that works with any schema description. <P11> The information for which driver to use for access to a particular data set is maintained in the associated Meta-data Catalog (MCAT). The MCAT system is a database containing information about each data set that is stored in the data storage systems. <P12> The data handling infrastructure developed at SDSC has two components: the SDSC Storage Resource Broker (SRB) that provides federation and access to distributed and diverse storage resources in a heterogeneous computing environment, and the Meta-data Catalog (MCAT) that holds systemic and application or domain-dependent meta-data about the resources and data sets (and users) that are being brokered by the SRB. <P13> A client does not need to remember the physical mapping of a data set.
It is stored as meta-data associated with the data set in the MCAT catalog. <P14> A characterization of a relational database requires a description of both the logical organization of attributes (the schema), and a description of the physical organization of attributes into tables. For the persistent archive prototype we used XML DTDs to describe the logical organization. <P15> A combination of the schema and physical organization can be used to define how queries can be decomposed across the multiple tables that are used to hold the meta-data attributes. <P16> By using an XML-based database, it is possible to avoid the need to map between semi-structured and relational organizations of the database attributes. This minimizes the amount of information needed to characterize a collection, and makes the re-creation of the database easier. <warrant> <P17> Digital object attributes are separated into two classes of information within the MCAT: System-level meta-data that provides operational information. These include information about resources (e.g., archival systems, database systems, etc., and their capabilities, protocols, etc.) and data objects (e.g., their formats or types, replication information, location, collection information, etc.); Application-dependent meta-data that provides information specific to particular data sets and their collections (e.g., Dublin Core values for text objects). <P18> Internally, MCAT keeps schema-level meta-data about all of the attributes that are defined. The schema-level attributes are used to define the context for a collection and enable the instantiation of the collection on new technology. <P19> The logical structure should not be confused with database schema and are more general than that. For example, we have implemented the Dublin Core database schema to organize attributes about digitized text. The attributes defined in the logical structure that is associated with the Dublin Core schema contains information about the subject, constraints, and presentation formats that are needed to display the schema along with information about its use and ownership. <P20> The MCAT system supports the publication of schemata associated with data collections, schema extension through the addition or deletion of new attributes, and the dynamic generation of the SQL that corresponds to joins across combinations of attributes. <P21> By adding routines to access the schema-level meta-data from an archive, it is possible to build a collection-based persistent archive. As technology evolves and the software infrastructure is replaced, the MCAT system can support the migration of the collection to the new technology.
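To make <P7> and <P14> concrete, here is an illustrative sketch of an object structure definition expressed as an XML DTD, with a digital object validated against it. The element names are invented examples rather than the SDSC schemas, and the sketch relies on the third-party lxml package.

    import io
    from lxml import etree

    # Hypothetical object structure definition (OSD) as a DTD.
    DTD_TEXT = """
    <!ELEMENT object (provenance, annotation*)>
    <!ELEMENT provenance (#PCDATA)>
    <!ELEMENT annotation (#PCDATA)>
    """

    doc = etree.fromstring(
        "<object><provenance>ingested 1999-06-01</provenance>"
        "<annotation>feature of interest</annotation></object>")

    dtd = etree.DTD(io.StringIO(DTD_TEXT))
    print(dtd.validate(doc))  # True: the object matches its declared structure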
Conclusions
RQ See "Collection-Based Persistent Digital Archives - Part 2."
SOW
DC "The technology proposed by SDSC for implementing persistent archives builds upon interactions with many of these groups. Explicit interactions include collaborations with Federal planning groups, the Computational Grid, the digital library community, and individual federal agencies." ... "The data management technology has been developed through multiple federally sponsored projects, including the DARPA project F19628-95-C-0194 "Massive Data Analysis Systems," the DARPA/USPTO project F19628-96-C-0020 "Distributed Object Computation Testbed," the Data Intensive Computing thrust area of the NSF project ASC 96-19020 "National Partnership for Advanced Computational Infrastructure," the NASA Information Power Grid project, and the DOE ASCI/ASAP project "Data Visualization Corridor." Additional projects related to the NSF Digital Library Initiative Phase II and the California Digital Library at the University of California will also support the development of information management technology. This work was supported by a NARA extension to the DARPA/USPTO Distributed Object Computation Testbed, project F19628-96-C-0020."
Type
Electronic Journal
Title
Collection-Based Persistent Digital Archives - Part 2
"Collection-Based Persistent Digital Archives: Part 2" describes the creation of a one million message persistent E-mail collection. It discusses the four major components of a persistent archive system: support for ingestion, archival storage, information discovery, and presentation of the collection. The technology to support each of these processes is still rapidly evolving, and opportunities for further research are identified.
ISBN
1082-9873
Critical Arguments
CA "The multiple migration steps can be broadly classified into a definition phase and a loading phase. The definition phase is infrastructure independent, whereas the loading phase is geared towards materializing the processes needed for migrating the objects onto new technology. We illustrate these steps by providing a detailed description of the actual process used to ingest and load a million-record E-mail collection at the San Diego Supercomputer Center (SDSC). Note that the SDSC processes were written to use the available object-relational databases for organizing the meta-data. In the future, it may be possible to go directly to XML-based databases."
Phrases
<P1> The processes used to ingest a collection, transform it into an infrastructure independent form, and store the collection in an archive comprise the persistent storage steps of a persistent archive. The processes used to recreate the collection on new technology, optimize the database, and recreate the user interface comprise the retrieval steps of a persistent archive. <P2> In order to build a persistent collection, we consider a solution that "abstracts" all aspects of the data and its preservation. In this approach, data object and processes are codified by raising them above the machine/software dependent forms to an abstract format that can be used to recreate the object and the processes in any new desirable forms. <P3> The SDSC infrastructure uses object-relational databases to organize information. This makes data ingestion more complex by requiring the mapping of the XML DTD semi-structured representation onto a relational schema. <P4> The steps used to store the persistent archive were: (1) Define Digital Object: define meta-data, define object structure (OBJ-DTD) --- (A), define object DTD to object DDL mapping --- (B) (2) Define Collection: define meta-data, define collection structure (COLL-DTD) --- (C), define collection DTD structure to collection DDL mapping --- (D) (3) Define Containers: define packing format for encapsulating data and meta-data (examples are the AIP standard, Hierarchical Data Format, Document Type Definition) <P5> In the ingestion phase, the relational and semi-structured organization of the meta-data is defined. No database is actually created, only the mapping between the relational organization and the object DTD. <P6> Note that the collection relational organization does not have to encompass all of the attributes that are associated with a digital object. Separate information models are used to describe the objects and the collections. It is possible to take the same set of digital objects and form a new collection with a new relational organization. <P7> Multiple communities across academia, the federal government, and standards groups are exploring strategies for managing very large archives. The persistent archive community needs to maintain interactions with these communities to track development of new strategies for data management and storage. <warrant>
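As a hedged sketch of the definition-phase mapping in <P4> (object DTD to object DDL), the following Python fragment derives relational DDL from a logical attribute description for an e-mail record. The attribute names and type mapping are invented for illustration; the actual SDSC process was considerably richer.

    # Logical organization (schema level) for an e-mail record; invented names.
    EMAIL_ATTRIBUTES = {
        "sender":    "VARCHAR(256)",
        "recipient": "VARCHAR(256)",
        "subject":   "VARCHAR(512)",
        "sent_date": "TIMESTAMP",
    }

    def to_ddl(table, attributes):
        """Generate CREATE TABLE DDL from the logical attribute mapping."""
        cols = ",\n  ".join(f"{name} {sqltype}"
                            for name, sqltype in attributes.items())
        return f"CREATE TABLE {table} (\n  {cols}\n);"

    # In the loading phase this DDL would materialize the collection's tables.
    print(to_ddl("email_collection", EMAIL_ATTRIBUTES))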
Conclusions
RQ "The four major components of the persistent archive system are support for ingestion, archival storage, information discovery, and presentation of the collection. The first two components focus on the ingestion of data into collections. The last two focus on access to the resulting collections. The technology to support each of these processes is still rapidly evolving. Hence consensus on standards has not been reached for many of the infrastructure components. At the same time, many of the components are active areas of research. To reach consensus on a feasible collection-based persistent archive, continued research and development is needed. Examples of the many related issues are listed below:
Type
Electronic Journal
Title
Metadata: The right approach, An integrated model for descriptive and rights metadata in E-commerce
If you've ever completed a large and difficult jigsaw puzzle, you'll be familiar with that particular moment of grateful revelation when you find that two sections you've been working on separately actually fit together. The overall picture becomes coherent, and the task at last seems achievable. Something like this seems to be happening in the puzzle of "content metadata." Two communities -- rights owners on one hand, libraries and cataloguers on the other -- are staring at their unfolding data models and systems, knowing that somehow together they make up a whole picture. This paper aims to show how and where they fit.
ISBN
1082-9873
Critical Arguments
CA "This paper looks at metadata developments from this standpoint -- hence the "right" approach -- but does so recognising that in the digital world many Chinese walls that appear to separate the bibliographic and commercial communities are going to collapse." ... "This paper examines three propositions which support the need for radical integration of metadata and rights management concerns for disparate and heterogeneous materials, and sets out a possible framework for an integrated approach. It draws on models developed in the CIS plan and the DOI Rights Metadata group, and work on the ISRC, ISAN, and ISWC standards and proposals. The three propositions are: DOI metadata must support all types of creation; The secure transaction of requests and offers data depends on maintaining an integrated structure for documenting rights ownership agreements; All elements of descriptive metadata (except titles) may also be elements of agreements. The main consequences of these propositions are: A cross-sector vocabulary is essential; Non-confidential terms of rights ownership agreements must be generally accessible in a standard form. (In its purest form, the e-commerce network must be able to automatically determine the current owner of any right in any creation for any territory.); All descriptive metadata values (except titles) must be stored as unique, coded values. If correct, the implications of these propositions on the behaviour, and future inter-dependency, of the rights-owning and bibliographic communities are considerable."
Phrases
<P1> Historically, metadata -- "data about data" -- has been largely treated as an afterthought in the commercial world, even among rights owners. Descriptive metadata has often been regarded as the proper province of libraries, a battlefield of competing systems of tags and classification and an invaluable tool for the discovery of resources, while "business" metadata lurked, ugly but necessary, in distribution systems and EDI message formats. Rights metadata, whatever it may be, may seem to have barely existed in a coherent form at all. <P2> E-commerce offers the opportunity to integrate the functions of discovery, access, licensing and accounting into single point-and-click actions in which metadata is a critical agent, a glue which holds the pieces together. <warrant> <P3> E-commerce in rights will generate global networks of metadata every bit as vital as the networks of optical fibre -- and with the same requirements for security and unbroken connectivity. <warrant> <P4> The sheer volume and complexity of future rights trading in the digital environment will mean that any but the most sporadic level of human intervention will be prohibitively expensive. Standardised metadata is an essential component. <warrant> <P5> Just as the creators and rights holders are the sources of the content for the bibliographic world, so it seems inevitable they will become the principal source of core metadata in the web environment, and that metadata will be generated simultaneously and at source to meet the requirements of discovery, access, protection, and reward. <P6> However, under the analysis being carried out within the communities identified above and by those who are developing technology and languages for rights-based e-commerce, it is becoming clear that "functional" metadata is also a critical component. It is metadata (including identifiers) which defines a creation and its relationship to other creations and to the parties who created and variously own it; without a coherent metadata infrastructure e-commerce cannot properly flow. Securing the metadata network is every bit as important as securing the content, and there is little doubt which poses the greater problem. <warrant> <P7> Because creations can be nested and modified at an unprecedented level, and because online availability is continuous, not a series of time-limited events like publishing books or selling records, dynamic and structured maintenance of rights ownership is essential if the currency and validity of offers is to be maintained. <warrant> <P8> Rights metadata must be maintained and linked dynamically to all of its related content. <P9> A single, even partial, change to rights ownership in the original creation needs to be communicated through this chain to preserve the currency of permissions and royalty flow. There are many options for doing this, but they all depend, among other things, on the security of the metadata network. <warrant> <P10> As digital media causes copyright frameworks to be rewritten on both sides of the Atlantic, we can expect measures of similar and greater impact at regular intervals affecting any and all creation types: yet such changes can be relatively simple to implement if metadata is held in the right way in the right place to begin with.
<warrant> <P11> The disturbing but inescapable consequence is that it is not only desirable but essential for all elements of descriptive metadata, except for titles, to be expressed at the outset as structured and standardised values to preserve the integrity of the rights chain. <P12> Within the DOI community, which embraces commercial and library interests, the integration of rights and descriptive metadata has become a matter of priority. <P13> What is required is that the establishment of a creation description (for example, the registration of details of a new article or audio recording) or of change of rights control (for example, notification of the acquisition of a work or a catalogue of works) can be done in a standardised and fully structured way. <warrant> <P14> Unless the chain is well maintained at source, all downstream transactions will be jeopardised, for in the web environment the CIS principle of "do it once, do it right" is seen at its ultimate. A single occurrence of a creation on the web, and its supporting metadata, can be the source for all uses. <P15> One of the tools to support this development is the RDF (Resource Description Framework). RDF provides a means of structuring metadata for anything, and it can be expressed in XML. <P16> Although formal metadata standards hardly exist within ISO, they are appearing through the "back door" in the form of mandatory supporting data for identifier standards such as ISRC, ISAN and ISWC. A major function of the INDECS project will be to ensure the harmonisation of these standards within a single framework. <P17> In an automated, protected environment, this requires that the rights transaction is able to generate automatically a new descriptive metadata set through the interaction of the agreement terms with the original creation metadata. This can only happen (and it will be required on a massive scale) if rights and descriptive metadata terminology is integrated and standardised. <warrant> <P18> As resources become available virtually, it becomes as important that the core metadata itself is not tampered with as it is that the object itself is protected. Persistence is now not only a necessary characteristic of identifiers but also of the structured metadata that attends them. <P19> This leads us also to the conclusion that, ideally, standardised descriptive metadata should be embedded into objects for its own protection. <P20> It also leads us to the possibility of metadata registration authorities, such as the numbering agencies, taking wider responsibilities. <P21> If this paper is correct in its propositions, then rights metadata will have to rewrite half of Dublin Core or else ignore it entirely. <P22> The web environment with its once-for-all means of access provides us with the opportunity to eliminate duplication and fragmentation of core metadata; and at this moment, there are no legacy metadata standards to shackle the information community. We have the opportunity to go in with our eyes open with standards that are constructed to make the best of the characteristics of the new digital medium. <warrant>
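To ground the propositions above, here is a Python sketch (using the third-party rdflib package) of structured metadata linking a creation, a party, and an agreement. The 'rights' vocabulary is a hypothetical stand-in for the INDECS/DOI terms under development; only dc:title is a real Dublin Core element, and the identifiers are invented.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC

    RIGHTS = Namespace("http://example.org/rights#")  # hypothetical vocabulary
    g = Graph()

    creation = URIRef("http://example.org/creation/10.1000-example")
    party = URIRef("http://example.org/party/42")
    agreement = URIRef("http://example.org/agreement/7")

    g.add((creation, DC.title, Literal("Example Work")))
    g.add((agreement, RIGHTS.covers, creation))    # which creation it covers
    g.add((agreement, RIGHTS.grants, party))       # who now controls the right
    g.add((agreement, RIGHTS.territory, Literal("world")))

    # A downstream system could query this graph to find the current
    # rights controller for the creation in a given territory.
    print(g.serialize(format="turtle"))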
Conclusions
RQ "The INDECS project (assuming its formal adoption next month), in which the four major communities are active, and with strong links to ISO TC46 and MPEG, will provide a cross-sector framework for this work in the short-term. The DOI Foundation itself may be an appropriate umbrella body in the future. We may also consider that perhaps the main function of the DOI itself may not be, as originally envisaged, to link user to content -- which is a relatively trivial task -- but to provide the glue to link together creation, party, and agreement metadata. The model that rights owners may be wise to follow in this process is that of MPEG, where the technology industry has tenaciously embraced a highly-regimented, rolling standardisation programme, the results of which are fundamental to the success of each new generation of products. Metadata standardisation now requires the same technical rigour and commercial commitment. However, in the meantime the bibliographic world, working on what it has always seen its own part of the jigsaw puzzle, is actively addressing many of these issues in an almost parallel universe. The question remains as to how in practical terms the two worlds, rights and bibliographic, can connect, and what may be the consequences of a prolonged delay in doing so." ... "The former I encourage to make a case for continued support and standardisation of a flawed Dublin Core in the light of the propositions I have set out in this paper, or else engage with the DOI and rights owner communities in its revision to meet the real requirements of digital commerce in its fullest sense."
SOW
DC "There are currently four major active communities of rights-holders directly confronting these questions: the DOI community, at present based in the book and electronic publishing sector; the IFPI community of record companies; the ISAN community embracing producers, users, and rights owners of audiovisuals; and the CISAC community of collecting societies for composers and publishers of music, but also extending into other areas of authors' rights, including literary, visual, and plastic arts." ... "There are related rights-driven projects in the graphic, photographic, and performers' communities. E-commerce means that metadata solutions from each of these sectors (and others) require a high level of interoperability. As the trading environment becomes common, traditional genre distinctions between creation-types become meaningless and commercially destructive."
Type
Report
Title
D6.2 Impact on World-wide Metadata Standards Report
This document presents the ARTISTE three-level approach to providing an open and flexible solution for combined metadata and image content-based search and retrieval across multiple, distributed image collections. The intended audience for this report includes museum and gallery owners who are interested in providing or extending services for remote access, developers of collection management and image search and retrieval systems, and standards bodies in both the fine art and digital library domains.
Notes
ARTISTE (http://www.artisteweb.org/) is a European Commission supported project that has developed integrated content and metadata-based image retrieval across several major art galleries in Europe. Collaborating galleries include the Louvre in Paris, the Victoria and Albert Museum in London, the Uffizi Gallery in Florence and the National Gallery in London.
Edition
Version 2.0
Publisher
The ARTISTE Consortium
Publication Location
Southampton, United Kingdom
Accessed Date
08/24/05
Critical Arguments
CA Over the last two and a half years, ARTISTE has developed an image search and retrieval system that integrates distributed, heterogeneous image collections. This report positions the work achieved in ARTISTE with respect to metadata standards and approaches for open search and retrieval using digital library technology. In particular, this report describes three key aspects of ARTISTE: the transparent translation of local metadata to common standards such as Dublin Core and CIMI consortium attribute sets to allow cross-collection searching; a methodology for combining metadata and image content-based analysis into single searches to enable versatile retrieval and navigation facilities within and between gallery collections; and an open interface for cross-collection search and retrieval that advances existing open standards for remote access to digital libraries, such as OAI (Open Archives Initiative) and ZING SRW (Z39.50 International: Next Generation Search and Retrieval Web Service).
Conclusions
RQ "A large part of ARTISTE is concerned with use of existing standards for metadata frameworks. However, one area where existing standards have not been sufficient is multimedia content-based search and retrieval. A proposal has been made to ZING for additions to SRW. This will hopefully enable ARTISTE to make a valued contribution to this rapidly evolving standard." ... "The work started in ARTISTE is being continued in SCULTEUR, another project funded by the European Commission. SCUPLTEUR will develop both the technology and the expertise to create, manage, and present cultural archives of 3D models and associated multimedia objects." ... "We believe the full benefit of multimedia search and retrieval can only be realised through seamless integration of content-based analysis techniques. However, not only does introduction of content-bases analysis require modification to existing standards as outlines in this report, but it also requires a review if the use of semantics in achieving digital library interoperability. In particular, machine understandable description of the semantics of textual metadata, multimedia content, and content-based analysis, can provide a foundation for a new generation of flexible and dynamic digital library tools and services. " ... "Existing standards do not use explicit semantics to describe query operators or their application to metadata and multimedia content at individual sites. However, dynamically determining what operators and types are supported by a collection is essential to robust and efficient cross-collection searching. Dynamic use of published semantics would allow a collection and any associated content-based analysis to be changed  by its owner without breaking conformance to search and retrieval standards. Furthermore, individual sites would not need to publish detailed, human readable descriptions of available functionality.  
SOW
DC "Four major European galleries are involved in the project: the Uffizi in Florence, the national Gallery and the Victoria and Albert Museum in London, and the Centre de Recherche et de Restauration des Musees de France (C2RMF) which is the Louvre related restoration centre. The ARTISTE system currently holds over 160,000 images from four separate collections owned by these partners. The galleries have partnered with NCR, leading player in database and Data Warehouse technology; Interactive Labs, the new media design and development facility of Italy's leading art publishing group, Giunti; IT Innovation, a specialist in building innovative IT systems, and the Department of Electronics and Computer Science at the University of Southhampton." 
Type
Report
Title
RLG Best Practice Guidelines for Encoded Archival Description
These award-winning guidelines, released in August 2002, were developed by the RLG EAD Advisory Group to provide practical, community-wide advice for encoding finding aids. They are designed to: facilitate interoperability of resource discovery by imposing a basic degree of uniformity on the creation of valid EAD-encoded documents; encourage the inclusion of particular elements; and develop a set of core data elements.
Publisher
Research Libraries Group
Publication Location
Mountain View, CA, USA
Language
English
Critical Arguements
CA The objectives of the guidelines are: 1. To facilitate interoperability of resource discovery by imposing a basic degree of uniformity on the creation of valid EAD-encoded documents and to encourage the inclusion of elements most useful for retrieval in a union index and for display in an integrated (cross-institutional) setting; 2. To offer researchers the full benefits of XML in retrieval and display by developing a set of core data elements to improve resource discovery. It is hoped that by identifying core elements and by specifying "best practice" for those elements, these guidelines will be valuable to those who create finding aids, as well as to vendors and tool builders; 3. To contribute to the evolution of the EAD standard by articulating a set of best practice guidelines suitable for interinstitutional and international use. These guidelines can be applied to both retrospective conversion of legacy finding aids and the creation of new finding aids.
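As an illustration of how the "core data elements" objective could be operationalised (a sketch under assumed requirements, not RLG's normative list), a repository might validate finding aids before contributing them to a union database:

```python
# Check that an EAD finding aid supplies a baseline set of elements.
# The REQUIRED list below is an assumption for this sketch only.
import xml.etree.ElementTree as ET

REQUIRED = ["eadheader/eadid", "archdesc/did/unittitle",
            "archdesc/did/unitdate", "archdesc/did/repository"]

def missing_core_elements(ead_xml: str) -> list[str]:
    root = ET.fromstring(ead_xml)
    return [path for path in REQUIRED if root.find(path) is None]

finding_aid = """<ead>
  <eadheader><eadid>us-xx-001</eadid></eadheader>
  <archdesc><did><unittitle>Jane Doe Papers</unittitle></did></archdesc>
</ead>"""
print(missing_core_elements(finding_aid))
# ['archdesc/did/unitdate', 'archdesc/did/repository']
```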
Conclusions
RQ Not applicable.
SOW
<DC> "RLG organized the EAD working group as part of our continuing commitment to making archival collections more accessible on the Web. We offer RLG Archival Resources, a database of archival materials; institutions are encouraged to submit their finding aids to this database." ... "This set of guidelines, the second version promulgated by RLG, was developed between October 2001 and August 2002 by the RLG EAD Advisory Group. This group consisted of ten archivists and digital content managers experienced in creating and managing EAD-encoded finding aids at repositories in the United States and the United Kingdom."
Type
Web Page
Title
CDL Digital Object Standard: Metadata, Content and Encoding
This document addresses the standards for digital object collections for the California Digital Library. Adherence to these standards is required for all CDL contributors and may also serve University of California staff as guidelines for digital object creation and presentation. These standards are not intended to address all of the administrative, operational, and technical issues surrounding the creation of digital object collections.
Critical Arguements
CA These standards describe the file formats, storage and access standards for digital objects created by or incorporated into the CDL as part of the permanent collections. They attempt to balance adherence to industry standards, reproduction quality, access, potential longevity and cost.
Conclusions
RQ not applicable
SOW
DC "This is the first version of the CDL Digital Object Standard. This version is based upon the September 1, 1999 version of the CDL's Digital Image Standard, which included recommendations of the Museum Educational Site Licensing Project (MESL), the Library of Congress and the MOA II participants." ... "The Museum Educational Site Licensing Project (MESL) offered a framework for seven collecting institutions, primarily museums, and seven universities to experiment with new ways to distribute visual information--both images and related textual materials. " ... "The Making of America (MoA II) Testbed Project is a Digital Library Federation (DLF) coordinated, multi-phase endeavor to investigate important issues in the creation of an integrated, but distributed, digital library of archival materials (i.e., digitized surrogates of primary source materials found in archives and special collections). The participants include Cornell University, New York Public Library, Pennsylvania State University, Stanford University and UC Berkeley. The Library of Congress white papers and standards are based on the experience gained during the American Memory Pilot Project. The concepts discussed and the principles developed still guide the Library's digital conversion efforts, although they are under revision to accomodate the capabilities of new technologies and new digital formats." ... "The CDL Technical Architecture and Standards Workgroup includes the following members with extensive experience with digital object collection and management: Howard Besser, MESL and MOA II digital imaging testbed projects; Diane Bisom, University of California, Irvine; Bernie Hurley, MOA II, University of California, Berkeley; Greg Janee, Alexandria Digital Library; John Kunze, University of California, San Francisco; Reagan Moore and Chaitanya Baru, San Diego Supercomputer Center, ongoing research with the National Archives and Records Administration on the long term storage and retrieval of digital content; Terry Ryan, University of California, Los Angeles; David Walker, California Digital Library"
There are many types of standards used to manage museum collections information. These "standards", which range from precise technical standards to general guidelines, enable museum data to be efficiently and consistently indexed, sorted, retrieved, and shared, both in automated and paper-based systems. Museums often use metadata standards (also called data structure standards) to help them: define what types of information to record in their database (or card catalogue); and structure this information (the relationships between the different types of information). Following (or mapping data to) these standards makes it possible for museums to move their data between computer systems, or share their data with other organizations.
Notes
The CHIN Web site features sections dedicated to Creating and Managing Digital Content, Intellectual Property, Collections Management, Standards, and more. CHIN's array of training tools, online publications, directories and databases are especially designed to meet the needs of both small and large institutions. The site also provides access to up-to-date information on topics such as heritage careers, funding and conferences.
Critical Arguements
CA "Museums often want to use their collections data for many purposes, (exhibition catalogues, Web access for the public, and curatorial research, etc.), and they may want to share their data with other museums, archives, and libraries in an automated way. This level of interoperability between systems requires cataloguing standards, value standards, metadata standards, and interchange standards to work together. Standards enable the interchange of data between cataloguer and searcher, between organizations, and between computer systems."
Conclusions
RQ "HIN is also involved in a project to create metadata for a pan-Canadian inventory of learning resources available on Canadian museum Web sites. Working in consultation with the Consortium for the Interchange of Museum Information (CIMI), the Gateway to Educational Materials (GEM) [link to GEM in Section G], and SchoolNet, the project involves the creation of a Guide to Best Practices and cataloguing tool for generating metadata for online learning materials. " 
SOW
DC "CHIN is involved in the promotion, production, and analysis of standards for museum information. The CHIN Guide to Museum Documentation Standards includes information on: standards and guidelines of interest to museums; current projects involving standards research and implementation; organizations responsible for standards research and development; Links." ... "CHIN is a member of CIMI (the Consortium for the Interchange of Museum Information), which works to enable the electronic interchange of museum information. From 1998 to 1999, CHIN participated in a CIMI Metadata Testbed which aimed to explore the creation and use of metadata for facilitating the discovery of electronic museum information. Specifically, the project explored the creation and use of Dublin Core metadata in describing museum collections, and examined how Dublin Core could be used as a means to aid in resource discovery within an electronic, networked environment such as the World Wide Web." 
This document provides some background on preservation metadata for those interested in digital preservation. It first attempts to explain why preservation metadata is seen as an essential part of most digital preservation strategies. It then gives a broad overview of the functional and information models defined in the Reference Model for an Open Archival Information System (OAIS) and describes the main elements of the Cedars outline preservation metadata specification. The next sections take a brief look at related metadata initiatives, make some recommendations for future work and comment on cost issues. At the end there are some brief recommendations for collecting institutions and the creators of digital content followed by some suggestions for further reading.
Critical Arguements
CA "This document is intended to provide a brief introduction to current preservation metadata developments and introduce the outline metadata specifications produced by the Cedars project. It is aimed in particular at those who may have responsibility for digital preservation in the UK further and higher education community, e.g. senior staff in research libraries and computing services. It should also be useful for those undertaking digital content creation (digitisation) initiatives, although it should be noted that specific guidance on this is available elsewhere. The guide may also be of interest to other kinds of organisations that have an interest in the long-term management of digital resources, e.g. publishers, archivists and records managers, broadcasters, etc. This document aimes to provide: A rationale for the creation and maintenance of preservation metadata to support digital preservation strategies, e.g. migration or emulation; An introduction to the concepts and terminology used in the influential ISO Reference Model for an Open Archival Information System (OAIS); Brief information on the Cedars outline preservation metadata specification and the outcomes of some related metadata initiatives; Some notes on the cost implications of preservation metadata and how these might be reduced.
Conclusions
RQ "In June 2000, a group of archivists, computer scientists and metadata experts met in the Netherlands to discuss metadata developments related to recordkeeping and the long-term preservation of archives. One of the key conclusions made at this working meeting was that the recordkeeping metadata communities should attempt to co-operate more with other metatdata initiatives. The meeting also suggested research into the contexts of creation and use, e.g. identifying factors that might encourage or discourage creators form meeting recordkeeping metadata requirements. This kind of research would also be useful for wider preservation metadata developments. One outcome of this meeting was the setting up of an Archiving Metadata Forum (AMF) to form the focus of future developments." ... "Future work on preservation metadata will need to focus on several key issues. Firstly, there is an urgent need for more practical experience of undertaking digital preservation strategies. Until now, many preservation metadata initiatives have largely been based on theoretical considerations or high-level models like the OAIS. This is not in itself a bad thing, but it is now time to begin to build metadata into the design of working systems that can test the viability of digital preservation strategies in a variety of contexts. This process has already begun in initiatives like the Victorian Electronic Records Stategy and the San Diego Supercomputer Center's 'self-validating knowledge-based archives'. A second need is for increased co-operation between the many metadata initiatives that have an interest in digital preservation. This may include the comparison and harmonisation of various metadata specifications, where this is possible. The OCLC/LG working group is an example of how this has been taken forward whitin a particular domain. There is a need for additional co-operation with recordkeeping metadata specialists, computing scientists and others in the metadata research community. Thirdly, there is a need for more detailed research into how metadata will interact with different formats, preservation strategies and communities of users. This may include some analysis of what metadata could be automatically extracted as part of the ingest process, an investigation of the role of content creators in metadata provision, and the production of user requirements." ... "Also, thought should be given to the development of metadata standards that will permit the easy exchange of preservation metadata (and information packages) between repositories." ... "As well as ensuring that digital repositories are able to facilitate the automatic capture of metadata, some thought should also be given to how best digital repositories could deal with any metadata that might already exist."
SOW
DC "Funded by JISC (the Joint Information Systems Committee of the UK higher education funding councils), as part of its Electronic Libraries (eLib) Programme, Cedars was the only project in the programme to focus on digital preservation." ... "In the digitial library domain, the development of a recommendation on preservation metadata is being co-ordinated by a working group supported by OCLC and the RLG. The membership of the working group is international, and inlcudes key individuals who were involved in the development of the Cedars, NEDLIB and NLA metadata specifications."
The CDISC Submission Metadata Model was created to help ensure that the supporting metadata for these submission datasets meets the following objectives: provide FDA reviewers with clear descriptions of the usage, structure, contents, and attributes of all datasets and variables; allow reviewers to replicate most analyses, tables, graphs, and listings with minimal or no transformations; enable reviewers to easily view and subset the data used to generate any analysis, table, graph, or listing without complex programming. ... The CDISC Submission Metadata Model has been defined to guide sponsors in the preparation of data that is to be submitted to the FDA. By following the principles of this model, sponsors will help reviewers to accurately interpret the contents of submitted data and work with it more effectively, without sacrificing the scientific objectives of clinical development.
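For illustration only (the attribute names below are assumptions, not the CDISC element set), variable-level metadata of the kind the model calls for might be represented like this:

```python
# Sketch of variable-level submission metadata: each dataset variable
# carries a label, type, and origin so a reviewer can interpret the
# data without guesswork. Attribute and variable names are illustrative.
from dataclasses import dataclass

@dataclass
class VariableMetadata:
    name: str     # variable name as it appears in the dataset
    label: str    # human-readable description for the reviewer
    type: str     # e.g. "char" or "num"
    origin: str   # where the value came from, e.g. a CRF page

ae_variables = [
    VariableMetadata("USUBJID", "Unique subject identifier", "char", "derived"),
    VariableMetadata("AETERM", "Reported adverse event term", "char", "CRF page 12"),
    VariableMetadata("AESTDT", "Adverse event start date", "num", "CRF page 12"),
]

for v in ae_variables:
    print(f"{v.name:8} {v.type:4} {v.label} [{v.origin}]")
```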
Publisher
The Clinical Data Interchange Standards Consortium
Critical Arguements
CA "The CDISC Submission Data Model has focused on the use of effective metadata as the most practical way of establishing meaningful standards applicable to electronic data submitted for FDA review."
Conclusions
RQ "Metadata prepared for a domain (such as an efficacy domain) which has not been described in a CDISC model should follow the general format of the safety domains, including the same set of core selection variables and all of the metadata attributes specified for the safety domains. Additional examples and usage guidelines are available on the CDISC web site at www.cdisc.org." ... "The CDISC Metadata Model describes the structure and form of data, not the content. However, the varying nature of clinical data in general will require the sponsor to make some decisions about how to represent certain real-world conditions in the dataset. Therefore, it is useful for a metadata document to give the reviewer an indication of how the datasets handle certain special cases."
SOW
DC CDISC is an open, multidisciplinary, non-profit organization committed to the development of worldwide standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development. CDISC members work together to establish universally accepted data standards in the pharmaceutical, biotechnology and device industries, as well as in regulatory agencies worldwide. CDISC currently has more than 90 members, including the majority of the major global pharmaceutical companies.
Type
Web Page
Title
CDISC Achieves Two Significant Milestones in the Development of Models for Data Interchange
CA "The Clinical Data Interchange Standards Consortium has achieved two significant milestones towards its goal of standard data models to streamline drug development and regulatory review processes. CDISC participants have completed metadata models for the 12 safety domains listed in the FDA Guidance regarding Electronic Submissions and have produced a revised XML-based data model to support data acquisition and archive."
Conclusions
RQ "The goal of the CDISC XML Document Type Definition (DTD) Version 1.0 is to make available a first release of the definition of this CDISC model, in order to support sponsors, vendors and CROs in the design of systems and processes around a standard interchange format."
SOW
DC "This team, under the leadership of Wayne Kubick of Lincoln Technologies, and Dave Christiansen of Genentech, presented their metadata models to a group of representatives at the FDA on Oct. 10, and discussed future cooperative efforts with Agency reviewers."... "CDISC is a non-profit organization with a mission to lead the development of standard, vendor-neutral, platform-independent data models that improve process efficiency while supporting the scientific nature of clinical research in the biopharmaceutical and healthcare industries"
The creation and use of metadata is likely to become an important part of all digital preservation strategies, whether they are based on hardware and software conservation, emulation or migration. The UK Cedars project aims to promote awareness of the importance of digital preservation, to produce strategic frameworks for digital collection management policies and to promote methods appropriate for long-term preservation - including the creation of appropriate metadata. Preservation metadata is a specialised form of administrative metadata that can be used as a means of storing the technical information that supports the preservation of digital objects. In addition, it can be used to record migration and emulation strategies, to help ensure authenticity, and to note rights management and collection management data; it will also need to interact with resource discovery metadata. The Cedars project is attempting to investigate some of these issues and will provide some demonstrator systems to test them.
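A hypothetical sketch of such a preservation metadata record (the field names are illustrative, not the Cedars element set) might look like:

```python
# Illustrative preservation metadata: administrative metadata recording
# technical context, preservation actions, fixity, and rights alongside
# the digital object. Field names are invented for this sketch.
preservation_metadata = {
    "identifier": "uk-ac-xxx-0001",              # persistent identifier
    "technical": {
        "format": "PDF 1.3",
        "rendering_software": "Acrobat Reader 4.0",
        "platform": "Windows NT 4.0",
    },
    "provenance": [                              # migration/emulation history
        {"action": "migration", "from": "WordPerfect 5.1",
         "to": "PDF 1.3", "date": "1998-09-01"},
    ],
    "fixity": {"algorithm": "MD5", "value": "d41d8cd98f00b204e9800998ecf8427e"},
    "rights": "Deposited under licence; onsite access only",
}
```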
Notes
This article was presented at the Joint RLG and NPO Preservation Conference: Guidelines for Digital Imaging, held September 28-30, 1998.
Critical Arguements
CA "Cedars is a project that aims to address strategic, methodological and practical issues relating to digital preservation (Day 1998a). A key outcome of the project will be to improve awareness of digital preservation issues, especially within the UK higher education sector. Attempts will be made to identify and disseminate: Strategies for collection management ; Strategies for long-term preservation. These strategies will need to be appropriate to a variety of resources in library collections. The project will also include the development of demonstrators to test the technical and organisational feasibility of the chosen preservation strategies. One strand of this work relates to the identification of preservation metadata and a metadata implementation that can be tested in the demonstrators." ... "The Cedars Access Issues Working Group has produced a preliminary study of preservation metadata and the issues that surround it (Day 1998b). This study describes some digital preservation initiatives and models with relation to the Cedars project and will be used as a basis for the development of a preservation metadata implementation in the project. The remainder of this paper will describe some of the metadata approaches found in these initiatives."
Conclusions
RQ "The Cedars project is interested in helping to develop suitable collection management policies for research libraries." ... "The definition and implementation of preservation metadata systems is going to be an important part of the work of custodial organisations in the digital environment."
SOW
DC "The Cedars (CURL exemplars in digital archives) project is funded by the Joint Information Systems Committee (JISC) of the UK higher education funding councils under Phase III of its Electronic Libraries (eLib) Programme. The project is administered through the Consortium of University Research Libraries (CURL) with lead sites based at the Universities of Cambridge, Leeds and Oxford."
Type
Web Page
Title
Metadata for preservation : CEDARS project document AIW01
This report is a review of metadata formats and initiatives in the specific area of digital preservation. It supplements the DESIRE Review of metadata (Dempsey et al. 1997). It is based on a literature review and information picked up at a number of workshops and meetings, and is an attempt to briefly describe the state of the art in the area of metadata for digital preservation.
Critical Arguements
CA "The projects, initiatives and formats reviewed in this report show that much work remains to be done. . . . The adoption of persistent and unique identifiers is vital, both in the CEDARS project and outside. Many of these initiatives mention "wrappers", "containers" and "frameworks". Some thought should be given to how metadata should be integrated with data content in CEDARS. Authenticity (or intellectual preservation) is going to be important. It will be interesting to investigate whether some archivists' concerns with custody or "distributed custody" will have relevance to CEDARS."
Conclusions
RQ Which standards and initiatives described in this document have proved viable preservation metadata models?
SOW
DC OAIS emerged out of an initiative spearheaded by NASA's Consultative Committee for Space Data Systems. It has been shaped and promoted by the RLG and OCLC. Several international projects have played key roles in shaping the OAIS model and adapting it for use in libraries, archives and research repositories. OAIS-modeled repositories include the CEDARS Project, Harvard's Digital Repository, Koninklijke Bibliotheek (KB), the Library of Congress' Archival Information Package for audiovisual materials, MIT's D-Space, OCLC's Digital Archive and TERM: the Texas Email Repository Model.
Type
Web Page
Title
Approaches towards the Long Term Preservation of Archival Digital Records
The Digital Preservation Testbed is carrying out experiments according to pre-defined research questions to establish the best preservation approach or combination of approaches. The Testbed will be focusing its attention on three different digital preservation approaches - Migration; Emulation; and XML - evaluating the effectiveness of these approaches, their limitations, costs, risks, uses, and resource requirements.
Language
English; Dutch
Critical Arguements
CA "The main problem surrounding the preservation of authentic electronic records is that of technology obsolescence. As changes in technology continue to increase exponentially, the problem arises of what to do with records that were created using old and now obsolete hardware and software. Unless action is taken now, there is no guarantee that the current computing environment (and thus also records) will be accessible and readable by future computing environments."
Conclusions
RQ "The Testbed will be conducting research to discover if there is an inviolable way to associate metadata with records and to assess the limitations such an approach may incur. We are also working on the provision of a proposed set of preservation metadata that will contain information about the preservation approach taken and any specific authenticity requirements."
SOW
DC The Digital Preservation Testbed is part of the non-profit organisation ICTU. ICTU is the Dutch organisation for ICT and government. ICTU's goal is to contribute to the structural development of e-government. This will result in improving the work processes of government organisations, their service to the community and interaction with the citizens. Government institutions, such as Ministries, design the policies in the area of e-government, and ICTU translates these policies into projects. In many cases, more than one institution is involved in a single project. They are the principals in the projects and retain control concerning the focus of the project. In case of the Digital Preservation Testbed the principals are the Ministry of the Interior and the Dutch National Archives.
Type
Web Page
Title
Towards a Digital Rights Expression Language Standard for Learning Technology
Critical Arguements
CA The Learning Technology Standards Committee (LTSC) of the Institute of Electrical and Electronics Engineers (IEEE) concentrated on making recommendations for standardizing a digital rights expression language (DREL) with the specific charge to (1) investigate existing standards development efforts for DREL and digital rights; (2) gather DREL requirements germane to the learning, education, and training industries; (3) make recommendations as to how to proceed; and (4) feed requirements into ongoing DREL and digital rights standardization efforts, regardless of whether the LTSC decides to work with these efforts or embark on its own. This report represents the achievement of these goals in the form of a white paper that can be used as a reference for the LTSC, reports on the current state of existing and proposed standardization efforts targeting digital rights expression languages, and makes recommendations concerning future work.
Conclusions
RQ The recommendations of this report are: 1. Maintain appropriate liaisons between learning technology standards development organizations and those standards development organizations standardizing rights expression languages. The purpose of these liaisons is to continue to feed requirements into broader standardization efforts and to ensure that the voice of the learning, education and training community is heard. 2. Support the creation of application profiles or extensions of XrML and ODRL that include categories and vocabularies for roles common in educational and training settings. In the case of XrML, a name space for local context may be needed. (A name space is required for both XrML and ODRL for the "application profile", or specifically the application – LT application – extension.) 3. Advocate the creation of a standard for expressing local policies in ways that can be mapped to rights expressions. This could be either through a data model or through the definition of an API or service. 4. Launch an initiative to identify models of rights enforcement in learning technology and to possibly abstract a common model for use by architecture and framework definition projects. 5. Further study the implications of patent claims, especially for educational and research purposes.
Type
Web Page
Title
Softening the borderlines of archives through XML - a case study
Archives have always had trouble getting metadata in formats they can process. With XML, these problems are lessening. Many applications today provide the option of exporting data into an application-defined XML format that can easily be post-processed using XSLT, schema mappers, etc., to fit the archives' needs. This paper highlights two practical examples of the use of XML in the Swiss Federal Archives and discusses advantages and disadvantages of XML in these examples. The first use of XML is the import of existing metadata describing debates at the Swiss parliament, whereas the second concerns preservation of metadata in the archiving of relational databases. We have found that the use of XML for metadata encoding is beneficial for the archives, especially for its ease of editing, built-in validation and ease of transformation.
Notes
The Swiss Federal Archives defines the norms and basis of records management and advises departments of the Federal Administration on their implementation. http://www.bar.admin.ch/bar/engine/ShowPage?pageName=ueberlieferung_aktenfuehrung.jsp
Critical Arguements
CA "This paper briefly discusses possible uses of XML in an archival context and the policies of the Swiss Federal Archives concerning this use (Section 2), provides a rough overview of the applications we have that use XML (Section 3) and the experiences we made (Section 4)."
Conclusions
RQ "The systems described above are now just being deployed into real world use, so the experiences presented here are drawn from the development process and preliminary testing. No hard facts in testing the sustainability of XML could be gathered, as the test is time itself. This test will be passed when we can still access the data stored today, including all metadata, in ten or twenty years." ... "The main problem area with our applications was the encoding of the XML documents and the non-standard XML document generation of some applications. When dealing with the different encodings (UTF-8, UTF-16, ISO-8859-1, etc) some applications purported a different encoding in the header of the XML document than the true encoding of the document. These errors were quickly identified, as no application was able to read the documents."
SOW
DC The author is currently a private digital archives consultant, but at the time of this article, was a data architect for the Swiss Federal Archives. The content of this article owes much to the work being done by a team of architects and engineers at the Archives, who are working on an e-government project called ARELDA (Archiving of Electronic Data and Records).
Type
Web Page
Title
Report of the Ad Hoc Committee for Development of a Standardized Tool for Encoding Finding Aids
This report focuses on the development of tools for the description and intellectual control of archives and the discovery of relevant resources by users. Other archival functions, such as appraisal, acquisition, preservation, and physical control, are beyond the scope of this project. The system developed as a result of this report should be usable on stand-alone computers in small institutions, by multiple users in larger organisations, and by local, regional, national, and international networks. The development of such a system should take into account the strategies, experiences, and results of other initiatives such as the European Union Archival Network (EUAN), the Linking and Exploring Authority Files (LEAF) initiative, the European Visual Archives (EVA) project, and the Canadian Archival Information Network (CAIN). This report is divided into five sections. A description of the conceptual structure of an archival information system, described as six layers of services and protocols, follows this introduction. Section three details the functional requirements for the software tool and is followed by a discussion of the relationship of these requirements to existing archival software applications. The report concludes with a series of recommendations that provide a strategy for the successful development, deployment, and maintenance of an Open Source Archival Resource Information System (OSARIS). There are two appendices: a data model and a comparison of the functional requirements statements to several existing archival systems.
Notes
3. Functional Requirements Requirements for Information Interchange 3.2: The system must support the current archival standards for machine-readable data communication, Encoded Archival Description (EAD) and Encoded Archival Context (EAC). A subset of elements found in EAD may be used to exchange descriptions based on ISAD(G) while elements in EAC may be used to exchange ISAAR(CPF)-based authority data.
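By way of illustration, the ISAD(G)-to-EAD exchange subset that requirement 3.2 envisages could be expressed as a simple crosswalk table. The pairings below follow commonly published ISAD(G)/EAD crosswalks but are shown here only as a sketch, not the committee's normative mapping:

```python
# Illustrative ISAD(G) -> EAD element pairings for exchanging
# collection descriptions between systems (sketch only).
ISADG_TO_EAD = {
    "3.1.1 Reference code": "unitid",
    "3.1.2 Title": "unittitle",
    "3.1.3 Dates": "unitdate",
    "3.1.4 Level of description": "archdesc @level attribute",
    "3.2.1 Name of creator": "origination",
    "3.3.1 Scope and content": "scopecontent",
}

for isadg, ead in ISADG_TO_EAD.items():
    print(f"{isadg:28} -> {ead}")
```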
Publisher
International Council on Archives Committee on Descriptive Standards
Critical Arguements
CA The Ad Hoc Committee agrees that it would be highly desirable to develop a modular, open source software tool that could be used by archives worldwide to manage the intellectual control of their holdings through the recording of standardized descriptive data. Individual archives could combine their data with that of other institutions in regional, national or international networks. Researchers could access this data either via a stand-alone computerized system or over the Internet. The model for this software would be the successful UNESCO-sponsored free library program, ISIS, which has been in widespread use around the developing world for many years. The software, with appropriate supporting documentation, would be freely available via an ICA or UNESCO web site or on CD-ROM. Unlike ISIS, however, the source code and not just the software should be freely available.
Conclusions
RQ "1. That the ICA endorses the functional requirements presented in this document as the basis for moving the initiative forward. 2. That the functional desiderata and technical specifications for the software applications, such as user requirements, business rules, and detailed data models, should be developed further by a team of experts from both ICA/CDS and ICA/ITC as the next stage of this project. 3. That following the finalization of the technical specifications for OSARIS, the requirements should be compared to existing systems and a decision made to adopt or adapt existing software or to build new applications. At that point in time, it will then be possible to estimate project costs. 4. That a solution that incorporates the functional requirements result in the development of several modular software applications. 5. That the implementation of the system should follow a modular strategy. 6. That the development of software applications must include a thorough investigation and assessment of existing solutions beginning with those identified in section four and Appendix B of this document. 7. That the ICA develop a strategy for communicating the progress of this project to members of the international archival community on a regular basis. This would include the distribution of progress reports in multiple languages. The communication strategy must include a two-way exchange of ideas. The project will benefit strongly from the ongoing comments, suggestions, and input of the members of the international archival community. 8. That a test-bed be developed to allow the testing of software solutions in a realistic archival environment. 9. That the system specifications, its documentation, and the source codes for the applications be freely available. 10. That training courses for new users, ongoing education, and webbased support groups be established. 11. That promotion of the software be carried out through the existing regional infrastructure of ICA and through UNESCO. 12. That an infrastructure for ongoing maintenance, distribution, and technical support be developed. This should include a web site to download software and supporting documentation. The ICA should also establish and maintain a mechanism for end-users to recommend changes and enhancements to the software. 13. That the ICA establishes and maintains an official mechanism for regular review of the software by an advisory committee that includes technical and archival experts. "
SOW
DC "The development of such a system should take into account the strategies, experiences, and results of other initiatives such as the European Union Archival Network (EUAN), the Linking and Exploring Authority Files (LEAF) initiative, the European Visual Archives (EVA) project, and the Canadian Archival Information Network (CAIN)."
Type
Web Page
Title
Metadata Reference Guide: ONIX ONline Information eXchange
Critical Arguements
CA According to Editeur, the group responsible for the maintenance of the ONIX standard, ONIX is the international standard for representing book, serial, and video product information in electronic form.
"The ERMS Metadata Standard forms Part 2 of the National Archives' 'Requirements for Electronic Records Management Systems' (commonly known as the '2002 Requirements'). It is specified in a technology independent manner, and is aligned with the e-Government Metadata Standard (e-GMS) version 2, April 2003. A version of e-GMS v2 including XML examples was published in the autumn of 2003. This Guide should be read in conjunction with the ERMS Metadata Standard. Readers may find the GovTalk Schema Guidelines (available via http://www.govtalk.gov.uk ) helpful regarding design rules used in building the schemas."
Conclusions
RQ Electronically enabled processes need to generate appropriate records, according to established records management principles. These records need to reach the ERMS that captures them with enough information to enable the ERMS to classify them appropriately, allocate an appropriate retention policy, etc.
SOW
DC This document is a draft.
Type
Web Page
Title
Use of Encoded Archival Description (EAD) for Manuscript Collection Finding Aids
Presented in 1999 to the Library's Collection Development & Management Committee, this report outlines support for implementing EAD in delivery of finding aids for library collections over the Web. It describes the limitations of HTML, provides an introduction to SGML, XML, and EAD, outlines the advantages of conversion from HTML to EAD, the conversion process, the proposed outcome, and sources for further information.
Publisher
National Library of Australia
Critical Arguements
CA As use of the World Wide Web has increased, so has the need of users to be able to discover web-based information resources easily and efficiently, and to be able to repeat that discovery in a consistent manner. Using SGML to mark up web-based documents facilitates such resource discovery.
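The delivery model the report proposes, an EAD/XML finding aid transformed into HTML for the web, can be sketched as follows. This toy stylesheet uses the third-party lxml package and is not the Library's production workflow:

```python
# Render an EAD fragment to HTML with a minimal XSLT stylesheet.
from lxml import etree

xslt = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/ead">
    <html><body>
      <h1><xsl:value-of select="archdesc/did/unittitle"/></h1>
    </body></html>
  </xsl:template>
</xsl:stylesheet>""")

ead = etree.XML(b"""<ead><archdesc><did>
  <unittitle>Records of the Exploration Committee</unittitle>
</did></archdesc></ead>""")

print(str(etree.XSLT(xslt)(ead)))
# <html><body><h1>Records of the Exploration Committee</h1></body></html>
```

Keeping the finding aid in EAD and generating HTML on demand, rather than hand-maintaining HTML, is precisely the advantage of conversion that the report argues for.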
Conclusions
RQ To what extent have the mainstream web browser companies fulfilled their commitment to support native viewing of SGML/XML documents?
During the past decade, the recordkeeping practices in public and private organizations have been revolutionized. New information technologies, from mainframes to PCs to local area networks and the Internet, have transformed the way state agencies create, use, disseminate, and store information. These new technologies offer a vastly enhanced means of collecting information for and about citizens, communicating within state government and between state agencies and the public, and documenting the business of government. Like other modern organizations, Ohio state agencies face challenges in managing and preserving their records because records are increasingly generated and stored in computer-based information systems. The Ohio Historical Society serves as the official State Archives with responsibility to assist state and local agencies in the preservation of records with enduring value. The Office of the State Records Administrator within the Department of Administrative Services (DAS) provides advice to state agencies on the proper management and disposition of government records. Out of concern over its ability to preserve electronic records with enduring value and assist agencies with electronic records issues, the State Archives has adapted these guidelines from guidelines created by the Kansas State Historical Society. The Kansas State Historical Society, through the Kansas State Historical Records Advisory Board, requested a program development grant from the National Historical Publications and Records Commission to develop policies and guidelines for electronic records management in the state of Kansas. With grant funds, the KSHS hired a consultant, Dr. Margaret Hedstrom, an Associate Professor in the School of Information, University of Michigan, and formerly Chief of State Records Advisory Services at the New York State Archives and Records Administration, to draft guidelines that could be tested, revised, and then implemented in Kansas state government.
Notes
These guidelines are part of the ongoing effort to address the electronic records management needs of Ohio state government. As a result, this document continues to undergo changes. The first draft, written by Dr. Margaret Hedstrom, was completed in November of 1997 for the Kansas State Historical Society. That version was reorganized and updated and posted to the KSHS Web site on August 18, 1999. The Kansas Guidelines were modified for use in Ohio during September 2000.
Critical Arguements
CA "This publication is about maintaining accountability and preserving important historical records in the electronic age. It is designed to provide guidance to users and managers of computer systems in Ohio government about: the problems associated with managing electronic records, special recordkeeping and accountability concerns that arise in the context of electronic government; archival strategies for the identification, management and preservation of electronic records with enduring value; identification and appropriate disposition of electronic records with short-term value, and
Type
Web Page
Title
Online Archive of California Best Practice Guidelines for Encoded Archival Description, Version 1.1
These guidelines were prepared by the OAC Working Group's Metadata Standards Subcommittee during the spring and summer of 2003. This version of the OAC BPG EAD draws substantially on the ...
Language
Anonymous
Type
Web Page
Title
Requirements for Electronic Records Management Systems: (2) Metadata Standard
Requirements for Electronic Records Management Systems includes: (1) "Functional Requirements" (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/requirementsfinal.pdf); (2) "Metadata Standard" (the subject of this record); (3) Reference Document (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/referencefinal.pdf); and (4) "Implementation Guidance: Configuration and Metadata Issues" (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/implementation.pdf)
Publisher
Public Records Office, [British] National Archives
Critical Arguements
CA Sets out the implications for records management metadata in compliant systems. It has been agreed with the Office of the e-Envoy that this document will form the basis for an XML schema to support the exchange of records metadata and promote interoperability between ERMS and other systems.
SOW
DC The National Archives updated the functional requirements for electronic records management systems (ERMS) in collaboration with the central government records management community during 2002. The revision takes account of developments in cross-government and international standards since 1999.
Type
Web Page
Title
Descriptive Metadata Guidelines for RLG Cultural Materials
To ensure that the digital collections submitted to RLG Cultural Materials can be discovered and understood, RLG has compiled these Descriptive Metadata Guidelines for contributors. While these guidelines reflect the needs of one particular service, they also represent a case study in information sharing across community and national boundaries. RLG Cultural Materials engages a wide range of contributors with different local practices and institutional priorities. Since it is impossible to find -- and impractical to impose -- one universally applicable standard as a submission format, RLG encourages contributors to follow the suite of standards applicable to their particular community (p.1).
Critical Arguements
CA "These guidelines . . . do not set a new standard for metadata submission, but rather support a baseline that can be met by any number of strategies, enabling participating institutions to leverage their local descriptions. These guidelines also highlight the types of metadata that enhance functionality for RLG Cultural Materials. After a contributor submits a collection, RLG maps that description into the RLG Cultural Materials database using the RLG Cultural Materials data model. This ensures that metadata from the various participant communities is integrated for efficient searching and retrieval" (p.1).
Conclusions
RQ Not applicable.
SOW
DC RLG comprises more than 150 research and cultural memory institutions, and RLG Cultural Materials elicits contributions from countless museums, archives, and libraries from around the world that, although they might retain local descriptive standards and metadata schemas, must conform to the baseline standards prescribed in this document in order to integrate into RLG Cultural Materials. Appendix A presents and evaluates the most common metadata standards with which RLG Cultural Materials is able to work.
Type
Web Page
Title
Interactive Fiction Metadata Element Set version 1.1, IFMES 1.1 Specification
This document defines a set of metadata elements for describing Interactive Fiction games. These elements incorporate and enhance most of the previous metadata formats currently in use for Interactive Fiction and attempt to bridge them to modern standards such as the Dublin Core.
Critical Arguements
CA "There are already many metadata standards in use, both in the Interactive Fiction community and the internet at large. The standards used by the IF community cover a range of technologies, but none are fully compatible with bleeding-edge internet technology like the Semantic Web. Broader-based formats such as the Dublin Core are designed for the Semantic Web, but lack the specialized fields needed to describe Interactive Fiction. The Interactive Fiction Metadata Element Set was designed with three purposes. One, to fill in the specialized elements that Dublin Core lacks. Two, to unify the various metadata formats already in use in the IF community into a single standard. Three, to bridge these older standards to the Dublin Core element set by means of the RDF subclassing system. It is not IFMES's goal to provide every single metadata element needed. RDF, XML, and other namespace-aware languages can freely mix different vocabularies, therefore IFMES does not subclass Dublin Core elements that do not relate to previous Interactive Fiction metadata standards. For these elements, IFMES recommends using the existing Dublin Core vocabulary, to maximize interoperability with other tools and communities."
Conclusions
RQ "Several of the IFMES elements can take multiple values. Finding a standard method of expressing multiple values is tricky. The approved method in RDF is either to repeat the predicate with different objects, or create a container as a child object. However, some RDF parsers don't work well with either of these methods, and many other languages don't allow them at all. XML has a value list format in which the values are separated with spaces, however this precludes spaces from appearing within the values themselves. A few legacy HTML attributes whose content models were never formally defined used commas to separate values that might contain spaces, and a few URI schemes accept multiple values separated by semicolons. The IFMES discussion group continues to examine this problem, and hopes to have a well-defined solution by the time this document reaches Candidate Recommendation status. For the time being IFMES recommends repeating the elements whenever possible, and using a container when that fails (for example, JSON could set the value to an Array). If an implementation simply must concatenate the values into a single string, the recommended separator is a space for URI and numeric types, and a comma followed by a space for text types."
SOW
DC The authors are writers and programmers in the interactive fiction community.
Type
Web Page
Title
Archiving of Electronic Digital Data and Records in the Swiss Federal Archives (ARELDA): e-government project ARELDA - Management Summary
The goal of the ARELDA project is to find long-term solutions for the archiving of digital records in the Swiss Federal Archives. This includes the accession, the long-term storage, preservation of data, description, and access for the users of the Swiss Federal Archives. It is also coordinated with the basic efforts of the Federal Archives to realize a uniform records management solution in the federal administration and therefore to support the pre-archival creation of documents of archival value, for the benefit of the administration as well as of the Federal Archives. The project is indispensable for the long-term execution of the Federal Archives Act: older IT systems are being replaced by newer ones, and a complete migration of the data is sometimes not possible or too expensive; there is a constant increase of small database applications, built and maintained by people with no IT background; and more and more administrative bodies are introducing records and document management systems.
Publisher
Swiss Federal Archives
Publication Location
Bern
Critical Arguements
CA "Archiving in general is a necessary prerequisite for the reconstruction of governmental activities as well as for the principle of legal certainty. It enables citizens to understand governmental activities and ensures a democratic control of the federal administration. And finally are archives a prerequisite for the scientific research, especially in the social and historical fields and ensure the preservation of our cultural heritage. It plays a vital role for an ongoing and efficient records management. A necessary prerequisite for the Federal Archives in the era of the information society will be the system ARELDA (Archiving of Electronic Data and Records)."
Conclusions
RQ "Because of the lack of standard solutions and limited or lacking personal resources for an internal development effort, the realisation of ARELDA will have to be outsourced and the cooperation with the IT division and the Federal Office for Information Technology, Systems and Telecommunication must be intensified. The guidelines for the projects are as follows:
SOW
DC ARELDA is one of the five key projects in the Swiss government's e-government strategy.
Museums and the Online Archive of California (MOAC) builds on existing standards and their implementation guidelines provided by the Online Archive of California (OAC) and its parent organization, the California Digital Library (CDL). Setting project standards for MOAC consisted of interpreting existing OAC/CDL documents and adapting them to the project's specific needs, while at the same time maintaining compliance with OAC/CDL guidelines. The present overview of the MOAC technical standards references both the OAC/CDL umbrella document and the MOAC implementation/adaptation document at the beginning of each section, as well as related resources which provide more detail on project specifications.
Critical Arguements
CA The project implements specifications for digital image production, as well as three interlocking file exchange formats for delivering collections, digital images and their respective metadata. Encoded Archival Description (EAD) XML describes the hierarchy of a collection down to the item-level and traditionally serves for discovering both the collection and the individual items within it. For viewing multiple images associated with a single object record, MOAC utilizes Making of America 2 (MOA2) XML. MOA2 makes the images representing an item available to the viewer through a navigable table of contents; the display mimics the behavior of the analog item by e.g. allowing end-users to browse through the pages of an artist's book. Through the further extension of MOA2 with Text Encoding Initiative (TEI) Lite XML, not only does every single page of the book display in its correct order, but a transcription of its textual content also accompanies the digital images.
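A simplified sketch (with invented element names, not the actual MOA2 DTD) of how a structural map lets a viewer page through an object in order, pairing each image with its TEI transcription:

```python
# Walk a MOA2-style structural map: pages in declared order, each
# image paired with its transcription file. Element names invented.
import xml.etree.ElementTree as ET

moa2 = ET.fromstring("""<object label="Artist's book">
  <page order="2" image="p002.jpg" transcription="p002.tei.xml"/>
  <page order="1" image="p001.jpg" transcription="p001.tei.xml"/>
</object>""")

for page in sorted(moa2.findall("page"), key=lambda p: int(p.get("order"))):
    print(page.get("order"), page.get("image"), "->", page.get("transcription"))
# 1 p001.jpg -> p001.tei.xml
# 2 p002.jpg -> p002.tei.xml
```

The explicit ordering is what allows the display to mimic the analog item, letting an end-user browse the pages of the book in sequence.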
Conclusions
RQ "These two instances of fairly significant changes in the project's specifications may serve as a gentle reminder that despite its solid foundation in standards, the MOAC information architecture will continue to face the challenge of an ever-changing technical environment."
SOW
DC The author is Digital Media Developer at the UC Berkeley Art Museum & Pacific Film Archives, a member of the MOAC consortium.