CA "The Digital Preservation Testbed is researching three different approaches to long-term digital preservation: migration, emulation and XML. Not only will the effectiveness of each approach be evaluated, but also their limits, costs and application potential. Experiments are taking place on text documents, spreadsheets, emails and databases of different size, complexity and nature."
Conclusions
RQ "New experiments expected in 2002 are the migration of spreadsheets, conversion of spreadsheets and databases into XML and a proof of concept with the UVC for text documents and spreadsheets. ... Eventually at the end of 2003 the Testbed project will provide: advice on how to deal with current digital records; recommendations for an appropriate preservation strategy or a combination ofstrategies; functional requirements for a preservation function; cost models of the various preservation strategies; a decision model for preservation strategy; recommendations concerning guidelines and regulations."
SOW
DC "The Digital Preservation Testbed is part of the non-profit organisation ICTU. ICTU isthe Dutch organisation for ICT and government. ICTU's goal is to contribute to the structural development of e-government. This will result in improving the work processes of government organisations, their service to the community and interaction with the citizens. ... In case of the Digital Preservation Testbed the principals are the Ministry of the Interior, Jan Lintsen and the Dutch National Archives, Maarten van Boven. Together with Public Key Infrastructure, Digital Longevity is the fundament of the ELO-house."
Type
Journal
Title
When Documents Deceive: Trust and Provenance as New Factors for Information Retrieval in a Tangled Web
Journal of the American Society for Information Science and Technology
Periodical Abbreviation
JASIST
Publication Year
2001
Volume
52
Issue
1
Pages
12
Publisher
John Wiley & Sons
Critical Arguments
"This brief and somewhat informal article outlines a personal view of the changing framework for information retrieval suggested by the Web environment, and then goes on to speculate about how some of these changes may manifest in upcoming generations of information retrieval systems. It also sketches some ideas about the broader context of trust management infrastructure that will be needed to support these developments, and it points towards a number of new research agendas that will be critical during this decade. The pursuit of these agendas is going to call for new collaborations between information scientists and a wide range of other disciplines." (p. 12) Discusses public key infrastructure (PKI) and Pretty Good Practice (PGP) systems as steps toward ensuring the trustworthiness of metadata online, but explains their limitations. Makes a distinction between the identify of providers of metadata and their behavior, arguing that it is the latter we need to be concerned with.
Phrases
<P1> Surrogates are assumed to be accurate because they are produced by trusted parties, who are the only parties allowed to contribute records to these databases. Documents (full documents or surrogate records) are viewed as passive; they do not actively deceive the IR system.... Compare this to the realities of the Web environment. Anyone can create any metadata they want about any object on the net, with any motivation. (p. 13) <P2> Sites interested in manipulating the results of the indexing process rapidly began to exploit the difference between the document as viewed by the user and the document as analyzed by the indexing crawler through a set of techniques broadly called "index spamming." <P3> Pagejacking might be defined generally as providing arbitrary documents with independent arbitrary index entries. Clearly, building information retrieval systems to cope with this environment is a huge problem. (p. 14) <P4> [T]he tools are coming into place that let one determine the source of a metadata assertion (or, more precisely and more generally) the identity of the person or organization that stands behind the assertion, and to establish a level of trust in this identity. (p. 16) <P5> It is essential to recognize that in the information retrieval context one is not concerned so much with identity as with behavior. ... This distinction is often overlooked or misunderstood in discussions about what problems PKI is likely to solve: identity alone does not necessarily solve the problem of whether to trust information provided by, or warranted by, that identity. ... And all of the technology for propagating trust, either in hierarchical (PKI) or web-of-trust identity management, is purely about trust in identity. (p. 16) <P6> The question of formalizing and recording expectations about behavior, or trust in behavior, are extraordinarily complex, and as far as I know, very poorly explored. (p. 16) <P7> [A]n appeal to certification or rating services simply shifts the problem: how are these services going to track, evaluate, and rate behavior, or certify skills and behavior? (p. 16) <P8> An individual should be able to decide how he or she is willing to have identity established, and when to believe information created by or associated with such an identity. Further, each individual should be able to have this personal database evolve over time based on experience and changing beliefs. (p. 16) <P9> [T]he ability to scale and to respond to a dynamic environment in which new information sources are constantly emerging is also vital. <P10> In determining what data a user (or an indexing system, which may make global policy decisions) is going to consider in matching a set of search criteria, a way of defining the acceptable level of trust in the identity of the source of the data will be needed. (p. 16) <P11> Only if the data is supported by both sufficient trust in the identity of the source and the behavior of that identity will it be considered eligible for comparison to the search criteria. Alternatively, just as ranking of result sets provided a more flexible model of retrieval than just deciding whether documents or surrogates did or did not match a group of search criteria, one can imagine developing systems that integrate confidence in the data source (both identity and behavior, or perhaps only behavior, with trust in identity having some absolute minimum value) into ranking algorithms. (p. 17) <P12> As we integrate trust and provenance into the next generations of information retrieval systems we must recognize that system designers face a heavy burden of responsibility. ... New design goals will need to include making users aware of defaults; encouraging personalization; and helping users to understand the behavior of retrieval systems <warrant> (p. 18) <P13> Powerful paternalistic systems that simply set up trust-related parameters as part of the indexing process and thus automatically apply a fixed set of such parameters to each search submitted to the retrieval system will be a real danger. (p. 17)
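As an illustration of the ranking idea in <P10> and <P11>, the following minimal sketch (Python) shows one way a retrieval system might combine relevance with trust in a source's identity and behavior. All names, weights, and the scoring formula are hypothetical illustrations, not Lynch's design.

    # Minimal sketch: trust-weighted ranking with an absolute floor on
    # identity trust, per the excerpt above. All values are invented.
    MIN_IDENTITY_TRUST = 0.5

    def trust_weighted_rank(results, identity_trust, behavior_trust, alpha=0.6):
        """Re-rank (doc_id, relevance) pairs using confidence in the
        metadata source's identity and behavior."""
        ranked = []
        for doc_id, relevance in results:
            if identity_trust.get(doc_id, 0.0) < MIN_IDENTITY_TRUST:
                continue  # below the minimum identity trust: ineligible
            confidence = behavior_trust.get(doc_id, 0.0)
            ranked.append((doc_id, alpha * relevance + (1 - alpha) * confidence))
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)

    # Two equally relevant documents; the better-behaved source ranks first.
    print(trust_weighted_rank([("doc-a", 0.9), ("doc-b", 0.9)],
                              {"doc-a": 0.8, "doc-b": 0.9},
                              {"doc-a": 0.2, "doc-b": 0.9}))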
Conclusions
RQ "These developments suggest a research agenda that addresses indexing countermeasures and counter-countermeasures; ways of anonymously or pseudononymously spot-checking the results of Web-crawling software, and of identifying, filtering out, and punishing attempts to manipulate the indexing process such as query-source-sensitive responses or deceptively structured pages that exploit the gap between presentation and content." (p. 14) "Obviously, there are numerous open research problems in designing such systems: how can the user express these confidence or trust constraints; how should the system integrate them into ranking techniques; how can efficient index structures and query evaluation algorithms be designed that integrate these factors. ... The integration of trust and provenance into information retrieval systems is clearly going to be necessary and, I believe, inevitable. If done properly, this will inform and empower users; if done incorrectly, it threatens to be a tremendously powerful engine of censorship and control over information access. (p. 17)
Type
Journal
Title
Metadata Strategies and Archival Description: Comparing Apples to Oranges
Advocates of a "metadata systems approach" to the description of electronic records argue that metadata's capacity to provide descriptive information about the context of electronic records creation will obviate, or reduce significantly, the need for traditional archival description. This article examines the assumptions about the nature of archival description and of metadata on which metadata strategies are grounded, for the purposes of ascertaining the following: whether the skepticism concerning the capacity of traditional description to meet the challenges posed by the so-called "second generation" of electronic records is justified; whether the use of metadata as archival description is consistent with their nature and purpose; and whether metadata are capable of servinng archival descriptive purposes.
Critical Arguments
CA "Before the archival profession assigns to traditional archival description the diminished role of "added value" (i.e. accessory) or abandons it altogether, the assumptions about the nature of archival description and of metadata on which metadata strategies are grounded ought to be carefully examined. Such an examination is necessary to ascertain the following: whether the skepticism concerning the capacity of traditional description to meet the challenges posed by the so-called "second generation" of electronic records is justified, whether the use of metadata as archival description is consistent with their nature and purpose, and whether metadata are acapable of serving archival purposes."
Phrases
<P1> In an article published in Archivaria, David Wallace summarized recent writing on the subject of metadata and concluded that "[d]ata dictionaries and the types of metadata that they house and can be built to house should be seriously evaluated by archivists" because of their potential to significantly improve and ultimately transform traditional archival practice in the areas of appraisal, arrangement, description, reference, and access. <warrant> <P2> In the area of description, specifically, advocates of "metadata management" or a "metadata systems approach" believe that metadata's capacity to provide descriptive information about the context of electronic records creation will obviate, or reduce significantly, the need for traditional description. <P3> Charles Dollar maintains that archival participation in the IRDS standard is essential to ensure that archival requirements, including descriptive requirements, are understood and adopted within it. <warrant> <P4> According to David Wallace, "archivists will need to concentrate their efforts on metadata systems creation rather than informational content descriptions, since in the electronic realm, archivists' concern for informational value will be eclipsed by concern for the evidential value of the system." <warrant> <P5> Charles Dollar, for his part, predicts that, rather than emphasize "the products of an information system," a metadata systems approach to description will focus on "an understanding of the information system context that supports organization-wide information sharing." <P6> Because their scope and context are comparatively narrow, metadata circumscribe and atomize these various contexts of records creation. Archival description, on the other hand, enlarges and integrates them. In so doing it reveals continuities and discontinuities in the matrix of function, structure, and record-keeping over time. <P7> Metadata are part of this broader context, since they constitute a series within the creator's fonds. The partial context provided by metadata should not, however, be mistaken for the whole context. <P8> Metadata, for example, may be capable of explaining contextual attributes of the data within an electronic records system, but they are incapable of describing themselves -- i.e., their own context of creation and use -- because they cannot be detached from themselves. For this reason, it is necessary to describe the context in which the metadata are created so that their meaning also will be preserved over time. <P9> A metadata system is like a diary that, in telegraphic style, records the daily events that take place in the life of an individual as they occur and from the individual's perspective. <P10> Archival description, it could be said, is the view from the plane; metadata, the view from the field as it is plowed. <P11> While a close-up shot -- such as the capture of a database view -- may be necessary for the purposes of preserving record context and system functionality, it does not follow that such a snapshot is necessary or even desirable for the purposes of description. <P12> Because the context revealed by metadata systems is so detailed, and the volume of transactions they capture is so enormous, metadata may in fact obscure, rather than illuminate, the broader administrative context and thereby bias the users' understanding of the records' meaning. In fact, parts of actions and transactions may develop entirely outside of the electronic system and never be included in the metadata.
<P13> If the metadata are kept in their entirety, users searching for documents will have to wade through a great deal of irrelevant data to find what they need. If the metadata are chopped up into bits corresponding to what has been kept, how comprehensible will they be to the user? <P14> The tendency to describe metadata in metaphorical terms, e.g., in relation to archival inventories, has distracted attention from consideration of what metadata are in substantial, concrete terms. They are, in fact, records created and used in the conduct of affairs of which they form a part. <P15> The transactions captured by metadata systems may be at a more microscopic level than those captured in registers and the context may be more detailed, given the technological complexity of electronic record-keeping environments. Nevertheless, their function remains the same. <P16> And, like protocol registers, whose permanent retention is legislated, metadata need to be preserved in perpetuity because they are concrete evidence of what documents were made and received, who handled them, with what results, and the transactions to which they relate. <warrant> <P17> While it is true that metadata systems show or reveal the context in which transactions occur in an electronic system and therefore constitute a kind of description of it -- Jenkinson made the same observation about registers -- their real object is to record the fact of these transactions; they should be, like registers, "preserved as a [record] of the proceedings in that connection." <P18> Viewing metadata systems as tools for achieving archival purposes, rather than as tools for achieving the creators' purposes is dangerous because it encourages us to, in effect, privilege potential secondary uses of metadata over their actual primary use; in so doing, we could reshape such use for purposes other than the conduct of affairs of which they are a part. <P19> Metadata strategies risk compromising, specifically, the impartiality of the records' creation. <P20> For archivists to introduce in the formation of metadata records requirements directed toward the future needs of archivists and researchers rather than toward the current needs of the creator would contribute an element of self-consciousness into the records creation process that is inconsistent with the preservation of the records' impartiality. <P21> If the impartiality of the metadata is compromised, their value as evidence will be compromised, which means, ultimately, that the underlying objective of metadata strategies -- the preservation of evidence -- will be defeated. <P22> None of these objections should be taken to suggest that archivists do not have a role to play in the design and maintenance of metadata systems. It is, rather, to suggest that that role must be driven by our primary obligation to protect and preserve, to the extent possible, the essential characteristics of the archives. <P23> The proper role of an archivist in the design of a metadata system, then, is to assist the organization in identifying its own descriptive needs as well as to ensure that the identification process is driven, not by narrowly defined system requirements, but by the organization's overarching need and obligation to create and maintain complete, reliable, and authentic records.
<P24> That is why it is essential that information holdings are identified and described in a meaningful way, organized in a logical manner that facilitates their access, and preserved in a manner that permits their continuing use. <P25> Record-keeping requirements for electronic records must address the need to render documentary relationships visible and to build in procedures for authentication and preservation; such measures will ensure that record-keeping systems meet the criteria of "integrity, currency and relevancy" necessary to the records creator. <P26> In other words, effective description is a consequence of effective records management and intelligent appraisal, not their purpose. If the primary objectives of metadata are met, description will be facilitated and the need for description at lower levels (e.g., below the series level) may even be obviated. <P27> Metadata systems cannot and should not replace archival description. To meet the challenges posed by electronic records, it is more important than ever that we follow the dictates of archival science, which begin from a consideration of the nature of archives. <P28> Archival participation in the design and maintenance of metadata systems must be driven by the need to preserve them as archival documents, that is, as evidence of actions and transactions, not as descriptive tools. Our role is not to promote our own interests, but to deepen the creator's understanding of its interests in preserving the evidence of its own actions and transactions. We can contribute to that understanding because we have a broader view of the creator's needs over time. In supporting these interests, we indirectly promote our own. <P29> To ensure that our descriptive infrastructure is sound -- that is to say, comprehensible, flexible, efficient, and effective -- we need equally to analyze our own information management methods and, out of that analysis, to develop complementary systems of administrative and intellectual control that will build upon each other. By these means we will be able to accommodate the diversity and complexity of the record-keeping environments with which we must deal.
Conclusions
RQ "Since 'current metadata systems do not account for the provenancial and contextual information needed to manage archival records,' archivists are exhorted [by Margaret Hedstrom] to direct their research efforts (and research dollars) toward the identification of the types of metadata that ought to be captured and created to meet archival descriptive requirements. "
SOW
DC Dr. Heather MacNeil is an Assistant Professor at the School of Library, Archival, and Information Studies at the University of British Columbia. Dr. MacNeil's major areas of interest include: trends and themes in archival research & scholarship; arrangement and description of archival documents; management of current records; trustworthiness of records as evidence; protection of personal privacy; interdisciplinary perspectives on record trustworthiness; and archival preservation of authentic electronic records.
Type
Journal
Title
Grasping the Nettle: The Evolution of Australian Archives Electronic Records Policy
CA An overview of the development of electronic records policy at the Australian Archives.
Phrases
<P1> The notion of records being independent of format and of "virtual" records opens up a completely new focus on what it is that archival institutions are attempting to preserve. (p. 136) <P2> The import of Bearman's contention that not all information systems are recordkeeping systems challenges archivists to move attention away from managing archival records after the fact toward involvement in the creation phase of records, i.e., in the systems design and implementation process. (p. 139) <P3> The experience of the Australian Archives is but one slice of a very large pie, but I think it is a good indication of the challenges other institutions are facing internationally. (p. 144)
Conclusions
RQ How has the Australian Archives managed the transition from paper to electronic records? What issues were raised and how were they dealt with?
CA The desire for hard, unassailable recordkeeping rules ignores the fact that recordkeeping is contingent on each organization's unique needs with respect to acceptable risk and context. Reed argues that aiming to achieve basic agreement on a minimal set of metadata attributes is an important start.
Phrases
<P1> Recordkeeping must be tailored to the requirements of specific business functions and activities linked to related social and legal requirements, incorporated into particular business processes, and maintained through each change to those processes. (p. 222) <P2> A record core or metadata set which lacks such specificity, detailing only requirements for a unique identifier, will not support interpretation of the record outside the creating domain. To enable that, we need more detailed specification of the domain itself, data which is redundant when you know where you are, but essential to understanding and interpreting records where the domain is not explicit. (p. 229)
Conclusions
RQ To establish requirements for viable core elements, the big challenge is the issue of time: data will change over time, especially with respect to individual competence, business function, and language.
Type
Journal
Title
Archives and the information superhighway: Current status and future challenges
CA One struggle facing us is to convince the rest of society that the "information superhighway" is very much about records, evidence and "recordness".
Phrases
<P1> It has been argued that existing computer software applications harm recordkeeping because they are remiss in capturing the full breadth of contextual information required to document transactions and create records -- records which can serve as reliable evidence of the transactions which created them. In place of records, these systems are producing data which fails to relate the who, what, when, where, and why of human communications -- attributes which are required for record evidence. This argument has found both saliency and support in other work conducted by the Netherlands and the World Bank, which have both noted that existing software applications fail to provide for the capture of the required complement of descriptive attributes required for proper recordkeeping. These examples point to the vast opportunity presented to archivists to position themselves as substantive contributors to information infrastructure discussions. Archivists are capable of pointing out what will be necessary to create records in the electronic environment which, in the words of David Bearman, meet the requirements of "business acceptable communication." (p. 87) <warrant>
Conclusions
RQ Can archivists provide access to information in the unstable electronic records environment we find ourselves in today?
Type
Electronic Journal
Title
Directive 1999/93/EC of the European Parliament and of the Council of 13 December 1999 on a Community Framework for Electronic Signatures
CA "[A] clear Community framework regarding the conditions applying to electronic signatures will strengthen confidence in, and general acceptance of, the new technologies; legislation in the Member States should not hinder the free movement of goods and services in the internal market. ... The interoperability of electronic-signature products should be promoted. ... Rapid technological development and the global character of the Internet necessitate an approach which is open to various technologies and services capable of authenticating data electronically. ... This Directive contributes to the use and legal recognition of electronic signatures within the Community; a regulatory framework is not needed for electronic signatures exclusively used within systems, which are based on voluntary agreements under private law between a specified number of participants; the freedom of parties to agree among themselves the terms and conditions under which they accept electronically signed data should be respected to the extent allowed by national law; the legal effectiveness of electronic signatures used in such systems and their admissibility as evidence in legal proceedings should be recognised. ... The storage and copying of signature-creation data could cause a threat to the legal validity of electronic signatures. ... Harmonised criteria relating to the legal effects of electronic signatures will preserve a coherent legal framework across the Community; national law lays down different requirements for the legal validity of hand-written signatures; whereas certificates can be used to confirm the identity of a person signing electronically; advanced electronic signatures based on qualified certificates aim at a higher level of security; advanced electronic signatures which are based on a qualified certificate and which are created by a secure-signature-creation device can be regarded as legally equivalent to hand-written signatures only if the requirements for hand-written signatures are fulfilled. ... In order to contribute to the general acceptance of electronic authentication methods it has to be ensured that electronic signatures can be used as evidence in legal proceedings in all Member States; the legal recognition of electronic signatures should be based upon objective criteria and not be linked to authorisation of the certification-service-provider involved; national law governs the legal spheres in which electronic documents and electronic signatures may be used; this Directive is without prejudice to the power of a national court to make a ruling regarding conformity with the requirements of this Directive and does not affect national rules regarding the unfettered judicial consideration of evidence. ... In order to increase user confidence in electronic communication and electronic commerce, certification-service-providers must observe data protection legislation and individual privacy. ... Provisions on the use of pseudonyms in certificates should not prevent Member States from requiring identification of persons pursuant to Community or national law."
Phrases
<P1> Legal effects of electronic signatures: (1) Member States shall ensure that advanced electronic signatures which are based on a qualified certificate and which are created by a secure-signature-creation device: (a) satisfy the legal requirements of a signature in relation to data in electronic form in the same manner as a handwritten signature satisfies those requirements in relation to paper-based data; and (b) are admissible as evidence in legal proceedings.(2) Member States shall ensure that an electronic signature is not denied legal effectiveness and admissibility as evidence in legal proceedings solely on the grounds that it is: in electronic form, or not based upon a qualified certificate, or not based upon a qualified certificate issued by an accredited certification-service-provider, or not created by a secure signature-creation device. (Art. 5) <P2> Member States shall ensure that a certification-service-provider which issues certificates to the public may collect personal data only directly from the data subject, or after the explicit consent of the data subject, and only insofar as it is necessary for the purposes of issuing and maintaining the certificate. The data may not be collected or processed for any other purposes without the explicit consent of the data subject. (Art. 8) <P3> Requirements for qualified certificates: Qualified certificates must contain:(a) an indication that the certificate is issued as a qualified certificate; (b) the identification of the certification-service-provider and the State in which it is established; (c) the name of the signatory or a pseudonym, which shall be identified as such; (d) provision for a specific attribute of the signatory to be included if relevant, depending on the purpose for which the certificate is intended; (e) signature-verification data which correspond to signature-creation data under the control of the signatory; (f) an indication of the beginning and end of the period of validity of the certificate; (g) the identity code of the certificate; (h) the advanced electronic signature of the certification-service-provider issuing it; (i) limitations on the scope of use of the certificate, if applicable; and (j) limits on the value of transactions for which the certificate can be used, if applicable. 
(Annex I) <P4> Requirements for certification-service-providers issuing qualified certificates: Certification-service-providers must: (a) demonstrate the reliability necessary for providing certification services; (b) ensure the operation of a prompt and secure directory and a secure and immediate revocation service; (c) ensure that the date and time when a certificate is issued or revoked can be determined precisely; (d) verify, by appropriate means in accordance with national law, the identity and, if applicable, any specific attributes of the person to which a qualified certificate is issued; (e) employ personnel who possess the expert knowledge, experience, and qualifications necessary for the services provided, in particular competence at managerial level, expertise in electronic signature technology and familiarity with proper security procedures; they must also apply administrative and management procedures which are adequate and correspond to recognised standards; (f) use trustworthy systems and products which are protected against modification and ensure the technical and cryptographic security of the process supported by them; (g) take measures against forgery of certificates, and, in cases where the certification-service-provider generates signature-creation data, guarantee confidentiality during the process of generating such data; (h) maintain sufficient financial resources to operate in conformity with the requirements laid down in the Directive, in particular to bear the risk of liability for damages, for example, by obtaining appropriate insurance; (i) record all relevant information concerning a qualified certificate for an appropriate period of time, in particular for the purpose of providing evidence of certification for the purposes of legal proceedings. Such recording may be done electronically; (j) not store or copy signature-creation data of the person to whom the certification-service-provider provided key management services; (k) before entering into a contractual relationship with a person seeking a certificate to support his electronic signature inform that person by a durable means of communication of the precise terms and conditions regarding the use of the certificate, including any limitations on its use, the existence of a voluntary accreditation scheme and procedures for complaints and dispute settlement. Such information, which may be transmitted electronically, must be in writing and in readily understandable language. Relevant parts of this information must also be made available on request to third-parties relying on the certificate; (l) use trustworthy systems to store certificates in a verifiable form so that: only authorised persons can make entries and changes, information can be checked for authenticity, certificates are publicly available for retrieval in only those cases for which the certificate-holder's consent has been obtained, and any technical changes compromising these security requirements are apparent to the operator. (Annex II) <P5> Requirements for secure signature-creation devices: 1. 
Secure signature-creation devices must, by appropriate technical and procedural means, ensure at the least that: (a) the signature-creation-data used for signature generation can practically occur only once, and that their secrecy is reasonably assured; (b) the signature-creation-data used for signature generation cannot, with reasonable assurance, be derived and the signature is protected against forgery using currently available technology; (c) the signature-creation-data used for signature generation can be reliably protected by the legitimate signatory against the use of others. (2) Secure signature-creation devices must not alter the data to be signed or prevent such data from being presented to the signatory prior to the signature process. (Annex III) <P6> Recommendations for secure signature verification: During the signature-verification process it should be ensured with reasonable certainty that: (a) the data used for verifying the signature correspond to the data displayed to the verifier; (b) the signature is reliably verified and the result of that verification is correctly displayed; (c) the verifier can, as necessary, reliably establish the contents of the signed data; (d) the authenticity and validity of the certificate required at the time of signature verification are reliably verified; (e) the result of verification and the signatory's identity are correctly displayed; (f) the use of a pseudonym is clearly indicated; and (g) any security-relevant changes can be detected. (Annex IV)
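The Annexes state their requirements functionally rather than technologically. As a rough illustration of Annex IV points (b) and (d), the sketch below (Python, using the third-party "cryptography" package) checks a certificate's validity period and verifies a signature. This is an assumed, simplified setup: the Directive mandates neither RSA, nor X.509, nor any particular library.

    # Hedged sketch: verify a signature against a certificate, covering
    # Annex IV (b) reliable verification and (d) certificate validity.
    import datetime
    from cryptography import x509
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    def verify_signature(cert_pem: bytes, signature: bytes, data: bytes) -> bool:
        cert = x509.load_pem_x509_certificate(cert_pem)
        now = datetime.datetime.utcnow()
        # (d) the certificate must be valid at the time of verification
        if not (cert.not_valid_before <= now <= cert.not_valid_after):
            return False
        try:
            # (b) reliably verify the signature itself (RSA/SHA-256 assumed)
            cert.public_key().verify(signature, data,
                                     padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            return False

A fuller implementation would also consult a revocation service (Annex II (b)) and display the signatory's identity and any pseudonym indication (Annex IV (e) and (f)).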
The Semantic Web activity is a W3C project whose goal is to enable a 'cooperative' Web where machines and humans can exchange electronic content that has clear-cut, unambiguous meaning. This vision is based on the automated sharing of metadata terms across Web applications. The declaration of schemas in metadata registries advances this vision by providing a common approach for the discovery, understanding, and exchange of semantics. However, many of the issues regarding registries are not clear, and ideas vary regarding their scope and purpose. Additionally, registry issues are often difficult to describe and comprehend without a working example.
ISSN
1082-9873
Critical Arguments
CA "This article will explore the role of metadata registries and will describe three prototypes, written by the Dublin Core Metadata Initiative. The article will outline how the prototypes are being used to demonstrate and evaluate application scope, functional requirements, and technology solutions for metadata registries."
Phrases
<P1> Establishing a common approach for the exchange and re-use of data across the Web would be a major step towards achieving the vision of the Semantic Web. <warrant> <P2> The Semantic Web Activity statement articulates this vision as: 'having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration, and reuse across various applications. The Web can reach its full potential if it becomes a place where data can be shared and processed by automated tools as well as by people.' <P3> In parallel with the growth of content on the Web, there have been increases in the amount and variety of metadata to manipulate this content. An inordinate amount of standards-making activity focuses on metadata schemas (also referred to as vocabularies or data element sets), and yet significant differences in schemas remain. <P4> Different domains typically require differentiation in the complexity and semantics of the schemas they use. Indeed, individual implementations often specify local usage, thereby introducing local terms to metadata schemas specified by standards-making bodies. Such differentiation undermines interoperability between systems. <P5> This situation highlights a growing need for access by users to in-depth information about metadata schemas and particular extensions or variations to schemas. Currently, these 'users' are human: people requesting information. <warrant> <P6> It would be helpful to make available easy access to schemas already in use to provide both humans and software with comprehensive, accurate and authoritative information. <warrant> <P7> The W3C Resource Description Framework (RDF) has provided the basis for a common approach to declaring schemas in use. At present the RDF Schema (RDFS) specification offers the basis for a simple declaration of schema. <P8> Even as it stands, an increasing number of initiatives are using RDFS to 'publish' their schemas. <P9> Registries provide 'added value' to users by indexing schemas relevant to a particular 'domain' or 'community of use' and by simplifying the navigation of terms by enabling multiple schemas to be accessed from one view. <warrant> <P10> Additionally, the establishment of registries to index terms actively being used in local implementations facilitates the metadata standards activity by providing implementation experience transferable to the standards-making process. <warrant> <P11> The overriding goal has been the development of a generic registry tool useful for registry applications in general, not just useful for the DCMI. <P12> The formulation of a 'definitive' set of RDF schemas within the DCMI that can serve as the recommended, comprehensive and accurate expression of the DCMI vocabulary has hindered the development of the DCMI registry. To some extent, this has been due to the changing nature of the RDF Schema specification and its W3C candidate recommendation status. However, it should be recognized that the lack of consensus within the DCMI community regarding the RDF schemas has proven to be equally as impeding. <P13> The automated sharing of metadata across applications is an important part of realizing the goal of the Semantic Web. Users and applications need practical solutions for discovering and sharing semantics. Schema registries provide a viable means of achieving this. <warrant>
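Since <P7> and <P8> turn on RDF Schema as the publication format, a concrete fragment may help. The sketch below (Python, with the rdflib package) declares the Dublin Core "title" element as an rdf:Property and parses it into a triple store, roughly as a registry might when indexing a published schema. The RDF/XML is a minimal hand-written example; the authoritative DCMI schemas are maintained by the DCMI itself.

    # Hedged sketch: an RDFS declaration of one Dublin Core element,
    # ingested the way a registry might index a published schema.
    from rdflib import Graph

    RDF_XML = """<?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
        <rdfs:label>Title</rdfs:label>
        <rdfs:comment>A name given to the resource.</rdfs:comment>
      </rdf:Property>
    </rdf:RDF>"""

    g = Graph()
    g.parse(data=RDF_XML, format="xml")
    for subject, predicate, obj in g:  # a registry would index these triples
        print(subject, predicate, obj)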
Conclusions
RQ "Many of the issues regarding metadata registries are unclear and ideas regarding their scope and purpose vary. Additionally, registry issues are often difficult to describe and comprehend without a working example. The DCMI makes use of rapid prototyping to help solve these problems. Prototyping is a process of quickly developing sample applications that can then be used to demonstrate and evaluate functionality and technology."
SOW
DC "New impetus for the development of registries has come with the development activities surrounding creation of the Semantic Web. The motivation for establishing registries arises from domain and standardization communities, and from the knowledge management community." ... "The original charter for the DCMI Registry Working Group was to establish a metadata registry to support the activity of the DCMI. The aim was to enable the registration, discovery, and navigation of semantics defined by the DCMI, in order to provide an authoritative source of information regarding the DCMI vocabulary. Emphasis was placed on promoting the use of the Dublin Core and supporting the management of change and evolution of the DCMI vocabulary." ... "Discussions within the DCMI Registry Working Group (held primarily on the group's mailing list) have produced draft documents regarding application scope and functionality. These discussions and draft documents have been the basis for the development of registry prototypes and continue to play a central role in the iterative process of prototyping and feedback." ... The overall goal of the DCMI Registry Working Group (WG) is to provide a focus for continued development of the DCMI Metadata Registry. The WG will provide a forum for discussing registry-related activities and facilitating cooperation with the ISO 11179 community, the Semantic Web, and other related initiatives on issues of common interest and relevance.
Type
Electronic Journal
Title
The Warwick Framework: A container architecture for diverse sets of metadata
This paper is an abbreviated version of The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata. It describes a container architecture for aggregating logically, and perhaps physically, distinct packages of metadata. This "Warwick Framework" is the result of the April 1996 Metadata II Workshop in Warwick, U.K.
ISSN
1082-9873
Critical Arguements
CA Describes the Warwick Framework, a proposal for linking together the various metadata schemes that may be attached to a given information object by using a system of "packages" and "containers." "[Warwick Workshop] attendees concluded that ... the route to progress on the metadata issue lay in the formulation of a higher-level context for the Dublin Core. This context should define how the Core can be combined with other sets of metadata in a manner that addresses the individual integrity, distinct audiences, and separate realms of responsibility of these distinct metadata sets. The result of the Warwick Workshop is a container architecture, known as the Warwick Framework. The framework is a mechanism for aggregating logically, and perhaps physically, distinct packages of metadata. This is a modularization of the metadata issue with a number of notable characteristics. It allows the designers of individual metadata sets to focus on their specific requirements, without concerns for generalization to ultimately unbounded scope. It allows the syntax of metadata sets to vary in conformance with semantic requirements, community practices, and functional (processing) requirements for the kind of metadata in question. It separates management of and responsibility for specific metadata sets among their respective "communities of expertise." It promotes interoperability by allowing tools and agents to selectively access and manipulate individual packages and ignore others. It permits access to the different metadata sets that are related to the same object to be separately controlled. It flexibly accommodates future metadata sets by not requiring changes to existing sets or the programs that make use of them."
Phrases
<P1> The range of metadata needed to describe and manage objects is likely to continue to expand as we become more sophisticated in the ways in which we characterize and retrieve objects and also more demanding in our requirements to control the use of networked information objects. The architecture must be sufficiently flexible to incorporate new semantics without requiring a rewrite of existing metadata sets. <warrant> <P2> Each logically distinct metadata set may represent the interests of and domain of expertise of a specific community. <P3> Just as there are disparate sources of metadata, different metadata sets are used by and may be restricted to distinct communities of users and agents. <P4> Strictly partitioning the information universe into data and metadata is misleading. <P5> If we allow for the fact that metadata for an object consists of logically distinct and separately administered components, then we should also provide for the distribution of these components among several servers or repositories. The references to distributed components should be via a reliable persistent name scheme, such as that proposed for Universal Resource Names (URNs) and Handles. <P6> [W]e emphasize that the existence of a reliable URN implementation is necessary to avoid the problems of dangling references that plague the Web. <warrant> <P7> Anyone can, in fact, create descriptive data for a networked resource, without permission or knowledge of the owner or manager of that resource. This metadata is fundamentally different from that metadata that the owner of a resource chooses to link or embed with the resource. We, therefore, informally distinguish between two categories of metadata containers, which both have the same implementation [internally referenced and externally referenced metadata containers].
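The container/package model described above lends itself to a simple data structure. The following Python sketch is purely illustrative: the Warwick Framework prescribes no implementation language, and all class and field names here are invented. It shows both categories of reference noted in <P7>: packages held inside the container and packages referenced externally by a persistent name.

    # Schematic Warwick Framework container: local packages, external
    # references via URN, and nesting. All names are hypothetical.
    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Package:
        type_name: str   # e.g., "dublin-core", "rights", "provenance"
        payload: bytes   # syntax varies with the metadata community

    @dataclass
    class IndirectPackage:
        urn: str         # persistent name (URN/Handle) of externally held metadata

    @dataclass
    class Container:
        packages: List[Union[Package, IndirectPackage, "Container"]] = field(default_factory=list)

    # A container aggregating an embedded Dublin Core package and an
    # externally administered rights package referenced by URN.
    doc = Container([Package("dublin-core", b"<title>Example</title>"),
                     IndirectPackage("urn:example:rights:1234")])
    print(doc)

Agents that understand only certain package types can process those and ignore the rest, which is the interoperability property the CA emphasizes.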
Conclusions
RQ "We run the danger, with the full expressiveness of the Warwick Framework, of creating such complexity that the metadata is effectively useless. Finding the appropriate balance is a central design problem. ... Definers of specific metadata sets should ensure that the set of operations and semantics of those operations will be strictly defined for a package of a given type. We expect that a limited set of metadata types will be widely used and 'understood' by browsers and agents. However, the type system must be extensible, and some method that allows existing clients and agents to process new types must be a part of a full implementation of the Framework. ... There is a need to agree on one or more syntaxes for the various metadata sets. Even in the context of the relatively simple World Wide Web, the Internet is often unbearably slow and unreliable. Connections often fail or time out due to high load, server failure, and the like. In a full implementation of the Warwick Framework, access to a "document" might require negotiation across distributed repositories. The performance of this distributed architecture is difficult to predict and is prone to multiple points of failure. ... It is clear that some protocol work will need to be done to support container and package interchange and retrieval. ... Some examination of the relationship between the Warwick Framework and ongoing work in repository architectures would likely be fruitful.
Type
Electronic Journal
Title
Collection-Based Persistent Digital Archives - Part 1
The preservation of digital information for long periods of time is becoming feasible through the integration of archival storage technology from supercomputer centers, data grid technology from the computer science community, information models from the digital library community, and preservation models from the archivist's community. The supercomputer centers provide the technology needed to store the immense amounts of digital data that are being created, while the digital library community provides the mechanisms to define the context needed to interpret the data. The coordination of these technologies with preservation and management policies defines the infrastructure for a collection-based persistent archive. This paper defines an approach for maintaining digital data for hundreds of years through development of an environment that supports migration of collections onto new software systems.
ISSN
1082-9873
Critical Arguments
CA "Supercomputer centers, digital libraries, and archival storage communities have common persistent archival storage requirements. Each of these communities is building software infrastructure to organize and store large collections of data. An emerging common requirement is the ability to maintain data collections for long periods of time. The challenge is to maintain the ability to discover, access, and display digital objects that are stored within an archive, while the technology used to manage the archive evolves. We have implemented an approach based upon the storage of the digital objects that comprise the collection, augmented with the meta-data attributes needed to dynamically recreate the data collection. This approach builds upon the technology needed to support extensible database schema, which in turn enables the creation of data handling systems that interconnect legacy storage systems."
Phrases
<P1> The ultimate goal is to preserve not only the bits associated with the original data, but also the context that permits the data to be interpreted. <warrant> <P2> We rely on the use of collections to define the context to associate with digital data. The context is defined through the creation of semi-structured representations for both the digital objects and the associated data collection. <P3> A collection-based persistent archive is therefore one in which the organization of the collection is archived simultaneously with the digital objects that comprise the collection. <P4> The goal is to preserve digital information for at least 400 years. This paper examines the technical issues that must be addressed and presents a prototype implementation. <P5> Digital object representation. Every digital object has attributes that define its structure, physical context, and provenance, and annotations that describe features of interest within the object. Since the set of attributes (such as annotations) will vary across all objects within a collection, a semi-structured representation is needed. Not all digital objects will have the same set of associated attributes. <P6> If possible, a common information model should be used to reference the attributes associated with the digital objects, the collection organization, and the presentation interface. An emerging standard for a uniform data exchange model is the eXtended Markup Language (XML). <P7> A particular example of an information model is the XML Document Type Definition (DTD) which provides a description for the allowed nesting structure of XML elements. Richer information models are emerging such as XSchema (which provides data types, inheritance, and more powerful linking mechanisms) and XMI (which provides models for multiple levels of data abstraction). <P8> Although XML DTDs were originally applied to documents only, they are now being applied to arbitrary digital objects, including the collections themselves. More generally, OSDs can be used to define the structure of digital objects, specify inheritance properties of digital objects, and define the collection organization and user interface structure. <P9> A persistent collection therefore needs the following components of an OSD to completely define the collection context: Data dictionary for collection semantics; Digital object structure; Collection structure; and User interface structure. <P10> The re-creation or instantiation of the data collection is done with a software program that uses the schema descriptions that define the digital object and collection structure to generate the collection. The goal is to build a generic program that works with any schema description. <P11> The information for which driver to use for access to a particular data set is maintained in the associated Meta-data Catalog (MCAT). The MCAT system is a database containing information about each data set that is stored in the data storage systems. <P12> The data handling infrastructure developed at SDSC has two components: the SDSC Storage Resource Broker (SRB) that provides federation and access to distributed and diverse storage resources in a heterogeneous computing environment, and the Meta-data Catalog (MCAT) that holds systemic and application or domain-dependent meta-data about the resources and data sets (and users) that are being brokered by the SRB. <P13> A client does not need to remember the physical mapping of a data set.
It is stored as meta-data associated with the data set in the MCAT catalog. <P14> A characterization of a relational database requires a description of both the logical organization of attributes (the schema), and a description of the physical organization of attributes into tables. For the persistent archive prototype we used XML DTDs to describe the logical organization. <P15> A combination of the schema and physical organization can be used to define how queries can be decomposed across the multiple tables that are used to hold the meta-data attributes. <P16> By using an XML-based database, it is possible to avoid the need to map between semi-structured and relational organizations of the database attributes. This minimizes the amount of information needed to characterize a collection, and makes the re-creation of the database easier. <warrant> <P17> Digital object attributes are separated into two classes of information within the MCAT: System-level meta-data that provides operational information. These include information about resources (e.g., archival systems, database systems, etc., and their capabilities, protocols, etc.) and data objects (e.g., their formats or types, replication information, location, collection information, etc.); Application-dependent meta-data that provides information specific to particular data sets and their collections (e.g., Dublin Core values for text objects). <P18> Internally, MCAT keeps schema-level meta-data about all of the attributes that are defined. The schema-level attributes are used to define the context for a collection and enable the instantiation of the collection on new technology. <P19> The logical structure should not be confused with database schema and are more general than that. For example, we have implemented the Dublin Core database schema to organize attributes about digitized text. The attributes defined in the logical structure that is associated with the Dublin Core schema contains information about the subject, constraints, and presentation formats that are needed to display the schema along with information about its use and ownership. <P20> The MCAT system supports the publication of schemata associated with data collections, schema extension through the addition or deletion of new attributes, and the dynamic generation of the SQL that corresponds to joins across combinations of attributes. <P21> By adding routines to access the schema-level meta-data from an archive, it is possible to build a collection-based persistent archive. As technology evolves and the software infrastructure is replaced, the MCAT system can support the migration of the collection to the new technology.
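To make <P5> through <P9> concrete, the sketch below shows a miniature XML DTD describing a digital object's logical structure, with an optional, repeatable annotation element to accommodate attributes that vary across objects. It uses Python with the lxml package; the element names are invented for illustration and are not SDSC's actual schemas.

    # Hedged sketch: a DTD as a simple object structure definition, and
    # validation of one digital object against it.
    import io
    from lxml import etree

    DTD_TEXT = """
    <!ELEMENT object (identifier, provenance, annotation*)>
    <!ELEMENT identifier (#PCDATA)>
    <!ELEMENT provenance (#PCDATA)>
    <!ELEMENT annotation (#PCDATA)>
    """

    RECORD = """
    <object>
      <identifier>coll-0001</identifier>
      <provenance>ingested 1999-08-12 from tape archive</provenance>
      <annotation>feature of interest: header block</annotation>
    </object>
    """

    dtd = etree.DTD(io.StringIO(DTD_TEXT))
    print(dtd.validate(etree.XML(RECORD)))  # True: object conforms to the DTD

The annotation* content model is the point: zero or more annotations are allowed, echoing <P5>'s observation that not all digital objects carry the same attribute set.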
Conclusions
RQ Collection-Based Persistent Digital Archives - Part 2
SOW
DC "The technology proposed by SDSC for implementing persistent archives builds upon interactions with many of these groups. Explicit interactions include collaborations with Federal planning groups, the Computational Grid, the digital library community, and individual federal agencies." ... "The data management technology has been developed through multiple federally sponsored projects, including the DARPA project F19628-95-C-0194 "Massive Data Analysis Systems," the DARPA/USPTO project F19628-96-C-0020 "Distributed Object Computation Testbed," the Data Intensive Computing thrust area of the NSF project ASC 96-19020 "National Partnership for Advanced Computational Infrastructure," the NASA Information Power Grid project, and the DOE ASCI/ASAP project "Data Visualization Corridor." Additional projects related to the NSF Digital Library Initiative Phase II and the California Digital Library at the University of California will also support the development of information management technology. This work was supported by a NARA extension to the DARPA/USPTO Distributed Object Computation Testbed, project F19628-96-C-0020."
Type
Electronic Journal
Title
Collection-Based Persistent Digital Archives - Part 2
"Collection-Based Persistent Digital Archives: Part 2" describes the creation of a one million message persistent E-mail collection. It discusses the four major components of a persistent archive system: support for ingestion, archival storage, information discovery, and presentation of the collection. The technology to support each of these processes is still rapidly evolving, and opportunities for further research are identified.
ISSN
1082-9873
Critical Arguments
CA "The multiple migration steps can be broadly classified into a definition phase and a loading phase. The definition phase is infrastructure independent, whereas the loading phase is geared towards materializing the processes needed for migrating the objects onto new technology. We illustrate these steps by providing a detailed description of the actual process used to ingest and load a million-record E-mail collection at the San Diego Supercomputer Center (SDSC). Note that the SDSC processes were written to use the available object-relational databases for organizing the meta-data. In the future, it may be possible to go directly to XML-based databases."
Phrases
<P1> The processes used to ingest a collection, transform it into an infrastructure independent form, and store the collection in an archive comprise the persistent storage steps of a persistent archive. The processes used to recreate the collection on new technology, optimize the database, and recreate the user interface comprise the retrieval steps of a persistent archive. <P2> In order to build a persistent collection, we consider a solution that "abstracts" all aspects of the data and its preservation. In this approach, data object and processes are codified by raising them above the machine/software dependent forms to an abstract format that can be used to recreate the object and the processes in any new desirable forms. <P3> The SDSC infrastructure uses object-relational databases to organize information. This makes data ingestion more complex by requiring the mapping of the XML DTD semi-structured representation onto a relational schema. <P4> The steps used to store the persistent archive were: (1) Define Digital Object: define meta-data, define object structure (OBJ-DTD) --- (A), define object DTD to object DDL mapping --- (B) (2) Define Collection: define meta-data, define collection structure (COLL-DTD) --- (C), define collection DTD structure to collection DDL mapping --- (D) (3) Define Containers: define packing format for encapsulating data and meta-data (examples are the AIP standard, Hierarchical Data Format, Document Type Definition) <P5> In the ingestion phase, the relational and semi-structured organization of the meta-data is defined. No database is actually created, only the mapping between the relational organization and the object DTD. <P6> Note that the collection relational organization does not have to encompass all of the attributes that are associated with a digital object. Separate information models are used to describe the objects and the collections. It is possible to take the same set of digital objects and form a new collection with a new relational organization. <P7> Multiple communities across academia, the federal government, and standards groups are exploring strategies for managing very large archives. The persistent archive community needs to maintain interactions with these communities to track development of new strategies for data management and storage. <warrant>
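The DTD-to-DDL mappings referred to in <P4> amount to deciding how each element of a semi-structured record becomes a column value. As a toy version of the loading phase for the e-mail collection, the Python sketch below parses one message and inserts it into a relational table; the table layout and element names are invented, not the SDSC schema, and sqlite3 stands in for the object-relational database.

    # Hedged sketch of the loading phase: one e-mail record mapped from
    # XML onto a relational table.
    import sqlite3
    import xml.etree.ElementTree as ET

    RECORD = """<message>
      <from>alice@example.org</from>
      <to>bob@example.org</to>
      <date>1999-03-01</date>
      <subject>persistent archives</subject>
    </message>"""

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (sender TEXT, recipient TEXT, sent TEXT, subject TEXT)")

    elem = ET.fromstring(RECORD)
    # The DTD-to-DDL mapping: each child element supplies one column value.
    conn.execute("INSERT INTO message VALUES (?, ?, ?, ?)",
                 (elem.findtext("from"), elem.findtext("to"),
                  elem.findtext("date"), elem.findtext("subject")))
    print(conn.execute("SELECT * FROM message").fetchall())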
Conclusions
RQ "The four major components of the persistent archive system are support for ingestion, archival storage, information discovery, and presentation of the collection. The first two components focus on the ingestion of data into collections. The last two focus on access to the resulting collections. The technology to support each of these processes is still rapidly evolving. Hence consensus on standards has not been reached for many of the infrastructure components. At the same time, many of the components are active areas of research. To reach consensus on a feasible collection-based persistent archive, continued research and development is needed. Examples of the many related issues are listed below:
Critical Arguments
CA This is the first of four articles describing geospatial standards and the standards bodies working on them. The article discusses what geospatial standards are and why they matter, identifies major standards organizations, and lists the characteristics of successful geospatial standards.
Conclusions
RQ Which federal and international standards have been agreed upon since this article's publication?
SOW
DC FGDC approved the Content Standard for Digital Geospatial Metadata (FGDC-STD-001-1998) in June 1998. FGDC is a 19-member interagency committee composed of representatives from the Executive Office of the President, Cabinet-level and independent agencies. The FGDC is developing the National Spatial Data Infrastructure (NSDI) in cooperation with organizations from State, local and tribal governments, the academic community, and the private sector. The NSDI encompasses policies, standards, and procedures for organizations to cooperatively produce and share geographic data.
CA "The purpose of this document is: (1) To provide a better understanding of the functionality that the MPEG-21 multimedia framework should be capable of providing; (2) To offer high level descriptions of different MPEG-21 applications against which the formal requirements for MPEG-21 can be checked; (3) To act as a basis for devising Core Experiments which establish proof of concept; (4) To provide a point of reference to support the evaluation of responses submitted against ongoing MPEG-21 Calls for Proposals; (5) To be a 'Public Relations' instrument that can help to explain what MPEG-21 is about."
Conclusions
RQ not applicable
SOW
DC The Moving Picture Experts Group (MPEG) is a working group of ISO/IEC, made up of some 350 members from various industries and universities, in charge of the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio and their combination. MPEG's official designation is ISO/IEC JTC1/SC29/WG11. So far MPEG has produced the following compression formats and ancillary standards: MPEG-1, the standard for storage and retrieval of moving pictures and audio on storage media (approved Nov. 1992); MPEG-2, the standard for digital television (approved Nov. 1994); MPEG-4, the standard for multimedia applications; MPEG-7, the content representation standard for multimedia information search, filtering, management and processing; and MPEG-21, the multimedia framework.
Critical Arguments
CA Databases have structure, and annotating them (with metadata, for example) can be difficult. Work with semistructured data models, in conjunction with the use of XML, may solve the problem of accommodating unanticipated structures in databases.
Conclusions
RQ It's critical to develop tools to help data curators record and repeat the corrections they make.
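A minimal sketch, under assumed names, of the two ideas in this entry: a semistructured (XML) record absorbs an unanticipated annotation without any schema change, and the correction itself is logged as data so a curator can replay it.

# Sketch: annotate a semistructured record and record the correction so
# it can be repeated. All element and attribute names are hypothetical.
import xml.etree.ElementTree as ET

record = ET.fromstring("<gene><name>abc1</name><length>902</length></gene>")

# Corrections recorded as (target, attribute, value) triples, so the same
# fixes can be replayed against a future version of the data.
correction_log = [(".", "curator-note", "length re-measured 2002-05-10"),
                  ("length", "status", "corrected")]

def apply_corrections(root, log):
    for target, attr, value in log:
        elem = root if target == "." else root.find(target)
        elem.set(attr, value)  # annotation needs no change to the structure

apply_corrections(record, correction_log)
print(ET.tostring(record, encoding="unicode"))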
This is one of a series of guides produced by the Cedars digital preservation project. This guide concentrates on the technical approaches that Cedars recommends as a result of its experience. The accent is on preservation, without which continued access is not possible. The time scale is at least decades, i.e., well beyond the lifetime of any hardware technology. The overall preservation strategy is to remove the data from its medium of acquisition and to preserve the digital content as a stream of bytes. There is good reason to be confident that data held as a stream of bytes can be preserved indefinitely. Just as there is no access without preservation, preservation with no prospect of future access is a very sterile exercise. As well as preserving the data as a byte-stream, Cedars adds metadata. This includes a reference to facilities (called technical metadata in this document) for accessing the intellectual content of the preserved data. This technical metadata will usually include actual software for use in accessing the data. It will be stored as a preserved object in the overall archive store, and will be revised as technology evolves, making new methods of access to preserved objects appropriate. There will be big economies of scale, as most, if not all, objects of the same type will share the same technical metadata. Cedars recommends against repeated format conversions, and instead argues for keeping the preserved byte-stream, while tracking evolving technology by maintaining the technical metadata. It is for this reason that Cedars includes only a reference to the technical metadata in the preserved data object. Thus future users of the object will be pointed to information appropriate to their own era, rather than that of the object's preservation. The monitoring and updating of this aspect of the technical metadata is a vital function of the digital library. In practice, Cedars expects that very many preserved digital objects will be in the same format, and will reference the same technical metadata. Access to a preserved object then involves Migration on Request, in that any necessary migration from an obsolete format to an appropriate current day format happens at the point of request. As well as recommending actions to be taken to preserve digital objects, Cedars also recommends the use of a permanent naming scheme, with a strong recommendation that such a scheme should be infinitely extensible.
Critical Arguments
CA "This document is intended to inform technical practitioners in the actual preservation of digital materials, and also to highlight to library management the importance of this work as continuing their traditional scholarship role into the 21st century."
This document provides some background on preservation metadata for those interested in digital preservation. It first attempts to explain why preservation metadata is seen as an essential part of most digital preservation strategies. It then gives a broad overview of the functional and information models defined in the Reference Model for an Open Archival Information System (OAIS) and describes the main elements of the Cedars outline preservation metadata specification. The next sections take a brief look at related metadata initiatives, make some recommendations for future work and comment on cost issues. At the end there are some brief recommendations for collecting institutions and the creators of digital content followed by some suggestions for further reading.
Critical Arguments
CA "This document is intended to provide a brief introduction to current preservation metadata developments and introduce the outline metadata specifications produced by the Cedars project. It is aimed in particular at those who may have responsibility for digital preservation in the UK further and higher education community, e.g. senior staff in research libraries and computing services. It should also be useful for those undertaking digital content creation (digitisation) initiatives, although it should be noted that specific guidance on this is available elsewhere. The guide may also be of interest to other kinds of organisations that have an interest in the long-term management of digital resources, e.g. publishers, archivists and records managers, broadcasters, etc. This document aimes to provide: A rationale for the creation and maintenance of preservation metadata to support digital preservation strategies, e.g. migration or emulation; An introduction to the concepts and terminology used in the influential ISO Reference Model for an Open Archival Information System (OAIS); Brief information on the Cedars outline preservation metadata specification and the outcomes of some related metadata initiatives; Some notes on the cost implications of preservation metadata and how these might be reduced.
Conclusions
RQ "In June 2000, a group of archivists, computer scientists and metadata experts met in the Netherlands to discuss metadata developments related to recordkeeping and the long-term preservation of archives. One of the key conclusions made at this working meeting was that the recordkeeping metadata communities should attempt to co-operate more with other metatdata initiatives. The meeting also suggested research into the contexts of creation and use, e.g. identifying factors that might encourage or discourage creators form meeting recordkeeping metadata requirements. This kind of research would also be useful for wider preservation metadata developments. One outcome of this meeting was the setting up of an Archiving Metadata Forum (AMF) to form the focus of future developments." ... "Future work on preservation metadata will need to focus on several key issues. Firstly, there is an urgent need for more practical experience of undertaking digital preservation strategies. Until now, many preservation metadata initiatives have largely been based on theoretical considerations or high-level models like the OAIS. This is not in itself a bad thing, but it is now time to begin to build metadata into the design of working systems that can test the viability of digital preservation strategies in a variety of contexts. This process has already begun in initiatives like the Victorian Electronic Records Stategy and the San Diego Supercomputer Center's 'self-validating knowledge-based archives'. A second need is for increased co-operation between the many metadata initiatives that have an interest in digital preservation. This may include the comparison and harmonisation of various metadata specifications, where this is possible. The OCLC/LG working group is an example of how this has been taken forward whitin a particular domain. There is a need for additional co-operation with recordkeeping metadata specialists, computing scientists and others in the metadata research community. Thirdly, there is a need for more detailed research into how metadata will interact with different formats, preservation strategies and communities of users. This may include some analysis of what metadata could be automatically extracted as part of the ingest process, an investigation of the role of content creators in metadata provision, and the production of user requirements." ... "Also, thought should be given to the development of metadata standards that will permit the easy exchange of preservation metadata (and information packages) between repositories." ... "As well as ensuring that digital repositories are able to facilitate the automatic capture of metadata, some thought should also be given to how best digital repositories could deal with any metadata that might already exist."
SOW
DC "Funded by JISC (the Joint Information Systems Committee of the UK higher education funding councils), as part of its Electronic Libraries (eLib) Programme, Cedars was the only project in the programme to focus on digital preservation." ... "In the digitial library domain, the development of a recommendation on preservation metadata is being co-ordinated by a working group supported by OCLC and the RLG. The membership of the working group is international, and inlcudes key individuals who were involved in the development of the Cedars, NEDLIB and NLA metadata specifications."
Type
Web Page
Title
Practical Tools for Electronic Records Management and Preservation
"This briefing paper summarizes the results of a cooperative project sponsored in part, by a research grant from the National Historical Publications and Records Commission. The project, called "Models for Action: Practical Approaches to Electronic Records Management and Preservation," focused on the development of practical tools to support the integration of essential electronic records management requirements into the design of new information systems. The project was conducted from 1996 to 1998 through a partnership between the New York State Archives and Records Administration and the Center for Technology in Government. The project team also included staff from the NYS Adirondack Park Agency, eight corporate partners led by Intergraph Corporation, and University at Albany faculty and graduate students."
Publisher
Center for Technology in Government
Critical Arguments
CA "This briefing paper bridges the gap between theory and practice by presenting generalizable tools that link records management practices to business objectives."
Type
Web Page
Title
eXtensible rights Markup Language (XrML) 2.0 Specification Part I: Primer
This specification defines the eXtensible rights Markup Language (XrML), a general-purpose language in XML used to describe the rights and conditions for using digital resources.
Publisher
ContentGuard
Critical Arguments
CA This chapter provides an overview of XrML. It provides a basic definition of XrML, describes the need that XrML is meant to address, and explains design goals for the language. ContentGuard intends to submit XrML to standards bodies that are developing specifications that enable the exchange and trading of content as well as the creation of repositories for storage and management of digital content.
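To make the idea concrete, here is a schematic sketch of the kind of statement XrML expresses: who may exercise which right over which resource, under what condition. The element names below are simplified for illustration and are not the normative XrML 2.0 vocabulary.

# Schematic rights statement: principal, right, resource, condition.
# Element names simplified; not the normative XrML 2.0 schema.
import xml.etree.ElementTree as ET

LICENSE = """
<license>
  <grant>
    <principal>urn:example:alice</principal>
    <right>play</right>
    <resource>urn:example:ebook:42</resource>
    <condition>before 2003-01-01</condition>
  </grant>
</license>
"""

grant = ET.fromstring(LICENSE).find("grant")
print({child.tag: child.text for child in grant})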
Conclusions
RQ not applicable
SOW
DC ContentGuard contributed XrML to MPEG-21, the OASIS Rights Language Technical Committee and the Open eBook Forum (OeBF). In each case they are using XrML as the base for their rights language specification. Furthest along is MPEG, where the process has reached Committee Draft. They have also recommended to other standards bodies to build on this work. ContentGuard will propose XrML to any standards organization seeking a rights language. Because of this progress ContentGuard has frozen its release of XrML at Version 2.0.
Type
Web Page
Title
Approaches towards the Long Term Preservation of Archival Digital Records
The Digital Preservation Testbed is carrying out experiments according to pre-defined research questions to establish the best preservation approach or combination of approaches. The Testbed will be focusing its attention on three different digital preservation approaches - Migration; Emulation; and XML - evaluating the effectiveness of these approaches, their limitations, costs, risks, uses, and resource requirements.
Language
English; Dutch
Critical Arguments
CA "The main problem surrounding the preservation of authentic electronic records is that of technology obsolescence. As changes in technology continue to increase exponentially, the problem arises of what to do with records that were created using old and now obsolete hardware and software. Unless action is taken now, there is no guarantee that the current computing environment (and thus also records) will be accessible and readable by future computing environments."
Conclusions
RQ "The Testbed will be conducting research to discover if there is an inviolable way to associate metadata with records and to assess the limitations such an approach may incur. We are also working on the provision of a proposed set of preservation metadata that will contain information about the preservation approach taken and any specific authenticity requirements."
SOW
DC The Digital Preservation Testbed is part of the non-profit organisation ICTU. ICTU is the Dutch organisation for ICT and government. ICTU's goal is to contribute to the structural development of e-government, improving the work processes of government organisations, their service to the community and their interaction with citizens. Government institutions, such as Ministries, design the policies in the area of e-government, and ICTU translates these policies into projects. In many cases, more than one institution is involved in a single project. They are the principals in the projects and retain control concerning the focus of the project. In the case of the Digital Preservation Testbed, the principals are the Ministry of the Interior and the Dutch National Archives.
Type
Web Page
Title
Towards a Digital Rights Expression Language Standard for Learning Technology
Critical Arguments
CA The Learning Technology Standards Committee (LTSC) of the Institute of Electrical and Electronics Engineers (IEEE) concentrated on making recommendations for standardizing a digital rights expression language (DREL), with the specific charge to: (1) investigate existing standards development efforts for DREL and digital rights; (2) gather DREL requirements germane to the learning, education, and training industries; (3) make recommendations as to how to proceed; and (4) feed requirements into ongoing DREL and digital rights standardization efforts, regardless of whether the LTSC decides to work with these efforts or embark on its own. This report represents the achievement of these goals in the form of a white paper that can be used as a reference by the LTSC, reports on the current state of existing and proposed standardization efforts targeting digital rights expression languages, and makes recommendations concerning future work.
Conclusions
RQ The recommendations of this report are: 1. Maintain appropriate liaisons between learning technology standards development organizations and those standards development organizations standardizing rights expression languages. The purpose of these liaisons is to continue to feed requirements into broader standardization efforts and to ensure that the voice of the learning, education and training community is heard. 2. Support the creation of application profiles or extensions of XrML and ODRL that include categories and vocabularies for roles common in educational and training settings (see the sketch following this list). In the case of XrML, a name space for local context may be needed. (A name space is required for both XrML and ODRL for the "application profile", or specifically the application (LT application) extension.) 3. Advocate the creation of a standard for expressing local policies in ways that can be mapped to rights expressions. This could be either through a data model or through the definition of an API or service. 4. Launch an initiative to identify models of rights enforcement in learning technology and to possibly abstract a common model for use by architecture and framework definition projects. 5. Further study the implications of patent claims, especially for educational and research purposes.
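A hedged sketch of what recommendation 2 could look like in practice: an educational application profile layers its own namespace and vocabulary on a rights language. The "edu" namespace, its element names, and both URIs are illustrative only, not a published profile.

# Application-profile sketch: local namespace supplies role/use vocabulary.
import xml.etree.ElementTree as ET

AGREEMENT = """
<agreement xmlns="http://example.org/rights-language"
           xmlns:edu="http://example.org/lt-profile">
  <edu:role>enrolled-student</edu:role>
  <edu:use>coursepack-display</edu:use>
</agreement>
"""

EDU = "{http://example.org/lt-profile}"
root = ET.fromstring(AGREEMENT)
print(root.find(EDU + "role").text, "/", root.find(EDU + "use").text)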
Critical Arguments
CA Overview of the program, including keynote speakers, papers presented, invited talks, future directions and next steps.
Conclusions
RQ Some steps to be taken: (1) Investigate a potential move to a formal standards body/group and adopt its procedures and processes. Potential groups include: W3C, OASIS, ECMA, IEEE, IETF, CEN/ISSS, Open Group. The advantages and disadvantages of such a move will be documented and discussed within the ODRL community. (2) Potentially submit the current ODRL version to national bodies for adoption. (3) Request a formal liaison relationship with the OMA. <warrant>
This document is a draft version 1.0 of requirements for a metadata framework to be used by the International Press Telecommunications Council for all new and revised IPTC standards. It was developed and agreed to by members of the IPTC Standards Committee, who represented a variety of newspapers, wire agencies, and other interested members of the IPTC.
Notes
Misha Wolf is also listed as author.
Publisher
International Press Telecommunications Council (IPTC)
Critical Arguments
CA "This Requirements document forms part of the programme of work called ITPC Roadmap 2005. The Specification resulting from these Requirements will define the use of metadata by all new IPTC standards and by new major versions of existing IPTC standards." (p. 1) ... "The purpose of the News Metadata Framework (NMDF) WG is to specify how metadata will be expressed, referenced, and managed in all new major versions of IPTC standards. The NMF WG will: Gather, discuss, agree and document functional requirements for the ways in which metadata will be expressed, referenced and managed in all new major versions of IPTC standards; Discuss, agree and document a model, satisfying these requirements; Discuss, agree and document possible approaches to expressing this model in XML, and select those most suited to the tasks. In doing so, the NMDF WG will, where possible, make use of the work of other standards bodies. (p. 2)
Conclusions
RQ "Open issues include: The versioning of schemes, including major and minor versions, and backward compatibility; the versioning of TopicItems; The design of URIs for TopicItem schemes and TopicItem collections, including the issues of: versions (relating to TopicItems, schemes, and collections); representations (relating to TopicItems and collections); The relationship between a [scheme, code] pair, the corresponding URI and the scheme URI." (p. 17)
SOW
DC The development of this framework came out of the 2003 News Standards Summit, which was attended by representatives from over 80 international press and information agencies ... "The News Standards Summit brings together major players--experts on news metadata standards as well as commercial news providers, users, and aggregators. Together, they will analyze the current state and future expectations for news and publishing XML and metadata efforts from both the content and processing model perspectives. The goal is to increase understanding and to drive practical, productive convergence." ... This is a draft version of the standard.
Joined-up government needs joined-up information systems. The e-Government Metadata Standard (e-GMS) lays down the elements, refinements and encoding schemes to be used by government officers when creating metadata for their information resources or designing search interfaces for information systems. The e-GMS is needed to ensure maximum consistency of metadata across public sector organisations.
Publisher
Office of the e-Envoy, Cabinet Office, UK.
Critical Arguments
CA "The e-GMS is concerned with the particular facets of metadata intended to support resource discovery and records management. The Standard covers the core set of ÔÇÿelementsÔÇÖ that contain data needed for the effective retrieval and management of official information. Each element contains information relating to a particular aspect of the information resource, e.g. 'title' or 'creator'. Further details on the terminology being used in this standard can be found in Dublin Core and Part Two of the e-GIF."
Conclusions
RQ "The e-GMS will need to evolve, to ensure it remains comprehensive and consistent with changes in international standards, and to cater for changes in use and technology. Some of the elements listed here are already marked for further development, needing additional refinements or encoding schemes. To limit disruption and cost to users, all effort will be made to future-proof the e-GMS. In particular we will endeavour: not to remove any elements or refinements; not to rename any elements or refinements; not to add new elements that could contain values contained in the existing elements."
SOW
DC The E-GMS is promulgated by the British government as part of its e-government initiative. It is the technical cornerstone of the e-government policy for joining up the public sector electronically and providing modern, improved public services.
During the past decade, the recordkeeping practices in public and private organizations have been revolutionized. New information technologies, from mainframes to PCs to local area networks and the Internet, have transformed the way state agencies create, use, disseminate, and store information. These new technologies offer a vastly enhanced means of collecting information for and about citizens, communicating within state government and between state agencies and the public, and documenting the business of government. Like other modern organizations, Ohio state agencies face challenges in managing and preserving their records because records are increasingly generated and stored in computer-based information systems. The Ohio Historical Society serves as the official State Archives with responsibility to assist state and local agencies in the preservation of records with enduring value. The Office of the State Records Administrator within the Department of Administrative Services (DAS) provides advice to state agencies on the proper management and disposition of government records. Out of concern over its ability to preserve electronic records with enduring value and assist agencies with electronic records issues, the State Archives has adapted these guidelines from guidelines created by the Kansas State Historical Society. The Kansas State Historical Society, through the Kansas State Historical Records Advisory Board, requested a program development grant from the National Historical Publications and Records Commission to develop policies and guidelines for electronic records management in the state of Kansas. With grant funds, the KSHS hired a consultant, Dr. Margaret Hedstrom, an Associate Professor in the School of Information, University of Michigan and formerly Chief of State Records Advisory Services at the New York State Archives and Records Administration, to draft guidelines that could be tested, revised, and then implemented in Kansas state government.
Notes
These guidelines are part of the ongoing effort to address the electronic records management needs of Ohio state government. As a result, this document continues to undergo changes. The first draft, written by Dr. Margaret Hedstrom, was completed in November of 1997 for the Kansas State Historical Society. That version was reorganized and updated and posted to the KSHS Web site on August 18, 1999. The Kansas Guidelines were modified for use in Ohio during September 2000
Critical Arguments
CA "This publication is about maintaining accountability and preserving important historical records in the electronic age. It is designed to provide guidance to users and managers of computer systems in Ohio government about: the problems associated with managing electronic records, special recordkeeping and accountability concerns that arise in the context of electronic government; archival strategies for the identification, management and preservation of electronic records with enduring value; identification and appropriate disposition of electronic records with short-term value, and
Type
Web Page
Title
President of the Republic's Decree No. 137/2003 of 7 April 2003: "Regulation on Coordination Provisions in Matter of Electronic Signatures"
translated from Italian by Fiorella Foscarini of InterPARES
Conclusions
RQ The differentiation between internal and incoming/outgoing records, which is related to the complexities and costs of such a certification system, may affect the long-term preservation of heavy-signed and light-signed records, and it raises questions about the differing legal value of records and about organizations' accountability. The paragraph about cut-back refers to the destruction of documents, not records; it is, however, significantly ambiguous.
SOW
DC Modifies the President of the Republic's Decree No. 445/2000 of 28 December 2000.
Type
Web Page
Title
Legislative Decree No. 10 of 23 January 2002: "Acknowledgement of the Directive No. 1999/93/CE on a Community Framework for Electronic Signatures"
translated from Italian by Fiorella Foscarini of InterPARES
Critical Arguments
CA Italian implementation of E.U. Directive No. 1999/93/CE on a Community Framework for Electronic Signatures. Article 6 (which replaces Article 10 of DPR 445/2000) defines the form and effectiveness of electronic records.
Type
Web Page
Title
President of the Republic's Decree No. 445/2000 of 28 December 2000: "Consolidated Text of Legislative and Regulatory Provisions concerning Administrative Documentation" (Testo unico delle disposizioni legislative e regolamentari in materia di documentazione amministrativa)
Requirements for Electronic Records Management Systems comprises four parts: (1) "Functional Requirements" (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/requirementsfinal.pdf); (2) "Metadata Standard" (the subject of this record); (3) "Reference Document" (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/referencefinal.pdf); and (4) "Implementation Guidance: Configuration and Metadata Issues" (http://www.nationalarchives.gov.uk/electronicrecords/reqs2002/pdf/implementation.pdf)
Publisher
Public Records Office, [British] National Archives
Critical Arguments
CA Sets out the implications for records management metadata in compliant systems. It has been agreed with the Office of the e-Envoy that this document will form the basis for an XML schema to support the exchange of records metadata and promote interoperability between ERMS and other systems.
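A hedged sketch of that interoperability goal: records exported from one ERMS carry their management metadata as XML that a receiving system can check on ingest. The element names below are invented for illustration and are not the standard's actual element set.

# Ingest-side check that exchanged record metadata carries required fields.
import xml.etree.ElementTree as ET

EXPORT = """
<record>
  <identifier>2002/demo/0101</identifier>
  <title>Committee minutes</title>
  <disposal-schedule>review-after-5-years</disposal-schedule>
</record>
"""

REQUIRED = ("identifier", "title", "disposal-schedule")
rec = ET.fromstring(EXPORT)
missing = [f for f in REQUIRED if rec.find(f) is None]
print("valid for exchange" if not missing else "missing: %s" % missing)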
SOW
DC The National Archives updated the functional requirements for electronic records management systems (ERMS) in collaboration with the central government records management community during 2002. The revision takes account of developments in cross-government and international standards since 1999.
Type
Web Page
Title
The MPEG-21 Rights Expression Language: A White Paper
Critical Arguments
CA Presents the business case for a Digital Rights Expression Language, an overview of the DRM landscape, a discussion of the history and role of standards in business, and some technical aspects of MPEG-21. "[U]nless the rights to ... content can be packaged within machine-readable licences, guaranteed to be ubiquitous, unambiguous and secure, which can then be processed consistently and reliably, it is unlikely that content owners will trust consign [sic] their content to networks. The MPEG Rights Expression Language (REL) is designed to provide the functionality required by content owners in order to create reliable, secure licences for content which can be used throughout the value chain, from content creator to content consumer."
Conclusions
RQ "While true interoperability may still be a distant prospect, a common rights expression language, with extensions based on the MPEG REL, can incrementally bring many of the benefits true interoperability will eventually yield. As extensions are created in multiple content verticals, it will be possible to transfer content generated in one securely to another. This will lead to cross channel fertilisation and the growth of multimedia content. At the same time, a common rights language will also lead to the possibility of broader content distribution (by enabling cross-DRM portability), thus providing more channel choice for consumers. It is this vision of the MPEG REL spreading out that is such an exciting prospect. ... The history of MPEG standards would seem to suggest that implementers will start building to the specification in mid-2003, coincidental with the completion of the standard. This will be followed by extensive take-up within two or three years, so that by mid 2006, the MPEG REL will be a pervasive technology, implemented across many different digital rights management and conditional access systems, in both the content industries and in other, non-rights based industries. ... The REL will ultimately become a 'transparent' technology, as invisible to the user as the phone infrastructure is today."
SOW
DC The Moving Picture Experts Group (MPEG) is a working group of ISO/IEC, made up of some 350 members from various industries and universities, in charge of the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio and their combination. MPEG's official designation is ISO/IEC JTC1/SC29/WG11. So far MPEG has produced the following compression formats and ancillary standards: MPEG-1, the standard for storage and retrieval of moving pictures and audio on storage media (approved Nov. 1992); MPEG-2, the standard for digital television (approved Nov. 1994); MPEG-4, the standard for multimedia applications; MPEG-7, the content representation standard for multimedia information search, filtering, management and processing; and MPEG-21, the multimedia framework.
Type
Web Page
Title
Archiving of Electronic Digital Data and Records in the Swiss Federal Archives (ARELDA): e-government project ARELDA - Management Summary
The goal of the ARELDA project is to find long-term solutions for the archiving of digital records in the Swiss Federal Archives. This includes accession, long-term storage, preservation of data, description, and access for the users of the Swiss Federal Archives. It is also coordinated with the basic efforts of the Federal Archives to realize a uniform records management solution in the federal administration and therefore to support the pre-archival creation of documents of archival value for the benefit of the administration as well as of the Federal Archives. The project is indispensable for the long-term execution of the Federal Archives Act: older IT systems are being replaced by newer ones, and a complete migration of the data is sometimes not possible or too expensive; small database applications, built and maintained by people with no IT background, are constantly increasing in number; and more and more administrative bodies are introducing records and document management systems.
Publisher
Swiss Federal Archives
Publication Location
Bern
Critical Arguments
CA "Archiving in general is a necessary prerequisite for the reconstruction of governmental activities as well as for the principle of legal certainty. It enables citizens to understand governmental activities and ensures a democratic control of the federal administration. And finally are archives a prerequisite for the scientific research, especially in the social and historical fields and ensure the preservation of our cultural heritage. It plays a vital role for an ongoing and efficient records management. A necessary prerequisite for the Federal Archives in the era of the information society will be the system ARELDA (Archiving of Electronic Data and Records)."
Conclusions
RQ "Because of the lack of standard solutions and limited or lacking personal resources for an internal development effort, the realisation of ARELDA will have to be outsourced and the cooperation with the IT division and the Federal Office for Information Technology, Systems and Telecommunication must be intensified. The guidelines for the projects are as follows:
SOW
DC ARELDA is one of the five key projects in the Swiss government's e-government strategy.
Museums and the Online Archive of California (MOAC) builds on existing standards and their implementation guidelines provided by the Online Archive of California (OAC) and its parent organization, the California Digital Library (CDL). Setting project standards for MOAC consisted of interpreting existing OAC/CDL documents and adapting them to the project's specific needs, while at the same time maintaining compliance with OAC/CDL guidelines. This overview of the MOAC technical standards references both the OAC/CDL umbrella document and the MOAC implementation/adaptation document at the beginning of each section, as well as related resources which provide more detail on project specifications.
Critical Arguments
CA The project implements specifications for digital image production, as well as three interlocking file exchange formats for delivering collections, digital images and their respective metadata. Encoded Archival Description (EAD) XML describes the hierarchy of a collection down to the item-level and traditionally serves for discovering both the collection and the individual items within it. For viewing multiple images associated with a single object record, MOAC utilizes Making of America 2 (MOA2) XML. MOA2 makes the images representing an item available to the viewer through a navigable table of contents; the display mimics the behavior of the analog item by e.g. allowing end-users to browse through the pages of an artist's book. Through the further extension of MOA2 with Text Encoding Initiative (TEI) Lite XML, not only does every single page of the book display in its correct order, but a transcription of its textual content also accompanies the digital images.
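A minimal sketch of how the three formats interlock, with abbreviated element names and invented file names: an EAD-style hierarchy reaches the item, whose digital archival object points at a MOA2-style structural map listing page images and their TEI transcriptions.

# EAD-style hierarchy down to the item; the item's <dao> references a
# MOA2-style structural map (here a stand-in dict) whose table of
# contents maps each page to an image file and a TEI Lite transcription.
import xml.etree.ElementTree as ET

EAD = """
<ead>
  <archdesc><dsc>
    <c01><unittitle>Artists' books</unittitle>
      <c02><unittitle>Untitled book, 1975</unittitle>
        <dao href="moa2-0042.xml"/>
      </c02>
    </c01>
  </dsc></archdesc>
</ead>
"""

MOA2 = {"moa2-0042.xml": [("p. 1", "img-0042-001.jpg", "tei-0042-001.xml"),
                          ("p. 2", "img-0042-002.jpg", "tei-0042-002.xml")]}

dao = ET.fromstring(EAD).find(".//dao")
for label, image, transcription in MOA2[dao.get("href")]:
    print(label, image, transcription)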
Conclusions
RQ "These two instances of fairly significant changes in the project's specifications may serve as a gentle reminder that despite its solid foundation in standards, the MOAC information architecture will continue to face the challenge of an ever-changing technical environment."
SOW
DC The author is Digital Media Developer at the UC Berkeley Art Museum & Pacific Film Archives, a member of the MOAC consortium.