InterPARES 2 Project - Warrant Database

Type	Journal
Title	When Documents Deceive: Trust and Provenance as New Factors for Information Retrieval in a Tangled Web
Author(s)	Lynch,Clifford
Descriptors	description / RDF / XML / academic / hypothetical / overview / practical / systems / scholarly
Periodical Name	Journal of the American Society for Information Science and Technology
Periodical Abbreviation	JASIST
Publication Year	2001
Volume	52
Issue	1
Pages	12
Publisher	John Wiley & Sons
Critical Arguments	"This brief and somewhat informal article outlines a personal view of the changing framework for information retrieval suggested by the Web environment, and then goes on to speculate about how some of these changes may manifest in upcoming generations of information retrieval systems. It also sketches some ideas about the broader context of trust management infrastructure that will be needed to support these developments, and it points towards a number of new research agendas that will be critical during this decade. The pursuit of these agendas is going to call for new collaborations between information scientists and a wide range of other disciplines." (p. 12) Discusses public key infrastructure (PKI) and Pretty Good Privacy (PGP) systems as steps toward ensuring the trustworthiness of metadata online, but explains their limitations. Makes a distinction between the identity of providers of metadata and their behavior, arguing that it is the latter we need to be concerned with.
Phrases	<P1> Surrogates are assumed to be accurate because they are produced by trusted parties, who are the only parties allowed to contribute records to these databases. Documents (full documents or surrogate records) are viewed as passive; they do not actively deceive the IR system.... Compare this to the realities of the Web environment. Anyone can create any metadata they want about any object on the net, with any motivation. (p. 13) <P2> Sites interested in manipulating the results of the indexing process rapidly began to exploit the difference between the document as viewed by the user and the document as analyzed by the indexing crawler through a set of techniques broadly called "index spamming." <P3> Pagejacking might be defined generally as providing arbitrary documents with independent arbitrary index entries. Clearly, building information retrieval systems to cope with this environment is a huge problem. (p. 14) <P4> [T]he tools are coming into place that let one determine the source of a metadata assertion (or, more precisely and more generally) the identity of the person or organization that stands behind the assertion, and to establish a level of trust in this identity. (p. 16) <P5> It is essential to recognize that in the information retrieval context one is not concerned so much with identity as with behavior. ... This distinction is often overlooked or misunderstood in discussions about what problems PKI is likely to solve: identity alone does not necessarily solve the problem of whether to trust information provided by, or warranted by, that identity. ... And all of the technology for propagating trust, either in hierarchical (PKI) or web-of-trust identity management, is purely about trust in identity. (p. 16) <P6> The question of formalizing and recording expectations about behavior, or trust in behavior, are extraordinarily complex, and as far as I know, very poorly explored. (p. 16) <P7> [A]n appeal to certification or rating services simply shifts the problem: how are these services going to track, evaluate, and rate behavior, or certify skills and behavior? (p. 16) <P8> An individual should be able to decide how he or she is willing to have identity established, and when to believe information created by or associated with such an identity. Further, each individual should be able to have this personal database evolve over time based on experience and changing beliefs. (p. 16) <P9> [T]he ability to scale and to respond to a dynamic environment in which new information sources are constantly emerging is also vital. <P10> In determining what data a user (or an indexing system, which may make global policy decisions) is going to consider in matching a set of search criteria, a way of defining the acceptable level of trust in the identity of the source of the data will be needed. (p. 16) <P10> Only if the data is supported by both sufficient trust in the identity of the source and the behavior of that identity will it be considered eligible for comparison to the search criteria. Alternatively, just as ranking of result sets provided a more flexible model of retrieval than just deciding whether documents or surrogates did or did not match a group of search criteria, one can imagine developing systems that integrate confidence in the data source (both identity and behavior, or perhaps only behavior, with trust in identity having some absolute minimum value) into ranking algorithms. (p. 17) <P11> As we integrate trust and provenance into the next generations of information retrieval systems we must recognize that system designers face a heavy burden of responsibility. ... New design goals will need to include making users aware of defaults; encouraging personalization; and helping users to understand the behavior of retrieval systems <warrant> (p. 18) <P12> Powerful paternalistic systems that simply set up trust-related parameters as part of the indexing process and thus automatically apply a fixed set of such parameters to each search submitted to the retrieval system will be a real danger. (p. 17)
Conclusions	RQ "These developments suggest a research agenda that addresses indexing countermeasures and counter-countermeasures; ways of anonymously or pseudononymously spot-checking the results of Web-crawling software, and of identifying, filtering out, and punishing attempts to manipulate the indexing process such as query-source-sensitive responses or deceptively structured pages that exploit the gap between presentation and content." (p. 14) "Obviously, there are numerous open research problems in designing such systems: how can the user express these confidence or trust constraints; how should the system integrate them into ranking techniques; how can efficient index structures and query evaluation algorithms be designed that integrate these factors. ... The integration of trust and provenance into information retrieval systems is clearly going to be necessary and, I believe, inevitable. If done properly, this will inform and empower users; if done incorrectly, it threatens to be a tremendously powerful engine of censorship and control over information access." (p. 17)
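
Illustrative note	The two retrieval models quoted from p. 17 (an eligibility threshold on trust in both identity and behavior, versus folding confidence in the source into ranking) can be sketched roughly as follows. This is a minimal illustration, not anything specified in the article: the `Record` fields, the 0-to-1 trust scores, and the choice to multiply relevance by behavior trust are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A document or surrogate with hypothetical scores in [0, 1]."""
    title: str
    relevance: float        # match against the search criteria
    identity_trust: float   # trust in who stands behind the metadata
    behavior_trust: float   # trust in how that source behaves


def eligible(records, min_identity, min_behavior):
    """Threshold model: keep records whose source clears both trust floors."""
    return [r for r in records
            if r.identity_trust >= min_identity
            and r.behavior_trust >= min_behavior]


def ranked(records, min_identity):
    """Ranking model: identity trust is only a floor; behavior trust
    scales relevance (the weighting is an assumption of this sketch)."""
    candidates = [r for r in records if r.identity_trust >= min_identity]
    return sorted(candidates,
                  key=lambda r: r.relevance * r.behavior_trust,
                  reverse=True)


# Made-up example data, purely for illustration.
records = [
    Record("library catalog entry", 0.6, 0.9, 0.9),
    Record("keyword-stuffed page", 0.9, 0.8, 0.2),
    Record("anonymous page", 0.8, 0.1, 0.5),
]

for r in ranked(records, min_identity=0.5):
    print(r.title)
```

In this sketch the user, rather than the indexing system, supplies `min_identity` and `min_behavior`, which matches the article's concern that fixed, system-wide trust parameters would be a danger.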