Abstracts of VIEWS 1996 Papers.
If an abstract title is underlined, it is linked to the postscript copy of the paper.
Workshop on Materialized Views: Techniques and Applications
(In Co-operation with ACM Sigmod)
Friday, June 7, 1996
Le Centre Sheraton Hotel, Montreal, Canada.
A warehouse is a data repository integrating information extracted from remote sources with the purpose of providing efficient support for complex business-critical queries. For the warehouse data to be reliable, an important issue is the degree of consistency guaranteed between the warehouse data and the data in the remote sources. In this paper we present two algorithms, the buffering algorithm and the partial buffering algorithm, both based on a revisitation of the conservative timestamp algorithm for distributed transactions, that guarantee different levels of data consistency in the warehouse.
This paper addresses the problem of refresh of a remote database snapshot, or "remote materialized view," which is based on a join of two or more tables in the source database. We consider the situation in which the snapshot is brought into consistency with the source database only at explicit command of the user; i.e., the snapshot is not, in general, consistent with the source database. A differential refresh scheme called "relevant logging" is described. RL examines updates to the base table and stores those which are relevant in a log. Subsequent algorithms determine those updates which should be transmitted to refresh the snapshot. A variation which treats the view as semi-join tables is also discussed. Brief comparisons of costs of this method with full regeneration and an alternative differential file-based method are made.
We consider the problem of maintaining a materialized view without accessing the base relations. More specifically, we would like to find a maximal test that guarantees that a view is self-maintainable (abbrev SM) under a given update to the base relations, i.e., can be maintained using only the view definition, its contents and the update. We observe that SM evaluation can be separated into a view-definition-time portion where a maximal test is generated solely based on the view definition, and an update-time portion where the test can be efficiently applied to the view and the update. We call such a maximal test a Complete Test for View Self-Maintainability (abbrev CTSM).
This paper reports on some interesting new results for conjunctive-query views under insertion updates: 1) the CTSM's are extremely simple queries that look for certain tuples in the view to be maintained; 2) these CTSM's can be generated at view definition time using a very simple algorithm based on the concept of Minimal Z-Partition; 3) view self-maintenance can also be expressed as simple update queries over the view itself.
A data warehouse collects and integrates data from multiple, autonomous, heterogeneous, sources. The warehouse effectively maintains one or more materialized views over the source data. In this paper we describe the architecture of the Whips prototype system, which collects, transforms, and integrates data for the warehouse. We show how the required functionality can be divided among cooperating distributed CORBA objects, providing both scalability and the flexibility needed for supporting different application needs and heterogeneous sources. The Whips prototype is a functioning system implemented at Stanford University and we provide preliminary performance results.
A NAME = "paperFerrariAbstract">In this paper we show how materialization strategies can be successfully used for increasing the efficiency of access control in a temporal authorization model. The temporal authorization model allows to specify both temporal authorizations, that is, authorizations with an interval of validity, and derivation rules, allowing new authorizations to be derived on the basis of the presence or absence of other authorizations. These functionalities highly increase the expressive power of the model. However, they make the access control less efficient, as it may require the evaluation of several rules. For this reason an approach based on materialization strategies has been adopted. Preliminary results on its performance are illustrated in this paper.
We witness a rapid increase in the number of structured information sources that are available online, especially on the WWW. These sources include commercial databases on product information, stock market information, real estate, automobiles, and entertainment. We would like to use the data stored in these databases to answer complex queries that go beyond keyword searches. We face the following challenges: (1) Several information sources store interrelated data, and any query-answering system must understand the relationships between their contents. (2) Many sources are not full-featured database systems and can answer only a small set of queries over their data (for example, forms on the WWW restrict the set of queries one can ask). (3) Since the number of sources is very large, effective techniques are needed to prune the set of information sources accessed to answer a query. (4) The details of interacting with each source vary greatly.
We describe the Information Manifold, an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW. IM tackles the above problems by providing a mechanism to describe declaratively the contents and query capabilities of available information sources. The contents of the information sources are described as views over a set of world-view relations. We describe an algorithm that uses the source descriptions to prune efficiently the set of information sources for a given query. The query plans we generate can involve querying several information sources and combining their answers.
In [GHKMS96a] we argued that GUI programming is a database research problem. This was based on our experience of building a nontrivial GUI system designed around a data centric architecture where we applied many database techniques. In this paper we discuss some of the database techniques used in more detail; in particular, the application of derived view materialization and maintenance, and active views to the GUI programming problem.
We argue that any GUI system has a large number of data dependencies that typically have to be maintained by the programmer. Therefore, stating these data dependencies declaratively, independent of performance concerns, will provide good encapsulation of program logic. We state these dependencies as derived views in a declarative, higher order, Horn clause logic language on complex objects, called Logic++. One of the ways to improve performance is to materialize these derived views after the system is designed and debugged. The system does the materialization and maintenance of the views to ease the programmer s burden.
We abstract any GUI application as a set of event handlers, where each event handler is a transition from the old screen/program state to a new screen/program state. Under the data centric architecture, every entity on the screen corresponds to proxy datum in the program. Therefore, we can express each event handler as a query dependent update, albeit a complicated one. We express the state and state transitions in Logic++. Often, the proxy data are expressed as derived views that are materialized on the screen. Then, the system must be active in maintaining these materialized views by performing screen updates appropriately. This formulation of the GUI programming problem separates the logic of the system from the idiosyncratic calls to the windowing system, thus significantly simplifying the problem of GUI programming.
In applications as diverse as logistics, workflow, and concurrent engineering, it is crucial to be able to monitor the ongoing execution of a distributed activity or situation. Such applications must support monitoring of standing requests for information (SRIs) i.e., predicates defined over federation-level views of autonomous, distributed data sources. To implement SRIs effectively over diverse data managers, we use a 2-layer approach. Selected updates from component databases are propagated to a monitoring site that runs an active DBMS. SRI conditions are monitored over selectively materialized portions of source databases. This paper describes requirements of distributed situation monitoring applications, explores alternate materialization strategies, and discusses an effort to prototype a solution that exploits commercial replication products.
Previous research on materialized views has primarily been in the context of flat relational databases---materialized views defined in terms of one or more flat relations. This paper discusses a broader class of view definitions---materialized views defined over a nested data model such as the nested relational model or an object-oriented data model. An attribute of a tuple deriving the view can be a reference (i.e., a pointer) to a nested relation, with arbitrary levels of nesting possible. The extended capability of this nested data model, together with materialized views, simplifies data modeling and gives more flexibility---important features in rapidly changing application environments.
Simple extensions of standard techniques to the nested model would do too much work for maintenance: a change in a nested set would cause reprocessing the entire nested set, not just the changed parts. We show that these views can be maintained using techniques derived from existing incremental maintenance algorithms without performing this additional work. We also present techniques to efficiently capture the incremental changes of the nested components.
We describe the implementation of these techniques for the \sword\ interface to the Ode database system. The implementation is based on the representation of nested structures by classes and the use of an SQL-like language to define materialized views. We outline the data structures and algorithms used in the implementation.
This is one of the first pieces of work to explore the applicability of materialized views over complex objects.
We propose a technique to answer OQL query expressions using OQL view expressions. The technique for reformulation is based on the extended pattern-matching of well typed expressions. It uses an algorithm FindSubquery, which determines if the OQL query can be rewritten so that it uses the OQL view expression. If so, the query can be answered using the view. Our research has the following features: we handle queries and views that are general OQL expressions; we handle template views (that are expressed over variables); we can generalize a view over the type hierarchy; we handle disjunctions in the view; and we use integrity constraints while matching the query and the view.
Consider the problem of supporting an integrated view over multiple databases. The traditional approach is to use a virtual view, but recent investigations are proposing to use a materialized view, or a hybrid virtual/materialized view. This paper initiates an investigation into the performance trade-offs along this spectrum of choices. In particular, the paper develops analytical models for predicting query-response time, view freshness, and system load in terms of parameters such as query frequency, query complexity, update frequency, and network delay. In many ways the performance of a mediator-based integration environment is similar to that of a client-server DBMS architecture. However, the notion of query freshness does not explicitly arise in a single DBMS, and complex joins may more likely in an integration environment.
This paper lays the groundwork for conducting benchmarking experiments for integrated views, and presents the results of some initial experimentation. Such results will be used to calibrate the analytical model for specific application environments. This will enable prediction of the behavior of the virtual and materialized approaches in specific contexts.
Incremental maintenance of materialized views involves computing the changes necessary to make a view consistent with updated base tables. Deferred view maintenance involves postponing the application of changes to a view. In order to perform deferred incremental view maintenance, we need to refer to the changes made to base tables since the last view refresh. When multiple materialized views with different refresh times are derived from a common base table, the sets of base table updates relevant to a refresh operation can be different for different views.
One could implement separate logging tables for each materialized view. However, this scheme has a high space and update transaction overhead. An alternative approach is to use a single log table called a coalesced log. This paper presents high-level algorithms for maintaining and querying coalesced log tables. Although we are motivated by efficiency considerations, we attempt to address these issues in an algebraic manner that is consistent with ease of proof. We conclude by discussing how our approach might be implemented in SQL.
Materialized views, especially views involving aggregation, are often used in environments such as data warehouses to speed up the evaluation of complex queries. Because the views can be quite complex, as changes are made to the underlying base data it is usually better to incrementally maintain a view by propagating changes to base data onto the view than to recompute the view from scratch. Griffin and Libkin [GL95] provide a framework for deriving incremental view maintenance expressions for a view with duplicate semantics, where the view can be defined using any number of select, project, join, bag union, and monus bag-algebra operators. However, aggregation was considered only in a limited sense; for example, aggregation with group-by attributes was not allowed. We extend the framework of [GL95] to include maintaining views with aggregation in the general case, where aggregation can include group-by attributes and a view can include several aggregate operators.
A materialized view needs to be maintained in response to changes to the tables used to derive the view. Incremental maintenance is typically done in two steps -- (1) compute the changes to the materialized view, and (2) refresh the materialized view by applying these changes.
In this paper, we show that it is possible to interleave change computation of one view with refresh of other views so that the changes can be computed more efficiently than would be possible if the two steps were to occur sequentially in either order. The algebraic equations to compute the changes to a view are simplest when written in terms of both the pre-update and post-update states of the deriving tables. By interleaving the refresh and change computation steps, we can guarantee that a table is always in the correct state during change computation. Such an interleaving corresponds to a staggered evaluation of the algebraic equations defining the changes -- hence the technique is called the staggered maintenance algorithm.
We permit multiple materialized views to be defined from base tables and other materialized views using select, project, join, union, and the SQL Except operations. We use a model where views may be maintained in an immediate, deferred, or snapshot manner and where we may have no control on when the base tables are updated by user transactions.