#Number
TR-PDS-1995-018
#Title
Scheduling Queries With Large Out-of-Core Data Sets
#Author
Aman Sinha and Craig Chase
#Abstract
We consider the problem of dynamic, preemptive scheduling of queries
that have a shared data set. The model consists of (a) a processing
node with limited physical memory, (b) a disk resident data set, (c)
Poisson arrival of queries, (d) known probability distributions of
processing times and (e) fixed, non-zero cost of data fetches from
the disk. The objective is to minimize the flow time of the
queries. In this paper we assume that the queries may be scheduled
in any order. Data is fetched from the disk in units of ``chunks'',
which are assumed to be fixed size. Each query may process the data
chunks in any order. Since the data is shared, a single data chunk
may be required by multiple queries. Hence, an efficient scheduling
algorithm must ensure that the chunk is read from disk a minimum
number of times. It is well known that the shortest expected
remaining processing time (SERPT) rule can minimize the flow time
for memory resident tasks. However, it is unclear how successfully
SERPT can be adapted to exploit data locality.
In this paper we show that SERPT can be easily augmented to include
data prefetching. Through simulation, we show that
prefetched-SERPT (PSERPT) achieves very good locality of reference
and is very effective at minimizing flow time even with very
fine-grained computation, and a high-degree of inter-query data
locality.
#Bib
@TechReport{SC95,
author = "Aman Sinha and Craig Chase",
title = "Scheduling Queries With Large Out-of-Core Data Sets",
institution = "Parallel and Distributed Systems Laboratory, ECE
Dept. University of Texas at Austin",
year = "1995",
number = "TR-PDS-1995-018",
note = "submitted to the 1996 SIGMOD conference",
note = "available via ftp or WWW at maple.ece.utexas.edu
as technical report ECE-PDS-1995-018",
}