【DBS】Lecture 11. Query Processing

Meaturing of query cost

$t_T$ – time to transfer one block. (≈ 0.1ms)
$t_S$ – time for one seek. (≈ 4ms)
Cost for b block transfers plus S seeks : $b t_T + S t_S$

11

External merge sort

cost:
M: 缓冲区能容纳的磁盘块数
$b_r$: 关系r中记录的磁盘块数
$b_b$: 每次从一个归并段读取的数据数
merge passes required: $\; \lceil log_{M-1}(b_r/M) \rceil$
total number of block transfer: $\; b_r(2\lceil log_{M-1}(b_r/M) \rceil + 1)$
total number of seeks: $\; 2\lceil b_r/M \rceil + \lceil b_r / b_b \rceil (2 \lceil log_{M-1}(b_r/M) \rceil -1 )$

Join

元组数较少的关系作为外层关系时效果较好

nested-loop join 嵌套循环连接

n: 元组数(记录数), b: 磁盘块数

worst case: $\; n_r * b_s+b_r$ block transfer, plus $n_r +b_r$ disk seeks

best case: $\; b_r + b_s$ block transfer, plus 2 seeks

block nested-loop join 块嵌套循环连接

worst case: $\; b_r b_s+b_r$ block transfer, plus $2b_r$ seeks

best case: $\; b_r + b_s$ block transfer, plus 2 seeks

improve: $\; \lceil b_r / (M-2) * b_s + b_r \rceil$ block transfer, plus $2\lceil b_r / (M-2 ) \rceil$ seeks

indexed nested-loop join 索引嵌套循环连接

c: 用连接条件对s进行单次选择操作的代价

cost of join: $\; b_r(t_T+t_S)+n_r*c$

merge-join 排序归并连接

$b_r+b_s$ block transfers + $\lceil b_r/b_b \rceil + \lceil b_s / b_b \rceil$ seeks

hash-join

$M>n_h +1$ 或 $M>(b_s+M)+1$ 或 $M>\sqrt {b_s}$ 时不需要递归划分

不需要递归划分时:$3(b_r + b_s) +4 * n_h$ block transfers + $2( \lceil b_r / b_b \rceil + \lceil b_s/b_b \rceil)$ seeks

需要递归划分时:$2(b_r + b_s) \lceil log_{M–1}(b_s) – 1 \rceil + b_r + b_s$ block transfers + $2(\lceil b_r / b_b \rceil + \lceil b_s / b_b \rceil ) \lceil log_{M–1}(b_s) – 1 \rceil$ seeks