【DBS】Lecture 09. Storage and File Structure

Storage hierarchy (toward the top, devices are faster and more expensive, but volatile)


Primary storage: cache and main memory

Secondary storage, or online storage: the level below primary storage, e.g. magnetic disk

Tertiary storage, or offline storage: the lowest level, e.g. magnetic tape or optical disk

Main memory and the levels above it are volatile storage: the device loses all of its contents when power is switched off

Redundant Arrays of Independent Disks (RAID)

Mean time to failure (MTTF): the average time a disk is expected to run continuously without failing
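As a rough illustration, the sketch below computes two common rules of thumb, assuming disk failures are independent. In an array of N disks, some disk fails about N times as often as a single disk. A mirrored pair (RAID 1) loses data only if the second copy fails while the first is being repaired, which gives a mean time to data loss of roughly MTTF² / (2 · MTTR), where MTTR is the mean time to repair. The function names and the example figures (100,000-hour MTTF, 10-hour repair) are illustrative.

```python
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """Expected time until *some* disk in the array fails (independent failures assumed)."""
    return disk_mttf_hours / n_disks


def mirrored_mttdl(disk_mttf_hours: float, mttr_hours: float) -> float:
    """Mean time to data loss for a mirrored pair: both copies must fail
    within one repair window (independent failures assumed)."""
    return disk_mttf_hours ** 2 / (2 * mttr_hours)


if __name__ == "__main__":
    # 100 disks, each with an MTTF of 100,000 hours (about 11 years)
    print(array_mttf(100_000, 100))      # 1000.0 hours, roughly 42 days
    # Mirrored pair built from the same disks, with a 10-hour repair time
    print(mirrored_mttdl(100_000, 10))   # 500000000.0 hours, roughly 57,000 years
```

Adding disks makes failures of *some* disk far more frequent, and redundancy is what brings the time to actual data loss back up.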

RAID levels:

Choice of RAID Level

Factors in choosing a RAID level

  • Monetary cost
  • Performance: number of I/O operations per second, and bandwidth, during normal operation
  • Performance during failure
  • Performance during rebuild of a failed disk
    • Including the time taken to rebuild the failed disk

RAID 0 is used only when data safety is not important

  • E.g., data can be recovered quickly from other sources

Levels 2 and 4 are never used, since they are subsumed by levels 3 and 5
Level 3 is no longer used, since bit striping forces every single-block read to access all disks, wasting disk-arm movement; block striping (level 5) avoids this
Level 6 is rarely used, since levels 1 and 5 offer adequate safety for almost all applications
So the competition is only between levels 1 and 5
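To see why bit striping hurts single-block reads, the sketch below counts which data disks one logical block touches. It assumes a simplified layout that ignores parity disks: under bit striping, bit j of the data goes to disk j mod n, and under block striping, logical block i goes to disk i mod n. The function names are hypothetical.

```python
def disks_for_block_bit_striping(block_no: int, n_disks: int, bits_per_block: int) -> set:
    """Bit striping: consecutive bits rotate across disks, so one block spans all of them."""
    first_bit = block_no * bits_per_block
    return {bit % n_disks for bit in range(first_bit, first_bit + bits_per_block)}


def disks_for_block_block_striping(block_no: int, n_disks: int) -> set:
    """Block striping: each whole block lives on exactly one disk."""
    return {block_no % n_disks}


# Reading logical block 7 from 4 data disks with 4 KB (32,768-bit) blocks
print(disks_for_block_bit_striping(7, 4, 4096 * 8))   # {0, 1, 2, 3}: every arm must move
print(disks_for_block_block_striping(7, 4))           # {3}: the other disks stay free
```

With block striping, independent small reads can be served by different disks in parallel. With bit striping, each read occupies the whole array.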

Level 1 provides much better write performance than level 5

  • Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
  • Level 1 preferred for high update environments such as log disks
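The 2-read/2-write cost comes from how RAID 5 maintains parity. The parity block of a stripe is the XOR of its data blocks. A single-block write can therefore update parity without reading the rest of the stripe: read the old data and the old parity, then write the new data and a new parity equal to old parity XOR old data XOR new data. The sketch below checks that identity on toy 2-byte blocks. The helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(data_blocks: list) -> bytes:
    """Parity block of a stripe: XOR of all its data blocks."""
    return reduce(xor_blocks, data_blocks)


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> tuple:
    """Read-modify-write for one block.
    Reads:  old_data, old_parity  (2 block reads)
    Writes: new_data, new_parity  (2 block writes)
    RAID 1 would instead just write new_data to both mirrors (2 block writes)."""
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    return new_data, new_parity


stripe = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]
parity = stripe_parity(stripe)

# Overwrite block 1 using only the old block 1 and the old parity
stripe[1], parity = raid5_small_write(stripe[1], parity, b"\x55\x05")
assert parity == stripe_parity(stripe)   # matches a full recomputation

# The same XOR property lets a failed disk's block be rebuilt from the survivors
assert stripe_parity([stripe[0], stripe[2], parity]) == stripe[1]
```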

Level 1 has a higher storage cost than level 5

  • Disk drive capacities are increasing rapidly (50% per year), whereas disk access times have decreased much less (by a factor of 3 in 10 years)
  • I/O requirements have increased greatly, e.g. for Web servers
  • When enough disks have been bought to satisfy the required I/O rate, they often have spare storage capacity
    • So there is often no extra monetary cost for Level 1!
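A toy sizing model makes concrete the argument that mirroring often costs nothing extra. The sketch assumes a read-dominated workload in which each disk delivers a fixed number of I/O operations per second and load spreads evenly across disks. It ignores RAID 5's extra write cost. All names and figures are hypothetical.

```python
import math


def disks_needed(data_tb: float, disk_tb: float, required_iops: int,
                 disk_iops: int, level: int) -> int:
    """Hypothetical sizing model: number of disks to buy for RAID level 1 or 5."""
    data_disks = math.ceil(data_tb / disk_tb)
    if level == 1:
        for_capacity = 2 * data_disks              # every block stored twice
    elif level == 5:
        for_capacity = max(3, data_disks + 1)      # one disk's worth of parity, at least 3 disks
    else:
        raise ValueError("only levels 1 and 5 are modelled")
    for_io = math.ceil(required_iops / disk_iops)  # disks needed to reach the I/O rate
    n = max(for_capacity, for_io)
    if level == 1 and n % 2:
        n += 1                                     # mirrors come in pairs
    return n


# I/O-bound (e.g. a busy Web server): 2 TB of data, 4 TB disks, 2000 IOPS at 200 IOPS per disk
print(disks_needed(2, 4, 2000, 200, 1), disks_needed(2, 4, 2000, 200, 5))   # 10 10
# Capacity-bound archive: 40 TB of data, little I/O
print(disks_needed(40, 4, 200, 200, 1), disks_needed(40, 4, 200, 200, 5))   # 20 11
```

In the I/O-bound case both levels need the same ten disks, so level 1's storage overhead is free. In the capacity-bound case, which is the low-update, large-data situation, level 5 needs roughly half as many disks.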

Level 5 is preferred for applications with a low update rate and large amounts of data.
Level 1 is preferred for all other applications.

Buffer Manager

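When the buffer does not have enough free space to hold a newly read block, a block already in the buffer must be overwritten (replaced). The main replacement strategies are: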

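  • LRU strategy (Least Recently Used): Replace the block that was least recently used.
  • MRU strategy (Most Recently Used): Replace the block that was most recently used. The system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned and becomes the most recently used block, so it is the next candidate for replacement. MRU is the best choice when a relation is scanned repeatedly, e.g. the inner relation of a nested-loop join.
  • Toss-immediate strategy: Free the space occupied by a block as soon as the final tuple of that block has been processed.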
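The difference between LRU and MRU shows up on a repeated sequential scan, such as the inner relation of a nested-loop join. The sketch simulates a buffer pool with a hypothetical `count_misses` function. Pinning is modelled only implicitly: a block is evicted only when a new block must be read, and at that moment the most recently used block is the one just finished. Toss-immediate is not modelled.

```python
from collections import OrderedDict


def count_misses(accesses, capacity: int, policy: str) -> int:
    """Simulate a buffer pool with `capacity` frames and return the number of
    block reads from disk (misses) under the "LRU" or "MRU" replacement policy."""
    buffer = OrderedDict()  # block ids ordered from least to most recently used
    misses = 0
    for block in accesses:
        if block in buffer:
            buffer.move_to_end(block)  # hit: block becomes most recently used
            continue
        misses += 1
        if len(buffer) >= capacity:
            # LRU evicts the oldest entry (front); MRU evicts the newest (back)
            buffer.popitem(last=(policy == "MRU"))
        buffer[block] = True
    return misses


# Scan a 4-block relation 3 times through a 3-frame buffer
scan = [0, 1, 2, 3] * 3
print(count_misses(scan, 3, "LRU"))   # 12: every access misses
print(count_misses(scan, 3, "MRU"))   # 6
```

Under LRU each block is evicted just before the scan needs it again. MRU keeps most of the relation resident from one pass to the next.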

File Organization

  • Fixed-length records
  • Variable-length records
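With fixed-length records, every record occupies the same number of bytes. Record i therefore starts at byte offset i × (record size) and can be located without scanning. The sketch uses Python's `struct` module with a hypothetical three-field layout (5-byte ID, name padded to 20 bytes, 4-byte integer salary) and made-up sample rows.

```python
import struct

# Hypothetical layout: ID char(5), name char(20), salary int32; "<" = little-endian, no padding
RECORD = struct.Struct("<5s20si")   # RECORD.size == 29 bytes


def pack_record(rec_id: str, name: str, salary: int) -> bytes:
    # struct pads (or truncates) the "s" fields with NUL bytes to their fixed width
    return RECORD.pack(rec_id.encode(), name.encode(), salary)


def read_record(file_bytes: bytes, i: int) -> tuple:
    """Fetch record i directly: it starts at byte offset i * RECORD.size."""
    rec_id, name, salary = RECORD.unpack_from(file_bytes, i * RECORD.size)
    return rec_id.rstrip(b"\0").decode(), name.rstrip(b"\0").decode(), salary


data = b"".join([
    pack_record("10101", "Srinivasan", 65000),
    pack_record("12121", "Wu", 90000),
    pack_record("15151", "Mozart", 40000),
])
print(RECORD.size, len(data))   # 29 87
print(read_record(data, 1))     # ('12121', 'Wu', 90000)
```

Short names waste their unused padding bytes. That waste, along with the inability to store values longer than the fixed width, is what variable-length records address.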

Organization of Records in Files

  • heap file:
    a record can be placed anywhere in the file where there is space
  • sequential file:
    records are stored in sequential order, based on the value of the search key of each record
  • hashing file:
    a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed
  • clustering file organization:
    records of several different relations can be stored in the same file
    Motivation: store related records from different relations in the same block to minimize I/O
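A few lines are enough to contrast the placement rules of the first three organizations. This is a toy in-memory model, not a real storage engine. Blocks are Python lists holding a hypothetical `BLOCK_CAPACITY` of 2 records, the sequential file is a single sorted list, and the hash function is simply key mod number of blocks. Clustering is not modelled, and the sample rows are made up.

```python
import bisect

BLOCK_CAPACITY = 2  # hypothetical: records that fit in one block


def heap_insert(blocks: list, record) -> None:
    """Heap file: place the record in any block with free space (here: the first one)."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])  # no space anywhere: allocate a new block


def sequential_insert(records: list, record, key) -> None:
    """Sequential file: keep records in search-key order (a flat list stands in for chained blocks)."""
    keys = [key(r) for r in records]
    records.insert(bisect.bisect_right(keys, key(record)), record)


def hash_block(search_key: int, n_blocks: int) -> int:
    """Hashing file: a hash function on the key picks the block (here: key mod n_blocks)."""
    return search_key % n_blocks


rows = [(22222, "Einstein"), (10101, "Srinivasan"), (15151, "Mozart")]

heap = []
for r in rows:
    heap_insert(heap, r)
print(heap)   # arrival order: [[(22222, 'Einstein'), (10101, 'Srinivasan')], [(15151, 'Mozart')]]

seq = []
for r in rows:
    sequential_insert(seq, r, key=lambda row: row[0])
print([r[0] for r in seq])                   # [10101, 15151, 22222]

print([hash_block(r[0], 4) for r in rows])   # [2, 1, 3]
```

A heap file makes insertion cheap but search requires a full scan. The sequential file supports ordered scans by search key. The hashing file lets an equality lookup go straight to one block.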