Skip to content

Latest commit

 

History

History
149 lines (116 loc) · 6.15 KB

README.md

File metadata and controls

149 lines (116 loc) · 6.15 KB

Asynchronous Buffer Pool Manager

Authors: Connor Tsui, Sarvesh Tandon, Kyle Booker

An asynchronous buffer pool manager, built on top of [tokio_uring] and [tokio].

The goal of this buffer pool manager is to exploit parallelism as much as possible by eliminating the use of any global locks/latches or any single points of contention for the entire system. This means that several parts of this system are implemented quite differently to how a traditional buffer pool manager would be implemented.

Most notable is the fact that I/O is non-blocking, courtesy of the io_uring Linux interface that the [tokio_uring] interface is built on top of. The second most notable part of this system is that there is no global page table that users must go through to access any page of data. All page data and data movement (eviction) is decentralized.

By making use of [tokio_uring], this buffer pool manager is implemented with a thread-per-core model where many light-weight asynchronous tasks can be scheduled on each operating system provided thread. And because the system is thread-per-core (and not a work-stealing system), it is the responsibility of the caller to ensure that tasks are balanced between threads.

Usage

The intended usage is as follows:

  • Call [BufferPoolManager::initialize] to set the memory and storage sizes
  • Spawn 1 operating system thread for every CPU core
  • Call [BufferPoolManager::start_thread] on each thread to initialize each for the buffer pool
  • For each of the tasks on each thread, call [BufferPoolManager::spawn_local]
  • To access a page of data, use PageHandles via [BufferPoolManager::get_page]
  • Use the read or write methods on PageHandle and then perform the intended operations over the page's data

Single-Threaded Example

Here is a simple example of starting the [BufferPoolManager] on a single thread (in this case, the main thread). It accesses a single page from persistent storage, fills it with the character 'A', and then writes it out to persistent storage.

use async_bpm::BufferPoolManager;
use async_bpm::page::PageId;
use std::ops::DerefMut;

// Initialize a buffer pool with 64 frames and 128 pages on persistent storage.
BufferPoolManager::initialize(64, 128);
let bpm = BufferPoolManager::get();

BufferPoolManager::start_thread(async move {
    let pid = PageId::new(0);
    let ph = bpm.get_page(&pid).unwrap();

    {
        let mut guard = ph.write().await.unwrap();
        guard.deref_mut().fill(b'A');
        guard.flush().await.unwrap();
   }
});

Since there should be only 1 [BufferPoolManager] instance at a time, the [BufferPoolManager::get] method returns a &'static reference to a global instance. This global buffer pool instance contains the global storage manager instance.

Note that the call to flush is not strictly necessary, as it is only a way to force an immediate write out to persistent storage.

Multi-Threaded Example

Here is a multi-threaded example on 8 threads, where each thread spawns 2 tasks that write a unique character to persistent storage.

use async_bpm::BufferPoolManager;
use async_bpm::page::PageId;
use std::ops::DerefMut;

const THREADS: usize = 8;

BufferPoolManager::initialize(64, 256);
let bpm = BufferPoolManager::get();

// Spawn all threads
std::thread::scope(|s| {
    for i in 0..THREADS {
        s.spawn(move || {
            BufferPoolManager::start_thread(async move {
                let h1 = BufferPoolManager::spawn_local(async move {
                    let index = 2 * i as u8;
                    let pid = PageId::new(index as u64);
                    let ph = bpm.get_page(&pid).unwrap();

                    {
                        let mut guard = ph.write().await.unwrap();
                        guard.deref_mut().fill(b' ' + index);
                        guard.flush().await.unwrap();
                    }
                });

                let h2 = BufferPoolManager::spawn_local(async move {
                    let index = ((2 * i) + 1) as u8;
                    let pid = PageId::new(index as u64);
                    let ph = bpm.get_page(&pid).unwrap();

                    {
                        let mut guard: async_bpm::page::WritePageGuard =
                            ph.write().await.unwrap();
                        guard.deref_mut().fill(b' ' + index);
                        guard.flush().await.unwrap();
                    }
                });

                let (res1, res2) = tokio::join!(h1, h2);
                res1.unwrap();
                res2.unwrap();
            });
        });
    }
});

More Examples

TODO more examples.


Design

This system is designed as a thread-per-core model with multiple persistent storage devices. The reason that it is not a multi-threaded worker pool model like the [tokio] runtime is due to how the io_uring Linux interface works. I/O operations on io_uring are submitted to a single io_uring instance that is registered per-thread to eliminate contention. To move a task to a separate thread would mean that the task could no longer poll the operation result off of the thread-local io_uring completion queue. Thus, lightweight asynchronous tasks given to worker threads cannot be moved between threads (or in other words, are !Send).

A consequence of this fact is that threads cannot work-steal in the same manner that the [tokio] runtime allows threads to do. It is thus the responsibility of some global scheduler to assign tasks to worker threads appropriately. Then, once a task has been given to a worker thread, the buffer pool's internal thread-local scheduler (which is just a [tokio_uring] local scheduler) is in charge of managing all of the cooperative tasks.

This buffer pool manager is heavily inspired by LeanStore, which you can read about here, and future work could introduce the all-to-all model of threads to distinct SSDs, where each worker thread has a dedicated io_uring instance for every physical SSD.