Commit
Add Yiwen's Dissertation (#263)
yiwenzhang92 authored Jul 17, 2024
1 parent 8fc21a3 commit f16f11c
Showing 2 changed files with 16 additions and 0 deletions.
source/_data/SymbioticLab.bib (16 additions, 0 deletions)
@@ -1161,6 +1161,22 @@ @PhDThesis{fanlai:dissertation
The fourth and fifth parts introduce FedScale and Oort to complement and extend today’s ML training stage and the subsequent deployment stage up to the planetary scale. They enable federated model training and testing across millions of clients at the edge. FedScale supports on-device model execution and integrates Oort to orchestrate clients. At runtime, Oort cherry-picks clients whose data offers higher utility in improving model accuracy and who can execute the ML task quickly, minimizing the performance gap between cloud ML and federated ML.}
}
@PhDThesis{yiwenzhg:dissertation,
Title = {Quality of Service for Performance-Critical Cloud Applications},
Author = {Yiwen Zhang},
Institution = {University of Michigan},
Year = {2024},
Month = {July},
publist_confkey = {dissertation},
publist_link = {paper || yiwen-dissertation.pdf},
publist_abstract = {Cloud infrastructure continues to scale due to the rapid evolution of both hardware and software technologies in recent years. On the one hand, recent hardware advances such as accelerators, kernel-bypass networks, and high-speed interconnects bring more powerful computing devices, faster networking equipment, and larger data storage. On the other hand, new software technologies such as computer vision and natural language processing introduce more workloads across datacenters and the edge. As a result, more and more applications from many tenants with different performance requirements must share compute and network resources to improve resource utilization. Therefore, it is more important than ever to ensure that performance-critical applications receive the appropriate level of priority and service quality.
This dissertation aims to build system support for better quality of service (QoS) for performance-critical applications in the cloud. Specifically, we aim to provide guaranteed performance specified by service level objectives (SLOs) for multiple coexisting applications while maximizing system resource utilization. Unfortunately, we observe that existing cloud infrastructure lacks QoS support in several critical places, including network interface cards (NICs), datacenter fabrics, edge devices, and tiered memory systems, each of which requires a unique QoS-aware system design to ensure predictable application performance.
To this end, we have built software solutions to provide better QoS in each of the aforementioned areas. First, we built Justitia to provide performance isolation and fairness in the NIC for kernel-bypass networks (KBNs). Justitia overcomes the unique challenges in KBNs with several innovations, including split connections with message-level shaping, sender-based resource mediation with receiver-side updates, and passive latency monitoring. Second, we built Aequitas to provide QoS for latency-critical remote procedure calls (RPCs) inside datacenter networks. Aequitas is a distributed, sender-driven admission control scheme that uses commodity Weighted-Fair Queuing (WFQ) to guarantee RPC-level SLOs. It enforces cluster-wide RPC latency SLOs via probabilistic downgrading in order to limit the amount of traffic admitted into different QoS levels. Third, we built Vulcan to automatically generate query plans for live ML queries based on their accuracy and end-to-end latency requirements, while minimizing resource consumption across the edge. Vulcan determines the best pipeline, placement, and query configuration by combining several techniques, including Bayesian Optimization and memoizing intermediate results of pipeline operators. Finally, we built Mercury, a QoS-aware tiered memory system to provide predictable performance for memory-intensive applications. Mercury proposes a new resource management scheme inside the kernel tailored for tiered memory systems. It leverages a novel admission control scheme and a real-time adaptation algorithm to ensure QoS guarantees for both latency-sensitive and bandwidth-intensive applications. Together, these solutions provide the missing pieces from the edge to the cloud to enable QoS for performance-critical cloud applications.}
}
@article{swan:arxiv22,
title={Swan: A Neural Engine for Efficient DNN Training on Smartphone SoCs},
Binary file not shown.