Advancement: Zhang, J. (CSE) - Using the Context Bus to Observe and Analyze Distributed Systems
Current state-of-the-art approaches observe distributed systems through annotations in application code and built-in anchors in the infrastructure that indicate which details should be captured during observation. Operations such as debugging, root cause analysis, and testing must retrieve, recognize, and combine signals from observational data that comes from different sources and in different formats. These approaches, which are typically applied offline, are fundamentally flawed: observational data entries collected by different observers cannot be perfectly linked at runtime, or even after the fact, unless the system design is integrated and adds sufficient (temporary) identifiers (e.g., user-id, request-id, session-id) or applies additional mechanisms to reveal correlations among signals. However, correlating and connecting signals at runtime based on lineage is often considered too expensive in terms of both performance and learning costs.
This work makes it possible for various observation methods to submit their data in a uniform way, through a representation of events at multiple granularities and a universal channel that cuts across the software stack. Together these support fine-grained, just-in-time analysis and operations that benefit development, monitoring, maintenance, and related tasks. The solution is novel in that it maintains and aligns events and their corresponding lineage efficiently and in a timely manner. We provide enriched features based on fine-grained signals from promptly correlated observational data entries, and we support real-time features that react immediately to analysis results (e.g., fault injection, traffic shaping, and load balancing, all built on the enriched features). This work exceeds current state-of-the-art approaches in both coverage (a full-stack, multi-granularity solution) and functionality (reacting to just-in-time analysis results); those approaches work at a limited range of granularity and miss opportunities (e.g., they take actions via fixed instructions that are updated sluggishly).
Event Host: Jun Zhang, Ph.D. Student, Computer Science & Engineering
Advisor: Peter Alvaro
Monday, December 11, 2023 at 7:00pm