Full system simulation allows simulating an entire physical machine on top of a host operating system (OS) and thus provides a powerful foundation to study the runtime behavior and interaction of computer architecture, operating systems and applications [1, 2, 3]. Since the entire execution environment in such a system is virtual, every operation carried out can be inspected easily.
A well-known limitation of full system simulation is the low execution speed of current simulators. Compared to hardware-assisted virtualization, functional simulation is orders of magnitude slower, which restricts simulation-based analyses to short-running workloads. Moreover, due to the high slowdown, simulated machines lose the ability to interact with non-simulated remote hosts. That further limits the types of workloads that can be examined with full system simulation.
Representative sampling [4] can reduce the run-time overhead by limiting complex analyses to short time frames that are representative of the analyzed workload. However, an initial functional simulation is still needed to identify such intervals, and the accuracy achievable with this technique depends heavily on sufficient phase behavior in the workload, which is not always present [5]. Moreover, in some scenarios (e.g., analysis of memory duplication) limiting the observation window is not an option. An acceleration technique that enables full-length analyses of long-running workloads is thus desirable.
SimuBoost strives to close the performance gap between virtualization and functional simulation through the use of scalable parallelization. The core idea is to run the workload in a virtual machine (VM), taking checkpoints at regular intervals. Because virtualization runs much faster than simulation, checkpoints become available far more quickly than a single simulator could consume them, so the spans between subsequent checkpoints can be simulated and analyzed simultaneously in one job per interval. By transferring jobs to multiple nodes, a parallelized and distributed simulation of the target workload can be achieved, thereby reducing the overall simulation time.
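The timing behavior of this scheme can be illustrated with a toy model. The following is a minimal sketch, not SimuBoost's actual scheduler: the function name `parallel_sim_time` and its parameters are hypothetical, and it assumes that a job becomes ready the moment its starting checkpoint is taken, that checkpointing and job transfer cost nothing, and that jobs are assigned greedily to the node that becomes free first.

```python
import heapq
import math


def parallel_sim_time(run_time, interval, slowdown, nodes):
    """Wall-clock time until the last interval has been simulated.

    run_time -- length of the workload when run in the VM (seconds)
    interval -- time between checkpoints (seconds)
    slowdown -- factor by which simulation is slower than virtualization
    nodes    -- number of simulation nodes available
    All names are illustrative assumptions, not part of SimuBoost.
    """
    n_intervals = math.ceil(run_time / interval)
    free_at = [0.0] * nodes            # time at which each node becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for i in range(n_intervals):
        ready = i * interval           # starting checkpoint exists from here on
        length = min(interval, run_time - ready)
        node_free = heapq.heappop(free_at)
        begin = max(ready, node_free)  # wait for both checkpoint and node
        end = begin + length * slowdown
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    serial = parallel_sim_time(100.0, 2.0, 100.0, 1)
    parallel = parallel_sim_time(100.0, 2.0, 100.0, 50)
    print(serial, parallel, serial / parallel)
```

Under these assumptions, a 100 s workload with 2 s intervals and a slowdown of 100 takes 10000 s on one node and 298 s on 50 nodes. The remaining 298 s consists of the 98 s the VM needs to produce the last checkpoint plus the 200 s needed to simulate the final interval, which shows why short intervals matter for the achievable speedup.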
Key challenges in SimuBoost are:
- Checkpointing: SimuBoost has to create checkpoints at short intervals (1 s to 2 s) to bootstrap parallel simulations. To achieve a high speedup, the downtime, that is, the time the VM has to be paused for each checkpoint, must be as short as possible. Moreover, the amount of data that needs to be stored and transferred to remote nodes should be kept low. We are evaluating a combination of copy-on-write (COW) and incremental, hash-based checkpointing approaches to fulfill these requirements. Previous work has already shown that downtimes as low as 100 ms are feasible [6].
- Functional Continuity: Full system simulators usually implement a deterministic execution model. Hardware-assisted virtualization, however, introduces non-deterministic behavior because devices work asynchronously to the CPU. As a consequence, non-deterministic events (e.g., interrupts) occur at different points in the virtualization and simulation stages. This leads to state deviation, which breaks continuity at interval boundaries in the simulation. SimuBoost thus needs to log non-deterministic events during virtualization and precisely replay them in the simulation [7, 8], keeping both stages synchronized.
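To make the checkpointing challenge more concrete, the following minimal sketch shows one way an incremental, hash-based checkpoint store could work. It is an illustration under stated assumptions, not SimuBoost's implementation: the class `CheckpointStore`, the 4 KiB page size, and the use of SHA-256 are all hypothetical choices, and the copy-on-write part that keeps the VM downtime short is not modeled.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


class CheckpointStore:
    """Content-addressed page store (hypothetical, for illustration only)."""

    def __init__(self):
        self.pages = {}  # page digest -> page contents

    def take(self, memory):
        """Record a checkpoint of `memory` (a bytes object).

        Returns the checkpoint (one digest per page) and the number of
        bytes that had to be newly stored for it.
        """
        checkpoint = []
        new_bytes = 0
        for offset in range(0, len(memory), PAGE_SIZE):
            page = memory[offset:offset + PAGE_SIZE]
            digest = hashlib.sha256(page).digest()
            if digest not in self.pages:   # only unseen contents are stored
                self.pages[digest] = page
                new_bytes += len(page)
            checkpoint.append(digest)
        return checkpoint, new_bytes

    def restore(self, checkpoint):
        """Rebuild the memory image that a checkpoint describes."""
        return b"".join(self.pages[digest] for digest in checkpoint)


if __name__ == "__main__":
    store = CheckpointStore()
    memory = bytearray(4 * PAGE_SIZE)      # four zero-filled pages
    first, stored_first = store.take(bytes(memory))
    memory[PAGE_SIZE] = 1                  # dirty the second page
    second, stored_second = store.take(bytes(memory))
    print(stored_first, stored_second)
```

In this sketch, each checkpoint only adds pages whose contents have not been seen before, so a checkpoint taken shortly after the previous one stores roughly the pages modified in between. The same digests could also tell a remote node which pages it already holds, which would reduce the data that has to be transferred.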
We are currently working on the implementation. To estimate the speedup achievable with SimuBoost and to give a first assessment of the practical feasibility of our approach, we developed a formal model that describes its speedup and scalability characteristics. According to the model, in a scenario with realistic parameters SimuBoost can accelerate conventional simulation with a slowdown of 100x by a factor of 84, while delivering a parallelization efficiency of 94%.
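The reported figures can be put into perspective with a short calculation. The sketch below assumes the standard definition of parallel efficiency (speedup divided by the number of nodes) and that the 100x slowdown is measured relative to virtualized execution. The text states neither explicitly, so the derived values are illustrative only, and the variable names are hypothetical.

```python
# Figures quoted from the text above.
slowdown = 100.0    # slowdown of conventional simulation
speedup = 84.0      # speedup predicted by the model
efficiency = 0.94   # predicted parallelization efficiency

# Assumption: efficiency = speedup / nodes (standard definition).
implied_nodes = speedup / efficiency

# Assumption: the slowdown is relative to virtualized execution.
residual_slowdown = slowdown / speedup

print(f"implied number of nodes: {implied_nodes:.1f}")
print(f"residual slowdown: {residual_slowdown:.2f}x")
```

Under these assumptions, the figures correspond to roughly 89 simulation nodes and leave the parallel simulation about 1.19 times slower than the virtualized run.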
Contact: Dr.-Ing. Marc Rittinghaus
[1] L. Albertsson et al. Using complete system simulation for temporal debugging of general purpose operating systems and workloads. MASCOT, 2000.
[2] M. Rosenblum et al. Using the SimOS machine simulator to study complex computer systems. TOMACS, 1997.
[3] C. Won et al. A detailed performance analysis of UDP/IP, TCP/IP, and M-VIA network protocols using Linux/SimOS. High Speed Networks, 2004.
[4] T. Sherwood et al. Automatically characterizing large scale program behavior. Volume 30. ACM, 2002.
[5] V. Weaver et al. Using dynamic binary instrumentation to generate multi-platform SimPoints: Methodology and accuracy. HiPEAC, 2008.
[6] M. Sun et al. Fast, lightweight virtual machine checkpointing. 2010.
[7] G. Dunlap et al. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. SIGOPS, 2002.
[8] M. Sheldon et al. ReTrace: Collecting execution trace with virtual machine deterministic replay. MoBS, 2007.