Analysis of checkpointing algorithms for primary-backup replication

2024-11-09
2017
978-1-5386-1629-1
1530-1346
10.1109/ISCC.2017.8024506
2-s2.0-85030546778
https://doi.org/10.1109/ISCC.2017.8024506
https://hdl.handle.net/20.500.14288/6487

Replication is useful for supporting fault-tolerant, reliable, and recovery-oriented distributed systems. Popular application areas include databases, P2P systems, web services, and the Internet of Things. In this study, we propose using checkpointing to improve the efficiency of the well-known primary-backup replication protocol in distributed systems. We developed a software framework based on an in-memory replicated key-value store to evaluate various checkpointing algorithms. Using the framework over geographically distributed nodes of the PlanetLab platform, we performed extensive experiments and analysis with several metrics, including blocking time, checkpointing time, checkpoint size, and recovery time. The experimental scenarios use the well-known benchmarking tool YCSB to issue realistic read/update queries through exemplary workloads. Our findings indicate that incremental checkpointing combined with periodic usage is the most efficient approach, with up to 30 times higher system throughput and a 50% decrease in average blocking time compared to traditional primary-backup replication and the other checkpointing algorithms.

eng
N/A
Computer science
Information systems
Engineering
Electrical electronic engineering
Telecommunications
Conference Proceeding
426895800012
To be reviewed
N/A
3260
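
The abstract names incremental, periodic checkpointing over an in-memory replicated key-value store but does not describe the mechanism. The following is a minimal illustrative sketch of that general idea, not the authors' framework: the class names `Primary` and `Backup`, the `checkpoint_every` parameter, and the use of `None` as a deletion marker are all hypothetical, and the "period" is approximated here by an update counter rather than a timer. Networking, failure detection, and recovery are omitted.

```python
from typing import Dict, Optional, Set


class Backup:
    """Hypothetical backup replica that applies incremental checkpoints."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply(self, delta: Dict[str, Optional[str]]) -> None:
        # A value of None marks a key that was deleted on the primary.
        for key, value in delta.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value


class Primary:
    """Hypothetical primary: an in-memory key-value store that remembers
    which keys changed since the last checkpoint."""

    def __init__(self, checkpoint_every: int = 3) -> None:
        self.data: Dict[str, str] = {}
        self.dirty: Set[str] = set()          # keys changed since last checkpoint
        self.checkpoint_every = checkpoint_every  # assumed period, in updates
        self.updates_since_checkpoint = 0

    def put(self, key: str, value: str, backup: Backup) -> None:
        self.data[key] = value
        self.dirty.add(key)
        self._maybe_checkpoint(backup)

    def delete(self, key: str, backup: Backup) -> None:
        self.data.pop(key, None)
        self.dirty.add(key)
        self._maybe_checkpoint(backup)

    def take_incremental_checkpoint(self) -> Dict[str, Optional[str]]:
        # Only the changed keys are included, not the whole store.
        delta = {key: self.data.get(key) for key in self.dirty}
        self.dirty.clear()
        self.updates_since_checkpoint = 0
        return delta

    def _maybe_checkpoint(self, backup: Backup) -> None:
        # Periodic trigger: ship a delta once enough updates have accumulated.
        self.updates_since_checkpoint += 1
        if self.updates_since_checkpoint >= self.checkpoint_every:
            backup.apply(self.take_incremental_checkpoint())
```

In this sketch the backup lags the primary by at most `checkpoint_every - 1` updates, and each checkpoint carries only the keys touched since the previous one, which is the property that keeps checkpoint size small relative to copying the full store.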