Case Study:

Proxmox VE ZFS Cluster with Remote DR – Availability, Security, and Simplicity


Cost-optimized solution with DR at a secondary site

For a customer with strict availability and data protection requirements, we designed and deployed a solution based on Proxmox Virtual Environment (PVE) with ZFS storage and Proxmox Backup Server (PBS). The goal was a reliable virtualization platform with a tested disaster recovery path, even in the event of a complete site failure.

This case study is published with a delay: the solution, including the disaster recovery scenario, was first validated in real-world operation.

Solution Architecture

In the primary location, a 3-node Proxmox cluster runs with ZFS storage for virtual machines and containers. A local Proxmox Backup Server performs daily backups of all data. These backups are synchronized nightly to a secondary PBS instance at a remote site. In case of disaster, virtual machines can be restored from the remote PBS to a Proxmox server kept ready in the secondary location.
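
The two protection layers described above give very different worst-case data loss. The following is a minimal sketch of that reasoning, not a measurement from the deployment: the intervals come from the case study, while the transfer lags and the names `ProtectionLayer` and `worst_case_rpo` are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionLayer:
    name: str
    interval: timedelta      # how often a new copy is taken
    transfer_lag: timedelta  # ASSUMED delay until that copy is usable at the target


def worst_case_rpo(layer: ProtectionLayer) -> timedelta:
    """Worst-case data loss: the failure hits just before the next copy lands."""
    return layer.interval + layer.transfer_lag


layers = [
    # Intervals are from the case study; the lags are hypothetical placeholders.
    ProtectionLayer("node failure (ZFS replication)",
                    timedelta(minutes=1), timedelta(seconds=30)),
    ProtectionLayer("site failure (remote PBS sync)",
                    timedelta(hours=24), timedelta(hours=4)),
]

for layer in layers:
    print(f"{layer.name}: up to {worst_case_rpo(layer)} of data at risk")
```

Under these assumptions, a node failure puts minutes of data at risk, while a full site failure puts roughly a day at risk. That is the trade-off accepted in exchange for a low-bandwidth link to the remote site.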

Key Components

  • 3× PVE nodes (Proxmox VE 8.x), each with ZFS RAID-Z2

  • ZFS replication every minute from the primary node to the others (asymmetrical)

  • HA Cluster: automatic VM failover in case of node outage

  • Local PBS: deduplicated backups every evening

  • Remote PBS: nightly sync over low-bandwidth connection

  • Disaster recovery ready via remote PVE host
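
Automatic failover in a cluster depends on the surviving nodes still holding a majority of votes. As a rough sketch of why three nodes is the practical minimum for HA, assuming one vote per node and no external quorum device (the function names are illustrative, not part of any Proxmox API):

```python
def quorum_votes(total_nodes: int) -> int:
    """Votes needed for a strict majority, assuming one vote per node."""
    if total_nodes < 1:
        raise ValueError("cluster needs at least one node")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """Nodes that can fail while the remaining ones still hold a majority."""
    return total_nodes - quorum_votes(total_nodes)


for nodes in (2, 3, 5):
    print(f"{nodes} nodes: quorum={quorum_votes(nodes)}, "
          f"tolerated failures={tolerated_failures(nodes)}")
```

With three nodes, the cluster keeps quorum after losing one node. A two-node cluster under the same assumptions tolerates none.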

Infrastructure Diagram


Backup and Replication Strategy

Benefits

  • High availability via Proxmox cluster and minute-level ZFS replication

  • Quick VM recovery on local node failure, with data loss limited to changes since the last replication (about one minute)

  • Fully automated backup with remote off-site copy

  • Disaster recovery strategy with ready-to-restore environment

  • Cost-effective – no need for shared storage or high-speed WAN links
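
Whether a slow WAN link is sufficient depends on how much new backup data must cross it each night. The sketch below is a back-of-the-envelope estimate, assuming that only chunks missing on the remote PBS are transferred. The daily change volume, link speed, and efficiency factor are hypothetical inputs, not figures from this deployment.

```python
def sync_hours(changed_gib: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Hours needed to ship `changed_gib` GiB of new chunks over the link.

    `efficiency` is an assumed fraction of nominal bandwidth actually achieved.
    """
    if changed_gib < 0 or link_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = changed_gib * 1024**3 * 8
    seconds = bits / (link_mbit_s * 1_000_000 * efficiency)
    return seconds / 3600


def fits_window(changed_gib: float, link_mbit_s: float, window_hours: float) -> bool:
    """True if the nightly sync is expected to finish within the window."""
    return sync_hours(changed_gib, link_mbit_s) <= window_hours


# Hypothetical example: 20 GiB of new chunks per day over a 50 Mbit/s link.
hours = sync_hours(20, 50)
print(f"estimated sync time: {hours:.2f} h, fits 6 h window: {fits_window(20, 50, 6)}")
```

If the estimate approaches the length of the nightly window, the options are to reduce the daily change volume or to accept a longer gap between off-site copies.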

Summary

This case study shows what an open-source Proxmox + ZFS stack can deliver. With carefully planned replication and backups, we achieved high availability, rapid recovery, and disaster resilience, even with limited connectivity to the remote site.
