In a 2-node HA cluster without a quorum server, neither node (partition) holds more than 50% of the votes after a split-brain. We therefore set two_node: 1 in corosync.conf so that both nodes (partitions) are granted "quorum". The downside is a potential fencing match: both nodes may try to fence each other at the same time, which can end in double fencing.

The current solution is to prevent double fencing with random or static fencing delays, configured through the pcmk_delay_max and pcmk_delay_base parameters of the stonith resources.

Still, an interesting further improvement would be to make sure that the more significant node, that is, the one hosting the more significant resources or (promoted) instances, wins the fencing match.

A good idea is:

  • Reuse the existing priority meta-attribute of resources. Users set priority on the resources they consider significant. When fencing is needed, the cluster scheduler calculates each node's priority by summing the priorities of the resources/instances hosted on it. A promoted instance counts slightly higher (+1) than its base priority in this calculation, so the instances of a promotable clone resource rank as "Promoted > Started > Stopped".

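  • A cluster-wide priority-fencing-delay is applied to the node with the highest total resource priority, meaning that fencing targeting this node is delayed so that it gets the chance to fence its peer first. The delay is added as a parameter of the fencing action in the cluster transition graph and passed to fenced by controld.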
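
The two points above can be illustrated with a small Python sketch. It is only a model of the idea, not the scheduler's actual code: the Instance type, the function names and the tie handling (no delay when no single node has the highest priority) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A resource instance active on a node (hypothetical model for this sketch)."""
    node: str
    priority: int            # value of the resource's priority meta-attribute
    promoted: bool = False   # True for the promoted instance of a promotable clone


def node_priorities(instances):
    """Sum the priorities of the instances hosted on each node.

    A promoted instance counts +1 on top of its base priority.
    """
    totals = {}
    for inst in instances:
        bonus = 1 if inst.promoted else 0
        totals[inst.node] = totals.get(inst.node, 0) + inst.priority + bonus
    return totals


def fencing_delays(instances, nodes, priority_fencing_delay):
    """Return the delay (in seconds) for fencing actions targeting each node.

    Only the node with the highest total priority gets the delay.
    Assumption of this sketch: on a tie, nobody gets a delay.
    """
    totals = node_priorities(instances)
    scores = {node: totals.get(node, 0) for node in nodes}
    top = max(scores.values())
    winners = [node for node in nodes if scores[node] == top]
    if len(winners) != 1:
        return {node: 0 for node in nodes}
    return {
        node: priority_fencing_delay if node == winners[0] else 0
        for node in nodes
    }


if __name__ == "__main__":
    running = [
        Instance("node1", priority=10, promoted=True),   # promoted instance
        Instance("node2", priority=10),                  # unpromoted instance
    ]
    print(node_priorities(running))
    print(fencing_delays(running, ["node1", "node2"], priority_fencing_delay=15))
```

Here node1 hosts the promoted instance, so its total is one higher than node2's and only fencing that targets node1 is delayed. During a split-brain this gives node1 a head start to fence node2.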

More details can be found at: https://confluence.suse.com/pages/viewpage.action?spaceKey=hateam&title=More+significant+node+wins+fencing+match+under+2-node+split-brain


Comments

  • yan_gao
    11 months ago by yan_gao

    Worked out a prototype for it: https://github.com/gao-yan/pacemaker/commit/e55e71b1aa3227b87b0550e314c1fc1f51b839e2
