SQL Server AG Won’t Become Primary After Force Quorum? Here’s Why and How to Fix It

Overview

If you’ve encountered a situation where none of your SQL Server Always On Availability Group (AG) replicas will become PRIMARY after a cluster failure, you’re not alone. We recently had a customer with this exact scenario (an AG that won’t become primary after forcing quorum). It is both uncommon and difficult to troubleshoot, so I thought it would be worth posting about. In this post, we’ll cover:

  • What causes this issue
  • Why all replicas can get stuck in SECONDARY state
  • How to resolve the issue safely

The Scenario

You have a two-node Always On AG spread across two subnets. Then a network failure or maintenance window makes one of the two subnets unavailable:

  • The Windows Server Failover Cluster (WSFC) loses quorum
  • The second node is unreachable or has been evicted
  • In Failover Cluster Manager, the AG Role is in a Failed state and will not come online.
  • In SQL Server Management Studio:
    • Both replicas show as SECONDARY
    • SQL Server refuses to allow any failover with the following error:

“The availability replica on this instance cannot become the primary replica because the WSFC cluster was started in force quorum.”


What’s Actually Happening?

This is SQL Server protecting you from a split-brain scenario. When you force the cluster online to work around a loss of quorum (forced quorum), SQL Server no longer trusts that it is safe to assign a PRIMARY replica.

So it intentionally refuses to promote any replica to PRIMARY, even if the cluster looks healthy on the surface. In effect, the AG sits in a passive state designed to prevent split-brain: SQL Server doesn’t know which node the cluster will choose when (or if) it brings the AG role online, so it won’t commit to either one. In this state, all the databases show as Not Synchronizing, and you can’t even remove them from the AG, because that has to be done on the PRIMARY replica.
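A minimal sketch for confirming you are in this state, assuming you can connect to either replica with VIEW SERVER STATE permission. `sys.dm_hadr_cluster` reports the quorum state SQL Server sees (expect FORCED_QUORUM here), and the replica-state DMV shows the role of the local replica.

```sql
-- Quorum state as seen by SQL Server; FORCED_QUORUM means the WSFC was force-started
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Role of this instance's replica in each availability group it hosts
SELECT ag.name AS ag_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_groups AS ag
  ON ag.group_id = ars.group_id
WHERE ars.is_local = 1;
```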

Resolution

Step 1: Fix the Cluster

In the scenario we encountered, the root cause was a DNS issue affecting both the cluster IP address and the listener IP address on the secondary subnet, which caused those IP address resources to show as Failed instead of Offline. Fixing the DNS issue and restarting the cluster resolved this.

This clears the “forced” quorum state and allows SQL Server to trust the cluster again.
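Once the cluster is back, a quick check from inside SQL Server can confirm it before you move on. This is a sketch assuming the same DMV access as above; the member types and vote counts you see will depend on your cluster’s quorum configuration.

```sql
-- Expect quorum_state_desc = NORMAL_QUORUM and every node reporting UP
SELECT c.cluster_name,
       c.quorum_state_desc,
       m.member_name,
       m.member_type_desc,
       m.member_state_desc,
       m.number_of_quorum_votes
FROM sys.dm_hadr_cluster AS c
CROSS JOIN sys.dm_hadr_cluster_members AS m
ORDER BY m.member_name;
```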

Step 2: Restart SQL Server

Restarting the SQL Server service forces a recheck of the AG and WSFC state, but it will not fix the AG by itself. No number of reboots, cluster restarts, or service restarts will do that. Remember, SQL Server is protecting itself.

Step 3: Force Failover Allow Data Loss

`FORCE_FAILOVER_ALLOW_DATA_LOSS` is a special option that should only be used in cases like this. Although “allow data loss” sounds scary, in this case it is quite safe: both replicas are secondaries, so no data has changed since the failure.
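Before forcing the failover, it can be worth verifying that no data has changed rather than assuming it. A minimal sketch, to be run on each replica with the results compared side by side: if `last_hardened_lsn` matches for every database, neither replica holds log that the other lacks. The DMV and columns are standard; using them this way is a suggested extra check, not part of the original troubleshooting.

```sql
-- Run on EACH replica, then compare the output between replicas.
-- Matching last_hardened_lsn values suggest no replica is ahead of the other.
SELECT ag.name                  AS ag_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.last_hardened_lsn,
       drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
  ON ag.group_id = drs.group_id
WHERE drs.is_local = 1
ORDER BY ag_name, database_name;
```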

The forced failover command should be run on the secondary that you want to become primary, ideally the same server that was primary before the cluster failure.

ALTER AVAILABILITY GROUP [YourAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

This command promotes the current replica to PRIMARY even if it wasn’t fully synchronized. It essentially tells SQL Server to skip its usual safety checks and trust you that this replica should be the primary.
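If you script this step, a guarded version reduces the chance of running it on the wrong instance. A minimal sketch, assuming the placeholder AG name `YourAG` from the command above (replace it with your own in both places): it only issues the forced failover when this instance hosts a replica of that AG and that replica is not already PRIMARY.

```sql
-- YourAG is a placeholder; substitute your availability group name in both places.
IF EXISTS (
    SELECT 1
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_groups AS ag
      ON ag.group_id = ars.group_id
    WHERE ag.name = N'YourAG'
      AND ars.is_local = 1
      AND ars.role_desc <> N'PRIMARY'
)
BEGIN
    -- Promote the local replica; run only on the replica you want as PRIMARY
    ALTER AVAILABILITY GROUP [YourAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
END
ELSE
BEGIN
    PRINT N'No local non-primary replica of YourAG found; forced failover skipped.';
END;
```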


🔍 Post-Failover Cleanup

After promotion, check the state of your AG:


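-- replica_server_name lives in sys.availability_replicas, so join on replica_id;
-- this DMV reports synchronization health (per-database sync state is in
-- sys.dm_hadr_database_replica_states)
SELECT
  ar.replica_server_name,
  ars.role_desc,
  ars.synchronization_health_desc,
  ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = ars.replica_id;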

After a forced failover, data movement to the secondary databases is suspended, so you will need to resume it to get the secondary back in sync. In worst-case scenarios, you may have to reseed the databases on the secondary.
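Data movement is resumed per database with `ALTER DATABASE ... SET HADR RESUME`, run on the secondary replica. A minimal sketch that generates those statements instead of executing them, so you can review each one first; it assumes you are connected to the secondary and only targets local AG databases that are currently suspended.

```sql
-- Run on the SECONDARY replica. Produces one RESUME statement per suspended AG database.
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.suspend_reason_desc,
       N'ALTER DATABASE ' + QUOTENAME(DB_NAME(drs.database_id))
         + N' SET HADR RESUME;' AS resume_command
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1
  AND drs.is_suspended = 1
ORDER BY database_name;
```

Copy each `resume_command` value into a query window on the secondary and execute it, then recheck the synchronization state.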


Microsoft Docs Reference

For more details, see Microsoft’s official documentation: Perform a Forced Manual Failover of an Availability Group (Microsoft Docs)


Lessons Learned

  • SQL Server is cautious with failovers, and that’s a good thing
  • Forced quorum is a tool, not a fix: clear it as soon as possible
  • AGs won’t elect a PRIMARY without trust in WSFC’s integrity
  • Document and test your disaster recovery plan in lower environments

Final Word

The `FORCE_FAILOVER_ALLOW_DATA_LOSS` command can feel scary, but when used correctly, as in this scenario where both nodes thought they were SECONDARY, it’s the right call.

Knowing how to handle these edge cases will keep your high availability setup truly available, even in the worst-case scenarios.
