Wednesday, 21 July 2010

What Happens When a Forefront TMG Array Manager Fails?

Forefront TMG Enterprise Edition introduced a new type of array called the Standalone Array. This technology is based upon the old locally installed Configuration Storage Server (CSS) model and does not require a dedicated management server. Unlike the older model, there is now the concept of an Array Manager server, and the other array members are termed Array Managed servers. Only one array manager can exist within an array, and it essentially becomes the master configuration owner.

A common question (as people asked with ISA Server 2006 EE and the CSS role) is: what happens when the array manager fails or is offline for some other reason?

With the array manager offline:

  • The remaining Forefront TMG array managed servers will continue to operate and provide full firewall functionality using a locally cached version of the Forefront TMG configuration/policy.
  • The remaining Forefront TMG array managed servers will not enter Lockdown Mode due to the lack of an array manager and should operate normally.
  • You will not be able to access the Forefront TMG configuration or connect to it using the Forefront TMG Management console. Consequently you cannot make changes to the firewall policy or monitor the Forefront TMG environment, amongst other administrative tasks provided by the console.
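If you want to confirm that the array manager really is unreachable from a managed server, a simple TCP connect probe is one option. The following is a minimal Python sketch and not part of the documented procedure. The port is an assumption: TCP 2172 is commonly listed for secure configuration storage traffic in workgroup deployments (and TCP 2171 for domain deployments), so check which one applies in your environment. The function name is purely illustrative.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (host name from the example environment, port is an assumption):
#   is_reachable("TMG03.dmz.com", 2172)
```

A successful connection only shows that something is listening on that port. It does not prove that the configuration storage service behind it is healthy.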

When the array manager comes back online:

  • The remaining Forefront TMG array members will synchronise with the array manager to update their locally cached configuration.
  • You will be able to access the Forefront TMG configuration and connect to it using the Forefront TMG Management console from the array manager or the array managed servers.

If the array manager cannot come back online, or is going to be offline for a long period of time, it is probably unacceptable to lose access to the Forefront TMG configuration or to be unable to connect to it using the Forefront TMG Management console. In this scenario, the recommended option is to designate an existing, fully functional array managed server as the new array manager. This is achieved using the Set as Array Manager link on the Tasks tab of the Forefront TMG Management console.

This process is documented here and appears to be a simple task. However, the documentation does not set the administrator's expectations for actually performing the procedure, on the following counts:

  • In reality, the process involves some considerable time delays where it is easy to think that the process has stalled or hung.
  • It does not cater for the fact that you may be using a workgroup deployment which requires the administrator to assign SSL server certificates to the array manager as part of the process.
  • The array name remains unchanged and will show the computer name of the previous array manager, not the computer name of the new array manager.
  • It does not explain what happens if you start the original array manager after you have designated a new one, or if you want to bring the original array manager back online as an array member.

So, I thought it would be useful to document this process with a real-world slant with a basic walkthrough…

The example environment I will use is the same as that used for my Workgroup Deployment with Forefront TMG Enterprise Edition series of articles, namely:

image

Where TMG03 is the array manager server and TMG04 is the array managed server. As this environment is based upon a standalone array for a workgroup deployment, this also allows us to see the additional steps required for this scenario (which is handy).

So, assuming that TMG03 has failed or is taken offline, we will need to designate TMG04 as the new array manager as shown below.

Starting the Forefront TMG Management console on TMG04 you will notice that you receive the following error as the existing array manager (TMG03) is not operational:

2010-07-15_1147

After clicking Continue, the console will take a long time to load. In my testing, it could take up to 3 or 4 minutes before the console was accessible. For many people this is an unexpectedly long delay, and it would be easy to assume that the process has stalled or hung. However, please be patient!

2010-07-15_1148

 2010-07-15_1153

Once the console has loaded, there should be an option under the Tasks tab for Set as Array Manager as shown below:

 2010-07-15_1152

If you are using a workgroup deployment (as per our example environment) you will see the following prompt:

2010-07-15_1154

In order to designate this server as the new array manager, you will need to have created an SSL server certificate that can be used for workgroup authentication. In our case, the certificate will need a common name of TMG04.dmz.com as discussed in previous workgroup articles. The above prompt assumes that this certificate is available in an exported PFX file format with an associated password. Once you have selected the appropriate PFX file and entered the correct password, the Setting this server as the array manager… process will begin:

2010-07-15_1154_001 
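Before starting the promotion, it can be worth double-checking that the certificate you are about to import carries the expected common name. The following is a minimal Python sketch under stated assumptions: it works on a subject string you have already read from the certificate (for example from the certificate's Details tab), and the helper names `common_name` and `matches_server` are hypothetical. It does not open or validate the PFX file itself.

```python
from typing import Optional


def common_name(subject: str) -> Optional[str]:
    """Extract the CN value from a subject string such as 'CN=TMG04.dmz.com, O=Example'.

    Naive parser: splits on commas, so it does not handle escaped commas in values.
    """
    for part in subject.split(","):
        key, _, value = part.strip().partition("=")
        if key.strip().upper() == "CN":
            return value.strip()
    return None


def matches_server(subject: str, host: str, dns_suffix: str) -> bool:
    """Compare the subject CN with host.dns_suffix, ignoring case."""
    cn = common_name(subject)
    return cn is not None and cn.lower() == f"{host}.{dns_suffix}".lower()


# Example usage with the names from the example environment:
#   matches_server("CN=TMG04.dmz.com", "TMG04", "dmz.com")
```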

Please Note: If you receive an error that the ISASTGCTRL service cannot be started, it may be necessary to reconfigure this service to use a startup type of Automatic as opposed to the default startup type of Disabled.
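The startup type can be changed in the Services console, but it can also be checked and set from the command line with the built-in Windows `sc.exe` tool. The following Python sketch only builds the `sc.exe` command lines and parses the `START_TYPE` line from `sc qc` output; the function names are illustrative, and nothing is executed unless you pass the commands to `subprocess.run` yourself on the TMG server, from an elevated prompt.

```python
import re
from typing import List, Optional

SERVICE = "ISASTGCTRL"  # service name as given in the note above


def query_command(service: str = SERVICE) -> List[str]:
    """Command line for 'sc qc <service>', which prints the service configuration."""
    return ["sc.exe", "qc", service]


def set_automatic_command(service: str = SERVICE) -> List[str]:
    """Command line for 'sc config <service> start= auto'."""
    # sc.exe expects 'start=' and its value as separate tokens.
    return ["sc.exe", "config", service, "start=", "auto"]


def parse_start_type(sc_qc_output: str) -> Optional[str]:
    """Return the START_TYPE name (e.g. 'DISABLED') from 'sc qc' output, if present."""
    match = re.search(r"START_TYPE\s*:\s*\d+\s+(\w+)", sc_qc_output)
    return match.group(1) if match else None


# Example usage on the server itself (not run here):
#   import subprocess
#   out = subprocess.run(query_command(), capture_output=True, text=True).stdout
#   if parse_start_type(out) == "DISABLED":
#       subprocess.run(set_automatic_command(), check=True)
```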

Once completed, you should see the console connected to the local Forefront TMG configuration storage server, which contains all previous configuration and settings; TMG04 is now the new array manager:

2010-07-15_1335

2010-07-15_1336_001 

A quick look around the console should confirm that TMG04 is now the array manager and the Forefront TMG configuration is synchronised:

2010-07-15_1337

2010-07-15_1409

2010-07-15_1335_001

If you have an array with more than two array members, it will now be necessary to configure the remaining array members to use the new array manager with the Change Array Manager option available on the Tasks tab of the Forefront TMG Management console.


Once you have designated TMG04 as the array manager, TMG03 will no longer be able to participate in the array if it is brought back online: Forefront TMG will need to be uninstalled, reinstalled and then rejoined to the array now managed by TMG04. Before attempting to rejoin the array, it will also be necessary to delete the existing TMG03 server object from the Servers tab of the System node (on TMG04) and to remove the TMG03 entry from the Managed Server Computers computer set (on TMG04).

Hope this helps!

17 comments:

  1. Hello.
    Fine article, but I found it after we had repaired our TMG array :)

  2. Sorry, at least you know now! :)

  3. Thanks for sharing your knowledge.

    I have a similar problem: a TMG server can't join the array managed by EMS. The EMS can monitor that server but can't deploy enterprise policy. I get exactly the same error message as the one shown in your article. I have been fighting with this for a long time. So far no luck :/

  4. Hello Jason,

    Thanks for posting. I read this too late and had already deployed a new array :-( I lost a lot of time troubleshooting.

    nice greetings
    from Germany

  5. Hello Jason,

    I've been looking for information like this, but on how to recover from a failed Offline UAG Array Manager. Do you have any information on this?

    Thanks,

  6. Have you seen this?

    http://technet.microsoft.com/en-us/library/dd857288.aspx

    Don't forget that the recovery for UAG is going to be pretty similar as UAG simply uses the TMG array topology and services.

    Cheers

    JJ

    Replies
    1. Great information. Thanks. I have an existing TMG Standard installation that has been up for a few years and in production. When I installed TMG Enterprise in the remote datacenter, it automatically assigned itself as "Array Manager" even though the old TMG Standard does not indicate that it is a member of an Array.
      Under "Systems" in the new TMG Enterprise, it shows the production TMG Standard as "managed" but it is red, as if non-existent or something. Can I safely delete the TMG Standard from the apparently non-functional array without causing any problems for the production TMG Standard? Hope this makes sense. Thanks

    2. Great blog. I am in the midst of documenting the DR strategy for our environment. I've read that for reporting to function properly after changing the array manager you need to update the report server [Found under Logs & Reports / Reporting tab /Configure Reporting Settings / Report Server tab]. Is this correct?

    3. Hi Kate,

      If you are going to recover the failed server, then you don't *have* to remove the reporting server assignment. However, if you plan on running without the existing server for a while, it may make sense to move the reporting role.

      Moving the role as part of the recovery exercise makes sense, but I don't believe it has to be done if you are recovering the failed node within a short timeframe. To my knowledge, the reporting server and the array manager don't have to be the same server.

      Cheers

      JJ

  7. Great stuff... I'm trying to implement the same thing but am facing lots of issues.

    When promoting a managed server to array manager, I get this error:

    Error Code: 0x80070422

    Message: The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.

    Please advise if you have come across this error.

    Replies
    1. I think you need to configure the ISASTGCTRL service to be set to Automatic, not Disabled before carrying out the promotion action. Not sure why it doesn't do this itself though :\

  8. Jason,

    Can I use this post in case I have EMS crashed? Will it work in the same way?
    thanks
    Uilson

    Replies
    1. This is probably a better resource: http://technet.microsoft.com/en-us/library/ee388575.aspx

  9. If I have 3 servers in array. After change array manager before add second server to the array I must before delete server object from System node like to do it for failed array manager or I can just change array manager for this server with wizard?

    Replies
    1. Just to give you best answer, can you please rephrase that as I am not 100% sure what you are asking...sorry

  10. hi all
    I need urgent help, please. I have three TMG servers in an array: two physical servers are array members and the array manager is a virtual machine. The problem is that when I try to access one of the array members it gives me "refresh failed"; I cannot make any changes, and not all of the options on the left are there. I accessed the array manager and tried to open Forefront TMG Management, but there is nothing there when it opens: only "Microsoft Forefront Threat Management Gateway" on the left, and on the right the only option is "Connect". When I try to connect, I get the error message "Forefront TMG Management was unable to connect to configuration storage server", and when I click on the details it says "the server is not operational". The rules and services are up and running, but I cannot change rules, cannot see logs and reports, and cannot even export the configuration for backup. What I understood is that the managed servers can operate using the cached or local configuration storage.

    The question is how I can solve this. If I restart the array manager, will it affect the other managed servers? Will we go down? We are a financial company handling credit cards, so this is very critical and we will have financial losses if the services go down. Can I re-assign one of the servers as the array manager? I read that we can, but the option does not appear under Tasks, and not all of the configuration options are there. I will install TMG on a new server now, move all the rules to that server and use it, but can anyone tell me a workaround, or whether restarting the array manager will affect the operation of the other two servers?
    thanks Alot
