What are the differences between Inbound and Global Geolink modes for geographic load balancing?
The short answer is that Global Geolink geographic balancing requires two VFIs, whereas Inbound Geolink geographic balancing needs only one.
A more detailed answer: Inbound Geolink geographic balancing only balances incoming traffic for your hosted services and relies on your existing infrastructure to replicate content between your sites. In short, the Link LBs direct client queries to the right place, and you have to ensure that the servers are in sync at every location.
Global Geolink geographic balancing, on the other hand, can balance both incoming and outgoing traffic. Even if one site's Internet access is completely down, we can redirect both incoming and outgoing traffic for that site through other locations by rerouting it over the WAN links with NATTP.
How is Geolink information exchanged between sites?
Geolink information is exchanged via the management ports, and each Geolink can be configured with multiple destination IPs for reaching the remote Elfiq management port. For example, the first IP could use the private WAN while an alternate (backup) Geolink IP uses a site-to-site VPN, so that Geolink stays up if one path fails. If Geolink becomes unavailable, each Elfiq falls back to its default IDNS resolution behavior for each resource record.
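The failover between destination IPs can be pictured as a "first reachable destination wins" search over the configured list. The sketch below is illustrative only: it assumes destinations are tried in index order (matching the Dest IP0 to Dest IP3 slots of the "sh geolink" output), and `select_geolink_destination` and the `is_reachable` probe are hypothetical names, not part of the Elfiq CLI or API.

```python
from typing import Callable, Optional, Sequence


def select_geolink_destination(
    dest_ips: Sequence[Optional[str]],
    is_reachable: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the first reachable destination IP, or None.

    Assumption: destinations are tried in order (IP0 first, e.g. the
    private WAN, then IP1, e.g. a site-to-site VPN backup).
    """
    for index, ip in enumerate(dest_ips):
        if ip is None:  # slot shown as "not defined" in the CLI output
            continue
        if is_reachable(ip):
            return index
    # No path left: Geolink is down and each unit uses its default
    # IDNS resolution behavior for each resource record.
    return None
```

With `["10.0.0.2", "172.16.0.2", None, None]` and only the VPN address reachable, the helper returns index 1, the backup path.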
How are DNS records modified for geographic incoming load balancing?
You simply have to change the A record to an NS record for every service you need to load balance in geographic mode. Instead of pointing to one IDNS interceptor per link at a single site, the NS records point to one interceptor IP per link for each geographic site that has an Elfiq. So if you have 2 sites with 3 links per site, you have 6 IDNS interceptors (3 on each Elfiq). Because the Elfiq Link LBs exchange their metrics and IDNS resources via the Geolink, each Elfiq has a geographic view of all 6 links and can process DNS requests for either site. This is the core of incoming geographic load balancing.
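As a rough illustration of the 2-site, 3-link example, the sketch below generates BIND-style zone-file lines that delegate one service name to six NS records (one per interceptor) plus an A record for each interceptor host. The host names, the `example.com` zone, the IP addresses (taken from documentation ranges) and the `geo_ns_records` helper are all hypothetical; the exact record layout depends on your DNS provider and your Elfiq IDNS configuration.

```python
from typing import Dict, List


def geo_ns_records(service: str, interceptors: Dict[str, List[str]]) -> List[str]:
    """Build zone-file lines delegating `service` to every IDNS interceptor.

    `interceptors` maps a site name to the interceptor IPs of its links
    (one interceptor IP per link). Host naming is a hypothetical scheme.
    """
    ns_lines: List[str] = []
    glue_lines: List[str] = []
    for site, ips in interceptors.items():
        for link_no, ip in enumerate(ips, start=1):
            host = f"idns-{site}-link{link_no}.example.com."
            # NS record replacing the former A record for the service
            ns_lines.append(f"{service}\tIN\tNS\t{host}")
            # Address record so resolvers can reach the interceptor
            glue_lines.append(f"{host}\tIN\tA\t{ip}")
    return ns_lines + glue_lines


if __name__ == "__main__":
    sites = {
        "site1": ["192.0.2.10", "198.51.100.10", "203.0.113.10"],
        "site2": ["192.0.2.20", "198.51.100.20", "203.0.113.20"],
    }
    print("\n".join(geo_ns_records("www", sites)))
```

Each site contributes one NS record per link, so the record count grows as sites × links (6 here).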
Global Geolink geographic mode allows redirection/encapsulation of traffic between sites using private links.
On top of geographic load balancing, you can use the Internet Service Verificators (ISV) service to poll the availability of your local services. If a service becomes unavailable, the Elfiq removes it from the IP addresses it can return via IDNS. The alternate site is also notified via Geolink.
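Conceptually, an ISV failure shrinks the set of answers IDNS may return. This is a sketch only: `idns_candidates`, the `isv_status` mapping, and the choice to keep IPs that have no ISV attached are assumptions for illustration, not Elfiq's implementation.

```python
from typing import Dict, List


def idns_candidates(rr_ips: List[str], isv_status: Dict[str, bool]) -> List[str]:
    """Keep only the IPs whose service is currently reported as up.

    Assumption: an IP with no ISV result is treated as eligible.
    """
    return [ip for ip in rr_ips if isv_status.get(ip, True)]
```

For instance, if the ISV for `192.0.2.10` reports a failure, only the remaining IPs are considered when building IDNS answers.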
I set my IDNS RR entry to use the WFA algorithm. I expected the Link LB to do round-robin when the weights are equal; however, I always get the same IP.
The IDNS RR WFA algorithm uses the GMAC weight as its first metric. If two or more choices have the same weight, the Link LB chooses the resource (or IP) that currently has the lowest "Hit Count" (shown as "HC" in the "sh idns rr" output). If a link has been down for some time, the Hit Counts associated with that link stop incrementing while the remaining links pick up the traffic. When the downed link is re-enabled, its Hit Count is significantly lower than the others, so the Link LB uses only this resource until the Hit Counts have equalized with the other records. This can give the impression that load balancing is not behaving as it should. Once the Hit Counts have equalized, round-robin load balancing resumes between IPs that share the same weight.
If you want to speed up this process, you can erase and re-create the relevant IDNS RR records; this resets the Hit Counts to zero.
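To see why a recently restored link monopolizes the answers, here is a small simulation of the selection rule described above: weight first, then the lowest Hit Count among equal weights. It is a sketch under stated assumptions, not Elfiq's code: the `Resource` class and `wfa_pick` function are hypothetical, a higher weight is assumed to be preferred, and ties on Hit Count are broken by list order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    ip: str
    weight: int          # GMAC weight (assumption: higher is preferred)
    hit_count: int = 0   # the "HC" column of "sh idns rr"
    up: bool = True


def wfa_pick(resources: List[Resource]) -> Resource:
    """Pick the best-weighted resource, breaking ties by lowest hit count."""
    available = [r for r in resources if r.up]
    best_weight = max(r.weight for r in available)
    tied = [r for r in available if r.weight == best_weight]
    # min() returns the first lowest entry, so equal hit counts
    # fall back to list order, which yields round-robin alternation.
    choice = min(tied, key=lambda r: r.hit_count)
    choice.hit_count += 1
    return choice
```

For example, with two equal-weight links where one was down for four queries, the restored link answers the next four queries alone before alternation resumes. Setting every `hit_count` back to 0 (the equivalent of re-creating the records) restores alternation immediately.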
My geographic deployment suffers from erratic Geolink synchronization. What can I do?
First, we always recommend that you check the logs of your units. In the web interface, you can do so with the "Event Log" under the System category of the menu. In the CLI, you can check the logs with the "sh log event" command.
Next, look at the output of the "sh geolink" command within the VFI. This command is only available in CLI mode (either through the web interface or directly over SSH or the console).
Here is a sample output:
Name[Geolink1] Id[1] State[enable] Conn state[ok] Last PRtt (sec)[2.510]
Local geotag[2] Remote geotag[1] Dest Index[0]
Dest IP0[192.168.XXX.XXX] Dest IP1[not defined]
Dest IP2[not defined] Dest IP3[not defined]
Description[Test_Geolink]
Pushing interval (sec)[3] Tx count[7084] Tx timeout (msec)[4000]
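If you monitor several units, the bracketed Field[value] layout of this output is straightforward to parse. The sketch below assumes the output looks exactly like the sample above (field names included); `parse_geolink` and `sync_too_slow` are hypothetical helper names, and collecting the output from the unit (SSH session, console capture) is left out. It flags the condition discussed next: a Last PRtt larger than the Tx timeout.

```python
import re
from typing import Dict

# Matches "Key[value]" pairs; keys may contain spaces and parentheses.
FIELD_RE = re.compile(r"\s*([^\[\]]+?)\[([^\]]*)\]")

SAMPLE = """Name[Geolink1] Id[1] State[enable] Conn state[ok] Last PRtt (sec)[2.510]
Local geotag[2] Remote geotag[1] Dest Index[0]
Dest IP0[192.168.XXX.XXX] Dest IP1[not defined]
Dest IP2[not defined] Dest IP3[not defined]
Description[Test_Geolink]
Pushing interval (sec)[3] Tx count[7084] Tx timeout (msec)[4000]"""


def parse_geolink(output: str) -> Dict[str, str]:
    """Turn the Key[value] pairs of "sh geolink" output into a dict."""
    return {key.strip(): value for key, value in FIELD_RE.findall(output)}


def sync_too_slow(fields: Dict[str, str]) -> bool:
    """True when the last synchronization took longer than the Tx timeout."""
    prtt_sec = float(fields["Last PRtt (sec)"])
    timeout_sec = int(fields["Tx timeout (msec)"]) / 1000.0  # msec -> sec
    return prtt_sec > timeout_sec


if __name__ == "__main__":
    fields = parse_geolink(SAMPLE)
    print(fields["Conn state"], sync_too_slow(fields))
```

On the sample, Last PRtt is 2.510 s against a 4 s timeout, so the check does not flag it.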
Please pay particular attention to the following fields:
Last PRtt: This timer represents the time it took to run the last synchronization process. If this value exceeds the Tx timeout (4 seconds in the example), you should investigate why synchronization takes so long: it means that transactions will not complete consistently. A long transaction time can usually be explained by one of the following factors:
- There is a significant delay in getting information from one unit to another. This can be measured with a simple ping from different places in the network. If the round-trip time (RTT) between the units is too high, look for ways to reduce it.
- There are problems with the configuration that the units are trying to process. If the processing of the data exchanged through the Geolink hangs because of an error, the transaction time goes up because the process times out. These errors need to be investigated to solve the replication issue. Elfiq suggests that you review each configuration closely and check for items that could conflict. For example, ISVs and GMACs cannot have the same numerical identifier (ID) on multiple sites, because that would prevent the information from being merged.
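Checking for the duplicate-ID conflict mentioned above is a set intersection once you have listed the ISV and GMAC IDs configured on each site. The sketch below assumes you have already extracted those IDs (for example by reading each unit's configuration); the data layout and the `conflicting_ids` helper are hypothetical.

```python
from typing import Dict, Set


def conflicting_ids(
    site_a: Dict[str, Set[int]],
    site_b: Dict[str, Set[int]],
) -> Dict[str, Set[int]]:
    """Return, per object type (e.g. "isv", "gmac"), the IDs used on both sites."""
    conflicts: Dict[str, Set[int]] = {}
    for kind in site_a.keys() & site_b.keys():
        shared = site_a[kind] & site_b[kind]
        if shared:
            conflicts[kind] = shared
    return conflicts
```

An empty result means no ISV or GMAC ID is reused across the two sites.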
Will my geolink fail when I change the users/passwords or the DSA keys?
If your geolink is configured with the "mgmt" and "ena" passwords:
After you change the mgmt or ena passwords, the API geolink remains up for as long as possible. Any interruption of the communication channel forces a reconnection, which breaks the geolink unless the geolink configuration has been updated to reflect the password change.
If your geolink is configured with the DSA keys:
The new SSH geolink behaves similarly: if a key is changed or removed, the established geolink stays up for as long as possible. If it fails, it cannot reconnect until the keys are correctly exchanged.
What happens if my geolink fails?
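If a geolink fails, the local shadow values 'expire' after a length of time defined as:
(2 × update interval) + Tx timeout, where the update interval is in seconds and the Tx timeout in milliseconds (convert both to the same unit before adding them)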
Once expired, the shadow GMACs fall into an "administratively down" state but do not disappear. In fact, they are only removed when the remote unit reconnects and cleans up the structures.
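As a worked example, assuming the update interval in the formula is the "Pushing interval (sec)" shown in the "sh geolink" output, the sample values (3 s and 4000 ms) give 2 × 3 s + 4 s = 10 s before the shadow GMACs go administratively down. The small helper below (a hypothetical name) does the same arithmetic; the only subtlety is converting the Tx timeout from milliseconds to seconds.

```python
def shadow_expiry_seconds(update_interval_sec: float, tx_timeout_msec: float) -> float:
    """Seconds before shadow values expire after a geolink failure.

    Computes (2 x update interval) + Tx timeout, converting the
    Tx timeout from milliseconds to seconds so both terms share a unit.
    """
    return 2 * update_interval_sec + tx_timeout_msec / 1000.0


if __name__ == "__main__":
    print(shadow_expiry_seconds(3, 4000))  # 10.0
```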
What is NATTP?
NATTP stands for "Network Address Translation Transfer Protocol". It is an Elfiq-designed protocol that runs over IP with its own protocol number, and it is a Layer 4 protocol in the OSI model. It was created as a very efficient (low-overhead) way to encapsulate valid traffic and modify its original path. NATTP is preferable to NAT because it avoids the security risks of passing public traffic over private links: the traffic is encapsulated so that only another Link LB unit can retrieve the data.
What is a channel for inter-VFI communication?
Channels are direct paths between VFIs that run over the backplane. When creating a new Channel, you need to specify the target VFI and the target "side" of that VFI. For example, you could create a Channel that sends traffic to the "outside" of VFI2. This means that any traffic that goes through this Channel is seen on VFI2 as incoming traffic (coming in from the outside).
Once your Channel is created you can send traffic through it using "acl channel in", "acl channel out" and "gmac channel" entries.