Turn your PCs into clusters!

WEB INTERFACE

The SG cluster has a friendly web interface for the administrator. The properties of all groups and all servers are available from this interface.

Each group has a red block containing the group-related information, followed by blue blocks containing the server-related information.

Note: settings made through this interface are temporary. They are cleared when the SG load balancer reboots. To make them permanent, set them in initial_command in /etc/sg.conf.

Group Properties

active_flag: On or Off

Whether a group is enabled or disabled

alive_mask: (read only)

Bits representing the alive status of the servers in this group. This is for programmers :)

keyport_list: a list of critical port numbers

All ports critical to the service provided by your cluster should be listed here. If any keyport on a server does not respond for a period of time, that server is treated as a dead server. For example, a squid cluster should list 0 (ping) and 3128/tcp (squid) in keyport_list; a mail server cluster should list 25/tcp (smtp), 110/tcp (pop3) and 143/tcp (imap4).

select_method: Round Robin, by Connection, by Packet, by ClientIP or by External

This is the load balancing policy used to select the target server when a link is created. 'Round Robin' means choosing the servers in turn. 'by Connection' means choosing the server with the fewest connections. 'by Packet' means choosing the server with the least packet traffic. 'by ClientIP' means choosing the server by hashing the client IP; this guarantees that requests from the same client are redirected to the same server, even after a failover of the load balancer. 'by External' means choosing the server with the lowest 'external_count', a property modified by the service program through the feedback protocol.
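The five policies can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not SG's actual code: the Server class and select_server function are hypothetical names, CRC32 stands in for whatever hash SG really uses, and server weight and alive status are ignored to keep the sketch short.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Server:
    # Field names mirror the server properties listed on this page.
    name: str
    connection_count: int = 0
    packet_traffic: int = 0
    external_count: int = 0


def select_server(servers, method, client_ip="", round_index=0):
    """Return (chosen server, next round_index) for one new link."""
    if method == "Round Robin":
        chosen = servers[round_index % len(servers)]
        return chosen, (round_index + 1) % len(servers)
    if method == "by Connection":
        return min(servers, key=lambda s: s.connection_count), round_index
    if method == "by Packet":
        return min(servers, key=lambda s: s.packet_traffic), round_index
    if method == "by ClientIP":
        # Assumption: any deterministic hash gives the "same client,
        # same server" property; CRC32 is only a stand-in.
        index = zlib.crc32(client_ip.encode("ascii")) % len(servers)
        return servers[index], round_index
    if method == "by External":
        return min(servers, key=lambda s: s.external_count), round_index
    raise ValueError("unknown select_method: " + method)
```

Because 'by ClientIP' depends only on the client address and the server list, a second load balancer with the same server list would pick the same server, which is why the mapping can survive a failover.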

round_index: (read only, from 0 to server_total-1)

This is the index of the next server used in 'Round Robin' server selection.

keep_same_server: On or Off

The target server for a newly created link is chosen based on the balancing policy. But sometimes two different links are actually related and should be redirected to the same target server. One example is an RPC service port query to the port mapper and the RPC service request that follows it. Another is a squid ICP query and the http get request that follows it. In these cases, the target server for the earlier link and the later link should be the same. keep_same_server guarantees that packets are redirected to the same target server as long as any link from the same client is still present in the SG link mapping table.
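The behaviour of keep_same_server can be pictured with a small sketch. The LinkTable class and choose_target function below are hypothetical; the real SG link mapping table is not documented here, so the key layout (client IP, client port, service port) is an assumption.

```python
class LinkTable:
    """Hypothetical stand-in for the SG link mapping table."""

    def __init__(self):
        # (client_ip, client_port, service_port) -> server name
        self._links = {}

    def add(self, client_ip, client_port, service_port, server):
        self._links[(client_ip, client_port, service_port)] = server

    def remove(self, client_ip, client_port, service_port):
        self._links.pop((client_ip, client_port, service_port), None)

    def server_for_client(self, client_ip):
        """Return the server of any live link from this client, or None."""
        for (ip, _port, _service), server in self._links.items():
            if ip == client_ip:
                return server
        return None


def choose_target(table, client_ip, client_port, service_port,
                  keep_same_server, pick_by_policy):
    """Pick the target server for a new link and record the link."""
    server = table.server_for_client(client_ip) if keep_same_server else None
    if server is None:
        # No live link from this client: fall back to select_method.
        server = pick_by_policy()
    table.add(client_ip, client_port, service_port, server)
    return server
```

In the RPC example, the port mapper query creates the first link, and the service request that follows reuses its server because that link is still in the table.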

mcast_mode: Deny, Bypass, ReadWrite, ReadOnly

This decides how a request for multicast service is handled. 'Deny' means dropping all requests for multicast service. 'Bypass' means handling multicast service requests as unicast ones. 'ReadWrite' means turning multicast service support on. 'ReadOnly' means serving only read requests for multicast service. This is primarily used for recovering the state of multicast service servers.

multicast_addr: a multicast IP address

Multicast address used for multicast service. The default value for w.x.y.z is 234.x.y.z
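The default mapping can be written as a one-line rule: keep the last three octets and replace the first with 234. The helper name default_multicast_addr is hypothetical, and which address plays the role of w.x.y.z is not stated here, so the input is simply "some dotted IPv4 address".

```python
def default_multicast_addr(address):
    """Map w.x.y.z to 234.x.y.z (first octet replaced by 234)."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address: " + address)
    return ".".join(["234"] + octets[1:])
```

For example, default_multicast_addr("140.116.72.1") gives "234.116.72.1".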

mcast_error_threshold: 0-65535

A mcast error means that the replies from some servers to a multicast write request differ from the reply given by the majority of servers in the server group. This implies there may be a state inconsistency in these servers. If a server's mcast error count exceeds mcast_error_threshold, it is put into pending state. If it exceeds double mcast_error_threshold, it is put into dead state.
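The majority comparison and the two-step threshold can be sketched as follows. Both function names are hypothetical, and the tie-breaking rule (the reply seen first wins when no reply has a strict majority) is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def servers_with_mcast_error(replies):
    """replies maps server name -> reply bytes for one multicast write.

    Returns the servers whose reply differs from the most common reply.
    """
    if not replies:
        return set()
    majority_reply, _count = Counter(replies.values()).most_common(1)[0]
    return {name for name, reply in replies.items() if reply != majority_reply}


def state_for_errors(error_count, threshold):
    """Apply the pending/dead rule: > threshold, then > 2 * threshold."""
    if error_count > 2 * threshold:
        return "Dead"
    if error_count > threshold:
        return "Pending"
    return "Alive"
```

With mcast_error_threshold set to 3, a server stays alive up to 3 errors, is pending from 4 to 6, and is dead from 7 on.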

deny_interval: seconds, 0-65535

SG has protection against attacks from evil clients. 'deny_interval' determines how long an evil client is rejected by the SG load balancer after an attack is detected.

connection_count_limit: 0-65535, 0 means no limit

This setting limits the maximum number of connections a client can have to a server group.

connection_rate_limit: 0-65535, 0 means no limit

This setting limits the maximum connection rate a client can have, measured over the last 60 seconds.
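One common way to enforce a limit "over the last 60 seconds" is a sliding window of connection timestamps per client. The sketch below assumes the rate is counted as new connections per 60-second window; the class name ConnectionRateLimiter is hypothetical and this is not necessarily how SG implements it.

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    WINDOW_SECONDS = 60

    def __init__(self, connection_rate_limit):
        self.limit = connection_rate_limit  # 0 means no limit
        self._recent = defaultdict(deque)   # client_ip -> connection times

    def allow(self, client_ip, now):
        """Return True if a new connection at time `now` (seconds) is allowed."""
        times = self._recent[client_ip]
        # Forget connections that have left the 60-second window.
        while times and now - times[0] >= self.WINDOW_SECONDS:
            times.popleft()
        if self.limit != 0 and len(times) >= self.limit:
            return False
        times.append(now)
        return True
```

A rejected connection is not recorded, so a client that keeps retrying is not penalised beyond the window; a stricter variant could also count rejected attempts.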

finwait_tcp_limit: 0-65536, 0 means no limit

The system resources used by a tcp connection are not released while the connection stays in FINWAIT1 or FINWAIT2 state. Some denial-of-service attacks open many connections to a server and keep them in FINWAIT1/2 state by not sending back the acknowledgement at connection close. This setting limits the maximum number of FINWAIT1/2 connections a client can have to a server group.

connection_count_max, connection_rate_max, finwait_tcp_max: (read only)

These properties record the maximum values observed so far, to give the administrator a clue to good settings for connection_count_limit, connection_rate_limit and finwait_tcp_limit.

failure_detect_by_packet_snoop, recovery_detect_by_packet_snoop: On or Off

Whether to enable failure/recovery detection by packet snoop. Packet snoop is the method with the least overhead for detecting the failure and recovery of a server, but it does not work if there are no requests from clients.

packet_delta_threshold: 0-65535, packet_timeout_threshold: 0-65535

These two thresholds define when SG considers a server as not responding. If any keyport of a server has received more than packet_delta_threshold packets and has not responded for more than packet_timeout_threshold, SG puts this server into pending state. If the server does not respond for more than double packet_timeout_threshold, SG puts it into dead state.
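The rule combines the two thresholds, which a short sketch makes explicit. The function name snoop_state is hypothetical; its first two arguments correspond to the read-only server properties packet_delta_in and packet_unack_timeout. Whether the packet count condition also applies to the dead state is not stated, so the sketch assumes it does.

```python
def snoop_state(packet_delta_in, packet_unack_timeout,
                packet_delta_threshold, packet_timeout_threshold):
    """Classify a server from its packet snoop counters."""
    # Assumption: too few unanswered packets never triggers a state change.
    if packet_delta_in <= packet_delta_threshold:
        return "Alive"
    if packet_unack_timeout > 2 * packet_timeout_threshold:
        return "Dead"
    if packet_unack_timeout > packet_timeout_threshold:
        return "Pending"
    return "Alive"
```

With packet_delta_threshold 10 and packet_timeout_threshold 5, a server with 20 unanswered packets becomes pending after more than 5 time units of silence and dead after more than 10.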

failure_detect_by_porttest, recovery_detect_by_porttest: On or Off

Whether to enable failure/recovery detection by port test. Since port test detects the failure/recovery of servers actively, it works even when there is no traffic at all.

porttest_error_threshold: 0-65535

A porttest error means that at least one keyport is not responding to the requests generated by sgmon. If a server's porttest_error exceeds this threshold, SG puts it into pending state. If it exceeds double porttest_error_threshold, SG puts it into dead state.

failure_detect_by_heartbeat, recovery_detect_by_heartbeat: On or Off

Whether to enable failure/recovery detection by monitoring the heartbeat from servers. The heartbeat may be generated either by the sghb process running on the servers that host the service programs, or by the service programs directly.

heartbeat_timeout_threshold: 0-65535

If a server has not sent a heartbeat for longer than this threshold, SG puts it into pending state. If it has not sent a heartbeat for longer than double this threshold, SG puts it into dead state.

 

Server Properties

status: Alive, Pending or Dead

the status of this server. Various fault/recovery detection mechanisms are used in the SG system. The server status is calculated by sorting all fault/recovery events by timestamp. The latest event decides the result.

Why pending state?
A server that is not responding to a client's request or a monitor's test may have crashed, or may just be busy serving others under heavy load. We put such a server into pending state first, instead of dead state, in the expectation that it will come back later.
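The "latest event decides" rule for the server status can be sketched as below. The function name current_status and the (timestamp, status) event shape are hypothetical; the sketch also assumes a server with no events at all is alive.

```python
def current_status(events, default="Alive"):
    """events: iterable of (timestamp, status) pairs from all detectors.

    The status carried by the event with the latest timestamp wins.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return ordered[-1][1] if ordered else default


# Example: a porttest failure, then a heartbeat seen later.
events = [
    (120, "Pending"),  # porttest error count passed its threshold
    (100, "Alive"),    # earlier packet snoop recovery
    (135, "Alive"),    # heartbeat detected again
]
status = current_status(events)
```

Here status is "Alive", because the heartbeat at time 135 is the most recent event, regardless of which detector reported it.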

failure: (read only)

How many times the server has failed since the SG system started.

ac_list: string like "140.116.72.0/24 !140.116.49.0/24"

access control list of this server. This is used to allow/deny requests from a client based on its ip/subnet. The default is to allow all. If there is any entry in the ACL, the default is the opposite of the last entry. For example, "140.116.72.0/24 !140.116.49.0/24" means allow clients from 140.116.72.0/24, deny clients from 140.116.49.0/24, and allow everyone else.

Servers in the same group can have different ACLs to provide differentiated service for different clients.
Ex: reserving the best computer in a group for internal use in a computing cluster
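The ACL rules above can be sketched with Python's standard ipaddress module. The function names parse_acl and is_allowed are hypothetical, and the sketch assumes entries are checked in order with the first match deciding; the text does not say how overlapping entries are resolved.

```python
import ipaddress


def parse_acl(ac_list):
    """Turn 'net1 !net2 ...' into a list of (allow, network) pairs."""
    entries = []
    for token in ac_list.split():
        allow = not token.startswith("!")  # a leading '!' means deny
        entries.append((allow, ipaddress.ip_network(token.lstrip("!"))))
    return entries


def is_allowed(ac_list, client_ip):
    """Decide whether client_ip may use the server with this ac_list."""
    entries = parse_acl(ac_list)
    if not entries:
        return True  # empty ACL: allow all
    address = ipaddress.ip_address(client_ip)
    for allow, network in entries:
        if address in network:
            return allow  # assumption: first matching entry wins
    # No entry matched: the default is the opposite of the last entry.
    return not entries[-1][0]
```

With the example list "140.116.72.0/24 !140.116.49.0/24", a client at 140.116.49.5 is denied, while a client at 10.1.2.3 matches nothing and is allowed because the last entry is a deny.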

weight: 1-255

server weight used in load balancing

connection_count: (read only)

total connections this server has

packet_traffic: (read only)

recent packet count per minute

external_count: a long integer

used to store the load value fed back by service programs on each server. SG can balance the load based on this value.

mcast_error_count: (read only)

number of mcast errors this server has had

mcast_error_timestamp: (read only)

the time in seconds since the last mcast error happened

packet_delta_in: (read only)

the highest count of unanswered packets among all keyports on this server

packet_unack_timeout: (read only)

the longest not-responding time among all keyports on this server

porttest_error_count: (read only)

number of porttest errors this server has had

porttest_error_timestamp: (read only)

the time in seconds since the last porttest error happened

heartbeat_count: (read only)

number of heartbeats detected from this server

heartbeat_timestamp: (read only)

the time in seconds since the last heartbeat from this server

mem0...mem7: each one is an unsigned char

shared variables on SG. This is for programmers :)


SG Cluster by Distributed System Lab E.E. NCKU 2001