The Siebel Connection Broker (SCBroker) is a server component that provides intraserver load balancing. SCBroker distributes session login requests across multiple instances of Application Object Managers (AOMs) running on a Siebel Server. By default, SCBroker routes each request to the AOM process that has the fewest running tasks.
Under certain scenarios, an AOM process can hang and can no longer acknowledge connection requests from SCBroker. If this hanging AOM process happens to have the fewest running tasks, SCBroker will repeatedly attempt to route new session requests to it. This blocks all new login requests on that Siebel Server.
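The blocking effect described above can be illustrated with a small simulation. This is only a sketch of the least loaded idea, not SCBroker's actual implementation; the process ids, task counts, and function names are hypothetical.

```python
def pick_least_loaded(task_counts):
    """Return the pid with the fewest running tasks (ties go to the lowest pid)."""
    return min(sorted(task_counts), key=lambda pid: task_counts[pid])


def simulate_logins(task_counts, hung_pids, attempts):
    """Route `attempts` logins using least loaded selection.

    A hung process never accepts a session, so its task count never grows
    and it keeps being selected as the least loaded process.
    """
    counts = dict(task_counts)
    outcomes = []
    for _ in range(attempts):
        pid = pick_least_loaded(counts)
        if pid in hung_pids:
            outcomes.append((pid, "timeout"))
        else:
            counts[pid] += 1
            outcomes.append((pid, "ok"))
    return outcomes


# Hypothetical example: pid 4102 is hung and has the fewest running tasks.
outcomes = simulate_logins({4101: 12, 4102: 3, 4103: 15}, hung_pids={4102}, attempts=5)
for pid, result in outcomes:
    print(pid, result)
```

In this sketch every login attempt is sent to the hung process, which mirrors the situation where all new logins on the Siebel Server are blocked.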
When SCBroker cannot connect a new session to an existing object manager process because that process is hanging, this can usually be identified by the following entries in the SWSE log file:
[SWSE] Open Session failed after 60.1958 seconds.
[SWSE] Failed to obtain a session ID. Login failed attempting to connect to %1
[SWSE] Set Error Response (Session: Error: 10879185 Message: Login failed attempting to connect to siebel.TCPIP.None.None://hostname:2361/enterprise/SCCObjMgr_enu)
[SWSE] Login failed.
SBL-SSM-00005: Timeout occurred while opening SISNAPI connection.
Login failed for Login name : SADMIN
[SWSE] Open Session failed (0xa600d1) after 60.0111 seconds.
Note the last line, which shows a connection timeout of about 60 seconds.
This is the built-in timeout of 60 seconds after which a request from the SWSE plugin to SCBroker is cancelled.
From that message we can conclude that SCBroker was not able to transfer a login request to a working object manager on node 'hostname'.
We can also conclude that the object manager process is still running, since it is still in SCBroker's routing table, but that it does not accept new connections.
This is usually caused by a hang of the multithreaded (MT) server process.
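As a rough aid for spotting this pattern, the following sketch scans log lines for "Open Session failed" entries whose duration is close to the 60 second timeout. The message wording is taken from the excerpt above; the function name and the tolerance of one second are assumptions for illustration, and real SWSE log lines may carry additional leading fields.

```python
import re

# Matches both variants seen in the excerpt, with or without the (0x...) code.
OPEN_SESSION_FAILED = re.compile(
    r"Open Session failed(?: \((0x[0-9a-fA-F]+)\))? after ([0-9]+(?:\.[0-9]+)?) seconds"
)


def find_login_timeouts(lines, timeout_seconds=60.0, tolerance=1.0):
    """Return (elapsed_seconds, error_code) for failures close to the timeout.

    `tolerance` is an assumed margin, not a documented Siebel value.
    """
    hits = []
    for line in lines:
        match = OPEN_SESSION_FAILED.search(line)
        if match is None:
            continue
        elapsed = float(match.group(2))
        if abs(elapsed - timeout_seconds) <= tolerance:
            hits.append((elapsed, match.group(1)))
    return hits


sample = [
    "[SWSE] Open Session failed after 60.1958 seconds.",
    "[SWSE] Login failed.",
    "[SWSE] Open Session failed (0xa600d1) after 60.0111 seconds.",
]
timeouts = find_login_timeouts(sample)
print(timeouts)
```

A steady stream of such hits for one Siebel Server is a hint, not proof, that an object manager process on that node is hanging.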
The following criteria should also be met to confirm a deadlock:
- Every login attempt to a certain AOM on the affected Siebel Server is prompted with the message: "SBL-SWP-00121: The server you are trying to access is either busy or experiencing difficulties. Please close the Web browser, open a new browser window, and try logging in again."
- It takes approximately 60 seconds until the server busy message appears.
- AOM tasks might remain in the state "Handling Logon" or "Relogin".
- The SCBroker log continuously logs the following error message: "SBL-SCB-00011: Failed to connect to pipe (SEBL_0_pid) on process pid", where pid is the process id of the corresponding object manager process. This process id can be looked up in the enterprise log to associate it with a particular AOM.
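To find the process id named in the last criterion, the SCBroker log can be scanned for SBL-SCB-00011 entries. The sketch below counts them per pid. The message text follows the wording quoted above; the sample lines and the pid 4102 are made up, and real log lines may contain additional leading fields.

```python
import re
from collections import Counter

# Captures the pipe name and the process id from the SBL-SCB-00011 message.
PIPE_FAILURE = re.compile(
    r"SBL-SCB-00011: Failed to connect to pipe \((\S+?)\) on process (\d+)"
)


def count_pipe_failures(lines):
    """Count SBL-SCB-00011 occurrences per process id."""
    failures = Counter()
    for line in lines:
        match = PIPE_FAILURE.search(line)
        if match:
            failures[int(match.group(2))] += 1
    return failures


# Hypothetical sample lines for illustration only.
sample = [
    "SBL-SCB-00011: Failed to connect to pipe (SEBL_0_4102) on process 4102",
    "some unrelated log line",
    "SBL-SCB-00011: Failed to connect to pipe (SEBL_0_4102) on process 4102",
]
failures = count_pipe_failures(sample)
print(failures.most_common(1))
```

The pid with the highest count is the first candidate to look up in the enterprise log.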
The workaround to get the SCBroker process operative again is to terminate the hanging AOM process manually. This can be done using the kill command on UNIX or the Task Manager on Windows.
Once the offending process has been terminated, SCBroker will continue normal distribution of new session requests across the remaining AOM processes.
To help us in this situation there is a new SCBroker parameter "ConnForwardAlgorithm" or "Connection Forward algorithm for SCBroker".
This is a hidden parameter.
This parameter determines which algorithm is used to forward incoming login requests to multithreaded (MT) server processes. There are two possible methods:
a) Least Loaded "LL", the default
and
b) Round Robin "RR"
Note that this parameter is an advanced parameter, as can be seen in the following srvrmgr example:
srvrmgr> list advanced param ConnForwardAlgorithm for comp SCBroker show PA_ALIAS, PA_VALUE, PA_NAME
Although LL is advisable in terms of performance, it causes unwanted behavior when an AOM process hangs: once SCBroker has identified the least loaded process, it keeps connecting to this particular pid until the next session is established. If the supposedly least loaded process is hanging, SCBroker tries that process again and again, and logins may stall on all web servers.
In this situation, the algorithm should be changed to "RR".
This has two advantages:
1) The hanging process will only be contacted once. The next attempt goes to the next available pid as listed in the routing table. The effect is that only one login is affected, and subsequent logins will succeed until the hanging process is revisited. In that way, all other requests are distributed among the remaining, non-hanging processes. End users should get the server busy message only once, and a retry should then connect them to a working process.
2) By monitoring the task distribution you can identify the process id that is not getting additional hits.
This process id is very likely the culprit.
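Both advantages can be seen in a small simulation of round robin forwarding. As before, this is only a sketch of the general technique under assumed process ids, not SCBroker's actual code.

```python
from itertools import cycle


def simulate_round_robin(pids, hung_pids, attempts):
    """Forward `attempts` logins to the pids in turn.

    Returns the number of sessions each process accepted and the number of
    logins that hit a hung process (and would therefore time out).
    """
    accepted = {pid: 0 for pid in pids}
    timeouts = 0
    order = cycle(pids)
    for _ in range(attempts):
        pid = next(order)
        if pid in hung_pids:
            timeouts += 1
        else:
            accepted[pid] += 1
    return accepted, timeouts


# Hypothetical example: three processes, pid 4102 is hung.
accepted, timeouts = simulate_round_robin([4101, 4102, 4103], hung_pids={4102}, attempts=9)
print(accepted, timeouts)
```

In this example only every third login hits the hung process, and the process that accepted no sessions stands out as the likely culprit.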
The following srvrmgr commands can be used to switch the forwarding mode.
Using srvrmgr with the /s switch, connect to the Siebel Server on the host named in the SWSE error message.
Then change the parameter:
change param ConnForwardAlgorithm=RR for comp SCBroker
Now restart the SCBroker component:
shutdown fast comp SCBroker
startup comp SCBroker
Now you can monitor the processes to find the process id whose number of running tasks is no longer increasing:
list procs for comp <interactive_comp_name> show TK_PID,TK_NUM_NORMAL_TASKS
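Comparing two snapshots of these values can be sketched as follows. The function assumes the pid and task count columns have already been extracted from the command output into dictionaries; the parsing step, the function name, and the sample values are hypothetical.

```python
def find_stalled_pids(before, after):
    """Return pids whose task count did not increase between two snapshots.

    Only pids present in both snapshots are compared. If no process grew at
    all (for example because nobody logged in), nothing is reported.
    """
    common = [pid for pid in before if pid in after]
    if not any(after[pid] > before[pid] for pid in common):
        return []
    return sorted(pid for pid in common if after[pid] <= before[pid])


# Hypothetical snapshots taken a few minutes apart: {pid: running tasks}.
before = {4101: 12, 4102: 3, 4103: 15}
after = {4101: 16, 4102: 3, 4103: 19}
print(find_stalled_pids(before, after))
```

A healthy process can also show a flat or falling count when users log off, so a pid reported here should be confirmed against the other criteria before it is terminated.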
If you encounter hanging processes on multiple server nodes, then you need to change the parameter value to RR on all of these nodes.
Please note that the SCBroker forwarding mode is local to each Siebel Server in the enterprise, so you can have a mixed LL and RR configuration within an enterprise.
It should also be mentioned that existing, established sessions are not affected by the SCBroker component bounce.
Established sessions will continue to work even while SCBroker is temporarily unavailable.
In the versions stated above, running SCBroker in round robin mode makes it possible to distribute incoming login requests to all processes that are still able to process requests at that moment.