WebSphere Application Server (WAS) – Thread Dump and Heap Dump


When to generate? How to generate? How to debug?

Thread Dumps
If a server hangs under WebSphere for no apparent reason, you can obtain a thread dump from the WebSphere server to help diagnose the problem.

In the case of a server hang, you can force an application to create a thread dump.

On UNIX/Linux machines, find the process ID (PID) of the hung JVM and issue kill -3 PID. Look for an output file in the installation root directory with a name like javacore.date.time.id.txt.
Using the wsadmin prompt:
Get a handle to the server's JVM:
wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
Then execute:
wsadmin>$AdminControl invoke $jvm dumpThreads
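
The kill -3 route above can also be scripted. The following is a minimal sketch, assuming a UNIX/Linux host and that you already know the PID and the directory where javacore files are written; the helper names request_thread_dump and find_javacores are hypothetical and not part of WebSphere.

```python
import glob
import os
import signal


def request_thread_dump(pid):
    """Send SIGQUIT (the signal behind `kill -3`) to the given process."""
    os.kill(pid, signal.SIGQUIT)


def find_javacores(directory):
    """Return the javacore*.txt files in `directory`, oldest first."""
    pattern = os.path.join(directory, "javacore*.txt")
    return sorted(glob.glob(pattern), key=os.path.getmtime)
```

A typical use would be to call request_thread_dump with the JVM's PID, wait a few seconds, and then take the last entry returned by find_javacores for the dump directory.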

A little about DCS, cluster members and cluster member crashes


Have you ever been asked this question in an interview?
How do you find out which cluster member has crashed or is down?

The usual answer is to go to the administrative console and check the individual server status or the cluster member status.
Another option is to use a monitoring tool such as ITCAM, Wily Introscope, Unicenter or Nagios.
Have you ever checked the SystemOut.log file of an individual server when one of the cluster members was stopped?
WebSphere has Distribution and Consistency Services (DCS), which is part of the HA architecture. Using the DCS messages, we can find out which member of the cluster is down.

Here is an example:

I have a cell named Test-Cell, which has a cluster with 6 nodes, each having 2 servers.
I stopped one of the cluster members. If you then look at the SystemOut.log file, you see messages similar to the following:

[3/3/10 18:00:37:758 CET] 00000026 RoleMember W DCSV8104W: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Removing member [Test-Cell\node02\server02] because the member was requested to be removed by member Test-Cell\node02\server01. Internal details VL suspects others: CC-Situation Normal
[3/3/10 18:00:38:176 CET] 00000023 VSyncAlgo1 I DCSV2004I: DCS Stack DefaultCoreGroup at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (22898:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:00:38:207 CET] 00000023 VSyncAlgo1 I DCSV2004I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (331:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:00:38:537 CET] 00000024 CoordinatorIm I HMGR0218I: A new core group view has been installed. The core group is DefaultCoreGroup.
[3/3/10 18:00:39:228 CET] 00000026 DataStackMemb I DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (332:0.Test-Cell\node02\server01), view size is 11 (AV=11, CD=12, CN=12, DF=12)
[3/3/10 18:00:39:343 CET] 00000021 DRSBuddyManag A CWWDR0006I: Replication instance terminated : Test-Cell\node02\server02

So, from the above messages, it is clear that server02 on node02 went down and was removed from the core group.
After some troubleshooting and changes, I started the server that was down earlier. Now, if you look at SystemOut.log, you can see the following:

[3/3/10 18:17:13:245 CET] 00000026 RoleMember I DCSV8051I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Core group membership set changed. Added: [Test-Cell\node02\server02].
[3/3/10 18:17:13:315 CET] 00000023 MbuRmmAdapter I DCSV1032I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Connected a defined member Test-Cell\node02\server02.
[3/3/10 18:17:30:337 CET] 00000023 VSyncAlgo1 I DCSV2004I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (333:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:17:30:353 CET] 00000026 DataStackMemb I DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (334:0.Test-Cell\node02\server01), view size is 12 (AV=12, CD=12, CN=12, DF=12)
[3/3/10 18:17:30:354 CET] 00000027 DRSBuddyManag A CWWDR0007I: Replication instance group membership changed: Test-Cell\node02\server02
[3/3/10 18:17:30:356 CET] 00000027 DRSBuddyManag A CWWDR0002I: Replication instance is active : Test-Cell\node02\server02
[3/3/10 18:17:30:358 CET] 00000010 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Confirmed all new view members in view identifier (334:0.Test-Cell\node02\server01). View channel type is View|Ptp.

You can see a message saying that a new member was added to the core group.
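
Scanning SystemOut.log for these messages can be automated. The following is a minimal sketch that relies only on the DCSV8104W and DCSV8051I message formats shown in the samples above; the function name membership_events and the regular expressions are my own assumptions, and other WebSphere versions may word these messages differently.

```python
import re

# Patterns derived from the sample log lines above (an assumption, not an official format).
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")
REMOVED = re.compile(r"DCSV8104W:.*?Removing member \[([^\]]+)\]")
ADDED = re.compile(r"DCSV8051I:.*?Added: \[([^\]]+)\]")


def membership_events(lines):
    """Yield (timestamp, action, member) for each DCS membership change."""
    for line in lines:
        stamp = TIMESTAMP.match(line)
        when = stamp.group(1) if stamp else None
        removed = REMOVED.search(line)
        if removed:
            yield when, "removed", removed.group(1)
            continue
        added = ADDED.search(line)
        if added:
            yield when, "added", added.group(1)
```

Passing an open SystemOut.log file object to membership_events yields one tuple per removal or addition, which is enough to see which member went down and when it rejoined.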

About DCS:
There are two main versions of DCS: Core DCS and Data DCS. There is one Core DCS per process and it provides membership services among peer processes. These processes together form a Core Group. A process may be a member in one or more named Core Groups. Applications running on these processes can be members of application groups. Application groups are subsets of a particular named core group. A Data DCS component can be associated with each member of an application group.
DCS provides a mechanism for communicating information (distribution) among members with a given quality of service. Failure detection mechanisms that support and allow guaranteed quality of service are an inherent part of DCS and its services. DCS supports WebSphere components’ state replication requirements (like HTTP session and stateful beans) as well as the distribution and synchronization of WebSphere artifacts for performance, scalability, and availability.

You might see the same post elsewhere on the internet, because many sites just copy the content with a reference to the original post.

WebSphere Application Server Important Files


XML Configuration Files
Property files
Log Files

WebSphere stores its configuration in a set of XML files. When we use the admin console to configure WebSphere, certain XML files are updated internally.

Cell scope

• admin-authz.xml
Contains the roles set for administration of the Admin console.
<profile_root>/appsrv01/config/cells/<cell_name>/

• profileRegistry.xml
Contains a list of profiles and profile configuration data

• resources.xml
Defines operating cell scope environmental resources, including JDBC, JMS, JavaMail, URL end point configuration, and so on.

• security.xml
Contains security data, including all user ID and password information.

Monitoring and Diagnosing a WebSphere Application Server Environment


IBM says it has a very low overhead monitoring tool for this task: the Java Health Center, also known as HC.

“Health Center is a very low overhead monitoring tool. It runs alongside an IBM Java application with a very small impact on the application’s performance. Health Center monitors several application areas, using the information to provide recommendations and analysis that help you improve the performance and efficiency of your application. Health Center can save the data obtained from monitoring an application and load it again for analysis at a later date.”

Health Center provides visibility, monitoring and profiling in the following application areas:

  • Performance
    • Java method profiling: The Health Center uses a sampling method profiler to diagnose applications showing high CPU usage. Its low overhead means there is no need to specify in advance which parts of the application to monitor; the Health Center simply monitors everything. It works without recompilation or byte code instrumentation and shows where the application is spending its time by giving full call stack information for all sampled methods.
    • Lock analysis: Synchronization can be a big performance bottleneck on multi-CPU systems. It is often difficult to identify a hot lock or assess the impact locking is having on your application. Health Center records all locking activity and identifies the objects with the most contention. Health Center analyses this information and uses it to provide guidance about whether synchronization is impacting performance.
    • Garbage collection: The performance of Garbage Collection (GC) affects the entire application. Tuning GC correctly can potentially deliver significant performance gains. Health Center identifies where garbage collection is causing performance problems and suggests more appropriate command line options.

Forgot WebSphere admin console password


When you enable security on WebSphere Application Server (WAS), it prompts you for authentication when you access the admin console, stop a server or use the wsadmin prompt. As discussed in an earlier blog post about WebSphere security, all security-related settings are stored in the security.xml file under Profile_root/config/cells/cell_name. The workaround when the administrator has forgotten the password is to change the security settings by manually editing the security.xml file:

  1. Locate the security.xml file and take a backup of it.
  2. Open the security.xml file for editing and search for enabled="true".
  3. Change it to enabled="false" (you need to do this only for the very first occurrence of enabled="true").
  4. Restart the servers.
    1. Note: since you do not have the password, you cannot stop the servers normally, so use the kill command.
  5. Log in to the admin console.
  6. Enable security again.
  7. Restart the servers.
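
Steps 1 to 3 above can be sketched as a small script. This is only an illustration under stated assumptions: the file uses straight double quotes, the backup suffix .bak is my own choice, and the function name disable_global_security is hypothetical. Check the result by hand before restarting anything.

```python
import shutil


def disable_global_security(path):
    """Back up security.xml, then flip only the first enabled="true" to "false".

    Returns True if the file was changed.
    """
    # Step 1: keep a copy of the original next to it (suffix is an assumption).
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # Steps 2 and 3: replace the very first occurrence only.
    updated = text.replace('enabled="true"', 'enabled="false"', 1)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(updated)
    return updated != text
```

The match is case-sensitive, so attributes whose names merely end in "Enabled" are left alone.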

Displaying non-English characters correctly with WebSphere


Have you faced issues displaying non-English or special characters, such as currency symbols (dollar, pound, etc.)?

Try this solution:

    • Select Servers -> Application Servers -> server name -> Process Definition -> Java Virtual Machine -> Custom Properties -> New
    • Type client.encoding.override in the Name column and UTF-8 in the Value column.
    • Click Apply.
    • Stop and restart the WebSphere Application Server.
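
To see what this kind of problem typically looks like, here is a small illustration. It assumes the garbling comes from UTF-8 bytes being decoded with a single-byte encoding such as ISO-8859-1; that is a common cause, but the post does not confirm it is the one at work here. The function name misread_as_latin1 is hypothetical.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# The pound sign takes two bytes in UTF-8, so it comes back as two characters.
garbled = misread_as_latin1("£10")  # "Â£10"
```

If pages show stray characters like this in front of currency symbols, an encoding mismatch of this sort is a likely suspect.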

SRVE0133E: An error occurred while parsing parameters. java.net.SocketTimeoutException: Async operation timed out


[9/28/09 22:23:29:538 EST] 00000040 SRTServletReq E   SRVE0133E: An error occurred while parsing parameters. java.net.SocketTimeoutException: Async operation timed out
 at com.ibm.ws.tcp.channel.impl.AioTCPReadRequestContextImpl.processSyncReadRequest(AioTCPReadRequestContextImpl.java:157)
 at com.ibm.ws.tcp.channel.impl.TCPReadRequestContextImpl.read(TCPReadRequestContextImpl.java:109)
 at com.ibm.ws.http.channel.impl.HttpServiceContextImpl.fillABuffer(HttpServiceContextImpl.java:4127)
 at com.ibm.ws.http.channel.impl.HttpServiceContextImpl.readSingleBlock(HttpServiceContextImpl.java:3371)
 at com.ibm.ws.http.channel.impl.HttpServiceContextImpl.readBodyBuffer(HttpServiceContextImpl.java:3476)
 at com.ibm.ws.http.channel.inbound.impl.HttpInboundServiceContextImpl.getRequestBodyBuffer(HttpInboundServiceContextImpl.java:1604)
 at com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream.bufferIsGood(WCCByteBufferInputStream.java:133)
 at com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream.read(WCCByteBufferInputStream.java:95)
 at com.ibm.ws.webcontainer.srt.http.HttpInputStream.read(HttpInputStream.java:296)
 at com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData(RequestUtils.java:297)
 at com.ibm.ws.webcontainer.srt.SRTServletRequest.parseParameters(SRTServletRequest.java:1722
 

As you can see, this error originates from the web container’s HTTP transport channel, and its cause is network delay. By default, in WebSphere Application Server, the connection timeout for HTTP transport channels is set to 5 seconds, so if your network is slow, you may see this error in the logs.

If you would like to change this timeout setting, follow these steps:

    • Go to Application Servers -> serverA -> Web Container -> HTTP Transport -> Application Name
    • Go to Custom Properties.
    • Add these two parameters (where xx is the time in seconds): ConnectionIOTimeOut=xx and ConnectionKeepAliveTimeout=xx
    • Repeat this for all servers if your application is mapped to multiple servers.
    • Restart the server(s).
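
The mechanism behind the error, a read that gives up because the peer is too slow to send, can be shown with plain sockets. This is a generic sketch, not WebSphere code; the function name read_with_timeout is hypothetical and the timeout values are arbitrary.

```python
import socket


def read_with_timeout(sock, timeout_seconds, size=1024):
    """Read from sock, returning None if no data arrives within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(size)
    except socket.timeout:
        # The peer did not send anything in time, as with a slow client POST.
        return None
```

With a short timeout, a client on a slow network looks exactly like a client that never sends its request body, which is why raising the timeout makes the error go away.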