The next tool to look at is also part of the standard JDK distribution. The tool is called “jstat” – a Java Virtual Machine statistics monitoring tool. This is a command line tool that can be used to get metrics from the running JVM. The JVM connected can again either be local or remote. A full list of metrics that jstat is capable of exposing can be obtained by running “jstat -option” from the command line. The most commonly used options are:
+-----------------+---------------------------------------------------------------+ | Option | Displays... | +-----------------+---------------------------------------------------------------+ |class | Statistics on the behavior of the class loader | |compiler | Statistics on the behavior of the HotSpot Just-In-Time com- | | | piler | |gc | Statistics on the behavior of the garbage collected heap | |gccapacity | Statistics of the capacities of the generations and their | | | corresponding spaces. | |gccause | Summary of garbage collection statistics (same as -gcutil), | | | with the cause of the last and current (if applicable) | | | garbage collection events. | |gcnew | Statistics of the behavior of the new generation. | |gcnewcapacity | Statistics of the sizes of the new generations and its corre- | | | sponding spaces. | |gcold | Statistics of the behavior of the old and permanent genera- | | | tions. | |gcoldcapacity | Statistics of the sizes of the old generation. | |gcpermcapacity | Statistics of the sizes of the permanent generation. | |gcutil | Summary of garbage collection statistics. | |printcompilation | Summary of garbage collection statistics. | +-----------------+---------------------------------------------------------------+
This tool is extremely useful for getting a quick overview of JVM health to see whether the garbage collector behaves as expected. You can run it via “jstat -gc -t PID 1s”, for example, where PID is the process ID of the JVM you want to monitor. You can acquire PID via running “jps” to get the list of running Java processes. As a result, each second jstat will print a new line to the standard output similar to the following example:
Timestamp S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 200.0 8448.0 8448.0 8448.0 0.0 67712.0 67712.0 169344.0 169344.0 21248.0 20534.3 3072.0 2807.7 34 0.720 658 133.684 134.404 201.0 8448.0 8448.0 8448.0 0.0 67712.0 67712.0 169344.0 169343.2 21248.0 20534.3 3072.0 2807.7 34 0.720 662 134.612 135.332 202.0 8448.0 8448.0 8102.5 0.0 67712.0 67598.5 169344.0 169343.6 21248.0 20534.3 3072.0 2807.7 34 0.720 667 135.340 136.060 203.0 8448.0 8448.0 8126.3 0.0 67712.0 67702.2 169344.0 169343.6 21248.0 20547.2 3072.0 2807.7 34 0.720 669 136.178 136.898 204.0 8448.0 8448.0 8126.3 0.0 67712.0 67702.2 169344.0 169343.6 21248.0 20547.2 3072.0 2807.7 34 0.720 669 136.178 136.898 205.0 8448.0 8448.0 8134.6 0.0 67712.0 67712.0 169344.0 169343.5 21248.0 20547.2 3072.0 2807.7 34 0.720 671 136.234 136.954 206.0 8448.0 8448.0 8134.6 0.0 67712.0 67712.0 169344.0 169343.5 21248.0 20547.2 3072.0 2807.7 34 0.720 671 136.234 136.954 207.0 8448.0 8448.0 8154.8 0.0 67712.0 67712.0 169344.0 169343.5 21248.0 20547.2 3072.0 2807.7 34 0.720 673 136.289 137.009 208.0 8448.0 8448.0 8154.8 0.0 67712.0 67712.0 169344.0 169343.5 21248.0 20547.2 3072.0 2807.7 34 0.720 673 136.289 137.009
Let us interpret the output above using the explanation given to output attributes in the jstat manpage. Using the knowledge acquired, we can see that:
- jstat connected to the JVM 200 seconds from the time this JVM was started. This information is present in the first column labeled “Timestamp”. As seen from the very same column, the jstat harvests information from the JVM once every second as specified in the “1s” argument given in the command.
- From the first line we can see that up to this point the young generation has been cleaned 34 times and the whole heap has been cleaned 658 times, as indicated by the “YGC” and “FGC” columns, respectively.
- The young generation garbage collector has been running for a total of 0.720 seconds, as indicated in the “YGCT” column.
- The total duration of the full GC has been 133.684 seconds, as indicated in the “FGCT” column. This should immediately catch our eye – we can see that out of the total 200 seconds the JVM has been running, 66% of the time has been spent in Full GC cycles.
The problem becomes even clearer when we look at the next line, harvesting information a second later.
- Now we can see that there have been four more Full GCs running during the one second between the last time jstat printed out the data as indicated in the “FGC” column.
- These four GC pauses have taken almost the entire second – as seen in the difference in the “FGCT” column. Compared to the first row, the Full GC has been running for 928 milliseconds, or 92.8% of the total time.
- At the same time, as indicated by the “OC” and “OU” columns, we can see that from almost all of the old generation capacity of 169,344.0 KB (“OC“), after all the cleaning work that the four collection cycles tried to accomplish, 169,344.2 KB (“OU“) is still in use. Cleaning 800 bytes in 928 ms should not be considered a normal behavior.
Only these two rows from the jstat output give us insight that something is terribly wrong with the application. Applying the same analytics to the next rows, we can confirm that the problem persists and is getting even worse.
The JVM is almost stalled, with GC eating away more than 90% of the available computing power. And as a result of all this cleaning, almost all the old generation still remains in use, further confirming our doubts. As a matter of fact, the example died in under a minute later with a “java.lang.OutOfMemoryError: GC overhead limit exceeded” error, removing the last remaining doubts whether or not things are truly sour.
As seen from the example, jstat output can quickly reveal symptoms about JVM health in terms of misbehaving garbage collectors. As general guidelines, just looking at the jstat output will quickly reveal the following symptoms:
- Changes in the last column, “GCT”, when compared to the total runtime of the JVM in the “Timestamp” column, give information about the overhead of garbage collection. If you see that every second, the value in that column increases significantly in comparison to the total runtime, a high overhead is exposed. How much GC overhead is tolerable is application-specific and should be derived from the performance requirements you have at hand, but as a ground rule, anything more than 10% looks truly suspicious.
- Rapid changes in the “YGC” and “FGC” columns tracking the young and Full GC counts also tend to expose problems. Much too frequent GC pauses when piling up again affect the throughput via adding many stop-the-world pauses for application threads.
- When you see that the old generation usage in the “OU” column is almost equal to the maximum capacity of the old generation in the “OC” column without a decrease after an increase in the “FGC” column count has signaled that old generation collection has occurred, you have exposed yet another symptom of poorly performing GC.