Log4j delivers control over logging

The open source log4j API for Java offers fast, efficient log services

Almost every large application includes its own logging or tracing API. Experience indicates that logging represents an important component of the development cycle. As such, logging offers several advantages. First, it can provide precise context about a run of the application. Once inserted into the code, the generation of logging output requires no human intervention. Second, log output can be saved in a persistent medium to be studied at a later time. Finally, in addition to its use in the development cycle, a sufficiently rich logging package can also be employed as an audit tool.

In conformance with that rule, in early 1996 the EU SEMPER (Secure Electronic Marketplace for Europe) project decided to write its own tracing API. After countless enhancements, several incarnations, and much work, that API has evolved into log4j, a popular logging package for Java. The package is distributed under the IBM Public License, certified by the open source initiative.

Logging does have its drawbacks. It can slow down an application. If too verbose, it can cause scrolling blindness. To alleviate those concerns, log4j is designed to be fast and flexible. Since logging is rarely the main focus of an application, log4j API strives to be simple to understand and use.

This article starts by describing the main components of the log4j architecture. It proceeds with a simple example depicting basic usage and configuration. It wraps up by touching on performance issues and the upcoming logging API from Sun.

Categories, appenders, and layouts

Log4j has three main components:

Categories
Appenders
Layouts

The three components work together to enable developers to log messages according to message type and priority, and to control at runtime how these messages are formatted and where they are reported. Let’s look at each in turn.

Category hierarchy

The first and foremost advantage of any logging API over plain System.out.println resides in its ability to disable certain log statements while allowing others to print unhindered. That capability assumes that the logging space, that is, the space of all possible logging statements, is categorized according to some developer-chosen criteria.

In conformance with that observation, the org.log4j.Category class figures at the core of the package. Categories are named entities. In a naming scheme familiar to Java developers, a category is said to be a parent of another category if its name, followed by a dot, is a prefix of the child category name. For example, the category named com.foo is a parent of the category named com.foo.Bar. Similarly, java is a parent of java.util and an ancestor of java.util.Vector.

The root category, residing at the top of the category hierarchy, is exceptional in two ways:

It always exists
It cannot be retrieved by name

In the Category class, invoking the static getRoot() method retrieves the root category. The static getInstance() method instantiates all other categories. getInstance() takes the name of the desired category as a parameter. Some of the basic methods in the Category class are listed below:

package org.log4j;
public Category class {
   // Creation & retrieval methods:
   public static Category getRoot();
   public static Category getInstance(String name);
   // printing methods:
   public void debug(String message);
   public void info(String message);
   public void warn(String message);
   public void error(String message);
   // generic printing method:
   public void log(Priority p, String message);
}

Categories may be assigned priorities from the set defined by the org.log4j.Priority class. Although the priority set matches that of the Unix Syslog system, log4j encourages the use of only four priorities: ERROR, WARN, INFO and DEBUG, listed in decreasing order of priority. The rationale behind that seemingly restricted set is to promote a more flexible category hierarchy rather than a static (even if large) set of priorities. You may, however, define your own priorities by subclassing the Priority class. If a given category does not have an assigned priority, it inherits one from its closest ancestor with an assigned priority. As such, to ensure that all categories can eventually inherit a priority, the root category always has an assigned priority.

To make logging requests, invoke one of the printing methods of a category instance. Those printing methods are:

error()
warn()
info()
debug()
log()

By definition, the printing method determines the priority of a logging request. For example, if c is a category instance, then the statement c.info("..") is a logging request of priority INFO.

A logging request is said to be enabled if its priority is higher than or equal to the priority of its category. Otherwise, the request is said to be disabled.. A category without an assigned priority will inherit one from the hierarchy.

Below, you’ll find an example of that rule:

// get a category instance named "com.foo"
Category  cat = Category.getInstance("com.foo");
// Now set its priority.
cat.setPriority(Priority.INFO);
Category barcat = Category.getInstance("com.foo.Bar");
// This request is enabled, because WARN >= INFO.
cat.warn("Low fuel level.");
// This request is disabled, because DEBUG < INFO.
cat.debug("Starting search for nearest gas station.");
// The category instance barcat, named "com.foo.Bar",
// will inherit its priority from the category named
// "com.foo" Thus, the following request is enabled
// because INFO >= INFO.
barcat.info("Located nearest gas station.");
// This request is disabled, because DEBUG < INFO.
barcat.debug("Exiting gas station search");

Calling the getInstance() method with the same name will always return a reference to the exact same category object. Thus, it is possible to configure a category and then retrieve the same instance somewhere else in the code without passing around references. Categories can be created and configured in any order. In particular, a parent category will find and link to its children even if it is instantiated after them. The log4j environment typically configures at application initialization, preferably by reading a configuration file, an approach we’ll discuss shortly.

Log4j makes it easy to name categories by software component. That can be accomplished by statically instantiating a category in each class, with the category name equal to the fully qualified name of the class — a useful and straightforward method of defining categories. As the log output bears the name of the generating category, such a naming strategy facilitiates identifying a log message’s origin. However, that is only one possible, albeit common, strategy for naming categories. Log4j does not restrict the possible set of categories. Indeed, the developer is free to name the categories as desired.

Appenders and layouts

The ability to selectively enable or disable logging requests based on their category is only part of the picture. Log4j also allows logging requests to print to multiple output destinations called appenders in log4j speak. Currently, appenders exist for the console, files, GUI components, remote socket servers, NT Event Loggers, and remote UNIX Syslog daemons.

A category may refer to multiple appenders. Each enabled logging request for a given category will be forwarded to all the appenders in that category as well as the appenders higher in the hierarchy. In other words, appenders are inherited additively from the category hierarchy. For example, if you add a console appender to the root category, all enabled logging requests will at least print on the console. If, in addition, a file appender is added to a category, say C, then enabled logging requests for C and C’s children will print on a file and on the console. Be aware that you can override that default behavior so that appender accumulation is no longer additive.

More often than not, users want to customize not only the output destination but also the output format, a feat accomplished by associating a layout with an appender. The layout formats the logging request according to the user’s wishes, whereas an appender takes care of sending the formatted output to its destination. The PatternLayout, part of the standard log4j distribution, lets the user specify the output format according to conversion patterns similar to the C language printf function.

For example, the PatternLayout with the conversion pattern %r [%t]%-5p %c - %m%n will output something akin to:

176 [main] INFO  org.foo.Bar - Located nearest gas station.

In the output above:

The first field equals the number of milliseconds elapsed since the start of the program
The second field indicates the thread making the log request
The third field represents the priority of the log statement
The fourth field equals the name of the category associated with the log request

The text after the - indicates the statement’s message.

Configuration

Inserting log requests into the application code requires a fair amount of planning and effort. Observation shows that code dedicated to logging represents approximately four percent of the application’s total. Consequently, even moderately sized applications will have thousands of logging statements embedded within their code. Given their number, it becomes imperative to manage those log statements without the need to modify them manually.

The log4j environment can be fully configured programmatically. However, it is far more flexible to configure log4j by using configuration files. Currently, configuration files can be written in XML or in Java properties (key=value) format.

Let us give a taste of how that is done with the help of an imaginary application — MyApp — that uses log4j:

 import com.foo.Bar;
 // Import log4j classes.
 import org.log4j.Category;
 import org.log4j.BasicConfigurator;
 public class MyApp {
   // Define a static category variable so that it references the
   // Category instance named "MyApp".
   static Category cat = Category.getInstance(MyApp.class.getName());
   public static void main(String[] args) {
     // Set up a simple configuration that logs on the console.
     BasicConfigurator.configure();
     cat.info("Entering application.");
     Bar bar = new Bar();
     bar.doIt();
     cat.info("Exiting application.");
   }
 }

As seen in the code above, MyApp begins by importing log4j related classes. It then defines a static category variable with the name MyApp, which happens to be the class’ fully qualified name.

MyApp uses the Bar class defined in the package com.foo:

 package com.foo;
 import org.log4j.Category;
 public class Bar {
   static Category cat = Category.getInstance(Bar.class.getName());
   public void doIt() {
     cat.debug("Did it again!");
   }
 }

In MyApp, the invocation of the BasicConfigurator.configure() method creates a rather simple log4j setup. That method is hardwired to add to the root category a FileAppender printing on the console. The output will be formatted by using a PatternLayout set to the pattern %-4r [%t] %-5p %c %x - %m%n.

Note that by default, the root category is assigned to Priority.DEBUG.

The output of MyApp is:

0    [main] INFO  MyApp  - Entering application.
36   [main] DEBUG com.foo.Bar  - Did it again!
51   [main] INFO  MyApp  - Exiting application.

Figure 1 depicts MyApp‘s object diagram immediately after it calls the BasicConfigurator.configure() method.

Figure 1. MyApp’s object diagram after calling the BasicConfigurator.configure() method

The MyApp class configures log4j by invoking BasicConfigurator.configure() method. Other classes need only import the org.log4j.Category class, retrieve the categories they want to use and log away.

The previous example always outputs the same log information. Fortunately, it is easy to modify MyApp so that the log output can be controlled at runtime. Below, you’ll see a slightly modified version:

 import com.foo.Bar;
 import org.log4j.Category;
 import org.log4j.PropertyConfigurator;
 public class MyApp {
   static Category cat = Category.getInstance(MyApp.class.getName());
   public static void main(String[] args) {
     // BasicConfigurator replaced with PropertyConfigurator.
     PropertyConfigurator.configure(args[0]);
     cat.info("Entering application.");
     Bar bar = new Bar();
     bar.doIt();
     cat.info("Exiting application.");
   }
 }

This version of MyApp instructs PropertyConfigurator to parse a configuration file and set up logging accordingly.

Let’s look at a sample configuration file that results in exactly the same output as the previous BasicConfigurator-based example:

# Set root category priority to DEBUG and its only appender to A1.
log4j.rootCategory=DEBUG, A1
# A1 is set to be a FileAppender which outputs to System.out.
log4j.appender.A1=org.log4j.FileAppender
log4j.appender.A1.File=System.out
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Suppose we no longer wish to see the output of any component belonging to the com.foo package. The following configuration file shows one possible way of achieving that:

log4j.rootCategory=DEBUG, A1
log4j.appender.A1=org.log4j.FileAppender
log4j.appender.A1.File=System.out
log4j.appender.A1.layout=org.log4j.PatternLayout
# Print the date in ISO 8601 format
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
# Print only messages of priority WARN or above in the package com.foo.
log4j.category.com.foo=WARN

The output of MyApp configured with this file is shown below:

2000-09-07 14:07:41,508 [main] INFO  MyApp - Entering application.
2000-09-07 14:07:41,529 [main] INFO  MyApp - Exiting application.

As the category com.foo.Bar does not have an assigned priority, it inherits its priority from com.foo, which was set to WARN in the configuration file. The log statement from the Bar.doIt() method has the priority DEBUG, lower than the category priority WARN. Consequently, doIt()‘s log request is suppressed.

Next, we see another configuration file that uses multiple appenders:

log4j.rootCategory=debug, stdout, R
log4j.appender.stdout=org.log4j.FileAppender
log4j.appender.stdout.File=System.out
log4j.appender.stdout.layout=org.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

Calling the enhanced MyApp with the that configuration file will output the following on the console:

 INFO [main] (MyApp2.java:12) - Entering application.
DEBUG [main] (Bar.java:8) - Doing it again!
 INFO [main] (MyApp2.java:15) - Exiting application.

In addition, as the root category has been allocated a second appender, output will also be directed to the example.log file. That file will be rolled over when it reaches 100 KB. When rollover occurs, the old version of example.log automatically moves to example.log.1.

Note that to obtain those different logging behaviors, we did not need to recompile code. We could just as easily have logged to a Unix Syslog daemon and redirected all com.foo output to an NT Event logger. Similarly, we could have forwarded logging events to a remote log4j server, which would log according to local server policy — for example, by logging to a local file as well as forwarding the log event to a second log4j server.

Nested diagnostic contexts

Most real-world systems must deal with multiple clients simultaneously. In a typical multithreaded implementation of such a system, different threads will handle different clients. Considering that, logging is especially well suited to tracing and debugging complex distributed applications. A common approach to differentiate the logging output of one client from another is to instantiate a new separate category for each client. However, that approach promotes category proliferation and increases the management overhead of logging.

In a lighter technique, you can uniquely stamp each log request initiated from the same client interaction, an approach described by Neil Harrison (see Resources). To uniquely stamp each request, the user pushes contextual information into the Nested Diagnostic Context (NDC). The NDC class is shown below:

  public class NDC {
    // Used when printing the diagnostic
    public static String get();
    // Remove the top of the context from the NDC.
    public static String pop();
    // Add diagnostic context for the current thread.
    public static void push(String message);
    // Remove the diagnostic context for this thread.
    public static void remove();
  }

The NDC is managed per thread as a stack of contextual information. Note that all of org.log4j.NDC‘s methods are static. Assuming that NDC printing is turned on, every time a log request is made, the appropriate log4j component will include the entire NDC stack for the current thread in the log output. That is done without the intervention of the user, who is responsible only for placing the correct information in the NDC by using the push() and pop() methods at a few well-defined points in the code. In contrast, the per-client category approach commands extensive changes in the code.

To illustrate that point, let us take the example of a servlet delivering content to numerous clients. The servlet can build the NDC at the very beginning of the request before executing other code. The contextual information can be the client’s host name and other information inherent to the request, typically information contained in cookies. Hence, even if the servlet is serving multiple clients simultaneously, the logs initiated by the same code (that is, belonging to the same category) can still be distinguished because each client request will have a different NDC stack. Contrast that with the complexity of passing a freshly instantiated category to all code exercised during the client’s request.

Nevertheless, some sophisticated applications, such as virtual hosting Web servers, must log differently, depending on the virtual host context and the software component issuing the request. The most recent log4j release supports multiple hierarchy trees, an enhancement that allows each virtual host to possess its own copy of the category hierarchy.

Performance

One of the often-cited arguments against logging stems from its computational cost. That is a legitimate concern, as even moderately sized applications can generate thousands of log requests. Much effort was spent measuring and tweaking logging performance. Log4j claims to be fast and flexible: speed first, flexibility second.

Nevertheless, the user should be aware of the following performance issues:

Logging performance when logging is turned off.

When logging is turned off entirely or for a set of priorities, the cost of a log request is a method invocation plus an integer comparison. The method invocation involves the hidden cost of parameter construction.

For some category cat, writing
```
   cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
```
incurs the cost of constructing the message parameter that is converting both integer i and entry[i] to a String and concatenating intermediate strings, regardless of whether the message will be logged or not.

If one is worried about speed, then write:
```
   if(cat.isDebugEnabled() {
      cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
   }
```
That will not incur the cost of parameter construction if debugging is disabled. On the other hand, if the category is debug enabled, it will twice incur the cost of evaluating whether the category is enabled or not: once in debugEnabled() and once in debug(). That represents an insignificant overhead because evaluating a category takes about 1 percent of the time it takes to actually log.

In log4j, logging requests are made to instances of the Category class. Because Category is a class and not an interface, the cost of method invocation is arguably lower but at the cost of some flexibility.

On a 233 MHz Pentium II machine, the typical cost of logging when logging is turned off is in the range of 5 to 50 nanoseconds.
The performance of deciding whether to log or not to log when logging is turned on.

Under that scenario, the question centers on the performance cost of walking the category hierarchy. When logging is turned on, log4j still needs to compare the priority of the log request with the priority of the request category. However, categories may not have an assigned priority; they can inherit them from the category hierarchy. Thus, to discover its priority, the category may need to search its ancestors.

A serious effort has been made to make that hierarchy walk as fast as possible. For example, child categories link only to their existing ancestors. In the BasicConfigurator example shown earlier, the category named com.foo.Bar links directly to the root category, thereby circumventing the nonexistent com or com.foo categories, significantly improving the speed of the walk, especially in sparse hierarchies.

The typical cost of walking the hierarchy is in the range of 5 to 15 microseconds, again on a 233 MHz Pentium II machine.
Actual logging.

Another performance issue stems from the cost of formatting the log output and sending it to its target destination. Here again, a serious effort was made to make layouts (formatters) perform as quickly as possible. The same is true for appenders. The typical cost of actually logging ranges from about 100 to 300 microseconds.

Although log4j has a many features, its first design goal was speed. Some log4j components have been rewritten many times to improve performance. Nevertheless, contributors frequently come up with new optimizations.

Sun’s upcoming logging API

Sun has initiated JSR47 in the Java Community Process (JCP) to define a logging API for Java. The resulting API will require JDK version 1.4 and should soon become ready for public review.

The JSR47 API and log4j are quite similar at the architectural level. The JSR47 API will support a hierarchical namespace, one of the central features of log4j. On the other hand, log4j possesses many useful features missing in JSR47. Given the momentum behind log4j, in my partisan opinion, JSR47 is likely to be obsolete by the time it launches. Log4j is written by the people who use it, not by a closed committee.

By the way, log4j already provides support for bridging into the JSR47 API when the latter becomes available.

Let me also mention JLog, another logging API available from IBM’s alphaWorks. JLog was previously named the RAS Toolkit until IBM renamed it. On the alphaWorks site JLog has the label “Logging Toolkit for Java,” almost the same label that log4j had on alphaWorks before it migrated to JLog, while a good logging package, should not be confused with log4j.

Conclusion

Log4j is a popular logging package written in Java. One of its distinctive features includes the notion of inheritance in categories. Using a category hierarchy, it is possible to control which log statements are output at arbitrary granularity, thus reducing the volume of logged output and minimizing the cost of logging.

One of the advantages of the log4j API is its manageability. Once the log statements have been inserted into the code, they can be controlled with configuration files. Moreover, they can be selectively enabled or disabled, and sent to different and multiple output targets in user-chosen formats. Further, the log4j package is designed so that log statements can remain in shipped code without incurring a heavy performance cost.

Log4j is the result of a collective effort. My special thanks go to all the authors who have contributed to the project. Without exception, the best features in the package have all originated in the user community.

Source: www.infoworld.com