Java 101: Java’s character and assorted string classes support text-processing
Explore Character, String, StringBuffer, and StringTokenizer
Text can represent a combination of digits, letters, punctuation, words, sentences, and more. Computer programs that process text need assistance (from their associated languages) to represent and manipulate text. Java provides such assistance through the Character, String, StringBuffer, and StringTokenizer classes. In this article, you’ll create objects from these classes and examine their various methods. You’ll also receive answers to three mysteries: why Java regards a string literal as a String object, why String objects are immutable (and how immutability relates to string internment), and what happens behind the scenes when the string concatenation operator concatenates two strings into a single string.
Note: Future articles will cover the Character, String, StringBuffer, and StringTokenizer methods that I omit in this discussion.
The Character class
Though Java already has a character type and a char keyword to represent and manipulate characters, the language also requires a Character class for two reasons:
- Many data structure classes require their data structure objects to store other objects, not primitive type variables. Because directly storing a char variable in these objects proves impossible, that variable’s value must be wrapped inside a Character object, which is subsequently stored in a data structure object.
- Java needs a class to store various character-oriented utility methods: static methods that perform useful tasks and do not require Character objects; for example, a method that converts an arbitrary character argument representing a lowercase letter to another character representing the uppercase equivalent.
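Both reasons can be seen in a short modern-Java sketch. It assumes Java 7 or later (for generics and the <> operator) and wraps each char with the static factory method Character.valueOf(char), which exists since Java 5. The list stores only objects, so every char is wrapped; the last step unwraps one and hands it to a static utility method. The names WrapDemo and wrapAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WrapDemo
{
   // Hypothetical helper: collect a string's characters in a List.
   // A List stores objects only, so each char is wrapped in a Character.
   static List<Character> wrapAll (String s)
   {
      List<Character> list = new ArrayList<> ();
      for (int i = 0; i < s.length (); i++)
         list.add (Character.valueOf (s.charAt (i))); // wrap explicitly
      return list;
   }

   public static void main (String [] args)
   {
      List<Character> chars = wrapAll ("java");
      System.out.println (chars); // Output: [j, a, v, a]

      // Unwrap the first element, then call a static utility method
      char first = chars.get (0).charValue ();
      System.out.println (Character.toUpperCase (first)); // Output: J
   }
}
```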
Character objects
The java.lang.Character class declares a private value field of character type. A character is stored in value when code creates a Character object via class Character’s public Character(char c) constructor, as the following code fragment demonstrates:
Character c = new Character ('A');
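The constructor stores the character that 'A' represents in the value field of a new Character object that c references. Because the Character object wraps itself around the character, Character is a wrapper class. (Java 9 deprecated this constructor; the static factory method Character.valueOf(char) is now the recommended way to obtain a Character.)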
By calling Character’s public char charValue() method, code extricates the character from the Character object. Furthermore, by calling Character’s public String toString() method, code returns the character as a String object. The following code, which builds on the previous fragment, demonstrates both method calls:
System.out.println (c.charValue ());
String s = c.toString ();
In System.out.println (c.charValue ());, c.charValue () returns value’s contents, and System.out.println() outputs those contents (A) to the standard output device. String s = c.toString (); creates a String object containing value’s contents, returns the String’s reference, and assigns that reference to String variable s.
Character supplies three methods that compare Character objects for ordering or other purposes. The public int compareTo(Character anotherCharacter) method compares the contents of two Characters by subtracting anotherCharacter’s value field from the current Character’s value field, and returns the integer result. If the result is zero, both objects are the same (based on the value field only). If the result is negative, the current Character’s value is numerically less than the anotherCharacter-referenced Character’s value. Finally, a positive result implies that the current Character’s value field is numerically greater than anotherCharacter’s value field.
A second overloaded public int compareTo(Object o) method works the same as compareTo(Character anotherCharacter) (and returns the same result), but compares the current Character and the o-referenced object (which must be of type Character, or the method throws a ClassCastException object). compareTo(Object o) allows Java’s Collections Framework to sort Characters according to natural order. (A future article will discuss that method, sorting, and natural order.) In Java 5 and later, Character implements Comparable<Character>, so the API documentation lists only compareTo(Character anotherCharacter); a compiler-generated bridge method supplies the Object version.
Finally, the public final boolean equals(Object o) method compares the contents of the value field in the current Character with the contents of the value field in o. A Boolean true value returns if o is of type Character and if both value fields contain the same contents. Otherwise, false returns. To see the compareTo(Character anotherCharacter) and equals(Object o) methods in action, examine the following code fragment:
Character c1 = new Character ('A');
Character c2 = new Character ('B');
Character c3 = new Character ('A');
System.out.println ("c1.compareTo (c2): " + c1.compareTo (c2));
System.out.println ("c1.equals (c2): " + c1.equals (c2));
System.out.println ("c1.equals (c3): " + c1.equals (c3));
The first System.out.println() call outputs c1.compareTo (c2): -1 because A is (numerically) less than B. The second call outputs c1.equals (c2): false because the Characters that c1 and c2 reference contain different characters (A and B). Finally, the third call outputs c1.equals (c3): true because, although c1 and c3 reference different Characters, both objects contain the same character (A).
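Natural-order sorting, which relies on compareTo, can be sketched with java.util.Collections.sort(). This is a minimal example assuming Java 7 or later; SortChars and sorted are hypothetical names. Note that compareTo’s documentation promises only a negative, zero, or positive result, so portable code should test the sign rather than expect exactly -1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortChars
{
   // Hypothetical helper: wrap chars in Characters and sort them into
   // natural (numeric Unicode) order; Collections.sort calls compareTo
   static List<Character> sorted (char [] chars)
   {
      List<Character> list = new ArrayList<> ();
      for (char ch : chars)
         list.add (Character.valueOf (ch));
      Collections.sort (list);
      return list;
   }

   public static void main (String [] args)
   {
      // Uppercase letters have lower Unicode values than lowercase ones
      System.out.println (sorted (new char [] { 'c', 'A', 'b', 'B' })); // Output: [A, B, b, c]

      // Test only the sign of compareTo's result
      System.out.println (Character.valueOf ('A').compareTo ('B') < 0); // Output: true
   }
}
```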
Character-oriented utility methods
Character serves as a repository for character-oriented utility methods. Examples of those methods include:
- public static boolean isDigit(char c), which returns a Boolean true value if c’s character is a digit. Otherwise, false returns.
- public static boolean isLetter(char c), which returns a Boolean true value if c’s character is a letter. Otherwise, false returns.
- public static boolean isUpperCase(char c), which returns a Boolean true value if c’s character is an uppercase letter. Otherwise, false returns.
- public static char toLowerCase(char c), which returns the lowercase equivalent of c’s character if it is uppercase. Otherwise, c’s character returns.
- public static char toUpperCase(char c), which returns the uppercase equivalent of c’s character if it is lowercase. Otherwise, c’s character returns.
The following code fragment demonstrates those five methods:
System.out.println (Character.isDigit ('4')); // Output: true
System.out.println (Character.isLetter (';')); // Output: false
System.out.println (Character.isUpperCase ('X')); // Output: true
System.out.println (Character.toLowerCase ('B')); // Output: b
System.out.println (Character.toUpperCase ('a')); // Output: A
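Combining these utility methods, a case-toggling routine might look like the following sketch. ToggleCase and toggle are hypothetical names, and the code borrows String’s toCharArray() method and String(char []) constructor, both covered later in this article.

```java
class ToggleCase
{
   // Hypothetical helper: swap the case of each letter; digits,
   // punctuation, and white space pass through unchanged
   static String toggle (String s)
   {
      char [] chars = s.toCharArray ();
      for (int i = 0; i < chars.length; i++)
      {
         char c = chars [i];
         if (Character.isUpperCase (c))
            chars [i] = Character.toLowerCase (c);
         else if (Character.isLetter (c))
            chars [i] = Character.toUpperCase (c); // unchanged if no uppercase form exists
      }
      return new String (chars);
   }

   public static void main (String [] args)
   {
      System.out.println (toggle ("Java 101!")); // Output: jAVA 101!
   }
}
```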
Another useful utility method is Character’s public static char forDigit(int digit, int radix), which converts digit’s integer value to its character equivalent in the number system that radix specifies and returns the result. However, if digit identifies an integer less than zero or greater than or equal to radix’s value, forDigit(int digit, int radix) returns the null character (represented in source code as the Unicode escape sequence '\u0000'). Similarly, if radix identifies an integer less than Character’s MIN_RADIX constant or greater than Character’s MAX_RADIX constant, forDigit(int digit, int radix) returns the null character. The following code demonstrates that method:
for (int i = 0; i < 16; i++)
System.out.println (Character.forDigit (i, 16));
That fragment converts integer numbers 0 through 15 to their character equivalents in the hexadecimal number system and outputs those character equivalents (0 through f).
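forDigit() handles only one digit at a time, so converting a whole number takes a loop. The following sketch repeatedly takes the remainder modulo the radix and maps each remainder with forDigit(), then reverses the collected digits. ToRadix and toRadix are hypothetical names; the StringBuffer class it uses is covered later in this article, and the code deliberately rejects negative numbers to stay short.

```java
class ToRadix
{
   // Hypothetical helper: convert a non-negative int to its digit string
   // in the given radix, one Character.forDigit call per digit
   static String toRadix (int n, int radix)
   {
      if (n < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
         throw new IllegalArgumentException ("n must be >= 0 and radix in range");
      if (n == 0)
         return "0";
      StringBuffer sb = new StringBuffer ();
      while (n > 0)
      {
         sb.append (Character.forDigit (n % radix, radix)); // least significant digit first
         n /= radix;
      }
      return sb.reverse ().toString (); // most significant digit first
   }

   public static void main (String [] args)
   {
      System.out.println (toRadix (255, 16)); // Output: ff
      System.out.println (toRadix (10, 2));   // Output: 1010
   }
}
```

The result matches Integer.toString(int i, int radix) for non-negative values; the loop simply makes the digit-by-digit work visible.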
To complement the forDigit(int digit, int radix) method, Character provides the public static int digit(char c, int radix) method, which converts the c-specified character value in the radix-specified number system to the value’s integer equivalent and returns the result. If c contains a nondigit character for the specified number system, or if radix is not in the MIN_RADIX/MAX_RADIX range, digit(char c, int radix) returns -1. The following code demonstrates that method:
char [] digits = { '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'x' };
for (int i = 0; i < digits.length; i++)
System.out.println (Character.digit (digits [i], 16));
The fragment above converts the digits array’s digit characters to their integer equivalents and outputs the results. Apart from the last character, each character represents a hexadecimal digit. (Passing 16 as the radix argument informs digit(char c, int radix) that the number system is hexadecimal.) Because x does not represent a hexadecimal digit, digit(char c, int radix) returns -1 for that character, and the fragment outputs -1.
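Running digit() over every character of a string and accumulating the results yields a small parser. This is a sketch only: FromRadix and parse are hypothetical names, it signals an invalid character by returning -1 (mirroring digit()), and it performs no overflow check.

```java
class FromRadix
{
   // Hypothetical helper: parse a digit string in the given radix using
   // Character.digit; returns -1 if any character is not a valid digit
   // (or the radix is out of range). No overflow check is performed.
   static int parse (String s, int radix)
   {
      int value = 0;
      for (int i = 0; i < s.length (); i++)
      {
         int d = Character.digit (s.charAt (i), radix);
         if (d == -1)
            return -1;
         value = value * radix + d; // shift previous digits left one place
      }
      return value;
   }

   public static void main (String [] args)
   {
      System.out.println (parse ("7fff", 16)); // Output: 32767
      System.out.println (parse ("7FFF", 16)); // Output: 32767 (uppercase digits are accepted)
      System.out.println (parse ("12x", 16));  // Output: -1
   }
}
```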
To demonstrate Character’s isDigit(char c) and isLetter(char c) methods, I’ve created a CA (character analysis) application that counts a text file’s digits, letters, and other characters. In addition to printing those counts, CA calculates and prints each count’s percentage of the total count. Listing 1 presents CA’s source code (don’t worry about the file-reading logic: I’ll explain FileInputStream and other file-related concepts in a future article):
Listing 1: CA.java
// CA.java
// Character Analysis
import java.io.*;
class CA
{
public static void main (String [] args)
{
int ch, ndigits = 0, nletters = 0, nother = 0;
if (args.length != 1)
{
System.err.println ("usage: java CA filename");
return;
}
FileInputStream fis = null;
try
{
fis = new FileInputStream (args [0]);
while ((ch = fis.read ()) != -1)
if (Character.isLetter ((char) ch))
nletters++;
else
if (Character.isDigit ((char) ch))
ndigits++;
else
nother++;
System.out.println ("num letters = " + nletters);
System.out.println ("num digits = " + ndigits);
System.out.println ("num other = " + nother);
System.out.println (); // blank line before the percentages
int total = nletters + ndigits + nother;
System.out.println ("% letters = " +
(double) (100.0 * nletters / total));
System.out.println ("% digits = " +
(double) (100.0 * ndigits / total));
System.out.println ("% other = " +
(double) (100.0 * nother / total));
}
catch (IOException e)
{
System.err.println (e);
}
finally
{
try
{
if (fis != null) // fis stays null if the FileInputStream constructor throws
fis.close ();
}
catch (IOException e)
{
}
}
}
}
If you want to perform a character analysis on CA’s source file, CA.java, execute java CA CA.java. You see output similar to the following (the exact counts depend on the file’s contents):
num letters = 609
num digits = 18
num other = 905
% letters = 39.75195822454308
% digits = 1.174934725848564
% other = 59.07310704960835
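CA’s classification logic is easier to experiment with when it works on a String instead of a file. The following sketch applies the same isLetter/isDigit tests to each character of a string; CountKinds and count are hypothetical names. One difference worth knowing: CA casts each byte read from the file to char, which matches the file’s characters only for single-byte encodings such as ASCII, whereas this version examines real chars.

```java
class CountKinds
{
   // Hypothetical helper: classify characters as CA does.
   // Index 0 = letters, 1 = digits, 2 = other characters.
   static int [] count (String text)
   {
      int [] counts = new int [3];
      for (int i = 0; i < text.length (); i++)
      {
         char ch = text.charAt (i);
         if (Character.isLetter (ch))
            counts [0]++;
         else if (Character.isDigit (ch))
            counts [1]++;
         else
            counts [2]++;
      }
      return counts;
   }

   public static void main (String [] args)
   {
      int [] c = count ("CA 1.0!");
      int total = c [0] + c [1] + c [2];
      System.out.println ("num letters = " + c [0]); // Output: num letters = 2
      System.out.println ("num digits = " + c [1]);  // Output: num digits = 2
      System.out.println ("num other = " + c [2]);   // Output: num other = 3
      System.out.println ("% letters = " + 100.0 * c [0] / total);
   }
}
```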
The String class
The String class contrasts with Character in that a String object stores a sequence of characters (a string), whereas a Character object stores one character. Because strings are pervasive in text-processing and other programs, Java offers two features that simplify developer interaction with String objects: simplified assignment and an operator that concatenates strings. This section examines those features.
String objects
A java.lang.String object stores a character sequence in a character array that String’s private value field variable references. Furthermore, String’s private count integer field variable maintains the number of characters in that array. (These private fields describe the JDK of this article’s era; later JDKs changed String’s internals, and since Java 9, for example, value is a byte array.) Each String has its own copy of those fields, and Java’s simplified assignment shortcut offers the easiest way to create a String and store a string in the String’s value array, as the following code demonstrates:
public static void main (String [] args)
{
String s = "abc";
System.out.println (s); // Output: abc
}
When the compiler compiles the preceding fragment, it stores the abc string literal in a special area of the class file: the constant pool, which is a collection of string literals, integer literals, and other constants. The compiler also generates a byte code instruction (ldc, load constant) that pushes a reference to a String object containing abc onto the calling thread’s stack, and generates another instruction (astore_1) that pops that reference from the stack and stores it in the s local variable, which corresponds to local variable 1 at the JVM level.
What creates that String object and when? Neither the Java Language Specification nor the Java Virtual Machine Specification offers answers that I can find. Instead, I speculate the following: When a classloader (a concept I’ll discuss in a future article) loads a class file, it scans its constant pool’s memory copy. For each string literal in that pool, the classloader creates a String, populates that object with the string literal’s characters, and modifies the string literal’s entry in the constant pool’s memory copy so ldc pushes the String’s reference onto the calling thread’s stack.
Because the compiler and classloader treat string literals as String objects, "abc".length() and synchronized ("sync object") are legal. "abc".length() returns the length of the String containing abc, and synchronized ("sync object") grabs the lock associated with the String containing sync object. Java regards these and other string literals as String objects as a convenience for developers. As with the simplified assignment shortcut, substituting string literals for String object reference variables reduces the amount of code you must write.
Java also offers a variety of String constructors for creating String objects. I detail three below:
- public String(char [] value) creates a new String object that contains a copy of all characters found in the value array parameter. If value is null, this constructor throws a NullPointerException object.
- public String(char [] value, int offset, int count) creates a new String that contains a portion of those characters found in value. Copying begins at the offset array index and continues for count characters. If value is null, this constructor throws a NullPointerException object. If either offset or count contains a value that leads to an invalid array index, this constructor throws an IndexOutOfBoundsException object.
- public String(String original) creates a new String that contains the same characters as the original-referenced String.
The following code demonstrates the first two constructors:
char [] trueFalse = { 't', 'f', 'T', 'F' };
String s1 = new String (trueFalse);
String s2 = new String (trueFalse, 2, 2);
After this fragment executes, s1 references a String containing tfTF, and s2 references a String containing TF.
The following code demonstrates the third constructor:
String s3 = new String ("123");
That fragment passes a reference to a string literal-based String containing 123 to the String(String original) constructor. That constructor copies original’s contents to the new s3-referenced String.
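The word "copy" in these constructor descriptions matters: a String does not share the caller’s array. The following sketch (CtorDemo and build are hypothetical names) builds Strings with all three constructors, then changes the source array to show that neither String notices.

```java
class CtorDemo
{
   // Hypothetical helper: build Strings with the three constructors,
   // then modify the source array to show each String keeps its own copy
   static String [] build ()
   {
      char [] trueFalse = { 't', 'f', 'T', 'F' };
      String s1 = new String (trueFalse);       // copies all four characters
      String s2 = new String (trueFalse, 2, 2); // copies 2 characters starting at index 2
      trueFalse [2] = 'X';                      // later change to the array
      String s3 = new String (s1);              // copies another String's characters
      return new String [] { s1, s2, s3 };
   }

   public static void main (String [] args)
   {
      String [] s = build ();
      System.out.println (s [0]);                // Output: tfTF
      System.out.println (s [1]);                // Output: TF
      System.out.println (s [2] == s [0]);       // Output: false (distinct objects)
      System.out.println (s [2].equals (s [0])); // Output: true (same characters)
   }
}
```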
No matter how many times the same string literal appears in source code, the compiler ensures that only one copy is stored in the class file’s constant pool. Furthermore, the compiler ensures that only nonduplicates of all string constant expressions (such as "a" + 3) end up in the constant pool as string literals. When a classloader creates Strings from all string literal entries in the constant pool, each String’s contents are unique. The classloader interns, or confines, those Strings in a common string memory pool located in JVM-managed memory.
At runtime, newly created Strings are not interned in the common string memory pool, for performance reasons: verifying a String’s nonexistence in that pool eats up time, especially if many Strings already exist. However, thanks to a String method, code can intern newly created String objects into the pool, which I’ll show you how to accomplish later in this article.
For proof that string literal-based Strings are stored in the common string memory pool and new Strings are not, consider the following code fragment:
String a = "123";
String b = "123";
String c = new String (a);
System.out.println ("a == b: " + (a == b));
System.out.println ("a == c: " + (a == c));
System.out.println ("a == b: " + (a == b)); outputs a == b: true because the compiler stores one copy of 123 in the class file’s constant pool. At runtime, a and b receive the same reference to the 123 String object that exists in the common string memory pool. In contrast, System.out.println ("a == c: " + (a == c)); outputs a == c: false because the a-referenced (and b-referenced) String is stored in the common string memory pool and the c-referenced String is not. If c existed in the common string memory pool, c, b, and a would all reference the same String (since that pool contains no duplicates). Hence, System.out.println ("a == c: " + (a == c)); would output a == c: true.
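The pool behavior described above can be probed with ==, as in this sketch (InternDemo and checks are hypothetical names). It relies on two guarantees from the Java Language Specification: constant expressions such as "12" + "3" are computed at compile time and interned like literals, whereas strings concatenated at run time are new, distinct objects. It also previews the intern() method covered later in this article.

```java
class InternDemo
{
   // Hypothetical helper: return the results of four reference comparisons
   static boolean [] checks ()
   {
      String a = "123";
      String c = new String (a);
      String twelve = "12"; // not final, so twelve + "3" is not a constant expression
      return new boolean []
      {
         a == c,             // false: c is a new String outside the pool
         a == c.intern (),   // true: intern() returns the pooled String
         a == "12" + "3",    // true: constant expression, folded at compile time
         a == twelve + "3"   // false: concatenated at run time
      };
   }

   public static void main (String [] args)
   {
      for (boolean b : checks ())
         System.out.println (b); // Output: false, true, true, false (one per line)
   }
}
```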
Interning Strings in the common string memory pool poses a problem. Because that pool does not permit duplicates, what happens if code interned a String and then changed that object’s contents? The pool might then contain two Strings with the same contents, which defeats the memory savings that internment in the pool provides. For that reason, Java does not let code modify a String. Thus, Strings are immutable, or unchangeable.
Note: Some String methods (such as public String toUpperCase(Locale l)) appear to modify a String but, in actuality, don’t. Instead, such methods create a new String containing a modified string.
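A two-line experiment makes the immutability point concrete: after toUpperCase() runs, the original String still holds its old characters, and the uppercase text lives in a different object. ImmutableDemo and upper are hypothetical names; Locale.ROOT is passed so the result does not depend on the default locale.

```java
import java.util.Locale;

class ImmutableDemo
{
   // Hypothetical helper: toUpperCase() builds and returns a new String
   static String upper (String s)
   {
      return s.toUpperCase (Locale.ROOT);
   }

   public static void main (String [] args)
   {
      String s = "abc";
      String t = upper (s);
      System.out.println (s);      // Output: abc (s is untouched)
      System.out.println (t);      // Output: ABC
      System.out.println (s == t); // Output: false (two different objects)
   }
}
```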
String method sampler
String contains more than 50 methods (not including constructors), but we’ll examine only 13:
Note: Many String methods require an index (also known as an offset) argument for accessing a character in the String object’s value array (or a character array argument). That index/offset is always zero-based: index/offset 0 refers to the array’s first character.
- public char charAt(int index) extracts the character at the index position in the current String object’s value array and returns that character. This method throws an IndexOutOfBoundsException object if index is negative or equals/exceeds the string’s length. Example: String s = "Hello"; System.out.println (s.charAt (0)); (output: H).
- public int compareToIgnoreCase(String anotherString) performs a lexicographic (dictionary order), case-insensitive comparison between characters in the current String’s value array and the value array of the anotherString-referenced String. A zero return value indicates that both arrays contain the same characters (ignoring case); a positive return value indicates that the current String’s value array identifies a string that follows the string that anotherString’s value array represents; and a negative value indicates that anotherString’s string follows the current String’s string. This method throws a NullPointerException object if anotherString is null. Example: String s = "abc"; String t = "def"; System.out.println (s.compareToIgnoreCase (t)); (output: -3).
- public String concat(String str) creates a new String containing the current String’s characters followed by the str-referenced String’s characters, and returns a reference to the new String. However, if str contains no characters, a reference to the current String returns. Example: String s = "Hello,"; System.out.println (s.concat (" World")); (output: Hello, World). Although you can choose concat(String str), the string concatenation operator (+) produces more compact source code. For example, String t = "a"; String s = t + "b"; is more compact than String t = "a"; String s = t.concat ("b");. Because the compiler converts String t = "a"; String s = t + "b"; to String t = "a"; String s = new StringBuffer ().append (t).append ("b").toString ();, using concat(String str) might seem cheaper. In practice, their execution times prove similar. (That translation describes compilers of this article’s era: since Java 5, javac generates StringBuilder code instead, and since Java 9 it uses an invokedynamic-based strategy by default.)
Caution: Do not use either concat(String str) or the string concatenation operator in a loop that executes repeatedly; that can affect your program’s performance. (Each approach creates several objects, which can increase garbage collections, and calls several methods behind the scenes.)
- public static String copyValueOf(char [] data) creates a new String containing a copy of all characters in the data array and returns the new String’s reference. Example: char [] yesNo = { 'y', 'n', 'Y', 'N' }; String s = String.copyValueOf (yesNo); System.out.println (s); (output: ynYN).
- public boolean equalsIgnoreCase(String anotherString) performs a case-insensitive comparison of the current String’s characters with anotherString’s characters. If those characters match (from a case-insensitive perspective), this method returns true. If the characters do not match, or if anotherString is null, this method returns false. Example: System.out.println ("Abc".equalsIgnoreCase ("aBC")); (output: true).
- public int indexOf(int ch) returns the index of ch’s first occurrence in the current String’s value array. If that character does not exist, -1 returns. Example: System.out.println ("First index = " + "The quick brown fox.".indexOf ('o')); (output: First index = 12).
- public String intern() returns the pooled version of the current String: if the common string memory pool already contains a String with the same characters, intern() returns that String’s reference; otherwise, it adds the current String to the pool and returns the current String’s reference. Because the returned reference can differ from the current one, assign it. Example: String potentialNapoleonQuote = new String ("Able was I, ere I saw Elba!"); potentialNapoleonQuote = potentialNapoleonQuote.intern ();.
Tip: To speed up string searches, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, each object has a unique reference. Plus, using == to compare references proves faster than using a method to compare a string’s characters.
- public int lastIndexOf(int ch) returns the index of ch’s last occurrence in the current String’s value array. If that character does not exist, -1 returns. Example: System.out.println ("Last index = " + "The quick brown fox.".lastIndexOf ('o')); (output: Last index = 17).
- public int length() returns the value stored in count. In other words, this method returns a string’s length. If the string is empty, length() returns 0. Example: System.out.println ("abc".length ()); (output: 3).
Caution: Confusing length() with length leads to compiler errors. length() is a method that returns the number of characters in a String, whereas length is a read-only array field that holds the number of elements in an array.
- public String substring(int beginIndex, int endIndex) creates a new String that contains every character in the string beginning at beginIndex and ending at one position less than endIndex, and returns that object’s reference. However, if beginIndex contains 0 and endIndex contains the string’s length, this method returns a reference to the current String. Furthermore, if beginIndex is negative, endIndex is greater than the string’s length, or beginIndex is greater than endIndex, this method throws an IndexOutOfBoundsException object. Example: System.out.println ("Test string.".substring (5, 11)); (output: string).
- public char [] toCharArray() creates a new character array, copies the contents of the current String’s value array to the new character array, and returns the new array’s reference. Example: String s = new String ("account"); char [] ch = s.toCharArray ();.
- public String trim() completes one of two tasks:
  - Creates a new String with the same contents as the current String, except for leading and trailing white space characters (that is, characters with Unicode values less than or equal to 32), and returns that reference
  - Returns the current String’s reference if no leading/trailing white space characters exist
  Example: System.out.println ("[" + " \tabcd ".trim () + "]"); (output: [abcd]).
- public static String valueOf(int i) creates a new String containing the character representation of i’s integer value and returns that object’s reference. Example: String s = String.valueOf (20); s += " dollars"; System.out.println (s); (output: 20 dollars).
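Several of these methods often work together. The following sketch splits a "key = value" line at its first equal sign with indexOf(int ch), extracts both halves with substring(int beginIndex, int endIndex), and removes surrounding white space with trim(). KeyValue and split are hypothetical names, and returning null for a line without an equal sign is simply a choice made for this example.

```java
class KeyValue
{
   // Hypothetical helper: split "key = value" at the first '=' and trim
   // both halves; returns null if the line contains no '='
   static String [] split (String line)
   {
      int eq = line.indexOf ('=');
      if (eq == -1)
         return null;
      String key = line.substring (0, eq).trim ();
      String value = line.substring (eq + 1, line.length ()).trim ();
      return new String [] { key, value };
   }

   public static void main (String [] args)
   {
      String [] kv = split ("  name = JavaWorld ");
      System.out.println ("[" + kv [0] + "] [" + kv [1] + "]"); // Output: [name] [JavaWorld]
   }
}
```

Because indexOf() finds the first occurrence, any later equal signs stay in the value: a=b=c yields the key a and the value b=c.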
To demonstrate String’s charAt(int index) and length() methods, I prepared a HexDec hexadecimal-to-decimal conversion application:
Listing 2: HexDec.java
// HexDec.java
// Hexadecimal to Decimal
class HexDec
{
public static void main (String [] args)
{
if (args.length != 1)
{
System.err.println ("usage: java HexDec hex-character-sequence");
return;
}
// Convert argument from hexadecimal to decimal
int dec = 0;
String s = args [0];
for (int i = 0; i < s.length (); i++)
{
char c = s.charAt (i); // Extract character
// If character is an uppercase letter, convert character to
// lowercase
if (Character.isUpperCase (c))
c = Character.toLowerCase (c);
if (!(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f'))
{
System.err.println ("invalid character detected");
return;
}
dec <<= 4;
if (c <= '9')
dec += (c - '0');
else
dec += (c - 'a' + 10);
}
System.out.println ("decimal equivalent = " + dec);
}
}
If you want to convert hexadecimal number 7fff to decimal, use java HexDec 7fff. You then observe the following output:
decimal equivalent = 32767
Caution: HexDec includes the expression i < s.length () in its for loop header. For long loops, do not call length() in a for loop header because of method call overhead (which can affect performance). Instead, call that method and save its return value before entering the loop, and then use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs like HexDec, it doesn’t matter if I call length() in the for loop header. But for professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)
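Applying the Caution’s advice, HexDec’s conversion loop can be moved into a reusable method that calls length() once, before the loop. This is a sketch: HexDecMethod and hexToDec are hypothetical names, and unlike HexDec it reports an invalid character by throwing NumberFormatException instead of printing a message. Like HexDec, it does not detect overflow for inputs longer than eight hex digits.

```java
class HexDecMethod
{
   // Hypothetical helper: hexadecimal-to-decimal conversion with length()
   // saved in a local variable before the loop
   static int hexToDec (String s)
   {
      int dec = 0;
      int len = s.length (); // call length() once
      for (int i = 0; i < len; i++)
      {
         char c = Character.toLowerCase (s.charAt (i));
         if (c >= '0' && c <= '9')
            dec = (dec << 4) + (c - '0');
         else if (c >= 'a' && c <= 'f')
            dec = (dec << 4) + (c - 'a' + 10);
         else
            throw new NumberFormatException ("invalid character detected: " + c);
      }
      return dec;
   }

   public static void main (String [] args)
   {
      System.out.println ("decimal equivalent = " + hexToDec ("7fff")); // Output: decimal equivalent = 32767
   }
}
```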
For another String method demonstration, see Listing 3, which shows how the intern() method and the == operator enable a rapid search of a partial list of country names for a specific country:
Listing 3: CS.java
// CS.java
// Country search
import java.io.*;
class CS
{
static String [] countries =
{
"Argentina",
"Australia",
"Bolivia",
"Brazil",
"Canada",
"Chile",
"China",
"Denmark",
"Egypt",
"England",
"France",
"India",
"Iran",
"Ireland",
"Iraq",
"Israel",
"Japan",
"Jordan",
"Pakistan",
"Russia",
"Scotland",
"South Africa",
"Sweden",
"Syria",
"United States"
};
public static void main (String [] args)
{
int i;
if (args.length != 1)
{
System.err.println ("usage: java CS country-name");
return;
}
String country = args [0];
// First search attempt using == operator
for (i = 0; i < countries.length; i++)
if (country == countries [i])
{
System.out.println (country + " found");
break;
}
if (i == countries.length)
System.out.println (country + " not found");
// Intern country string
country = country.intern ();
// Second search attempt using == operator
for (i = 0; i < countries.length; i++)
if (country == countries [i])
{
System.out.println (country + " found");
break;
}
if (i == countries.length)
System.out.println (country + " not found");
}
}
CS attempts twice to locate a specific country name in an array of country names with the == operator. The first attempt fails because the country name string literals end up as Strings in the common string memory pool, and the String containing the name being searched is not in that pool. After the first search attempt, country = country.intern (); replaces country’s reference with the reference of the equal String in the pool (adding the String to the pool if no equal String exists there). The second search therefore succeeds whenever the name appears in the array. For example, java CS Argentina produces the following output:
produces the following output:
Argentina not found
Argentina found
The StringBuffer class
String is not always the best choice for representing strings in a program. The reason: its immutability causes String methods, such as substring(int beginIndex, int endIndex), to create new String objects rather than modify the original String objects. In many situations, that leads to unreferenced Strings that become eligible for garbage collection. When many unreferenced Strings are created within a long loop, available heap memory shrinks, and the garbage collector might need to perform many collections, which can affect a program’s performance, as the following code demonstrates:
String s = "abc";
String t = "def";
String u = "";
for (int i = 0; i < 100000; i++)
u = u.concat (s).concat (t);
u.concat (s) creates a String containing the u-referenced String’s characters followed by the s-referenced String’s characters. The new String’s reference subsequently returns and identifies a String, named a to prevent confusion, on which concat (t) is called. The concat (t) method call results in a new String object, b, that contains a’s characters followed by the t-referenced String’s characters. a is discarded (because its reference disappears), and b’s reference is assigned to u, which makes the String that u previously referenced eligible for garbage collection.
During each loop iteration, two Strings are discarded, so by the loop’s end, 200,000 Strings await garbage collection or have already been collected. Worse, each new String copies every character of the String it extends, and u grows by six characters per iteration, so the discarded Strings together hold roughly 60 billion characters. Garbage collection therefore occurs many times during the loop, and this portion of a program’s execution takes far longer to complete. That could prove problematic if the above code must complete within a limited time period. The StringBuffer class solves this problem.
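For comparison, here is the same loop rewritten to append to a single StringBuffer, whose methods are described in the sections that follow. This is a minimal sketch; BuildLoop and build are hypothetical names. Each iteration now modifies one object in place, and the only copying happens when the buffer’s internal array occasionally expands.

```java
class BuildLoop
{
   // Hypothetical helper: produce the same string as the concat() loop,
   // but append to one StringBuffer instead of creating two new Strings
   // per iteration
   static String build (String s, String t, int n)
   {
      StringBuffer sb = new StringBuffer ();
      for (int i = 0; i < n; i++)
         sb.append (s).append (t); // append() returns the same StringBuffer
      return sb.toString ();
   }

   public static void main (String [] args)
   {
      String u = build ("abc", "def", 100000);
      System.out.println (u.length ()); // Output: 600000
   }
}
```

When the final length is known in advance, as here, passing it to the StringBuffer(int initCap) constructor avoids expansions entirely.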
StringBuffer objects
In many ways, the java.lang.StringBuffer class resembles its String counterpart. For example, as with String, a StringBuffer object stores a character sequence in a character array that StringBuffer’s private value field variable references. Also, StringBuffer’s private count integer field variable records the number of characters in that array. Finally, both classes declare a few same-named methods with identical signatures, such as public int indexOf(String str).
Unlike String objects, StringBuffer objects represent mutable, or changeable, strings. As a result, a StringBuffer method can modify a StringBuffer object. If the modification produces more characters than value can accommodate, the StringBuffer object automatically creates a new value array with double the capacity (plus two additional array elements) of the current value array, or with room for exactly the required characters if that is larger, and copies all characters from the old array to the new array. (After all, Java arrays have a fixed size.) Capacity represents the maximum number of characters a StringBuffer’s value array can store.
Create a StringBuffer object via any of the following constructors:
- public StringBuffer() creates a new StringBuffer object that contains no characters but can contain up to 16 characters before automatically expanding. StringBuffer has an initial capacity of 16 characters.
- public StringBuffer(int initCap) creates a new StringBuffer that contains no characters and can contain up to initCap characters before automatically expanding. If initCap is negative, this constructor throws a NegativeArraySizeException object. StringBuffer has an initial capacity of initCap.
- public StringBuffer(String str) creates a new StringBuffer that contains all characters in the str-referenced String and up to 16 additional characters before automatically expanding. StringBuffer’s initial capacity is the length of str’s string plus 16.
The following code fragment demonstrates all three constructors:
StringBuffer sb1 = new StringBuffer ();
StringBuffer sb2 = new StringBuffer (100);
StringBuffer sb3 = new StringBuffer ("JavaWorld");
StringBuffer sb1 = new StringBuffer (); creates a StringBuffer with no characters and an initial capacity of 16. StringBuffer sb2 = new StringBuffer (100); creates a StringBuffer with no characters and an initial capacity of 100. Finally, StringBuffer sb3 = new StringBuffer ("JavaWorld"); creates a StringBuffer containing JavaWorld and an initial capacity of 25 (nine characters plus 16).
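The capacity() method described in the next section lets you confirm these initial capacities and watch an expansion happen. In this sketch (CapacityDemo and capacities are hypothetical names), appending 17 characters to a default StringBuffer forces it past its initial capacity of 16. The exact expanded capacity is an implementation detail, so the comment promises only that it is at least 17.

```java
class CapacityDemo
{
   // Hypothetical helper: capacities from the three constructors, plus the
   // capacity after appending more characters than the default 16
   static int [] capacities ()
   {
      StringBuffer sb1 = new StringBuffer ();
      StringBuffer sb2 = new StringBuffer (100);
      StringBuffer sb3 = new StringBuffer ("JavaWorld");
      int [] result = new int [4];
      result [0] = sb1.capacity ();     // 16
      result [1] = sb2.capacity ();     // 100
      result [2] = sb3.capacity ();     // 25: nine characters plus 16
      sb1.append ("abcdefghijklmnopq"); // 17 characters: forces an expansion
      result [3] = sb1.capacity ();     // at least 17
      return result;
   }

   public static void main (String [] args)
   {
      int [] caps = capacities ();
      for (int i = 0; i < caps.length; i++)
         System.out.println (caps [i]);
   }
}
```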
StringBuffer method sampler
Since we already examined StringBuffer’s constructors, we now examine its nonconstructor methods. For brevity, I focus on only 13 methods.
Note: Like String, many StringBuffer methods require an index argument for accessing a character in the StringBuffer’s value array (or a character array argument). That index/offset is always zero-based.
public StringBuffer append(char c) appends c's character to the contents of the current StringBuffer's value array and returns a reference to the current StringBuffer. Example: StringBuffer sb = new StringBuffer ("abc"); sb.append ('d'); System.out.println (sb); (output: abcd).
public StringBuffer append(String str) appends the str-referenced String's characters to the contents of the current StringBuffer's value array and returns a reference to the current StringBuffer. Example: StringBuffer sb = new StringBuffer ("First,"); sb.append (" second"); System.out.println (sb); (output: First, second).
public int capacity() returns the current StringBuffer's current capacity (that is, value's length). Example: StringBuffer sb = new StringBuffer (); System.out.println (sb.capacity ()); (output: 16).
public char charAt(int index) extracts and returns the character at the index position in the current StringBuffer's value array. This method throws an IndexOutOfBoundsException object if index is negative, equals the string's length, or exceeds that length. Example: StringBuffer sb = new StringBuffer ("Test string"); for (int i = 0; i < sb.length (); i++) System.out.print (sb.charAt (i)); (output: Test string).
public StringBuffer deleteCharAt(int index) removes the character at the index position in the current StringBuffer's value array. If index is negative, equals the string's length, or exceeds that length, this method throws a StringIndexOutOfBoundsException object. Example: StringBuffer sb = new StringBuffer ("abc"); sb.deleteCharAt (1); System.out.println (sb); (output: ac).
public void ensureCapacity(int minimumCapacity) ensures the current StringBuffer's capacity is at least minimumCapacity. If the current capacity is smaller, the StringBuffer creates a new value array whose capacity is the larger of minimumCapacity and twice the old capacity plus 2. If minimumCapacity is zero or negative, this method returns without doing anything. The following code demonstrates this method:
StringBuffer sb = new StringBuffer ("abc");
System.out.println (sb.capacity ());
sb.ensureCapacity (20);
System.out.println (sb.capacity ());
The fragment produces the following output (20 exceeds the old capacity of 19, but 2 * 19 + 2 = 40 is larger still):
19
40
Tip |
---|
Because it takes time for a StringBuffer to create a new character array and copy characters from the old array to the new array (during an expansion), call ensureCapacity(int minimumCapacity) before entering a loop that appends many characters to a StringBuffer. Minimizing expansions improves performance. |
public StringBuffer insert(int offset, String str) inserts the str-referenced String's characters into the current StringBuffer beginning at the index that offset identifies, and returns a reference to the current StringBuffer. Any characters starting at offset move to higher indexes. If str contains a null reference, the four characters null are inserted into the StringBuffer. Example: StringBuffer sb = new StringBuffer ("ab"); sb.insert (1, "cd"); System.out.println (sb); (output: acdb).
public int length() returns the value stored in count. In other words, this method returns a string's length. If the string is empty, length() returns 0. A StringBuffer's length differs from its capacity: length specifies value's current character count, whereas capacity specifies the maximum number of characters that array can hold before the StringBuffer must expand. Example: StringBuffer sb = new StringBuffer (); System.out.println (sb.length ()); (output: 0).
public StringBuffer replace(int start, int end, String str) replaces the characters in the current StringBuffer's value array from index start through index end - 1 (inclusive) with the characters from the str-referenced String. If end exceeds the string's length, the replaced range extends to the end of the string. This method throws a StringIndexOutOfBoundsException object if start is negative, exceeds the string's length, or is greater than end. Example: StringBuffer sb = new StringBuffer ("abcdef"); sb.replace (0, 3, "x"); System.out.println (sb); (output: xdef).
public StringBuffer reverse() reverses the character sequence in the current StringBuffer's value array. Example: StringBuffer sb = new StringBuffer ("reverse this"); System.out.println (sb.reverse ()); (output: siht esrever).
public void setCharAt(int index, char c) sets the character at position index in the current StringBuffer's value array to c's contents. If index is negative, equals the string's length, or exceeds that length, this method throws an IndexOutOfBoundsException object. Example: StringBuffer sb = new StringBuffer ("abc"); sb.setCharAt (0, 'd'); System.out.println (sb); (output: dbc).
public void setLength(int newLength) establishes a new length for the current StringBuffer's character sequence. Every character located at an index less than newLength remains unchanged; if newLength is less than the current length, the characters at index newLength and above are discarded. If newLength exceeds the current length, null characters ('\u0000') are appended, filling the positions from the old length through index newLength - 1. If necessary, StringBuffer expands by creating a new value array of the appropriate length. This method throws an IndexOutOfBoundsException object if newLength is negative. The following fragment demonstrates this method:
StringBuffer sb = new StringBuffer ("abc");
System.out.println (sb.capacity ());
System.out.println (sb.length ());
sb.setLength (100);
System.out.println (sb.capacity ());
System.out.println (sb.length ());
System.out.println ("[" + sb + "]");
The fragment produces this output (in the last line, the 97 null characters that follow abc may appear as spaces, depending on your console):
19
3
100
100
[abc ]
public String toString() creates a new String object containing the same characters as the current StringBuffer's character sequence and returns a reference to that String. The following code uses toString() as part of a more efficient (and faster) alternative to String's concat(String str) method for concatenating strings within a loop:
String s = "abc";
String t = "def";
StringBuffer sb = new StringBuffer (2000000);
for (int i = 0; i < 100000; i++)
sb.append (s).append (t);
String u = sb.toString ();
sb = null;
System.out.println (u);
As the output is large, I don't include it here. Try converting this code into a program and compare its performance with the earlier concat()-based code:
String s = "abc";
String t = "def";
String u = "";
for (int i = 0; i < 100000; i++)
u = u.concat (s).concat (t);
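Taking up that suggestion, the following sketch times both approaches. The class ConcatTiming and its withConcat() and withBuffer() methods are hypothetical names. By default it uses 10,000 iterations rather than 100,000, because each concat() call copies the entire string built so far, so that version's running time grows roughly with the square of the iteration count; pass a different count as a command-line argument if you like. Timings depend on your JVM and machine, so no figures are shown here.

```java
// ConcatTiming.java (hypothetical name): compare String.concat() with StringBuffer.append()
class ConcatTiming
{
   // Build the result with repeated concat(): every call copies everything built so far
   static String withConcat (String s, String t, int n)
   {
      String u = "";
      for (int i = 0; i < n; i++)
         u = u.concat (s).concat (t);
      return u;
   }

   // Build the same result in one StringBuffer sized up front, then convert it once
   static String withBuffer (String s, String t, int n)
   {
      StringBuffer sb = new StringBuffer (n * (s.length () + t.length ()));
      for (int i = 0; i < n; i++)
         sb.append (s).append (t);
      return sb.toString ();
   }

   public static void main (String [] args)
   {
      // Fewer iterations than the article's 100,000 so the concat() version finishes quickly
      int n = (args.length > 0) ? Integer.parseInt (args [0]) : 10000;

      long start = System.currentTimeMillis ();
      String u1 = withConcat ("abc", "def", n);
      long concatMillis = System.currentTimeMillis () - start;

      start = System.currentTimeMillis ();
      String u2 = withBuffer ("abc", "def", n);
      long bufferMillis = System.currentTimeMillis () - start;

      System.out.println ("same result: " + u1.equals (u2) + ", length = " + u2.length ());
      System.out.println ("concat():     " + concatMillis + " ms");
      System.out.println ("StringBuffer: " + bufferMillis + " ms");
   }
}
```

Sizing the StringBuffer to the exact final length (n times six characters here) means it never expands, which is the Tip about ensureCapacity() applied through the constructor instead.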
For a demonstration of StringBuffer's append(String str) and toString() methods, and the StringBuffer() constructor, examine Listing 4's DigitsToWords, which converts an integer value's digits to its equivalent spelled-out form (for example, 10 becomes ten):
Listing 4: DigitsToWords.java
// DigitsToWords.java
class DigitsToWords
{
public static void main (String [] args)
{
for (int i = 0; i < 10000; i++)
System.out.println (convertDigitsToWords (i));
}
static String convertDigitsToWords (int integer)
{
if (integer < 0 || integer > 9999)
throw new IllegalArgumentException ("Out of range: " + integer);
if (integer == 0)
return "zero";
String [] group1 =
{
"one",
"two",
"three",
"four",
"five",
"six",
"seven",
"eight",
"nine"
};
String [] group2 =
{
"ten",
"eleven",
"twelve",
"thirteen",
"fourteen",
"fifteen",
"sixteen",
"seventeen",
"eighteen",
"nineteen"
};
String [] group3 =
{
"twenty",
"thirty",
"forty",
"fifty",
"sixty",
"seventy",
"eighty",
"ninety"
};
StringBuffer result = new StringBuffer ();
if (integer >= 1000)
{
int tmp = integer / 1000;
result.append (group1 [tmp - 1] + " thousand");
integer -= tmp * 1000;
if (integer == 0)
return result.toString ();
result.append (" ");
}
if (integer >= 100)
{
int tmp = integer / 100;
result.append (group1 [tmp - 1] + " hundred");
integer -= tmp * 100;
if (integer == 0)
return result.toString ();
result.append (" and ");
}
if (integer >= 10 && integer <= 19)
{
result.append (group2 [integer - 10]);
return result.toString ();
}
if (integer >= 20)
{
int tmp = integer / 10;
result.append (group3 [tmp - 2]);
integer -= tmp * 10;
if (integer == 0)
return result.toString ();
result.append ("-");
}
result.append (group1 [integer - 1]);
return result.toString ();
}
}
DigitsToWords handles only the range 0 through 9,999: convertDigitsToWords() throws an IllegalArgumentException for any negative value or any value that exceeds 9,999. Below are the first 22 lines of output:
zero
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
twenty-one
For another practical illustration of StringBuffer's append(String str) method, as well as StringBuffer(int initCap), append(char c), and deleteCharAt(int index), I created an Editor application that demonstrates a basic line-oriented text editor:
Listing 5: Editor.java
// Editor.java
import java.io.IOException;
class Editor
{
public static int MAXLINES = 100;
static int curline = -1; // Current line.
static int lastline = -1; // Last appended line index.
// The following array holds all lines of text. (Maximum is MAXLINES.)
static StringBuffer [] lines = new StringBuffer [MAXLINES];
static
{
// We assume 80-character lines. But who knows? Because StringBuffers
// dynamically expand, you could end up with some very long lines.
for (int i = 0; i < lines.length; i++)
lines [i] = new StringBuffer (80);
}
public static void main (String [] args)
{
do
{
// Prompt user to enter a command
System.out.print ("C: ");
// Obtain the command, and make sure there is no leading/trailing
// white space
String cmd = readString ().trim ();
// Process command
if (cmd.equalsIgnoreCase ("QUIT"))
break;
if (cmd.equalsIgnoreCase ("ADD"))
{
if (lastline == MAXLINES - 1)
{
System.out.println ("FULL");
continue;
}
String line = readString ();
lines [++lastline].append (line);
curline = lastline;
continue;
}
if (cmd.equalsIgnoreCase ("DELFCH"))
{
if (curline > -1 && lines [curline].length () > 0)
lines [curline].deleteCharAt (0);
continue;
}
if (cmd.equalsIgnoreCase ("DUMP"))
for (int i = 0; i <= lastline; i++)
System.out.println (i + ": " + lines [i]);
}
while (true);
}
static String readString ()
{
StringBuffer sb = new StringBuffer (80);
try
{
do
{
int ch = System.in.read ();
if (ch == '\n') // Stop at the end of the line
break;
sb.append ((char) ch);
}
while (true);
}
catch (IOException e)
{
}
return sb.toString ();
}
}
To see how Editor works, type java Editor. Here is one example of this program's output:
C: add
some text
C: dump
0: some text
C: delfch
C: dump
0: ome text
C: quit
Among Editor's various commands (which it accepts in any mix of upper- and lowercase), add appends a line of text to the lines array of StringBuffers, dump dumps all lines to the standard output device, delfch removes the current line's first character, and quit ends the program. Obviously, delfch is not very useful: a better program would let you specify an index after the command name and delete the character at that index. However, before you can accomplish that task, you must learn about the StringTokenizer class.
The StringTokenizer class
What do the Java compiler, a text-based adventure game, and a Linux shell program have in common? Each program contains code that extracts fundamental character sequences, or tokens, from user-specified text: identifiers and punctuation (compiler), game-play instructions (adventure game), or a command name and arguments (Linux shell). Java accomplishes this token extraction process, known as string tokenizing because user-specified text exists as one or more character strings, via the StringTokenizer class.
Unlike the frequently used Character, String, and StringBuffer classes in package java.lang, the less frequently used StringTokenizer utility class exists in package java.util and requires an explicit import directive (such as import java.util.StringTokenizer;) to import that class into a program.
StringTokenizer objects
Before a program can extract tokens from a string, the program must create a StringTokenizer object by calling one of the following constructors:
public StringTokenizer(String s) creates a StringTokenizer that extracts tokens from the s-referenced String. This constructor specifies the space character (' '), tab character ('\t'), new-line character ('\n'), carriage-return character ('\r'), and form-feed character ('\f') as delimiters: characters that separate tokens from each other. Delimiters do not return as tokens.
public StringTokenizer(String s, String delim) is identical to the previous constructor except that you also specify a string of delimiter characters via the delim-referenced String. During string tokenizing, StringTokenizer skips all delimiter characters as it searches for the next token's beginning. Delimiters do not return as tokens.
public StringTokenizer(String s, String delim, boolean returnDelim) resembles the previous constructor except that you also specify whether delimiter characters should return as tokens. Delimiter characters return when you pass true to returnDelim.
Examine the following fragment to learn how these constructors create StringTokenizer objects:
String s = "A sentence to tokenize.|A second sentence.";
StringTokenizer stok1 = new StringTokenizer (s);
StringTokenizer stok2 = new StringTokenizer (s, "|");
StringTokenizer stok3 = new StringTokenizer (s, " |", true);
stok1 references a StringTokenizer that extracts tokens from the s-referenced String and recognizes space, tab, new-line, carriage-return, and form-feed characters as delimiters. stok2 references a StringTokenizer that also extracts tokens from s. This time, however, only a vertical bar character (|) classifies as a delimiter. Finally, in the stok3-referenced StringTokenizer, the space and vertical bar characters classify as delimiters and return as tokens. Now that these StringTokenizers exist, how do you extract tokens from their s-referenced Strings? Let's find out.
Token extraction
StringTokenizer provides four methods for extracting tokens: public int countTokens(), public boolean hasMoreTokens(), public String nextToken(), and public String nextToken(String delim). The countTokens() method returns the number of tokens that remain to be extracted, based on the current set of delimiters. Use this return value to determine the maximum number of tokens to extract. However, you should call hasMoreTokens() to determine when to end tokenizing because countTokens() is undependable (as you will see). hasMoreTokens() returns a Boolean true value if at least one more token exists to extract; otherwise, that method returns false. Finally, the nextToken() and nextToken(String delim) methods return a String's next token. But if no more tokens are available, either method throws a NoSuchElementException object. nextToken() and nextToken(String delim) differ only in that nextToken(String delim) first resets a StringTokenizer's delimiter characters to those characters in the delim-referenced String, and that new delimiter set stays in effect for later calls. Given this information, the following code, which builds on the previous fragment, shows how to use the previous three StringTokenizers to extract a string's tokens:
System.out.println ("count1 = " + stok1.countTokens ());
while (stok1.hasMoreTokens ())
System.out.println ("token = " + stok1.nextToken ());
System.out.println ("\r\ncount2 = " + stok2.countTokens ());
while (stok2.hasMoreTokens ())
System.out.println ("token = " + stok2.nextToken ());
System.out.println ("\r\ncount3 = " + stok3.countTokens ());
while (stok3.hasMoreTokens ())
System.out.println ("token = " + stok3.nextToken ());
The fragment above divides into three parts. The first part focuses on stok1. After retrieving and printing a token count, a while loop calls nextToken() to extract tokens for as long as hasMoreTokens() returns true. The second and third parts use identical logic for the other StringTokenizers. If you execute the code fragment, you observe the following output:
count1 = 6
token = A
token = sentence
token = to
token = tokenize.|A
token = second
token = sentence.
count2 = 2
token = A sentence to tokenize.
token = A second sentence.
count3 = 13
token = A
token =
token = sentence
token =
token = to
token =
token = tokenize.
token = |
token = A
token =
token = second
token =
token = sentence.
The output above reveals three different token counts for the same string. The counts differ because the sets of delimiters differ. For stok1, the default delimiter set applies. For stok2, only one delimiter is present: the vertical bar. stok3 records a space and a vertical bar as its delimiters. The output's final portion reveals that the space and vertical bar delimiters return as tokens because the constructor call that creates stok3 passes true as returnDelim's value.
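Recall that Editor's delfch command could only remove a line's first character. With hasMoreTokens() and nextToken() in hand, you can parse a command that carries an index. The following is a minimal sketch, not part of Listing 5: the CommandParser class, its methods, and the DELCH command are hypothetical names chosen for illustration. It splits a command such as delch 0 at white space (the default delimiters), then calls deleteCharAt(int index) only when the index is in range.

```java
import java.util.StringTokenizer;

// CommandParser.java: a hypothetical helper, not part of Listing 5
class CommandParser
{
   // Return the first token (the command name), or "" if the command is blank
   static String commandName (String cmd)
   {
      StringTokenizer st = new StringTokenizer (cmd);   // default white-space delimiters
      return st.hasMoreTokens () ? st.nextToken () : "";
   }

   // Return the integer that follows the command name, or -1 if it is missing or not an integer
   static int parseIndex (String cmd)
   {
      StringTokenizer st = new StringTokenizer (cmd);
      if (!st.hasMoreTokens ())
         return -1;               // blank command
      st.nextToken ();            // skip the command name
      if (!st.hasMoreTokens ())
         return -1;               // no argument supplied
      try
      {
         return Integer.parseInt (st.nextToken ());
      }
      catch (NumberFormatException e)
      {
         return -1;               // argument is not an integer
      }
   }

   // Hypothetical DELCH command: delete the character at the given index, ignoring bad indexes
   static void delch (StringBuffer line, String cmd)
   {
      int index = parseIndex (cmd);
      if (index >= 0 && index < line.length ())
         line.deleteCharAt (index);
   }

   public static void main (String [] args)
   {
      StringBuffer line = new StringBuffer ("some text");
      String cmd = "delch 0";
      System.out.println (commandName (cmd) + " " + parseIndex (cmd));   // prints: delch 0
      delch (line, cmd);
      System.out.println (line);                                         // prints: ome text
   }
}
```

Checking hasMoreTokens() before each nextToken() call means a bare delch command simply does nothing instead of throwing a NoSuchElementException.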
Earlier, I cautioned you against relying on countTokens() for determining the number of tokens to extract. countTokens()'s return value is often meaningless when a program dynamically changes a StringTokenizer's delimiters with a nextToken(String delim) method call, as the following fragment demonstrates:
String record = "Ricard Santos,Box 99,'Sacramento,CA'";
StringTokenizer st = new StringTokenizer (record, ",");
int ntok = st.countTokens ();
System.out.println ("Number of tokens = " + ntok);
for (int i = 0; i < ntok; i++)
{
String token = st.nextToken ();
System.out.println (token);
if (token.startsWith ("Box"))
st.nextToken ("'"); // Throw away comma between Box 99 and
// 'Sacramento,CA'
}
The code creates a String that simulates a database record. Within that record, commas delimit fields (record portions). Although there are three commas, which would normally imply four fields, only three fields exist: a name, a box number, and a city-state. A pair of single quotes surrounds the city-state field to indicate that the comma between Sacramento and CA is part of the field.
After creating a StringTokenizer that recognizes only comma characters as delimiters, the current thread counts the number of tokens and prints that count. The thread then uses that count to control the duration of the loop that extracts and prints tokens. When the Box 99 token returns, the thread executes st.nextToken ("'"); to change the delimiter from a comma to a single quote and discard the comma token between Box 99 and 'Sacramento,CA'. A comma token returns because st.nextToken ("'"); replaces the comma delimiter with a single-quote delimiter before extracting the next token, so the comma is no longer a delimiter. The code produces this output:
Number of tokens = 4
Ricard Santos
Box 99
Sacramento,CA
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:232)
at STDemo.main(STDemo.java:18)
The output indicates four tokens because, with only a comma delimiter, three commas imply four tokens. But after displaying three tokens, the loop's fourth st.nextToken (); call throws a NoSuchElementException object. The exception occurs because the program assumes that countTokens()'s return value indicates the exact number of tokens to extract. However, countTokens() can only base its count on the current set of delimiters. Because the fragment changes those delimiters during the loop, via st.nextToken ("'");, countTokens()'s return value is no longer valid.
Caution |
---|
Do not use countTokens()'s return value to control a string tokenization loop's duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Failure to heed that advice often leads to one of the nextToken() methods throwing a NoSuchElementException object and the program terminating prematurely. |
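A safer version of the record-parsing loop lets hasMoreTokens() decide when to stop, because it checks the remaining text against the current delimiter set on every pass. The sketch below (RecordFields and fields() are hypothetical names) wraps that logic in a method that returns the fields. For the sample record it should yield Ricard Santos, Box 99, and Sacramento,CA without throwing an exception; it still assumes the record follows the name,box,'city,state' layout shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// RecordFields.java (hypothetical): parse the record with hasMoreTokens() controlling the loop
class RecordFields
{
   static List<String> fields (String record)
   {
      List<String> result = new ArrayList<String> ();
      StringTokenizer st = new StringTokenizer (record, ",");
      // hasMoreTokens() consults the delimiters in effect right now,
      // so it stays accurate after nextToken(String delim) changes them
      while (st.hasMoreTokens ())
      {
         String token = st.nextToken ();
         result.add (token);
         if (token.startsWith ("Box") && st.hasMoreTokens ())
            st.nextToken ("'");   // switch to single quotes and discard the "," token
      }
      return result;
   }

   public static void main (String [] args)
   {
      for (String field: fields ("Ricard Santos,Box 99,'Sacramento,CA'"))
         System.out.println (field);
   }
}
```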
For a practical demonstration of StringTokenizer's methods, I created a PigLatin application that translates English text to its pig Latin equivalent. For those unfamiliar with the pig Latin game, this coded language moves a word's first letter to its end and then adds ay. For example: computer becomes omputercay; Java becomes Avajay, etc. Punctuation is not affected. Listing 6 presents PigLatin's source code:
Listing 6: PigLatin.java
// PigLatin.java
import java.util.StringTokenizer;
class PigLatin
{
public static void main (String [] args)
{
if (args.length != 1)
{
System.err.println ("usage: java PigLatin phrase");
return;
}
StringTokenizer st = new StringTokenizer (args [0], " \t:;,.-?!");
while (st.hasMoreTokens ())
{
StringBuffer sb = new StringBuffer (st.nextToken ());
sb.append (sb.charAt (0));
sb.append ("ay");
sb.deleteCharAt (0);
System.out.print (sb.toString () + " ");
}
System.out.print ("\r\n");
}
}
To see what Hello, World! looks like in pig Latin, execute java PigLatin "Hello, World!". You see the following output:
elloHay orldWay
According to pig Latin’s rules, the output is not quite correct. First, the wrong letters are capitalized. Second, the punctuation is missing. The correct output is:
Ellohay, Orldway!
Use what you’ve learned in this article to fix those problems.
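If you'd like to compare your solution against one possible approach, here is a sketch (PigLatin2 and translate() are hypothetical names, and this is not the only fix). It passes true as returnDelim so that spaces and punctuation come back as tokens and are copied unchanged. It also assumes the capitalization rule that the Java and Avajay example implies: a translated word starts with a capital letter only if the original word did, and the moved letter becomes lowercase. For the input Hello, World!, it should print Ellohay, Orldway! instead of elloHay orldWay.

```java
import java.util.StringTokenizer;

// PigLatin2.java (hypothetical name): one possible fix for Listing 6
class PigLatin2
{
   static final String DELIMITERS = " \t:;,.-?!";

   static String translate (String text)
   {
      // true: delimiters return as one-character tokens, so punctuation survives
      StringTokenizer st = new StringTokenizer (text, DELIMITERS, true);
      StringBuffer result = new StringBuffer (text.length () * 2);
      while (st.hasMoreTokens ())
      {
         String token = st.nextToken ();
         if (DELIMITERS.indexOf (token.charAt (0)) >= 0)
         {
            result.append (token);   // delimiter: copy it unchanged
            continue;
         }
         char first = token.charAt (0);
         StringBuffer sb = new StringBuffer (token.substring (1));
         sb.append (Character.toLowerCase (first));   // moved letter becomes lowercase
         sb.append ("ay");
         // Capitalize the new first letter only if the original word was capitalized
         if (Character.isUpperCase (first))
            sb.setCharAt (0, Character.toUpperCase (sb.charAt (0)));
         result.append (sb);
      }
      return result.toString ();
   }

   public static void main (String [] args)
   {
      if (args.length != 1)
      {
         System.err.println ("usage: java PigLatin2 phrase");
         return;
      }
      System.out.println (translate (args [0]));
   }
}
```

Run it as java PigLatin2 "Hello, World!". Because a delimiter token is always a single character from DELIMITERS, checking its first character is enough to tell delimiters from words.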
Review
Java's Character, String, StringBuffer, and StringTokenizer classes support text-processing programs. Such programs use Character to indirectly store char variables in data structure objects and to access a variety of character-oriented utility methods; use String to represent and manipulate immutable strings; use StringBuffer to represent and manipulate mutable strings; and use StringTokenizer to extract a string's tokens.
This article also cleared up three mysteries about strings. First, you saw how the compiler and classloader allow you to treat string literals (at the source-code level) as if they were String objects. Thus, you can legally specify synchronized ("sync object") in a multithreaded program requiring synchronization. Second, you learned why Strings are immutable, and how immutability works with internment to save heap memory when a program requires many strings and to allow fast string searches. Finally, you learned what happens when you use the string concatenation operator to concatenate strings and how StringBuffer is involved in that task.
I encourage you to email me with any questions you might have involving either this or any previous article’s material. (Please keep such questions relevant to material discussed in this column’s articles.) Your questions and my answers will appear in the relevant study guides.
Next month, I will deviate from my roadmap and introduce you to the world of Java tools.