Study guide: Java’s character and assorted string classes support text processing
Brush up on Java terms, learn tips and cautions, review homework assignments, and read Jeff’s answers to student questions
Glossary of terms
- capacity
- Length of the internal array in a data structure object, such as a StringBuffer object.
- delimiters
- Characters that separate tokens.
- interns
- Places a string in the common string memory pool, which holds only one copy of each distinct character sequence.
- immutable
- Unchangeable.
- mutable
- Changeable.
- text
- Digits, letters, punctuation, words, sentences, and so on.
- wrapper class
- A class whose objects wrap values of a primitive type so that those values can be stored in data structure objects that store only objects, not primitive values. Character is an example.
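The wrapper class entry is easiest to see in code. This minimal sketch assumes Java 5 or later (for generics and Character.valueOf()); WrapperDemo and wrapAll() are illustrative names. Each char is wrapped in a Character object before it goes into a list, because the list can hold only object references.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperDemo {
    // Collections store object references only, so each char is wrapped
    // in a Character object before it is added to the list.
    static List<Character> wrapAll(String text) {
        List<Character> chars = new ArrayList<Character>();
        for (int i = 0; i < text.length(); i++)
            chars.add(Character.valueOf(text.charAt(i)));  // explicit wrapping
        return chars;
    }

    public static void main(String[] args) {
        List<Character> chars = wrapAll("Hi!");
        char first = chars.get(0).charValue();  // unwrap to get the primitive back
        System.out.println(first);                                         // H
        System.out.println(Character.isLetter(chars.get(2).charValue()));  // false ('!')
    }
}
```

On Java 5 and later, autoboxing performs the same wrapping implicitly when you write chars.add(text.charAt(i)); the explicit call just makes the wrapper visible.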
Tips and cautions
These tips and cautions will help you write better programs and save you from agonizing over why the compiler produces error messages.
Tips
- To speed up string searches, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, each distinct character sequence is represented by exactly one object with a unique reference. Using == to compare those references is faster than calling a method that compares a string’s characters. (This works only when both Strings being compared have been interned.)
- Because it takes time for a StringBuffer to create a new character array and copy characters from the old array to the new array during an expansion, call ensureCapacity(int minimumCapacity) before entering a loop that appends many characters to a StringBuffer. Minimizing expansions this way improves performance.
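Both tips can be checked with a short program. This is a minimal sketch, not code from the article: InternDemo, sameAfterIntern() and noExpansion() are hypothetical names. It shows that intern() maps two equal strings to one pooled object, and that reserving room with ensureCapacity() before a loop keeps the buffer from expanding inside it.

```java
public class InternDemo {
    // Returns true if both strings resolve to the same pooled object.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    // Appends n characters after reserving room up front; returns true if
    // the buffer's capacity never changed while the loop ran.
    static boolean noExpansion(int n) {
        StringBuffer sb = new StringBuffer();  // default capacity is 16
        sb.ensureCapacity(n);                  // grow once, before the loop
        int before = sb.capacity();
        for (int i = 0; i < n; i++)
            sb.append('x');
        return sb.capacity() == before;
    }

    public static void main(String[] args) {
        String literal = "hello";              // string literals are interned automatically
        String built = new String("hello");    // a distinct object with the same characters
        System.out.println(literal == built);                 // false: different references
        System.out.println(sameAfterIntern(literal, built));  // true: same pooled object
        System.out.println(noExpansion(1000));                // true: no expansion in the loop
    }
}
```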
Cautions
- Do not use either concat(String str) or the string concatenation operator in a loop that executes many times; doing so can hurt your program’s performance. (Each approach creates several objects, which can increase garbage collections, and calls several methods behind the scenes.)
- Confusing length() with length leads to compiler errors. length() is a method that returns the current number of characters in a String, whereas length is a read-only array field that holds the number of elements in an array (fixed when the array is created). HexDec includes the expression i < s.length () in its for loop header. In long loops, avoid calling length() in a for loop header because of method call overhead, which can affect performance. Instead, call the method once before entering the loop, save its return value, and use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs like HexDec, it doesn’t matter if I call length() in the for loop header. But in professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)
- Do not use countTokens()’s return value to control a string tokenization loop’s duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Ignoring that advice often leads to one of the nextToken() methods throwing a NoSuchElementException and the program terminating prematurely.
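The countTokens() caution is easy to reproduce. In this sketch (TokenLoop and its methods are hypothetical names), countTokens() is computed under the space delimiter, but the loop switches to a comma delimiter with nextToken(","), so the tokenizer runs out of tokens long before the precomputed count is reached. The safer loop calls hasMoreTokens() on every pass, so it always reflects the current delimiter set. Generics assume Java 5 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenLoop {
    // Risky: the loop bound comes from countTokens() under the original
    // delimiter set (" "), but nextToken(",") switches delimiters mid-loop.
    static boolean riskyLoopThrows(String input) {
        StringTokenizer st = new StringTokenizer(input, " ");
        int n = st.countTokens();              // 3 for "a b c"
        try {
            for (int i = 0; i < n; i++)
                st.nextToken(",");             // first call returns "a b c" whole
            return false;
        } catch (NoSuchElementException e) {
            return true;                       // ran out of tokens before n iterations
        }
    }

    // Safer: ask hasMoreTokens() on each pass instead of trusting a count.
    static List<String> safeLoop(String input) {
        StringTokenizer st = new StringTokenizer(input, " ");
        List<String> tokens = new ArrayList<String>();
        while (st.hasMoreTokens())
            tokens.add(st.nextToken(","));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(riskyLoopThrows("a b c"));  // true
        System.out.println(safeLoop("a b c"));         // [a b c]
    }
}
```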
Miscellaneous notes and thoughts
After my article on character and string classes was published, a problem with the StringTokenizer class was brought to my attention. The problem involves the String delim parameter that is part of two StringTokenizer constructors. To many developers, that parameter’s String type suggests that StringTokenizer recognizes multicharacter delimiters (such as ###). Instead, StringTokenizer interprets that parameter as a set of one-character delimiters. This confusion over what delim means would disappear if delim had character array type (char []). For more information on the problem and a multicharacter delimiter solution, read “Steer Clear of Java Pitfalls,” Michael Daconta (JavaWorld, September 2000).
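A quick experiment makes the problem visible: passing "###" as the delimiter makes StringTokenizer split on every single #. One alternative, which is not necessarily the solution Daconta describes, is String.split() with the delimiter quoted through Pattern.quote() so that it matches only as a whole three-character sequence. This sketch assumes Java 5 or later; DelimDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class DelimDemo {
    // StringTokenizer treats "###" as the one-character delimiter set {'#'}.
    static List<String> withTokenizer(String s, String delim) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(s, delim);
        while (st.hasMoreTokens())
            out.add(st.nextToken());
        return out;
    }

    // String.split() takes a regular expression; Pattern.quote() makes the
    // delimiter match literally, as one multicharacter unit.
    static List<String> withSplit(String s, String delim) {
        return Arrays.asList(s.split(Pattern.quote(delim)));
    }

    public static void main(String[] args) {
        String s = "a###b#c";
        System.out.println(withTokenizer(s, "###"));  // [a, b, c]
        System.out.println(withSplit(s, "###"));      // [a, b#c]
    }
}
```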
Reader questions
Find out what questions your fellow readers are asking and my answers to those questions.
Jeff,
When you need to create strings dynamically, what should you use performance-wise, String or StringBuffer? I thought StringBuffer performs better than String. So I always start with a StringBuffer, and after completing the string, return buffer.toString().
Michel
Michel,
The answer to your question depends on the nature of the string. If it is immutable (that is, unchangeable), use String. You could use StringBuffer, but doing so would allow code to directly change the immutable string’s characters, and the string would no longer be immutable. Similarly, if the string is mutable, use StringBuffer. You could use String, but then you would end up creating additional String objects during string modification, because String methods that could potentially modify a string instead create additional String objects that contain the modified strings. Eventually, the garbage collector would notice the additional (and probably unreferenced) Strings and perform a collection, possibly affecting the program’s performance.
StringBuffer does not suffer from that performance problem because modifying a StringBuffer does not create additional StringBuffer (or String) objects.
Additional Strings that arise from using String (instead of StringBuffer) to represent a mutable string are one performance problem. A second problem, which can prove just as serious, can occur when you use StringBuffer to represent a mutable string. StringBuffer’s character array (which holds a mutable string) has finite length. Any modification that produces a mutable string whose length exceeds the character array’s length causes StringBuffer to create a new character array of appropriate length, copy characters from the old character array to the new character array, and drop its reference to the old character array (making that array eligible for garbage collection). Because array creation, array copying, and garbage collection take time, how do you solve this potential performance problem? Either create a StringBuffer object with a large enough initial capacity (character array length), or call StringBuffer’s ensureCapacity() method to set an appropriate character array length before changing the array. That way, you minimize the number of extra activities.
Both performance problems manifest themselves during looped string concatenation. Consider the following code fragment:
String s = "a";
for (int i = 0; i < 2000; i++)
    s = s + "b";
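The compiler translates the code fragment into byte code equivalent to the following source. (Compilers of the time generate StringBuffer calls like these; newer compilers may generate different byte code, but each iteration still creates at least one new String.)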
String s = "a";
for (int i = 0; i < 2000; i++)
    s = new StringBuffer ().append (s).append ("b").toString ();
The code fragment above creates a StringBuffer and a String (via toString()) during each loop iteration. These objects are temporary and become unreachable after each iteration (although the last-created String is still referenced after the loop completes). Eventually, the garbage collector will probably run. How do you solve this potential performance problem? Consider the following code fragment:
String s = "a";
StringBuffer sb = new StringBuffer (2500); // Assume a maximum character array length of 2500 characters.
sb.append (s);
for (int i = 0; i < 2000; i++)
    sb.append ("b");
s = sb.toString ();
The code fragment does not create any StringBuffer or String objects during the loop. Therefore, the potential for garbage collection is quite low. (Garbage collection can still occur because the garbage collector thread runs at various times, and unreferenced objects from previously executed code may remain to be collected.) To sum up, understanding whether strings should be immutable or mutable will lead you to select the appropriate String or StringBuffer class, which benefits performance. Furthermore, performance improves when you set an appropriate StringBuffer capacity before making many modifications and take care with looped string concatenation.
Jeff
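The two approaches in the answer above can be compared side by side in one runnable sketch. ConcatDemo and its methods are hypothetical names; the sketch only checks that both approaches build the same 2,001-character string and does not measure timing. Instead of the answer’s assumed 2500-character capacity, it presizes the buffer to exactly n + 1 characters, enough for the initial "a" plus n appended "b"s, so the buffer never expands.

```java
public class ConcatDemo {
    // Looped concatenation: each iteration creates temporary objects
    // behind the scenes and a brand-new String.
    static String withOperator(int n) {
        String s = "a";
        for (int i = 0; i < n; i++)
            s = s + "b";
        return s;
    }

    // One presized StringBuffer: capacity n + 1 holds "a" plus n 'b's,
    // so no expansion occurs and one String is created at the end.
    static String withBuffer(int n) {
        StringBuffer sb = new StringBuffer(n + 1);
        sb.append("a");
        for (int i = 0; i < n; i++)
            sb.append("b");
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = withOperator(2000);
        String s2 = withBuffer(2000);
        System.out.println(s1.equals(s2));  // true
        System.out.println(s2.length());    // 2001
    }
}
```

On Java 5 and later, StringBuilder offers the same append() and toString() methods without synchronization; the sketch keeps StringBuffer to match the discussion.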
Homework
- Why does Java require a Character class?
- Why does Java regard string literals as String objects?
- Enhance the Editor application with the following capabilities:
  - Rename DELFCH to DELCH and modify that command to take a single integer argument identifying the zero-based index of the character to delete in the current line. Example: delch 2 deletes the current line’s third character. Provide appropriate error checking to warn users when they specify an invalid index (or curline contains -1, indicating no lines of text). If no more characters are in the current line, delete it and update curline as appropriate.
  - Create a DEL command that deletes the current line. Use error checking to deal with the situation when curline contains -1. Update curline as appropriate.
  - Create a REPL command that replaces all occurrences of a specific character in the current line with another character. Two character arguments should follow REPL: the first identifies the character to replace, and the second identifies its replacement. Example: repl # * replaces all occurrences of # with *. Use appropriate error checking in case no current line exists (that is, curline contains -1).
  - Create a SETCL command that takes a single zero-based integer argument and sets curline to that value. Use appropriate error checking in case the value is out of range or curline contains -1.
  - If you feel ambitious, include LOAD and SAVE commands that let you load the contents of arbitrary text files and save the current text to a specific text file. What sort of error checking will you need?
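For readers attempting the DELCH and REPL exercises, here is one hedged sketch of the string manipulation involved. The Editor application’s actual fields and structure are not reproduced here, so EditorSketch, its lines list, and its return messages are hypothetical; only curline and its -1 convention come from the exercise text. This version deletes a line when DELCH leaves it empty, which is one reading of the requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Editor application's state.
public class EditorSketch {
    List<String> lines = new ArrayList<String>();
    int curline = -1;  // -1 means "no lines of text"

    // DELCH n: delete the character at zero-based index n of the current line.
    String delch(int index) {
        if (curline == -1) return "no current line";
        StringBuffer sb = new StringBuffer(lines.get(curline));
        if (index < 0 || index >= sb.length()) return "invalid index";
        sb.deleteCharAt(index);
        if (sb.length() == 0) {
            lines.remove(curline);             // line is now empty: delete it
            if (curline >= lines.size())
                curline = lines.size() - 1;    // becomes -1 when no lines remain
        } else {
            lines.set(curline, sb.toString());
        }
        return "ok";
    }

    // REPL old new: replace every occurrence of one character in the current line.
    String repl(char oldChar, char newChar) {
        if (curline == -1) return "no current line";
        lines.set(curline, lines.get(curline).replace(oldChar, newChar));
        return "ok";
    }
}
```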
Answers to last month’s homework
Last month, I asked you to answer some questions and create a package. My answers follow each item.
- What is the unnamed package?
  - The unnamed package is the package to which a source file’s classes and interfaces belong when that source file lacks a package directive. From an implementation perspective, the unnamed package corresponds to whatever directory is current when you invoke the java command (when no classpath is set).
- What is the purpose of classpath?
  - classpath is an environment variable that helps the JVM’s classloader locate class and jar files.
- Create a shapes package with classes Point, Circle, Rectangle, and Square. Of those classes, ensure that Point is the only class not accessible outside its package. Use implementation inheritance to derive Circle from Point and Rectangle from Square. Provide an Area interface with a double getArea() method that returns the area of a Circle, a Square, or a Rectangle. Once you finish creating the package, create a TestShapes program that imports class and interface names from shapes, creates objects from shape classes, and computes the area of the shape each object represents. After compiling and running TestShapes successfully, move shapes to another location on your hard drive and change classpath so that a second attempt to run TestShapes produces the same output as the first run.
  - Complete the following steps:
    - Ensure no classpath environment variable exists.
    - Create a shapes directory.
    - Copy the following source code into an Area.java file that appears in shapes:
        // Area.java
        package shapes;
        public interface Area {
           double getArea ();
        }
    - Copy the following source code into a Circle.java file that appears in shapes:
        // Circle.java
        package shapes;
        public class Circle extends Point implements Area {
           private int radius;
           public Circle (int x, int y, int radius) {
              super (x, y);
              this.radius = radius;
           }
           // Why do I need to redeclare getX () and getY ()? Hint: Comment
           // out both methods and try to call them from TestShapes.
           public int getX () {
              return super.getX ();
           }
           public int getY () {
              return super.getY ();
           }
           public int getRadius () {
              return radius;
           }
           public double getArea () {
              return 3.14159 * radius * radius;
           }
        }
    - Copy the following source code into a Point.java file that appears in shapes:
        // Point.java
        package shapes;
        class Point {
           private int x, y;
           Point (int x, int y) {
              this.x = x;
              this.y = y;
           }
           int getX () {
              return x;
           }
           int getY () {
              return y;
           }
        }
    - Copy the following source code into a Rectangle.java file that appears in shapes:
        // Rectangle.java
        package shapes;
        public class Rectangle extends Square {
           private int length;
           public Rectangle (int width, int length) {
              super (width);
              this.length = length;
           }
           public int getLength () {
              return length;
           }
           public double getArea () {
              return getWidth () * length;
           }
        }
    - Copy the following source code into a Square.java file that appears in shapes:
        // Square.java
        package shapes;
        public class Square implements Area {
           private int width;
           public Square (int width) {
              this.width = width;
           }
           public int getWidth () {
              return width;
           }
           public double getArea () {
              return width * width;
           }
        }
    - Copy the following source code into a TestShapes.java file that appears in shapes’s parent directory:
        // TestShapes.java
        import shapes.*;
        class TestShapes {
           public static void main (String [] args) {
              Area [] a = { new Circle (10, 10, 20), new Square (5), new Rectangle (10, 15) };
              for (int i = 0; i < a.length; i++)
                 System.out.println (a [i].getArea ());
           }
        }
    - Assuming the directory that contains TestShapes.java is the current directory, execute javac TestShapes.java to compile TestShapes.java and all files in the shapes directory. Then execute java TestShapes to run this application.
    - Move shapes to another directory and set classpath to refer to that directory and the current directory. For example, under Windows, move shapes temp moves shapes into an existing temp directory, and set classpath=temp;. points classpath to the temp directory and to the current directory, so java TestShapes still runs.
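To confirm which class path the JVM actually uses after the set classpath step, a small diagnostic can help. ShowClassPath and entries() are hypothetical names; the program reads the standard java.class.path system property and splits it with File.pathSeparator (; on Windows, : on Unix, neither of which is a regular-expression metacharacter). The enhanced for loop assumes Java 5 or later.

```java
import java.io.File;

public class ShowClassPath {
    // Splits the class path the JVM was started with into its entries.
    static String[] entries() {
        String cp = System.getProperty("java.class.path");
        return cp.split(File.pathSeparator);
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String entry : entries())
            System.out.println("  entry: " + entry);
    }
}
```

Compiled into the current directory and run after set classpath=temp;., it should list temp and . as separate entries.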