Last week we looked at a file walking program using the Visitor design pattern.
The Visitor pattern represents an operation to be performed on each of the elements of a data structure. We can define new operations without changing the classes of the elements on which the visitor operates. In this way, the Visitor pattern lets us separate algorithms from the objects on which they operate.
Last week’s program used two primary classes. The `walkFileTree()` method of the `Files` class visited each file in the tree (the data structure). The `CountFiles` class defined the operations (the algorithm) to be run on each file (the element) visited. It was relatively easy to understand.
This week we’ll look at a stream-based approach to walking a file tree. The Streams API lets us work with sequences of elements, such as data from arrays and collections, in a whole new way.
Stream Revision
To do anything with a stream, we have to compose a stream pipeline. A stream pipeline consists of a source, zero or more intermediate operations, and a terminal operation. We can view a stream pipeline as a query on the stream source. An operation on a stream produces a result, but does not change its source.
- The source could be an array, a collection, a generator function, lines of a file, random numbers, etc.
- The intermediate operations transform the stream into another stream. We can use the `filter()` and `map()` methods for this, amongst others.
- The terminal operation ends the stream computation. It can produce a result, e.g. arithmetic operations such as the `sum()`, `average()` and `count()` methods. A terminal operation could also do something with each element, say by using the `forEach()` method.
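As a minimal sketch of this three-part structure (the class name and sample data here are ours, not from last week’s program), a pipeline with a collection as its source might look like this:

```java
import java.util.List;

public class PipelineSketch {

    // source -> intermediate operations -> terminal operation
    static long countLongNames(List<String> names) {
        return names.stream()                 // source: a collection
                .filter(n -> n.length() > 3)  // intermediate: keep names longer than 3 chars
                .map(String::toUpperCase)     // intermediate: transform each element
                .count();                     // terminal: reduce the stream to a single result
    }

    public static void main(String[] args) {
        System.out.println(countLongNames(List.of("Ann", "Brian", "Carol", "Dee"))); // 2
    }
}
```

Note that each intermediate operation returns a new stream, so the calls chain naturally until the terminal `count()` ends the pipeline.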
This pipeline is also often referred to as a “filter-map-reduce” pipeline.
- The `filter()` method takes a lambda expression as its argument. This `Predicate` lambda expression always returns a `boolean` value, which includes or excludes the processed element from the resulting `Stream`.
- The `map()` method also takes a lambda expression as its argument. This can change each individual element in the stream. It returns a new `Stream` containing the changed elements.
- A reduction operation allows us to compute a result using all the elements in the stream. Reduction operations are also called terminal operations because they always appear at the end of a pipeline. Simple reduction operations include `sum()`, `average()` and `count()`. For more complex reduction operations, we can use the `reduce()` method.
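As a small illustration of the general-purpose `reduce()` method (the class name and the factorial computation are our own example), a product over a range of numbers can be written as:

```java
import java.util.stream.IntStream;

public class ReduceSketch {

    // Compute n! by reducing the range 1..n with multiplication.
    static int factorial(int n) {
        return IntStream.rangeClosed(1, n)
                .reduce(1, (a, b) -> a * b); // identity 1, accumulator multiplies
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```

The first argument to `reduce()` is the identity value; the second is an accumulator function applied to the running result and each element in turn.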
Streams are lazy. The processing of the source data is only performed when the terminal operation starts. Source elements are consumed only if and when they are needed.
Collections and streams have some similarities, but they have very different usages. Collections contain data elements, while streams don’t. Collections efficiently manage and provide access to their elements. Streams don’t have any way to directly access or manipulate their elements.
We declaratively describe the source of the stream and the operations to be performed on it. Once we hit the terminal operation, the stream is processed and the result (if any) is returned. Each element is visited only once during the life of a stream. After that, the stream no longer exists. It must be re-created for another set of pipeline operations on the same source.
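The single-use property is easy to demonstrate (the class and method names here are ours): attempting a second terminal operation on the same stream throws an `IllegalStateException`.

```java
import java.util.stream.Stream;

public class SingleUseSketch {

    // Returns true if a second terminal operation on the same stream fails,
    // demonstrating that a stream can be traversed only once.
    static boolean secondTraversalFails() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.count(); // first terminal operation: consumes the stream
        try {
            s.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        System.out.println("second traversal fails: " + secondTraversalFails());
    }
}
```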
Methods of the Files Class
There are three methods in the `java.nio.file.Files` class that we could use to walk or list a directory structure:
- `Files.list(...)` returns a `Stream<Path>` object. It does not recurse directories (it reads only the specified directory).
- `Files.walk(...)` returns a `Stream<Path>` object. It can recurse directories to a specified maximum depth.
- `Files.walkFileTree(...)` returns a `Path` object and takes a `FileVisitor` as a parameter. It can recurse directories to a specified maximum depth. We used this method last week.
There is a `find()` method that also returns a `Stream<Path>` object (i.e. a lazily populated `Stream` of `Path` entries). It walks the file tree in the same way as the `walk()` method does. If we have a lot of filtering to do when searching for a file based on its attributes and path, it may be more efficient than the `walk()` method. But it’s overkill for the example in this post.
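For completeness, a sketch of what `find()` looks like (the class name, helper method and `.java` filter are our own example): its matcher is a `BiPredicate` that receives each file’s `Path` together with its `BasicFileAttributes`, so attribute checks don’t need a separate I/O call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindSketch {

    // Count regular files whose name ends in ".java", up to maxDepth levels deep.
    static long countJavaFiles(Path start, int maxDepth) throws IOException {
        try (Stream<Path> matches = Files.find(start, maxDepth,
                (path, attrs) -> attrs.isRegularFile()
                        && path.toString().endsWith(".java"))) {
            return matches.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(System.getProperty("user.dir"));
        System.out.println(countJavaFiles(start, 2) + " .java files found");
    }
}
```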
Example
Let’s first do a simple comparison between the `Files.list()` and the `Files.walk()` methods:
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListAndWalkTest {

    public static void main(String[] args) {
        final String startDir = "c://<path>"; // change this first!
        System.out.println("Starting Directory = " + startDir);
        final Path path = Paths.get(startDir);

        // maximum directory depth
        // if maxDepth is 0 only the starting file is visited
        final int maxDepth = 1;

        // ------------------------------------------------------------
        System.out.println();
        System.out.println("Files.list"); // list() is not recursive
        // try-with-resources closes the stream and its directory handle
        try (Stream<Path> stream = Files.list(path)) {
            stream.forEach(p -> doSomething("list", p));
        } catch (IOException e) {
            log("list", e);
        }

        // ------------------------------------------------------------
        System.out.println();
        System.out.println("Files.walk");
        try (Stream<Path> stream = Files.walk(path, maxDepth)) {
            stream.forEach(p -> doSomething("walk", p));
        } catch (IOException e) {
            log("walk", e);
        }
    } // end of main

    private static void doSomething(String label, Path path) {
        System.out.println(label + "\t: " + path);
    }

    private static void log(String label, IOException ex) {
        System.out.println(label + "\texception: " + ex);
    }
} // end of class
```
As can be seen, the `walk()` method takes a maximum depth parameter, while the `list()` method doesn’t. Otherwise they’re invoked in the same way. The `walk()` method can also follow symbolic links (see the API documentation).
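As a sketch of the symbolic-link option (the class name and helper method are ours), we pass `FileVisitOption.FOLLOW_LINKS` as a vararg; `walk()` then keeps track of visited directories and reports a cycle via `FileSystemLoopException` rather than looping forever.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class WalkLinksSketch {

    // Count entries under start (including start itself), traversing symbolic links.
    static long countFollowingLinks(Path start, int maxDepth) throws IOException {
        try (Stream<Path> paths = Files.walk(start, maxDepth,
                FileVisitOption.FOLLOW_LINKS)) {
            return paths.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(System.getProperty("user.dir"));
        System.out.println(countFollowingLinks(start, 1) + " entries");
    }
}
```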
To keep it simple, we haven’t done any filtering or mapping of the `Stream` pipelines.
If you want to run this code, first of all change the `startDir` to an appropriate directory on your computer (not `"c://<path>"`). You could also change it to `System.getProperty("user.dir")`.
Experiment a bit more by changing the `maxDepth` variable to recursively walk deeper directories.
Summary
As we can see, the file walking approach using the `Files.list()` and `Files.walk()` methods is simple and easy to understand. Next week we’ll add some filtering and mapping to the `Stream` pipelines.
Have you written a file walker? Does this make it easier for the next time you need to write one? Please share your comments.
Stay safe and keep learning!