Walking a File Tree in Java – Part 2

Last week we looked at a file walking program using the Visitor design pattern.

The Visitor pattern represents an operation to be performed on each of the elements of a data structure. We can define new operations without changing the classes of the elements on which the visitor operates. In this way, the Visitor pattern lets us separate algorithms from the objects on which they operate.

Last week’s program used two primary classes. The walkFileTree() method of the Files class visited each file in the tree (the data structure). The CountFiles class defined the operations (the algorithm) to be run on each file (the element) visited. It was relatively easy to understand.

This week we’ll look at a stream-based approach to walking a file tree. The Streams API lets us work with sequences of elements, such as data from arrays and collections, in a whole new way.

Stream Revision

To do anything with a stream, we have to compose a stream pipeline. A stream pipeline consists of a source, zero or more intermediate operations, and a terminal operation. We can view a stream pipeline as a query on the stream source. An operation on a stream produces a result, but does not change its source.

  • The source could be an array, a collection, a generator function, lines of a file, random numbers, etc.

  • The intermediate operations transform the stream into another stream. We can use the filter() and map() methods for this, amongst others.

  • The terminal operation ends the stream computation. It can produce a result, e.g. via arithmetic operations such as the sum(), average() and count() methods. A terminal operation could also do something with each element, for example by using the forEach() method.

This pipeline is also often referred to as a “filter-map-reduce” pipeline (a short example follows the list below).

  • The filter() method takes a lambda expression as its argument. This Predicate lambda expression returns a boolean value, which determines whether the processed element is included in or excluded from the resulting Stream.

  • The map() method also takes a lambda expression as its argument. This Function lambda expression transforms each individual element in the stream. It returns a new Stream containing the transformed elements.

  • A reduction operation allows us to compute a single result from all the elements in the stream. Reduction operations are terminal operations: they always appear at the end of a pipeline. Simple reduction operations include sum(), average() and count(). For more complex reductions, we can use the reduce() method.
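
To tie these together, here is a minimal filter-map-reduce sketch over a list of numbers (the class name FilterMapReduceDemo and the values are invented for illustration, and List.of() assumes Java 9 or later):

import java.util.List;

public class FilterMapReduceDemo {

    public static void main(String[] args) {

        final List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);

        // filter: keep even numbers; map: square them; reduce: add them up
        int sumOfSquares = numbers.stream()
                                  .filter(n -> n % 2 == 0)   // 2, 4, 6
                                  .map(n -> n * n)           // 4, 16, 36
                                  .reduce(0, Integer::sum);  // 56

        System.out.println("Sum of squares of even numbers = " + sumOfSquares);
    }
}

Here reduce(0, Integer::sum) plays the same role as mapToInt(n -> n * n).sum() would; it just makes the reduction step explicit.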

Streams are lazy. The processing of the source data is only performed when the terminal operation starts. Source elements are consumed only if and when they are needed.
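
A small sketch makes the laziness visible. The peek() method is used here only to print when an element actually flows through the pipeline; the class name LazyStreamDemo and the values are invented for illustration:

import java.util.Optional;
import java.util.stream.Stream;

public class LazyStreamDemo {

    public static void main(String[] args) {

        // nothing is printed yet: the intermediate operations are only recorded
        Stream<String> pipeline = Stream.of("alpha", "beta", "gamma", "delta")
                                        .peek(s -> System.out.println("processing " + s))
                                        .filter(s -> s.startsWith("b"));

        System.out.println("Pipeline built - nothing processed yet");

        // the terminal operation starts the processing; findFirst() stops as soon
        // as a match is found, so "gamma" and "delta" are never touched
        Optional<String> first = pipeline.findFirst();
        first.ifPresent(s -> System.out.println("found " + s));
    }
}

Running it prints the “processing” lines only after the “Pipeline built” message, and never reaches “gamma” or “delta”.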

Collections and streams have some similarities, but they have very different usages. Collections contain data elements, while streams don’t. Collections efficiently manage and provide access to their elements. Streams don’t have any way to directly access or manipulate their elements.

We declaratively describe the source of the stream and the operations to be performed on it. Once we hit the terminal operation, the stream is processed and the result (if any) is returned. Each element is visited only once during the life of a stream. After that, the stream no longer exists. It must be re-created for another set of pipeline operations on the same source.
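
For example (deliberately broken for illustration; the class name StreamReuseDemo is invented), re-using a stream after its terminal operation has run throws an IllegalStateException:

import java.util.stream.Stream;

public class StreamReuseDemo {

    public static void main(String[] args) {

        Stream<String> names = Stream.of("Alice", "Bob", "Carol");

        System.out.println("count = " + names.count());  // terminal operation: the stream is now consumed

        // names.count();  // would throw IllegalStateException:
                           // "stream has already been operated upon or closed"
    }
}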

Methods of the Files Class

There are three methods in the java.nio.file.Files class that we could use to walk or list a directory structure:

  • Files.list(...) returns a Stream<Path> object. It does not recurse directories (it reads only the specified directory).
  • Files.walk(...) returns a Stream<Path> object. It can recurse directories to a specified maximum depth.
  • Files.walkFileTree(...) returns a Path object and takes a FileVisitor as a parameter. It can recurse directories to a specified maximum depth. We used this method last week.

There is a find() method that also returns a Stream<Path> object (i.e. a lazily populated Stream of Path entries). It walks the file tree in the same way as the walk() method does. If we have a lot of filtering to do when searching for a file based on its attributes and path, it may be more efficient than the walk() method. But it’s overkill for the example in this post.
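
As a rough sketch of what that looks like (the class name FindTest, the starting directory, the maximum depth of 3 and the 1 MB size threshold are all arbitrary choices for illustration), Files.find() takes a BiPredicate over each Path and its BasicFileAttributes:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindTest {

    public static void main(String[] args) {

        final Path start = Paths.get(System.getProperty("user.dir"));

        // find regular files larger than 1 MB, up to 3 directory levels deep
        try (Stream<Path> matches = Files.find(start, 3,
                (path, attrs) -> attrs.isRegularFile() && attrs.size() > 1_000_000)) {
            matches.forEach(System.out::println);
        } catch (IOException e) {
            System.out.println("find\texception: " + e);
        }
    }
}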

Example

Let’s first do a simple comparison between the Files.list() and the Files.walk() methods:

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ListAndWalkTest {

    public static void main(String[] args) {

        final String startDir = "c://<path>";  // change this first!

        System.out.println("Starting Directory = " + startDir);
        final Path path = Paths.get(startDir);

        // maximum directory depth
        // if maxDepth is 0 only the starting file is visited
        final int maxDepth = 1;

        // ------------------------------------------------------------

        System.out.println();
        System.out.println("Files.list");  // list() is not recursive
        // try-with-resources ensures the stream (and its open directory handle) is closed
        try (Stream<Path> stream = Files.list(path)) {
            stream.forEach(p -> doSomething("list", p));
        } catch (IOException e) {
            log("list", e);
        }

        // ------------------------------------------------------------

        System.out.println();
        System.out.println("Files.walk");
        try (Stream<Path> stream = Files.walk(path, maxDepth)) {
            stream.forEach(p -> doSomething("walk", p));
        } catch (IOException e) {
            log("walk", e);
        }

    } // end of main

    private static void doSomething(String label, Path path) {
        System.out.println(label + "\t: " + path);
    }

    private static void log(String label, IOException ex) {
        System.out.println(label + "\texception: " + ex);
    }

} // end of class

As can be seen, the walk() method takes a maximum depth parameter, while the list() method doesn’t. Otherwise they’re invoked in the same way. The walk() method can also follow symbolic links (see the API documentation).
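
As a rough sketch (the class name WalkFollowLinksTest, the starting directory and the depth are placeholders), walk() accepts FileVisitOption values as a trailing varargs parameter, so following symbolic links looks like this:

import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class WalkFollowLinksTest {

    public static void main(String[] args) {

        final Path path = Paths.get(System.getProperty("user.dir"));
        final int maxDepth = 2;

        // FOLLOW_LINKS makes walk() follow symbolic links; directory cycles
        // are detected and reported as a FileSystemLoopException
        try (Stream<Path> stream = Files.walk(path, maxDepth, FileVisitOption.FOLLOW_LINKS)) {
            stream.forEach(System.out::println);
        } catch (IOException e) {
            System.out.println("walk\texception: " + e);
        }
    }
}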

To keep it simple, we haven’t done any filtering or mapping of the Stream pipelines.

If you want to run this code, first of all change the startDir to an appropriate directory on your computer (not "c://<path>"). You could also change it to System.getProperty("user.dir").

Experiment a bit more by changing the maxDepth variable to recursively walk deeper directories.

Summary

As we can see, the file walking approach using the Files.list() and Files.walk() methods is simple and easy to understand. Next week we’ll add some filtering and mapping to the Stream pipelines.

Have you written a file walker? Does this make it easier for the next time you need to write one? Please share your comments.

Stay safe and keep learning!
