Trees and nodes

Nodes

In computer science, a tree is a hierarchical data structure used to organize information in a way that makes it easy to find related items. We will create a tree with nodes to store some data.

Other languages

Before we go on, here's some tips for users of other programming languages this year:

  • Rust should be able to follow the same or similar steps using optional values.
  • Java, C#, and C++ might need to simulate the absence/presence of data using arrays. You might also wish to search online to see how nodes are made in your language. Either way, you will still need to complete the same task as set below.

A node represents an element or value within the tree, and each node can have one parent node (its direct ancestor) and zero or more child nodes (its descendants).

The top-most node in the tree is called the root node, and the nodes at the bottom of the tree are called leaf nodes. Trees can be used to represent hierarchical relationships between items, such as a family tree, organizational chart, or file system directory structure.

Tree with nodes

Let's create a file system

The file system on your computer is a tree. It starts with the root directory (on Windows, C:\; on macOS or Linux, /). Let's recreate the general structure of one.

First, before we even create the tree, let's create the constituent nodes. The nodes of a file system are files and folders. Since there are two types of nodes, let's create structs for them.

struct File {
    /// The name of the file.
    var name: String
}

struct Folder {
    /// The name of the folder.
    var name: String

    /// The files contained in the folder.
    var files: [File] = []
}

You will notice that I have created an empty array already for files. This is so that when I create a Folder object, I don't have to specify a list of files — although, I could if I wanted to!

Next, let's create a folder and a file. Do not add the file to the folder yet — create the Folder first.

var folder = Folder(name: "My Folder")
var file = File(name: "README.txt")  // Notice no files array needed here.

Next, let's add the file to the folder.

folder.files.append(file)

Simple enough!

Adding behaviours

Let's give the files and folders the ability to represent their own path. You may be familiar with paths like C:\Users\yourname\Downloads\somefile.exe from Windows or /home/yourname/Downloads/somefile.flatpak from Linux.

We will create Linux-style paths where the folder path starts with /. For example, the path to our file will be /My Folder/README.txt.

To do this, we will make the structs conform to CustomStringConvertible. Let's start with Folder.

struct Folder : CustomStringConvertible {
    // Existing members of the struct here.
    // ...

    var description: String {
        return "/" + name
    }
}

print(folder)

This should print /My Folder, which is great.

Let's make File also conform to CustomStringConvertible:

struct File : CustomStringConvertible {
    // Existing members of the struct here.
    // ...

    var description: String {
        return "/" + name
    }
}

print(file)

This will return /README.txt … but that's not sufficient. We want to know the name of the folder that contains the file! The full path should be /My folder/README.txt.

To make this possible, the child/leaf (file) needs to be aware of its parent (folder) — it can't print its parent folder unless it knows what it is.

Let's add the concept of a parent folder to the File struct. This can be used to show the name of the parent.

struct File : CustomStringConvertible {
    var name: String
    var parentFolder: Folder?

    var description: String {
        guard let parentFolder else {
            return name
        }

        return "\(parentFolder)/\(name)"
    }
}

Notice that description now uses the parent folder's description in itself. The parent nodes will also use their parent folder's description to get their own name, and so on and so forth until you reach the root node which, lacking a parent, simply gives its name and nothing else.

When you combine all of these components together, you create a single, long path starting with the root node and ending with whichever object you printed.

This is an example of recursion, where a method (or in our case, a computed variable; same thing) calls itself until a defined scenario where it will stop doing so.

In this case, description gets used by each node, starting with the current one and going up through its parents until it reaches the root node. As the root node has no parent, the recursion ends.

Recursion in itself, even not using object-oriented programming, can be counted as a complex programming technique. However, it will not be taught in this unit.

Learn more about recursion in Swift.

As you can see, parentFolder is optional. That's because the file might not be contained in a folder — it might be the root node! This is why we need to use guard let to ensure the parent folder exists.

Of course, a file system that can only contain a single file might seem ludicrous to you. However, early file systems in the '70s and '80s had stranger limitations than that!

Adding more folders

Let's add a folder to contain the one we already have. This will be our top-level folder, right at the front of the path — the new root node.

Before we do that, we have a problem that needs solving: currently, a Folder can only contain Files.

We need to be able to have a general type to treat File and Folder the same. Using polymorphism, let's create a protocol that can describe both types called FilesystemNode.

A file system node should have a name, just like any file or folder. It should also have the concept of a parent node, so that a file can be contained in a folder — but so can another folder!

Since both files and folders will be able to contain parent nodes, let's also move the implementation of CustomStringConvertible from File to an extension.

protocol FilesystemNode : CustomStringConvertible {
    var name: String { get set }
    var parentNode: FilesystemNode? { get set }
}

extension FilesystemNode {
    var description: String {
        guard let parentNode = parentNode else {
            return name
        }

        return "\(parentFolder)/\(name)"
    }
}

Let's adjust the File and Folder structs accordingly:

struct File : FilesystemNode {
    var name: String
    var parentNode: FilesystemNode?  // Changed from Folder?
}

struct Folder : FilesystemNode {
    var name: String
    var parentNode: FilesystemNode?  // Just like File.
    var children: [FilesystemNode] = []  // Changed from var files: [File]
}

With those adjusted, let's create the new root folder. Let's call it "C Drive".:

var root = Folder(name: "C Drive")
var folder = Folder(name: "My Folder")
var file = File(name: "README.txt")

// Add My Folder to the C Folder.
root.children.append(folder)

// Add README.txt to My Folder, inside the C Drive.
folder.children.append(file)

Finally, let's try printing the file object to see its description:

print(file)

README.txt

Making files aware of their parent

Unfortunately, the file object isn't aware of its parent so it doesn't print anything other than "README.txt" when printed.

That is to say, its parent nodes are aware of their children but the relationship doesn't happen in reverse. Without establishing that relationship, any given node cannot tell you its full path (including the parent and root nodes).

To facilitate this, we will create a method in the Folder class to add child nodes (i.e. files and folders). This will make use of a new feature: inout variables.

struct Folder : FilesystemNode {
    // Variables...
    // ...

    mutating func add(_ child: inout some FilesystemNode) {
        child.parentNode = self
        children.append(child)
    }
}

This new method is mutating because it will modify the data in the object (i.e. it adds to the children array); we've seen that before so it should be fairly straight-forward.

What's new is the inout keyword.

inout

Normally, when you pass an argument to a Swift function, that value is treated as a constant within the method body: it cannot be modified, only used.

However, using inout tells Swift to make a new, mutable copy of that passed-in value. This allows us to modify the data before using it.

In this case, we are modifying the child.parentNode value and setting it to self. This means that the child's parent will become the FilesystemNode-conforming object whose add(_:) method was called; in this case, the Folder.

When you call a method with an inout parameter, its value must be passed with an & ampersand prepended (put at the start) of its name. Therefore, adding nodes to folders now looks like this:

// Add My Folder to the C Folder.
root.add(&folder)  // Changed from root.children.append

// Add README.txt to My Folder, inside the C Drive.
folder.add(&file)  // Changed from folder.children.append

some

What about the some keyword after inout?

That is more Swift safety at play. If we leave the some out, Swift infers that the child parameter must be any FilesystemNode; that is any object that conforms to the FilesystemNode protocol.

However, it means that at any given time the method is run, Swift has to do some runtime checks (while the program is running) to ensure the correct conformance.

As part of being able to do this, Swift will require you to declare the root, folder, and file variables generically as any FilesystemNode as well. This would look like:

//        Note the inserted `any FilesystemNode`
var root: any FilesystemNode = Folder(name: "C Drive")
var folder: any FilesystemNode = Folder(name: "My Folder")
var file: any FilesystemNode = File(name: "README.txt")

The problem here is that since Swift will now only treat these objects generically, there isn't a convenient way to call the Folder objects' add(_:) method: as far as Swift is concerned, they're not Folders, they're any FilesystemNode.

The some keyword in add(_:)'s method signature helps Swift to know that, at any given time that add(_:) is called, it will only be dealing with one kind of object that conforms to FilesystemNode.

Therefore, Swift can 'imagine' that this call:

folder.add(file)

is calling a method with this signature:

//                               Note: File instead of FilesystemNode
mutating func add(_ child: inout File)

instead of:

mutating func add(_ child: inout any FilesystemNode)

Swift, satisfied with this, will not need to do any runtime checks. Thus:

  • Swift will not require us to treat our objects generically as any FilesystemNode to be able to pass them to the method
  • we can call the Folder object's add(_:) method without too much hassle

Summary

Make sure that you have completed the file system activities above. You should have:

  1. a FilesystemNode protocol, describing what a filesystem node contains (i.e. a name, an optional parent)
  2. an extension to FilesystemNode so that it conforms to CustomStringConvertible
  3. a Folder struct conforming to FilesystemNode, which additionally contains child nodes (children) and can add items to itself (add(_:))
  4. a File struct conforming to FilesystemNode
  5. a selection of Folder and File objects

Searching Trees

The current system works well when working from the nodes down to the root of the tree.

What happens if we want to search through the tree to find a node?

In our current system, if we were to implement a search function for our FilesystemNode

func search(byName name: String) -> FilesystemNode?

In our Folder struct it could look something like this.

func search(byName name: String) -> FilesystemNode? {
    if self.name == name {
        return self
    }
    for child in children {
        if let foundFileNode = child.search(byName: name) {
            return foundFileNode
        }
    } 

    return nil
}

And a search function in our File struct

func search(byName name: String) -> FilesystemNode? {
    if self.name == name {
        return self
    } else {
        return nil
    }
}

If you were to try and run this code on the root folder, it won't find the file

if let foundFileNode = root.search(byName: "README.txt") {
    print(foundFileNode)
} else {
    print("Couldn't find file")
}

The code would output "Couldn't find file".

This is because when we added our folder to the root folder, we did this before we added our file to our folder, so the copy of folder that exists in the children of root is one without any children.

We could rewrite this so that we add file to folder before adding folder to root.

folder.add(&file)
root.add(&folder)

But if we try print out file we find that now our folder doesn't know that the parent is root. My Folder/README.txt

This is because our structs are treated as value types where a new copy of the object is made when it is passed into another variable.

So what is the solution?

Classes

Classes are reference types this means when they are passed into another variable, for example a list, they are still the same object in memory.

Syntax

Class syntax is as follows

class Folder: FileSystemNode {
    var name: String
    var parentNode: FileSystemNode?
    var children: [FileSystemNode]
    init(name: String) {
        self.name = name
        children = []
    }
    func add(_ child: inout some FileSystemNode) {
        child.parentNode = self
        children.append(child)
    }
}

init

You may have spotted the new function in the class init(). Classes are required to have an initialiser to create the object.

You may have noticed that nothing else in this code has changed.

What this means for our code is that we can enter each of the nodes in any order.

Now if we use our search function it should work as intented.

📝 Task

Recreate the following folder structures…

TBD