This is not a post about a new data structure--it's about one of the foundational structures of JavaScript: the object prototype. Unlike most languages, which create classes that are instantiated into full-blown objects, JavaScript uses a prototypal model of inheritance. Prototypes are easy to understand until they are not--I suspect what makes them difficult for most people is figuring out where that line is drawn. I know that for me, I still get bitten by their implications from time to time. Hence this post.
The basic idea behind prototypes is a simple one: an object is the sum of its own properties, plus any of the properties along its prototype chain (since, after all, the object's prototype could well have a prototype of its own). You might have, for example, a CatalogItem object in your library catalog software, and then a Periodical object might descend from it, adding magazine-specific information. This is mostly similar to class-based inheritance, but for two points. First, the prototype is not a "class," it's a real object. Second (and as a result), changing the prototype object also changes any descendant objects. As we'll see, combined with some of JavaScript's pass-by-value behavior, this has unintended consequences.
The first sticking point when using prototypes is that they're not inherited from the parent
object or assigned directly. You can't just say randomObject.prototype =
parentObject, sadly. An object gets its prototype from its constructor function, which
has a prototype property. So the typical pattern is:
var x = {
test: true;
}
var f = function() {};
f.prototype = x;
var y = new f(); //y.test == true
In this code, we create a parent object. Then, instead of directly inheriting from it, say
through new x() or something similar (because it's a real object, not a "class"),
we create a new function f to serve as our ad-hoc constructor, assign
x to the prototype of that function, and use the new keyword to ask
JavaScript to set up our prototype chain. y is now descended from x.
What does f have to do with anything? In this case, not much. When you run a
function as a constructor using the new keyword, JavaScript first creates a new
object using the function's prototype value. Then it runs that function with this
bound to the newly-created object for the purposes of initialization. The result is
something that looks sort-of like, but isn't really, Java inheritance.
var f = function() {
this.constructed = true;
};
f.prototype = {
test: true;
};
var y = new f(); //y.test and y.constructed are both true.
So now your prototype definition becomes an annoying two-step shuffle: first setting up your
initialization code inside the constructor, and then assigning the actual prototype
properties to the constructor function. ECMAScript 5, should it ever reach a reasonable
cross-section of browsers, adds the create method to Object.prototype, allowing
you to create descendants in only one step. But until then, most people who want to use
prototypal inheritance (and not fake class-based inheritance) probably use something like
the following function, which lets you easily create new objects from existing objects,
including an automatically-executed init function.
var chain = function(from, extension) {
//create constructor proxy
var f = function() {
//if an initializer was provided, call it
if (this.init) this.init();
};
f.prototype = from;
var r = new f(); //instantiate actual object
//copy in custom properties
if (extension) {
for (var key in extension) {
r[key] = extension[key];
}
}
//return the fully-constructed descendant
return r;
};
Yes, this is a bit confusing. It also breaks the instanceof operator, but let's
be clear: instanceof and typeof are basically broken in JavaScript
anyway, so it's really not much of a loss.
So let's say that you use this pattern, meaning that you don't write constructor functions per se. Instead, you write objects as maps of their distinct properties, and use your chain (or extend, in many frameworks) function to attach them to a prototype. Unfortunately, while this simplifies code, it makes it very easy to shoot yourself in the foot in actual use, because it masks where your object ends and the prototype begins.
As an example, take a simple collection MVC prototype I was writing the other day. It's an
object with several items inside an array property, plus a number of helper methods like
push and pop.
var Collection = chain({}, {
items: [],
push: function() {
this.items.push.apply(this.items, arguments);
}
});
Now I might decide to create a few new Collections from this prototype. So I use
chain as usual, basing each of my new objects on Collection. But the
results are... not what I would expect.
var list = chain(Collection);
var secondList = chain(Collection);
list.push(1, 2, 3);
console.log(secondList.items); //logs [1, 2, 3]--wait, what?
console.log(Collection.items); //logs [1, 2, 3]. Oh, crap.
It's one thing to be told that objects gain access to properties on their prototypes. It's another to watch it happen here by mistake, where referring to this.items actually alters the items on the prototype, and not on the new object. Those new objects don't actually have their own items property at all. This is much like the way JavaScript allows you to change the properties of objects in an outer scope when passed into a function (even though it's a pass-by-value language) because it's actually passing in the value of the object reference, not the object itself.
In other words, if you want to be able to create bound, non-primitive (Object/Array)
properties on an object in a prototype chain, you have to create them in the constructor
(or, in our case, in the pseudo-constructor). Adjusting our code to this pattern will work:
var Collection = chain({}, {
items: null,
init: function() {
this.items = [];
},
push: function() {
this.items.push.apply(this.items, arguments);
}
});
Now, instead of creating our property on the prototype, we assign it in the constructor,
making sure that it will get a fresh, empty value on each new object.
On reflection, this behavior is logical: we defined the array on the prototype, and since property access modifies JavaScript structures in-place, there's no reason that descendant objects would get their own versions instead of aliasing the prototype property. But it's still confusing, since modifying a primitive property (i.e., Numbers, Strings, and Booleans) does create an owned property on the descendant object. The result is also a frustrating mix of code, since some mutable properties are assigned directly on the prototype, but other defaults must be assigned in the constructor.
In general, I think this points to the basic problem with JavaScript's prototypes. It's not that they're a bad idea--several other languages have used them, including Lua--but that they're badly implemented. Unlike Lua, JavaScript doesn't provide a lot of options for examining or manipulating the prototype chain or doing meta-programming, and by attaching them to regular functions that are run as constructors (but only when preceded the new keyword), it turns a basic process into an accident waiting to happen. But then, that's JavaScript in a nutshell: a language "simple" enough to be written in ten days, resulting in a whole host of sharp edges waiting just out of sight.
A heap is a special kind of tree that's often used for sorting, to the point that it has its own name: the heapsort (instant demonstration opportunity). Even though as a description it's completely useless, "heap" is still a pretty great name for a data structure. A heap is also neat in that it can be implemented just using an array, instead of building a graph of interrelated node objects. After juggling references in our first two data structures, this should be a fun change of pace.
Note: if it wasn't obvious already, the fact that I've written "this should be a fun change of pace" about using a JavaScript array is a huge nerd alert for this blog.
So how does a heap work? Like any balanced tree, a heap is governed by rules, but in this case they're extremely simple. In fact, there's only one rule: each parent node's key must be greater than or equal to the keys of its child nodes (piece of cake). Because the rule only concerns children (regardless of which side they're on) and their immediate parent, our balancing operations are simplified to swaps between a child and a parent. Unlike AA trees or Red-Black trees, there aren't any rotations or branch shifts here.
These swaps are made easier by the fact that heaps are stored using plain arrays, with the position of each child determined via the array index. Basically, the first item in the array is the head node (with the greatest value), with its children stored in the next two slots, and their children in the next four, and so on. So for any given parent node at index n, its children are located at indexes 2n + 1 and 2n + 2. And for any child node c, its parent is located at (c - 1) / 2 rounded down to the nearest integer. You can simplify the math slightly if we start our array at 1 instead of 0, which the example below does. Here's the code for finding the children or parent of a given heap index in a 1-indexed array.
var leftOf = function(index) {
return (index * 2);
};
var rightOf = function(index) {
return (index * 2) + 1;
};
var parentOf = function(index) {
return Math.floor(index / 2);
};
How elegant is that--creating a tree out of a flat list and a little math? Just add some bounds checks to keep things safe (or, in JavaScript and other managed languages with dynamic arrays, check for undefined) and our data structure is literally complete. Now we just need to be able to insert and delete. To insert, we just push a value onto the end of the array (automatically making it the rightmost child on the bottom level) and then recursively "sift up" to balance.
Heap.prototype.insert = function(value) {
//push() returns the new length of the array
var index = this.push(value) - 1;
this.siftUp(index);
};
Heap.prototype.siftUp = function(index) {
//don't go past the root node
if (index <= 1) return;
var childKey = this[index];
var parentIndex = this.parentOf(index);
var parentKey = this[parentIndex];
//if the value is greater than its parent, swap
if (childKey > parentKey) {
var swap = this[parentIndex];
this[parentIndex] = this[index];
this[index] = swap;
}
this.siftUp(parentIndex); //recurse upward
return this;
};
Note: in this post, I'm just going to include real code from my implementation, since this has taken long enough to get out already and I don't feel like rewriting all the samples. It's still pretty easy to read. Just remember that in all these code blocks, this refers to the Heap object that we're traversing.
Inserting and siftUp() are pretty easy. Removal is more difficult, because it rebalances using a "sift down," and that process is a little weird. Sifting downward means that we find the parent of the last node (which, conveniently, is exactly in the middle of our array), and then (this is the weird part) we work our way backwards up the array linearly, while simultaneously checking for rule violations and recursing downward along the tree when they're found.
Like I said, it's strange. It may be easier to simply read the code.
Heap.prototype.remove = function(key) {
var index = 1;
//find the item to be deleted
for (; index < this.length; index++) {
var needle = this[index];
if (key == needle) break;
}
//swap it with the end-most child, just like an AA tree, then delete
this[index] = this[this.length - 1];
delete this[this.length - 1];
this.siftDown(index);
};
//The "termination" argument sets an internal endpoint for the sift.
//That will be important when we start doing heapsort.
Heap.prototype.siftDown = function(termination) {
//if no termination is provided, the heap spans the whole array
if (!termination || termination > this.length) termination = this.length - 1;
//start at the last non-leaf node, in the middle of the array
for (root = Math.floor(this.length / 2); root > 0; root--) {
var index = root;
while (index < termination) {
var children = [this.rightOf(index), this.leftOf(index)];
var swapIndex = index, swapKey = null;
//Find the bigger of the two children, since
//there's no point in swapping with the smaller child
for (var i = 0; i < 2; i++) {
var childIndex = children[i];
var childKey = this[childIndex];
if (!childKey || childIndex > termination) continue;
var swapKey = this[swapIndex];
if (childKey > swapKey) {
swapIndex = childIndex;
}
}
//if the child is larger, swap with the parent and then
//continue downward
if (swapIndex != index) {
var swap = this[index];
this[index] = this[swapIndex];
this[swapIndex] = swap;
index = swapIndex;
} else {
break;
}
}
}
return this;
};
I'll be honest, siftDown() was the most complicated part of this. This implementation is apparently more efficient than sifting upward, but it's much more difficult to understand, and it took me a long time to get right. But with a functioning downward sift, we can finally recreate the famed heapsort:
var heapSort = function(heap) {
//support using heapsort on Arrays as well
if (!(heap instanceof Heap)) {
var values = heap;
heap = new Heap();
heap.push.apply(heap, values);
}
//start with the heap taking up the whole array
var endAt = heap.length - 1;
while (endAt > 0) {
//construct the heap
heap.siftDown(endAt);
//move the largest value to the end
var swap = heap[endAt];
heap[endAt] = heap[1];
heap[1] = swap;
//shrink the heap and repeat
endAt--;
}
return heap.toArray();
};
It's surprisingly simple, right? Basically, we split the array into two parts--the working heap in front and the sorted values in back. We also keep track of how "long" the heap portion of the array is. Then we sort the heap, swap its last value with its first value (which moves the new largest value to the front of the sorted list), shrink the "heap length" value by one, and repeat until we reach the start of the array.
Besides getting the downward sift working properly, the next hardest part was re-learning to work with arrays as flat chunks of memory. I implemented my Heap objects as descendents of the native JavaScript Array object, because that seemed like a natural thing to do, but it means you can't call any of the methods the Heap inherits from Array: they'll return an Array instead of a Heap, and future calls to Heap instance methods will fail. So in my first heapsort, which used Array.slice() and Array.concat() to break apart and recombine both portions of the array, it kept breaking on the second pass when my heap stopped being a Heap. Not to mention it was more expensive, since I was constantly constructing new Array objects during each loop. By working strictly with array indexes, as if I were using a pointer, I didn't need to construct anything except for the temporary swap variable.
The cool thing about learning how heaps and heapsorts work is that it made me think about new ways to solve other problems, like (to pick an example) shuffling a deck of cards. Instead of a solution where I would keep track of which cards have been picked and then skip them as the shuffle proceeds, a heap-ish shuffle uses the back end of the deck as storage, and just keeps pulling random cards out of the shrinking front portion. It's much faster and less likely to blow up if your random number generator is not very random and/or lucky.
var deck = [];
for (var i = 1; i <= 52; i++) deck.push(i);
//deck = [1, 2, 3, 4 ...]
var shuffle = function(arr) {
var output = [], offset = arr.length - 1;
for (; offset >= 0; offset--) {
var index = Math.floor(Math.random() * offset);
output[offset] = arr[index];
var swap = arr[offset];
arr[offset] = arr[index];
arr[index] = swap;
}
return output;
}
shuffle(deck).sort();
//returns a randomized array
In the input box below, I've loaded the text of this post as a series of comma-separated values. When you press the "Run" button, those values will be pulled into an array and run through a heapsort, with the results displayed below. You can edit the input box between runs if you want to provide your own data (remember that spaces are significant). Note that although I think this heapsort should be fast for an all-JavaScript sort, it is slower than the browser's built-in Array.sort() method, probably because the latter is implemented in native code.
As always, feel free to read the commented Heap library code and the source for the example.
If there's one thing I can take away from this project, it's that Wikipedia is a great resource for lists of things and an absolutely miserable experience for actually learning about them. Trying to read the descriptions for most of the tree structures on Wikipedia is like attempting to decipher ancient Greek. And this is a shame, because trees--even special trees like Red-Black or AA trees--are really not that complicated.
It's no exaggeration to say that we are surrounded by data trees. HTML is a tree--elements contained in other elements, all the way up to the <html> node. Levels in Doom were a special kind of tree called a Binary Space Partition. Your hard drive, of course, is a tree of nested directories. But even outside of computer science, trees are often a natural conceptual framework for expressing certain kinds of collective relationships: the Dewey Decimal System, for example, or a business's org chart, or (near and dear to my heart) the sections and sub-sections of a piece of legislation.
So if we were implementing a very simple tree, one with no optimization whatsoever, it would be pretty easy. In fact, it's not far off from our linked list, but instead of each item containing a single reference to a descendent, it contains a collection of references to zero or more children, and (optionally) a single reference up to its parent.
For applications where semantic meaning is more important than raw speed, the number of children per node is pretty arbitrary. On your hard drive, for example, you probably have some folders that are many levels deep, and some that are very shallow but contain a lot of objects, because the priority is to organize them logically so that you and your programs can find them. But if you're building a tree for doing searches or sorting behind the scenes, you want the "branches" of your tree to be all the same length and spread pretty much evenly around the structure, so that navigating around the tree and down each branch takes a consistent amount of time.
In this post we're going to build one of these search-optimized trees--a so-called "self-balancing binary search tree." And the difference between those and a regular, ad-hoc tree is simply the addition of three constraints:
To navigate a tree built this way, you take your search value, and you compare it to the node at the root (which is traditionally at the top: binary trees grow down. Imagine they're in Australia, if that helps). If it's bigger than the value assigned to the root, you move to the child on the right side. If it's smaller, you move to the left. Then you do the comparison again, until you either hit the value you want, or you hit the end of the tree (where you may insert the value so you can find it next time). Here's an example of a node from one of these trees, followed by an algorithm for traversal.
var node = {
left: null, //reference to left-hand child (usually lesser)
right: null, //reference to right-hand child (usually greater)
key: 0, //the node key, used for lookup
data: null //the actual value of the node
}
//walk() recursively traverses a tree made of these nodes
//Then it renders it out visually to the console
function walk(tree, depth, side) { //depth and side are used for internal record-keeping
if (!depth) depth = 0;
if (!side) side = "";
var padding = "";
for (var i = 0; i < depth; i++) padding += "| "; //add spacing to the output
console.log(padding, side, tree.key, tree.data); //display the tree
//now we walk down the branches, if they exist
if (tree.left) walk(tree.left, depth + 1, 'left');
if (tree.right) walk(tree.right, depth + 1, 'right');
}
Recursion--creating functions that call themselves--is an essential strategy for being able to use trees, so if you're not very good with it, now would be a good time to study up.
Today we're going to implement an AA tree, which is named for its inventor, Arne Andersson (original paper available here), not for the support group ("Hi, I'm Thomas, and I'm a data structure." "Hi, Thomas!"). I picked the AA tree from all of the binary search trees because its rules for maintaining balance are extremely simple, and I didn't particularly want to spend a long time working through edge cases and chasing bugs. An AA tree is built by adding a level value to each node, and then following these five rules:
A skew is used to fix a violation of rule two, where the left child is at the same "level" as its parent. It rotates the tree so that the left child becomes the new parent, with the old parent as its right node and its old right node moved underneath. For once, Wikipedia's illustrations are actually pretty good:
And here's the code for skew():
function skew (branch) {
if (!branch || !branch.left) return branch; //basic input validation
if (branch.left.level >= branch.level) {
var swap = branch.left;
branch.left = swap.right;
swap.right = branch;
return swap;
}
return branch;
}
A split fixes violations of rule 4, where the rightmost grandchild of a node (so the right reference of its right reference) has the same level value. It makes the right child the new parent node, with its parent as the new left reference and its former left child takes its place as the right child node on the old parent. Again, from Wikipedia:
Here's our split() code:
function split(branch) {
//basic input validation
if (!branch || !branch.right || !branch.right.right) return branch;
if (branch.right.right.level >= branch.level) {
var swap = branch.right;
branch.right = swap.left;
swap.left = branch;
swap.level++;
return swap;
}
return branch;
}
It may help, when trying to figure out exactly what these functions are doing, to remember that a binary search tree is basically a number-line with a bunch of connections starting from an arbitrary middle node somewhere. Or, to put it another, way, if you squash a tree down onto a single "level", you get the following:
So all of these operations--skew, split, insertion, and deletion--are ways of changing the links between nodes without altering their horizontal order. The tree above, for example, could easily have 4 as its head node, linking from there to 2 and 7 (and from those to 1 and to 6 and 9, respectively), but the order of the nodes when flattened remains the same, and the same rules of traversal (go left if less, right if greater) still apply. This would make a pretty decent Professor Layton puzzle, actually.
Enough theory, let's start adding values. Whenever you do an insertion on the tree (deletions are more complicated), you start by traversing the structure to see if the value exists. If it doesn't, you add it at the end of a branch with a level of 1. Then work your way back up to the root on the back half of the recursion, rebalancing as you go with a skew and a split. You skew first, because the reverse makes it possible to end up in an invalid state and you'd have to perform the balancing operations all over again.
insert: function(key, data, branch) {
if (branch.key == key) {
branch.data = data;
return branch; //If we find a match for the key, just update in place
}
if (key < branch.key) {
if (branch.left) {
branch.left = insert(key, data, branch.left); //recurse to the left
} else {
branch.left = new AANode(key, data); //at the leaf, add the new node.
}
} else {
if (branch.right) {
branch.right = insert(key, data, branch.right); //recurse to the right
} else {
branch.right = new AANode(key, data); //at the leaf, add the new node.
}
}
branch = skew(branch);
branch = split(branch);
return branch;
}
Deletions, as I said, are more complicated, but they're not terribly hard to understand--the deletion part, at least. Once again, we walk the tree to find the value, but this time we hold onto it for a little bit while we go looking for its nearest key to swap into its place. We find that by walking down either the left or right branch (either will do) one step, then going the other direction until we reach a null (i.e., for the precursor value, we walk left once, and then repeatedly right). We swap the replacement node's data and key with those of the node to be deleted, and then prune the end of the branch.
remove: function(key, branch) {
if (!branch) return branch; //if we recurse past the end of the tree, return the null value.
if (key < branch.key) {
branch.left = remove(key, branch.left);
} else if (key > branch.key) {
branch.right = remove(key, branch.right);
} else {
//if this is a leaf node, we can just return "null" to the recursive assignment to remove it.
if (!branch.left && !branch.right) return null;
//we look for the "closest" key in value, located at the end of one of the child branches
var parent = branch;
if (branch.left) {
var replacement = branch.left;
while (replacement.right) {
parent = replacement;
replacement = replacement.right;
}
} else if (branch.right) {
var replacement = branch.right;
while (replacement.left) {
parent = replacement;
replacement = replacement.left;
}
}
//we swap the replacement key and data into the to-be-removed node
branch.key = replacement.key;
branch.data = replacement.data;
//then remove the replacement node from the tree, thus completing the "swap"
//we can't do this in the while loop above, because technically it could
//be either child, regardless of initial search direction.
if (parent.left == replacement) {
parent.left = null;
} else {
parent.right = null;
}
}
//decrease levels, in case they've gotten out of control
var minLevel = branch.left ? branch.left.level : branch.level;
minLevel = branch.right && branch.right.level + 1 < minLevel ?
branch.right.level + 1 :
minLevel;
if (minLevel < branch.level) {
branch.level = minLevel;
if (branch.right && minLevel < branch.right.level)
branch.right.level = minLevel;
}
//rebalance, using the sequence given in Andersson's paper.
branch = skew(branch);
branch.right = skew(branch.right);
if (branch.right) branch.right.right = skew(branch.right.right);
branch = split(branch);
branch.right = split(branch.right);
return branch;
}
It looks complicated, but it's really not that bad. The most confusing part is probably the swap in the middle, but if you remember the number-line version of the tree, all we're doing is switching the to-be-deleted node's position in the tree with it's neighbor, so the rules of left and right traversal remain valid. Then, because the neighbor was at the bottom of its branch, we can drop the now-swapped node from the tree without worrying about its children. The most conceptually-difficult part of the delete operation is the rebalancing at the end, but Andersson's paper provides a sequence of splits and skews that we can simply follow in order.
Implementing these trees in JavaScript was a challenge, not because they're hard, but because of a language quirk: JavaScript is strictly pass-by-value, except where the argument being passed in is an object, in which case what gets passed in is a reference to the object, still passed by value. In this, it's basically like Java. As a result, you can alter properties of the object because you have a reference to it, but you can't replace it or modify it wholesale, because that changes the value of the reference to something completely new, while leaving the old reference unchanged in the caller's scope. You can't, in other words, write a swap(a, b) function in JavaScript, which is a shame, since (you may have noticed) maintaining these trees requires a lot of swaps.
I started out trying to write my skew and split functions with reassignments done in the function scope, just as in Andersson's paper (which is written in Pascal-like syntax, and Pascal is a pass-by-reference language). But when the function exited, my node arrangements would be unchanged--or worse, would be altered in bizarre and unpredictable ways--because I was unknowingly operating on and assigning new references that only existed in the local scope. Changing my functions to expressly use return values eliminated the problem. It also made my code more verbose and difficult to read, in my opinion. But once I came to grips with correct JavaScript calling style, the rest fell into place pretty quickly. Deletion was the hardest part: I eventually went back to the original paper, re-read the prose description, and implemented a "clean room" version of it.
Maybe because you're not actually supposed to touch the inside of a tree as much as you do a list, this actually ended up being much simpler than the linked list from last post. There's more theory, but arguably less bookkeeping. As before, I wrote it as a functional library object, then wrapped it up in an object that maintains the references to the structure in working order. I like it, and could definitely see myself using these for something in JavaScript--as you're about to see.
This time, I want to demonstrate using this structure for a real purpose, not just a get/set interface. So here's what we're going to do: when you click the Start button below, a script will pull in the contents of this post and store each word and its frequency in an AA tree structure. Don't worry, it's surprisingly quick. At that point, you can start typing into the text box below. As you do, it'll search through the tree with the closest option (which returns the last non-null branch it finds) to autocomplete a word (shown in the "do you mean..." link). If it finds an exact match, or if you click the link to use the auto-completion, it'll tell you how many times I used that word. The tree structure should be fast enough to do autocompletion in realtime, which is pretty cool.
As always, the code for the example is here, and the underlying tree implementation is here, both with as many comments as my poor eyes could stand. If you're the adventuresome type, you can access the actual tree structure from your console via window.demos.aaTreeDemo (I recommend calling window.demos.aaTreeDemo.tree.render(), if you're curious about the resulting structure). Once you've prepped the tree, you can also see a simple ASCII-style rendering of its structure (with the direction, key, data, and level of each node) using the View Tree button, but be warned: it's long.
I first learned about linked lists when I was programming in C, probably in either PalmOS or WindowsCE. In older languages like C, you don't have dynamically-sized arrays, so linked lists are an efficient way to build a data structure if you don't know how many items it may eventually contain. They're very simple: each item in the list contains both the data for a single entry and a reference to the next item in the chain. On the last item, next is set to something that will evaluate to false, like null. I find it easiest to visualize this if I think of it as working with links in a chain. Here's a chain of two items:
var x = {
data: "X",
next: y //pointer to the next item.
}
var y = {
data: "Y",
next: null //y ends the list
}
Adding an item to a linked list is pretty easy. You just traverse to the end of the list by looping until next is null, then set it to be a reference to your new item.
function addToEnd(listNode, data) {
//iterate to the end of the list
while (listNode.next != null) {
listNode = listNode.next;
}
//add the data
listNode.next = {data: data, next: null};
}
addToEnd(x, "Z"); //results in a node-list of "X" → "Y" → "Z"
In fact, we can adapt this same process to insert arbitrary nodes: set the next reference on the "left side" of the split to the new item, and the new item's next to the "right side" of the split.
function insertAfter(node, data) {
var newNode = {};
newNode.data = data;
newNode.next = node.next; // Insert this node before the next item
node.next = newNode; //Set the old node to point to this new node
}
insertAfter(y, "A"); //"X" → "Y" → "A" → "Z"
And to delete items, you just change the reference on the previous item to point to a node after the deleted item.
//for these purposes, we'll search using a node reference, not a value.
function deleteNode(list, node) {
//search for the node in the list, retaining the previous node
var previous = null;
while (list != node) {
previous = list;
list = list.next;
}
if (list) { //make sure we actually found something
previous.next = list.next; //cut our item out of the chain
}
}
deleteNode(x, y); //"X" → "A" → "Z"
There are some drawbacks to this structure, which our deleteNode() function is making clear. Namely, because there's no 'index' of what we've placed in the linked list and we only store forward references in each node, to find anything in it we have to traverse through the entire chain of next references until we find what we're looking for. For this reason, linked lists probably work best if you're unlikely to need to search for a specific value, and instead you just want to consume the contents of the list.
Also, and this feels odd in a language without pointers, we have to remember that the "list reference" we're using is actually just a reference to the first item in the list, not to the list itself. If we accidentally set that variable to an entry further down the chain (say, we stored a reference to the "A" node in x), we'll effectively lop off the first set of entries. One way to help fix both these problems is to add a previous property to each node, so that we can traverse both forward and backwards. This is known as a "double-linked list," and it's actually easy and useful enough that it's my default implementation in the finished script for this entry. For example, here's a deletion function for nodes in a double-linked list:
function deleteFromDouble(node) {
node.prev.next = node.next; //set the previous node to refer to the next
node.next.prev = node.prev; //set the next node to refer to the previous
}
//we'll get rid of the "A" node by way of its reference in "X"
deleteFromDouble(x.next); //"X" → "Z"
Implementing linked lists in JavaScript is strange. It's not that they're hard to do--JavaScript references are close enough to pointers that it works fine, and garbage collection actually makes it a lot easier, since you don't have to deallocate items after removing them from the list. But a basic linked list is kind of an 'implicit' structure: if you break it apart, what you get is two lists, because the list is created by a series of references, not by a formal container object. I couldn't initially make up my mind where to put my helper functions: did I want to create a container for the list, similar to a jQuery or underscore.js object, and attach them there? Should each node inherit from a prototype, so that each one, wormlike, can be sliced away and treated as the head of a new list? Or should they be entirely structural, with functions taking a list argument and operating on that?
In the end, I went with options B and C. I created a node prototype that could be instantiated, but then I wrote its methods as functional constructs in a separate library, and used Function.call() to wrap them up. A theoretical user of this library can choose whether she wanted to work in object-oriented or functional styles, similar to underscore.js. I say theoretical, because in a modern langauge like JavaScript that has dynamic and sparse arrays, there's not really much point to using linked lists. I've heard of people using them in ActionScript because they can be strongly-typed for speed (as opposed to arrays, which are always loosely-typed), but even there the strong-typed Vector array class added in Flash 10 probably deprecates them.
One thing that I do like about linked lists in JavaScript is that they lend themselves well to functional programming. Here's the traverse() method, which calls a cursor function on each node, moving either forward or backward:
function traverse(node, cursor, direction) {
var direction = direction == 'back' ? 'prev' : 'next'; //direction is optional
cursor(node);
while (node[direction]) {
node = node[direction];
cursor(node);
}
}
var output = '';
traverse(a, function(item) {
output += item.data;
}); //output now equals "XZ"
As clever as that is, and it does have a certain intellectual appeal, I doubt I'll be using linked lists in my projects any time soon. Arrays in modern languages are pretty flexible, and they provide indexes for access in constant time without running over the entire chain. The one case in which linked lists are arguably superior--appending and removing items from the middle of the structure--is not something that's enough of a hassle to give up indexes, widespread library support, and easy serialization. On the other hand, they were a good warmup for the structures to come: next time, we'll be getting our green thumbs going with one of the tree structures.
Now we come to the show-and-tell section. The form below allows you to manipulate a linked list. On top, there are buttons for adding and deleting items in the list. Underneath that, there's a visualization of its current state--click an item to make it a target for the add/delete buttons. Finally, there's a log of the actions being taken whenever you change the list. You can also access the list from your browser console as demos.linkedListExample. Finally, here are the source files for the list implementation and the example.
I regret relatively little about not formally studying computer science. I enjoyed my classes in Communication much more than I probably would have liked rehashing Java against, and I've always heard that CS classes are relatively poor at preparing students for real-world programming tasks anyway. Also, it makes me feel slightly less like a giant nerd that I didn't actually study this stuff.
For a COMM major and a journalist, I think I'm a pretty good coder anyway. I understand pointers and recursion, which are big stumbling blocks for people. I can write in functional, procedural, or object-oriented styles. I can code a front end or a relational database query. But all that aside, I've always felt that the one thing I did miss out on, going to self-training route, was learning about data in a (pardon the pun) structured environment. These days, I consider data structures to be the distinguishing factor between a decent programmer and a great one. And while I know my way around the basics, there are a lot of patterns that I don't understand: the kinds of trees that are used in databases or complicated searches, for example, or anything that uses big O notation. I think it's time I figured that stuff out.
So I'm starting this microblog--as close as Blosxom gets to Tumblr, I suspect--as a place to work through as many items on Wikipedia's list of data structures as possible (skipping the boring ones and the minor derivations, of course). I'll be implementing them in JavaScript, for a few reasons: it's the language where I spend most of my time, it's the lingua franca of the web, and it will allow me to create little in-browser demonstrations. I'll talk about each structure, discuss how the language choice impacted my implementation, and link to the completed version so you can take a look for yourself. I'll be starting in the linear structures with linked lists, but I'll probably jump around quite a bit.
In the interests of keeping this lightweight, I'll probably host comments on Google+ as soon as I can figure out how to do so. Please feel free to join me in the experiment, make suggestions or criticisms, and ask questions.
--Thomas Wilburn