Using tables to store data in a database is so common many developers often apply the same relational-database-voodoo solutions and focus on code architecture instead. It’s a convenient trap one might fall into when deadlines are tough or little attention is given to the type of information being stored. It’s amazing how little graphs are considered as a solution to represent data, often perceived as too complex to translate into code.

Ultimately the persistence of graph data translates into a table but the way you handle your data is completely different.

The application of graphs in programming is **huge**, more than the typical minimum distance problems!

### Basics

It is well worth to explore the mathematical definitions but I won’t be covering so deeply every concept. Instead I will present a few simple examples that illustrate how much you can do with so little.

First of all it’s important to understand what I mean by node graph:

Notice the example above is an **undirected graph**, this means it’s possible to traverse the graph starting on **A** all the way to **B** and vice-versa. This wouldn’t be the case if we had a **directed graph** instead:

In this case we have the same graph but once we go from **A** to **B** we can’t go back! Practically this can be translated to:

graph -> getEdge('A','B'); //true graph -> getEdge('B','A'); //undefined

With **direction** we can define **path constraints**!

#### Distance

Say you’d like to add a value by going through an edge, in that case you are adding a **distance** or simply a **label**. With labels you can describe an **operation**:

graph -> getEdge('A','B'); //+ graph -> getEdge('B','A'); //+ graph -> getEdge('B','C'); //- graph -> getEdge('C','B'); //-

…similarly with distance values you can add **rates**:

graph -> getEdge('A','B'); //0.5 graph -> getEdge('B','A'); //0.5 graph -> getEdge('B','C'); //1.2 graph -> getEdge('C','B'); //1.2

#### Multigraphs

Multigraphs offer a more broad range of solutions for complex graphs. We simply allow nodes to have more than one edge connecting the same nodes, how? You add a name to the edge of course.

graph -> getEdge('A','B', 'input'); //ioread graph -> getEdge('A','B', 'output'); //iowrite graph -> getEdge('B','C', 'input'); //fileread graph -> getEdge('B','C', 'output'); //filewrite

This example is merely illustrative, my point being: if you’re not used to graph theory I hope by now you start understanding the potential of what you can do if you own a good **graph library**.

It’s amazing the amount of times I’ve seen business logic **hammered** programatically. With graphs you can **hide complexity** and apply clear **business logic** loaded from a database (for example) promoting cleaner code instead of applying long, nested **if** blocks or unending configuration files.

### Concrete example

Before yawns are had I’ll play a bit with an example and for the sake of it I’ll use javascript with the cool public graphlib.

Say we have a worldwide online shop with multiple outposts (1 per continent) and we want to access shipping costs of a package. Before shipping to the customers’ address the package has to first be shipped along to the nearest outpost. The scenario would look something like this:

Let also add some prices per trip:

From | To | Value |
---|---|---|

CE | NA | 14.85 $ |

CE | SA | 13.15 $ |

CE | AF | 6.20 $ |

CE | AS | 23.60 $ |

SA | AF | 13.15 $ |

AS | AU | 9.65 $ |

… this would be a bit boring so lets add a special constraint to spice it up. Our outpost in Africa managed to negotiate a great discount, valid only when shipping to North America. The shipping path stays the same but now we have a price exception.

From | To | Value |
---|---|---|

AF | NA | 13.15 $ |

The problem with the exception is that if we wanted to retrieve the shortest path **and** the lowest cost possible it wouldn’t work with one graph alone, i.e. if we imported both tables onto our graph via **setEdge()** every time we computed the shortest path between Africa and North America the result would be a direct connection between Africa and North America thus violating our first constraints.

No biggie we can solve this with a secondary graph with the same nodes but different paths. This concept can also be called using a **layered graph**. Here’s a code draft for the solution:

import graphlib from 'graphlib'; let costGraph; let costMap; let pathGraph; let pathMap; buildGraphs(data) { const options = { directed: false, compound: false, multigraph: false }; pathGraph = new graphlib.Graph(options); costGraph = new graphlib.Graph(options); data.paths.forEach((edge) => { pathGraph.setEdge(edge.from, edge.to); }); data.costs.forEach((edge) => { costGraph.setEdge(edge.from, edge.to, Number(edge.value)); }); costMap = graphlib.alg.floydWarshall( costGraph, (e) => costGraph.edge(e), (e) => costGraph.nodeEdges(e) ); pathMap = graphlib.alg.floydWarshall( pathGraph, (e) => 1, (e) => pathGraph.nodeEdges(e) ); } getPath(from, to) { const subMap = pathMap[from]; const path = []; path.push(to); return buildPath(subMap, path, from, to); //Build path recursively } getCost(from, to) { return costMap[from][to].distance; }

As you can notice, I created 2 variables called **maps** to store the path computation offered by **graphlib**. This way I can simply query what I need in separate methods keeping the rest of the code clean, if I require to reload graph data I can call **buildGraph** with the data in the format of JSON

{ paths: [{ from: 'CE', to: 'NA' }, { from: 'CE', to: 'SA' }...] costs: [{ from: 'CE', to: 'AF', value: '6.20' }, { from: 'AF', to: 'NA', value: '13.15' }...] }

Notice the separation of the chaff from the wheat: the costs are now all associated together in the same graph with discounts included and I kept the paths lean with the only paths allowed.

#### Representation

I got tired of drawing illustrations, how about we generate a graph from data? For this I decided to use one of the most powerful graph libraries in the javascript world: **D3**. The best part about D3 is that there are **enough** examples you can use as a first step. Once you have chosen the kind of graph representation you want to use, it also offers a powerful API to play around with. For this example I will use a force directed layout.

Formatting a D3 data file is as easy as formatting a JSON object:

import _ from 'lodash'; import appRoot from 'app-root-path'; import fs from 'fs'; exportGraph(graph) { let formatData = { nodes: [], links: [] }; let nodes = graph.nodes; let links = graph.edges; _.forEach(nodes, (row, key) => { formatData.nodes.push({ name: row.v, group: key }); }); _.forEach(links, (row) => { this.format.links.push({ source: _.find(formatData.nodes, { name:row.v }).group, target: _.find(formatData.nodes, { name:row.w }).group, value:row.value }); }); fs.writeFile(appRoot + '/' + 'graph.json', JSON.stringify(formatData), function (err) { if (err) throw err; }); }

With some help from my *friend*, **lodash**, the task looks pretty clean.

Finally for the actual frontend code, I got an example from the d3 reference repo and tweaked with it a bit, you can run the example I present on my own repo by building the project and pointing a simple server to the index.html in the d3 folder. For example I used a chrome extension for this purpose.

However I left the example unfinished, the paths don’t represent correctly our layered graph. It’s not that hard to tweak it and it’s a fun exercise I’ll leave up to you.

### The fun never ends

Sometimes you need to see the application to understand the potential and that’s what I aimed at with this post… this is just the tip of the iceberg that is **graph theory**!

Don’t be intimidated to search GitHub or even Google for good graph libraries that do all the heavy lifting for you. Many are also well tested even for production use.