Graph play

Using tables to store data in a database is so common many developers often apply the same relational-database-voodoo solutions and focus on code architecture instead. It’s a convenient trap one might fall into when deadlines are tough or little attention is given to the type of information being stored. It’s amazing how little graphs are considered as a solution to represent data, often perceived as too complex to translate into code.

Ultimately the persistence of graph data translates into a table but the way you handle your data is completely different.

The application of graphs in programming is huge, more than the typical minimum distance problems!

Basics

It is well worth to explore the mathematical definitions but I won’t be covering so deeply every concept. Instead I will present a few simple examples that illustrate how much you can do with so little.

First of all it’s important to understand what I mean by node graph:

Undirected 3 node graph

Undirected 3 node graph

Notice the example above is an undirected graph, this means it’s possible to traverse the graph starting on A all the way to B and vice-versa. This wouldn’t be the case if we had a directed graph instead:

Directed 3 node graph

Directed 3 node graph

In this case we have the same graph but once we go from A to B we can’t go back! Practically this can be translated to:

graph -> getEdge('A','B'); //true
graph -> getEdge('B','A'); //undefined

With direction we can define path constraints!

Distance

Say you’d like to add a value by going through an edge, in that case you are adding a distance or simply a label. With labels you can describe an operation:

Undirected labeled graph

Undirected labeled graph

graph -> getEdge('A','B'); //+
graph -> getEdge('B','A'); //+
graph -> getEdge('B','C'); //-
graph -> getEdge('C','B'); //-

…similarly with distance values you can add rates:

Undirected weighted graph

Undirected weighted graph

graph -> getEdge('A','B'); //0.5
graph -> getEdge('B','A'); //0.5
graph -> getEdge('B','C'); //1.2
graph -> getEdge('C','B'); //1.2

Multigraphs

Multigraphs offer a more broad range of solutions for complex graphs. We simply allow nodes to have more than one edge connecting the same nodes, how? You add a name to the edge of course.

Undirected 3 node multigraph

Undirected 3 node multigraph

graph -> getEdge('A','B', 'input'); //ioread
graph -> getEdge('A','B', 'output'); //iowrite
graph -> getEdge('B','C', 'input'); //fileread
graph -> getEdge('B','C', 'output'); //filewrite

This example is merely illustrative, my point being: if you’re not used to graph theory I hope by now you start understanding the potential of what you can do if you own a good graph library.

It’s amazing the amount of times I’ve seen business logic hammered programatically. With graphs you can hide complexity and apply clear business logic loaded from a database (for example) promoting cleaner code instead of applying long, nested if blocks or unending configuration files.

Concrete example

Before yawns are had I’ll play a bit with an example and for the sake of it I’ll use javascript with the cool public graphlib.

Say we have a worldwide online shop with multiple outposts (1 per continent) and we want to access shipping costs of a package. Before shipping to the customers’ address the package has to first be shipped along to the nearest outpost. The scenario would look something like this:

NA - North America SA - South America CE - Central Europe AF - Africa AS - Asia AU - Australia

NA – North America
SA – South America
CE – Central Europe
AF – Africa
AS – Asia
AU – Australia

Let also add some prices per trip:

From To Value
CE NA 14.85 $
CE SA 13.15 $
CE AF 6.20 $
CE AS 23.60 $
SA AF 13.15 $
AS AU 9.65 $

… this would be a bit boring so lets add a special constraint to spice it up. Our outpost in Africa managed to negotiate a great discount, valid only when shipping to North America. The shipping path stays the same but now we have a price exception.

From To Value
AF NA 13.15 $

The problem with the exception is that if we wanted to retrieve the shortest path and the lowest cost possible it wouldn’t work with one graph alone, i.e. if we imported both tables onto our graph via setEdge() every time we computed the shortest path between Africa and North America the result would be a direct connection between Africa and North America thus violating our first constraints.

No biggie we can solve this with a secondary graph with the same nodes but different paths. This concept can also be called using a layered graph. Here’s a code draft for the solution:

import graphlib from 'graphlib';

let costGraph;
let costMap;
let pathGraph;
let pathMap;

buildGraphs(data) {
  const options = { directed: false, compound: false, multigraph: false };
  pathGraph = new graphlib.Graph(options);
  costGraph = new graphlib.Graph(options);

  data.paths.forEach((edge) => {
    pathGraph.setEdge(edge.from, edge.to);
  });

  data.costs.forEach((edge) => {
    costGraph.setEdge(edge.from, edge.to, Number(edge.value));
  });

  costMap = graphlib.alg.floydWarshall(
    costGraph,
    (e) => costGraph.edge(e),
    (e) => costGraph.nodeEdges(e)
  );

  pathMap = graphlib.alg.floydWarshall(
    pathGraph,
    (e) => 1,
    (e) => pathGraph.nodeEdges(e)
  );
}

getPath(from, to) {
  const subMap = pathMap[from];
  const path = [];

  path.push(to);
  return buildPath(subMap, path, from, to); //Build path recursively
}

getCost(from, to) {
  return costMap[from][to].distance;
}

As you can notice, I created 2 variables called maps to store the path computation offered by graphlib. This way I can simply query what I need in separate methods keeping the rest of the code clean, if I require to reload graph data I can call buildGraph with the data in the format of JSON

{
  paths: [{ from: 'CE', to: 'NA' }, { from: 'CE', to: 'SA' }...]
  costs: [{ from: 'CE', to: 'AF', value: '6.20' }, { from: 'AF', to: 'NA', value: '13.15' }...]
}

Notice the separation of the chaff from the wheat: the costs are now all associated together in the same graph with discounts included and I kept the paths lean with the only paths allowed.

Representation

I got tired of drawing illustrations, how about we generate a graph from data? For this I decided to use one of the most powerful graph libraries in the javascript world: D3. The best part about D3 is that there are enough examples you can use as a first step. Once you have chosen the kind of graph representation you want to use, it also offers a powerful API to play around with. For this example I will use a force directed layout.

Formatting a D3 data file is as easy as formatting a JSON object:

import _ from 'lodash';
import appRoot from 'app-root-path';
import fs from 'fs';

exportGraph(graph) {

  let formatData = {
    nodes: [],
    links: []
  };

  let nodes = graph.nodes;
  let links = graph.edges;

  _.forEach(nodes, (row, key) => {
    formatData.nodes.push({
      name: row.v,
      group: key
    });
  });

  _.forEach(links, (row) => {
    this.format.links.push({
      source: _.find(formatData.nodes, { name:row.v }).group,
      target: _.find(formatData.nodes, { name:row.w }).group,
      value:row.value
    });
  });

  fs.writeFile(appRoot + '/' + 'graph.json', JSON.stringify(formatData), function (err) {
    if (err) throw err;
  });
}

With some help from my friend, lodash, the task looks pretty clean.

Finally for the actual frontend code, I got an example from the d3 reference repo and tweaked with it a bit, you can run the example I present on my own repo by building the project and pointing a simple server to the index.html in the d3 folder. For example I used a chrome extension for this purpose.

However I left the example unfinished, the paths don’t represent correctly our layered graph. It’s not that hard to tweak it and it’s a fun exercise I’ll leave up to you.

The fun never ends

Sometimes you need to see the application to understand the potential and that’s what I aimed at with this post… this is just the tip of the iceberg that is graph theory!

Don’t be intimidated to search GitHub or even Google for good graph libraries that do all the heavy lifting for you. Many are also well tested even for production use.

References:

1 – Repo with a code example

2 – Graphlib for javascript

3 – D3.js

By |August 30th, 2015|Categories: ALL|

About the Author:

joaohrpereira@gmail.com'
Software engineer passionate about the web and mobile technologies. On my free time I usually read, code and travel.