February 2, 2013

An example of KeyValue indexation for graphs

The DB architecture was defined in the previous post, let's illustrate it with a graph example
{  
  'graph:1:A': {
    'graph:1:D': 'knows',
    'graph:1:B': 'knows'
  },'graph:1:B': {
    'graph:1:C': 'knows'
  },'graph:1:C': {
    'graph:1:D': 'knows',
    'graph:1:E': 'knows'
  },'graph:1:D': {
  },'graph:1:E': {
  }
}
These data items can be created with
{type: 'create', data: {'graph:1:D': 'knows', 'graph:1:B': 'knows'}, id:'graph:1:A'}
//etc..

which produces in the inverted-index table, the rows
{  
  'graph:1:D=knows': {
    'graph:1:A': null,
    'graph:1:C': null
  },'graph:1:B=knows': {
    'graph:1:A': null
  },'graph:1:B=knows&graph:1:D=knows': {
    'graph:1:A': null
  },'graph:1:E=knows': {
    'graph:1:C': null
  },'graph:1:D=knows&graph:1:E=knows': {
    'graph:1:C': null
  },'graph:1:C=knows': {
    'graph:1:B': null
  }
}
The keys contain ':', this notations called composite column keys is actually useful inside rows (As the columns in a row are sorted, searching by one composite prefix is possible, so Cassandra allows to search 'graph:1:*' or other more specific range query ['graph:1:D', 'graph:1:J']) but all this isn't useful when applied on row keys directly, the row keys partitioners, usually chosen, don't preserve ordering, so this technic is simply a way to prefix, partition data when it is wanted.

The above example is actually modeling outgoing connections in the data items, and the in-going ones are in the inversed-indexes items.

we can read outgoing connections for a node with:
{type: 'read', id:'graph:1:C'}
or read its ingoing connections with
{type: 'search', data: {'graph:1:C': 'knows'}}
search nodes with A and C as predecessors
{type: 'search', data: {'graph:1:C': 'knows', 'graph:1:A': 'knows'}}
searching nodes with A or C as predecessor
{type: 'search', data: [{'graph:1:C': 'knows'}, {'graph:1:A': 'knows'}]}

Note that there is a dissymetry here, ingoing connections (predecessors) queries can be richer than direct read of data (successors), you can't concretely search for nodes that would have A and C as successors. The way to deal with this is to add in the data the other connections directions, e.g. 'graph:1:B': {'graph:1:A': 'isKnownBy'}...
More complex queries will be the responsibility of the app. Although we plan to extend the API to execute a sequence of searches, instead of multiple API calls.
ex:
{
type: 'search',
data: {'graph:1:C': 'knows'},
each_i: {
type: 'search',
data: {i: 'knows'},
each_i: {type:'read', id: i}
}
}