January 26, 2015
Maze solver
Solving mazes: http://caub.github.io/misc/maze, done somewhat shamelessly with this great pathfinding library.
It searches paths through the image's pixels; here's a fiddle illustrating it: http://jsfiddle.net/crl/6amoefe0/
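The real solver relies on the library linked above; the core idea is simple enough to sketch by hand, though: read the maze image's pixels, treat light pixels as walkable, and run a breadth-first search between two endpoints (the brightness threshold and the 4-neighbour moves below are simplifying assumptions, not the actual code):
// Rough sketch (not the actual solver): BFS over image pixels,
// where near-white pixels are considered walkable.
function solveMaze(imageData, start, goal) {
  const { width, height, data } = imageData; // e.g. from ctx.getImageData(0, 0, w, h)
  const walkable = (x, y) => data[(y * width + x) * 4] > 200; // red channel as brightness
  const key = (x, y) => y * width + x;
  const prev = new Map([[key(start.x, start.y), null]]);
  const queue = [start];
  while (queue.length) {
    const { x, y } = queue.shift();
    if (x === goal.x && y === goal.y) break;
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = x + dx, ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      if (!walkable(nx, ny) || prev.has(key(nx, ny))) continue;
      prev.set(key(nx, ny), { x, y });
      queue.push({ x: nx, y: ny });
    }
  }
  // rebuild the path by walking back from the goal
  const path = [];
  for (let p = goal; p; p = prev.get(key(p.x, p.y))) path.push(p);
  return path.reverse();
}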
December 23, 2014
DBify
A simple app that lets you call your DB directly; it's meant for playing and testing only.
Examples:
post("/sql", ["select * from test where hello=?", 1])
post("/mongo", ["find", {"hello": 1}])
Demo: https://dbify.herokuapp.com/
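For clarity, the post() calls in the examples are plain HTTP POSTs with a JSON body; a minimal sketch of such a helper (the exact payload and response shapes of the app are assumptions here):
// Hypothetical sketch of the post() helper: send the array as JSON and parse the JSON reply.
async function post(path, body) {
  const res = await fetch('https://dbify.herokuapp.com' + path, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}
// post("/sql", ["select * from test where hello=?", 1]).then(console.log)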
February 22, 2014
MongoCli: MongoDB + websocket + knockoutJS
Mongo-cli is a web 'Swiss-army knife' that I made last year for conveniently saving any type of data from JavaScript, with read/write/modify permissions per person and a notification sent to the group of people concerned (the allowed readers).
So it's a minimal Content Management System (minimal both in features and in code size/complexity).
It's named mongo-cli (Command Line Interface) because it's used as an interactive shell to MongoDB, straight from JavaScript:
send({fn:'insert', args:[{a: 2, hello: 'world'}]}, console.log)
send({fn:'find', args:[{a: 2}]}, console.log)
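For the curious, send() is only a thin wrapper around a WebSocket; a minimal sketch of such a helper (the endpoint path and the naive reply matching are assumptions, and the real client also deals with permissions and change notifications):
// Hypothetical sketch of the send() helper: JSON messages over a WebSocket,
// matching replies to callbacks by order.
var ws = new WebSocket('ws://' + location.host + '/ws');
var pending = [];
ws.onmessage = function (e) {
  var cb = pending.shift();
  if (cb) cb(JSON.parse(e.data));
};
function send(msg, cb) {
  pending.push(cb);
  ws.send(JSON.stringify(msg));
}
// send({fn: 'find', args: [{a: 2}]}, console.log)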
Demos:
- heroku nodejs (Paint, Chat, Todos)
- heroku java (Paint, Chat, Todos)
- cloudbees java
It's on github:
https://github.com/n11/mongo-cli-nodejs
https://github.com/n11/mongo-cli-java
February 4, 2013
Key-Value Indexation on Cassandra
During my work at Inria Sophia-Antipolis, there was an exciting challenge: adding search logic on top of the Cassandra database, which is a bit 'raw' and designed for efficiency.
Cassandra tables are structured in a key-value scheme like DHTs, with the difference that values are themselves ordered* maps of inner key-value tuples called columns; the table is called a column family. (*the ordering is specified at table creation)
CQL3 index creation is an alternative way, but it wouldn't be very adequate here, since any possible field name may need an index.
A realistic way would be, for example, tree structures such as B-trees; in our use case, however, we chose to explore a very simple method: inverted indexes, which act simply like a flat directory lookup.
Let's illustrate it with an example:
{
'a5e9c146f': {
'text': 'The quick brown fox jumps over lazy dog',
'title': 'Fox',
'author': 'mike123@gmail.com',
'date': '2012-11-12'
}
}
{
'author=mike123%40gmail.com' : {'a5e9c146f': null},
'text=quick' : {'a5e9c146f': null},
'text=brown' : {'a5e9c146f': null},
'date=2012-11-12' : {'a5e9c146f': null},
//...
key=value pairs are generated from the original data, either with the exact value or post-processed in special cases (in this example, the 'text' value is split into single words). Column entries are simply formed with a unique data id and an empty value (we don't need it).
Row-keys can also be compositions of name-value pairs sorted alphabetically and concatenated:
//...
'author=mike123%40gmail.com&text=quick' : {'a5e9c146f': null},
'author=mike123%40gmail.com&text=brown' : {'a5e9c146f': null},
'text=quick&title=Fox' : {'a5e9c146f': null},
//..
'date=2012-11-12&text=fox&text=quick' : {'a5e9c146f': null},
//...
}
You can see how this grows: for k key=value pairs we would end up with on the order of 2^k insertions. This exponential growth is not acceptable, so we limit the length of composites to min(3, k), for instance. That way we don't exceed Sum(i=1..min(3,k)) of C(k, i) insertions, which scales much better, in O(k³). In practice there is also a good chance that rows like text=the or text=dog already exist in the database, so we are just appending this data id as a column to those rows.
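To make that concrete, here's a small sketch (not the actual code) of how those composite row keys could be generated from a data item's key=value pairs, capped at 3 pairs per composite:
// Sketch: generate all composite index row keys of size 1..3
// from the key=value pairs of one data item.
function indexKeys(pairs, maxLen = 3) {
  const sorted = pairs.slice().sort(); // e.g. ['text=brown', 'text=quick', 'title=Fox']
  const keys = [];
  const build = (start, combo) => {
    if (combo.length) keys.push(combo.join('&'));
    if (combo.length === maxLen) return;
    for (let i = start; i < sorted.length; i++) build(i + 1, combo.concat(sorted[i]));
  };
  build(0, []);
  return keys;
}
// indexKeys(['title=Fox', 'text=quick', 'text=brown'])
// -> ['text=brown', 'text=brown&text=quick', 'text=brown&text=quick&title=Fox', ...]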
Search queries
{
'title': ['Fox'], 'text': ['brown', 'quick']
}
will be a query matching our example above, thanks to the index row 'text=brown&text=quick&title=Fox'. For larger composites exceeding 3 elements, the first 3 elements are queried directly against the database, then the next 3, and so on until the last 3 (in order to reduce the result counts), before intersecting these different result sets.
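And a rough sketch of that query side, assuming a hypothetical getRow(indexKey) helper that returns the Set of data ids stored in an index row:
// Sketch: answer an 'AND' query by looking up composite index rows of
// at most 3 pairs each, then intersecting the resulting id sets.
async function searchAnd(query, getRow) {
  const pairs = Object.entries(query)
    .flatMap(([k, vs]) => vs.map(v => `${k}=${v}`))
    .sort();
  const idSets = [];
  for (let i = 0; i < pairs.length; i += 3) {
    idSets.push(await getRow(pairs.slice(i, i + 3).join('&')));
  }
  // intersect all the sets of ids
  return idSets.reduce((acc, s) => new Set([...acc].filter(id => s.has(id))));
}
// searchAnd({ title: ['Fox'], text: ['brown', 'quick'] }, getRow)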
We can also form 'OR' queries, which perform parallel queries and merge the results into the same map:
[
{'title':['Fox'], 'text':['brown','quick']},
{'title':['Fox'], 'text':['test']},
{'title':['Fox'], 'date':['2012-11-12']}
]
Range queries
With numbers, we rather want to search over a range. Cassandra has a built-in column-key comparator (range queries are allowed within a row) that could serve for 1-dimensional ranges, but we preferred a DB-agnostic method based on k-ary trees (in particular decimal trees).
For instance, to index '2012-11-12' (1352678400 in Unix seconds), we add all its starting prefixes as extra indexed pairs of this data item:
{
'_t=1' : {'a5e9c146f': null},
//..
'_t=1352678400' : {'a5e9c146f': null},
//..
'_t=1&_t=13&text=quick' : {'a5e9c146f': null},
'_t=13&_t=135&text=quick' : {'a5e9c146f': null},
'_t=135&_t=1352&text=quick' : {'a5e9c146f': null},
//...
'_t=135267&text=quick' : {'a5e9c146f': null},
//...
'_t=131&_t=13526&_t=1352678400' : {'a5e9c146f': null},
//...
Searching over a range, say the whole year 2012 [1325376000, 1356998400], is done by finding the 'best' covering indexes; the more accurate the covering, the larger the resulting 'OR' query. For example, [132,133,134,135] has a resolution of only 3 digits (each prefix covers an interval of 1e7 seconds ≈ 115 days), while [1325,1326,1327,1328,1329,133,134,1350,1351,1352,1353,1354,1355,1356] has a resolution of 4 (thus at worst 1e6 seconds ≈ 11 days of error), etc.
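Here's a rough sketch of how such a covering could be computed for decimal prefixes (a hypothetical coveringPrefixes helper, assuming timestamps are indexed as 10-digit strings, not the actual implementation):
// Sketch: find decimal prefixes covering [lo, hi], refining only the
// partially covered prefixes at each level, down to a maximum prefix length.
function coveringPrefixes(lo, hi, maxLen = 4, totalLen = 10) {
  let candidates = [''];                 // start from the empty prefix
  const covering = [];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const p of candidates) {
      for (let d = 0; d <= 9; d++) {
        const prefix = p + d;
        // interval of timestamps covered by this prefix
        const min = Number(prefix.padEnd(totalLen, '0'));
        const max = Number(prefix.padEnd(totalLen, '9'));
        if (max < lo || min > hi) continue;                 // outside the range
        if (min >= lo && max <= hi) covering.push(prefix);  // fully inside: keep it
        else if (len < maxLen) next.push(prefix);           // partially inside: refine further
        else covering.push(prefix);                         // maxLen reached: accept the overshoot
      }
    }
    candidates = next;
  }
  return covering;
}
// coveringPrefixes(1325376000, 1356998400, 3) -> ['132', '133', '134', '135']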
For 2D, we can continue to apply this method for each coordinate:
{
'_lat=8e21&_lat=8e21f&_lon=1a1242' : {'a5e9c146f': null},
//...
Searching in a bounding box is similar, by covering ranges of each dimension.
Note: there exist alternatives like space-filling curves, to go down to 1D (see this blog).
Conclusion
- Searches are fast (mainly when staying under 3 'AND' search terms)
- Insertion, update and deletion need to regenerate those indexes each time.
- We index (and search) K-V terms in different ways: either directly (exact value of V), by range, or with semantic parsing of V (e.g. stemming, de-conjugating verbs, removing plurals ... similar to Lucene text processing).
- An optional object in the query maps a specific field in the data to a defined handler in our backend, e.g. range1|range2|...|range15|text|binary (rangeN generates indexes from the first N subprefixes)
{type: 'create', data: {t: 958, name:'Bolt'}, handler: {t: 'range4'}} // generates indexes [0,09,095,0958]
{type: 'search', data: {t: [900,1000]}, handler: {t: 'range4'}} //finds [09] as the optimal list of indexes to search, "handler: {t: 'range4'}" is optional, it will enforce a maximal range value of length 4
February 2, 2013
An example of KeyValue indexation for graphs
{
  'graph:1:A': {
    'graph:1:D': 'knows',
    'graph:1:B': 'knows'
  },
  'graph:1:B': {
    'graph:1:C': 'knows'
  },
  'graph:1:C': {
    'graph:1:D': 'knows',
    'graph:1:E': 'knows'
  },
  'graph:1:D': {},
  'graph:1:E': {}
}
These data items can be created with
{type: 'create', data: {'graph:1:D': 'knows', 'graph:1:B': 'knows'}, id: 'graph:1:A'}
//etc..
which produces, in the inverted-index table, the following rows:
{
  'graph:1:D=knows': {
    'graph:1:A': null,
    'graph:1:C': null
  },
  'graph:1:B=knows': {
    'graph:1:A': null
  },
  'graph:1:B=knows&graph:1:D=knows': {
    'graph:1:A': null
  },
  'graph:1:E=knows': {
    'graph:1:C': null
  },
  'graph:1:D=knows&graph:1:E=knows': {
    'graph:1:C': null
  },
  'graph:1:C=knows': {
    'graph:1:B': null
  }
}
The keys contain ':'. This notation, called composite column keys, is actually useful inside rows: since the columns within a row are sorted, searching by a composite prefix is possible, so Cassandra lets you search for 'graph:1:*' or run a more specific range query like ['graph:1:D', 'graph:1:J']. It isn't as useful when applied to row keys directly, because the commonly chosen row-key partitioners don't preserve ordering; there, this technique is simply a way to prefix and partition data when wanted.
The example above actually models outgoing connections in the data items, while the incoming ones live in the inverted-index items.
We can read the outgoing connections of a node with:
{type: 'read', id:'graph:1:C'}
or read its incoming connections with:
{type: 'search', data: {'graph:1:C': 'knows'}}
Search for nodes with both A and C as predecessors:
{type: 'search', data: {'graph:1:C': 'knows', 'graph:1:A': 'knows'}}
Or search for nodes with A or C as predecessor:
{type: 'search', data: [{'graph:1:C': 'knows'}, {'graph:1:A': 'knows'}]}
Note that there is an asymmetry here: queries on incoming connections (predecessors) can be richer than direct reads of the data (successors); you can't, for instance, search for nodes that would have both A and C as successors. The way to deal with this is to also store the opposite direction of the connections in the data, e.g. 'graph:1:B': {'graph:1:A': 'isKnownBy'}...
More complex queries will be the responsibility of the app, although we plan to extend the API to execute a sequence of searches instead of multiple API calls.
ex:
{
type: 'search',
data: {'graph:1:C': 'knows'},
each_i: {
type: 'search',
data: {i: 'knows'},
each_i: {type:'read', id: i}
}
}
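Until that extension exists, the app would simply chain the calls itself; a rough sketch of the nested example above, assuming a hypothetical query(msg) helper that performs one API call and returns the matching ids (or the data, for 'read'):
// Sketch: the nested query above, done today as a sequence of API calls.
async function twoHopsBack(query) {
  const firstHop = await query({ type: 'search', data: { 'graph:1:C': 'knows' } }); // predecessors of C
  const results = [];
  for (const id of firstHop) {
    const secondHop = await query({ type: 'search', data: { [id]: 'knows' } });     // predecessors of each
    for (const id2 of secondHop) {
      results.push(await query({ type: 'read', id: id2 }));                          // read their data
    }
  }
  return results;
}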