Sunday, June 10, 2012

neo4j: Using Cypher Query Language with .NET


After reading the neo4j REST documentation, I created a JSONCypherQuery class with a few custom methods in order to run custom Cypher queries from a .NET application.

Creating Classes for Serialization


Below are an example request and response from the neo4j REST API documentation:


Example request
  • POST http://localhost:7474/db/data/cypher
  • Accept: application/json
  • Content-Type: application/json
{"query": "start x = node(27) match x -[r]-> n return type(r), n.name?, n.age?", "params": {}}
Example response
  • 200: OK
  • Content-Type: application/json
{
  "data" : [ [ "know", "him", 25 ], [ "know", "you", null ] ],
  "columns" : [ "type(r)", "n.name?", "n.age?" ]
}
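To make the shape of the response concrete: each inner array in "data" is one row, and its values line up with the names in "columns". The sketch below turns the example response into one dictionary per row. It is written as C# top-level statements and uses System.Text.Json (.NET Core 3.0 or later) instead of the Json.NET library used in this post, so it runs without extra packages; the ToRows name is only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// The example response quoted above from the REST documentation.
string exampleResponse = @"{
  ""data"" : [ [ ""know"", ""him"", 25 ], [ ""know"", ""you"", null ] ],
  ""columns"" : [ ""type(r)"", ""n.name?"", ""n.age?"" ]
}";

// Pair every row in "data" with the names in "columns": one dictionary per row.
// Values stay JsonElement so the caller decides how to convert them.
List<Dictionary<string, JsonElement>> ToRows(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    var columnNames = new List<string>();
    foreach (JsonElement column in doc.RootElement.GetProperty("columns").EnumerateArray())
        columnNames.Add(column.GetString());

    var result = new List<Dictionary<string, JsonElement>>();
    foreach (JsonElement row in doc.RootElement.GetProperty("data").EnumerateArray())
    {
        var values = new Dictionary<string, JsonElement>();
        int i = 0;
        foreach (JsonElement cell in row.EnumerateArray())
            values[columnNames[i++]] = cell.Clone(); // Clone() outlives the disposed document
        result.Add(values);
    }
    return result;
}

var rows = ToRows(exampleResponse);
Console.WriteLine(rows[0]["n.name?"].GetString()); // him
Console.WriteLine(rows[1]["n.age?"].ValueKind);    // Null
```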

To create a request string formatted the same way as
{"query": "start x = node(27) match x -[r]-> n return type(r), n.name?, n.age?", "params": {}}

we can create a class


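using Newtonsoft.Json;

class JSONQueryCommand
{
    [JsonProperty("query")]
    public string Query { get; set; }

    // "params" is a JSON object (parameter name -> value), so the property is
    // typed object, e.g. a Dictionary<string, object>, rather than string.
    // A null value is serialized as "params":null.
    [JsonProperty("params")]
    public object Parameters { get; set; }
}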

together with a method that posts the request and returns the response as a string.
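In the documentation "params" is a JSON object that maps parameter names to values, not a string. A minimal sketch of building the request body in that shape, assuming System.Text.Json and a hypothetical BuildCypherBody helper (C# top-level statements): an anonymous type keeps the property names and order, and an empty anonymous object produces "params": {} exactly as in the documentation example. The {id} placeholder in the second query assumes the {name} parameter syntax of Cypher from that period and is only an illustration.

```csharp
using System;
using System.Text.Json;

// Build {"query": ..., "params": {...}}; "params" must be a JSON object, not a string.
string BuildCypherBody(string query, object parameters)
{
    // @params is needed because "params" is a C# keyword; the JSON name stays "params".
    var body = new { query = query, @params = parameters };
    return JsonSerializer.Serialize(body);
}

// Without parameters, send an empty object, as in the documentation example.
string plain = BuildCypherBody("start x = node(27) return x", new { });
Console.WriteLine(plain);
// {"query":"start x = node(27) return x","params":{}}

// With one (hypothetical) parameter named id.
string withParams = BuildCypherBody("start x = node({id}) return x", new { id = 27 });
Console.WriteLine(withParams);
// {"query":"start x = node({id}) return x","params":{"id":27}}
```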

Handling Requests and Responses

The method below wraps a Cypher query string in an appropriate web request, posts it to the server and returns the response body:


using System.IO;
using System.Net;
using Newtonsoft.Json;

private static string CreateRequest(string query)
{
    // Cypher plugin endpoint; the REST documentation example above posts to
    // http://localhost:7474/db/data/cypher instead.
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query");
    req.Accept = "application/json";
    req.Method = "POST";
    req.ContentType = "application/json";
    string parameters = null;
    JSONQueryCommand cmd = new JSONQueryCommand();
    cmd.Query = query;
    cmd.Parameters = parameters;
    string json = JsonConvert.SerializeObject(cmd);

    // Write the JSON body to the request stream.
    using (var streamWriter = new StreamWriter(req.GetRequestStream()))
    {
        streamWriter.Write(json);
    }

    // No empty catch block: a WebException (server down, invalid query) now
    // reaches the caller instead of silently producing an empty string.
    using (var httpResponse = (HttpWebResponse)req.GetResponse())
    using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
    {
        return streamReader.ReadToEnd();
    }
}
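In current .NET versions HttpWebRequest is marked obsolete in favour of HttpClient. As an alternative to CreateRequest, here is a sketch using HttpClient types (C# top-level statements); BuildCypherRequest and PostCypherAsync are hypothetical names, and the endpoint passed in is the /db/data/cypher URL from the documentation example (the plugin URL would be passed the same way). Only the request construction is exercised, so it runs without a server.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Build (but do not send) the same kind of POST request that CreateRequest sends.
HttpRequestMessage BuildCypherRequest(string endpoint, string jsonBody)
{
    var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    request.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");
    return request;
}

// Send it and return the raw JSON text; a non-2xx status code raises
// HttpRequestException instead of being silently swallowed.
async Task<string> PostCypherAsync(HttpClient client, string endpoint, string jsonBody)
{
    using HttpRequestMessage request = BuildCypherRequest(endpoint, jsonBody);
    using HttpResponseMessage response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

var msg = BuildCypherRequest(
    "http://localhost:7474/db/data/cypher",
    "{\"query\":\"start a = node(1) return a\",\"params\":{}}");
Console.WriteLine(msg.Method);                       // POST
Console.WriteLine(msg.Content!.Headers.ContentType); // application/json; charset=utf-8
```

With a server running, `await PostCypherAsync(new HttpClient(), endpoint, body)` would return the same JSON text that CreateRequest returns.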

To handle the received string and extract the data, we need additional methods. Below is a simple method that extracts a scalar value:


using Newtonsoft.Json.Linq;

public static object GetScalar(string query)
{
    string response = CreateRequest(query);
    JObject joResponse = JObject.Parse(response);
    // "data" holds one array per result row; the scalar is the first column of the first row.
    JArray rows = (JArray)joResponse["data"];
    if (rows.Count == 0)
    {
        return null; // the query returned no rows
    }
    JValue firstValue = (JValue)rows[0][0];
    return firstValue.Value; // whole numbers come back as long (Int64)
}
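Because GetScalar calls CreateRequest directly, it can only be tried against a running server. The sketch below separates the parsing step so it can be checked on a plain string; it uses System.Text.Json in C# top-level statements, GetScalarFromResponse is a hypothetical name, and the response in the usage lines is made up.

```csharp
using System;
using System.Text.Json;

// Return the first column of the first row, or null if no rows came back.
object? GetScalarFromResponse(string responseJson)
{
    using JsonDocument doc = JsonDocument.Parse(responseJson);
    JsonElement data = doc.RootElement.GetProperty("data");
    if (data.GetArrayLength() == 0 || data[0].GetArrayLength() == 0)
        return null;

    JsonElement first = data[0][0];
    switch (first.ValueKind)
    {
        case JsonValueKind.Number:
            // count(...) yields whole numbers; anything else falls back to double.
            if (first.TryGetInt64(out long whole)) return whole;
            return first.GetDouble();
        case JsonValueKind.String:
            return first.GetString();
        case JsonValueKind.True:
            return true;
        case JsonValueKind.False:
            return false;
        case JsonValueKind.Null:
            return null;
        default:
            // Nodes, relationships, maps and lists: hand back the raw JSON text.
            return first.GetRawText();
    }
}

// Made-up response body for a "return count(b)" query.
string countResponse = "{\"columns\":[\"count(b)\"],\"data\":[[4]]}";
int neighborsCount = Convert.ToInt32(GetScalarFromResponse(countResponse));
Console.WriteLine(neighborsCount); // 4
```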

Example

With these methods in place, running a custom Cypher query takes a single call:

object response = JSONCypherQuery.GetScalar("start a = node(1) MATCH (a)--(b) return count(b);");
int neighborsCount = Convert.ToInt32(response);


The code block

req.Method = "POST";
req.ContentType = "application/json";
string parameters = null;
JSONQueryCommand cmd = new JSONQueryCommand();
cmd.Query = query;
cmd.Parameters = parameters;
string json = JsonConvert.SerializeObject(cmd);


creates a web request with the content

{"query":"start a = node(1) MATCH (a)--(b) return count(b);","params":null}

which is sent to the server. If we know what kind of data to expect from the server, we can write an appropriate parsing/deserializing method. For the example query the result is a scalar, and GetScalar extracts it from the response string.
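As an example of such a method for a different kind of result, the sketch below collects all values of one named column, e.g. n.name? for a query that returns many rows. It assumes System.Text.Json (C# top-level statements); GetColumn is a hypothetical name, and non-string values are returned as their raw JSON text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Collect every value of one named column as strings (null for JSON null).
List<string?> GetColumn(string responseJson, string columnName)
{
    using JsonDocument doc = JsonDocument.Parse(responseJson);
    JsonElement root = doc.RootElement;

    // Find the column's position in the "columns" array.
    int index = -1, position = 0;
    foreach (JsonElement c in root.GetProperty("columns").EnumerateArray())
    {
        if (c.GetString() == columnName) { index = position; break; }
        position++;
    }
    if (index < 0) throw new ArgumentException($"No column named '{columnName}'.");

    var values = new List<string?>();
    foreach (JsonElement row in root.GetProperty("data").EnumerateArray())
    {
        JsonElement cell = row[index];
        values.Add(cell.ValueKind switch
        {
            JsonValueKind.Null => null,
            JsonValueKind.String => cell.GetString(),
            _ => cell.GetRawText() // numbers, booleans, nodes: raw JSON text
        });
    }
    return values;
}

string response = "{\"data\":[[\"know\",\"him\",25],[\"know\",\"you\",null]]," +
                  "\"columns\":[\"type(r)\",\"n.name?\",\"n.age?\"]}";
Console.WriteLine(string.Join(", ", GetColumn(response, "n.name?"))); // him, you
```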




Social Network Analysis with neo4j: Graphs

I have been analyzing social networks and recently switched from a standard relational database to neo4j. Neo4j is much better suited to working with graphs, and some basic measures of nodes and edges can be calculated almost instantly.

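The difference becomes obvious with a large dataset. Below is an example SQL query (a recursive common table expression) that retrieves all users within a selected distance of a given user, together with one connecting path for each (the shortest path string, with ties broken alphabetically):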


with recursive cluster (party, path, depth)
as (
    select cast(@userId as character varying), cast(@userId as character varying), 1
    union
    (
        select (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end),
               (this.path || '.' || (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end)),
               this.depth + 1
        from cluster this, chat amc
        where ((this.party = amc.userA and position(amc.userB in this.path) = 0)
            or (this.party = amc.userB and position(amc.userA in this.path) = 0))
          AND this.depth < @depth + 1
    )
)
select party, path
from cluster
where not exists (
    select *
    from cluster c2
    where cluster.party = c2.party
      and (
          char_length(cluster.path) > char_length(c2.path)
          or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
      )
)
order by party, path;


Running such a query on a database with several million users and connections takes a very long time (hours on a standard PC).
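Conceptually, the recursive query is a depth-limited breadth-first search. The in-memory C# sketch below, run over a small made-up chat table, keeps one shortest path per reachable user; ties are broken by visiting order rather than alphabetically as in the SQL. Cluster is a hypothetical name, and the code only illustrates what the query computes; it is not a replacement for either database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every user within maxDepth hops of `start` over undirected chat edges, each
// with one shortest path written as dot-separated ids (start maps to its own id).
Dictionary<string, string> Cluster(IEnumerable<(string A, string B)> chats, string start, int maxDepth)
{
    // Build the adjacency list once; the SQL version joins the chat table at every level.
    var adjacency = new Dictionary<string, List<string>>();
    void Link(string source, string target)
    {
        if (!adjacency.TryGetValue(source, out var list))
            adjacency[source] = list = new List<string>();
        list.Add(target);
    }
    foreach (var (a, b) in chats) { Link(a, b); Link(b, a); }

    var paths = new Dictionary<string, string> { [start] = start };
    var frontier = new List<string> { start };
    for (int depth = 0; depth < maxDepth && frontier.Count > 0; depth++)
    {
        var next = new List<string>();
        foreach (string party in frontier)
        {
            if (!adjacency.TryGetValue(party, out var friends)) continue;
            foreach (string friend in friends)
            {
                // Breadth-first order: the first path found to a user is a shortest one.
                if (paths.ContainsKey(friend)) continue;
                paths[friend] = paths[party] + "." + friend;
                next.Add(friend);
            }
        }
        frontier = next;
    }
    return paths;
}

// A tiny made-up chat table.
var chatTable = new List<(string A, string B)>
{
    ("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u5"), ("u5", "u3")
};
var reachable = Cluster(chatTable, "u1", 2);
foreach (var pair in reachable.OrderBy(p => p.Key))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

With a depth of 2, u4 (three hops away) is left out, and u3 is reached through u1.u2.u3.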




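Below is a Cypher query for the neo4j database that counts friends of friends, which corresponds to the SQL query above with depth = 2. Strictly speaking, count(friendoffriend) counts matched paths, so a user reachable through several friends is counted once per path; count(distinct friendoffriend) would count each user once.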



neo4j-sh (0)$ start b = node:User(UserId='9F56478E6CAFB9CFF8C720C5DFC392C49495C582') MATCH (b) --(friend)--(friendoffriend) RETURN count(friendoffriend)
==> +-----------------------+
==> | count(friendoffriend) |
==> +-----------------------+
==> | 131457                |
==> +-----------------------+
==> 1 row, 635 ms
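The difference between counting paths and counting distinct users shows up even on the small example network from the console examples below. This C# sketch enumerates friend-of-friend paths the way the MATCH pattern does, assuming one relationship per pair of nodes and relying on the fact that a Cypher pattern never reuses a relationship; CountFriendsOfFriends is a hypothetical name. For node 6 it finds 3 paths but only 2 distinct users, because node 1 is reached through both node 2 and node 5.

```csharp
using System;
using System.Collections.Generic;

// Undirected edges of the example network (from the "Find Neighbors" output below).
(int A, int B)[] edges = { (0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, 4), (5, 6) };

var adjacency = new Dictionary<int, List<int>>();
foreach (var (a, b) in edges)
{
    if (!adjacency.ContainsKey(a)) adjacency[a] = new List<int>();
    if (!adjacency.ContainsKey(b)) adjacency[b] = new List<int>();
    adjacency[a].Add(b);
    adjacency[b].Add(a);
}

// Mimic MATCH (start)--(friend)--(friendoffriend): one result per path.
// A pattern never reuses a relationship, so with one edge per pair of nodes
// the path cannot go straight back to the start node.
(int Paths, int DistinctNodes) CountFriendsOfFriends(int start)
{
    int paths = 0;
    var distinct = new HashSet<int>();
    foreach (int friend in adjacency[start])
        foreach (int friendOfFriend in adjacency[friend])
        {
            if (friendOfFriend == start) continue;
            paths++;
            distinct.Add(friendOfFriend);
        }
    return (paths, distinct.Count);
}

var (pathCount, distinctCount) = CountFriendsOfFriends(6);
Console.WriteLine($"count = {pathCount}, count(distinct) = {distinctCount}"); // count = 3, count(distinct) = 2
```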

The advantage in performance and simplicity is obvious.

Running queries in neo4j console



Below are some more example Cypher queries for working with graphs. You can try these and other queries on a simple example network in the neo4j online console.

Graph Screenshot from neo4j Console


Find Neighbors


start a=node(*)
match (a)-->(b)
return a, b;
+-----------------------+
| a         | b         |
+-----------------------+
| Node[0]{} | Node[1]{} |
| Node[1]{} | Node[2]{} |
| Node[1]{} | Node[3]{} |
| Node[1]{} | Node[4]{} |
| Node[1]{} | Node[5]{} |
| Node[2]{} | Node[6]{} |
| Node[2]{} | Node[7]{} |
| Node[3]{} | Node[4]{} |
| Node[5]{} | Node[6]{} |
+-----------------------+
9 rows
0 ms

Find Mutual Connections


start a=node(*), b=node(*)
match (a)--(x)--(b)
return a, b, x
+-----------------------------------+
| a         | b         | x         |
+-----------------------------------+
| Node[0]{} | Node[2]{} | Node[1]{} |
| Node[0]{} | Node[3]{} | Node[1]{} |
| Node[0]{} | Node[4]{} | Node[1]{} |
| Node[0]{} | Node[5]{} | Node[1]{} |
| Node[1]{} | Node[3]{} | Node[4]{} |
| Node[1]{} | Node[4]{} | Node[3]{} |
| Node[1]{} | Node[6]{} | Node[2]{} |
| Node[1]{} | Node[6]{} | Node[5]{} |
| Node[1]{} | Node[7]{} | Node[2]{} |
| Node[2]{} | Node[0]{} | Node[1]{} |
| Node[2]{} | Node[3]{} | Node[1]{} |
| Node[2]{} | Node[4]{} | Node[1]{} |
| Node[2]{} | Node[5]{} | Node[6]{} |
| Node[2]{} | Node[5]{} | Node[1]{} |
| Node[3]{} | Node[0]{} | Node[1]{} |
| Node[3]{} | Node[1]{} | Node[4]{} |
| Node[3]{} | Node[2]{} | Node[1]{} |
| Node[3]{} | Node[4]{} | Node[1]{} |
| Node[3]{} | Node[5]{} | Node[1]{} |
| Node[4]{} | Node[0]{} | Node[1]{} |
| Node[4]{} | Node[1]{} | Node[3]{} |
| Node[4]{} | Node[2]{} | Node[1]{} |
| Node[4]{} | Node[3]{} | Node[1]{} |
| Node[4]{} | Node[5]{} | Node[1]{} |
| Node[5]{} | Node[0]{} | Node[1]{} |
| Node[5]{} | Node[2]{} | Node[6]{} |
| Node[5]{} | Node[2]{} | Node[1]{} |
| Node[5]{} | Node[3]{} | Node[1]{} |
| Node[5]{} | Node[4]{} | Node[1]{} |
| Node[6]{} | Node[1]{} | Node[2]{} |
| Node[6]{} | Node[1]{} | Node[5]{} |
| Node[6]{} | Node[7]{} | Node[2]{} |
| Node[7]{} | Node[1]{} | Node[2]{} |
| Node[7]{} | Node[6]{} | Node[2]{} |
+-----------------------------------+
34 rows
0 ms

Count Mutual Connections


start a=node(*), b=node(*)
match (a)--(x)--(b)
where id(a) < id(b)
return a, b, count(distinct x)
+-------------------------------------------+
| a         | b         | count(distinct x) |
+-------------------------------------------+
| Node[0]{} | Node[5]{} | 1                 |
| Node[2]{} | Node[5]{} | 2                 |
| Node[3]{} | Node[4]{} | 1                 |
| Node[6]{} | Node[7]{} | 1                 |
| Node[1]{} | Node[4]{} | 1                 |
| Node[0]{} | Node[3]{} | 1                 |
| Node[4]{} | Node[5]{} | 1                 |
| Node[1]{} | Node[6]{} | 2                 |
| Node[0]{} | Node[4]{} | 1                 |
| Node[1]{} | Node[7]{} | 1                 |
| Node[0]{} | Node[2]{} | 1                 |
| Node[1]{} | Node[3]{} | 1                 |
| Node[2]{} | Node[3]{} | 1                 |
| Node[2]{} | Node[4]{} | 1                 |
| Node[3]{} | Node[5]{} | 1                 |
+-------------------------------------------+
15 rows
0 ms
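To double-check these counts, the C# sketch below computes the same quantity in memory for the example network, with edges taken from the Find Neighbors output: for every pair a < b it intersects the two neighbour sets. It should list the same 15 pairs (sorted rather than in console order), and the counts add up to 17, half of the 34 rows of Find Mutual Connections, which lists every pair in both orders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Undirected edges of the example network, read off the "Find Neighbors" output.
(int A, int B)[] edges = { (0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, 4), (5, 6) };

var neighbours = new Dictionary<int, HashSet<int>>();
foreach (var (a, b) in edges)
{
    // "--" in the Cypher pattern ignores direction, so store both directions.
    if (!neighbours.ContainsKey(a)) neighbours[a] = new HashSet<int>();
    if (!neighbours.ContainsKey(b)) neighbours[b] = new HashSet<int>();
    neighbours[a].Add(b);
    neighbours[b].Add(a);
}

// For each pair a < b, count distinct nodes adjacent to both (the "x" in the query).
var mutual = new SortedDictionary<(int, int), int>();
foreach (int a in neighbours.Keys)
    foreach (int b in neighbours.Keys)
    {
        if (a >= b) continue;
        int common = neighbours[a].Intersect(neighbours[b]).Count();
        if (common > 0) mutual[(a, b)] = common;
    }

foreach (var entry in mutual)
    Console.WriteLine($"Node[{entry.Key.Item1}] Node[{entry.Key.Item2}] {entry.Value}");
```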

Calculate Clustering Coefficient


start a = node(1)
match (a)--(b)
with a, b as neighbours
match (a)--()-[r]-()--(a)
where id(a) <> id(neighbours) and id(neighbours) <> 0
return count(distinct neighbours), count(distinct r)
+------------------------------------------------+
| count(distinct neighbours) | count(distinct r) |
+------------------------------------------------+
| 4                          | 1                 |
+------------------------------------------------+
1 row
0 ms

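The clustering coefficient of a node is defined as the probability that two randomly selected neighbors of that node are connected to each other. Once we have the number of neighbors and the number of connections among them, we can calculate it. Note that the query excludes node 0 (id(neighbours) <> 0), so node 1 is counted with four neighbors (2, 3, 4 and 5) instead of five: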

1. The number of possible connections among n neighbors is n!/(2!(n-2)!) = n(n-1)/2; with n = 4 this is 4!/(2!(4-2)!) = 24/4 = 6.

2. The actual number of connections among the neighbors is 1 (count(distinct r) in the query result; the only such link is the one between nodes 3 and 4).

Therefore the clustering coefficient of node 1 is 1/6 ≈ 0.17.
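The same calculation as a small C# sketch over the example network's edge list (taken from the Find Neighbors output). ClusteringCoefficient is a hypothetical name, and the excluded set plays the role of the id(neighbours) <> 0 condition. Excluding node 0 gives 1/6; including it gives 1/10, since node 1 then has five neighbors and the 3-4 link is still the only one among them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Undirected edges of the example network (from the "Find Neighbors" output).
(int A, int B)[] edges = { (0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, 4), (5, 6) };

// Local clustering coefficient: links among the node's neighbours divided by
// the n(n-1)/2 possible links. Nodes in `excluded` are left out, like the
// "id(neighbours) <> 0" condition in the query.
double ClusteringCoefficient(int node, ISet<int> excluded)
{
    HashSet<int> neighbourSet = edges
        .Where(e => e.A == node || e.B == node)
        .Select(e => e.A == node ? e.B : e.A)
        .Where(v => !excluded.Contains(v))
        .ToHashSet();
    int n = neighbourSet.Count;
    if (n < 2) return 0.0; // not defined for fewer than two neighbours; 0 is used here
    int links = edges.Count(e => neighbourSet.Contains(e.A) && neighbourSet.Contains(e.B));
    return links / (n * (n - 1) / 2.0);
}

double withoutNodeZero = ClusteringCoefficient(1, new HashSet<int> { 0 });
double withNodeZero = ClusteringCoefficient(1, new HashSet<int>());
Console.WriteLine($"{withoutNodeZero:F4} {withNodeZero:F4}");
```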

References

Cypher Query Language

David Easley and Jon Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World







Saturday, June 9, 2012

neo4j: Creating .NET REST API

After spending several days unsuccessfully trying to use some of the existing neo4j libraries for .NET, I decided to write my own.

I prefer the Cypher query language to Gremlin, so I went through the official neo4j documentation and started developing a REST client based on Json.NET (Newtonsoft.Json).

Below is the code for adding a node to an index:


using System.IO;
using System.Net;
using Newtonsoft.Json;

public string AddNodeToIndex(int nodeReference, string key, string value, string indexName)
{
    // Request body: the node's URI plus the key/value pair to index it under.
    JSONAddNodeToIndex jsonObj = new JSONAddNodeToIndex();
    jsonObj.Uri = "http://localhost:7474/db/data/node/" + nodeReference.ToString();
    jsonObj.Value = value;
    jsonObj.Key = key;
    string json = JsonConvert.SerializeObject(jsonObj);

    // e.g. index name "favorites": http://localhost:7474/db/data/index/node/favorites
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://localhost:7474/db/data/index/node/" + indexName);
    req.Method = "POST";
    req.ContentType = "application/json";
    req.Accept = "application/json";
    using (var streamWriter = new StreamWriter(req.GetRequestStream()))
    {
        streamWriter.Write(json);
    }

    // Dispose the response so the underlying connection is released.
    using (var httpResponse = (HttpWebResponse)req.GetResponse())
    using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
    {
        return streamReader.ReadToEnd();
    }
}
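The post does not show the JSONAddNodeToIndex class. Judging from its Uri, Key and Value properties, the body sent to the index endpoint has "uri", "key" and "value" fields, assumed here to be lower-case as in the neo4j REST documentation. The sketch below builds that body with System.Text.Json and an anonymous type in place of the class; BuildAddToIndexBody is a hypothetical name, and node 83 and the key/value strings are placeholders.

```csharp
using System;
using System.Text.Json;

// Body for POST http://localhost:7474/db/data/index/node/{indexName}.
// Hypothetical stand-in for the JSONAddNodeToIndex class, which is not shown.
string BuildAddToIndexBody(int nodeReference, string key, string value)
{
    var body = new
    {
        uri = "http://localhost:7474/db/data/node/" + nodeReference,
        key = key,
        value = value
    };
    return JsonSerializer.Serialize(body);
}

string json = BuildAddToIndexBody(83, "some-key", "some value");
Console.WriteLine(json);
```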


Hopefully soon there will be more code clean enough to share.