graphpath -- Analyze RDF Data

The graphpath package provides a Python implementation of the GraphPath little language, together with an inference rule evaluator and support for common RDF APIs.

graphpath.expr
The elements for constructing expressions including Class(r), Node(v) and Property(r) with their operators.
graphpath.entail
The classes for defining and evaluating inference rules. RuleBase is a dictionary of rules and Sandbox is a rule evaluator and entailment cache.
graphpath.redadapt
Adds support for the Redland RDF API to GraphPath. After this module is imported, GraphPath expressions can be bound to Redland RDF.Model objects.
graphpath.libadapt
Adds support for the rdflib RDF API to GraphPath. After this module is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.
graphpath.util
Provides support for testing GraphPath on different versions of Python and with different RDF APIs. Also contains some potentially useful support classes.

graphpath.expr

This module implements GraphPath expressions as Python objects. The GraphPath operators are implemented on these objects as overloaded Python operators. The expression objects are immutable and comparable. Their operators are documented in the reference.

The following classes provide the elementary GraphPath steps from which more complex expressions may be composed using the operators. (The module also contains the classes that implement composite expressions but they are not intended to be directly instantiated.)

Once constructed, an expression is evaluated by binding it (with >>) to a graph object (e.g. a Model or TripleStore) and iterating the result.
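The following is not GraphPath itself. It is a minimal toy sketch of the design described above (immutable, comparable expression objects whose overloaded operators build composite expressions, evaluated by binding with >>), assuming a graph is just a set of (subject, property, object) string triples. Only the name Property mirrors GraphPath; Expr, Seq, Alt and the pairs() method are hypothetical, and the assumption that iteration yields terminal nodes is illustrative.

```python
from dataclasses import dataclass


class Expr(object):
    """Base of the toy expressions: '/' composes, '|' unions, 'graph >> e' binds."""

    def __truediv__(self, other):
        return Seq(self, other)

    def __or__(self, other):
        return Alt(self, other)

    def __rrshift__(self, graph):
        # Called for 'graph >> expr' because set has no __rshift__.
        # Iterating the result yields the terminal node of each matching
        # path (sorted here only to make the output repeatable).
        return iter(sorted({t for (_i, t) in self.pairs(graph)}))


@dataclass(frozen=True)
class Property(Expr):
    """Matches each arc labeled r, as an (initial, terminal) pair."""
    r: str

    def pairs(self, graph):
        return {(s, o) for (s, p, o) in graph if p == self.r}


@dataclass(frozen=True)
class Seq(Expr):
    """left / right: a path of left followed by a path of right."""
    left: Expr
    right: Expr

    def pairs(self, graph):
        right = self.right.pairs(graph)
        return {(a, c) for (a, b) in self.left.pairs(graph)
                for (b2, c) in right if b == b2}


@dataclass(frozen=True)
class Alt(Expr):
    """left | right: paths matching either operand."""
    left: Expr
    right: Expr

    def pairs(self, graph):
        return self.left.pairs(graph) | self.right.pairs(graph)
```

Frozen dataclasses give the immutability and value-based equality described above. For a graph g = {("alice", "knows", "bob"), ("bob", "knows", "carol"), ("bob", "name", "Bob")}, list(g >> Property("knows") / Property("name")) gives ['Bob']. Note that in Python >> binds more tightly than |, so a union must be parenthesized when bound: g >> (Property("knows") | Property("name")).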

In the following, r and v stand for node objects. In the Redland environment, r and v must be RDF.Node objects. For rdflib, r should be a URIRef and v should be either a URIRef or a Literal. In addition, all implementations are expected to accept a string for v.

Node(v)
Constructs a node step that matches the single node, v.
Nodes(s)
Constructs an enumerated set step that matches any node from the set given by s. The argument, s, is an iterable yielding nodes.
Class(r)
Constructs a class step that matches any node of type r.
Subject()
Constructs a subject step that matches the initial node of any arc.
Any()
Constructs an any step that matches any node.
Self()
A synonym for Any, constructs a self step that matches any node.
Property(r)
Constructs a property step that matches each arc labeled r.
HasNo(p)
The negative predicate operator, inverts the sense of the predicate, p.
Map(p)
Represents an expression p as a mapping of initial nodes to terminal nodes. If m=Map(p), then iter(m) iterates the initial nodes of all paths matching p, and m[key] is the set of terminal nodes of those paths whose initial node is key.
trace(p)
Creates a traced expression from p. If q=trace(p), then q and p are equivalent expressions except that q prints tracing information to sys.stdout during evaluation.
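To make two of these definitions concrete, here is a toy sketch of what Class(r) matches and what Map(p) represents, assuming the same plain set-of-triples graph as a stand-in for a real RDF model. The functions toy_class and toy_map and the "rdf:type" string are hypothetical illustrations, not part of graphpath.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"   # assumed symbol for rdf:type in this toy graph


def toy_class(graph, r):
    """What Class(r) matches: every node s with an arc (s, rdf:type, r)."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == r}


def toy_map(paths):
    """What Map(p) represents: the (initial, terminal) pairs of the paths
    matching p, grouped by initial node."""
    m = defaultdict(set)
    for initial, terminal in paths:
        m[initial].add(terminal)
    return dict(m)


graph = {
    ("alice", "rdf:type", "Person"),
    ("rex", "rdf:type", "Dog"),
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
}

# Paths matching Property("knows"), written out as (initial, terminal) pairs.
knows = {(s, o) for (s, p, o) in graph if p == "knows"}
m = toy_map(knows)
```

Here iterating m gives the initial nodes alice and bob, m["alice"] is the set {"bob", "carol"}, and toy_class(graph, "Person") is {"alice"}.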

graphpath.entail

This module implements the GraphPath rule and inference system.

RuleBase()

Constructs an empty collection of rules.

A RuleBase object implements the mapping protocol where keys may be of type Class or Property and values may be arbitrary GraphPath expression objects. If rules is a RuleBase object, p a GraphPath expression, and r a resource, then:

rules[Property(r)]=p
Defines a property r containing arcs for all paths matching p. More precisely, every path matching p entails an arc in r with the same respective initial and terminal nodes.
rules[Class(r)]=p
Defines a class r containing all nodes matching the predicate p. To be exact, every path matching p entails an rdf:type arc with the same initial node and a terminal node r.
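The two entailment rules can be restated as set comprehensions over the (initial, terminal) pairs of the paths matching p. This is a toy sketch, assuming triples are plain strings and that the matching paths have already been computed; entail_property, entail_class and the grandparent example are hypothetical.

```python
RDF_TYPE = "rdf:type"   # assumed symbol for rdf:type in this toy graph


def entail_property(r, paths):
    """rules[Property(r)] = p: each path (a, b) matching p entails the arc (a, r, b)."""
    return {(a, r, b) for (a, b) in paths}


def entail_class(r, paths):
    """rules[Class(r)] = p: each path (a, b) matching p entails (a, rdf:type, r)."""
    return {(a, RDF_TYPE, r) for (a, _b) in paths}


parent = {("ann", "bea"), ("bea", "cid")}
# Paths matching a parent/parent expression, computed by hand as a join.
grand = {(a, c) for (a, b) in parent for (b2, c) in parent if b == b2}
```

With these inputs, entail_property("grandparent", grand) yields the single arc ("ann", "grandparent", "cid"), and entail_class("Grandparent", grand) yields ("ann", "rdf:type", "Grandparent"): only the initial node of each path survives in a class rule.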

Rule definitions are cumulative so that rules[Property(r)]=p; rules[Property(r)]=q is equivalent to rules[Property(r)]=p|q. Similarly, repeated class definitions are equivalent to a union.

A RuleBase can be queried using the usual mapping operations. The keys (Class and Property objects) are iterated with iter(rules) and a rule may be recovered with a subscripting operation: rule=rules[Property(r)].
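The cumulative assignment can be sketched as a mapping whose __setitem__ extends, rather than replaces, an existing definition. This is a hypothetical toy, not the graphpath RuleBase: the tuple keys stand in for Class(r) and Property(r), the string values stand in for expressions, and each stored tuple of alternatives stands in for the union p|q.

```python
from collections.abc import MutableMapping


class ToyRuleBase(MutableMapping):
    """Hypothetical sketch of RuleBase's cumulative rule definitions."""

    def __init__(self):
        self._rules = {}

    def __setitem__(self, key, expr):
        # Cumulative: a repeated definition is added as another alternative,
        # so the stored tuple plays the role of the union p | q | ...
        self._rules[key] = self._rules.get(key, ()) + (expr,)

    def __getitem__(self, key):
        return self._rules[key]

    def __delitem__(self, key):
        del self._rules[key]

    def __iter__(self):
        # Iterates the keys, like iter(rules) in the text.
        return iter(self._rules)

    def __len__(self):
        return len(self._rules)


rules = ToyRuleBase()
rules[("Property", "ancestor")] = "parent"
rules[("Property", "ancestor")] = "parent/ancestor"
```

After the two assignments, rules[("Property", "ancestor")] is ("parent", "parent/ancestor") and iterating rules yields the single key. Deriving from MutableMapping supplies the remaining mapping methods (get, items, update and so on) from these five.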

Sandbox(graph, rules)

Constructs a rule evaluator for the given graph and set of rules. The graph is a Model or TripleStore object or another Sandbox instance. The rules argument is a RuleBase object.

A Sandbox object represents a graph and can be bound to a GraphPath expression (with the >> operator). For example:

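import RDF                    # Redland Python bindings
import graphpath.redadapt     # lets expressions bind to RDF.Model objects
from graphpath.expr import Class, Property, Node
from graphpath.entail import RuleBase, Sandbox

graph = RDF.Model()
rules = RuleBase()
...
augmented = Sandbox(graph, rules)
for result in augmented >> Class(r)/Property(p)[Property(q)/Node(v)]:
    print(result)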

The Sandbox conceptually contains all of the arcs from the initial graph augmented with their entailments, inferred by applying the given rules. Expressions bound to the Sandbox will be matched against the augmented graph.

The inference algorithm assumes that the initial graph remains unchanged for the life of the Sandbox. However, construction of a new Sandbox is cheap and does not involve rule evaluation, which occurs on demand. A rule is evaluated when required to evaluate an expression bound to the Sandbox or another rule, recursively. Entailments are cached in the Sandbox and released when it is destroyed.
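The on-demand, cached evaluation described above can be illustrated with a small toy evaluator. This is a sketch under stated assumptions, not the graphpath Sandbox: each rule is a hypothetical Python function that returns (subject, object) pairs for one property and may query the sandbox, so evaluation recurses through dependent rules. Unlike a real rule engine, this sketch does not handle cyclic rules such as a transitive closure defined in terms of itself.

```python
class ToySandbox(object):
    """Hypothetical sketch: base triples plus lazily evaluated, cached rules."""

    def __init__(self, triples, rules):
        # Construction is cheap: nothing is evaluated here.
        self.triples = frozenset(triples)   # assumed unchanged for the sandbox's life
        self.rules = rules                  # property -> function(sandbox) -> pairs
        self._cache = {}                    # property -> entailed (s, o) pairs
        self.evaluations = 0                # counts rule evaluations, for illustration

    def pairs(self, prop):
        """All (s, o) pairs for prop: base arcs plus entailed arcs."""
        result = {(s, o) for (s, p, o) in self.triples if p == prop}
        if prop in self.rules:
            if prop not in self._cache:     # evaluate on first demand only
                self.evaluations += 1
                self._cache[prop] = self.rules[prop](self)
            result |= self._cache[prop]
        return result


def grandparent(sb):
    parent = sb.pairs("parent")
    return {(a, c) for (a, b) in parent for (b2, c) in parent if b == b2}


def great_grandparent(sb):
    # Depends on another rule, so querying it evaluates grandparent recursively.
    gp, parent = sb.pairs("grandparent"), sb.pairs("parent")
    return {(a, d) for (a, c) in gp for (c2, d) in parent if c == c2}


sb = ToySandbox(
    {("ann", "parent", "bea"), ("bea", "parent", "cid"), ("cid", "parent", "dan")},
    {"grandparent": grandparent, "great_grandparent": great_grandparent},
)
```

Immediately after construction sb.evaluations is 0. Querying "great_grandparent" evaluates two rules, and a later query for "grandparent" is answered from the cache without further evaluation.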

Adapter Protocol

Modules are provided to add support for the Redland and rdflib APIs.

graphpath.redadapt
After graphpath.redadapt is imported, GraphPath expressions can be bound to Redland RDF.Model objects.
graphpath.libadapt
After graphpath.libadapt is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.

Both modules may be imported into the same program, although mixing RDF APIs in the same GraphPath expression is not generally possible.

Each adapter module defines a class, Population, that adapts graphs for binding. However, it is not normally necessary to construct objects of this class explicitly. The expression Population(g)>>p should yield the same bound GraphPath expression as g>>p.

The Population protocol can be implemented to adapt new graph data structures to GraphPath.

Population(g)
Constructs an adapter for the given graph data structure, to be bound to an expression. When passed an argument of the wrong type, the constructor should raise graphpath.expr.StrategyError, which enables other bindings to be attempted. To enable binding, the constructor (i.e. the class or a factory function) should be appended to the list graphpath.expr.adapters.
values(subject, property)
Returns a Python set of all nodes o such that the arc (subject, property, o) is in the graph.
match(property, object)
Returns a python Set of all nodes s, such that the arc (s, property, object) is in the graph. (This method is preferentially used in predicate evaluation.)
rdf_type
An attribute whose value is the symbol for the rdf:type property in this graph. (Used by the Class() step.)
__iter__
Returns an iterator for the subject nodes s, for all arcs (s, p, o). (Used by the Subject() step and to evaluate certain expressions where no other strategy is available.)
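A minimal sketch of an adapter following this protocol, for a toy graph stored as a frozenset of (s, p, o) strings. Because the block does not import graphpath, StrategyError and adapters are local stand-ins for graphpath.expr.StrategyError and graphpath.expr.adapters. The populate() loop is a hypothetical guess at how bindings are attempted in turn, and raising TypeError when no adapter accepts the graph is an assumption of this sketch.

```python
class StrategyError(Exception):
    """Stand-in for graphpath.expr.StrategyError (not imported here)."""


adapters = []   # stand-in for the graphpath.expr.adapters list


class TriplePopulation(object):
    """Hypothetical adapter for a frozenset of (subject, property, object) triples."""

    rdf_type = "rdf:type"   # symbol for rdf:type in this toy graph (used by Class steps)

    def __init__(self, graph):
        if not isinstance(graph, frozenset):
            # Wrong graph type: signal that another adapter should be tried.
            raise StrategyError("not a frozenset of triples")
        self.graph = graph

    def values(self, subject, property):
        """Set of all o such that (subject, property, o) is in the graph."""
        return {o for (s, p, o) in self.graph if s == subject and p == property}

    def match(self, property, object):
        """Set of all s such that (s, property, object) is in the graph."""
        return {s for (s, p, o) in self.graph if p == property and o == object}

    def __iter__(self):
        # Subject nodes of all arcs; this sketch yields each subject once.
        return iter({s for (s, _p, _o) in self.graph})


adapters.append(TriplePopulation)


def populate(graph):
    """Hypothetical dispatch: try each registered adapter in turn."""
    for make in adapters:
        try:
            return make(graph)
        except StrategyError:
            continue
    raise TypeError("no adapter accepts %r" % type(graph).__name__)
```

The protocol text does not say whether __iter__ may repeat a subject; deduplicating is a choice made here so that a scan over subjects visits each node once.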

The type of the subject, predicate and object (or value) arguments should be whatever type is used for these in the underlying RDF API.

It should also be possible to use elementary Python types for values, at least strings. The adapter should convert these to and from the value node type of the underlying RDF API.

Individual methods can be left unimplemented although this will affect the ability to evaluate or the speed of certain GraphPath expressions. Unimplemented methods should raise graphpath.expr.StrategyError.
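To illustrate why an unimplemented method costs speed rather than correctness, here is a toy sketch in which match() raises StrategyError and a caller falls back to scanning every subject with values(). StrategyError is a local stand-in for graphpath.expr.StrategyError, and PartialPopulation and subjects_with() are hypothetical; this is not how graphpath itself chooses strategies, only an example of the trade-off.

```python
class StrategyError(Exception):
    """Stand-in for graphpath.expr.StrategyError (not imported here)."""


class PartialPopulation(object):
    """Hypothetical adapter that leaves match() unimplemented."""

    def __init__(self, triples):
        self.triples = frozenset(triples)

    def values(self, subject, property):
        return {o for (s, p, o) in self.triples if s == subject and p == property}

    def match(self, property, object):
        # Unimplemented, signalled as the protocol asks.
        raise StrategyError("match() not implemented")

    def __iter__(self):
        return iter({s for (s, _p, _o) in self.triples})


def subjects_with(pop, property, object):
    """Subjects s with an arc (s, property, object).

    Prefers pop.match(); if that raises StrategyError, falls back to a
    slower scan that needs only __iter__ and values().
    """
    try:
        return pop.match(property, object)
    except StrategyError:
        return {s for s in pop if object in pop.values(s, property)}


pop = PartialPopulation({("alice", "knows", "bob"),
                         ("carol", "knows", "bob"),
                         ("bob", "knows", "dave")})
```

Here subjects_with(pop, "knows", "bob") still returns {"alice", "carol"}, but only after calling values() once per subject in the graph.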