graphpath -- Analyze RDF Data
The graphpath package provides a Python implementation of the GraphPath little-language together with an inference rule evaluator and support for common RDF APIs.
graphpath.expr
Implements GraphPath expressions, including Class(r), Node(v) and Property(r), with their operators.
graphpath.entail
RuleBase is a dictionary of rules and Sandbox is a rule evaluator and entailment cache.
graphpath.redadapt
Adapts the Redland RDF API to GraphPath. After this module is imported, GraphPath expressions can be bound to RDF.Model objects.
graphpath.libadapt
Adapts the rdflib RDF API to GraphPath. After this module is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.
graphpath.util
graphpath.expr
This module implements GraphPath expressions as Python objects. The GraphPath operators are implemented on these objects as overloaded Python operators. The expression objects are immutable and comparable. Their operators are documented in the reference.
The following classes provide the elementary GraphPath steps from which more complex expressions may be composed using the operators. (The module also contains the classes that implement composite expressions but they are not intended to be directly instantiated.)
Once constructed, an expression is evaluated by binding it (with >>) to a graph object (e.g. a Model or TripleStore) and iterating the result.
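As a rough illustration of how binding with >> can work through Python operator overloading, here is a toy sketch. ToyProperty and ToyBound are hypothetical classes invented for this example, not part of graphpath; the graph is assumed to be a plain set of (subject, property, object) string triples, and iterating the bound result is assumed to yield the terminal node of each matching arc.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ToyProperty:
    """Hypothetical stand-in for a Property(r) step (not graphpath's class).

    frozen=True makes instances immutable; the generated __eq__ and __hash__
    make them comparable, like the expression objects described above.
    """
    r: str

    def evaluate(self, graph: Iterable[Triple]) -> Iterator[str]:
        # Assumption: the step yields the terminal node of every arc labelled r.
        for subject, prop, obj in graph:
            if prop == self.r:
                yield obj

    def __rrshift__(self, graph: Iterable[Triple]) -> "ToyBound":
        # `graph >> expr` falls back to expr.__rrshift__(graph) because the
        # graph type (a plain set here) does not define __rshift__.
        return ToyBound(self, graph)


class ToyBound:
    """An expression bound to a graph; evaluation happens only on iteration."""

    def __init__(self, expr: ToyProperty, graph: Iterable[Triple]) -> None:
        self.expr = expr
        self.graph = graph

    def __iter__(self) -> Iterator[str]:
        return self.expr.evaluate(self.graph)


graph = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
}
print(sorted(graph >> ToyProperty("knows")))  # ['bob', 'carol']
```

Deferring evaluation to iteration keeps binding itself cheap; nothing touches the graph until the result is consumed.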
In the following, r and v stand for node objects. In the Redland environment, r and v must be RDF.Node objects. For rdflib, r should be a URIRef and v should be either a URIRef or a Literal. In addition, all implementations are expected to accept a string for v.
Node(v)
v.
Nodes(s)
s. The argument, s, is an iterable yielding nodes.
Class(r)
r
Subject()
Any()
Self()
Any, constructs a self step that matches any node.
Property(r)
r.
HasNo(p)
p.
Map(p)
p as a mapping of initial nodes to terminal nodes. If m=Map(p) then iter(m) iterates the initial nodes of all paths matching p, and m[key] is the set of terminal nodes of those paths whose initial node is key.
trace(p)
p. If q=trace(p) then q and p are equivalent expressions, except that q will print tracing information to sys.stdout during evaluation.
graphpath.entail
This module implements the GraphPath rule and inference system.
RuleBase()
Constructs an empty collection of rules.
A RuleBase object implements the mapping protocol, where keys may be of type Class or Property and values may be arbitrary GraphPath expression objects.
If rules is a RuleBase object, p a GraphPath expression, and r a resource, then:
rules[Property(r)]=p
Defines a property r containing arcs for all paths matching p. More precisely, every path matching p entails an arc in r with the same respective initial and terminal nodes.
rules[Class(r)]=p
Defines a class r containing all nodes matching the predicate p. To be exact, every path matching p entails an rdf:type arc with the same initial node and a terminal node r.
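The two entailment rules can be pictured with a small sketch in which a graph is a set of string triples and the paths matching p are represented as (initial node, terminal node) pairs. The helper names, and the plain string "rdf:type" standing in for the rdf:type resource, are assumptions made for illustration; this is not how graphpath computes entailments internally.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]          # (initial node, terminal node) of a matching path
Triple = Tuple[str, str, str]   # (subject, property, object)

RDF_TYPE = "rdf:type"  # assumption: a plain string stands in for the rdf:type resource


def arcs(graph: Set[Triple], prop: str) -> Set[Pair]:
    """Paths matching a single property step."""
    return {(s, o) for s, p, o in graph if p == prop}


def compose(first: Set[Pair], second: Set[Pair]) -> Set[Pair]:
    """Paths matching first followed by second, joined on the shared middle node."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


def entail_property(r: str, paths: Iterable[Pair]) -> Set[Triple]:
    """Property rule: each path matching p entails an arc in r with the same ends."""
    return {(initial, r, terminal) for initial, terminal in paths}


def entail_class(r: str, paths: Iterable[Pair]) -> Set[Triple]:
    """Class rule: each path matching p entails an rdf:type arc from its initial node to r."""
    return {(initial, RDF_TYPE, r) for initial, _terminal in paths}


graph = {("ann", "parent", "bob"), ("bob", "parent", "cat")}
two_steps = compose(arcs(graph, "parent"), arcs(graph, "parent"))
print(entail_property("grandparent", two_steps))  # {('ann', 'grandparent', 'cat')}
print(entail_class("HasGrandparent", two_steps))  # {('ann', 'rdf:type', 'HasGrandparent')}
```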
Rule definitions are cumulative, so that rules[Property(r)]=p; rules[Property(r)]=q is equivalent to rules[Property(r)]=p|q. Similarly, repeated class definitions are equivalent to a union.
A RuleBase can be queried using the usual mapping operations. The keys (Class and Property objects) are iterated with iter(rules), and a rule may be recovered with a subscripting operation: rule=rules[Property(r)].
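A minimal sketch of this cumulative mapping behaviour, assuming nothing about graphpath's internals: ToyRuleBase and UnionExpr are hypothetical names, keys are plain tuples standing in for Class(r) and Property(r), and expressions are plain strings. A second assignment to the same key stores the union of the old and new expressions instead of replacing it.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass(frozen=True)
class UnionExpr:
    """Hypothetical stand-in for the expression p|q."""
    left: object
    right: object


class ToyRuleBase(MutableMapping):
    """Mapping in which repeated assignments to a key accumulate as a union."""

    def __init__(self) -> None:
        self._rules: Dict[tuple, object] = {}

    def __setitem__(self, key: tuple, expr: object) -> None:
        if key in self._rules:
            # rules[k]=p; rules[k]=q behaves like rules[k]=p|q
            expr = UnionExpr(self._rules[key], expr)
        self._rules[key] = expr

    def __getitem__(self, key: tuple) -> object:
        return self._rules[key]

    def __delitem__(self, key: tuple) -> None:
        del self._rules[key]

    def __iter__(self) -> Iterator[tuple]:
        return iter(self._rules)

    def __len__(self) -> int:
        return len(self._rules)


rules = ToyRuleBase()
ancestor = ("Property", "ancestor")  # stands in for Property(ancestor)
rules[ancestor] = "parent"
rules[ancestor] = "parent/ancestor"
print(list(rules))      # [('Property', 'ancestor')]
print(rules[ancestor])  # UnionExpr(left='parent', right='parent/ancestor')
```

Deriving from MutableMapping supplies the remaining mapping operations (keys(), items(), get(), `in`) from the five methods defined here.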
Sandbox(graph, rules)
Constructs a rule evaluator for the given graph and set of rules. The graph is a Model or TripleStore object or another Sandbox instance. The rules argument is a RuleBase object.
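A Sandbox object represents a graph and can be bound to a GraphPath expression (with the >> operator). For example:
    import RDF
    import graphpath.redadapt  # enables binding to RDF.Model objects
    from graphpath.expr import Class, Node, Property
    from graphpath.entail import RuleBase, Sandbox
    graph = RDF.Model()
    rules = RuleBase()
    ...
    augmented = Sandbox(graph, rules)
    for result in augmented >> Class(r)/Property(p)[Property(q)/Node(v)]:
        print(result)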
The Sandbox conceptually contains all of the arcs from the initial graph augmented with their entailments, inferred by applying the given rules. Expressions bound to the Sandbox will be matched against the augmented graph.
The inference algorithm assumes that the initial graph remains unchanged for the life of the Sandbox. However, constructing a new Sandbox is cheap and does not involve rule evaluation, which occurs on demand: a rule is evaluated when it is needed to evaluate an expression bound to the Sandbox or, recursively, another rule. Entailments are cached in the Sandbox and released when it is destroyed.
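The on-demand, cached evaluation described above can be sketched as follows. ToySandbox is a hypothetical class, not graphpath's Sandbox; its rules are assumed, for illustration only, to be plain Python functions that return the (initial, terminal) pairs entailed for a property name. Construction does no rule evaluation, each rule runs at most once, and a rule that asks for another rule's arcs triggers that rule recursively.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


class ToySandbox:
    """Hypothetical on-demand, caching rule evaluator over an unchanging base graph."""

    def __init__(self, graph: Set[Triple],
                 rules: Dict[str, Callable[["ToySandbox"], Set[Pair]]]) -> None:
        self.graph = graph
        self.rules = rules
        self.cache: Dict[str, Set[Pair]] = {}  # filled lazily; nothing is evaluated here
        self.evaluations = 0                    # counts rule evaluations, for illustration

    def arcs(self, prop: str) -> Set[Pair]:
        """Base arcs for prop plus any arcs entailed by its rule (evaluated at most once)."""
        base = {(s, o) for s, p, o in self.graph if p == prop}
        if prop not in self.rules:
            return base
        if prop not in self.cache:
            self.evaluations += 1
            # The rule may call self.arcs() for other properties: recursive evaluation.
            self.cache[prop] = self.rules[prop](self)
        return base | self.cache[prop]


def compose(first: Set[Pair], second: Set[Pair]) -> Set[Pair]:
    """Paths matching first followed by second."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


rules = {
    "grandparent": lambda sb: compose(sb.arcs("parent"), sb.arcs("parent")),
    "greatgrandparent": lambda sb: compose(sb.arcs("grandparent"), sb.arcs("parent")),
}
graph = {("ann", "parent", "bob"), ("bob", "parent", "cat"), ("cat", "parent", "dan")}

sandbox = ToySandbox(graph, rules)
print(sandbox.evaluations)               # 0: construction is cheap
print(sandbox.arcs("greatgrandparent"))  # {('ann', 'dan')}
print(sandbox.evaluations)               # 2: grandparent was evaluated recursively
sandbox.arcs("grandparent")
print(sandbox.evaluations)               # still 2: served from the cache
```

Because the cache is an attribute of the object, its entailments are released along with it. This sketch does not handle a rule that depends, directly or indirectly, on itself; such a rule would recurse without end here.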
Modules are provided to add support for the Redland and rdflib APIs.
graphpath.redadapt
After graphpath.redadapt is imported, GraphPath expressions can be bound to Redland RDF.Model objects.
graphpath.libadapt
After graphpath.libadapt is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.
Both modules may be imported into the same program, although mixing RDF APIs in the same GraphPath expression is not generally possible.
Each adapter module defines a class, Population, that adapts graphs for binding. However, it is not normally necessary to construct objects of this class explicitly.
The expression Population(g)>>p should yield the same bound GraphPath expression as g>>p.
The Population protocol can be implemented to adapt new graph data structures to GraphPath.
Population(g)
Constructs an adapter for graph g. If g is not supported, the constructor should raise graphpath.expr.StrategyError, which will enable other bindings to be attempted. To enable binding, the constructor (i.e. the class or a factory) should be appended to the list graphpath.expr.adapters.
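The fallback behaviour can be sketched like this. StrategyError and adapters below are local stand-ins for graphpath.expr.StrategyError and graphpath.expr.adapters, and adapt() is a hypothetical helper. The text does not say in what order graphpath tries the registered constructors, so this sketch simply tries them in list order and moves on whenever one raises StrategyError.

```python
from typing import Callable, List


class StrategyError(Exception):
    """Local stand-in for graphpath.expr.StrategyError."""


adapters: List[Callable[[object], object]] = []  # stand-in for graphpath.expr.adapters


def adapt(graph: object) -> object:
    """Return an adapter from the first constructor that accepts graph."""
    for constructor in adapters:
        try:
            return constructor(graph)
        except StrategyError:
            continue  # this constructor declined; try the next one
    raise StrategyError(f"no adapter for {type(graph).__name__}")


class DictPopulation:
    """Hypothetical adapter for graphs stored as nested dicts."""

    def __init__(self, graph: object) -> None:
        if not isinstance(graph, dict):
            raise StrategyError("not a dict graph")
        self.graph = graph


class TripleListPopulation:
    """Hypothetical adapter for graphs stored as lists of triples."""

    def __init__(self, graph: object) -> None:
        if not isinstance(graph, list):
            raise StrategyError("not a list of triples")
        self.graph = graph


# Registering an adapter means appending its constructor (a class or a factory).
adapters.append(DictPopulation)
adapters.append(TripleListPopulation)

print(type(adapt([("a", "p", "b")])).__name__)  # TripleListPopulation
```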
values(subject, property)
match(property, object)
rdf_type
Class()
step.)
__iter__
The type of the subject, predicate and object (or value) arguments should be whatever type is used for these in the underlying RDF API.
It should also be possible to use elementary Python types for values, including at least strings. The adapter should convert these to and from the underlying RDF API value node type.
Individual methods can be left unimplemented, although this will affect whether, or how quickly, certain GraphPath expressions can be evaluated. Unimplemented methods should raise graphpath.expr.StrategyError.
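A small adapter along these lines might look like the sketch below. Everything in it is hypothetical: URI, Literal and StrategyError are local stand-ins for the underlying RDF API and for graphpath.expr.StrategyError. Because the descriptions of values(), match(), rdf_type and __iter__ are not reproduced above, their behaviour here is assumed from the names: values(subject, property) yields the objects of matching arcs, match(property, object) yields the subjects, and rdf_type holds the rdf:type node. Plain strings are converted to and from the literal type, and __iter__ is deliberately left unimplemented so that it raises StrategyError.

```python
from dataclasses import dataclass
from typing import Iterator, Set, Tuple, Union


class StrategyError(Exception):
    """Local stand-in for graphpath.expr.StrategyError."""


@dataclass(frozen=True)
class URI:
    """Stand-in for the underlying API's resource node type."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Stand-in for the underlying API's literal node type."""
    text: str


RDFNode = Union[URI, Literal]
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")


class ToyPopulation:
    """Hypothetical adapter over a set of (URI, URI, RDFNode) triples."""

    rdf_type = RDF_TYPE  # assumed: the node used for rdf:type arcs

    def __init__(self, triples: Set[Tuple[URI, URI, RDFNode]]) -> None:
        if not isinstance(triples, (set, frozenset)):
            raise StrategyError("expected a set of triples")
        self.triples = triples

    @staticmethod
    def _to_node(value):
        # Accept a plain Python string as a value by wrapping it as a literal.
        return Literal(value) if isinstance(value, str) else value

    @staticmethod
    def _from_node(node):
        # Hand literals back to the caller as plain Python strings.
        return node.text if isinstance(node, Literal) else node

    def values(self, subject: URI, prop: URI) -> Iterator[object]:
        """Assumed: objects of arcs subject -prop-> ?."""
        for s, p, o in self.triples:
            if s == subject and p == prop:
                yield self._from_node(o)

    def match(self, prop: URI, obj: object) -> Iterator[URI]:
        """Assumed: subjects of arcs ? -prop-> obj; obj may be a plain string."""
        target = self._to_node(obj)
        for s, p, o in self.triples:
            if p == prop and o == target:
                yield s

    def __iter__(self):
        # Not implemented in this sketch, so it raises StrategyError as required.
        raise StrategyError("__iter__ is not supported by this adapter")


alice, name, person = URI("ex:alice"), URI("ex:name"), URI("ex:Person")
pop = ToyPopulation({(alice, name, Literal("Alice")), (alice, RDF_TYPE, person)})
print(list(pop.values(alice, name)))   # ['Alice']
print(list(pop.match(name, "Alice")))  # [URI(value='ex:alice')]
```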