Canonical URL
Do not index
Do not index
FROM
vs FROM NAMED
, what’s the difference, and when should I use one or the other is a constant source of confusion for SPARQL users.It’s one of the main reasons why a query can surprisingly return zero results and the most experienced of us have been tricked by it at least once. This short post goes into a little bit of a detail of the difference and discusses how both can be used to address different use cases.
Others have written excellent posts demystifying parts of the SPARQL specification related to this. Jeen Broekstra’s post on the default graph and Marcelo Barbieri’s general post on working with named graphs (including Stardog specific extensions) are particularly useful. We won’t touch on the default graph matter here at all (thanks, Jeen!) and will focus on queries with
FROM
or FROM NAMED
keywords.You can specify the query’s dataset externally to the query using HTTP parameters as defined in the SPARQL Protocol. As far as this post is concerned, that has the same effect as using FROM and FROM NAMED inside the query string.
Query dataset and the active graph
A lot of confusion around
FROM
and FROM NAMED
stems from the assumption that any SPARQL query runs against a collection of graphs. That seems like a natural thing to assume, after all, the RDF 1.1 dataset is exactly that: a collection of graphs (including the default, aka unnamed, graph). The reality, however, is a little more nuanced: each graph pattern in the query, at any point of time during the execution, matches data in one graph only. That graph is called the active graph for that pattern. It can change during the execution and is defined by two things:- if the pattern is within a
GRAPH
block
FROM
andFROM NAMED
definitions
Understanding what the active graph is for a pattern, such as a BGP, at any point of time, is key to understanding why, for example, the pattern matches no results. The rules are pretty straightforward:
- patterns outside of
GRAPH
blocks are evaluated against the merge of all graphs listed inFROM
. The active graph is the merge of all such graphs, it does not change during execution.
- patterns inside
GRAPH iri {..}
are evaluated against theiri
graph in the data ifiri
is listed inFROM NAMED
. The active graph is that graph in the data or the empty graph if it’s not listed inFROM NAMED
. Again, it does not change.
- patterns inside
GRAPH ?g {..}
are separately evaluated against every graph listed inFROM NAMED
and the results are combined under set union. Every graph listed inFROM NAMED
becomes the active graph for some evaluation of the pattern.
These rules have some simple, yet important, consequences:
- if your query uses
FROM
but your patterns are insideGRAPH
(with?g
or a constant IRI), the patterns will not match any data.
- if your query uses
FROM NAMED
but your patterns are outside ofgraph
blocks, the patterns will not match any data.
Remembering just these two facts by itself will keep you out of trouble most of the time: if you use
GRAPH
use FROM NAMED
, if you don’t use GRAPH
use FROM
(and vice versa).The rest of the differences are nuances. It’s OK to skip over to the section about the use cases for both. But if you’re interested in whether you can always replace your
FROM
by FROM NAMED
, put your patterns inside a GRAPH
blocks, and get the same results, read on.The answer to that question is, as you may suspect, “no”.
FROM vs FROM NAMED difference in query results
So why exactly does the spec offer both
FROM
and FROM NAMED
? Can’t we just have one or the other? At the end of the day what most people want to do is to query a collection of graphs so why do we need two ways to enumerate those graphs? To answer this question, we first examine how the difference between FROM
and FROM NAMED
can manifest in query results.Matching data within graphs vs across graphs
Consider the following small data snippet (in TRiG):
:products {
:p1 a :Product ;
:part-of :o1 .
:o1 a :Order;
:customer :c1 .
}
:customers {
:c1 :name "Jim" .
}
and the query:
SELECT * FROM :products FROM :customers {
?product a :Product ;
:part-of ?order .
?order :customer ?customer .
?customer :name ?name
}
The product data is stored in the
:products
graph and the customer data is stored in the :customers
graph, they’re connected through order nodes. The query selects customer names associated with products.This query’s BGP consisting of 4 triple patterns (TP) has to be evaluated over a single graph to match results even though the data is spread over two distinct named graphs. This is what the
FROM
keywords deliver: the active graph for the BGP is a merge of the :products
graph and the :customers
graph so the patterns match data. One can say that the data is matched across named graph boundaries.Now, let’s see what happens if the query is rewritten to use
FROM NAMED
: SELECT * FROM NAMED :products FROM NAMED :customers {
GRAPH ?g {
?product a :Product ;
:part-of ?order .
?order :customer ?customer .
?customer :name ?name
}
}
First, we need to move the BGP into a
GRAPH
block (remember the simple rules we mentioned earlier). If you don’t do that, any mature query optimiser will immediately realise that the query cannot produce results and won’t even evaluate the BGP. Second, and most importantly, to understand which results this query produces (if any) you need to understand the active graph for the BGP (for the curious, the relevant part of the spec is Section 18.6 Definition: Evaluation of Graph).Informally
GRAPH ?g
means that the query loops over all graph IRIs in FROM NAMED
and substitutes each of them into ?g
, that becomes the active graph for the BGP, the pattern is evaluated, and the results are combined over all of the listed graphs. That is, the key part is that at any point of time the active graph is a single named graph in the data. But neither :products
nor :customers
by itself contains enough data to match the entire BGP, either products are missing or customers. Thus the BGP produces the empty result for both evaluations and so does the union. The query returns no results.Again, one can say that with
FROM NAMED
and GRAPH
patterns match within graphs rather than across graphs. But it’s important to understand what it actually means and this is where the concept of active graph helps.One way of thinking why
GRAPH ?g
matches within graphs is to think what value the graph variable ?g
would have if matching was done across graphs, i.e. half in :products and half in :customers. It’d be very difficult to define it in a meaningful way.Property paths and graph boundaries
Another interesting difference between
FROM
vs FROM NAMED
manifests when property paths are involved. The spec says that property paths do not span multiple graphs in the dataset. There’s a little bit of ambiguity here on what is meant by “dataset” exactly. It doesn’t mean that one cannot match paths in the data across named graph boundaries, as the following example shows: :products {
:p1 a :Product ;
:part-of :o1 ;
:part-of :o2 .
}
:orders {
:o1 a :Order ;
:customer :c1 .
:o2 a :Order ;
:customer :c2 .
}
:customers {
:c1 :name "Jim" .
:c2 :name "Mary" .
}
It’s still the same products and customers dataset except that now there’s a separate graph for orders and a second order for the same product (
:p1
) but a different customer (:c2
). Now, a query to find pairs of customers who buy the same products could look like: SELECT * FROM :products FROM :orders {
?customerX (^:customer/^:part-of/:part-of/:customer)+ ?customerY
FILTER (str(?customerX) < str(?customerY))
}
Pro tip: if you’re interested not just in pairs of customers but actual connections in the data that link them together, i.e. particular orders and products, check out path queries in Stardog.
The dataset for this query is defined by the two
FROM
statements which make a single merged active graph. Since it’s a single active graph for the query, the property path will match :c1
and :c2
even though the path spans both :products
and :orders
graphs in the data.Again, switching the keyword
FROM
to FROM NAMED
changes the execution and the results: SELECT * FROM NAMED :products FROM NAMED :orders {
GRAPH ?g { ?customerX (^:customer/^:part-of/:part-of/:customer)+ ?customerY }
FILTER (str(?customerX) < str(?customerY))
}
Now the
(^:customer/^:part-of/:part-of/:customer)+
pattern is evaluated twice, once against :products
and once against :orders
. Each time it cannot go across the graph boundary so both evaluations return empty results and so does the query.One can often reason about property path evaluations as sequences of BGP evaluations till a fix-point, i.e. when no new nodes can be reached. So it should not be surprising the
FROM
vs FROM NAMED
difference manifests similarly for property paths as for BGPs.Named graphs in the wild: use cases
Let’s now go back to the earlier question whether both
FROM
and FROM NAMED
are needed. We know they can lead to different query results but it’s interesting how their properties relate to actual use cases.When it comes to named graphs in RDF, we often see that the data layout follows one of the following two patterns: either few relatively large graphs where each contains a unique part of the dataset, or numerous small graphs representing similar objects. We call these patterns graph per domain and graph per entity.
Graph per domain
By domain here we mean some part of the graph which is structurally different from other parts of the graph. Popular examples include:
- a table or a collection of tables esp. if the graph has roots in a relational database(s). The products and customers example above follows this pattern: the
:products
and:customers
graphs could easily correspond to product and customer tables.
- a data source, i.e. the named graph identifies where a particular part of the data came from, such as an upstream database, a connector, a data stream, etc. The graph IRI is then convenient to record provenance information.
- a dataset. It’s not uncommon for Stardog customers to maintain different datasets in a single database in different named graphs esp. if they need to be queried together.
Graphs following this pattern are typically linked, for example, in the graph per table case such links would be foreign key relationships. Product nodes whose attributes are stored in the
:products
graph have links to order nodes which can be stored in a different graph. Thus querying across graphs is usually a key requirement and it is directly supported by the FROM
semantics.One reason people tend to partition their dataset onto several graphs is the relative ease of using the SPARQL Graph Store Protocol compared to generic SPARQL Update queries. It offers REST-like API for adding, updating, and deleting graphs which looks friendlier to a slew of users, esp. Web developers, than an expressive query language.
Graph per business entity
The other, substantially different pattern of using named graphs is when each graph keeps together data about a single object, like a product, a customer, or any other business entity. This pattern usually results in millions of small graphs. In contrast to the above use case, these graphs tend to be independent and structurally similar. Continuing the analogy with relational models, such graphs often correspond to a single table row or a small set of linked rows from several tables (i.e. if the relational schema is normalized to a high form).
One reason
GRAPH ?g {...}
blocks become useful in such a scenario (and thus FROM NAMED
) is because it’s often important to figure out which graphs contain objects matching the search criteria. Consider the following simple example: SELECT * FROM NAMED ... {
GRAPH ?g {
?p a :GroceryProduct ;
:expires-on ?expdate .
FILTER(?expdate < "2021-01-06"^^xsd:date)
}
?g :created-on ?date
}
The query finds all grocery products which will soon expire and gets the information on when those products’ data was added to the dataset. The named graph IRIs are pretty useful for recording additional metadata since they can naturally be used as subjects of other triples, e.g.
:created-on
triples.SPARQL 1.1 unfortunately does not provide standard means to specify a large number of graphs as a dataset without enumerating them individually, which causes problems in such cases. Stardog provides special IRIs, like
:context:named
or :context:virtual,
which work like wildcards. A more general pattern-based mechanism would be useful.Restricting BGPs to match within graphs typically does not cause issues with this use case because products tend to be independent and not directly linked. They of course will be linked through other objects, like orders or customers, and it’s not difficult to make queries work under the
GRAPH
semantics: SELECT * FROM NAMED ... {
GRAPH ?product-graph {
?product a :Product ;
:part-of ?order .
}
GRAPH :orders {
?order :customer ?customer .
}
GRAPH :customers {
?customer :name ?name
}
}
Using different graph terms is sufficient to match across graphs.
It’s not uncommon to use the same IRI for the node representing a business entity and the named graph where the triples for that entity are stored.
To sum up
FROM
and FROM NAMED
work a little differently in SPARQL but in most cases it’s sufficient to remember that the former tends to be used when query patterns are outside of GRAPH
blocks and the latter when they are inside. This simple rule alone can help you avoid the frustration of seeing no results for seemingly well-written queries. The key to a deeper understanding of the differences is the concept of active graph for a query pattern and how it may, or may not, change within the query execution.This post only talks about pure RDF 1.1 named graphs but the discussed concepts, like the query dataset or the active graph, are relevant in case your database (like Stardog) is capable of data virtualization. You can learn how the query dataset can encompass virtual graphs and Stardog’s Virtual Transparency feature from our other posts and tutorials.