
14.10. Performance Tuning

This section addresses the questions that most often arise around RDF query performance: general server sizing, index scheme selection, dumping and reloading graphs, cost estimates and join order, bulk loading, SPARUL, and benchmarks.

14.10.1. General

For running with large data sets, one should configure the Virtuoso process to use between 3/5 and 2/3 of system RAM and to stripe storage on all available disks. See the NumberOfBuffers and Striping INI parameters.

Also, if running with a large database, setting MaxCheckpointRemap to about 1/4 of the database size is recommended. This parameter is in pages, 8K per page.
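
As a rough sketch, for a hypothetical machine with 16 GB RAM, an 8 GB database, and two data disks, the relevant virtuoso.ini entries might look as follows; the figures and paths are placeholders to adapt to the actual machine:

[Parameters]
NumberOfBuffers    = 1360000  ; about 2/3 of 16 GB at 8 KB per buffer
MaxDirtyBuffers    = 1000000
MaxCheckpointRemap = 250000   ; about 1/4 of an 8 GB database, in 8 KB pages

[Database]
Striping = 1

[Striping]
; one stripe file per disk
Segment1 = 100G, /disk1/virt/db-seg1-1.db, /disk2/virt/db-seg1-2.db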


14.10.2. RDF Index Scheme

Starting with version 6.00.3126 the default RDF index scheme consists of 2 full indices over RDF quads plus 3 partial indices. This index scheme is generally adapted to all kinds of workloads, regardless of whether queries generally specify a graph.

Alternate indexing schemes are possible but are generally not needed. For upgrading old databases with a different index scheme, see the corresponding documentation.

The index scheme consists of the following indices: PSOG (the primary key, a full index), POGS (a full bitmap index), and the partial indices SP, OP, and GS.

This index scheme is created by the following statements:

create table DB.DBA.RDF_QUAD (
  G IRI_ID_8,
  S IRI_ID_8,
  P IRI_ID_8,
  O any,
  primary key (P, S, O, G)
  );

alter index RDF_QUAD on DB.DBA.RDF_QUAD partition (S int (0hexffff00));

create distinct no primary key ref bitmap index RDF_QUAD_SP on RDF_QUAD (S, P) partition (S int (0hexffff00));
create bitmap index RDF_QUAD_POGS on RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create distinct no primary key ref bitmap index RDF_QUAD_GS on RDF_QUAD (G, S) partition (S int (0hexffff00));
create distinct no primary key ref index RDF_QUAD_OP on RDF_QUAD (O, P) partition (O varchar (-1, 0hexffff));

The idea is to favor queries where the predicate is specified in triple patterns. The entire quad can be efficiently accessed when P and either or both of S and O are known. This has the advantage of clustering data by predicate, which improves working set: a page read from disk will only have entries pertaining to the same predicate, so the chances of accessing other entries of the page are higher than if the page held values for arbitrary predicates.

For the less frequent cases where only S is known, as in DESCRIBE, the distinct P's of the S are found in the SP index. These SP pairs are then used for accessing the PSOG index to get the O and G. For cases where only the G is known, as in dropping a graph, the distinct S's of the G are found in the GS index, the P's of each S are then found in the SP index, and finally the whole quad is found in the PSOG index.

The SP, OP and GS indices do not store duplicates: if an S has many values of a P, there is only one entry. Entries are not deleted from SP, OP or GS. This does not lead to erroneous results, since a full index, either POGS or PSOG, is always consulted in order to know whether a quad actually exists. When data is updated, most often a graph is dropped in its entirety and a substantially similar graph is inserted in its place; the SP, OP and GS indices then stay relatively unaffected.

Still, over time, especially if there are frequent updates and values do not repeat between consecutive states, the SP, OP and GS indices will get polluted, which may affect performance. Dropping and recreating the index will then remedy the situation.
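
For example, the SP index could be rebuilt with the same definition used in the scheme above:

drop index RDF_QUAD_SP;
create distinct no primary key ref bitmap index RDF_QUAD_SP on RDF_QUAD (S, P) partition (S int (0hexffff00));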

For cases where this is not practical, the index scheme should have only full indices, i.e. each key holds all columns of the primary key of the quad. This will be the case if the distinct no primary key ref options are not specified in the create index statement. In that case, all indices remain in strict sync across deletes.

Many RDF workloads have bulk-load and read-intensive access patterns with few deletes. The default index scheme is optimized for these. In such situations this scheme offers significant space savings, resulting in a better working set. Typically this layout takes 60-70% of the space of a layout with 4 full indices.


14.10.3. Index Scheme Selection

If the graph is always given, as one or more FROM or FROM NAMED clauses, and there are no patterns where only graph and predicate are given, then the default indices are appropriate. If the predicate and graph are given but the subject is not, then it is sometimes useful to add:

create bitmap index RDF_QUAD_PGOS on DB.DBA.RDF_QUAD (P, G, O, S) partition (O varchar (-1, 0hexffff));
Note:

If the server is pre 5.0.7, leave out the partitioning clause.

Making the PGOS index can help in some cases even if it is not readily apparent from the queries that one is needed. This is so, for example, if the predicate by itself is selective, i.e. there is a predicate that occurs in only a few triples.

If the graph itself is not given in the queries, then the default index scheme is unworkable. For this situation, the appropriate scheme is:

create table RDF_QUAD (G iri_id_8, S iri_id_8, P iri_id_8, O any, primary key (S, P, O, G))
alter index RDF_QUAD on RDF_QUAD partition (S int (0hexffff00));
create bitmap index RDF_QUAD_OPGS on DB.DBA.RDF_QUAD (O, P, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_POGS on DB.DBA.RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_GPOS on DB.DBA.RDF_QUAD (G, P, O, S) partition (O varchar (-1, 0hexffff));
Note:

For a pre 5.0.7 server, leave the partition clauses and the alter index statement out.

If there are existing triples and one does not wish to reload them, then the following sequence will convert the data:

log_enable (2);
drop index RDF_QUAD_OGPS;
checkpoint;
create table R2 (G iri_id_8, S iri_id_8, P iri_id_8, O any, primary key (S, P, O, G))
alter index R2 on R2 partition (S int (0hexffff00));

insert into r2 (g, s, p, o) SELECT g, s, p, o from rdf_quad;

drop table RDF_QUAD;
checkpoint;
alter table r2 rename RDF_QUAD;
create bitmap index RDF_QUAD_OPGS on DB.DBA.RDF_QUAD (O, P, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_POGS on RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create bitmap index RDF_QUAD_GPOS on RDF_QUAD (G, P, O, S) partition (O varchar (-1, 0hexffff));
checkpoint;
log_enable (1);

First drop the OGPS index to make space. Then, in row-autocommit mode and without logging, copy the quads into the new primary key layout. Drop the old table and rename the new one over it. Make the additional indices. Do a checkpoint after the drops so as to actually free the space in the checkpointed state as well. Finish with a checkpoint to finalize the changes, since logging was turned off; even if logging had been on, one would not wish to replay the reindexing if the server terminated abnormally. Finally, turn logging back on for the session.

Note:

This is all meant to be done with a SQL client like isql, not through a web interface. The web interface has no persistent session, so the log_enable calls would have no effect there.

Other indexing schemes may be tried. Note, however, that in all cases one or another of the indices should begin with G. This is because schema operations need to read through a graph. If no index begins with G, such a read becomes a full table scan, which is unworkable, leading to an extremely slow server start and making operations like dropping a graph as good as unusable.
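
For example, a graph-wide operation such as the following (the graph IRI is a placeholder) must enumerate all quads of the graph, which is only practical when some index leads with G:

SPARQL CLEAR GRAPH <http://example.org/obsolete>;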

Public web service endpoints are proven sources of especially bad queries. While local application developers can obtain instructions from the database administrator and use ISQL access to the database in order to tune execution plans, "external" clients do not know the details of the configuration and/or lack the appropriate skills. The most common problem is that public endpoints usually get requests that do not mention the required graph, because those queries were initially written for use with triple stores. If the web service provides access to a single graph (or to a short list of graphs), then it is strongly recommended to configure it by adding a row to DB.DBA.SYS_SPARQL_HOST. The idea is that if the client specifies a default graph in the request, or uses named graphs and group graph patterns, then they are probably smarter than average and will provide meaningful queries. If no graph names are specified, then the query will benefit from the preset graph, because this gives the compiler more indexes to choose from -- indexes that begin with G.

Sometimes a web service endpoint is used to access the data of only one application, not all data in the system. In that case one may wish to declare a separate storage that consists of only the RDF Views made by that application, and define input:storage in the appropriate row of DB.DBA.SYS_SPARQL_HOST.
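
As a sketch, assuming the SH_HOST and SH_GRAPH_URI columns of DB.DBA.SYS_SPARQL_HOST are the relevant ones, a preset graph for requests arriving at a given host might be configured as follows; the host name and graph IRI are placeholders:

insert into DB.DBA.SYS_SPARQL_HOST (SH_HOST, SH_GRAPH_URI)
  values ('www.example.com', 'http://www.example.com/data');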


14.10.4. Dump and Reload Graphs

To dump all graphs (the script below skips the virtrdf: schema graph via a FILTER), you may use the following procedures:

create procedure dump_graphs (in dir varchar := 'dumps', in file_length_limit integer := 1000000000)
{
  declare inx int;
  inx := 1;
  set isolation = 'uncommitted';
  for (select * from (sparql define input:storage "" select distinct ?g { graph ?g { ?s ?p ?o } . filter ( ?g != virtrdf: ) } ) as sub option (loop)) do
    {
      dump_one_graph ("g", sprintf ('%s/graph%06d_', dir, inx), file_length_limit);
      inx := inx + 1;
    }
}
;

create procedure dump_one_graph (in srcgraph varchar, in out_file varchar, in file_length_limit integer := 1000000000)
{
  declare file_name varchar;
  declare env, ses any;
  declare ses_len, max_ses_len, file_len, file_idx integer;
  set isolation = 'uncommitted';
  max_ses_len := 10000000;
  file_len := 0;
  file_idx := 1;
  file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
  string_to_file (file_name || '.graph', srcgraph, -2);
  string_to_file (file_name, sprintf ('# Dump of graph <%s>, as of %s\n', srcgraph, cast (now() as varchar)), -2);
  env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
  ses := string_output ();
  for (select * from (sparql define input:storage "" select ?s ?p ?o { graph `iri(?:srcgraph)` { ?s ?p ?o } } ) as sub option (loop)) do
    {
      http_ttl_triple (env, "s", "p", "o", ses);
      ses_len := length (ses);
      if (ses_len > max_ses_len)
        {
          file_len := file_len + ses_len;
          if (file_len > file_length_limit)
            {
              http (' .\n', ses);
              string_to_file (file_name, ses, -1);
              file_len := 0;
              file_idx := file_idx + 1;
              file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
              string_to_file (file_name, sprintf ('# Dump of graph <%s>, as of %s (part %d)\n', srcgraph, cast (now() as varchar), file_idx), -2);
              env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
            }
          else
            string_to_file (file_name, ses, -1);
          ses := string_output ();
        }
    }
  if (length (ses))
    {
      http (' .\n', ses);
      string_to_file (file_name, ses, -1);
    }
}
;

create procedure load_graphs (in dir varchar := 'dumps/')
{
  declare arr any;
  declare g varchar;

  arr := sys_dirlist (dir, 1);
  log_enable (2, 1);
  foreach (varchar f in arr) do
    {
      if (f like '*.ttl')
	{
	  declare continue handler for sqlstate '*'
	    {
	      log_message (sprintf ('Error in %s', f));
	    };
  	  g := file_to_string (dir || '/' || f || '.graph');
	  DB.DBA.TTLP_MT (file_open (dir || '/' || f), g, g, 255);
	}
    }
  exec ('checkpoint');
}
;
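
Once the procedures are defined, a dump and reload cycle from isql can be as simple as the following (the directory must be allowed via DirsAllowed, as discussed in the next section):

-- dump every graph into files under the 'dumps' directory
dump_graphs ();
-- later, possibly on another server, reload the dumped files
load_graphs ('dumps/');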

14.10.5. Dump RDF View Graph to n3

The RDF_QM_TREE_DUMP procedure and its associated procedures below are used for dumping one or more RDF View graphs in a Virtuoso server to a set of Turtle (.ttl) dataset files in the specified dump directory. The dump generation is made as fast as possible by grouping mappings by underlying tables, so many properties from neighboring database columns can be extracted in one table scan. The size of the generated files is limited to 5MB. The dump process creates internal stored procedures; their texts are saved in the file .dump_procedures.sql in the directory of dump files, for debugging purposes.

Note that the dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini); otherwise the server will not be allowed to create or access the dataset files.
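
For example, if the dumps go under /var/dumps (a placeholder path), the [Parameters] section would include:

[Parameters]
DirsAllowed = ., /var/dumps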

The Virtuoso RDF bulk loader scripts can then be used to load the dumped datasets for the RDF View graphs directly into the Virtuoso RDF quad store.
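
Assuming the bulk loader procedures (ld_dir, rdf_loader_run) are installed, the dumped files could then be registered and loaded like this (a sketch; the directory and graph IRI are placeholders):

ld_dir ('/var/dumps', '*.ttl', 'http://example.org/target-graph');
rdf_loader_run ();
checkpoint;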

14.10.5.1. Parameters

dest_dir -- the directory in which the dump files are created.
graph_iri -- the IRI of the graph to dump; NULL means no restriction by graph.
storage -- the IRI of the quad storage; NULL means the default, virtrdf:DefaultQuadStorage.
root -- the IRI of the root quad map to dump; NULL means all quad maps of the storage.


14.10.5.2. Procedure Code

CREATE PROCEDURE DB.DBA.RDF_QM_TREE_DUMP 
  ( in  dest_dir  VARCHAR, 
    in  graph_iri VARCHAR := NULL, 
    in  storage   VARCHAR := NULL, 
    in  root      VARCHAR := NULL
  )
{
 DECLARE all_qms, 
         grouped_qmvs, 
         launcher_text  ANY;
 DECLARE grp_ctr, 
         qm_ctr, 
         qm_count       INTEGER;
 DECLARE sql_file, 
         launcher_name  VARCHAR;
 IF (NOT (dest_dir LIKE '%/'))
   dest_dir := dest_dir || '/';
 sql_file := dest_dir || '.dump_procedures.sql';
 IF (storage IS NULL)
   storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage';
 string_to_file (
   sql_file, 
   '-- This file contains procedure created by DB.DBA.RDF_QM_TREE_DUMP() for storage ' 
      || COALESCE (storage, 'NULL') 
      || ' and root quad map ' 
      || COALESCE (root, 'NULL') 
      || '\n\n', 
   -2);
 all_qms := dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, root), 2);
 grouped_qmvs := DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (all_qms);
 launcher_name := 'RDF_QM_TREE_DUMP_BATCH_' || md5 (serialize (graph_iri) || storage || serialize (root));
 launcher_text := string_output ();
 http ('CREATE PROCEDURE DB.DBA."' || launcher_name || '" (in dest_dir VARCHAR)\n{\n', launcher_text);
 FOR (grp_ctr := length (grouped_qmvs); grp_ctr > 0; grp_ctr := grp_ctr-2)
   {
     DECLARE tables, qms, proc_text ANY;
     DECLARE group_key, proc_name, dump_prefix, cmt VARCHAR;
     tables := grouped_qmvs [grp_ctr-2];
     qms := grouped_qmvs [grp_ctr-1];
     qm_count := length (qms);
     group_key := md5 (serialize (graph_iri) || storage || serialize (root) || serialize (tables));
     proc_name := 'RDF_QM_TREE_DUMP_GRP_' || group_key;
     proc_text := string_output ();
     cmt := sprintf ('%d quad maps on join of', qm_count);
     FOREACH (VARCHAR t IN tables) DO cmt := cmt || ' ' || t;
     http ('  --  ' || cmt || '\n', launcher_text);
     http ('  DB.DBA."' || proc_name || '" (dest_dir);\n', launcher_text);
     http ('CREATE PROCEDURE DB.DBA."' || proc_name || '" (in dest_dir VARCHAR)\n', proc_text);
     http ('{\n', proc_text);
     http ('  -- ' || cmt || '\n', proc_text);
     http ('  DECLARE ses, env ANY;\n', proc_text);
     http ('  DECLARE file_ctr, cmt_len INTEGER;\n', proc_text);
     http ('  file_ctr := 0;\n', proc_text);
     http ('  dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text);
     http ('  ses := string_output ();\n', proc_text);
     http ('  http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text);
     http ('  env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text);
     http ('  cmt_len := LENGTH (ses);\n', proc_text);
     http ('  FOR (SPARQL DEFINE input:storage <' || storage || '>\n', proc_text);
     http ('    SELECT ?s1, ?p1, ?o1\n', proc_text);
     IF (graph_iri IS NOT NULL)
       {
         http ('    WHERE { GRAPH <', proc_text); http_escape (graph_iri, 12, proc_text, 1, 1); http ('> {\n', proc_text);
       }
     ELSE
       http ('    WHERE { GRAPH ?g1 {\n', proc_text);
     FOR (qm_ctr := 0; qm_ctr < qm_count; qm_ctr := qm_ctr + 1)
       {
         IF (qm_ctr > 0) http ('            UNION\n', proc_text);
         http ('            { quad map <' || qms[qm_ctr] || '> { ?s1 ?p1 ?o1 } }\n', proc_text);
       }
     http ('          } } ) DO {\n', proc_text);
     http ('      http_ttl_triple (env, "s1", "p1", "o1", ses);\n', proc_text);
     http ('      IF (LENGTH (ses) > 5000000)\n', proc_text);
     http ('        {\n', proc_text);
     http ('          http ('' .\\n'', ses);\n', proc_text);
     http ('          string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', proc_text);
     http ('          file_ctr := file_ctr + 1;\n', proc_text);
     http ('          dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text);
     http ('          ses := string_output ();\n', proc_text);
     http ('          http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text);
     http ('          env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text);
     http ('        }\n', proc_text);
     http ('    }\n', proc_text);
     http ('  IF (LENGTH (ses) > cmt_len)\n', proc_text);
     http ('    {\n', proc_text);
     http ('      http ('' .\\n'', ses);\n', proc_text);
     http ('      string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', proc_text);
     http ('    }\n', proc_text);
     http ('}\n', proc_text);
     proc_text := string_output_string (proc_text);
     string_to_file (sql_file, proc_text || ';\n\n' , -1);
     EXEC (proc_text);
   }
 http ('}\n', launcher_text);
 launcher_text := string_output_string (launcher_text);
 string_to_file (sql_file, launcher_text || ';\n\n' , -1);
 EXEC (launcher_text);
 CALL ('DB.DBA.' || launcher_name)(dest_dir);
}
;

CREATE FUNCTION DB.DBA.RDF_QM_CONTENT_OF_QM_TREE 
  ( in  graph_iri  VARCHAR := NULL,
    in  storage    VARCHAR := NULL, 
    in  root       VARCHAR := NULL, 
    in  dict       ANY := NULL
  ) returns ANY
{
 DECLARE res, subqms any;
 DECLARE graphiri varchar;
 graphiri := DB.DBA.JSO_SYS_GRAPH();
 IF (storage IS NULL)
   storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage';
 DB.DBA.RDF_QM_ASSERT_STORAGE_FLAG (storage, 0);
 IF (dict IS NULL)
   dict := dict_new ();
 IF (root IS NULL)
   {
     subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri")
         FROM (
           SPARQL DEFINE input:storage ""
           SELECT DISTINCT (str(?qm)) AS ?qmiri
           WHERE { GRAPH `iri(?:graphiri)` {
                     { `iri(?:storage)` virtrdf:qsUserMaps ?lst .
                       ?lst ?p ?qm .
                       FILTER (0 = bif:strstr (str(?p), str(rdf:_)))
                     } UNION {
                       `iri(?:storage)` virtrdf:qsDefaultMap ?qm .
                     } } } ) AS sub ) );
     FOREACH (varchar qmid IN subqms) DO
       DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict);
     RETURN dict;
   }
 DB.DBA.RDF_QM_ASSERT_JSO_TYPE (root, 'http://www.openlinksw.com/schemas/virtrdf#QuadMap');
 IF (graph_iri IS NOT NULL AND
   EXISTS ((SPARQL DEFINE input:storage ""
       SELECT (1) WHERE {
           GRAPH `iri(?:graphiri)` {
               `iri(?:root)` virtrdf:qmGraphRange-rvrFixedValue ?g .
               FILTER (str (?g) != str(?:graph_iri))
             } } ) ) )
   RETURN dict;
 IF (NOT EXISTS ((SPARQL DEFINE input:storage ""
       SELECT (1) WHERE {
           GRAPH `iri(?:graphiri)` {
               `iri(?:root)` virtrdf:qmMatchingFlags virtrdf:SPART_QM_EMPTY .
             } } ) ) )
   dict_put (dict, root, 1);
 subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri")
     FROM (
       SPARQL DEFINE input:storage ""
       SELECT DISTINCT (str(?qm)) as ?qmiri
       WHERE { GRAPH `iri(?:graphiri)` {
   		`iri(?:root)` virtrdf:qmUserSubMaps ?lst .
               ?lst ?p ?qm .
               FILTER (0 = bif:strstr (str(?p), str(rdf:_)))
             } } ) AS sub ) );
 FOREACH (VARCHAR qmid IN subqms) DO
   DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict);
 RETURN dict;
}
;

CREATE FUNCTION DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (in qms ANY) returns ANY
{
 DECLARE res ANY;
 DECLARE ctr INTEGER;
 DECLARE graphiri VARCHAR;
 graphiri := DB.DBA.JSO_SYS_GRAPH();
 res := dict_new (LENGTH (qms) / 20);
 FOREACH (VARCHAR qmiri IN qms) DO
   {
     DECLARE tbls, acc ANY;
     tbls := ((SELECT DB.DBA.VECTOR_AGG (sub."tbl")
         FROM (SELECT subsub."tbl"
           FROM (
             SPARQL DEFINE input:storage ""
             SELECT DISTINCT ?tbl
             WHERE { GRAPH `iri(?:graphiri)` {
                       { `iri(?:qmiri)` virtrdf:qmTableName ?tbl .
                       } UNION {
                         `iri(?:qmiri)` virtrdf:qmATables ?atbls .
                         ?atbls ?p ?atbl .
                         ?atbl virtrdf:qmvaTableName ?tbl
                       } UNION {
                         `iri(?:qmiri)` ?fldmap ?qmv .
                         ?qmv virtrdf:qmvATables ?atbls .
                         ?atbls ?p ?atbl .
                         ?atbl virtrdf:qmvaTableName ?tbl .
                       } } } ) subsub
           ORDER BY 1 ) AS sub ) );
     acc := dict_get (res, tbls);
     IF (acc IS NULL)
       vectorbld_init (acc);
     vectorbld_acc (acc, qmiri);
     dict_put (res, tbls, acc);
   }
 res := dict_to_vector (res, 2);
 FOR (ctr := LENGTH (res); ctr > 0; ctr := ctr-2)
   {
     DECLARE acc ANY;
     acc := aref_set_0 (res, ctr-1);
     vectorbld_final (acc);
     aset_zap_arg (res, ctr-1, acc);
   }
 RETURN res;
}
;

--test dbg_obj_princ (DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2)));
--test dbg_obj_princ (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2));
--test DB.DBA.RDF_QM_TREE_DUMP ('dump/demo', null, null, null);
--test DB.DBA.RDF_QM_TREE_DUMP ('dump/tpch', 'http://localhost:8600/tpch', null, null);	


14.10.6. Erroneous Cost Estimates and Explicit Join Order

The selectivity of triple patterns is determined at query compile time by sampling the data. The sampling can occasionally produce misleading estimates. To see whether the cardinality guesses are generally valid, look at the query plan with explain ().

Below is a sample from the LUBM qualification data set in the Virtuoso distribution. After running make test in binsrc/test/lubm, there is a loaded database with the data. Start a server in the same directory to see the data.

SQL> explain ('SPARQL prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT *
FROM <lubm>
WHERE { ?x rdf:type ub:GraduateStudent }');

REPORT
VARCHAR
_______________________________________________________________________________

{ 
 
Precode:
      0: $25 "callret" := Call __BOX_FLAGS_TWEAK (<constant (lubm)>, <constant (1)>)
      5: $26 "lubm" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($25 "callret")
      12: $27 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.w3.org/1999/02/22-rdf-syntax-ns#type)>, <constant (1)>)
      17: $28 "-ns#type" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($27 "callret")
      24: $29 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent)>, <constant (1)>)
      29: $30 "owl#GraduateStudent" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($29 "callret")
      36: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD_OGPS    1.9e+03 rows
Key RDF_QUAD_OGPS  ASC ($32 "s-3-1-t0.S")
<col=415 O = $30 "owl#GraduateStudent"> , <col=412 G = $26 "lubm"> , <col=414 P = $28 "-ns#type">
row specs: <col=415 O LIKE <constant (T)>>
 
Current of: <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5>
 
After code:
      0: $35 "x" := Call ID_TO_IRI ($32 "s-3-1-t0.S")
      5: BReturn 0
Select ($35 "x", <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5>)
}

22 Rows. -- 1 msec.

This finds the graduate student instances in the lubm graph. First the query converts the IRI literals to IDs. Then, using a match of O, G and P on OGPS, it finds the IRIs of the graduate students. Finally it converts the IRI IDs back to string form for the result.

The cardinality estimate of 1.9e+03 rows is on the FROM line.

Doing an explain on the queries will show the cardinality estimates. To drill down further, one can split the query into smaller chunks and see the estimates for these, down to the level of a single triple pattern. To indicate a variable that is bound but whose value is not a literal known at compile time, one can use the parameter marker ??.

SQL>explain ('SPARQL define sql:table-option "order"  prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT *
FROM <lubm>
WHERE { ?x rdf:type ?? }');

This will not know the type but will know that a type will be provided. So instead of guessing 1900 matches, the compiler will guess a smaller number, which is obviously less precise. Thus literals are generally better.

In some cases, generally to work around an optimization error, one can specify an explicit join order. This is done with the sql:select-option "order" clause in the SPARQL query prefix.

SQL>SELECT SPARQL_to_sql_text (' define sql:select-option "order" prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT *
FROM <lubm>
WHERE
  {
    ?x rdf:type ub:GraduateStudent .
    ?x ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0>
  }');

shows the SQL text with the order option at the end.

If an estimate is radically wrong then this should be reported as a bug.

If there is a FROM with a KEY on the next line and no column specs, then this is a full table scan. The more columns are specified, the fewer rows will be passed to the next operation in the chain. In the example above, there are three columns whose values are known before reading the table, and these columns are the leading columns of the index in use, so the column specs are:

<col=415 O = $30 "owl#GraduateStudent"> , <col=412 G = $26 "lubm"> , <col=414 P = $28 "-ns#type">
Note:

A KEY with only a row spec is a full table scan with the row spec applied as a filter. This is usually not good unless this is specifically intended.

If queries are compiled to make full table scans when this is not specifically intended, this should be reported as a bug. The explain output and the query text should be included in the report.

An explicit join order is specified with the define sql:select-option "order" clause in the SPARQL query prefix. Consider:

SQL>explain ('SPARQL define sql:select-option "order, loop" prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT *
FROM <lubm>
WHERE
  {
    ?x ub:takesCourse ?c .
    ?x rdf:type ub:GraduateStudent
  }');

One will see in the output that the first table access retrieves everything in the lubm graph that takes some course, and only later checks whether it is a graduate student. This is obviously not the preferred order, but sql:select-option "order" forces the optimizer to join from left to right.

It is very easy to end up with completely unworkable query plans in this manner, but if the optimizer really is in error, then this is the only way of overriding its preferences. The effect of sql:select-option is pervasive, extending inside unions, optionals, subqueries, etc. within the statement.

We note that if, in the above query, both the course taken by the student and the type of the student are given, the query compilation will be, at least in all non-cluster cases, an index intersection. This is not overridden by the sql:select-option clause, since an index intersection is always a safe guess, regardless of the correctness of the cardinality estimates of the patterns involved.


14.10.7. Loading

There are many functions for loading RDF text, in RDF/XML and Turtle.

For loading RDF/XML, the best way is to split the data to be loaded into multiple streams and load these in parallel using RDF_LOAD_RDFXML (). To avoid running out of rollback space for large files and in order to have multiple concurrent loads not interfere with each other, the row autocommit mode should be enabled.

For example,

log_enable (2);
-- switch row-by-row autocommit on and logging off for this session
DB.DBA.RDF_LOAD_RDFXML (file_to_string_output ('file.xml'), 'base_uri', 'target_graph');
-- more files here ...
checkpoint;

Executing a script like the above with isql will load the data. Since transaction logging is off, make a manual checkpoint at the end to ensure that the data is persisted across a server restart, as there is no roll-forward log.

If large amounts of data are to be loaded, run multiple such streams in parallel. One may have, for example, 6 streams for 4 cores. This means that even if up to two threads wait for disk, there is still work for all cores.

Having substantially more threads than processors or disks is not particularly useful.
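
A minimal shell sketch of such parallel loading, assuming an isql listener on port 1111, dba credentials, and placeholder file and graph names:

#!/bin/sh
# start one loading session per file, in the background
for f in part1.xml part2.xml part3.xml part4.xml
do
  isql 1111 dba dba exec="log_enable (2); DB.DBA.RDF_LOAD_RDFXML (file_to_string_output ('$f'), 'http://example.org/base', 'http://example.org/graph');" &
done
wait
# single manual checkpoint once all streams are done, since logging was off
isql 1111 dba dba exec="checkpoint;"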

There exist multithreaded load functions which load one file on multiple threads. Experience shows that loading multiple files, one thread per file, works better.

For loading Turtle, some platforms may have a non-reentrant Turtle parser, meaning that only one load may run at a time. One can test this by calling ttlp () from two sessions at the same time. If these do not execute concurrently, then the best option may be to try ttlp_mt () and see if it runs faster than a single-threaded ttlp () call.
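
A sketch of the two calls, with placeholder file and graph names; both accept further optional arguments (flags, log mode, thread count) described in the function reference:

-- single-threaded parse and load
DB.DBA.TTLP (file_to_string_output ('data.ttl'), '', 'http://example.org/g');
-- multithreaded variant of the same load
DB.DBA.TTLP_MT (file_to_string_output ('data.ttl'), '', 'http://example.org/g');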

14.10.7.1. Loading LOD RDF data

To load RDF data into a LOD instance, perform the following steps:


14.10.7.2. Loading UniProt RDF data

To load the UniProt data, create a function such as:

create function DB.DBA.UNIPROT_LOAD (in log_mode integer := 1)
{
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename1'),'http://base_uri_1', 'destination_graph_1', log_mode, 3);
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename2'),'http://base_uri_2', 'destination_graph_2', log_mode, 3);
  ...
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename9'),'http://base_uri_9', 'destination_graph_9', log_mode, 3);
}

If you are starting from a blank database that you can drop and re-create in case an error is signalled, use it this way:

checkpoint;
checkpoint_interval(6000);
DB.DBA.UNIPROT_LOAD (0);
checkpoint;
checkpoint_interval(60);

If the database already contains important data and there is no way to stop it and take a backup before the load, then use:

checkpoint;
checkpoint_interval(6000);
DB.DBA.UNIPROT_LOAD ();
checkpoint;
checkpoint_interval(60);

Note that the 'number of threads' parameter of DB.DBA.RDF_LOAD_RDFXML_MT() refers to the threads used to process data from the file; an extra thread reads the text and parses it, so for 4 CPU cores there is no need for a parameter value greater than 3. Three processing threads per parsing thread is usually a good ratio, because parsing is usually three times faster than the rest of the loading, so the CPU load is well balanced. If, for example, you are using a 2 x quad-core Xeon, then you can choose between 8 single-threaded parsers or 2 parsers with 3 processing threads each. With 4 cores you may simply load file after file with 3 processing threads. The most important performance tuning is in the [Parameters] section of the Virtuoso configuration file:

NumberOfBuffers = 1000000
MaxDirtyBuffers = 800000
MaxCheckpointRemap = 1000000
DefaultIsolation = 2

Note: these numbers are reasonable for a 16 GB RAM Linux box. When no massive operation such as loading a huge database is underway, you can set the values to:

NumberOfBuffers = 1500000
MaxDirtyBuffers = 1200000
MaxCheckpointRemap = 1500000
DefaultIsolation = 2

Tip: after loading all the data you may therefore wish to shut the server down, adjust these parameters, and start it again. On an ext2 or ext3 filesystem, it is better to keep the disk no more than about 80% full; when the disk is nearly full, the database file may be allocated badly, resulting in a measurable loss of disk access speed. This is not specific to Virtuoso, but a common hint for all database-like applications doing random access to big files.

Here is an example awk script for splitting a big RDF/XML file into smaller ones:

BEGIN {
	file_part=1000
	e_line = "</rdf:RDF>"
        cur=0
        cur_o=0
	file=0
	part=file_part
      }
	{
	    res_file_i="res/"FILENAME
	    line=$0
	    s=$1
	    res_file=res_file_i"_"file".rdf"

	    if (index (s, "</rdf:Description>") == 1)
	    {
		cur=cur+1
		part=part-1
	    }

	    if (part > 0)
	    {
	    	print line >> res_file
	    }

	    if (part == 0)
	    {
#		print "===================== " cur
	    	print line >> res_file
		print e_line >> res_file
		close (res_file)
		file=file+1
		part=file_part
	    	res_file=res_file_i"_"file".rdf"
		system ("cp beg.txt " res_file)
	    }
        }
END { }

14.10.7.3. Loading DBpedia RDF data

You can use the following script as an example for loading DBpedia RDF data into Virtuoso:

#!/bin/sh

PORT=$1
USER=$2
PASS=$3
file=$4
g=$5
LOGF=`basename $0`.log

if [ -z "$PORT" -o -z "$USER" -o -z "$PASS" -o -z "$file" -o -z "$g" ]
then
  echo "Usage: `basename $0` [DSN] [user] [password] [ttl-file] [graph-iri]"
  exit
fi

if [ ! -f "$file" -a ! -d "$file" ]
then
    echo "$file does not exists"
    exit 1
fi

mkdir READY 2>/dev/null
rm -f $LOGF $LOGF.*

echo "Starting..."
echo "Logging into: $LOGF"

DOSQL ()
{
    isql $PORT $USER $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="$1" > $LOGF
}

LOAD_FILE ()
{
    f=$1
    g=$2
    echo "Loading $f (`cat $f | wc -l` lines) `date \"+%H:%M:%S\"`" | tee -a $LOG

    DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF

    if [ $? != 0 ]
    then
	echo "An error occurred, please check $LOGF"
	exit 1
    fi

    line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
    newf=$f.part
    inx=1
    while [ ! -z "$line_no" ]
    do
	cat $f |  awk "BEGIN { i = 1 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }"  >> bad.nt
	line_no=`expr $line_no + 1`
	echo "Retrying from line $line_no"
	echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ." > tmp.nt
	cat $f |  awk "BEGIN { i = 1 } { if (i>=$line_no) print \$0; i = i + 1 }"  >> tmp.nt
	mv tmp.nt $newf
	f=$newf
	mv $LOGF $LOGF.$inx
	DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF

	if [ $? != 0 ]
    then
	    echo "An error occurred, please check $LOGF"
	    exit 1
    fi
	line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
	inx=`expr $inx + 1`
    done
    rm -f $newf 2>/dev/null
    echo "Loaded.  "
}

echo "======================================="
echo "Loading started."
echo "======================================="

if [ -f "$file" ]
then
    LOAD_FILE $file $g
    mv $file READY 2>> /dev/null
elif [ -d "$file" ]
then
    for ff in `find $file -name '*.nt'`
    do
	LOAD_FILE $ff $g
	mv $ff READY 2>> /dev/null
    done
else
   echo "The input is not file or directory"
fi
echo "======================================="
echo "Final checkpoint."
DOSQL "checkpoint;" > temp.res
echo "======================================="
echo "Check bad.nt file for skipped triples."
echo "======================================="

exit 0


14.10.7.4. Loading Bio2RDF data

The shell script below was used to import files in n3 notation into OpenLink Virtuoso RDF storage.

When a syntax error is encountered, it will cut the content from the next line onward and retry. This was used on Ubuntu Linux to import the Bio2RDF and Freebase dumps.

Note that it uses gawk, so gawk must be available on the system where the script is run. Recovery also needs additional disk space, up to the size of the original file.

#!/bin/bash

PASS=$1
f=$2
g=$3

# Usage
if [ -z "$PASS" -o -z "$f" -o -z "$g" ]
then
  echo "Usage: $0 [password] [ttl-file] [graph-iri]"
  exit
fi

if [ ! -f "$f" ]
then
    echo "$f does not exists"
    exit
fi

# Your port here
PORT=1111  #`inifile -f dbpedia.ini -s Parameters -k ServerPort`
if test -z "$PORT"
then
    echo "Cannot find INI and inifile command"
    exit
fi

# Initial run
isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log

# If disconnect etc.
if [ $? != 0 ]
then
    echo "An error occurred, please check $0.log"
    exit
fi

# Check for error
line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
newf=$f.part
inx=1

# Error recovery
while [ ! -z "$line_no" ]
do
    cat $f |  awk "BEGIN { i = 0 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }"  >> bad.nt
    line_no=`expr $line_no + 1`
    echo "Retrying from line $line_no"
    cat $f |  awk "BEGIN { i = 0 } { if (i>=$line_no) print \$0; i = i + 1 }"  > tmp.nt
    mv tmp.nt $newf
    f=$newf
    mv $0.log $0.log.$inx
    # Run the recovered part
    isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log

    if [ $? != 0 ]
    then
	echo "An error occurred, please check $0.log"
	exit
    fi
   line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
   inx=`expr $inx + 1`
done


14.10.8. Using SPARUL

Since SPARUL updates are generally not meant to be transactional, it is best to run them in log_enable (2) mode, which commits every operation as it is done. This prevents running out of rollback space. Also, for bulk updates, transaction logging can be turned off. If so, one should do a manual checkpoint after the operation to ensure persistence across a server restart, since there is no roll-forward log.

To have a roll-forward log and row-by-row autocommit, one may use log_enable (3). This writes constantly into the log, which takes extra time. Having no logging and doing a checkpoint when the whole work is finished is faster.

Many SPARUL operations can be run in parallel in this way. If they are independent with respect to their input and output, they can run in parallel, and row-by-row autocommit ensures they do not end up waiting for each other's locks.
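
For example, a graph-to-graph copy run as a non-transactional bulk operation from isql might look like this (a sketch; the graph IRIs are placeholders):

log_enable (2);

SPARQL INSERT IN GRAPH <http://example.org/target>
  { ?s ?p ?o }
WHERE
  { GRAPH <http://example.org/source> { ?s ?p ?o } };

checkpoint;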


14.10.9. DBpedia Benchmark

We ran the DBpedia benchmark queries again with different configurations of Virtuoso. Comparing numbers given by different parties is a constant problem. In the case reported here, we loaded the full DBpedia 3, all languages, about 198M triples, onto Virtuoso v5 and Virtuoso Cluster v6, all on the same 4-core 2GHz Xeon with 8G RAM. All databases were striped on 6 disks. The Cluster configuration was with 4 processes in the same box. We ran the queries in two variants: with the graph specified, and without.

The times below are for the sequence of 5 queries. One query in the set specifies no condition on S or O, only on P, and thus cannot be done with the default indices on Virtuoso v5. With Virtuoso Cluster v6 it can, because v6 is more space-efficient. So we added the index:

create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s);

        Virtuoso v5           Virtuoso Cluster v6   Virtuoso Cluster v6
        (gspo, ogps, pogs)    (gspo, ogps)          (gspo, ogps, pogs)
cold    210 s                 136 s                 33.4 s
warm    0.600 s               4.01 s                0.628 s

Now let us do it without a graph being specified. Note that alter index is valid for v6 or higher. For all platforms, we drop any existing indices, and then:

create table r2 (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (s, p, o, g))
alter index R2 on R2 partition (s int (0hexffff00));

log_enable (2);
insert into r2 (g, s, p, o) SELECT g, s, p, o from rdf_quad;

drop table rdf_quad;
alter table r2 rename RDF_QUAD;
create bitmap index rdf_quad_opgs on rdf_quad (o, p, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_gpos on rdf_quad (g, p, o, s) partition (o varchar (-1, 0hexffff));

The code is identical for v5 and v6, except that with v5 we use iri_id (32 bit) for the type, not iri_id_8 (64 bit). We note that we run out of IDs with v5 around a few billion triples; with v6 we have double the ID length and still manage to be vastly more space-efficient.

With the above 4 indices, we can query the data in pretty much any combination without hitting a full scan of any index. Note that all indices that do not begin with S end with S as a bitmap. This takes about 60% of the space of a non-bitmap index for data such as DBpedia.

If you intend to do completely arbitrary RDF queries in Virtuoso, then chances are you are best off with the above index scheme.


        Virtuoso v5 (gspo, ogps, pogs)    Virtuoso Cluster v6 (gspo, ogps, pogs)
warm    0.595 s                           0.617 s

The cold times were about the same as above, so not reproduced.

It is in the SPARQL spirit to specify a graph, and for pretty much any application there are entirely sensible ways of keeping the data in graphs and specifying which ones are concerned by queries. This is why Virtuoso is set up for this by default.

On the other hand, for the open web scenario, dealing with an unknown large number of graphs, enumerating graphs is not possible and questions like which graph of which source asserts x become relevant. We have two distinct use cases which warrant different setups of the database, simple as that.

The latter use case is not really within the SPARQL spec, so implementations may or may not support this.

Once the indices are right, there is no difference between specifying a graph and not specifying a graph with the queries considered. With more complex queries, specifying a graph or set of graphs does allow some optimizations that cannot be done with no graph specified. For example, bitmap intersections are possible only when all leading key parts are given.

The best warm cache time is with v5; the five queries run under 600 ms after the first go. This is noted to show that all-in-memory with a single thread of execution is hard to beat.

Cluster v6 performs the same queries in 623 ms. What is gained in parallelism is lost in latency if all operations complete in microseconds. On the other hand, Cluster v6 leaves v5 in the dust in any situation that has less than 100% hit rate. This is due to actual benefit from parallelism if operations take longer than a few microseconds, such as in the case of disk reads. Cluster v6 has substantially better data layout on disk, as well as fewer pages to load for the same content.

This makes it possible to run the queries without the pogs index on Cluster v6 even when v5 takes prohibitively long.

The purpose is to have a lot of RAM and space-efficient data representation.

For reference, the query texts specifying the graph are below. To run without specifying the graph, just drop the FROM <http://dbpedia.org> from each query. The returned row counts are indicated below each query's text.

SQL>SPARQL
SELECT ?p ?o
FROM <http://dbpedia.org>
WHERE
  {
    <http://dbpedia.org/resource/Metropolitan_Museum_of_Art> ?p ?o .
  };

p                                                                                 o
VARCHAR                                                                           VARCHAR
_______________________________________________________________________________

http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://umbel.org/umbel/ac/Artifact
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/MuseumsInNewYorkCity
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/ArtMuseumsAndGalleriesInTheUnitedStates
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/Museum103800563
..
-- 335 rows

SQL>SPARQL
PREFIX p: <http://dbpedia.org/property/>
SELECT ?film1 ?actor1 ?film2 ?actor2
FROM <http://dbpedia.org>
WHERE
  {
  ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
  ?film1 p:starring ?actor1 .
  ?film2 p:starring ?actor1 .
    ?film2 p:starring ?actor2 .
};

film1                                       actor1                                    film2                                         actor2
VARCHAR                                     VARCHAR                                   VARCHAR                                       VARCHAR
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Kevin_Bacon
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Meryl_Streep
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Joseph_Mazzello
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/David_Strathairn
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/John_C._Reilly
...
--  23910 rows

SQL>SPARQL
PREFIX p: <http://dbpedia.org/property/>
SELECT ?artist ?artwork ?museum ?director
FROM <http://dbpedia.org>
WHERE
  {
  ?artwork p:artist ?artist .
  ?artwork p:museum ?museum .
    ?museum p:director ?director
  };

artist                                          artwork                                              museum                                                                            director
VARCHAR                                         VARCHAR                                              VARCHAR                                                                           VARCHAR
_______________________________________________

http://dbpedia.org/resource/Paul_C%C3%A9zanne   http://dbpedia.org/resource/The_Basket_of_Apples     http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Paul_Signac         http://dbpedia.org/resource/Neo-impressionism        http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Georges_Seurat      http://dbpedia.org/resource/Neo-impressionism        http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Edward_Hopper       http://dbpedia.org/resource/Nighthawks               http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Mary_Cassatt        http://dbpedia.org/resource/The_Child%27s_Bath       http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
..
-- 303 rows

SQL>SPARQL
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?homepage
FROM <http://dbpedia.org>
WHERE
  {
   <http://dbpedia.org/resource/Berlin> geo:lat ?berlinLat .
   <http://dbpedia.org/resource/Berlin> geo:long ?berlinLong .
   ?s geo:lat ?lat .
   ?s geo:long ?long .
   ?s foaf:homepage ?homepage .
   FILTER (
     ?lat        <=     ?berlinLat + 0.03190235436 &&
     ?long       >=     ?berlinLong - 0.08679199218 &&
     ?lat        >=     ?berlinLat - 0.03190235436 &&
     ?long       <=     ?berlinLong + 0.08679199218) };

s                                                                                 homepage
VARCHAR                                                                           VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/Berlin_University_of_the_Arts                         http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_University_of_the_Arts                         http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_Zoological_Garden                              http://www.zoo-berlin.de/en.html
http://dbpedia.org/resource/Federal_Ministry_of_the_Interior_%28Germany%29        http://www.bmi.bund.de
http://dbpedia.org/resource/Neues_Schauspielhaus                                  http://www.goya-berlin.com/
http://dbpedia.org/resource/Bauhaus_Archive                                       http://www.bauhaus.de/english/index.htm
http://dbpedia.org/resource/Canisius-Kolleg_Berlin                                http://www.canisius-kolleg.de
http://dbpedia.org/resource/Franz%C3%B6sisches_Gymnasium_Berlin                   http://www.fg-berlin.cidsnet.de
..
-- 48 rows

SQL>SPARQL
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?s ?a ?homepage
FROM <http://dbpedia.org>
WHERE
  {
   <http://dbpedia.org/resource/New_York_City> geo:lat ?nyLat .
   <http://dbpedia.org/resource/New_York_City> geo:long ?nyLong .
   ?s geo:lat ?lat .
   ?s geo:long ?long .
   ?s p:architect ?a .
   ?a foaf:homepage ?homepage .
   FILTER (
     ?lat        <=     ?nyLat + 0.3190235436 &&
     ?long       >=     ?nyLong - 0.8679199218 &&
     ?lat        >=     ?nyLat - 0.3190235436 &&
     ?long       <=     ?nyLong + 0.8679199218) };
s                                                                                 a                                                  homepage
VARCHAR                                                                           VARCHAR                                            VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/GE_Building                                           http://dbpedia.org/resource/Associated_Architects              http://www.associated-architects.co.uk
http://dbpedia.org/resource/Giants_Stadium                                        http://dbpedia.org/resource/HNTB              http://www.hntb.com/
http://dbpedia.org/resource/Fort_Tryon_Park_and_the_Cloisters                     http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Central_Park                                          http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Prospect_Park_%28Brooklyn%29                          http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Meadowlands_Stadium                                   http://dbpedia.org/resource/360_Architecture              http://oakland.athletics.mlb.com/oak/ballpark/new/faq.jsp
http://dbpedia.org/resource/Citi_Field                                            http://dbpedia.org/resource/HOK_Sport_Venue_Event              http://www.hoksve.com/
http://dbpedia.org/resource/Citigroup_Center                                      http://dbpedia.org/resource/Hugh_Stubbins_Jr.              http://www.klingstubbins.com
http://dbpedia.org/resource/150_Greenwich_Street                                  http://dbpedia.org/resource/Fumihiko_Maki              http://www.pritzkerprize.com/maki2.htm
http://dbpedia.org/resource/Freedom_Tower                                         http://dbpedia.org/resource/David_Childs              http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/7_World_Trade_Center                                  http://dbpedia.org/resource/David_Childs              http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/The_New_York_Times_Building                           http://dbpedia.org/resource/Renzo_Piano              http://www.rpbw.com/
http://dbpedia.org/resource/Trump_World_Tower                                     http://dbpedia.org/resource/Costas_Kondylis              http://www.kondylis.com

13 Rows. -- 2183 msec.

14.10.10. RDF Store Benchmarks

14.10.10.1. Introduction

In RDF store benchmarks, it makes a difference whether the queries are executed with one specified graph, with several specified graphs, or with no graph at all. As Virtuoso is a quad store, not a triple store with many tables, it runs queries inefficiently if graphs are not specified and there are no additional indexes besides the preset GSPO and OGPS. Proper use of the FROM clause, or adding indexes with a leading graph column, will give better results.


14.10.10.2. Using bitmap indexes

If it is known in advance that some users will not indicate specific graphs in their queries, then one of two things should be done: configure a preset graph for the endpoint (see the DB.DBA.SYS_SPARQL_HOST discussion above), or create additional indexes as described below.

Neither method requires any changes in query texts.

You can create other indexes as well. Bitmap indexes are preferable, but if O is the last column, the index cannot be a bitmap index. So, for example, it could be:

create index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);

but cannot be:

create bitmap index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);