Virtuoso Functions Guide


AP_BUILD_MATCH_LIST

Returns a report of all occurrences of phrases from the specified sets in the text.
AP_BUILD_MATCH_LIST (in phrase_set_ids vector of integers, in source_UTF8_text varchar not null, in lang_name varchar not null, in source_text_is_html integer, in report_flags integer);
Description

Forms a report that lists all occurrences of phrases from the specified sets in the text.

The report describes "phrase hits", i.e. occurrences of annotation phrases in the text, using "arrows" that point to specific fragments in the text, such as words of found phrases or HTML tags.

The structure of the report is complicated because it must satisfy conflicting requirements. It must be compact to provide reasonable performance and scalability, so common data is not repeated, which saves memory. It must also be complete enough that the application does not need to read omitted data from system tables, which saves time.

All objects of one type are listed as items of a vector, and the whole report consists of several such vectors. An item in one vector may refer to an item in another vector by its index, without storing a local copy.
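
The index-based referencing described above can be illustrated with a small Virtuoso/PL sketch. The layout here is purely hypothetical and is not the actual structure of the report: the procedure name DEMO_HIT_PHRASE, the two vectors, and their contents are assumptions made for illustration only.

```sql
-- Hypothetical illustration only: this is NOT the real report layout.
create procedure DEMO_HIT_PHRASE (in hit_idx integer)
{
  declare phrases, hits, hit any;
  -- Each phrase text is stored once.
  phrases := vector ('New York', 'Boston');
  -- Each hit stores the index of its phrase and an offset in the text,
  -- instead of a local copy of the phrase text.
  hits := vector (vector (0, 10), vector (1, 42), vector (0, 97));
  hit := aref (hits, hit_idx);
  -- Follow the index stored in the hit to reach the shared phrase item.
  return aref (phrases, aref (hit, 0));
}
;

select DEMO_HIT_PHRASE (2);
```

In this sketch the first and third hits both point at item 0 of the phrases vector, so the phrase text is kept in memory only once.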

A detailed description of the report structure can be found here.
Parameters
phrase_set_ids – vector of numeric identifiers of the phrase sets to use. The sets may belong to different phrase classes, but a phrase set whose language differs from the value of the lang_name argument is silently ignored.
source_UTF8_text – the text to scan, either plain text or HTML
lang_name – language name
source_text_is_html – 0 for plain text, 1 for standards-compliant HTML, or 2 for "dirty" HTML
report_flags –
Examples
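
A minimal call sketch, based only on the signature shown above. It assumes that phrase sets with the identifiers 1 and 2 already exist (hypothetical ids), that 'x-any' is the language of those sets, and that 0 is an acceptable value for report_flags; the flag values are not described on this page, so treat that argument as a placeholder.

```sql
-- Hypothetical phrase set ids 1 and 2: they must already exist and
-- their language must match lang_name, otherwise they are silently ignored.
select AP_BUILD_MATCH_LIST (
  vector (1, 2),                                        -- phrase_set_ids
  'Sample plain text to scan for annotation phrases.',  -- source_UTF8_text
  'x-any',                                              -- lang_name (assumed)
  0,                                                    -- source_text_is_html: 0 = plain text
  0);                                                   -- report_flags (assumed placeholder)
```

For HTML input, the fourth argument would be 1 or 2, depending on whether the markup is standards-compliant or "dirty".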