Hubbub
parser.c File Reference
#include <assert.h>
#include <string.h>
#include <parserutils/charset/mibenum.h>
#include <parserutils/input/inputstream.h>
#include <hubbub/parser.h>
#include "charset/detect.h"
#include "tokeniser/tokeniser.h"
#include "treebuilder/treebuilder.h"
#include "utils/parserutilserror.h"


Data Structures

struct  hubbub_parser
 Hubbub parser object.
 

Functions

hubbub_error hubbub_parser_create (const char *enc, bool fix_enc, hubbub_parser **parser)
 Create a hubbub parser.
 
hubbub_error hubbub_parser_destroy (hubbub_parser *parser)
 Destroy a hubbub parser.
 
hubbub_error hubbub_parser_setopt (hubbub_parser *parser, hubbub_parser_opttype type, hubbub_parser_optparams *params)
 Configure a hubbub parser.
 
hubbub_error hubbub_parser_insert_chunk (hubbub_parser *parser, const uint8_t *data, size_t len)
 Insert a chunk of data into a hubbub parser input stream.
 
hubbub_error hubbub_parser_parse_chunk (hubbub_parser *parser, const uint8_t *data, size_t len)
 Pass a chunk of data to a hubbub parser for parsing.
 
hubbub_error hubbub_parser_completed (hubbub_parser *parser)
 Inform the parser that the last chunk of data has been parsed.
 
const char * hubbub_parser_read_charset (hubbub_parser *parser, hubbub_charset_source *source)
 Read the document charset.
 

Function Documentation

hubbub_error hubbub_parser_completed ( hubbub_parser *  parser )

Inform the parser that the last chunk of data has been parsed.

Parameters
parser  Parser to inform
Returns
HUBBUB_OK on success, appropriate error otherwise

Definition at line 279 of file parser.c.

References HUBBUB_BADPARM, hubbub_error_from_parserutils_error(), HUBBUB_OK, hubbub_tokeniser_run(), hubbub_parser::stream, and hubbub_parser::tok.

hubbub_error hubbub_parser_create ( const char *  enc,
bool  fix_enc,
hubbub_parser **  parser 
)

Create a hubbub parser.

Parameters
enc      Source document encoding, or NULL to autodetect
fix_enc  Permit fixing up of encoding if it's frequently misused
parser   Pointer to location to receive parser instance
Returns
HUBBUB_OK on success, HUBBUB_BADPARM on bad parameters, HUBBUB_NOMEM on memory exhaustion, HUBBUB_BADENCODING if enc is unsupported

Definition at line 41 of file parser.c.

References HUBBUB_BADPARM, HUBBUB_CHARSET_CONFIDENT, hubbub_charset_extract(), hubbub_charset_fix_charset(), HUBBUB_CHARSET_UNKNOWN, hubbub_error_from_parserutils_error(), HUBBUB_NOMEM, HUBBUB_OK, hubbub_tokeniser_create(), hubbub_tokeniser_destroy(), hubbub_treebuilder_create(), hubbub_parser::stream, hubbub_parser::tb, and hubbub_parser::tok.

hubbub_error hubbub_parser_destroy ( hubbub_parser *  parser )

Destroy a hubbub parser.

Parameters
parser  Parser instance to destroy
Returns
HUBBUB_OK on success, appropriate error otherwise

Definition at line 102 of file parser.c.

References HUBBUB_BADPARM, HUBBUB_OK, hubbub_tokeniser_destroy(), hubbub_treebuilder_destroy(), hubbub_parser::stream, hubbub_parser::tb, and hubbub_parser::tok.

hubbub_error hubbub_parser_insert_chunk ( hubbub_parser *  parser,
const uint8_t *  data,
size_t  len 
)

Insert a chunk of data into a hubbub parser input stream.

Inserts the given data into the input stream ready for parsing but does not cause any additional processing of the input. This is useful to allow hubbub callbacks to add computed data to the input.

Parameters
parser  Parser instance to use
data    Data to parse (encoded in UTF-8)
len     Length, in bytes, of data
Returns
HUBBUB_OK on success, appropriate error otherwise

Definition at line 218 of file parser.c.

References HUBBUB_BADPARM, hubbub_tokeniser_insert_chunk(), and hubbub_parser::tok.

hubbub_error hubbub_parser_parse_chunk ( hubbub_parser *  parser,
const uint8_t *  data,
size_t  len 
)

Pass a chunk of data to a hubbub parser for parsing.

Parameters
parser  Parser instance to use
data    Data to parse (encoded in the input charset)
len     Length, in bytes, of data
Returns
HUBBUB_OK on success, appropriate error otherwise

Definition at line 235 of file parser.c.

References HUBBUB_BADENCODING, HUBBUB_BADPARM, HUBBUB_CHARSET_TENTATIVE, hubbub_error_from_parserutils_error(), HUBBUB_OK, hubbub_tokeniser_run(), hubbub_parser::stream, and hubbub_parser::tok.

const char* hubbub_parser_read_charset ( hubbub_parser *  parser,
hubbub_charset_source *  source 
)

Read the document charset.

Parameters
parser  Parser instance to query
source  Pointer to location to receive charset source
Returns
Pointer to charset name (constant; do not free), or NULL if unknown

Definition at line 305 of file parser.c.

References name, and hubbub_parser::stream.

hubbub_error hubbub_parser_setopt ( hubbub_parser *  parser,
hubbub_parser_opttype  type,
hubbub_parser_optparams *  params 
)