
Package org.apache.commons.csv

Jakarta-Commons CSV Format Support


Package org.apache.commons.csv Description


CSV and its dialects are widely used as interfaces to legacy systems and for manual data imports. CSV stands for "Comma Separated Values", but the simple name causes more confusion than it resolves, because the dialects differ in their details.

Common to all dialects is the basic structure: the CSV data format is record oriented, and each record starts on a new line of text. A record is built from a list of values. Keep in mind that records need not all have the same number of values:

      csv    := records*
      record := values*
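
The grammar above can be illustrated with a minimal sketch in plain Java. It does not use the parser from this package: the class name RecordStructureSketch and its parse method are hypothetical, and quoting, comments, and escapes are deliberately ignored. It only shows that records are split on line ends and may contain different numbers of values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.commons.csv.
final class RecordStructureSketch {

    // Splits the text into records on "\n" or "\r\n", then splits each
    // record into values on ','. The limit -1 keeps trailing empty strings.
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        for (String line : text.split("\r?\n", -1)) {
            records.add(line.split(",", -1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Three records holding three, two, and one value respectively.
        for (String[] record : parse("a,b,c\nd,e\nf")) {
            System.out.println(record.length + " value(s): " + String.join("|", record));
        }
    }
}
```

Because each record is stored as its own array, callers should check the length of every record instead of assuming a fixed column count.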

The following list describes the CSV aspects that the WAKE CSV parser supports:

Separators (for lines)
The record separators are hardcoded and cannot be changed. They must be '\n' or '\r\n'.
Delimiter (for values)
The delimiter for values is freely configurable (default ',').
Comments
Some CSV dialects support a simple comment syntax. A comment is a record that starts with a designated character (the commentStarter). Such a record is treated as a comment and removed from the input (default commentStarter: '(char)0').
Encapsulator
A pair of encapsulator characters (default '"') encloses a complex value (see "Complex values").
Simple values
A simple value consists of all characters up to (but not including) the next delimiter or record separator. Optionally, leading whitespace of a simple value can be ignored (default: true).
Complex values
Complex values are enclosed by the defined encapsulator character. Inside a complex value, the encapsulator itself must be escaped with '\' or doubled. Complex values preserve all formatting, including newlines, which allows multi-line values.
Unicode escapes
Some non-Unicode CSV files use Unicode escape sequences for certain Unicode characters. The standard escape is '\uXXXX', where XXXX is the character code in hexadecimal. The parser can optionally resolve Unicode escapes (default: disabled).
Empty line skipping
Optionally, empty lines in CSV files can be skipped. If empty lines are not skipped, a '\nEOF' combination yields an empty record.
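
The rules for delimiters, encapsulators, simple values, and complex values can be sketched as a small state machine. This is an illustration under stated assumptions, not the implementation in this package: the class ValueSplitSketch, the method splitRecord, and its parameters are hypothetical names, only spaces and tabs count as whitespace, and comments, Unicode escapes, and empty-line skipping are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.commons.csv.
final class ValueSplitSketch {

    // Splits one record into values. A value that begins with the
    // encapsulator is a complex value: it runs to the closing encapsulator
    // and may contain delimiters, newlines, and escaped encapsulators.
    static List<String> splitRecord(String record, char delimiter,
                                    char encapsulator, boolean ignoreLeadingWhitespace) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inComplex = false;    // currently inside an encapsulated value
        boolean valueStarted = false; // a character of this value was already seen

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inComplex) {
                boolean nextIsEncapsulator =
                        i + 1 < record.length() && record.charAt(i + 1) == encapsulator;
                if ((c == '\\' || c == encapsulator) && nextIsEncapsulator) {
                    // Escaped ('\' prefix) or doubled encapsulator: keep one copy.
                    current.append(encapsulator);
                    i++;
                } else if (c == encapsulator) {
                    inComplex = false; // closing encapsulator
                } else {
                    current.append(c); // everything else is preserved, newlines included
                }
            } else if (c == delimiter) {
                values.add(current.toString());
                current.setLength(0);
                valueStarted = false;
            } else if (!valueStarted && ignoreLeadingWhitespace && (c == ' ' || c == '\t')) {
                // Leading whitespace of a value is dropped.
            } else if (!valueStarted && c == encapsulator) {
                inComplex = true;
                valueStarted = true;
            } else {
                current.append(c);
                valueStarted = true;
            }
        }
        values.add(current.toString()); // the last value has no trailing delimiter
        return values;
    }

    public static void main(String[] args) {
        String record = "a, \"say \"\"hi\"\"\",\"x\\\"y\",\"l1\nl2\", z";
        for (String value : splitRecord(record, ',', '"', true)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

The sketch expects a whole record as input, so a caller would have to keep reading lines until the encapsulators balance before passing a multi-line value to it.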

In addition to individually defined dialects, two predefined dialects (strict-csv and excel-csv) can be set directly.

Example usage:

String[] parsedLine = CSVParser.parseLine("a,b,c");
for (int i = 0; i < parsedLine.length; ++i) {
  System.out.println("value " + i + "=" + parsedLine[i]);
}

Copyright © 2014 The Apache Software Foundation. All rights reserved.