Class Summary
Class | Description
CharBuffer | A simple StringBuffer replacement that aims to reduce copying as much as possible.
CSVParser | Parses CSV files according to the specified configuration.
CSVPrinter | Print values as a comma separated list.
CSVStrategy | Represents the strategy for a CSV.
CSVUtils | Utility methods for dealing with CSV files
Package org.apache.commons.csv Description
Jakarta-Commons CSV Format Support
CSV (or one of its dialects) is widely used as an interface to legacy systems and for
manual data imports. CSV stands for "Comma Separated Values", but
this simple abbreviation creates more confusion than clarity.
Common to all dialects is the basic structure: the CSV data format
is record oriented, and each record starts on a new textual line. A
record is built from a list of values. Keep in mind that records need not
all have the same number of values:
csv := records*
record := values*
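The grammar above can be illustrated with a minimal sketch that handles simple values only (no encapsulators, comments, or escapes). The class name RecordSketch and its split method are hypothetical and exist only for this illustration; they are not part of org.apache.commons.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of "csv := records*, record := values*"; not the library's parser. */
final class RecordSketch {

    /** Splits text into records on "\n" or "\r\n", then each record into values on the delimiter. */
    static List<String[]> split(String text, char delimiter) {
        List<String[]> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records; // zero records is still valid input
        }
        // Quote the delimiter so that characters such as '|' are not read as regex syntax.
        Pattern valueSplitter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
        for (String line : text.split("\r?\n", -1)) {
            // Limit -1 keeps trailing empty values, e.g. "a,," gives three values.
            records.add(valueSplitter.split(line, -1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Records may differ in their number of values.
        for (String[] record : split("a,b,c\r\nd,e", ',')) {
            System.out.println(record.length + " values: " + String.join(" | ", record));
        }
    }
}
```

In this sketch the two records hold three and two values respectively, which shows that the format does not require a fixed column count.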
The following list contains the CSV aspects the WAKE CSV parser supports:
- Separators (for lines)
- The record separators are hardcoded and cannot be changed. They must be '\n' or '\r\n'.
- Delimiter (for values)
- The delimiter for values is freely configurable (default ',').
- Comments
- Some CSV dialects support a simple comment syntax. A comment is a record
that starts with a designated character (the commentStarter). Such a record
is treated as a comment and removed from the input (default '(char)0').
- Encapsulator
- Two encapsulator characters (default '"') are used to enclose -> complex values.
- Simple values
- A simple value consists of all characters up to (but not including) the
next delimiter or record terminator. Optionally,
leading whitespace of a simple value can be ignored (default: true).
- Complex values
- Complex values are enclosed by the defined encapsulator character.
Inside a complex value, the encapsulator itself must be escaped by '\' or doubled.
Complex values preserve all kinds of formatting (including newlines -> multiline-values).
- Unicode escapes
- Some non-unicode CSV files use unicode escape sequences for certain unicode characters. The standard
unicode escape is '\uXXXX', where XXXX is the character code in hexadecimal. The parser
can optionally resolve unicode escapes (default: disabled).
- Empty line skipping
- Optionally, empty lines in CSV files can be skipped. If empty lines are not skipped, the parser returns
an empty record on '\nEOF' combinations.
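The difference between simple and complex values in the list above can be sketched as a small hand-written scanner for a single record. This is only an illustration under stated assumptions: the class ValueSketch and its parseRecord method are hypothetical names, whitespace is taken to mean spaces and tabs, and the code does not reproduce the actual CSVParser implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner for one record; illustrates simple vs. complex values only. */
final class ValueSketch {

    static List<String> parseRecord(String record, char delimiter, char encapsulator,
                                    boolean ignoreLeadingWhitespace) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inComplex = false; // currently inside an encapsulated value
        boolean atStart = true;    // nothing of the current value has been kept yet
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inComplex) {
                boolean nextIsEncapsulator =
                        i + 1 < record.length() && record.charAt(i + 1) == encapsulator;
                if ((c == '\\' || c == encapsulator) && nextIsEncapsulator) {
                    current.append(encapsulator); // escaped or doubled encapsulator
                    i++;
                } else if (c == encapsulator) {
                    inComplex = false;            // closing encapsulator
                } else {
                    current.append(c);            // delimiters and newlines are kept as data
                }
            } else if (c == delimiter) {
                values.add(current.toString());
                current.setLength(0);
                atStart = true;
            } else if (atStart && ignoreLeadingWhitespace && (c == ' ' || c == '\t')) {
                // leading whitespace of a value is dropped
            } else if (atStart && c == encapsulator) {
                inComplex = true;
                atStart = false;
            } else {
                current.append(c);
                atStart = false;
            }
        }
        values.add(current.toString());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("a, \"b \"\"x\"\", y\",c", ',', '"', true));
    }
}
```

Note that a delimiter or newline inside a complex value is treated as ordinary data, which is what allows multiline values.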
In addition to individually defined dialects, two predefined dialects (strict-csv and excel-csv)
can be set directly.
Example usage:
String[] parsedLine = CSVParser.parseLine("a,b,c");
for (int i = 0; i < parsedLine.length; ++i) {
    System.out.println("value " + i + "=" + parsedLine[i]);
}
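As a further illustration, the optional unicode escape resolution mentioned in the feature list can be sketched as follows. The class UnicodeEscapeSketch and its resolve method are hypothetical names used only here. The sketch assumes that an escape is a backslash, the letter u, and exactly four hexadecimal digits, and that malformed sequences are left unchanged; the parser itself may behave differently.

```java
/** Hypothetical resolver for backslash-u escapes with four hex digits. */
final class UnicodeEscapeSketch {

    static String resolve(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            boolean isEscape = input.startsWith("\\u", i)
                    && i + 6 <= input.length()
                    && isHex(input, i + 2, i + 6);
            if (isEscape) {
                // Convert the four hex digits into a single UTF-16 code unit.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(input.charAt(i)); // ordinary character or malformed escape
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The input holds a literal backslash followed by u00E9.
        System.out.println(resolve("caf\\u00E9"));
    }
}
```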