com.ibm.icu.text
public final class UTF16 extends Object
Standalone utility class providing UTF16 character conversions and indexing conversions.
Code that uses strings alone rarely needs modification.
By design, UTF-16 does not allow overlap, so searching for strings is a safe
operation. Similarly, concatenation is always safe. Substringing is safe if
the start and end are both on UTF-32 boundaries. In normal code, the values
for start and end are on those boundaries, since they arose from operations
like searching. If not, the nearest UTF-32 boundaries can be determined
using bounds().
The following examples illustrate use of some of these methods.
    // iteration forwards: Original
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration forwards: Changes for UTF-32
    int ch;
    for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

    // iteration backwards: Original
    for (int i = s.length() - 1; i >= 0; --i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration backwards: Changes for UTF-32
    // (the condition is i >= 0 so that the code point at index 0 is visited)
    int ch;
    for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }
Notes:
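Naming: For clarity, High and Low surrogates are called Lead and Trail in the API, which gives a better sense of their ordering in a string. offset16 and offset32 are used to distinguish offsets to UTF-16 boundaries vs offsets to UTF-32 boundaries. int char32 is used to contain UTF-32 characters, as opposed to char16, which is a UTF-16 code unit.
Roundtripping Offsets: You can always roundtrip from a UTF-32 offset to a UTF-16 offset and back. Because of the difference in structure, you can roundtrip from a UTF-16 offset to a UTF-32 offset and back if and only if bounds(string, offset16) != TRAIL.
Exceptions: The error checking will throw an exception if indices are out of bounds. Other than that, all methods will behave reasonably, even if unmatched surrogates or out-of-bounds UTF-32 values are present. UCharacter.isLegal() can be used to check for validity if desired.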
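An offset that sits on a trail surrogate does not survive a UTF-16 to UTF-32 to UTF-16 round trip, which is what the bounds(string, offset16) != TRAIL condition in the notes expresses. The following is a minimal sketch of that behaviour. To stay runnable without ICU4J on the classpath, it uses the JDK methods String.codePointCount and String.offsetByCodePoints as stand-ins for findCodePointOffset and findOffsetFromCodePoint. The class OffsetRoundTrip and its helpers are hypothetical, and the stand-ins may not return the same value as ICU for an offset on a trail surrogate; the sketch only shows that the round trip fails there.

```java
final class OffsetRoundTrip {
    // JDK stand-in for a UTF-16 -> UTF-32 offset conversion (an assumption, not the ICU code).
    static int toOffset32(String s, int offset16) {
        return s.codePointCount(0, offset16);
    }

    // JDK stand-in for a UTF-32 -> UTF-16 offset conversion.
    static int toOffset16(String s, int offset32) {
        return s.offsetByCodePoints(0, offset32);
    }

    // True if converting offset16 to a code point offset and back yields offset16 again.
    static boolean roundTrips16(String s, int offset16) {
        return toOffset16(s, toOffset32(s, offset16)) == offset16;
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 'a', U+10000 (two code units), 'b'
        for (int i = 0; i <= s.length(); i++) {
            System.out.println("offset16 " + i + " round-trips: " + roundTrips16(s, i));
        }
    }
}
```

Only offset 2, the trail surrogate of U+10000, fails to round-trip in this string.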
UNKNOWN: ICU 2.1
Nested Class Summary | |
---|---|
static class | UTF16.StringComparator UTF16 string comparator class. |
Field Summary | |
---|---|
static int | CODEPOINT_MAX_VALUE
The highest Unicode code point value (scalar value) according to the
Unicode Standard. |
static int | CODEPOINT_MIN_VALUE
The lowest Unicode code point value. |
static int | LEAD_SURROGATE_BOUNDARY
Value returned in bounds(). |
static int | LEAD_SURROGATE_MAX_VALUE
Lead surrogate maximum value |
static int | LEAD_SURROGATE_MIN_VALUE
Lead surrogate minimum value |
static int | SINGLE_CHAR_BOUNDARY
Value returned in bounds(). |
static int | SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points |
static int | SURROGATE_MAX_VALUE
Maximum surrogate value |
static int | SURROGATE_MIN_VALUE
Surrogate minimum value |
static int | TRAIL_SURROGATE_BOUNDARY
Value returned in bounds(). |
static int | TRAIL_SURROGATE_MAX_VALUE
Trail surrogate maximum value |
static int | TRAIL_SURROGATE_MIN_VALUE
Trail surrogate minimum value |
Method Summary | |
---|---|
static StringBuffer | append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer.
|
static int | append(char[] target, int limit, int char32)
Adds a codepoint to offset16 position of the argument char array. |
static StringBuffer | appendCodePoint(StringBuffer target, int cp)
Cover JDK 1.5 APIs. |
static int | bounds(String source, int offset16)
Returns the type of the boundaries around the char at offset16.
|
static int | bounds(StringBuffer source, int offset16)
Returns the type of the boundaries around the char at offset16. |
static int | bounds(char[] source, int start, int limit, int offset16)
Returns the type of the boundaries around the char at offset16. |
static int | charAt(String source, int offset16)
Extract a single UTF-32 value from a string.
|
static int | charAt(CharSequence source, int offset16)
Extract a single UTF-32 value from a string.
|
static int | charAt(StringBuffer source, int offset16)
Extract a single UTF-32 value from a string.
|
static int | charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring.
|
static int | charAt(Replaceable source, int offset16)
Extract a single UTF-32 value from a string.
|
static int | countCodePoint(String source)
Number of codepoints in a UTF16 String |
static int | countCodePoint(StringBuffer source)
Number of codepoints in a UTF16 String buffer |
static int | countCodePoint(char[] source, int start, int limit)
Number of codepoints in a UTF16 char array substring |
static StringBuffer | delete(StringBuffer target, int offset16)
Removes the codepoint at the specified position in this target
(shortening target by 1 character if the codepoint is a
non-supplementary, 2 otherwise). |
static int | delete(char[] target, int limit, int offset16)
Removes the codepoint at the specified position in this target
(shortening target by 1 character if the codepoint is a
non-supplementary, 2 otherwise). |
static int | findCodePointOffset(String source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at
or after the given UTF-16 offset. |
static int | findCodePointOffset(StringBuffer source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at
the given UTF-16 offset. |
static int | findCodePointOffset(char[] source, int start, int limit, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at
the given UTF-16 offset. |
static int | findOffsetFromCodePoint(String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
|
static int | findOffsetFromCodePoint(StringBuffer source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
|
static int | findOffsetFromCodePoint(char[] source, int start, int limit, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
|
static int | getCharCount(int char32)
Determines how many chars this char32 requires.
|
static char | getLeadSurrogate(int char32)
Returns the lead surrogate.
|
static char | getTrailSurrogate(int char32)
Returns the trail surrogate.
|
static boolean | hasMoreCodePointsThan(String source, int number)
Check if the string contains more Unicode code points than a certain
number. |
static boolean | hasMoreCodePointsThan(char[] source, int start, int limit, int number)
Check if the sub-range of char array, from argument start to limit,
contains more Unicode code points than a certain
number. |
static boolean | hasMoreCodePointsThan(StringBuffer source, int number)
Check if the string buffer contains more Unicode code points than a
certain number. |
static int | indexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of
the first occurrence of the argument codepoint. |
static int | indexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of
the first occurrence of the argument string str. |
static int | indexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of
the first occurrence of the argument codepoint. |
static int | indexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of
the first occurrence of the argument string str. |
static StringBuffer | insert(StringBuffer target, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
|
static int | insert(char[] target, int limit, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
|
static boolean | isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate. |
static boolean | isSurrogate(char char16)
Determines whether the code value is a surrogate. |
static boolean | isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate. |
static int | lastIndexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of
the last occurrence of the argument codepoint. |
static int | lastIndexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of
the last occurrence of the argument string str. |
static int | lastIndexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of
the last occurrence of the argument codepoint, where the result is
less than or equal to fromIndex. |
static int | lastIndexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of
the last occurrence of the argument string str, where the result is
less than or equal to fromIndex. |
static int | moveCodePointOffset(String source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints |
static int | moveCodePointOffset(StringBuffer source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints |
static int | moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints within a subarray. |
static String | newString(int[] codePoints, int offset, int count)
Cover JDK 1.5 API. |
static String | replace(String source, int oldChar32, int newChar32)
Returns a new UTF16 format Unicode string resulting from replacing all
occurrences of oldChar32 in source with newChar32.
|
static String | replace(String source, String oldStr, String newStr)
Returns a new UTF16 format Unicode string resulting from replacing all
occurrences of oldStr in source with newStr.
|
static StringBuffer | reverse(StringBuffer source)
Reverses a UTF16 format Unicode string and replaces source's content
with it.
|
static void | setCharAt(StringBuffer target, int offset16, int char32)
Set a code point into a UTF16 position.
|
static int | setCharAt(char[] target, int limit, int offset16, int char32)
Set a code point into a UTF16 position in a char array.
|
static String | valueOf(int char32)
Convenience method corresponding to String.valueOf(char). |
static String | valueOf(String source, int offset16)
Convenience method corresponding to String.valueOf(codepoint at
offset16).
|
static String | valueOf(StringBuffer source, int offset16)
Convenience method corresponding to
StringBuffer.valueOf(codepoint at offset16).
|
static String | valueOf(char[] source, int start, int limit, int offset16)
Convenience method.
|
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
Value returned in bounds().
These values are chosen specifically so that they represent
the position of the character:
[offset16 - (value >> 2), offset16 + (value & 3)]
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
Value returned in bounds().
These values are chosen specifically so that they represent
the position of the character:
[offset16 - (value >> 2), offset16 + (value & 3)]
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
Value returned in bounds().
These values are chosen specifically so that they represent
the position of the character:
[offset16 - (value >> 2), offset16 + (value & 3)]
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
Parameters: target the buffer to append to char32 value to append.
Returns: the updated StringBuffer
Throws: IllegalArgumentException thrown when char32 does not lie within the range of the Unicode codepoints
UNKNOWN: ICU 2.1
Parameters: target char array to append the new code point to limit UTF16 offset at which the codepoint will be appended char32 code point to be appended
Returns: offset after char32 in the array.
Throws: IllegalArgumentException thrown if there is not enough space for the append, or when char32 does not lie within the range of the Unicode codepoints.
UNKNOWN: ICU 2.1
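Appending into a char array means writing one or two code units at limit and returning the offset just past them. The sketch below shows that idea with the JDK's Character.toChars(int, char[], int) as a stand-in. AppendSketch is a hypothetical name, and the exception types thrown by the JDK call may differ from those documented for the ICU method.

```java
final class AppendSketch {
    // Writes cp into target starting at index limit and returns the new limit.
    // Character.toChars returns the number of code units written (1 or 2).
    static int append(char[] target, int limit, int cp) {
        return limit + Character.toChars(cp, target, limit);
    }

    public static void main(String[] args) {
        char[] buf = new char[4];
        int limit = 0;
        limit = append(buf, limit, 'a');     // BMP character: one code unit
        limit = append(buf, limit, 0x10000); // supplementary: two code units
        System.out.println("new limit = " + limit);
    }
}
```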
Parameters: target the buffer to append to cp the code point to append
Returns: the updated StringBuffer
Throws: IllegalArgumentException if cp is not a valid code point
UNKNOWN: ICU 3.0
Parameters: source text to analyse offset16 UTF-16 offset
Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.
For bit-twiddlers, the return values are chosen so
that the boundaries can be obtained by:
[offset16 - (value >> 2), offset16 + (value & 3)].
Throws: IndexOutOfBoundsException if offset16 is out of bounds.
UNKNOWN: ICU 2.1
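The bit formula [offset16 - (value >> 2), offset16 + (value & 3)] can be checked with a small worked sketch. It assumes the three boundary constants have the values 1, 2 and 5; that assumption should be verified against the constant field values of the class. BoundsSketch and its bounds method are a simplified, hypothetical re-implementation for illustration, not the ICU code.

```java
final class BoundsSketch {
    // Assumed to mirror SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY, TRAIL_SURROGATE_BOUNDARY.
    static final int SINGLE = 1, LEAD = 2, TRAIL = 5;

    // Classifies the code unit at offset16 by looking at its neighbour, if any.
    static int bounds(String s, int offset16) {
        char ch = s.charAt(offset16);
        if (Character.isHighSurrogate(ch)) {
            if (offset16 + 1 < s.length() && Character.isLowSurrogate(s.charAt(offset16 + 1))) {
                return LEAD;
            }
        } else if (Character.isLowSurrogate(ch)) {
            if (offset16 > 0 && Character.isHighSurrogate(s.charAt(offset16 - 1))) {
                return TRAIL;
            }
        }
        return SINGLE; // BMP character or unpaired surrogate
    }

    // Decodes a bounds value into {start, end} of the code point containing offset16.
    static int[] range(int offset16, int value) {
        return new int[] { offset16 - (value >> 2), offset16 + (value & 3) };
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00";
        for (int i = 0; i < s.length(); i++) {
            int[] r = range(i, bounds(s, i));
            System.out.println("offset " + i + " -> [" + r[0] + ", " + r[1] + "]");
        }
    }
}
```

With these values, 1 decodes to [offset16, offset16 + 1], 2 to [offset16, offset16 + 2], and 5 to [offset16 - 1, offset16 + 1].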
Parameters: source string buffer to analyse offset16 UTF16 offset
Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.
For bit-twiddlers, the return values are chosen so that the
boundaries can be obtained by:
[offset16 - (value >> 2), offset16 + (value & 3)].
Throws: IndexOutOfBoundsException if offset16 is out of bounds.
UNKNOWN: ICU 2.1
Parameters: source char array to analyse start offset to substring in the source array for analyzing limit offset to substring in the source array for analyzing offset16 UTF16 offset relative to start
Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.
For bit-twiddlers, the boundary values are chosen so that the
boundaries can be obtained by:
[offset16 - (boundvalue >> 2), offset16 + (boundvalue & 3)].
Throws: IndexOutOfBoundsException if offset16 is not within the range of start and limit.
UNKNOWN: ICU 2.1
Used for iteration together with UTF16.getCharCount(), as well as for random access. If a
validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters: source array of UTF-16 chars offset16 UTF-16 offset to the start of the character.
Returns: the UTF-32 value of the code point that contains the char at
offset16. The boundaries of that codepoint are the same as in bounds().
Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.
UNKNOWN: ICU 2.1
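The rule "return the supplementary character if offset16 is on either half of a surrogate pair, otherwise return the incomplete character" can be sketched with JDK primitives alone. CharAtSketch below is a hypothetical illustration of that rule, not the ICU implementation.

```java
final class CharAtSketch {
    static int charAt(CharSequence s, int offset16) {
        char ch = s.charAt(offset16); // throws IndexOutOfBoundsException if out of range
        if (Character.isHighSurrogate(ch) && offset16 + 1 < s.length()) {
            char next = s.charAt(offset16 + 1);
            if (Character.isLowSurrogate(next)) {
                return Character.toCodePoint(ch, next); // offset16 is on the lead half
            }
        } else if (Character.isLowSurrogate(ch) && offset16 > 0) {
            char prev = s.charAt(offset16 - 1);
            if (Character.isHighSurrogate(prev)) {
                return Character.toCodePoint(prev, ch); // offset16 is on the trail half
            }
        }
        return ch; // BMP character or unpaired surrogate, returned as is
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00\uD800"; // 'a', U+10000, unpaired lead surrogate
        for (int i = 0; i < s.length(); i++) {
            System.out.println(i + " -> U+" + Integer.toHexString(charAt(s, i)).toUpperCase());
        }
    }
}
```

Unlike String.codePointAt, which only pairs forwards from a lead surrogate, this sketch also resolves an offset that lands on the trail half.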
Used for iteration together with UTF16.getCharCount(), as well as for random access. If a
validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters: source array of UTF-16 chars offset16 UTF-16 offset to the start of the character.
Returns: the UTF-32 value of the code point that contains the char at
offset16. The boundaries of that codepoint are the same as in bounds().
Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.
UNKNOWN: ICU 2.1
Used for iteration together with UTF16.getCharCount(), as well as for random access. If a
validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters: source UTF-16 chars string buffer offset16 UTF-16 offset to the start of the character.
Returns: the UTF-32 value of the code point that contains the char at
offset16. The boundaries of that codepoint are the same as in bounds().
Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.
UNKNOWN: ICU 2.1
Used for iteration together with UTF16.getCharCount(), as well as for random access. If a
validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters: source array of UTF-16 chars start offset to substring in the source array for analyzing limit offset to substring in the source array for analyzing offset16 UTF-16 offset relative to start
Returns: the UTF-32 value of the code point that contains the char at
offset16. The boundaries of that codepoint are the same as in bounds().
Throws: IndexOutOfBoundsException thrown if offset16 is not within the range of start and limit.
UNKNOWN: ICU 2.1
Used for iteration together with UTF16.getCharCount(), as well as for random access. If a
validity check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is
not found, the incomplete character will be returned.
Parameters: source UTF-16 chars string buffer offset16 UTF-16 offset to the start of the character.
Returns: the UTF-32 value of the code point that contains the char at
offset16. The boundaries of that codepoint are the same as in bounds().
Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.
UNKNOWN: ICU 2.1
Parameters: source UTF16 string
Returns: number of codepoints in the string
UNKNOWN: ICU 2.1
Parameters: source UTF16 string buffer
Returns: number of codepoints in the string buffer
UNKNOWN: ICU 2.1
Parameters: source UTF16 char array start offset of the substring limit offset of the substring
Returns: number of codepoints in the substring
Throws: IndexOutOfBoundsException if start and limit are not valid.
UNKNOWN: ICU 2.1
Parameters: target string buffer to remove codepoint from offset16 offset which the codepoint will be removed
Returns: a reference to target
Throws: IndexOutOfBoundsException thrown if offset16 is invalid.
UNKNOWN: ICU 2.1
Parameters: target char array to remove codepoint from limit end index of the char array, limit <= target.length offset16 offset at which the codepoint will be removed
Returns: a new limit size
Throws: IndexOutOfBoundsException thrown if offset16 is invalid.
UNKNOWN: ICU 2.1
To find the UTF-32 length of a string, use:
len32 = countCodePoint(source);
Parameters: source text to analyse offset16 UTF-16 offset < source text length.
Returns: UTF-32 offset
Throws: IndexOutOfBoundsException if offset16 is out of bounds.
UNKNOWN: ICU 2.1
To find the UTF-32 length of a string, use:
len32 = countCodePoint(source);
Parameters: source text to analyse offset16 UTF-16 offset < source text length.
Returns: UTF-32 offset
Throws: IndexOutOfBoundsException if offset16 is out of bounds.
UNKNOWN: ICU 2.1
To find the UTF-32 length of a substring, use:
len32 = countCodePoint(source, start, limit);
Parameters: source text to analyse start offset of the substring limit offset of the substring offset16 UTF-16 relative to start
Returns: UTF-32 offset relative to start
Throws: IndexOutOfBoundsException if offset16 is not within the range of start and limit.
UNKNOWN: ICU 2.1
Parameters: source the UTF-16 string offset32 UTF-32 offset
Returns: UTF-16 offset
Throws: IndexOutOfBoundsException if offset32 is out of bounds.
UNKNOWN: ICU 2.1
Parameters: source the UTF-16 string buffer offset32 UTF-32 offset
Returns: UTF-16 offset
Throws: IndexOutOfBoundsException if offset32 is out of bounds.
UNKNOWN: ICU 2.1
Parameters: source the UTF-16 char array whose substring is to be analysed start offset of the substring to be analysed limit offset of the substring to be analysed offset32 UTF-32 offset relative to start
Returns: UTF-16 offset relative to start
Throws: IndexOutOfBoundsException if offset32 is out of bounds.
UNKNOWN: ICU 2.1
If a validity check is required, use isLegal() on char32 before calling.
Parameters: char32 the input codepoint.
Returns: 2 if is in supplementary space, otherwise 1.
UNKNOWN: ICU 2.1
If a validity check is required, use isLegal() on char32 before calling.
Parameters: char32 the input character.
Returns: the lead surrogate if getCharCount(char32) is 2;
and 0 otherwise (note: 0 is not a valid lead surrogate).
UNKNOWN: ICU 2.1
If a validity check is required, use isLegal() on char32 before calling.
Parameters: char32 the input character.
Returns: the trail surrogate if getCharCount(char32) is 2;
otherwise the character itself.
UNKNOWN: ICU 2.1
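The arithmetic behind getCharCount, getLeadSurrogate and getTrailSurrogate follows from the UTF-16 encoding form. The sketch below is a hypothetical SurrogateSketch class that mirrors the documented return values (0 as the lead and the character itself as the trail for non-supplementary input). It assumes char32 is a legal code point and is not taken from the ICU source.

```java
final class SurrogateSketch {
    // 1 code unit for the BMP, 2 for supplementary code points (U+10000 and above).
    static int charCount(int cp) {
        return cp < 0x10000 ? 1 : 2;
    }

    // 0xD7C0 is 0xD800 - (0x10000 >> 10), so the 0x10000 offset is folded into the base.
    static char lead(int cp) {
        return cp >= 0x10000 ? (char) (0xD7C0 + (cp >> 10)) : 0;
    }

    // The low 10 bits of the code point select the trail surrogate.
    static char trail(int cp) {
        return cp >= 0x10000 ? (char) (0xDC00 + (cp & 0x3FF)) : (char) cp;
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) lead(cp), (int) trail(cp));
    }
}
```

For supplementary input the results agree with the JDK's Character.highSurrogate and Character.lowSurrogate.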
Parameters: source The input string. number The number of code points in the string is compared against the 'number' parameter.
Returns: boolean value for whether the string contains more Unicode code points than 'number'.
UNKNOWN: ICU 2.4
Parameters: source array of UTF-16 chars start offset to substring in the source array for analyzing limit offset to substring in the source array for analyzing number The number of code points in the string is compared against the 'number' parameter.
Returns: boolean value for whether the string contains more Unicode code points than 'number'.
Throws: IndexOutOfBoundsException thrown when limit < start
UNKNOWN: ICU 2.4
Parameters: source The input string buffer. number The number of code points in the string buffer is compared against the 'number' parameter.
Returns: boolean value for whether the string buffer contains more Unicode code points than 'number'.
UNKNOWN: ICU 2.4
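A check of this kind can often be answered without counting every code point, because the code unit length bounds the code point count from both sides. The following sketch shows one way to exploit that. MoreCodePointsSketch is hypothetical and is not claimed to match the ICU implementation.

```java
final class MoreCodePointsSketch {
    static boolean hasMoreCodePointsThan(String s, int number) {
        if (number < 0) {
            return true; // any string has at least 0 code points
        }
        int length = s.length();
        // Each code point uses at most two code units, so ceil(length / 2) is a lower bound.
        if (((length + 1) >> 1) > number) {
            return true;
        }
        // Each code point uses at least one code unit, so length is an upper bound.
        if (length <= number) {
            return false;
        }
        // Otherwise count, stopping as soon as the threshold is exceeded.
        int count = 0;
        for (int i = 0; i < length; i += Character.charCount(s.codePointAt(i))) {
            if (++count > number) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 4 code units, 3 code points
        System.out.println(hasMoreCodePointsThan(s, 2));
        System.out.println(hasMoreCodePointsThan(s, 3));
    }
}
```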
That is, the smallest index i such that UTF16.charAt(source, i) == char32 is true.
If no such character occurs in this string, then -1 is returned.
Examples:
UTF16.indexOf("abc", 'a') returns 0
UTF16.indexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800) returns -1
Parameters: source UTF16 format Unicode string that will be searched char32 codepoint to search for
Returns: the index of the first occurrence of the codepoint in the argument Unicode string, or -1 if the codepoint does not occur.
UNKNOWN: ICU 2.6
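A code-point-based search differs from String.indexOf(int) when the argument is a surrogate value: the JDK method matches the individual code unit even inside a surrogate pair, whereas a code-point search does not. The sketch below illustrates the distinction. IndexOfSketch is a hypothetical stand-in written with JDK calls only, not the ICU implementation.

```java
final class IndexOfSketch {
    // Walks the string one code point at a time and compares whole code points.
    static int indexOf(String s, int cp) {
        for (int i = 0; i < s.length(); ) {
            int current = s.codePointAt(i);
            if (current == cp) {
                return i;
            }
            i += Character.charCount(current);
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "abc\uD800\uDC00";
        System.out.println(indexOf(s, 0x10000)); // whole supplementary character
        System.out.println(indexOf(s, 0xD800));  // half of a pair is not a match
        System.out.println(s.indexOf(0xD800));   // JDK code unit search, for comparison
    }
}
```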
If no such string str occurs in this source, then -1 is returned.
Examples:
UTF16.indexOf("abc", "ab") returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800") returns -1
Parameters: source UTF16 format Unicode string that will be searched str UTF16 format Unicode string to search for
Returns: the index of the first occurrence of str in the argument Unicode string, or -1 if str does not occur.
UNKNOWN: ICU 2.6
If no such character occurs in this string, then -1 is returned.
Examples:
UTF16.indexOf("abc", 'a', 1) returns -1
UTF16.indexOf("abc\ud800\udc00", 0x10000, 1) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800, 1) returns -1
Parameters: source UTF16 format Unicode string that will be searched char32 codepoint to search for fromIndex the index to start the search from.
Returns: the index of the first occurrence of the codepoint in the argument Unicode string at or after fromIndex, or -1 if the codepoint does not occur.
UNKNOWN: ICU 2.6
If no such string str occurs in this source, then -1 is returned.
Examples:
UTF16.indexOf("abc", "ab", 0) returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 0) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 2) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800", 0) returns -1
Parameters: source UTF16 format Unicode string that will be searched str UTF16 format Unicode string to search for fromIndex the index to start the search from.
Returns: the index of the first occurrence of str in the argument Unicode string at or after fromIndex, or -1 if str does not occur.
UNKNOWN: ICU 2.6
The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.
The offset argument must be greater than or equal to 0, and less than or equal to the length of target.
Parameters: target string buffer to insert to offset16 offset which char32 will be inserted in char32 codepoint to be inserted
Returns: a reference to target
Throws: IndexOutOfBoundsException thrown if offset16 is invalid.
UNKNOWN: ICU 2.1
The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.
The offset argument must be greater than or equal to 0, and less than or equal to the limit.
Parameters: target char array to insert to limit end index of the char array, limit <= target.length offset16 offset which char32 will be inserted in char32 codepoint to be inserted
Returns: new limit size
Throws: IndexOutOfBoundsException thrown if offset16 is invalid.
UNKNOWN: ICU 2.1
Parameters: char16 the input character.
Returns: true iff the input character is a lead surrogate
UNKNOWN: ICU 2.1
Parameters: char16 the input character.
Returns: true iff the input character is a surrogate.
UNKNOWN: ICU 2.1
Parameters: char16 the input character.
Returns: true iff the input character is a trail surrogate.
UNKNOWN: ICU 2.1
Examples:
UTF16.lastIndexOf("abc", 'a') returns 0
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1
source is searched backwards starting at the last character.
Note: this method is provided to support JDK 1.3, which does not fully support supplementary characters.
Parameters: source UTF16 format Unicode string that will be searched char32 codepoint to search for
Returns: the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.
UNKNOWN: ICU 2.6
Examples:
UTF16.lastIndexOf("abc", "a") returns 0
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800") returns -1
source is searched backwards starting at the last character.
Note: this method is provided to support JDK 1.3, which does not fully support supplementary characters.
Parameters: source UTF16 format Unicode string that will be searched str UTF16 format Unicode string to search for
Returns: the index of the last occurrence of str in source, or -1 if str does not occur.
UNKNOWN: ICU 2.6
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex.
This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character.
source is searched backwards starting at the last character starting at the specified index.
Examples:
UTF16.lastIndexOf("abc", 'c', 2) returns 2
UTF16.lastIndexOf("abc", 'c', 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1
Parameters: source UTF16 format Unicode string that will be searched char32 codepoint to search for fromIndex the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of this string, it has the same effect as if it were equal to one less than the length of this string: this entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.
Returns: the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.
UNKNOWN: ICU 2.6
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex.
This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same applies, vice versa, when str ends with a lead surrogate character.
See example below.
Examples:
UTF16.lastIndexOf("abc", "c", 2) returns 2
UTF16.lastIndexOf("abc", "c", 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800", 4) returns -1
source is searched backwards starting at the last character.
Note: this method is provided to support JDK 1.3, which does not fully support supplementary characters.
Parameters: source UTF16 format Unicode string that will be searched str UTF16 format Unicode string to search for fromIndex the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of this string, it has the same effect as if it were equal to one less than the length of this string: this entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.
Returns: the index of the last occurrence of str in source, or -1 if str does not occur.
UNKNOWN: ICU 2.6
Parameters: source string offset16 UTF16 position to shift shift32 number of codepoints to shift
Returns: new shifted offset16
Throws: IndexOutOfBoundsException if the new offset16 is out of bounds.
UNKNOWN: ICU 2.1
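Shifting a UTF-16 offset by a number of code points can be pictured with the JDK's String.offsetByCodePoints, which has the same shape: a starting code unit index and a signed code point count. The sketch below uses it as a stand-in. MoveOffsetSketch is a hypothetical name, and behaviour for a starting offset in the middle of a surrogate pair may differ from the ICU method.

```java
final class MoveOffsetSketch {
    // Moves offset16 by shift32 code points; negative values move backwards.
    // Throws IndexOutOfBoundsException if the string has too few code points in that direction.
    static int move(String s, int offset16, int shift32) {
        return s.offsetByCodePoints(offset16, shift32);
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 'a', U+10000, 'b'
        System.out.println(move(s, 0, 2));  // past 'a' and the surrogate pair
        System.out.println(move(s, 4, -2)); // back over 'b' and the surrogate pair
    }
}
```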
Parameters: source string buffer offset16 UTF16 position to shift shift32 number of codepoints to shift
Returns: new shifted offset16
Throws: IndexOutOfBoundsException if the new offset16 is out of bounds.
UNKNOWN: ICU 2.1
Parameters: source char array start position of the subarray to be performed on limit position of the subarray to be performed on offset16 UTF16 position to shift relative to start shift32 number of codepoints to shift
Returns: new shifted offset16 relative to start
Throws: IndexOutOfBoundsException if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range.
UNKNOWN: ICU 2.1
Parameters: codePoints the code array offset the start of the text in the code point array count the number of code points
Returns: a String representing the count code points starting at offset
Throws: IllegalArgumentException if an invalid code point is encountered IndexOutOfBoundsException if the offset or count are out of bounds.
UNKNOWN: ICU 3.0
Examples:
UTF16.replace("mesquite in your cellar", 'e', 'o');
returns "mosquito in your collar"
UTF16.replace("JonL", 'q', 'x');
returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", 0x10000, '!');
returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", 0xd800, '!');
returns "Supplementary character \ud800\udc00" (no change)
Parameters: source UTF16 format Unicode string which the codepoint replacements will be based on. oldChar32 non-zero old codepoint to be replaced. newChar32 the new codepoint to replace oldChar32
Returns: new String derived from source by replacing every occurrence of oldChar32 with newChar32. When no oldChar32 is found in source, then source will be returned.
UNKNOWN: ICU 2.6
Examples:
UTF16.replace("mesquite in your cellar", "e", "o");
returns "mosquito in your collar"
UTF16.replace("mesquite in your cellar", "mesquite", "cat");
returns "cat in your cellar"
UTF16.replace("JonL", "q", "x");
returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", "\ud800\udc00", "!");
returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", "\ud800", "!");
returns "Supplementary character \ud800\udc00" (no change)
Parameters: source UTF16 format Unicode string which the replacements will be based on. oldStr non-zero-length string to be replaced. newStr the new string to replace oldStr
Returns: new String derived from source by replacing every occurrence of oldStr with newStr. When no oldStr is found in source, then source will be returned.
UNKNOWN: ICU 2.6
Examples:
UTF16.reverse(new StringBuffer(
"Supplementary characters \ud800\udc00"))
returns "\ud800\udc00 sretcarahc yratnemelppuS".
Parameters: source the source StringBuffer that contains UTF16 format Unicode string to be reversed
Returns: a modified source with reversed UTF16 format Unicode string.
UNKNOWN: ICU 2.6
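Reversing by code point rather than by code unit keeps each surrogate pair in lead-then-trail order. A minimal sketch of the technique, using only JDK calls, follows. ReverseSketch is hypothetical, and it returns a new String instead of modifying a StringBuffer in place as the documented method does.

```java
final class ReverseSketch {
    // Walks backwards one code point at a time and appends each one whole.
    static String reverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String reversed = reverse("ab\uD800\uDC00");
        // The pair stays intact, so the result still holds three code points.
        System.out.println(reversed.codePointCount(0, reversed.length()));
    }
}
```

For well-formed input this gives the same result as StringBuilder.reverse(), which also treats surrogate pairs as single characters.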
Parameters: target stringbuffer offset16 UTF16 position to insert into char32 code point
UNKNOWN: ICU 2.1
Parameters: target char array limit numbers of valid chars in target, different from target.length. limit counts the number of chars in target that represents a string, not the size of array target. offset16 UTF16 position to insert into char32 code point
Returns: new number of chars in target that represents a string
Throws: IndexOutOfBoundsException if offset16 is out of range
UNKNOWN: ICU 2.1
Parameters: char32 the input character.
Returns: string value of char32 in UTF16 format
Throws: IllegalArgumentException thrown if char32 is an invalid codepoint.
UNKNOWN: ICU 2.1
Parameters: source the input string. offset16 the UTF16 index to the codepoint in source
Returns: string value of char32 in UTF16 format
UNKNOWN: ICU 2.1
Parameters: source the input string buffer. offset16 the UTF16 index to the codepoint in source
Returns: string value of char32 in UTF16 format
UNKNOWN: ICU 2.1
Parameters: source the input char array. start start index of the subarray limit end index of the subarray offset16 the UTF16 index to the codepoint in source relative to start
Returns: string value of char32 in UTF16 format
UNKNOWN: ICU 2.1