Class CmsEncoder
The methods in this class are substitutes for java.net.URLEncoder.encode() and java.net.URLDecoder.decode(). Use the methods from this class in all OpenCms core classes to ensure the encoding is always handled the same way.
Encoding and decoding use the same mechanism as JavaScript: special characters are replaced with %hex, where hex is a two-digit hexadecimal number.
Note: On the client side (browser), always use the JavaScript functions encodeURIComponent and decodeURIComponent instead of the deprecated escape and unescape functions. Only these work properly with Unicode characters.
- Since:
- 6.0.0
-
Field Summary
Modifier and Type | Field | Description
static final String | BASE64_EXTRA | Non-alphanumeric characters used for Base64 encoding.
static final String | BASE64_EXTRA_REPLACEMENTS | Characters used as replacements for non-alphanumeric Base64 characters when using Base64 for request parameters.
static final String | ENCODING_ISO_8859_1 | Constant for the standard ISO-8859-1 encoding.
static final String | ENCODING_US_ASCII | Constant for the standard US-ASCII encoding.
static final String | ENCODING_UTF_8 | Constant for the standard UTF-8 encoding.
-
Method Summary
Modifier and Type | Method | Description
static String | adjustHtmlEncoding(String input, String encoding) | Adjusts the given String by making sure all characters that can be displayed in the given charset are contained as chars, whereas all other non-displayable characters are converted to HTML entities.
static byte[] | changeEncoding(byte[] input, String oldEncoding, String newEncoding) | Changes the encoding of a byte array that represents a String.
static String | convertHostToPunycode(String uriString) | Converts the host of a URI to Punycode.
static String | createString(byte[] bytes, String encoding) | Creates a String out of a byte array with the specified encoding, falling back to the system default in case the encoding name is not valid.
static String | decode(String source) | Decodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
static String | decode(String source, String encoding) | This method is a substitute for URLDecoder.decode().
static String | decodeHtmlEntities(String input) | Decodes HTML entity references like &#8364;.
static String | decodeHtmlEntities(String input, String encoding) | Deprecated.
static String | decodeParameter(String input) | Decodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied before.
static List<String> | decodeStringsFromBase64Parameter(String data) | Decodes a parameter which has been encoded from a string list using encodeStringsAsBase64Parameter.
static String | encode(String source) | Encodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
static String | encode(String source, String encoding) | This method is a substitute for URLEncoder.encode().
static String | encodeHtmlEntities(String input, String encoding) | Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with HTML entity references like &#8364;.
static String | encodeJavaEntities(String input, String encoding) | Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with Java escaping like \u20ac.
static String | encodeParameter(String input) | Encodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied later.
static String | encodeStringsAsBase64Parameter(List<String> strings) | Encodes a list of strings as Base64 data to be used in a request parameter.
static String | escape(String source) | Encodes a String in a way similar to the JavaScript "encodeURIComponent" function, using "UTF-8" for character encoding.
static String | escape(String source, String encoding) | Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
static String | escapeHtml(String source) | Escapes special characters in a HTML-String with their number-based entity representation, for example & becomes &#38;.
static String | escapeNonAscii(String source) | Escapes non ASCII characters in a HTML-String with their number-based entity representation.
static String | escapeSql(String source) | A simple method to avoid injection.
static String | escapeSqlLikePattern(String pattern, char escapeChar) | Escapes the wildcard characters in a string which will be used as the pattern for a SQL LIKE clause.
static String | escapeWBlanks(String source, String encoding) | Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
static String | escapeXml(String source) | Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
static String | escapeXml(String source, boolean doubleEscape) | Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
static String | lookupEncoding(String encoding, String fallback) | Checks if a given encoding name is actually supported, and if so resolves it to its canonical name; if not, it returns the given fallback value.
static String | redecodeUriComponent(String input) | Re-decodes a String that has not been correctly decoded and thus has scrambled character bytes.
static String | unescape(String source) | Decodes a String in a way similar to the JavaScript "decodeURIComponent" function, using "UTF-8" for character encoding.
static String | unescape(String source, String encoding) | Decodes a String in a way similar to the JavaScript "decodeURIComponent" function.
-
Field Details
-
BASE64_EXTRA
Non-alphanumeric characters used for Base64 encoding.
-
BASE64_EXTRA_REPLACEMENTS
Characters used as replacements for non-alphanumeric Base64 characters when using Base64 for request parameters.
-
ENCODING_ISO_8859_1
Constant for the standard ISO-8859-1 encoding.
-
ENCODING_US_ASCII
Constant for the standard US-ASCII encoding.
-
ENCODING_UTF_8
Constant for the standard UTF-8 encoding.
The default encoding for the JavaScript decodeURIComponent function is UTF-8 by W3C standard.
-
-
Method Details
-
adjustHtmlEncoding
Adjusts the given String by making sure all characters that can be displayed in the given charset are contained as chars, whereas all other non-displayable characters are converted to HTML entities.
Just calls decodeHtmlEntities(String) first and feeds the result to encodeHtmlEntities(String, String).
- Parameters:
input
- the input to adjust the HTML encoding for
encoding
- the charset to encode the result with
- Returns:
- the input with the decoded/encoded HTML entities
-
changeEncoding
Changes the encoding of a byte array that represents a String.
- Parameters:
input
- the byte array to convert
oldEncoding
- the current encoding of the byte array
newEncoding
- the new encoding of the byte array
- Returns:
- the byte array encoded in the new encoding
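The conversion described above can be pictured as decoding the bytes with the old charset and re-encoding the resulting text with the new one. The following is a minimal sketch using only the JDK; the class name ChangeEncodingSketch is hypothetical and this is not the OpenCms implementation.

```java
import java.nio.charset.Charset;

final class ChangeEncodingSketch {

    /** Decodes the bytes with the old charset, then re-encodes the text with the new one. */
    static byte[] changeEncoding(byte[] input, String oldEncoding, String newEncoding) {
        String text = new String(input, Charset.forName(oldEncoding));
        return text.getBytes(Charset.forName(newEncoding));
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE4}; // a-umlaut as a single ISO-8859-1 byte
        byte[] utf8 = changeEncoding(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(utf8.length); // the same character takes two bytes in UTF-8
    }
}
```

Unlike this sketch, which throws on an unknown charset name, a production helper would probably need a fallback.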
-
convertHostToPunycode
Converts the host of a URI to Punycode.
This is needed for redirects to hosts whose names contain international characters such as umlauts.
- Parameters:
uriString
- the URI
- Returns:
- the converted URI
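As an illustration of the idea, the sketch below converts only the host part of a URI string with the JDK's java.net.IDN class. The class name PunycodeHostSketch and the simple string scanning are assumptions made for this example; user info and IPv6 literals are deliberately not handled, and the OpenCms method may work differently.

```java
import java.net.IDN;

final class PunycodeHostSketch {

    /** Replaces the host between "://" and the next ':', '/', '?' or '#' with its ASCII (Punycode) form. */
    static String convertHostToPunycode(String uriString) {
        int schemeEnd = uriString.indexOf("://");
        if (schemeEnd < 0) {
            return uriString; // no authority part to convert
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = uriString.length();
        for (int i = hostStart; i < uriString.length(); i++) {
            char c = uriString.charAt(i);
            if (c == ':' || c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        String host = uriString.substring(hostStart, hostEnd);
        if (host.isEmpty()) {
            return uriString;
        }
        return uriString.substring(0, hostStart) + IDN.toASCII(host) + uriString.substring(hostEnd);
    }
}
```

Port, path and query are copied through unchanged, so only the host name is affected.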
-
createString
Creates a String out of a byte array with the specified encoding, falling back to the system default in case the encoding name is not valid.
Use this method as a replacement for new String(byte[], encoding) to avoid possible encoding problems.
- Parameters:
bytes
- the bytes to decode
encoding
- the encoding scheme to use for decoding the bytes
- Returns:
- the bytes decoded to a String
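A fallback of this kind might look roughly like the sketch below. It assumes the JVM default charset stands in for "the system default" (OpenCms may use its own configured default instead), and the class name CreateStringSketch is hypothetical.

```java
import java.nio.charset.Charset;

final class CreateStringSketch {

    /** Decodes the bytes with the named charset, or with the JVM default if the name is unusable. */
    static String createString(byte[] bytes, String encoding) {
        Charset charset;
        try {
            charset = Charset.forName(encoding);
        } catch (IllegalArgumentException e) {
            // Thrown for null, syntactically illegal, and unsupported charset names alike.
            charset = Charset.defaultCharset();
        }
        return new String(bytes, charset);
    }
}
```

Because the charset is resolved up front, the caller never has to deal with a checked UnsupportedEncodingException.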
-
decode
Decodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
- Parameters:
source
- the String to decode
- Returns:
- String the decoded source String
-
decode
This method is a substitute for URLDecoder.decode(). Use this in all OpenCms core classes to ensure the encoding is always handled the same way.
In case you don't know what encoding to use, set the value of the encoding parameter to null. This method will then default to UTF-8 encoding, which is probably the right one.
- Parameters:
source
- The string to decode
encoding
- The encoding to use (if null, the system default is used)
- Returns:
- The decoded source String
-
decodeHtmlEntities
Decodes HTML entity references like &#8364;.
- Parameters:
input
- the input to decode the HTML entities in
- Returns:
- the input with the decoded HTML entities
-
decodeHtmlEntities
Deprecated.
Decodes HTML entity references like &#8364; that are contained in the String to a regular character, but only if that character is contained in the given encoding's charset.
- Parameters:
input
- the input to decode the HTML entities in
encoding
- the charset to decode the input for
- Returns:
- the input with the decoded HTML entities
-
decodeParameter
Decodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied before.
- Parameters:
input
- the encoded parameter string
- Returns:
- the decoded parameter string
-
decodeStringsFromBase64Parameter
Decodes a parameter which has been encoded from a string list using encodeStringsAsBase64Parameter.
- Parameters:
data
- the data to decode
- Returns:
- the list of strings
-
encode
Encodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
- Parameters:
source
- the String to encode
- Returns:
- String the encoded source String
-
encode
This method is a substitute for URLEncoder.encode(). Use this in all OpenCms core classes to ensure the encoding is always handled the same way.
In case you don't know what encoding to use, set the value of the encoding parameter to null. This method will then default to UTF-8 encoding, which is probably the right one.
- Parameters:
source
- the String to encode
encoding
- the encoding to use (if null, the system default is used)
- Returns:
- the encoded source String
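To make the null handling of the encode and decode methods concrete, here is a small sketch built on the JDK classes they substitute. It assumes that a null or unknown encoding name falls back to UTF-8, as the description above states; the class name UrlCodecSketch and its helper are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UrlCodecSketch {

    /** Resolves the charset name, using UTF-8 when the name is null or unknown. */
    private static String charsetName(String encoding) {
        try {
            return Charset.forName(encoding).name();
        } catch (IllegalArgumentException e) {
            return StandardCharsets.UTF_8.name();
        }
    }

    static String encode(String source, String encoding) {
        try {
            return URLEncoder.encode(source, charsetName(encoding));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // not expected, the name was resolved above
        }
    }

    static String decode(String source, String encoding) {
        try {
            return URLDecoder.decode(source, charsetName(encoding));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // not expected, the name was resolved above
        }
    }
}
```

With this sketch, encode("a b", null) yields "a+b", because URLEncoder writes a blank as a plus sign.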
-
encodeHtmlEntities
Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with HTML entity references like &#8364;.
This is required since a Java String is internally always stored as Unicode, meaning it can contain almost every character, but the HTML charset used might not support all such characters.
- Parameters:
input
- the input to encode for HTML
encoding
- the charset to encode the result with
- Returns:
- the input with the encoded HTML entities
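One way to picture this behaviour is to ask a CharsetEncoder whether each character can be represented in the target charset and to write a numeric entity when it cannot. The sketch below does that with the JDK only; the class name HtmlEntitySketch is hypothetical and the actual OpenCms code may differ in detail.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class HtmlEntitySketch {

    /** Keeps characters the charset can encode and writes all others as decimal entities. */
    static String encodeHtmlEntities(String input, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder result = new StringBuilder(input.length() * 2);
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                result.append(ch);
            } else {
                result.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return result.toString();
    }
}
```

For example, the euro sign is not part of ISO-8859-1 and becomes an entity there, while it passes through unchanged for UTF-8.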
-
encodeJavaEntities
Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with Java escaping like \u20ac.
This can be used to escape values used in Java property files.
- Parameters:
input
- the input to encode for Java
encoding
- the charset to encode the result with
- Returns:
- the input with the encoded Java entities
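The same encodability check can produce Java-style escapes instead of HTML entities. The following sketch works per UTF-16 unit, so a character outside the Basic Multilingual Plane is always written as two escapes; the class name JavaEntitySketch is hypothetical and this is an illustration, not the OpenCms source.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class JavaEntitySketch {

    /** Keeps chars the charset can encode and writes all others as backslash-u escapes. */
    static String encodeJavaEntities(String input, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder result = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (encoder.canEncode(c)) {
                result.append(c);
            } else {
                // Four lowercase hex digits, as accepted in Java property files.
                result.append(String.format("\\u%04x", (int) c));
            }
        }
        return result.toString();
    }
}
```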
-
encodeParameter
Encodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied later.
Used to ensure that GET parameters are not wrecked by wrong or incompatible configuration settings. To ensure this, the String is first encoded with HTML entities for any character that cannot be encoded in US-ASCII; additionally, the plus sign is encoded to avoid problems with the white-space replacer. Finally, the entity prefix is replaced with characters not used as delimiters in URLs.
- Parameters:
input
- the parameter string
- Returns:
- the encoded parameter string
-
encodeStringsAsBase64Parameter
Encodes a list of strings as Base64 data to be used in a request parameter.
- Parameters:
strings
- the strings to encode
- Returns:
- the resulting base64 data
-
escape
Encodes a String in a way similar to the JavaScript "encodeURIComponent" function, using "UTF-8" for character encoding.
JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method.
Directly exposed for JSP EL, not through CmsJspElFunctions.
- Parameters:
source
- The text to be encoded
- Returns:
- The encoded string
-
escape
Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method, provided "UTF-8" has been used as encoding.
Directly exposed for JSP EL, not through CmsJspElFunctions.
- Parameters:
source
- The text to be encoded
encoding
- the encoding type
- Returns:
- The encoded string
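The main difference between java.net.URLEncoder output and JavaScript's encodeURIComponent is the blank, which URLEncoder writes as a plus sign. A sketch of a JavaScript-compatible variant, assuming UTF-8 and a hypothetical class name JsStyleEscapeSketch, could therefore look like this (a few punctuation characters such as ~ and ! are still treated differently by the two functions):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class JsStyleEscapeSketch {

    /** URL-encodes with UTF-8 and writes blanks as %20 instead of '+'. */
    static String escape(String source) {
        try {
            // A literal '+' in the input is already encoded as %2B, so every remaining '+' is a blank.
            return URLEncoder.encode(source, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Reverses escape(); note that URLDecoder also turns a raw '+' into a blank. */
    static String unescape(String source) {
        try {
            return URLDecoder.decode(source, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```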
-
escapeHtml
Escapes special characters in a HTML-String with their number-based entity representation, for example & becomes &#38;.
A character ch is replaced if
((ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62)))
- Parameters:
source
- the String to escape
- Returns:
- String the escaped String
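The replacement rule quoted above translates almost directly into a loop. The sketch below applies exactly that condition to each char; the class name EscapeHtmlSketch is hypothetical.

```java
final class EscapeHtmlSketch {

    /** Replaces every char matching the documented condition with a decimal entity. */
    static String escapeHtml(String source) {
        StringBuilder result = new StringBuilder(source.length() * 2);
        for (int i = 0; i < source.length(); i++) {
            int ch = source.charAt(i);
            boolean replace = (ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62));
            if (replace) {
                result.append("&#").append(ch).append(';');
            } else {
                result.append((char) ch);
            }
        }
        return result.toString();
    }
}
```

Under this rule blanks, digits and ASCII letters are kept, while characters such as &, < and > are written as &#38;, &#60; and &#62;.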
-
escapeNonAscii
Escapes non ASCII characters in a HTML-String with their number-based entity representation, for example € becomes &#8364;.
A character ch is replaced if
(ch > 255)
- Parameters:
source
- the String to escape
- Returns:
- String the escaped String
-
escapeSql
A simple method to avoid injection.
Replaces each single quote with two single quotes in the value parameter of the SQL statement.
- Parameters:
source
- the String to escape SQL from
- Returns:
- the escaped value of the parameter source
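The quote doubling described above is a one-line replacement. A minimal sketch, with the hypothetical class name EscapeSqlSketch:

```java
final class EscapeSqlSketch {

    /** Doubles every single quote so the value cannot terminate a SQL string literal. */
    static String escapeSql(String source) {
        return source.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escapeSql("O'Reilly")); // prints O''Reilly
    }
}
```

This only covers string literals delimited by single quotes; where the database API allows it, bound parameters in a PreparedStatement avoid the problem altogether.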
-
escapeSqlLikePattern
Escapes the wildcard characters in a string which will be used as the pattern for a SQL LIKE clause.
- Parameters:
pattern
- the pattern
escapeChar
- the character which should be used as the escape character
- Returns:
- the escaped pattern
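For illustration, the sketch below prefixes the two standard LIKE wildcards, % and _, as well as the escape character itself. That this is the exact set of characters handled by OpenCms is an assumption, and the class name EscapeLikeSketch is hypothetical.

```java
final class EscapeLikeSketch {

    /** Prefixes '%', '_' and the escape character itself with the escape character. */
    static String escapeSqlLikePattern(String pattern, char escapeChar) {
        StringBuilder result = new StringBuilder(pattern.length() * 2);
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escapeChar || c == '%' || c == '_') {
                result.append(escapeChar);
            }
            result.append(c);
        }
        return result.toString();
    }
}
```

The escaped value is meant to be used together with a matching ESCAPE clause, for example LIKE ? ESCAPE '!' when '!' is passed as escapeChar.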
-
escapeWBlanks
Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
Multiple blanks are each encoded as %20.
- Parameters:
source
- The text to be encoded
encoding
- the encoding type
- Returns:
- The encoded String
-
escapeXml
Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
This method replaces the following characters in a String:
- < with &lt;
- > with &gt;
- & with &amp;
- " with &quot;
- Parameters:
source
- the string to escape
- Returns:
- the escaped string
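The four replacements listed above can be applied in a single pass over the input, which avoids escaping the ampersand of an entity that was just written. A minimal sketch with the hypothetical class name EscapeXmlSketch:

```java
final class EscapeXmlSketch {

    /** Replaces the characters <, >, & and " with their named entities. */
    static String escapeXml(String source) {
        StringBuilder result = new StringBuilder(source.length() * 2);
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            switch (ch) {
                case '<':
                    result.append("&lt;");
                    break;
                case '>':
                    result.append("&gt;");
                    break;
                case '&':
                    result.append("&amp;");
                    break;
                case '"':
                    result.append("&quot;");
                    break;
                default:
                    result.append(ch);
            }
        }
        return result.toString();
    }
}
```

This sketch always escapes the ampersand, so text that already contains entities is escaped a second time.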
-
escapeXml
Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
This method replaces the following characters in a String:
- < with &lt;
- > with &gt;
- & with &amp;
- " with &quot;
- Parameters:
source
- the string to escape
doubleEscape
- if false, all entities that already are escaped are left untouched
- Returns:
- the escaped string
-
lookupEncoding
Checks if a given encoding name is actually supported, and if so resolves it to its canonical name; if not, it returns the given fallback value.
Charsets have a set of aliases. For example, valid aliases for "UTF-8" are "UTF8", "utf-8" or "utf8". This method resolves any given valid charset name to its "canonical" form, so that simple String comparison can be used when checking charset names internally later.
Please see http://www.iana.org/assignments/character-sets for a list of valid charset alias names.
- Parameters:
encoding
- the encoding to check and resolve
fallback
- the fallback encoding scheme
- Returns:
- the resolved encoding name, or the fallback value
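The alias resolution described above corresponds closely to what java.nio.charset.Charset offers. As a sketch under that assumption, with the hypothetical class name LookupEncodingSketch:

```java
import java.nio.charset.Charset;

final class LookupEncodingSketch {

    /** Returns the canonical charset name, or the fallback if the name is null or unsupported. */
    static String lookupEncoding(String encoding, String fallback) {
        if (encoding == null) {
            return fallback;
        }
        try {
            return Charset.forName(encoding).name();
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupEncoding("utf8", "ISO-8859-1")); // prints UTF-8
    }
}
```

Because the result is always the canonical name, two lookups can be compared with a plain equals() call.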
-
redecodeUriComponent
Re-decodes a String that has not been correctly decoded and thus has scrambled character bytes.
This is equivalent to the JavaScript "decodeURIComponent" function. It converts from the default "UTF-8" to the currently selected system encoding.
- Parameters:
input
- the String to convert
- Returns:
- String the converted String
-
unescape
Decodes a String in a way similar to the JavaScript "decodeURIComponent" function, using "UTF-8" for character encoding.
This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent".
Directly exposed for JSP EL, not through CmsJspElFunctions.
- Parameters:
source
- The String to be decoded
- Returns:
- The decoded String
-
unescape
Decodes a String in a way similar to the JavaScript "decodeURIComponent" function.
This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent", provided "UTF-8" is used as encoding.
Directly exposed for JSP EL, not through CmsJspElFunctions.
- Parameters:
source
- The String to be decoded
encoding
- the encoding type
- Returns:
- The decoded String
-