Class CmsEncoder

java.lang.Object
org.opencms.i18n.CmsEncoder

public final class CmsEncoder extends Object
The OpenCms CmsEncoder class provides static methods to decode and encode data.

The methods in this class are substitutes for java.net.URLEncoder.encode() and java.net.URLDecoder.decode(). Use the methods from this class in all OpenCms core classes to ensure the encoding is always handled the same way.

Decoding and encoding use the same mechanism as JavaScript: special characters are replaced with %hex, where hex is a two-digit hexadecimal number.

Note: On the client side (browser), always use the encodeURIComponent and decodeURIComponent functions instead of the deprecated escape and unescape JavaScript functions. Only these work properly with Unicode characters.

Since:
6.0.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    Non-alphanumeric characters used for Base64 encoding.
    static final String
    Characters used as replacements for non-alphanumeric Base64 characters when using Base64 for request parameters.
    static final String
    Constant for the standard ISO-8859-1 encoding.
    static final String
    Constant for the standard US-ASCII encoding.
    static final String
    Constant for the standard UTF-8 encoding.
  • Method Summary

    Modifier and Type
    Method
    Description
    static String
    adjustHtmlEncoding(String input, String encoding)
    Adjusts the given String by making sure all characters that can be displayed in the given charset are contained as chars, whereas all other non-displayable characters are converted to HTML entities.
    static byte[]
    changeEncoding(byte[] input, String oldEncoding, String newEncoding)
    Changes the encoding of a byte array that represents a String.
    static String
    convertHostToPunycode(String uriString)
    Converts the host of a URI to Punycode.
    static String
    createString(byte[] bytes, String encoding)
    Creates a String out of a byte array with the specified encoding, falling back to the system default in case the encoding name is not valid.
    static String
    decode(String source)
    Decodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
    static String
    decode(String source, String encoding)
    This method is a substitute for URLDecoder.decode().
    static String
    decodeHtmlEntities(String input)
    Decodes HTML entity references like &#8364;.
    static String
    decodeHtmlEntities(String input, String encoding)
    Deprecated.
    static String
    decodeParameter(String input)
    Decodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied before.
    static List<String>
    Decodes a parameter which has been encoded from a string list using encodeStringsAsBase64Parameter.
    static String
    encode(String source)
    Encodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.
    static String
    encode(String source, String encoding)
    This method is a substitute for URLEncoder.encode().
    static String
    encodeHtmlEntities(String input, String encoding)
    Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with HTML entity references like &#8364;.
    static String
    encodeJavaEntities(String input, String encoding)
    Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with Java escaping like \u20ac.
    static String
    encodeParameter(String input)
    Encodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied later.
    static String
    Encode a list of strings as base64 data to be used in a request parameter.
    static String
    escape(String source)
    Encodes a String in a way similar to the JavaScript "encodeURIComponent" function, using "UTF-8" for character encoding.
    static String
    escape(String source, String encoding)
    Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
    static String
    escapeHtml(String source)
    Escapes special characters in an HTML String with their number-based entity representation, for example & becomes &#38;.
    static String
    escapeNonAscii(String source)
    Escapes non-ASCII characters in an HTML String with their number-based entity representation, for example € becomes &#8364;.
    static String
    escapeSql(String source)
    A simple method to help avoid SQL injection.
    static String
    escapeSqlLikePattern(String pattern, char escapeChar)
    Escapes the wildcard characters in a string which will be used as the pattern for a SQL LIKE clause.
    static String
    escapeWBlanks(String source, String encoding)
    Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.
    static String
    escapeXml(String source)
    Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
    static String
    escapeXml(String source, boolean doubleEscape)
    Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.
    static String
    lookupEncoding(String encoding, String fallback)
    Checks if a given encoding name is actually supported and, if so, resolves it to its canonical name; otherwise returns the given fallback value.
    static String
    redecodeUriComponent(String input)
    Re-decodes a String that has not been correctly decoded and thus has scrambled character bytes.
    static String
    unescape(String source)
    Decodes a String in a way similar to the JavaScript "decodeURIComponent" function, using "UTF-8" for character encoding.
    static String
    unescape(String source, String encoding)
    Decodes a String in a way similar to the JavaScript "decodeURIComponent" function.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Method Details

    • adjustHtmlEncoding

      public static String adjustHtmlEncoding(String input, String encoding)
      Adjusts the given String by making sure all characters that can be displayed in the given charset are contained as chars, whereas all other non-displayable characters are converted to HTML entities.

      Just calls decodeHtmlEntities(String) first and feeds the result to encodeHtmlEntities(String, String).

      Parameters:
      input - the input to adjust the HTML encoding for
      encoding - the charset to encode the result with
      Returns:
      the input with the decoded/encoded HTML entities
    • changeEncoding

      public static byte[] changeEncoding(byte[] input, String oldEncoding, String newEncoding)
      Changes the encoding of a byte array that represents a String.

      Parameters:
      input - the byte array to convert
      oldEncoding - the current encoding of the byte array
      newEncoding - the new encoding of the byte array
      Returns:
      the byte array encoded in the new encoding
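      The conversion can be pictured as a decode followed by a re-encode. The following is a minimal sketch using only the JDK, not the OpenCms implementation; the class name ChangeEncodingSketch is hypothetical, and letting an unknown charset name fail with an unchecked exception is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ChangeEncodingSketch {

    /** Decodes the bytes with oldEncoding, then re-encodes the text with newEncoding. */
    static byte[] changeEncoding(byte[] input, String oldEncoding, String newEncoding) {
        // Charset.forName throws an unchecked exception for unknown names (assumption:
        // the sketch simply lets that propagate).
        String text = new String(input, Charset.forName(oldEncoding));
        return text.getBytes(Charset.forName(newEncoding));
    }

    public static void main(String[] args) {
        // "gr\u00fcn" is 4 bytes in ISO-8859-1 and 5 bytes in UTF-8.
        byte[] latin1 = "gr\u00fcn".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = changeEncoding(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(latin1.length + " -> " + utf8.length);
    }
}
```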
    • convertHostToPunycode

      public static String convertHostToPunycode(String uriString)
      Converts the host of a URI to Punycode.

      This is needed when we want to do redirects to hosts with host names containing international characters like umlauts.

      Parameters:
      uriString - the URI
      Returns:
      the converted URI
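      To illustrate what such a conversion produces, here is a rough sketch built on java.net.IDN from the JDK. It is not the OpenCms implementation: the class and method names are hypothetical, the host is located with simple string scanning, and user-info parts or IPv6 literals are deliberately not handled.

```java
import java.net.IDN;

final class PunycodeHostSketch {

    /** Replaces the host part of "scheme://host[:port][/path...]" with its ASCII (Punycode) form. */
    static String hostToPunycode(String uriString) {
        int schemeEnd = uriString.indexOf("://");
        if (schemeEnd < 0) {
            return uriString; // no authority part found, leave the input untouched
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = hostStart;
        // The host ends at the port separator, the path, the query, or the fragment.
        while (hostEnd < uriString.length() && ":/?#".indexOf(uriString.charAt(hostEnd)) < 0) {
            hostEnd++;
        }
        if (hostEnd == hostStart) {
            return uriString; // empty host, nothing to convert
        }
        String host = uriString.substring(hostStart, hostEnd);
        return uriString.substring(0, hostStart) + IDN.toASCII(host) + uriString.substring(hostEnd);
    }

    public static void main(String[] args) {
        // \u00fc is the umlaut u, written as an escape to keep the source ASCII-only.
        System.out.println(hostToPunycode("http://m\u00fcnchen.de:8080/path?x=1"));
    }
}
```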
    • createString

      public static String createString(byte[] bytes, String encoding)
      Creates a String out of a byte array with the specified encoding, falling back to the system default in case the encoding name is not valid.

      Use this method as a replacement for new String(byte[], encoding) to avoid possible encoding problems.

      Parameters:
      bytes - the bytes to decode
      encoding - the encoding scheme to use for decoding the bytes
      Returns:
      the bytes decoded to a String
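      The fallback behaviour can be sketched with plain JDK calls. This is an illustration only: the class name is hypothetical, and the JVM default charset stands in for whatever "system default" means inside OpenCms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CreateStringSketch {

    /** Decodes bytes with the named charset, or with the default charset if the name is unusable. */
    static String createString(byte[] bytes, String encoding) {
        Charset charset;
        try {
            charset = Charset.forName(encoding);
        } catch (IllegalArgumentException e) {
            // Covers null, syntactically illegal, and unsupported charset names.
            // Assumption: the JVM default charset plays the role of the system default.
            charset = Charset.defaultCharset();
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "gr\u00fcn".getBytes(StandardCharsets.UTF_8);
        System.out.println(createString(utf8, "utf8").length()); // alias resolves to UTF-8
        System.out.println(createString("abc".getBytes(StandardCharsets.US_ASCII), "no-such-charset"));
    }
}
```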
    • decode

      public static String decode(String source)
      Decodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.

      Parameters:
      source - the String to decode
      Returns:
      the decoded source String
    • decode

      public static String decode(String source, String encoding)
      This method is a substitute for URLDecoder.decode(). Use this in all OpenCms core classes to ensure the encoding is always handled the same way.

      In case you don't know what encoding to use, set the value of the encoding parameter to null. This method will then default to UTF-8 encoding, which is probably the right one.

      Parameters:
      source - The string to decode
      encoding - The encoding to use (if null, UTF-8 is used)
      Returns:
      The decoded source String
    • decodeHtmlEntities

      public static String decodeHtmlEntities(String input)
      Decodes HTML entity references like &#8364;.
      Parameters:
      input - the input to decode the HTML entities in
      Returns:
      the input with the decoded HTML entities
    • decodeHtmlEntities

      @Deprecated public static String decodeHtmlEntities(String input, String encoding)
      Deprecated.
      Decodes HTML entity references like &#8364; that are contained in the String to a regular character, but only if that character is contained in the given encoding's charset.

      Parameters:
      input - the input to decode the HTML entities in
      encoding - the charset to decode the input for
      Returns:
      the input with the decoded HTML entities
    • decodeParameter

      public static String decodeParameter(String input)
      Decodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied before.

      Parameters:
      input - the encoded parameter string
      Returns:
      the decoded parameter string
    • decodeStringsFromBase64Parameter

      Decodes a parameter which has been encoded from a string list using encodeStringsAsBase64Parameter.

      Parameters:
      data - the data to decode
      Returns:
      the list of strings
    • encode

      public static String encode(String source)
      Encodes a String using UTF-8 encoding, which is the standard for HTTP data transmission with GET and POST requests.

      Parameters:
      source - the String to encode
      Returns:
      the encoded source String
    • encode

      public static String encode(String source, String encoding)
      This method is a substitute for URLEncoder.encode(). Use this in all OpenCms core classes to ensure the encoding is always handled the same way.

      In case you don't know what encoding to use, set the value of the encoding parameter to null. This method will then default to UTF-8 encoding, which is probably the right one.

      Parameters:
      source - the String to encode
      encoding - the encoding to use (if null, UTF-8 is used)
      Returns:
      the encoded source String
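      The null-means-UTF-8 behaviour described for encode and decode can be sketched as thin wrappers around the JDK classes they substitute. The class name is hypothetical, and treating an unsupported encoding name like null is an assumption of this sketch, not something this page states.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlCodecSketch {

    private static final String DEFAULT_ENCODING = "UTF-8";

    static String encode(String source, String encoding) {
        if (source == null) {
            return null;
        }
        try {
            return URLEncoder.encode(source, encoding != null ? encoding : DEFAULT_ENCODING);
        } catch (UnsupportedEncodingException e) {
            // Assumption: an unsupported name falls back to UTF-8, like null does.
            return encode(source, null);
        }
    }

    static String decode(String source, String encoding) {
        if (source == null) {
            return null;
        }
        try {
            return URLDecoder.decode(source, encoding != null ? encoding : DEFAULT_ENCODING);
        } catch (UnsupportedEncodingException e) {
            return decode(source, null);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("a b&\u00fc", null);
        System.out.println(encoded);
        System.out.println(decode(encoded, null).equals("a b&\u00fc"));
    }
}
```

      Note that java.net.URLEncoder writes a blank as "+", which is one reason the escape methods further down exist.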
    • encodeHtmlEntities

      public static String encodeHtmlEntities(String input, String encoding)
      Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with HTML entity references like &#8364;.

      This is required since a Java String is internally always stored as Unicode, meaning it can contain almost every character, but the HTML charset used might not support all such characters.

      Parameters:
      input - the input to encode for HTML
      encoding - the charset to encode the result with
      Returns:
      the input with the encoded HTML entities
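      One way to picture this is to ask the target charset, character by character, whether it can represent the text. The sketch below does that with java.nio.charset.CharsetEncoder; the class name is hypothetical and this is not claimed to be how OpenCms implements the method.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class HtmlEntitySketch {

    /** Replaces every code point the charset cannot encode with a numeric entity such as &#8364;. */
    static String encodeHtmlEntities(String input, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder(input.length() * 2);
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            int len = Character.charCount(codePoint); // 2 for surrogate pairs
            CharSequence unit = input.subSequence(i, i + len);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u20ac is the euro sign, \u00fc is the umlaut u.
        System.out.println(encodeHtmlEntities("5 \u20ac f\u00fcr", "US-ASCII"));
    }
}
```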
    • encodeJavaEntities

      public static String encodeJavaEntities(String input, String encoding)
      Encodes all characters contained in the String that cannot be displayed in the given encoding's charset with Java escaping like \u20ac.

      This can be used to escape values used in Java property files.

      Parameters:
      input - the input to encode for Java
      encoding - the charset to encode the result with
      Returns:
      the input with the encoded Java entities
    • encodeParameter

      public static String encodeParameter(String input)
      Encodes a string used as a parameter in a URI in a way independent of other encodings/decodings applied later.

      Used to ensure that GET parameters are not wrecked by wrong or incompatible configuration settings. To ensure this, the String is first encoded with HTML entities for any character that cannot be encoded in US-ASCII; additionally, the plus sign is encoded to avoid problems with the white-space replacer. Finally, the entity prefix is replaced with characters not used as delimiters in URLs.

      Parameters:
      input - the parameter string
      Returns:
      the encoded parameter string
    • encodeStringsAsBase64Parameter

      Encode a list of strings as base64 data to be used in a request parameter.

      Parameters:
      strings - the strings to encode
      Returns:
      the resulting base64 data
    • escape

      public static String escape(String source)
      Encodes a String in a way similar to the JavaScript "encodeURIComponent" function, using "UTF-8" for character encoding.

      JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method.

      Directly exposed for JSP EL, not through CmsJspElFunctions.

      Parameters:
      source - The text to be encoded
      Returns:
      The encoded string
    • escape

      public static String escape(String source, String encoding)
      Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.

      JavaScript "decodeURIComponent" can decode Strings that have been encoded using this method, provided "UTF-8" has been used as encoding.

      Directly exposed for JSP EL, not through CmsJspElFunctions.

      Parameters:
      source - The text to be encoded
      encoding - the encoding type
      Returns:
      The encoded string
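      The main visible difference from java.net.URLEncoder is how blanks are written: "%20" rather than "+". A rough JDK-only approximation follows; the class name is hypothetical, and the sketch maps every blank to its own "%20", which this page only documents for escapeWBlanks. It also still differs from JavaScript for the characters ! ~ ' ( ), which URLEncoder percent-encodes and encodeURIComponent leaves alone.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EscapeSketch {

    /** URL-encodes the source and writes blanks as %20 instead of '+'. */
    static String escape(String source, String encoding) {
        try {
            // A literal '+' in the source has already become %2B at this point,
            // so every remaining '+' stands for a blank.
            return URLEncoder.encode(source, encoding).replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("a b+c/\u00fc", "UTF-8"));
    }
}
```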
    • escapeHtml

      public static String escapeHtml(String source)
      Escapes special characters in an HTML String with their number-based entity representation, for example & becomes &#38;.

      A character ch is replaced if
      ((ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62)))

      Parameters:
      source - the String to escape
      Returns:
      the escaped String
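      The replacement condition above translates directly into a loop. This sketch applies only that documented condition; the class name is hypothetical, and any special handling the real method may have (for example for text that is already escaped) is not modelled.

```java
final class EscapeHtmlSketch {

    /** Replaces each character matching the documented condition with &#<code>;. */
    static String escapeHtml(String source) {
        StringBuilder out = new StringBuilder(source.length() * 2);
        for (int i = 0; i < source.length(); i++) {
            int ch = source.charAt(i);
            // Blanks (32) are kept; everything above 'z' (122), below '0' (48),
            // and the characters '<' (60) and '>' (62) are replaced.
            if ((ch != 32) && ((ch > 122) || (ch < 48) || (ch == 60) || (ch == 62))) {
                out.append("&#").append(ch).append(';');
            } else {
                out.append((char) ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("a<b & c")); // prints a&#60;b &#38; c
    }
}
```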
    • escapeNonAscii

      public static String escapeNonAscii(String source)
      Escapes non-ASCII characters in an HTML String with their number-based entity representation, for example € becomes &#8364;.

      A character ch is replaced if
      (ch > 255)

      Parameters:
      source - the String to escape
      Returns:
      the escaped String
    • escapeSql

      public static String escapeSql(String source)
      A simple method to help avoid SQL injection.

      Replaces every single quote in the value parameter of the SQL statement with two single quotes.

      Parameters:
      source - the String to escape SQL from
      Returns:
      the escaped value of the parameter source
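      The quote doubling described above amounts to a single string replacement. A minimal sketch, with a hypothetical class name:

```java
final class EscapeSqlSketch {

    /** Doubles every single quote so the value cannot terminate a quoted SQL literal early. */
    static String escapeSql(String source) {
        return source.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escapeSql("O'Reilly")); // prints O''Reilly
    }
}
```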
    • escapeSqlLikePattern

      public static String escapeSqlLikePattern(String pattern, char escapeChar)
      Escapes the wildcard characters in a string which will be used as the pattern for a SQL LIKE clause.

      Parameters:
      pattern - the pattern
      escapeChar - the character which should be used as the escape character
      Returns:
      the escaped pattern
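      In a LIKE pattern the wildcards are % and _. The sketch below prefixes both with the escape character; that it also escapes the escape character itself is an assumption of this illustration, and the class name is hypothetical.

```java
final class EscapeLikeSketch {

    /** Prefixes %, _ and the escape character itself with escapeChar. */
    static String escapeSqlLikePattern(String pattern, char escapeChar) {
        StringBuilder out = new StringBuilder(pattern.length() * 2);
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '%' || c == '_' || c == escapeChar) {
                out.append(escapeChar);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSqlLikePattern("100%_done!", '!')); // prints 100!%!_done!!
    }
}
```

      The escaped value would then be used with a matching ESCAPE clause in the statement, for example LIKE ? ESCAPE '!'.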
    • escapeWBlanks

      public static String escapeWBlanks(String source, String encoding)
      Encodes a String in a way similar to the JavaScript "encodeURIComponent" function.

      Each blank in a run of multiple blanks is encoded as its own %20.

      Parameters:
      source - The text to be encoded
      encoding - the encoding type
      Returns:
      The encoded String
    • escapeXml

      public static String escapeXml(String source)
      Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.

      This method replaces the following characters in a String:

      • < with &lt;
      • > with &gt;
      • & with &amp;
      • " with &quot;

      Parameters:
      source - the string to escape
      Returns:
      the escaped string
    • escapeXml

      public static String escapeXml(String source, boolean doubleEscape)
      Escapes a String so it may be printed as text content or attribute value in a HTML page or an XML file.

      This method replaces the following characters in a String:

      • < with &lt;
      • > with &gt;
      • & with &amp;
      • " with &quot;

      Parameters:
      source - the string to escape
      doubleEscape - if false, all entities that already are escaped are left untouched
      Returns:
      the escaped string
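      The effect of the doubleEscape flag is easiest to see on an input that already contains an entity. The following is a sketch only: the class name is hypothetical, and the regular expression used to recognise an existing entity is an assumption, not the rule OpenCms applies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeXmlSketch {

    // Assumption: an "already escaped" entity is &name;, &#123; or &#x1F;.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String escapeXml(String source, boolean doubleEscape) {
        StringBuilder out = new StringBuilder(source.length() * 2);
        Matcher entity = ENTITY.matcher(source);
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '&':
                    // With doubleEscape == false, an '&' that starts an entity is kept as is.
                    if (!doubleEscape && entity.region(i, source.length()).lookingAt()) {
                        out.append('&');
                    } else {
                        out.append("&amp;");
                    }
                    break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("R&amp;D & <b>", false)); // prints R&amp;D &amp; &lt;b&gt;
        System.out.println(escapeXml("R&amp;D & <b>", true));  // prints R&amp;amp;D &amp; &lt;b&gt;
    }
}
```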
    • lookupEncoding

      public static String lookupEncoding(String encoding, String fallback)
      Checks if a given encoding name is actually supported and, if so, resolves it to its canonical name; otherwise returns the given fallback value.

      Charsets have a set of aliases. For example, valid aliases for "UTF-8" are "UTF8", "utf-8" or "utf8". This method resolves any given valid charset name to its "canonical" form, so that simple String comparison can be used when checking charset names internally later.

      Please see http://www.iana.org/assignments/character-sets for a list of valid charset alias names.

      Parameters:
      encoding - the encoding to check and resolve
      fallback - the fallback encoding scheme
      Returns:
      the resolved encoding name, or the fallback value
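      Alias resolution of this kind is available in the JDK through java.nio.charset.Charset. A minimal sketch of the lookup-with-fallback idea, with a hypothetical class name and no claim that OpenCms implements it this way:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class LookupEncodingSketch {

    /** Returns the canonical charset name for a supported name or alias, else the fallback. */
    static String lookupEncoding(String encoding, String fallback) {
        try {
            if (encoding != null && Charset.isSupported(encoding)) {
                return Charset.forName(encoding).name();
            }
        } catch (IllegalCharsetNameException e) {
            // Syntactically illegal name (for example one containing blanks): use the fallback.
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(lookupEncoding("utf8", "ISO-8859-1"));            // prints UTF-8
        System.out.println(lookupEncoding("no-such-charset", "ISO-8859-1")); // prints ISO-8859-1
    }
}
```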
    • redecodeUriComponent

      public static String redecodeUriComponent(String input)
      Re-decodes a String that has not been correctly decoded and thus has scrambled character bytes.

      This is an equivalent to the JavaScript "decodeURIComponent" function. It converts from the default "UTF-8" to the currently selected system encoding.

      Parameters:
      input - the String to convert
      Returns:
      the converted String
    • unescape

      public static String unescape(String source)
      Decodes a String in a way similar to the JavaScript "decodeURIComponent" function, using "UTF-8" for character encoding.

      This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent".

      Directly exposed for JSP EL, not through CmsJspElFunctions.

      Parameters:
      source - The String to be decoded
      Returns:
      The decoded String
    • unescape

      public static String unescape(String source, String encoding)
      Decodes a String in a way similar to the JavaScript "decodeURIComponent" function.

      This method can decode Strings that have been encoded in JavaScript with "encodeURIComponent", provided "UTF-8" is used as encoding.

      Directly exposed for JSP EL, not through CmsJspElFunctions.

      Parameters:
      source - The String to be decoded
      encoding - the encoding type
      Returns:
      The decoded String