Class CmsHtmlExtractor


  • public final class CmsHtmlExtractor
    extends java.lang.Object
    Extracts plain text from HTML.

    Since:
    6.0.0
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String extractText(java.io.InputStream in, java.lang.String encoding)
      Extracts the text from an HTML page supplied as an input stream.
      static java.lang.String extractText(java.lang.String content, java.lang.String encoding)
      Extracts the text from an HTML page supplied as a string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • extractText

        public static java.lang.String extractText(java.io.InputStream in,
                                                   java.lang.String encoding)
                                            throws org.htmlparser.util.ParserException,
                                                   java.io.UnsupportedEncodingException
        Extracts the text from an HTML page.

        Parameters:
        in - the input stream of the HTML content
        encoding - the encoding of the content
        Returns:
        the extracted text from the page
        Throws:
        org.htmlparser.util.ParserException - if parsing the HTML fails
        java.io.UnsupportedEncodingException - if the given encoding is not supported
      • extractText

        public static java.lang.String extractText(java.lang.String content,
                                                   java.lang.String encoding)
                                            throws org.htmlparser.util.ParserException,
                                                   java.io.UnsupportedEncodingException
        Extracts the text from an HTML page.

        Parameters:
        content - the HTML content
        encoding - the encoding of the content
        Returns:
        the extracted text from the page
        Throws:
        org.htmlparser.util.ParserException - if parsing the HTML fails
        java.io.UnsupportedEncodingException - if the given encoding is not supported
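
Both overloads take the encoding as a plain name and report an unknown name with java.io.UnsupportedEncodingException. The following is a minimal sketch, using only the JDK, of how a caller might check the encoding name up front and build an input stream suitable for the InputStream overload. The class HtmlInputSketch and its helpers isUsableEncoding and toStream are hypothetical names introduced for illustration; they are not part of CmsHtmlExtractor. The call to extractText is left as a comment because it needs the OpenCms and HTML Parser libraries on the classpath, and this page does not state the package to import.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper class, for illustration only.
final class HtmlInputSketch {

    /** Returns true if this JVM supports the named encoding. */
    static boolean isUsableEncoding(String encoding) {
        if (encoding == null) {
            return false;
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    /** Encodes the HTML with the named encoding and wraps the bytes in a stream. */
    static InputStream toStream(String html, String encoding)
            throws UnsupportedEncodingException {
        return new ByteArrayInputStream(html.getBytes(encoding));
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Hello</p></body></html>";
        String encoding = "UTF-8";

        if (!isUsableEncoding(encoding)) {
            throw new UnsupportedEncodingException(encoding);
        }
        try (InputStream in = toStream(html, encoding)) {
            // With the libraries available, the extraction call would go here:
            // String text = CmsHtmlExtractor.extractText(in, encoding);
            System.out.println(in.available() + " bytes ready for extraction");
        }
    }
}
```

Passing the same encoding name to both the byte conversion and extractText keeps the bytes and the declared encoding consistent. When the HTML is already held in a String, the String overload avoids the manual conversion.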