Class CmsHtml2TextConverter

java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
org.opencms.util.CmsHtml2TextConverter
All Implemented Interfaces:
I_CmsHtmlNodeVisitor

Extracts the plain-text content of an HTML page.

  • Constructor Details

  • Method Details

    • html2text

      public static String html2text(String html, String encoding) throws Exception
      Extracts the plain text from the given HTML content, assuming the given HTML encoding.

      Parameters:
      html - the content to extract the plain text from
      encoding - the encoding to use
      Returns:
      the text extracted from the given html content
      Throws:
      Exception - if parsing the HTML or extracting the text fails
    • visitEndTag

      public void visitEndTag(org.htmlparser.Tag tag)
      Description copied from interface: I_CmsHtmlNodeVisitor
      Visitor method (callback) invoked when a closing Tag is encountered.

      Specified by:
      visitEndTag in interface I_CmsHtmlNodeVisitor
      Overrides:
      visitEndTag in class CmsHtmlParser
      Parameters:
      tag - the tag that is ended.
      See Also:
      • NodeVisitor.visitEndTag(org.htmlparser.Tag)
    • visitStringNode

      public void visitStringNode(org.htmlparser.Text text)
      Description copied from interface: I_CmsHtmlNodeVisitor
      Visitor method (callback) invoked when a text node (plain text between tags) is encountered.

      Specified by:
      visitStringNode in interface I_CmsHtmlNodeVisitor
      Overrides:
      visitStringNode in class CmsHtmlParser
      Parameters:
      text - the text that is visited.
      See Also:
      • NodeVisitor.visitStringNode(org.htmlparser.Text)
    • visitTag

      public void visitTag(org.htmlparser.Tag tag)
      Description copied from interface: I_CmsHtmlNodeVisitor
      Visitor method (callback) invoked when a starting Tag is encountered.

      Specified by:
      visitTag in interface I_CmsHtmlNodeVisitor
      Overrides:
      visitTag in class CmsHtmlParser
      Parameters:
      tag - the tag that is visited.
      See Also:
      • NodeVisitor.visitTag(org.htmlparser.Tag)
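
The three callbacks above (visitTag, visitStringNode, visitEndTag) follow the usual visitor pattern: a parser walks the markup and reports each opening tag, text node, and closing tag in document order. The sketch below illustrates that flow with no dependency on OpenCms or org.htmlparser. The names VisitorSketch, Callbacks, TextCollector, walk and toText are hypothetical and exist only for this illustration; the tokenizer is deliberately naive, and the line-break rule for block tags is an assumption, not the formatting that CmsHtml2TextConverter actually applies.

```java
import java.util.Set;

public class VisitorSketch {

    /** Hypothetical stand-in for the three callbacks documented above. */
    interface Callbacks {
        void visitTag(String name);
        void visitStringNode(String text);
        void visitEndTag(String name);
    }

    /** Collects text nodes and breaks the line when a block-level tag ends. */
    static class TextCollector implements Callbacks {
        // Assumed set of block-level tags; chosen only for this sketch.
        private static final Set<String> BLOCK_TAGS = Set.of("p", "div", "h1", "li");
        private final StringBuilder out = new StringBuilder();

        public void visitTag(String name) {
            // Opening tags contribute no text in this sketch.
        }

        public void visitStringNode(String text) {
            out.append(text);
        }

        public void visitEndTag(String name) {
            if (BLOCK_TAGS.contains(name)) {
                out.append('\n');
            }
        }

        String result() {
            return out.toString().trim();
        }
    }

    /** Naive tokenizer: handles only simple, well-formed tags. */
    static void walk(String html, Callbacks cb) {
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                // No more tags: the rest is a single text node.
                cb.visitStringNode(html.substring(pos));
                return;
            }
            if (open > pos) {
                cb.visitStringNode(html.substring(pos, open));
            }
            int close = html.indexOf('>', open);
            if (close < 0) {
                return; // unterminated tag: stop walking
            }
            String body = html.substring(open + 1, close).trim();
            if (body.startsWith("/")) {
                cb.visitEndTag(body.substring(1).trim().toLowerCase());
            } else {
                // The tag name is everything up to the first whitespace.
                cb.visitTag(body.split("\\s+")[0].toLowerCase());
            }
            pos = close + 1;
        }
    }

    /** Convenience wrapper: walk the markup and return the collected text. */
    static String toText(String html) {
        TextCollector collector = new TextCollector();
        walk(html, collector);
        return collector.result();
    }

    public static void main(String[] args) {
        System.out.println(toText("<h1>Title</h1><p>Hello <b>world</b>.</p>"));
    }
}
```

Running main prints "Title" on one line and "Hello world." on the next. With OpenCms on the classpath, the documented static method html2text(String html, String encoding) is the entry point to use instead of a hand-written walker like this one.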