Interface I_CmsHtmlNodeVisitor

All Known Implementing Classes:
CmsHtml2TextConverter, CmsHtmlDecorator, CmsHtmlParser, CmsLinkProcessor

public interface I_CmsHtmlNodeVisitor
Interface that combines a visitor of HTML documents with the hook that starts the parser / lexer which triggers the visit.

Since:
6.1.3
  • Method Summary

    Modifier and Type
    Method
    Description
    String
    getConfiguration()
    Returns the configuration String of this visitor, or the empty String if none was provided before.
    String
    getResult()
    Returns the text extraction result.
    String
    process(String html, String encoding)
    Extracts the text from the given html content, assuming the given html encoding.
    void
    setConfiguration(String configuration)
    Sets a configuration String for this visitor.
    void
    setNoAutoCloseTags(List<String> noAutoCloseTags)
    Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
    void
    visitEndTag(org.htmlparser.Tag tag)
    Visitor method (callback) invoked when a closing Tag is encountered.
    void
    visitRemarkNode(org.htmlparser.Remark remark)
    Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
    void
    visitStringNode(org.htmlparser.Text text)
    Visitor method (callback) invoked when a text node is encountered.
    void
    visitTag(org.htmlparser.Tag tag)
    Visitor method (callback) invoked when a starting Tag is encountered.
  • Method Details

    • getConfiguration

      String getConfiguration()
      Returns the configuration String of this visitor, or the empty String if none was provided before.

      Returns:
      the configuration String of this visitor; by contract never null, but an empty String if none was provided.
    • getResult

      String getResult()
      Returns the text extraction result.

      Returns:
      the text extraction result
    • process

      String process(String html, String encoding) throws org.htmlparser.util.ParserException
      Extracts the text from the given html content, assuming the given html encoding.

      Parameters:
      html - the content to extract the plain text from
      encoding - the encoding to use
      Returns:
      the text extracted from the given html content
      Throws:
      org.htmlparser.util.ParserException - if parsing the html content fails
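
Because process(String, String) declares a checked exception, every caller has to decide what happens when parsing fails. The following is a minimal sketch of one possible calling pattern, not OpenCms code. `HtmlProcessorSketch` and `ProcessCallSketch` are hypothetical stand-ins, and a plain `Exception` replaces `org.htmlparser.util.ParserException` so that the example compiles without the HTML parser library. Returning a fallback value is an assumption of this sketch.

```java
// Hypothetical stand-in for the process(String, String) part of the interface.
// The real method throws org.htmlparser.util.ParserException; a plain Exception
// is declared here so the sketch has no external dependencies.
interface HtmlProcessorSketch {
    String process(String html, String encoding) throws Exception;
}

// Hypothetical helper class, not part of OpenCms.
class ProcessCallSketch {

    /** Runs the processor and returns the fallback value if processing fails. */
    static String processOrDefault(
            HtmlProcessorSketch processor, String html, String encoding, String fallback) {
        try {
            return processor.process(html, encoding);
        } catch (Exception e) {
            // Falling back is a policy chosen for this sketch; logging or
            // rethrowing may fit better, depending on the caller.
            return fallback;
        }
    }
}
```

With a real implementation such as CmsHtml2TextConverter, the same try / catch shape would apply, catching ParserException instead of Exception.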
    • setConfiguration

      void setConfiguration(String configuration)
      Sets a configuration String for this visitor.

      The configuration will most likely come from an XSD, a custom JSP tag, or a similar source.

      Parameters:
      configuration - the configuration of this visitor to set.
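
The setter and getter together promise that getConfiguration() never returns null. A minimal sketch of how an implementation might honor that contract is shown here. `ConfigurableVisitorSketch` is a hypothetical stand-in and not an OpenCms class, and mapping a null argument to the empty String is an assumption of this sketch; the documentation above only states the behaviour when no configuration was provided.

```java
// Hypothetical stand-in, not an OpenCms class: it only mirrors the
// getConfiguration() / setConfiguration(String) pair described above.
class ConfigurableVisitorSketch {

    // Initialised to the empty String so the getter never returns null.
    private String configuration = "";

    /** Stores the configuration; treating null as "" is an assumption of this sketch. */
    void setConfiguration(String configuration) {
        this.configuration = (configuration == null) ? "" : configuration;
    }

    /** Returns the stored configuration, or "" if none was provided. */
    String getConfiguration() {
        return configuration;
    }
}
```

Callers can then use the returned value directly, for example with isEmpty(), without a null check.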
    • setNoAutoCloseTags

      void setNoAutoCloseTags(List<String> noAutoCloseTags)
      Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.

      This method must be called before process(String, String) is invoked in order to take effect.

      Parameters:
      noAutoCloseTags - the list of upper case tag names for which parsing / visiting should not correct missing closing tags.
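
Since the tag names are expected in upper case, it can help to normalize them before handing the list to the visitor. The helper below is a hedged sketch. `NoAutoCloseTagsSketch` and `toUpperCaseTagNames` are hypothetical names and not part of OpenCms, and whether an implementation also accepts lower case names is not stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of OpenCms.
class NoAutoCloseTagsSketch {

    /** Returns a new list with every tag name trimmed and converted to upper case. */
    static List<String> toUpperCaseTagNames(List<String> tagNames) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            // Locale.ROOT avoids locale-specific case mappings,
            // such as the Turkish dotless i.
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }
}
```

A caller would pass the normalized list to setNoAutoCloseTags(List<String>) first and only then invoke process(String, String), matching the ordering requirement stated above.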
    • visitEndTag

      void visitEndTag(org.htmlparser.Tag tag)
      Visitor method (callback) invoked when a closing Tag is encountered.

      Parameters:
      tag - the tag that is ended.
      See Also:
      • NodeVisitor.visitEndTag(org.htmlparser.Tag)
    • visitRemarkNode

      void visitRemarkNode(org.htmlparser.Remark remark)
      Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.

      Parameters:
      remark - the remark Tag to visit.
      See Also:
      • NodeVisitor.visitRemarkNode(org.htmlparser.Remark)
    • visitStringNode

      void visitStringNode(org.htmlparser.Text text)
      Visitor method (callback) invoked when a text node is encountered.

      Parameters:
      text - the text that is visited.
      See Also:
      • NodeVisitor.visitStringNode(org.htmlparser.Text)
    • visitTag

      void visitTag(org.htmlparser.Tag tag)
      Visitor method (callback) invoked when a starting Tag is encountered.

      Parameters:
      tag - the tag that is visited.
      See Also:
      • NodeVisitor.visitTag(org.htmlparser.Tag)
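
The four callbacks above can be pictured with a deliberately simplified driver. This is a sketch under loud assumptions: the types `NodeCallbacksSketch`, `RecordingVisitorSketch` and `NaiveWalkerSketch` are hypothetical, the callbacks take plain Strings instead of org.htmlparser.Tag, Remark and Text nodes, and the naive scanner stands in for the real parser / lexer, which handles attributes, scripts and malformed markup that this code ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified callbacks: the real interface receives
// org.htmlparser.Tag, Remark and Text nodes instead of Strings.
interface NodeCallbacksSketch {
    void visitTag(String tagName);
    void visitEndTag(String tagName);
    void visitStringNode(String text);
    void visitRemarkNode(String comment);
}

// Records every callback so the order of invocation can be inspected.
class RecordingVisitorSketch implements NodeCallbacksSketch {
    final List<String> events = new ArrayList<>();

    @Override public void visitTag(String tagName) { events.add("tag:" + tagName); }
    @Override public void visitEndTag(String tagName) { events.add("end:" + tagName); }
    @Override public void visitStringNode(String text) { events.add("text:" + text); }
    @Override public void visitRemarkNode(String comment) { events.add("remark:" + comment); }
}

// Naive scanner standing in for the real parser / lexer.
class NaiveWalkerSketch {

    static void walk(String html, NodeCallbacksSketch visitor) {
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                // No more markup: the remainder is plain text.
                visitor.visitStringNode(html.substring(pos));
                break;
            }
            if (open > pos) {
                // Text between the previous position and the next '<'.
                visitor.visitStringNode(html.substring(pos, open));
            }
            if (html.startsWith("<!--", open)) {
                // HTML comment: report its content as a remark node.
                int end = html.indexOf("-->", open + 4);
                if (end < 0) {
                    end = html.length();
                }
                visitor.visitRemarkNode(html.substring(open + 4, end));
                pos = Math.min(end + 3, html.length());
            } else {
                int close = html.indexOf('>', open);
                if (close < 0) {
                    // Unterminated tag: treat the rest as text.
                    visitor.visitStringNode(html.substring(open));
                    break;
                }
                String inner = html.substring(open + 1, close).trim();
                if (inner.startsWith("/")) {
                    visitor.visitEndTag(inner.substring(1).trim());
                } else {
                    visitor.visitTag(inner);
                }
                pos = close + 1;
            }
        }
    }
}
```

In this sketch the visitor stays passive and the walker decides when each callback fires, which mirrors the split between the visit methods and the process(String, String) hook described for this interface.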