Class CmsHtmlParser

java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
All Implemented Interfaces:
I_CmsHtmlNodeVisitor
Direct Known Subclasses:
CmsHtml2TextConverter, CmsHtmlDecorator, CmsLinkProcessor

public class CmsHtmlParser extends org.htmlparser.visitors.NodeVisitor implements I_CmsHtmlNodeVisitor
Base utility class for OpenCms NodeVisitor implementations that provides some frequently used utility functions.

This base implementation is only a "pass through" class: the content is parsed, but the generated result is identical to the input.
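To make the "pass through" idea concrete, here is a small self-contained Java sketch. It does not use the OpenCms or htmlparser classes: the class PassThroughSketch, its toy tokenizer and the callbacks visitTag, visitRemark and visitText are hypothetical stand-ins for the real visitor callbacks. Each callback writes its node back unchanged, so the result equals the input.

```java
// Hypothetical sketch, NOT the OpenCms or htmlparser API: a toy tokenizer plus
// visitor callbacks that echo every node back unchanged ("pass through").
class PassThroughSketch {

    private final StringBuilder result = new StringBuilder();

    // Callback for a starting or closing tag; tagText is the text between '<' and '>'.
    void visitTag(String tagText) {
        result.append('<').append(tagText).append('>');
    }

    // Callback for a remark (HTML comment); body is the text between "<!--" and "-->".
    void visitRemark(String body) {
        result.append("<!--").append(body).append("-->");
    }

    // Callback for a text node.
    void visitText(String text) {
        result.append(text);
    }

    // Toy tokenizer: splits the input into comments, tags and text and dispatches
    // each piece to the matching callback. An unterminated '<' or "<!--" is treated as text.
    String process(String html) {
        result.setLength(0);
        int i = 0;
        while (i < html.length()) {
            if (html.startsWith("<!--", i)) {
                int end = html.indexOf("-->", i + 4);
                if (end < 0) {
                    visitText(html.substring(i));
                    break;
                }
                visitRemark(html.substring(i + 4, end));
                i = end + 3;
            } else if (html.charAt(i) == '<') {
                int end = html.indexOf('>', i + 1);
                if (end < 0) {
                    visitText(html.substring(i));
                    break;
                }
                visitTag(html.substring(i + 1, end));
                i = end + 1;
            } else {
                int next = html.indexOf('<', i);
                if (next < 0) {
                    next = html.length();
                }
                visitText(html.substring(i, next));
                i = next;
            }
        }
        return result.toString();
    }
}
```

Because every callback reproduces exactly the characters it consumed, process returns its input unchanged, including malformed fragments such as "a < b". A subclass in this style would override only the callbacks for the nodes it wants to change.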

Since:
6.2.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected boolean
    Indicates if "echo" mode is on, that is all content is written to the result by default.
    protected List<String>
    List of upper case tag names of tags that should not be auto-corrected if their closing tags are missing.
    protected StringBuffer
    The buffer to write the output to.
    protected static final String[]
    The array of supported tag names.
    protected static final List<String>
    The list of supported tag names.
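The tag-name fields above follow a common Java pattern: a fixed array exposed as a List view, plus a list of upper case names used for lookups. The sketch below illustrates that pattern under stated assumptions; the class TagListSketch, the tag names in its array and the methods mayAutoClose and isSupported are hypothetical and are not the OpenCms values or API. Unlike the documented contract, which expects callers to pass upper case names, this sketch normalizes the names itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the tag-name field conventions; the tag names and helper
// methods are illustrative assumptions, not the OpenCms values or API.
class TagListSketch {

    // A fixed array of supported tag names, exposed as an unmodifiable List view.
    static final String[] TAG_ARRAY = {"A", "DIV", "P", "SPAN"};
    static final List<String> TAG_LIST = Collections.unmodifiableList(Arrays.asList(TAG_ARRAY));

    // Upper case names of tags whose missing closing tags should not be corrected.
    private List<String> noAutoCloseTags = new ArrayList<>();

    // Stores the given names in upper case, because all lookups compare upper case names.
    void setNoAutoCloseTags(List<String> tagNames) {
        List<String> upper = new ArrayList<>(tagNames.size());
        for (String name : tagNames) {
            upper.add(name.toUpperCase(Locale.ROOT));
        }
        noAutoCloseTags = upper;
    }

    // True if a missing closing tag for tagName may be corrected automatically.
    boolean mayAutoClose(String tagName) {
        return !noAutoCloseTags.contains(tagName.toUpperCase(Locale.ROOT));
    }

    // True if tagName is one of the supported tags, ignoring case.
    static boolean isSupported(String tagName) {
        return TAG_LIST.contains(tagName.toUpperCase(Locale.ROOT));
    }
}
```

Using Locale.ROOT for the case conversion keeps tag-name comparisons independent of the default locale (for example, the Turkish dotless i).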
  • Constructor Summary

    Constructors
    Constructor
    Description
    CmsHtmlParser()
    Creates a new instance of the HTML converter with echo mode set to false.
    CmsHtmlParser(boolean echo)
    Creates a new instance of the HTML converter with the given echo mode.
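The two constructors differ only in the initial echo mode. A minimal sketch of that pattern, assuming a hypothetical class EchoParserSketch whose default content handling copies content to the result only when echo mode is on (the names visitContent and getResult here are illustrative, not the OpenCms API):

```java
// Hypothetical sketch of the constructor pattern: the no-argument constructor
// delegates to the boolean one with echo = false.
class EchoParserSketch {

    // When true, content is written to the result by default.
    protected final boolean echo;

    // The buffer that collects the output.
    protected final StringBuilder result = new StringBuilder();

    EchoParserSketch() {
        this(false);
    }

    EchoParserSketch(boolean echo) {
        this.echo = echo;
    }

    // Default handling for any piece of content: copied only in echo mode.
    void visitContent(String content) {
        if (echo) {
            result.append(content);
        }
    }

    String getResult() {
        return result.toString();
    }
}
```

Delegating with this(false) keeps the field initialization in one place, so the two constructors cannot drift apart.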
  • Method Summary

    Modifier and Type
    Method
    Description
    protected String
    collapse(String string)
    Collapse HTML whitespace in the given String.
    protected org.htmlparser.PrototypicalNodeFactory
    Internally degrades Composite tags that do have children in the DOM tree to simple single tags.
    Returns the configuration String of this visitor, or the empty String if it was not provided before.
    Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
    Returns the text extraction result.
    getTagHtml(org.htmlparser.Tag tag)
    Returns the HTML for the given tag itself (not the tag content).
    process(String html, String encoding)
    Extracts the text from the given html content, assuming the given html encoding.
    void
    setConfiguration(String configuration)
    Sets a configuration String for this visitor.
    void
    setNoAutoCloseTags(List<String> noAutoCloseTagList)
    Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
    void
    visitEndTag(org.htmlparser.Tag tag)
    Visitor method (callback) invoked when a closing Tag is encountered.
    void
    visitRemarkNode(org.htmlparser.Remark remark)
    Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
    void
    visitStringNode(org.htmlparser.Text text)
    Visitor method (callback) invoked when a text (string) node is encountered.
    void
    visitTag(org.htmlparser.Tag tag)
    Visitor method (callback) invoked when a starting Tag is encountered.
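The collapse(String) method is described only as collapsing HTML whitespace. One plausible reading, sketched here under the assumption that "HTML whitespace" means the characters listed in section 9.1 of the HTML 4.01 specification (space, tab, form feed, zero-width space, carriage return and line feed), is that every run of such characters becomes a single space and leading and trailing whitespace is dropped. The class CollapseSketch is hypothetical and is not the OpenCms implementation.

```java
// Hypothetical sketch of HTML whitespace collapsing, assuming the whitespace
// characters of HTML 4.01 section 9.1. Not the OpenCms implementation.
final class CollapseSketch {

    private CollapseSketch() {
    }

    // True for space, tab, form feed, zero-width space, CR and LF.
    static boolean isHtmlWhitespace(char c) {
        switch (c) {
            case ' ':
            case '\t':
            case '\f':
            case '\u200B':
            case '\r':
            case '\n':
                return true;
            default:
                return false;
        }
    }

    // Replaces each inner run of whitespace with one space and drops
    // leading and trailing whitespace.
    static String collapse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHtmlWhitespace(c)) {
                // Record a gap only after some visible character has been written.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Deferring the space until the next visible character is what drops trailing whitespace without a separate trim pass.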

    Methods inherited from class org.htmlparser.visitors.NodeVisitor

    beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait