Package org.opencms.util
Class CmsHtmlParser
java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
- All Implemented Interfaces:
I_CmsHtmlNodeVisitor
- Direct Known Subclasses:
CmsHtml2TextConverter, CmsHtmlDecorator, CmsLinkProcessor
public class CmsHtmlParser
extends org.htmlparser.visitors.NodeVisitor
implements I_CmsHtmlNodeVisitor
Base utility class for OpenCms NodeVisitor implementations, which provides some frequently used utility functions.
This base implementation is only a "pass-through" class: the content is parsed, but the generated result is identical to the input.
- Since:
- 6.2.0
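The "echo" idea behind this class can be pictured with a small, library-free model. This is only a sketch: `EchoVisitorSketch`, `UpperCaseTextSketch`, `EchoDemo` and their String-based callbacks are hypothetical stand-ins that neither use the real org.htmlparser types nor reproduce the OpenCms implementation. It assumes that, with echo on, each callback writes what it sees to the result buffer unchanged, and that a subclass overrides only the callback whose output it wants to change.

```java
import java.util.Locale;

// Hypothetical, library-free model of an "echo" visitor; not the OpenCms API.
class EchoVisitorSketch {

    /** Models the idea of m_echo: write all content to the result by default. */
    protected final boolean echo;

    /** Models the idea of m_result: the buffer the output is written to. */
    protected final StringBuilder result = new StringBuilder();

    EchoVisitorSketch(boolean echo) {
        this.echo = echo;
    }

    void visitTag(String tagHtml) {
        if (echo) {
            result.append(tagHtml);
        }
    }

    void visitStringNode(String text) {
        if (echo) {
            result.append(text);
        }
    }

    void visitEndTag(String tagHtml) {
        if (echo) {
            result.append(tagHtml);
        }
    }

    String getResult() {
        return result.toString();
    }
}

// A subclass changes only the callback it cares about; tags are still echoed.
class UpperCaseTextSketch extends EchoVisitorSketch {

    UpperCaseTextSketch() {
        super(true);
    }

    @Override
    void visitStringNode(String text) {
        result.append(text.toUpperCase(Locale.ROOT));
    }
}

class EchoDemo {

    /** Feeds a fixed sequence of callbacks for "<p>hello</p>" to the visitor. */
    static String run(EchoVisitorSketch visitor) {
        visitor.visitTag("<p>");
        visitor.visitStringNode("hello");
        visitor.visitEndTag("</p>");
        return visitor.getResult();
    }

    public static void main(String[] args) {
        System.out.println(run(new EchoVisitorSketch(true)));  // prints <p>hello</p>
        System.out.println(run(new UpperCaseTextSketch()));    // prints <p>HELLO</p>
    }
}
```

In this model, a visitor created with echo set to false and no overridden callbacks produces an empty result.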
-
Field Summary
Modifier and Type | Field | Description
protected boolean | m_echo | Indicates if "echo" mode is on, that is, all content is written to the result by default.
(not shown) | m_noAutoCloseTags | List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
protected StringBuffer | m_result | The buffer to write the output to.
protected static final String[] | TAG_ARRAY | The array of supported tag names.
(not shown) | TAG_LIST | The list of supported tag names.
-
Constructor Summary
Constructor | Description
CmsHtmlParser() | Creates a new instance of the HTML converter with echo mode set to false.
CmsHtmlParser(boolean echo) | Creates a new instance of the HTML converter.
-
Method Summary
Modifier and Type | Method | Description
protected String | collapse(String string) | Collapses HTML whitespace in the given String.
protected org.htmlparser.PrototypicalNodeFactory | configureNoAutoCorrectionTags() | Internally degrades Composite tags that do have children in the DOM tree to simple single tags.
String | getConfiguration() | Returns the configuration String of this visitor, or the empty String if none was provided before.
List<String> | getNoAutoCloseTags() | Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
String | getResult() | Returns the text extraction result.
String | getTagHtml(org.htmlparser.Tag tag) | Returns the HTML for the given tag itself (not the tag content).
String | process(String html, String encoding) | Extracts the text from the given HTML content, assuming the given HTML encoding.
void | setConfiguration(String configuration) | Sets a configuration String for this visitor.
void | setNoAutoCloseTags(List<String> noAutoCloseTagList) | Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
void | visitEndTag(org.htmlparser.Tag tag) | Visitor method (callback) invoked when a closing Tag is encountered.
void | visitRemarkNode(org.htmlparser.Remark remark) | Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
void | visitStringNode(org.htmlparser.Text text) | Visitor method (callback) invoked when a text node is encountered.
void | visitTag(org.htmlparser.Tag tag) | Visitor method (callback) invoked when a starting Tag is encountered.
Methods inherited from class org.htmlparser.visitors.NodeVisitor
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf
-
Field Details
-
m_noAutoCloseTags
List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
-
TAG_ARRAY
The array of supported tag names.
-
TAG_LIST
The list of supported tag names.
-
m_echo
Indicates if "echo" mode is on, that is, all content is written to the result by default.
-
m_result
The buffer to write the output to.
-
-
Constructor Details
-
CmsHtmlParser
public CmsHtmlParser()
Creates a new instance of the HTML converter with echo mode set to false.
-
CmsHtmlParser
public CmsHtmlParser(boolean echo)
Creates a new instance of the HTML converter.
- Parameters:
echo - indicates if "echo" mode is on, that is, all content is written to the result
-
-
Method Details
-
configureNoAutoCorrectionTags
Internally degrades Composite tags that do have children in the DOM tree to simple single tags. This makes it possible to avoid auto-correction of unclosed HTML tags.
- Returns:
a node factory that will not auto-correct open tags specified via setNoAutoCloseTags(List)
-
getConfiguration
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the configuration String of this visitor, or the empty String if none was provided before.
- Specified by:
getConfiguration in interface I_CmsHtmlNodeVisitor
- Returns:
the configuration String of this visitor; by contract never null, but an empty String if not provided
-
getResult
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the text extraction result.
- Specified by:
getResult in interface I_CmsHtmlNodeVisitor
- Returns:
the text extraction result
-
getTagHtml
Returns the HTML for the given tag itself (not the tag content).
- Parameters:
tag - the tag to create the HTML for
- Returns:
the HTML for the given tag
-
process
Description copied from interface: I_CmsHtmlNodeVisitor
Extracts the text from the given HTML content, assuming the given HTML encoding.
- Specified by:
process in interface I_CmsHtmlNodeVisitor
- Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
- Returns:
the text extracted from the given HTML content
- Throws:
org.htmlparser.util.ParserException - if something goes wrong
-
setConfiguration
Description copied from interface: I_CmsHtmlNodeVisitor
Sets a configuration String for this visitor.
This will most likely be done with data from an XSD, custom JSP tag, ...
- Specified by:
setConfiguration in interface I_CmsHtmlNodeVisitor
- Parameters:
configuration - the configuration of this visitor to set
-
visitEndTag
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
- Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitEndTag in class org.htmlparser.visitors.NodeVisitor
- Parameters:
tag - the tag that is ended
-
visitRemarkNode
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
- Specified by:
visitRemarkNode in interface I_CmsHtmlNodeVisitor
- Overrides:
visitRemarkNode in class org.htmlparser.visitors.NodeVisitor
- Parameters:
remark - the remark Tag to visit
-
visitStringNode
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a text node is encountered.
- Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
- Overrides:
visitStringNode in class org.htmlparser.visitors.NodeVisitor
- Parameters:
text - the text that is visited
-
visitTag
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
- Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitTag in class org.htmlparser.visitors.NodeVisitor
- Parameters:
tag - the tag that is visited
-
collapse
Collapses HTML whitespace in the given String.
- Parameters:
string - the string to collapse
- Returns:
the input String with all HTML whitespace collapsed
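As an illustration of what collapsing HTML whitespace can mean, here is a minimal, library-free sketch. It is not the OpenCms implementation: the class `WhitespaceCollapseSketch` is hypothetical, and it assumes that space, tab, form feed, carriage return and line feed count as whitespace, that each run is reduced to a single space, and that leading and trailing runs are dropped.

```java
// Hypothetical sketch of whitespace collapsing; not the OpenCms implementation.
class WhitespaceCollapseSketch {

    /** Assumption: these five characters are treated as collapsible whitespace. */
    static boolean isHtmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
    }

    /** Reduces each whitespace run to one space and drops leading/trailing runs. */
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean pendingSpace = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isHtmlWhitespace(c)) {
                // Remember the run, but only once some text has been written,
                // so that leading whitespace never produces a space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is never written, which drops trailing runs.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  Hello \n\t world  ") + "]"); // prints [Hello world]
    }
}
```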
-
getNoAutoCloseTags
Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
- Returns:
a List of upper case tag names for which parsing / visiting will not correct missing closing tags
-
setNoAutoCloseTags
Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
- Specified by:
setNoAutoCloseTags in interface I_CmsHtmlNodeVisitor
- Parameters:
noAutoCloseTagList - a list of upper case tag names for which parsing / visiting should not correct missing closing tags
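Because the list is expected to contain upper case tag names, a caller holding tag names in mixed case may want to normalize them first. The helper below is a hypothetical sketch that uses only the Java standard library; `NoAutoCloseTagNames` and `toUpperCaseTagNames` are not part of OpenCms, and trimming surrounding whitespace is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the OpenCms API.
class NoAutoCloseTagNames {

    /** Returns a new list with every tag name trimmed and converted to upper case. */
    static List<String> toUpperCaseTagNames(List<String> tagNames) {
        List<String> result = new ArrayList<>(tagNames.size());
        for (String name : tagNames) {
            // Locale.ROOT avoids locale-specific case mappings (for example the Turkish dotless i).
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toUpperCaseTagNames(Arrays.asList("br", " Img ", "HR"))); // prints [BR, IMG, HR]
    }
}
```

The resulting list could then be passed to setNoAutoCloseTags(List).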
-