Package org.opencms.util
Class CmsHtml2TextConverter
java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
org.opencms.util.CmsHtml2TextConverter
- All Implemented Interfaces:
I_CmsHtmlNodeVisitor
Extracts the plain text content of an HTML page.
-
Field Summary
Fields inherited from class org.opencms.util.CmsHtmlParser
m_echo, m_noAutoCloseTags, m_result, TAG_ARRAY, TAG_LIST
-
Constructor Summary
CmsHtml2TextConverter()
  Creates a new instance of the HTML converter.
-
Method Summary
Modifier and Type, Method, Description
static String html2text(String html, String encoding)
  Extracts the text from the given HTML content, assuming the given HTML encoding.
void visitEndTag(org.htmlparser.Tag tag)
  Visitor method (callback) invoked when a closing Tag is encountered.
void visitStringNode(org.htmlparser.Text text)
  Visitor method (callback) invoked when a string node (text content) is encountered.
void visitTag(org.htmlparser.Tag tag)
  Visitor method (callback) invoked when a starting Tag is encountered.
Methods inherited from class org.opencms.util.CmsHtmlParser
collapse, configureNoAutoCorrectionTags, getConfiguration, getNoAutoCloseTags, getResult, getTagHtml, process, setConfiguration, setNoAutoCloseTags, visitRemarkNode
Methods inherited from class org.htmlparser.visitors.NodeVisitor
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf
-
Constructor Details
-
CmsHtml2TextConverter
public CmsHtml2TextConverter()
Creates a new instance of the HTML converter.
-
-
Method Details
-
html2text
public static String html2text(String html, String encoding) throws Exception
Extracts the text from the given HTML content, assuming the given HTML encoding.
- Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
- Returns:
the text extracted from the given HTML content
- Throws:
Exception - if something goes wrong
-
visitEndTag
public void visitEndTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
- Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitEndTag in class CmsHtmlParser
- Parameters:
tag - the tag that is ended.
- See Also:
NodeVisitor.visitEndTag(org.htmlparser.Tag)
-
visitStringNode
public void visitStringNode(org.htmlparser.Text text)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a string node (text content) is encountered.
- Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
- Overrides:
visitStringNode in class CmsHtmlParser
- Parameters:
text - the text that is visited.
- See Also:
NodeVisitor.visitStringNode(org.htmlparser.Text)
-
visitTag
public void visitTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
- Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitTag in class CmsHtmlParser
- Parameters:
tag - the tag that is visited.
- See Also:
NodeVisitor.visitTag(org.htmlparser.Tag)
-
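Example (illustrative only)
The three callbacks documented above follow the visitor pattern: a parser walks the document and calls visitTag, visitStringNode, and visitEndTag as it meets each node. The following is a minimal, self-contained sketch of that flow under stated assumptions; it is not the OpenCms implementation. The names MiniNodeVisitor, TextCollector, extractText, and the regex tokenizer are hypothetical stand-ins chosen so the example runs without the htmlparser or OpenCms libraries, and the line-break rules are an assumption made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VisitorSketch {

    /** Hypothetical stand-in for the three callbacks; not the OpenCms interface. */
    interface MiniNodeVisitor {
        void visitTag(String tagName);
        void visitStringNode(String text);
        void visitEndTag(String tagName);
    }

    /** Collects text nodes and adds line breaks for a few tags (assumed rules). */
    static class TextCollector implements MiniNodeVisitor {
        private final StringBuilder result = new StringBuilder();

        @Override
        public void visitTag(String tagName) {
            // A <br> start tag becomes a line break.
            if (tagName.equals("br")) {
                result.append('\n');
            }
        }

        @Override
        public void visitStringNode(String text) {
            // Text content is copied to the output unchanged.
            result.append(text);
        }

        @Override
        public void visitEndTag(String tagName) {
            // Closing a block-level element ends the current line.
            if (tagName.equals("p") || tagName.equals("div")) {
                result.append('\n');
            }
        }

        String getResult() {
            return result.toString().trim();
        }
    }

    // Toy tokenizer: matches either a tag or a run of text.
    // Real HTML needs a real parser; this only drives the callbacks.
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)");

    static String extractText(String html) {
        TextCollector visitor = new TextCollector();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                visitor.visitStringNode(m.group(3));
            } else if (m.group(1).isEmpty()) {
                visitor.visitTag(m.group(2).toLowerCase());
            } else {
                visitor.visitEndTag(m.group(2).toLowerCase());
            }
        }
        return visitor.getResult();
    }

    public static void main(String[] args) {
        // Prints "Hello world" and "Second line" on separate lines.
        System.out.println(extractText("<p>Hello <b>world</b></p><p>Second line</p>"));
    }
}
```

With OpenCms on the classpath, the documented entry point is the static call CmsHtml2TextConverter.html2text(html, encoding), which may throw Exception. The sketch mirrors only the order in which the callbacks fire, not the converter's actual formatting rules.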