Package org.opencms.util
Class CmsHtml2TextConverter
java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
org.opencms.util.CmsHtml2TextConverter
All Implemented Interfaces:
I_CmsHtmlNodeVisitor
Extracts the HTML page content.
Field Summary
Fields inherited from class org.opencms.util.CmsHtmlParser
m_echo, m_noAutoCloseTags, m_result, TAG_ARRAY, TAG_LIST
Constructor Summary
Constructors
CmsHtml2TextConverter()
    Creates a new instance of the html converter.
Method Summary
Modifier and Type    Method                                       Description
static String        html2text(html, encoding)                    Extracts the text from the given html content, assuming the given html encoding.
void                 visitEndTag(org.htmlparser.Tag tag)          Visitor method (callback) invoked when a closing Tag is encountered.
void                 visitStringNode(org.htmlparser.Text text)    Visitor method (callback) invoked when a text node is encountered.
void                 visitTag(org.htmlparser.Tag tag)             Visitor method (callback) invoked when a starting Tag is encountered.
Methods inherited from class org.opencms.util.CmsHtmlParser
collapse, configureNoAutoCorrectionTags, getConfiguration, getNoAutoCloseTags, getResult, getTagHtml, process, setConfiguration, setNoAutoCloseTags, visitRemarkNode
Methods inherited from class org.htmlparser.visitors.NodeVisitor
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf
Constructor Details
CmsHtml2TextConverter
public CmsHtml2TextConverter()
Creates a new instance of the html converter.
Method Details
html2text
Extracts the text from the given html content, assuming the given html encoding.
Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
Returns:
the text extracted from the given html content
Throws:
Exception - if something goes wrong
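Because html2text declares a broad "throws Exception", callers usually need to decide what to do when extraction fails. The following is a minimal, self-contained sketch of one way to wrap such a call. TextExtractor, extractOrFallback and the regex-based stand-in are hypothetical names introduced only for illustration; they are not part of OpenCms, and the stand-in does not reflect how the converter actually works. With OpenCms on the classpath, a method reference such as CmsHtml2TextConverter::html2text would presumably fit the same (html, encoding) shape.

```java
import java.nio.charset.StandardCharsets;

class Html2TextUsageSketch {

    /** Hypothetical shape of a converter call: (html, encoding) -> text, may throw. */
    @FunctionalInterface
    interface TextExtractor {
        String extract(String html, String encoding) throws Exception;
    }

    /** Runs the extractor and returns the fallback if it throws or yields null. */
    static String extractOrFallback(
        TextExtractor extractor, String html, String encoding, String fallback) {
        try {
            String text = extractor.extract(html, encoding);
            return text == null ? fallback : text;
        } catch (Exception e) {
            // The documented method only promises "Exception - if something goes wrong",
            // so the sketch catches the broad type and falls back.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-in extractor for demonstration only (assumption, not the OpenCms logic).
        TextExtractor standIn = (html, encoding) -> html.replaceAll("<[^>]*>", "").trim();
        String encoding = StandardCharsets.UTF_8.name();
        System.out.println(
            extractOrFallback(standIn, "<p>Hello <b>world</b></p>", encoding, ""));
    }
}
```

Passing the converter in as a parameter keeps the error handling testable without a running OpenCms instance; whether an empty fallback or a rethrown exception is appropriate depends on the caller.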
visitEndTag
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitEndTag in class CmsHtmlParser
Parameters:
tag - the tag that is ended
visitStringNode
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a text node is encountered.
Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
Overrides:
visitStringNode in class CmsHtmlParser
Parameters:
text - the text that is visited
visitTag
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitTag in class CmsHtmlParser
Parameters:
tag - the tag that is visited
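The three callbacks above follow the usual visitor pattern: a parser walks the document and calls visitTag, visitStringNode and visitEndTag in document order, and the visitor accumulates a result. The sketch below illustrates that flow with simplified stand-ins. MiniVisitor and TextCollector are hypothetical names, the callbacks take plain strings rather than org.htmlparser.Tag and org.htmlparser.Text, and the line-break rules are assumptions chosen for the example, not the converter's actual behavior.

```java
class VisitorCallbackSketch {

    /** Hypothetical, simplified stand-in for the three documented callbacks. */
    interface MiniVisitor {
        void visitTag(String tagName);
        void visitStringNode(String text);
        void visitEndTag(String tagName);
    }

    /** Collects text nodes and adds a line break when a paragraph closes. */
    static class TextCollector implements MiniVisitor {
        private final StringBuilder result = new StringBuilder();

        @Override
        public void visitTag(String tagName) {
            // Assumption for this sketch: only <br> produces output on an opening tag.
            if (tagName.equals("br")) {
                result.append('\n');
            }
        }

        @Override
        public void visitStringNode(String text) {
            result.append(text);
        }

        @Override
        public void visitEndTag(String tagName) {
            // Assumption for this sketch: a closing </p> ends the current line.
            if (tagName.equals("p")) {
                result.append('\n');
            }
        }

        String getResult() {
            return result.toString().trim();
        }
    }

    /** Replays the callback order a parser would produce for a tiny document. */
    static String demo() {
        TextCollector collector = new TextCollector();
        // Simulated events for: <p>Hello <b>world</b></p><p>Bye</p>
        collector.visitTag("p");
        collector.visitStringNode("Hello ");
        collector.visitTag("b");
        collector.visitStringNode("world");
        collector.visitEndTag("b");
        collector.visitEndTag("p");
        collector.visitTag("p");
        collector.visitStringNode("Bye");
        collector.visitEndTag("p");
        return collector.getResult();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this simplified model the markup itself never reaches the output; only text nodes and the line breaks derived from tag events do, which is the general idea behind an HTML-to-text visitor.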