Package org.opencms.util
Class CmsHtmlExtractor
java.lang.Object
org.opencms.util.CmsHtmlExtractor
Extracts plain text from HTML.
Since:
6.0.0
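To make "plain text from HTML" concrete, here is a deliberately naive, regex-based sketch that uses only the JDK. It is not how CmsHtmlExtractor works: judging by its throws clauses, the class delegates to the org.htmlparser library, which tokenizes the markup properly. The class name NaiveHtmlText and its method extract are hypothetical. A regex approach also mishandles edge cases, such as a ">" inside an attribute value.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified stand-in for HTML-to-text extraction.
 * NOT the CmsHtmlExtractor implementation; it only illustrates the idea.
 */
public class NaiveHtmlText {

    // Drop <script> and <style> elements entirely, including their contents.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Drop HTML comments.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag; replaced by a space so adjacent words do not merge.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    // Runs of whitespace, collapsed to a single space at the end.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String extract(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
            + "<body><h1>Hello</h1><p>Fish &amp; chips</p></body></html>";
        System.out.println(extract(html)); // prints "Hello Fish & chips"
    }
}
```

With OpenCms on the classpath, the documented equivalent would be a call such as CmsHtmlExtractor.extractText(html, "UTF-8"), which additionally declares the two checked exceptions listed in the method details.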
Method Summary
Modifier and Type | Method | Description
static String | extractText(InputStream in, String encoding) | Extracts the text from an HTML page read from an input stream.
static String | extractText(String content, String encoding) | Extracts the text from an HTML page given as a string.
Method Details
extractText
public static String extractText(InputStream in, String encoding) throws org.htmlparser.util.ParserException, UnsupportedEncodingException
Extracts the text from an HTML page.
Parameters:
in - the HTML content input stream
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML failed
UnsupportedEncodingException - if the given encoding is not supported
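The encoding parameter matters because a stream delivers bytes, not characters. The JDK-only sketch below (hypothetical class StreamDecoding) shows the underlying mechanism, assuming the extractor decodes the stream in a comparable way: the InputStreamReader constructor that takes a charset name throws UnsupportedEncodingException for names the JVM does not know, whereas a valid but wrong charset raises nothing and silently garbles non-ASCII text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decode an HTML byte stream using a caller-supplied encoding name. */
public class StreamDecoding {

    public static String decode(InputStream in, String encoding) throws IOException {
        // This constructor throws UnsupportedEncodingException (a subclass of
        // IOException) if the running JVM does not support the named charset.
        Reader reader = new InputStreamReader(in, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        // Closing the stream is left to the caller.
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<p>Caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        // Correct encoding: e-acute is decoded as a single character.
        System.out.println(decode(new ByteArrayInputStream(bytes), "UTF-8"));
        // Valid but wrong encoding: no exception, but the two UTF-8 bytes of
        // e-acute come back as two separate Latin-1 characters.
        System.out.println(decode(new ByteArrayInputStream(bytes), "ISO-8859-1"));
        try {
            decode(new ByteArrayInputStream(bytes), "no-such-charset");
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported encoding: " + e.getMessage());
        }
    }
}
```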
extractText
public static String extractText(String content, String encoding) throws org.htmlparser.util.ParserException, UnsupportedEncodingException
Extracts the text from an HTML page.
Parameters:
content - the HTML content
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML failed
UnsupportedEncodingException - if the given encoding is not supported
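A Java String is already decoded text, so for this overload the encoding argument presumably just names a charset for the parser rather than driving byte decoding. Callers who prefer not to rely on the checked UnsupportedEncodingException can validate the name first. The sketch below (hypothetical class EncodingCheck) assumes the extractor accepts exactly the charsets supported by the running JVM, which this page does not state.

```java
import java.nio.charset.Charset;

/** Hypothetical pre-check for the encoding argument before calling an extractText overload. */
public class EncodingCheck {

    /**
     * Returns true if the running JVM supports the named charset.
     * Charset.isSupported throws IllegalCharsetNameException (a subclass of
     * IllegalArgumentException) for syntactically illegal names and
     * IllegalArgumentException for null; both are treated as "not supported".
     */
    public static boolean isUsableEncoding(String encoding) {
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableEncoding("UTF-8"));           // true
        System.out.println(isUsableEncoding("ISO-8859-1"));      // true
        System.out.println(isUsableEncoding("no-such-charset")); // false
        System.out.println(isUsableEncoding("bad name!"));       // false (illegal name)
        System.out.println(isUsableEncoding(null));              // false
    }
}
```

UTF-8 and ISO-8859-1 are among the charsets every Java platform is required to support, so they are safe defaults when the page's real encoding is unknown.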