Package org.opencms.util
Class CmsHtmlExtractor
java.lang.Object
org.opencms.util.CmsHtmlExtractor
Extracts plain text from HTML.
Since:
6.0.0
Method Summary
Modifier and Type    Method                                          Description
static String        extractText(InputStream in, String encoding)    Extract the text from an HTML page.
static String        extractText(String content, String encoding)    Extract the text from an HTML page.
Method Details
extractText
public static String extractText(InputStream in, String encoding)
        throws org.htmlparser.util.ParserException, UnsupportedEncodingException

Extract the text from an HTML page.

Parameters:
in - the HTML content input stream
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if the parsing of the HTML failed
UnsupportedEncodingException - if the given encoding is not supported
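Because this overload receives the bytes and the encoding name separately, the caller has to pass the encoding the bytes were actually written in. The following is a minimal JDK-only sketch of preparing that pair of arguments. The class `ExtractorInput` and its methods are hypothetical helpers, not part of OpenCms, and the call to extractText appears only as a comment because it needs OpenCms on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of OpenCms: prepares the (stream, encoding)
// pair that extractText(InputStream, String) expects.
final class ExtractorInput {

    private ExtractorInput() {}

    /** Returns true if this JVM can decode the named encoding. */
    static boolean isSupportedEncoding(String encoding) {
        try {
            return encoding != null && Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically invalid charset name
        }
    }

    /** Encodes the HTML with the given encoding so bytes and encoding name agree. */
    static InputStream toStream(String html, String encoding)
            throws UnsupportedEncodingException {
        // String.getBytes(String) throws UnsupportedEncodingException for unknown names
        return new ByteArrayInputStream(html.getBytes(encoding));
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Gr\u00fc\u00dfe</p></body></html>";
        String encoding = "UTF-8";
        if (isSupportedEncoding(encoding)) {
            try (InputStream in = toStream(html, encoding)) {
                // With OpenCms on the classpath, the call would be:
                // String text = org.opencms.util.CmsHtmlExtractor.extractText(in, encoding);
                System.out.println("bytes ready: " + in.available());
            }
        }
    }
}
```

Checking the encoding name up front is optional, since extractText itself is documented to throw UnsupportedEncodingException, but it lets a caller reject bad input before opening any stream.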
extractText
public static String extractText(String content, String encoding)
        throws org.htmlparser.util.ParserException, UnsupportedEncodingException

Extract the text from an HTML page.

Parameters:
content - the HTML content
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if the parsing of the HTML failed
UnsupportedEncodingException - if the given encoding is not supported
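Both overloads declare two checked exceptions, so every caller has to decide what happens when extraction fails. One possible pattern, sketched here under stated assumptions, is to fall back to a default value. The interface `HtmlTextExtractor` and the class `SafeExtraction` are hypothetical and not part of OpenCms. A stub stands in for the real extractor so the sketch runs without OpenCms on the classpath.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical seam, not part of OpenCms: any method shaped like
// extractText(String content, String encoding) fits this interface.
@FunctionalInterface
interface HtmlTextExtractor {
    String extract(String content, String encoding) throws Exception;
}

final class SafeExtraction {

    private SafeExtraction() {}

    /** Runs the extractor and returns the fallback if it throws. */
    static String extractOrFallback(HtmlTextExtractor extractor, String html,
                                    String encoding, String fallback) {
        try {
            return extractor.extract(html, encoding);
        } catch (UnsupportedEncodingException e) {
            // the encoding name was not recognized
            return fallback;
        } catch (Exception e) {
            // would cover org.htmlparser.util.ParserException with the real extractor
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stub extractor that always fails; with OpenCms available, pass
        // org.opencms.util.CmsHtmlExtractor::extractText instead.
        HtmlTextExtractor failing = (content, enc) -> {
            throw new UnsupportedEncodingException(enc);
        };
        System.out.println(extractOrFallback(failing, "<p>Hi</p>", "no-such-charset", ""));
    }
}
```

Whether a silent fallback is appropriate depends on the caller. Code that indexes content may prefer an empty string, while code that validates input may prefer to let the exception propagate.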