Package org.opencms.util
Class CmsHtmlExtractor
java.lang.Object
    org.opencms.util.CmsHtmlExtractor
public final class CmsHtmlExtractor extends java.lang.Object
Extracts plain text from HTML.
Since: 6.0.0
Method Summary
Modifier and Type: static java.lang.String
Method: extractText(java.io.InputStream in, java.lang.String encoding)
Description: Extracts the text from an HTML page.

Modifier and Type: static java.lang.String
Method: extractText(java.lang.String content, java.lang.String encoding)
Description: Extracts the text from an HTML page.
Method Detail
extractText
public static java.lang.String extractText(java.io.InputStream in, java.lang.String encoding) throws org.htmlparser.util.ParserException, java.io.UnsupportedEncodingException
Extracts the text from an HTML page.
Parameters:
in - the HTML content input stream
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
java.io.UnsupportedEncodingException - if the given encoding is not supported
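When the HTML is already held in memory, a stream suitable for the InputStream overload can be built with the standard library alone. The following is a minimal sketch and not part of OpenCms: the class HtmlStreamSketch and its method toStream are hypothetical names introduced for illustration. It encodes the string with the same encoding name that would then be passed to extractText, so that the bytes and the declared encoding agree.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of org.opencms.util.
final class HtmlStreamSketch {

    private HtmlStreamSketch() {
        // static helper only
    }

    /**
     * Encodes the given HTML with the named encoding and exposes the bytes as a stream.
     *
     * @throws UnsupportedEncodingException if the JVM does not know the encoding name
     */
    static InputStream toStream(String html, String encoding) throws UnsupportedEncodingException {
        // String.getBytes(String) rejects unknown charset names with UnsupportedEncodingException
        byte[] bytes = html.getBytes(encoding);
        return new ByteArrayInputStream(bytes);
    }
}

// Intended use (requires OpenCms and the HTML Parser library on the classpath):
//
//     InputStream in = HtmlStreamSketch.toStream("<p>Hello</p>", "UTF-8");
//     String text = CmsHtmlExtractor.extractText(in, "UTF-8");
```

Passing the same encoding name to both calls is the point of the sketch: a stream encoded one way and declared another way would likely yield garbled text rather than an exception.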
extractText
public static java.lang.String extractText(java.lang.String content, java.lang.String encoding) throws org.htmlparser.util.ParserException, java.io.UnsupportedEncodingException
Extracts the text from an HTML page.
Parameters:
content - the HTML content
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
java.io.UnsupportedEncodingException - if the given encoding is not supported
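Both overloads declare java.io.UnsupportedEncodingException, so a caller may prefer to check an encoding name before starting an extraction. The sketch below assumes that "supported" corresponds to what the JVM's java.nio.charset.Charset can resolve; that is an assumption about the implementation, not something this page states. EncodingPrecheck and isSupportedEncoding are hypothetical names.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of org.opencms.util.
final class EncodingPrecheck {

    private EncodingPrecheck() {
        // static helper only
    }

    /** Returns true if the JVM can resolve the given charset name, false otherwise. */
    static boolean isSupportedEncoding(String encoding) {
        if (encoding == null) {
            return false;
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException, thrown for syntactically invalid names
            return false;
        }
    }
}

// Intended use (requires OpenCms and the HTML Parser library on the classpath):
//
//     if (EncodingPrecheck.isSupportedEncoding(encoding)) {
//         String text = CmsHtmlExtractor.extractText(content, encoding);
//     }
```

A check like this does not remove the need to handle the declared exceptions, since the method signature still requires it; it only lets the caller report a bad encoding name early.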