Class CmsHtmlStripper
All tags that are not explicitly allowed via invocation of one of the addPreserve... methods will be missing in the result of the method stripHtml(String).
Instances are reusable but must not be shared between threads. If the configuration should be changed between subsequent invocations of stripHtml(String), the method reset() has to be called.
- Since:
- 6.9.2
-
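The preserve-list contract described above (add tags to keep, strip everything else, call reset() before switching configuration) can be illustrated with a small stand-in. This is only a sketch: the class PreserveTagSketch and its regular expression are hypothetical and are not the OpenCms implementation, which is parser-based and also extracts text. Only the method names mirror the ones documented here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; NOT the OpenCms class.
class PreserveTagSketch {

    // Matches an opening or closing tag and captures its name.
    private static final Pattern TAG =
        Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Lower-cased names of the tags that survive stripping.
    private final Set<String> preserved = new HashSet<>();

    // Tag names are treated case-insensitively, as documented.
    // Simplification: returns true only if the name was not present yet.
    boolean addPreserveTag(String tagName) {
        return preserved.add(tagName.toLowerCase(Locale.ROOT));
    }

    // Clears the preserve list so the instance can be reused
    // with a different configuration.
    void reset() {
        preserved.clear();
    }

    // Removes every tag whose name is not preserved; text is kept.
    String stripHtml(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            if (preserved.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        PreserveTagSketch stripper = new PreserveTagSketch();
        stripper.addPreserveTag("B");
        // prints: Hello <b>world</b>
        System.out.println(stripper.stripHtml("<p>Hello <b>world</b></p>"));

        stripper.reset();
        stripper.addPreserveTag("p");
        // prints: <p>Hello world</p>
        System.out.println(stripper.stripHtml("<p>Hello <b>world</b></p>"));
    }
}
```

Because the preserve list is mutable instance state, a sketch like this has the same restriction as the documented class: one instance per thread.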
Constructor Summary
CmsHtmlStripper(): Default constructor that turns echo on and uses the settings for replacing tags.
CmsHtmlStripper(boolean useTidy): Creates an instance with control over whether tidy is used.
-
Method Summary
boolean addPreserveTag(String tagName): Adds a tag that will be preserved by stripHtml(String).
void addPreserveTagList(List<String> preserveTags): Convenience method for adding several tags to preserve.
void addPreserveTags(String tagList, char separator): Convenience method for adding several tags to preserve in form of a delimiter-separated String.
void reset(): Resets the configuration of the tags to preserve.
String stripHtml(String html): Extracts the text from the given html content, assuming the given html encoding.
-
Constructor Details
-
CmsHtmlStripper
public CmsHtmlStripper()
Default constructor that turns echo on and uses the settings for replacing tags.
-
CmsHtmlStripper
public CmsHtmlStripper(boolean useTidy)
Creates an instance with control over whether tidy is used.
- Parameters:
useTidy
- if true, tidy will be used
-
-
Method Details
-
addPreserveTag
public boolean addPreserveTag(String tagName)
Adds a tag that will be preserved by stripHtml(String).
- Parameters:
tagName
- the name of the tag to keep (case insensitive)
- Returns:
- true if the tagName was added correctly to the internal engine
-
addPreserveTagList
public void addPreserveTagList(List<String> preserveTags)
Convenience method for adding several tags to preserve.
- Parameters:
preserveTags
- a List<String> with the case-insensitive tag names of the tags to preserve
-
addPreserveTags
public void addPreserveTags(String tagList, char separator)
Convenience method for adding several tags to preserve in form of a delimiter-separated String.
The String will be split with CmsStringUtil.splitAsList(String, char, boolean), using tagList as the first argument, separator as the second argument, and the third argument set to true (trimming support).
- Parameters:
tagList
- a delimiter-separated String with case-insensitive tag names to preserve by stripHtml(String)
separator
- the delimiter that separates tag names in the tagList argument
-
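To make the delimiter-separated form concrete, the following sketch approximates the splitting step in plain Java. It does not call CmsStringUtil; the class TagListSplitSketch and the helper splitTrimmed are hypothetical names, and dropping empty tokens is a choice made for this sketch, not documented behavior of splitAsList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenCms.
class TagListSplitSketch {

    // Splits tagList at each separator and trims every token,
    // roughly what "third argument set to true" asks for.
    // Assumption of this sketch: empty tokens are skipped.
    static List<String> splitTrimmed(String tagList, char separator) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= tagList.length(); i++) {
            if (i == tagList.length() || tagList.charAt(i) == separator) {
                String token = tagList.substring(start, i).trim();
                if (!token.isEmpty()) {
                    result.add(token);
                }
                start = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [p, br, strong]
        System.out.println(splitTrimmed("p, br ,strong", ','));
    }
}
```

With trimming enabled, a list such as "p, br ,strong" yields clean tag names, so callers do not need to normalize whitespace before passing the String.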
reset
public void reset()
Resets the configuration of the tags to preserve.
This is called from the constructor and only has to be called if this instance is reused with a different configuration (of tags to keep).
-
stripHtml
public String stripHtml(String html) throws org.htmlparser.util.ParserException
Extracts the text from the given html content, assuming the given html encoding.
Additionally, tags are replaced / removed according to the configuration of this instance.
Please note: There are static process methods in the superclass that will not do the replacements / removals. Don't mix them up with this method.
- Parameters:
html
- the content to extract the plain text from.
- Returns:
- the text extracted from the given html content.
- Throws:
org.htmlparser.util.ParserException
- if something goes wrong.
-