Class CmsHtmlParser

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean m_echo
      Indicates whether "echo" mode is on, that is, whether all content is written to the result by default.
      protected java.util.List<java.lang.String> m_noAutoCloseTags
      List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
      protected java.lang.StringBuffer m_result
      The buffer to write the output to.
      protected static java.lang.String[] TAG_ARRAY
      The array of supported tag names.
      protected static java.util.List<java.lang.String> TAG_LIST
      The list of supported tag names.
    • Constructor Summary

      Constructors 
      Constructor Description
      CmsHtmlParser()
      Creates a new instance of the HTML parser with echo mode set to false.
      CmsHtmlParser(boolean echo)
      Creates a new instance of the HTML parser with the given echo mode.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected java.lang.String collapse(java.lang.String string)
      Collapses HTML whitespace in the given String.
      protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags()
      Internally degrades Composite tags that do have children in the DOM tree to simple single tags.
      java.lang.String getConfiguration()
      Returns the configuration String of this visitor, or the empty String if none was provided before.
      java.util.List<java.lang.String> getNoAutoCloseTags()
      Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
      java.lang.String getResult()
      Returns the text extraction result.
      java.lang.String getTagHtml(org.htmlparser.Tag tag)
      Returns the HTML for the given tag itself (not the tag content).
      java.lang.String process(java.lang.String html, java.lang.String encoding)
      Extracts the text from the given HTML content, assuming the given HTML encoding.
      void setConfiguration(java.lang.String configuration)
      Sets a configuration String for this visitor.
      void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
      Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
      void visitEndTag(org.htmlparser.Tag tag)
      Visitor method (callback) invoked when a closing Tag is encountered.
      void visitRemarkNode(org.htmlparser.Remark remark)
      Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
      void visitStringNode(org.htmlparser.Text text)
      Visitor method (callback) invoked when a text node is encountered.
      void visitTag(org.htmlparser.Tag tag)
      Visitor method (callback) invoked when a starting Tag is encountered.
      • Methods inherited from class org.htmlparser.visitors.NodeVisitor

        beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
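
The summary of collapse(String) above only says that HTML whitespace is collapsed. As a rough illustration of what that usually means, the sketch below replaces every run of whitespace characters with a single space. The class WhitespaceCollapseSketch and its helper are hypothetical and are not taken from the OpenCms sources; the real method may treat some characters differently.

```java
final class WhitespaceCollapseSketch {

    /**
     * Replaces every run of whitespace with a single space.
     * Hypothetical helper for illustration only, not the OpenCms implementation.
     */
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inWhitespace = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Emit one space for the first whitespace character of a run only
                if (!inWhitespace) {
                    out.append(' ');
                    inWhitespace = true;
                }
            } else {
                out.append(c);
                inWhitespace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "[Hello, world! ]"
        System.out.println("[" + collapse("Hello,\n\t  world!  ") + "]");
    }
}
```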
    • Field Detail

      • m_noAutoCloseTags

        protected java.util.List<java.lang.String> m_noAutoCloseTags
        List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
      • TAG_ARRAY

        protected static final java.lang.String[] TAG_ARRAY
        The array of supported tag names.
      • TAG_LIST

        protected static final java.util.List<java.lang.String> TAG_LIST
        The list of supported tag names.
      • m_echo

        protected boolean m_echo
        Indicates whether "echo" mode is on, that is, whether all content is written to the result by default.
      • m_result

        protected java.lang.StringBuffer m_result
        The buffer to write the output to.
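
Because m_noAutoCloseTags is documented as holding upper case tag names, callers probably want to normalize names before handing a list to setNoAutoCloseTags. The sketch below shows one way to do that with a fixed locale. NoAutoCloseTagsSketch and toUpperCaseTagNames are hypothetical names, and whether the parser normalizes names itself is not stated in this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NoAutoCloseTagsSketch {

    /**
     * Trims and upper-cases tag names using a locale-independent conversion.
     * Hypothetical helper for illustration only.
     */
    static List<String> toUpperCaseTagNames(List<String> tagNames) {
        List<String> result = new ArrayList<>(tagNames.size());
        for (String name : tagNames) {
            // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[DIV, SPAN, P]"
        System.out.println(toUpperCaseTagNames(Arrays.asList("div", " Span", "P")));
    }
}
```

The resulting list could then be passed to setNoAutoCloseTags on a parser instance.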
    • Constructor Detail

      • CmsHtmlParser

        public CmsHtmlParser()
        Creates a new instance of the HTML parser with echo mode set to false.

      • CmsHtmlParser

        public CmsHtmlParser(boolean echo)
        Creates a new instance of the HTML parser with the given echo mode.

        Parameters:
        echo - indicates whether "echo" mode is on, that is, whether all content is written to the result by default
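
To make the effect of the echo flag concrete, here is a minimal, self-contained sketch of a visitor that either copies everything it sees to a result buffer or writes nothing by default. MiniVisitor and its methods are hypothetical stand-ins that only mirror the names used on this page (an echo flag, a result buffer, visit callbacks, getResult); they do not depend on OpenCms or org.htmlparser, and the real class may behave differently in detail.

```java
final class EchoModeSketch {

    /** Hypothetical stand-in for a visitor with an echo flag and a result buffer. */
    static class MiniVisitor {

        protected final boolean echo;
        protected final StringBuilder result = new StringBuilder();

        MiniVisitor(boolean echo) {
            this.echo = echo;
        }

        /** Called for tags; copies the tag markup only when echo mode is on. */
        void visitTag(String tagHtml) {
            if (echo) {
                result.append(tagHtml);
            }
        }

        /** Called for text nodes; copies the text only when echo mode is on. */
        void visitText(String text) {
            if (echo) {
                result.append(text);
            }
        }

        String getResult() {
            return result.toString();
        }
    }

    public static void main(String[] args) {
        MiniVisitor echoing = new MiniVisitor(true);
        echoing.visitTag("<p>");
        echoing.visitText("hi");
        echoing.visitTag("</p>");
        System.out.println(echoing.getResult()); // prints "<p>hi</p>"

        MiniVisitor silent = new MiniVisitor(false);
        silent.visitTag("<p>");
        silent.visitText("hi");
        System.out.println(silent.getResult().isEmpty()); // prints "true"
    }
}
```

With echo off, a subclass would typically override individual callbacks to write only the content it cares about, while echo on gives a pass-through baseline from which a subclass can filter.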