Package org.opencms.search.extractors
Class CmsExtractorPdf
java.lang.Object
org.opencms.search.extractors.A_CmsTextExtractor
org.opencms.search.extractors.CmsExtractorPdf
- All Implemented Interfaces:
- I_CmsTextExtractor
Extracts the text from a PDF document.
- Since:
- 6.0.0
- 
Method Summary
- extractText: Extracts the text and meta information from the document on the input stream.
- static I_CmsTextExtractor getExtractor: Returns an instance of this text extractor.
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractor
- combineContentItem, extractText, extractText, extractText, extractText, removeControlChars
- 
Method Details
- 
getExtractor
Returns an instance of this text extractor.
- Returns:
- an instance of this text extractor
 
- 
extractText
Description copied from interface: I_CmsTextExtractor
Extracts the text and meta information from the document on the input stream.
The encoding of the input stream is either not required (the document type may have one common default encoding) or the extractor is able to divine the encoding from the provided input stream automatically. Delivers the same result as calling I_CmsTextExtractor.extractText(InputStream, String) with String == null.
- Specified by:
- extractText in interface I_CmsTextExtractor
- Overrides:
- extractText in class A_CmsTextExtractor
- Parameters:
- in - the input stream for the document to extract the text from
- Returns:
- the extracted text and meta information
- Throws:
- Exception - if the text extraction fails
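The contract described above (a static getExtractor factory, and a one-argument extractText that behaves like the two-argument variant called with a null encoding) can be illustrated with a minimal, self-contained sketch. The types TextExtractor, PlainTextExtractor and ExtractorSketch are hypothetical stand-ins, not OpenCms classes. The sketch returns a plain String rather than an extraction result with meta information, it assumes UTF-8 as the default encoding, and it assumes the factory hands out one shared instance, which this page does not state.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for I_CmsTextExtractor; the result type is simplified to String.
interface TextExtractor {
    String extractText(InputStream in) throws Exception;
    String extractText(InputStream in, String encoding) throws Exception;
}

// Hypothetical extractor that only decodes bytes; a real PDF extractor would parse the document.
final class PlainTextExtractor implements TextExtractor {

    // Assumption: one shared instance is returned by the factory method.
    private static final PlainTextExtractor INSTANCE = new PlainTextExtractor();

    private PlainTextExtractor() {
        // instances are obtained through getExtractor()
    }

    static TextExtractor getExtractor() {
        return INSTANCE;
    }

    @Override
    public String extractText(InputStream in) throws Exception {
        // Same result as the two-argument variant with encoding == null.
        return extractText(in, null);
    }

    @Override
    public String extractText(InputStream in, String encoding) throws IOException {
        // Assumption: fall back to UTF-8 when no encoding is given.
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        return new String(in.readAllBytes(), charset);
    }
}

class ExtractorSketch {
    public static void main(String[] args) throws Exception {
        byte[] doc = "Hello, OpenCms".getBytes(StandardCharsets.UTF_8);
        TextExtractor extractor = PlainTextExtractor.getExtractor();

        String viaDefault = extractor.extractText(new ByteArrayInputStream(doc));
        String viaNull = extractor.extractText(new ByteArrayInputStream(doc), null);

        // Both calls are expected to yield the same text.
        System.out.println(viaDefault.equals(viaNull));
    }
}
```

With the real class, the equivalent call would presumably go through CmsExtractorPdf.getExtractor() and pass an input stream over the PDF bytes; any failure surfaces as the Exception listed under Throws.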