Interface DocumentExtractor


public interface DocumentExtractor
Extracts a document from a file into a Lucene document.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String[]
    DOCUMENTS_ALL_FIELDS
    All document fields of the 'autumo Documents' module.
    static final String[]
    DOCUMENTS_SOURCE_FIELDS
    Document source fields of the 'autumo Documents' module.
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.lucene.document.Document
    extract(File file)
    Extracts a document into the Lucene fields listed in DOCUMENTS_SOURCE_FIELDS.
  • Field Details

    • DOCUMENTS_SOURCE_FIELDS

      static final String[] DOCUMENTS_SOURCE_FIELDS
      Document source fields of the 'autumo Documents' module.
    • DOCUMENTS_ALL_FIELDS

      static final String[] DOCUMENTS_ALL_FIELDS
      All document fields of the 'autumo Documents' module: the file name and file extension, followed by the source fields.
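
The relationship between the two arrays can be illustrated with a minimal sketch. It assumes the field names: "fileName", "fileExtension", and the three source field names are hypothetical placeholders, because this page does not list the actual values. Only the ordering (file name and file extension first, source fields after) comes from the description above.

```java
import java.util.Arrays;

public class FieldLayoutSketch {

    // Hypothetical source field names; the real values are not given on this page.
    static final String[] SOURCE_FIELDS = {"title", "author", "content"};

    // Builds an "all fields" array: two file-related fields (hypothetical names)
    // placed before the given source fields.
    static String[] withFileFields(String[] sourceFields) {
        String[] all = new String[sourceFields.length + 2];
        all[0] = "fileName";      // assumed name
        all[1] = "fileExtension"; // assumed name
        System.arraycopy(sourceFields, 0, all, 2, sourceFields.length);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withFileFields(SOURCE_FIELDS)));
    }
}
```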
  • Method Details

    • extract

      org.apache.lucene.document.Document extract(File file) throws Exception
      Extracts a document into the Lucene fields listed in DOCUMENTS_SOURCE_FIELDS.
      Parameters:
      file - the file to extract
      Returns:
      extracted document
      Throws:
      Exception - if extraction fails
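
Because extract declares the checked type Exception, every caller has to handle failures explicitly. The following is a sketch of one possible calling pattern, not part of this API: Extractor<D> is a hypothetical stand-in that mirrors the signature of extract(File), with the type parameter D in place of org.apache.lucene.document.Document so that the example compiles without Lucene on the classpath. The helper extractAll and the policy of skipping failed files are likewise assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractCallSketch {

    // Hypothetical stand-in with the same shape as DocumentExtractor.extract(File).
    interface Extractor<D> {
        D extract(File file) throws Exception;
    }

    // Extracts every file in order and skips the ones whose extraction fails.
    static <D> List<D> extractAll(Extractor<D> extractor, List<File> files) {
        List<D> docs = new ArrayList<>();
        for (File file : files) {
            try {
                docs.add(extractor.extract(file));
            } catch (Exception e) {
                // One bad file should not abort the whole batch in this sketch.
                System.err.println("Skipping '" + file + "': " + e.getMessage());
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Toy extractor: returns the file name, fails on an empty name.
        Extractor<String> byName = file -> {
            if (file.getName().isEmpty()) {
                throw new IllegalArgumentException("no file name");
            }
            return file.getName();
        };
        List<File> files = Arrays.asList(new File("report.pdf"), new File(""));
        System.out.println(extractAll(byName, files));
    }
}
```

With a real implementation, the same loop would call DocumentExtractor.extract(file) and collect Lucene Document instances. Whether to skip, retry, or abort on failure depends on the application.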