Apache POI - Encryption support
Overview
Apache POI supports reading a few variants of encrypted Office files:
- XLS - RC4 Encryption
- XML-based formats (XLSX, DOCX, etc.) - AES and Agile Encryption
Some "write-protected" files are encrypted with a built-in password; POI can read these files too.
XLS
When HSSF receives an encrypted file, it tries to decrypt it with the MS Office built-in password.
To supply a different password, call the static method setCurrentUserPassword(String password) of
org.apache.poi.hssf.record.crypto.Biff8EncryptionKey before reading the file. It sets a thread-local variable,
so do not forget to reset it to null after text extraction.
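Because the password lives in a thread-local, the safest pattern is to set it, do the work, and clear it in a finally block. The sketch below shows that discipline in plain Java. CurrentPassword and withPassword are hypothetical names; CurrentPassword stands in for Biff8EncryptionKey so the example runs without POI on the classpath.

```java
import java.util.function.Supplier;

/** Sketch of the set-then-reset discipline for a thread-local password. */
public class ThreadLocalPasswordDemo {

    /** Hypothetical stand-in for Biff8EncryptionKey's thread-local password. */
    static final class CurrentPassword {
        private static final ThreadLocal<String> VALUE = new ThreadLocal<>();

        static void set(String password) {
            if (password == null) {
                VALUE.remove(); // clear the entry for this thread
            } else {
                VALUE.set(password);
            }
        }

        static String get() {
            return VALUE.get();
        }
    }

    /** Runs the task with the password set and always clears it afterwards. */
    static <T> T withPassword(String password, Supplier<T> task) {
        CurrentPassword.set(password);
        try {
            return task.get();
        } finally {
            // Reset even if the task throws, so later work on this
            // (possibly pooled) thread does not inherit the password.
            CurrentPassword.set(null);
        }
    }

    public static void main(String[] args) {
        String seen = withPassword("secret", CurrentPassword::get);
        System.out.println(seen);                  // prints secret
        System.out.println(CurrentPassword.get()); // prints null
    }
}
```

With POI, the set and reset calls would be Biff8EncryptionKey.setCurrentUserPassword(password) and Biff8EncryptionKey.setCurrentUserPassword(null), and the task body would open the HSSFWorkbook. Clearing in finally matters most on pooled threads, which would otherwise carry the password into unrelated work.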
XML-based formats
Encrypted XML-based files are stored in the OLE2 package stream "EncryptedPackage". Use org.apache.poi.poifs.crypt.Decryptor
to decrypt the file:
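import java.io.InputStream;
import java.security.GeneralSecurityException;
import org.apache.poi.poifs.crypt.Decryptor;
import org.apache.poi.poifs.crypt.EncryptionInfo;

// filesystem is the POIFSFileSystem opened on the encrypted file
EncryptionInfo info = new EncryptionInfo(filesystem);
Decryptor d = Decryptor.getInstance(info);

try {
    // verifyPassword() returns false when the password is wrong
    if (!d.verifyPassword(password)) {
        throw new RuntimeException("Unable to process: wrong password for encrypted document");
    }
    // the stream yields the decrypted OOXML (ZIP) package
    InputStream dataStream = d.getDataStream(filesystem);
    // parse dataStream
} catch (GeneralSecurityException ex) {
    throw new RuntimeException("Unable to process encrypted document", ex);
}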
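Since encrypted XML-based files are wrapped in an OLE2 compound document (the container holding the EncryptedPackage stream) instead of being stored as a plain ZIP package, their first bytes differ from those of an unencrypted XLSX or DOCX. The following is a minimal sketch, not a POI API: ContainerSniffer is a hypothetical class that compares the standard OLE2 and ZIP signatures (it needs Java 11+ for readNBytes).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Sketch: tell an OLE2 compound file from a plain ZIP by its leading bytes. */
public class ContainerSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 (Compound File Binary) signature
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Reads up to 8 bytes from the stream and classifies the container. */
    static Container sniff(InputStream in) throws IOException {
        byte[] header = in.readNBytes(8); // Java 11+
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(sniff(new ByteArrayInputStream(zipStart))); // prints ZIP
    }
}
```

An OLE2 result alone does not prove the file is encrypted OOXML, because a legacy XLS file is an OLE2 document too; the presence of the EncryptedPackage stream is the actual marker.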
To read a file encrypted with the built-in password, pass Decryptor.DEFAULT_PASSWORD to verifyPassword().
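A common pattern is to try the user's password first and fall back to the built-in one. This sketch assumes a hypothetical PasswordCheck interface whose single method has the same shape as Decryptor.verifyPassword (it may throw GeneralSecurityException), so a real Decryptor can be passed as d::verifyPassword. The literal below is the value of Decryptor.DEFAULT_PASSWORD; real code should reference the constant instead.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;

/** Sketch: try a caller-supplied password first, then the built-in one. */
public class PasswordFallback {

    /** Hypothetical interface shaped like Decryptor.verifyPassword. */
    @FunctionalInterface
    interface PasswordCheck {
        boolean verify(String password) throws GeneralSecurityException;
    }

    // Value of Decryptor.DEFAULT_PASSWORD; use the constant in real code.
    static final String BUILT_IN_PASSWORD = "VelvetSweatshop";

    /** Returns the first candidate that verifies, or null if none does. */
    static String firstMatching(String userPassword, PasswordCheck check)
            throws GeneralSecurityException {
        // Arrays.asList tolerates a null userPassword (List.of would not)
        List<String> candidates = Arrays.asList(userPassword, BUILT_IN_PASSWORD);
        for (String candidate : candidates) {
            if (candidate != null && check.verify(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Fake check that only accepts the built-in password
        PasswordCheck fake = BUILT_IN_PASSWORD::equals;
        System.out.println(firstMatching(null, fake)); // prints VelvetSweatshop
    }
}
```

With POI this would read String pw = PasswordFallback.firstMatching(userPassword, d::verifyPassword); a null result means neither password matched, and pw can then be used before calling d.getDataStream(filesystem).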
by Maxim Valyanskiy