Tuesday, April 14, 2015

Managing encrypted files

Imagine a system that generates a number of small .pdf or .xlsx reports and stores them on disk. The requirement is to store those reports as encrypted files. The original file must never hit the disk, so all encryption must be done in memory.

The general workflow is as follows: some external method generates the file as a byte array. This byte array is passed to a module that generates a random token for the file, stores the token in the database, encrypts the file, and writes it to disk. Decryption consists of reading the bytes from the file on disk, retrieving the token for this file from the database, and decrypting the byte array using that token:

[Figure: file encryption workflow]
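
The workflow described above can be sketched roughly as follows. This is a simplified illustration, not the module built in this post: the class name EncryptedFileStoreSketch, the in-memory dictionary standing in for the database, and the example salt are all assumptions made for the sketch, and file names are left unencrypted to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of the save/load workflow; names are illustrative only.
public class EncryptedFileStoreSketch
{
    // Stand-in for the database table that maps a file name to its token.
    private readonly Dictionary<string, string> _tokens = new Dictionary<string, string>();
    private readonly string _directory;

    // Example salt for key derivation (an assumption for this sketch).
    private static readonly byte[] Salt = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

    public EncryptedFileStoreSketch(string directory)
    {
        _directory = directory;
    }

    // Generates a token, remembers it, encrypts in memory, writes only ciphertext to disk.
    public void Save(string fileName, byte[] content)
    {
        string token = Guid.NewGuid().ToString();
        _tokens[fileName] = token;
        byte[] encrypted = Transform(content, token, true);
        File.WriteAllBytes(Path.Combine(_directory, fileName), encrypted);
    }

    // Reads ciphertext from disk, looks up the token, decrypts in memory.
    public byte[] Load(string fileName)
    {
        byte[] encrypted = File.ReadAllBytes(Path.Combine(_directory, fileName));
        return Transform(encrypted, _tokens[fileName], false);
    }

    private static byte[] Transform(byte[] data, string token, bool encrypt)
    {
        using (var kdf = new Rfc2898DeriveBytes(token, Salt, 1000))
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Key = kdf.GetBytes(32); // 256-bit key derived from the token
            aes.IV = kdf.GetBytes(16);  // 128-bit IV derived from the token
            using (var transform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
            {
                return transform.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }
}
```

Calling Save("report.pdf", bytes) followed by Load("report.pdf") should return the original bytes, while the file on disk holds only ciphertext.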

This is how it looks in the database:

[Figure: encrypted files in database]

And this is what we get on disk:

[Figure: encrypted files on disk]

If a user has access to both the database and the source code, decrypting a file is trivial. However, a user who has only database access or only file storage access cannot do much with the file tokens or the encrypted files on their own. The main goal here is to hide information from people who have access to the file storage.

Basically, to achieve this we need tools to:
1. Encrypt / Decrypt byte array
2. Encrypt / Decrypt string (for file names)
3. Generate random string
4. Generate file token

For encryption let's use AesManaged (System.Security.Cryptography). This is a managed implementation of the Advanced Encryption Standard (AES), a symmetric block cipher with a 128-bit block size.

MSDN says that this particular implementation throws a CryptographicException if FIPS mode is enabled in Windows. FIPS stands for Federal Information Processing Standards, a set of US government standards that, among other things, defines approved cryptographic algorithms. However, Microsoft itself no longer seems sure whether FIPS mode should be enabled.
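
One way to cope with the FIPS restriction is to choose the AES implementation at run time. The sketch below is only an illustration of that idea: the AesFactorySketch class is hypothetical and not part of the code in this post. It relies on CryptoConfig.AllowOnlyFipsAlgorithms and on AesCryptoServiceProvider, which wraps the operating system's AES implementation.

```csharp
using System.Security.Cryptography;

// Hypothetical helper: picks an AES implementation depending on the FIPS setting.
public static class AesFactorySketch
{
    public static Aes Create()
    {
        // True when the machine is configured to allow only FIPS-certified algorithms.
        if (CryptoConfig.AllowOnlyFipsAlgorithms)
        {
            return new AesCryptoServiceProvider();
        }

        return new AesManaged();
    }
}
```

Both classes derive from Aes, so the calling code can stay the same. Aes.Create() is another option, since it lets the platform choose an implementation.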

First of all, we need to implement a token that will be used by the AES encryptor:

using System.Security.Cryptography;

public enum AesKeySize
{
    Key128 = 128,
    Key192 = 192,
    Key256 = 256
}

public enum OperationType
{
    Encrypt,
    Decrypt
}

public class AesToken
{
    // 32-byte salt for generating key and vector bytes
    private readonly byte[] _salt = { 237, 244, 226, 149, 139, 150, 201, 43, 51, 33, 215, 13, 129, 2, 148, 136, 93, 216, 58, 152, 44, 143, 46, 182, 254, 97, 134, 60, 5, 255, 42, 12 };

    // number of iterations used to generate key/vector bytes
    private const int _iterations = 1000;

    // 128 bits is the only legal block size for AES
    public int BlockSize = 128;

    public CipherMode CipherMode = CipherMode.CBC;

    public AesKeySize KeySize { get; private set; }
    public byte[] Key { get; private set; }
    public byte[] Vector { get; private set; }

    /// <summary>
    /// Generates token with Key and IV for AES encryption
    /// </summary>
    /// <param name="token">Unique token used to encrypt/decrypt data (i.e. if encrypting files: token should be randomly generated for each file)</param>
    /// <param name="keySize">AES encryption key length</param>
    public AesToken(string token, AesKeySize keySize = AesKeySize.Key256)
    {
        KeySize = keySize;

        // PBKDF2: the first GetBytes call yields the key, the second one yields the IV
        using (var keyGenerator = new Rfc2898DeriveBytes(token, _salt, _iterations))
        {
            Key = keyGenerator.GetBytes((int)keySize / 8);
            Vector = keyGenerator.GetBytes(BlockSize / 8);
        }
    }
}

This token contains the AES encryption settings (cipher mode, key, and IV) along with a bit of logic to generate the key and vector bytes from a unique string (password).
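
The important property of this derivation is that it is deterministic: the same string and salt always produce the same key and IV, which is what makes decryption possible later. Here is a minimal sketch of that behaviour, where the KeyDerivationSketch class and its Derive method are hypothetical names used only for illustration:

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper that mirrors the key/IV derivation used by the token.
public static class KeyDerivationSketch
{
    // Returns (key, iv): a 256-bit key and a 128-bit IV derived from the password.
    public static Tuple<byte[], byte[]> Derive(string password, byte[] salt, int iterations = 1000)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations))
        {
            byte[] key = kdf.GetBytes(32); // first 32 bytes of the derived stream
            byte[] iv = kdf.GetBytes(16);  // next 16 bytes of the same stream
            return Tuple.Create(key, iv);
        }
    }
}
```

Deriving twice from the same password and salt gives identical bytes, while a different password gives a different key and IV.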

CBC (Cipher Block Chaining) mode means that each block of data is XOR-ed with the previous ciphertext block before being encrypted. As a result, identical plaintext blocks produce different ciphertext blocks, so patterns in the data do not show through in the output.

The IV (initialization vector) is XOR-ed with the first block of input data, so that the same data encrypted with a different IV yields a different encrypted sequence.
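
Both effects are easy to observe. The sketch below is an illustration only (CbcIvSketch is a hypothetical name, and it uses Aes.Create() rather than the classes from this post): it encrypts a buffer in CBC mode with a caller-supplied key and IV.

```csharp
using System.Security.Cryptography;

// Hypothetical helper for experimenting with CBC mode and different IVs.
public static class CbcIvSketch
{
    public static byte[] EncryptCbc(byte[] plain, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            aes.Key = key; // 16, 24 or 32 bytes
            aes.IV = iv;   // 16 bytes, one AES block

            using (var encryptor = aes.CreateEncryptor())
            {
                return encryptor.TransformFinalBlock(plain, 0, plain.Length);
            }
        }
    }
}
```

Encrypting 32 zero bytes (two identical blocks) gives two different ciphertext blocks, and changing a single byte of the IV changes the whole output.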

Now let's implement the encryption module, which takes our token and a byte array and performs encryption or decryption:

using System.IO;
using System.Security.Cryptography;

public class AesEncryptor
{
    private static byte[] ProcessData(OperationType operationType, byte[] source, AesToken token)
    {
        bool isEncrypt = operationType == OperationType.Encrypt;

        using (var aes = new AesManaged())
        {
            // set mode and sizes first, then key and IV, so that a size change cannot discard them
            aes.Mode = token.CipherMode;
            aes.BlockSize = token.BlockSize;
            aes.KeySize = (int)token.KeySize;
            aes.Key = token.Key;
            aes.IV = token.Vector;

            using (var transform = isEncrypt ? aes.CreateEncryptor(aes.Key, aes.IV) : aes.CreateDecryptor(aes.Key, aes.IV))
            using (var ms = new MemoryStream())
            {
                using (var cs = new CryptoStream(ms, transform, CryptoStreamMode.Write))
                {
                    cs.Write(source, 0, source.Length);
                }

                // the CryptoStream is disposed at this point, so the final padded block has been flushed
                return ms.ToArray();
            }
        }
    }

    public static byte[] Encrypt(byte[] source, AesToken token)
    {
        return ProcessData(OperationType.Encrypt, source, token);
    }

    public static byte[] Decrypt(byte[] source, AesToken token)
    {
        return ProcessData(OperationType.Decrypt, source, token);
    }
}

Finally, we need a couple of helpers for string encryption and for generating tokens and random strings.

using System;
using System.Text;

public static class EncryptHelper
{
    // no need for strong encryption
    private static readonly AesToken token = new AesToken("f9LWDBQFX7nw4t2ilWud3aj8gTEpAxneZzmqHRfkXnRDLBnGoyst3etFZNzu7kFk");

    public static string EncryptString(string source)
    {
        var bytes = Encoding.UTF8.GetBytes(source);
        var encBytes = AesEncryptor.Encrypt(bytes, token);

        // replace '/' in case this string will be used as a file name
        // ('_' is not part of the Base64 alphabet, so the replacement is reversible)
        return Convert.ToBase64String(encBytes).Replace('/', '_');
    }

    public static string DecryptString(string source)
    {
        // restore '/' before decoding
        var bytes = Convert.FromBase64String(source.Replace('_', '/'));
        var decBytes = AesEncryptor.Decrypt(bytes, token);

        return Encoding.UTF8.GetString(decBytes);
    }

    public static string GenerateRandomString()
    {
        return Guid.NewGuid().ToString();
    }

    public static string GenerateFileToken(string fileName)
    {
        // random prefix makes the token unique even for files with the same name
        var random = GenerateRandomString();
        return EncryptString(random + fileName);
    }
}
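
A note on the random string: Guid.NewGuid() is designed for uniqueness rather than unpredictability, so if the tokens need to be hard to guess, a cryptographic random number generator may be a better fit. The following is a sketch of such an alternative, not the helper used in this post. RandomTokenSketch and GenerateRandomToken are hypothetical names, and the 32-byte default is an arbitrary choice.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical alternative to GenerateRandomString based on a cryptographic RNG.
public static class RandomTokenSketch
{
    public static string GenerateRandomToken(int byteCount = 32)
    {
        var bytes = new byte[byteCount];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes); // fills the array with cryptographically strong random bytes
        }

        // file-name-safe Base64: replace '+' and '/', drop the '=' padding
        return Convert.ToBase64String(bytes)
            .Replace('+', '-')
            .Replace('/', '_')
            .TrimEnd('=');
    }
}
```

With the default of 32 bytes the result is a 43-character string that is safe to use in file names.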

Once again, this is about encrypting small files, like finance reports or medical cards. Large files take a long time to process, and very large ones will throw an OutOfMemoryException, because the whole file is held in memory as a byte array.
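
If large files ever become a requirement, one possible direction is to stream the data through a CryptoStream instead of holding it all in a byte array. The sketch below is an assumption about how that could look, not part of the module described here: StreamingEncryptionSketch is a hypothetical name, and the key and IV are passed in directly rather than taken from an AesToken. The plaintext comes from an arbitrary source stream, so it still never has to be written to disk unencrypted.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical streaming variant: data is processed in chunks, not as one array.
public static class StreamingEncryptionSketch
{
    public static void EncryptToFile(Stream source, string path, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Key = key;
            aes.IV = iv;

            using (var encryptor = aes.CreateEncryptor())
            using (var output = new FileStream(path, FileMode.Create, FileAccess.Write))
            using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            {
                // copies in buffer-sized chunks; disposing crypto writes the final padded block
                source.CopyTo(crypto);
            }
        }
    }

    public static void DecryptFromFile(string path, Stream destination, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Key = key;
            aes.IV = iv;

            using (var decryptor = aes.CreateDecryptor())
            using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
            using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
            {
                crypto.CopyTo(destination);
            }
        }
    }
}
```

Memory use then depends on the copy buffer size rather than on the file size.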

Here are some performance results:

Operation   File size (MB)   File count   Total time (seconds)
Encrypt                  1            1                  0.05
Encrypt                  6            1                 0.299
Encrypt                100            1                 5.082
Decrypt                  1            1                 0.059
Decrypt                  6            1                 0.413
Decrypt                100            1                 5.889
Encrypt                  1          100                 5.245
Encrypt                  6          100                30.823
Encrypt                100          100               510.987
Decrypt                  1          100                 5.832
Decrypt                  6          100                34.693
Decrypt                100          100               569.736

As you can see, it takes a long time to process one hundred 100 MB files. However, processing 1 MB files is relatively fast, and in this particular case 1 MB was already considered a large file.
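
Measurements of this kind can be reproduced with a Stopwatch. The sketch below is not the benchmark that produced the numbers above; it is a hypothetical helper (EncryptionTimingSketch) that times repeated in-memory AES encryption of a random buffer, and results will vary with hardware and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hypothetical timing helper; not the benchmark used for the table above.
public static class EncryptionTimingSketch
{
    public static TimeSpan TimeEncrypt(int sizeInMegabytes, int count)
    {
        var data = new byte[sizeInMegabytes * 1024 * 1024];
        new Random(42).NextBytes(data); // arbitrary test payload

        var stopwatch = new Stopwatch();
        using (var aes = Aes.Create())
        {
            aes.GenerateKey();
            aes.GenerateIV();

            stopwatch.Start();
            for (int i = 0; i < count; i++)
            {
                using (var encryptor = aes.CreateEncryptor())
                {
                    encryptor.TransformFinalBlock(data, 0, data.Length);
                }
            }
            stopwatch.Stop();
        }

        return stopwatch.Elapsed;
    }
}
```

For example, TimeEncrypt(1, 100) measures the time needed to encrypt one hundred 1 MB buffers.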

Source code (with a couple of unit tests for workflow) is available here.
