# data-dispersal

Space-efficient and privacy-preserving data dispersal algorithms.

Latest on Hackage: | 1.0.0.2 |

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.

**Peter Robinson**

**peter.robinson@monoid.at**

Given a ByteString of length `D`

, we encode the ByteString as a list of `n`

`Fragment`

s, each containing a ByteString
of length `O(D/m)`

. Then, each fragment could be stored on a separate
machine to obtain fault-tolerance:
Even if all but `m`

of these machines crash, we can still reconstruct the original
ByteString out of the remaining `m`

fragments.
Note that the total space requirement of the `m`

fragments is `m * O(D/m)=O(D),`

which is clearly space-optimal.
The total space required for the n fragments is `O((n/m)*D)`

.
Note that `m`

and `n`

can be chosen to be of the same order, so the
asymptotic storage overhead for getting good fault-tolerance increases only by
a constant factor.

*GHCi Example:*

```
> :m + Data.IDA
> let msg = Data.ByteString.Char8.pack "my really important data"
> let fragments = encode 5 15 msg
-- Now we could distributed the fragments on different sites to add some
-- fault-tolerance.
> let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed
-- Let's look at the 5 fragments that we have left:
> mapM_ (Prelude.putStrLn . show) frags'
(6,[273,771,899,737,285])
(7,[289,939,612,285,936])
(8,[424,781,1001,322,788])
(9,[143,657,790,157,423])
(10,[314,674,418,888,423])
-- Space-efficiency: Note that the length of each of the 5 fragments is 5
-- and our original message has length 24.
> decode frags'
"my really important data"
```

*Encrypted Fragments:*

The module `Data.IDA`

contains an information dispersal algorithm that produces
space-optimal fragments. However, the knowledge of 1 or more fragments might
allow an adversary to deduce some information about the original data.
The module `Crypto.IDA`

combines information dispersal with
secret sharing: the knowledge of up to `m-1`

fragments does not leak any
information about the original data.

This could be useful in scenarios where we need to store data at untrusted
storage sites: To this end, we store one encrypted fragment at each site.
If at most `m-1`

of these untrusted sites collude, they will still
be unable to obtain any information about the original data.
The added security comes at the price of a slightly
increased fragment size (by an additional constant 32 bytes) and an
additional overhead in the running time of the encoding/decoding process.
The algorithm is fully described in module `Crypto.IDA`

.

*Fault-Tolerance:*

Suppose that we have `N`

machines and encode our data as `2log(N)`

fragments
with reconstruction threshold m = `log(N)`

.
Let's assume that we store each fragment on a separate machine and each
machine fails (independently) with probability at most 0.5.

What is the probability of our data being safe?

`Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).`

What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size

`O(D/m)`

, so the total space is`O(n D/ m) = 2D.`

In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.

"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.

"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146