Outline ======= - Symmetric-key crypto - MACs - Encryption: One-time pad - Generating random bits - Stream ciphers: RC4 Symmetric-key cryptography ========================== Typical setting: communication between two parties, Alice and Bob. Adversary in the network: passive eavesdropper or active tampering. Alice ------?--------------------?----> Bob | ^ | | \----> Adversary ----/ Goal: communicate securely (secrecy, integrity, ..) despite adversary. Approach: Alice and Bob know some secret that adversary doesn't know. For now, will talk about symmetric crypto: Alice and Bob share same secret. MAC === Assume that the adversary can observe and modify messages on the network. How can Bob determine if some message really came from Alice? Near-fit: hash functions. Secret is the hash function itself. Construction often called a Message Authentication Code (MAC). Looks like (but important differences from) digital signatures: Both parties can compute MAC, verify it's equal. Only one party usually can compute digital signature. Does a MAC guarantee that Alice sent message? No: beware of replays. Goal: prevent adversary from computing MAC for message Alice never sent. A hash function alone is not practical. Recall: open design principle. Hard to design a hash function for every pair of parties. Hard to design a new hash function when the old one has been discovered. Much more practical: secrets are keys: strings of bits. Where do the keys fit into a hash function? If we had a random oracle, can compute MAC(k, m) = h(k || m) Need TCR, NM. OW not important (adversary knows m). Why is this dangerous in practice? Recall: Merkle-Damgard construction, hash functions are chained Can be easy to append new input blocks without knowing earlier input blocks Thus, attacker could generate valid MAC for m' = m || x The other way around, h(m||k), suffers from other problems: any collision in h means collision in MAC just by appending key so h has to be CR, not just TCR HMAC tries to alleviate these problems HMAC(k, m) = h(k1 || h(k2 || m)) k1 = k XOR opad k2 = k XOR ipad opad/ipad are constants that make k1 and k2 different in many bits Encryption ========== Goal is now secrecy: prevent adversary from learning secret messages Alice and Bob generate a secret key: Keygen(length) -> k [randomized] Alice sends c=E_k(m) to Bob [randomized or deterministic] Bob can decrypt m=D_k(c) [deterministic] Adversary shouldn't be able to tell E_k(m1) from E_k(m2), assuming |m1| = |m2| Many possible attack scenarios / threat models: known ciphertext known plaintext (and ciphertext) chosen plaintext chosen ciphertext (latter three assume key reuse) One-Time Pad ============ Simple and provably secure encryption mechanism. Invented by Gilbert Vernam in 1917 (using paper tapes). Suppose we have a long pad of random bits, same length as our message; then: Message = 1001110... Pad = 0101101... ------------------------ c = m XOR p = 1100011... Pad = 0101101... ------------------------ m = c XOR p = 1001110... Requirements: pad is random, secret, and used only once. Allegedly used by spies; military comm. systems using identical turntable "vinyl records" with random noise that's added at one end and subtracted at the other end. Provably secure: P(m) = apriori probability that m was sent (i.e., adversary's guess) P(pad) = probability of a given pad sequence = 2^-n for n-bit pad P(c | m) = prob. of observing ciphertext c, given message m = prob. that pad is exactly c XOR m = 2^-n P(c) = probability of observing a given ciphertext = Sum_{all m} P(c | m) * P(m) = Sum_{all m} 2^-n * P(m) = 2^-n * Sum{all_m} P(m) = 2^-n => ciphertexts are uniform -- any ciphertext is equally likely P(m | c) = prob. of message m given ciphertext c = P(c | m) * P(m) / P(c) [ Bayes' rule ] = 2^-n * P(m) / 2^-n = P(m) => attacker learns nothing from ciphertext One-time pad is unconditionally secure: no assumptions about adversary information-theoretically secure: brute force cannot help compare: computationally secure: assume computationally-limited adversary hash functions, MACs, most other cryptosystems Why one-time? Two ciphertexts with a reused pad can be XORed to produce XOR of plaintexts XOR of plaintexts is a lot of information Why not use OTP in practice? Requires a continuous stream of secret, random bits shared with right party Encryption + MAC ================ OTP is malleable: attacker can XOR ciphertext bits, leading to XOR in plaintext Should we encrypt or MAC first? Encrypt-then-MAC is provably secure MAC-then-Encrypt can be insecure: chosen-ciphertext attacks e.g. consider strange encryption scheme: first map 0 -> 00, 1 -> 01 or 10 (random) then encrypt with a one-time pad (twice the size) attacker flips two adjacent bits; if message is rejected, bit was 0 Generating random bits ====================== Where do you get random bits from? single user coin flip, dice radioactive sources microphone disk speed variations hardware RNGs: intel i82802, IBM crypto processor, TPM user inputs: mouse, keyboard timing lava lamp anonymous communication: alpern & schneider suppose adversary cannot tell who sent message Alice and Bob randomly send bits they know who sent each bit, but adversary does not can generate a shared secret based on that information e.g. pad = sent-bit if Alice sent it, or !sent-bit if Bob sent it satellite-based satellite streaming a high-bandwidth source of random bits Eve might not have enough memory to remember all the random bits Alice and Bob record some subset of random bits, later find overlap quantum key distribution alice sends "qubits" in one of 4 possible orientations: | / - \ bob measures using a + filter (vertical/horizontal) or X (45/135 degree) sent bit + filter X filter | | noise - - noise / noise / \ noise \ afterwards, Bob announces all of the measurement choices, uses non-noise one way to imagine: stronger version of the satellite-based system physics prevents adversary from learning/remembering all information pseudo-randomness risky proposition commonly used to pump up the (expensive) initial randomness Stream ciphers: RC4 =================== takes a fixed-size seed (key) input, generates arbitrary # of pseudo-random bits effectively a cryptographically-secure PRNG goal: adversary can't distinguish truly random one-time pad from stream cipher RC4: takes as input a 256-bit key internal state is a 256-entry table S, which is a permutation of 0..255 table S initialized from key PRNG stream generation (1 byte at a time): byte S[256] byte i = 0 byte j = 0 ## i, j are indexes into table S while True: i = i + 1 ## all addition mod 256 j = j + S[i] swap S[i], S[j] output S[S[i] + S[j]] widely used, fast (400MB/sec on my laptop) setup of S from key is weak, discard first 1024 bytes same warnings apply as OTP: reusing seed (key) can lead to problems, since it deterministically generates the same pseudo-random pad