Cryptography for Penetration Testers – OWASP AppSec NYC 2008
This presentation, “Cryptography for Penetration Testers,” was given by Chris Eng, Senior Director of Security Research at Veracode.
The Premise
How much do you really have to know about cryptography in order to detect and exploit crypto weaknesses in web apps?
Goals
- Learn basic techniques for identifying and analyzing cryptographic data
- Learn black-box heuristics for recognizing weak crypto implementations
- Apply techniques
The Crypto that Matters in 6 Short Slides
Types of Ciphers
- Block Ciphers: Operate on fixed-length groups of bits, called blocks. Block sizes vary depending on the algorithm. There are several different modes of operation for encrypting messages longer than the basic block size. Example ciphers include DES, 3DES, Blowfish, and AES
- Stream Ciphers: Operate on plaintext one bit (or byte) at a time
Block Ciphers: Electronic Code Book (ECB) Mode
- Fixed-size blocks of plaintext are encrypted independently
- Each plaintext block is substituted with a ciphertext block, like a codebook
- Weaknesses: Structure in plaintext is reflected in ciphertext. Ciphertext blocks can be modified without detection.
Block Ciphers: Cipher Block Chaining (CBC) Mode
- Each block of plaintext is XORed with the previous ciphertext block before being encrypted
- A change to the message affects all following ciphertext blocks
- An Initialization Vector (IV) takes the place of the previous ciphertext block when encrypting the first block
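The difference between the two modes can be sketched in a few lines of Python. The block cipher below is a toy 4-round Feistel network built on SHA-256 with an 8-byte block. It is an assumption made purely for illustration (not a real cipher and not something shown in the talk), and the key, IV, and plaintext are made up.

```python
import hashlib

BLOCK = 8  # toy block size in bytes

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, half: bytes, i: int) -> bytes:
    # Round function: keyed hash truncated to half a block (toy, NOT secure).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    # Four Feistel rounds; the Feistel structure makes this a keyed permutation.
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, right, i))
    return left + right

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # ECB: every block is encrypted independently of the others.
    return b"".join(encrypt_block(key, b) for b in split_blocks(plaintext))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block
    # (the IV for the first block) before being encrypted.
    out, prev = [], iv
    for b in split_blocks(plaintext):
        prev = encrypt_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)

key = b"demo key"
iv = bytes(range(BLOCK))
plaintext = b"c" * 32  # four identical 8-byte blocks, so no padding is needed

ecb_blocks = split_blocks(ecb_encrypt(key, plaintext))
cbc_blocks = split_blocks(cbc_encrypt(key, iv, plaintext))
print("ECB distinct ciphertext blocks:", len(set(ecb_blocks)))
print("CBC distinct ciphertext blocks:", len(set(cbc_blocks)))
```

In ECB the four identical plaintext blocks collapse to a single repeated ciphertext block, which is the structure leak described above. In CBC the chaining hides the repetition.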
Stream Ciphers
- Plaintext message is processed byte by byte (as a stream)
- A key scheduling algorithm generates a keystream from a key and an Initialization Vector (IV); the keystream is combined (XOR) with the plaintext bit by bit
- Encrypt by XORing plaintext with the generated keystream
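As a rough sketch of the XOR construction in Python: the keystream generator here is a stand-in (SHA-256 run in counter mode over the key and IV), chosen only because it fits in a few lines. It is an assumption for illustration and not a real stream cipher such as RC4; the key, IVs, and plaintext are invented.

```python
import hashlib

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy keystream generator: hash of key, IV and a counter (illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def stream_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with the keystream.
    ks = keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = b"session key"
plaintext = b"statement 0000012345"

c1 = stream_crypt(key, b"iv-0001", plaintext)
c2 = stream_crypt(key, b"iv-0002", plaintext)

print(stream_crypt(key, b"iv-0001", c1))  # XORing again recovers the plaintext
print(c1 != c2)  # a different IV yields a different keystream and ciphertext
```

Note that the ciphertext is exactly as long as the plaintext, which is itself a useful hint when deciding whether a token comes from a stream cipher or a block cipher.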
Common Crypto Mistakes
- Insecure cipher mode (usually ECB)
- Inappropriate key reuse
- Poor key selection
- Insufficient key length
- Insecure random number generation
- Proprietary or home-grown encryption algorithms (Don’t do this ever!)
Analysis Techniques
Dealing with Gibberish Data
What do you do when you are pen testing a web application and you encounter data that is not easy to interpret?
- Cookies
- Hidden fields
- Query string parameters
- POST parameters
How random is it?
- Output of cryptographic algorithms should be evenly distributed, given a sufficiently large sample size.
- Tools such as ENT (http://www.fourmilab.ch/random) will calculate entropy per byte, chi-square distribution, arithmetic mean, serial correlation, etc.
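Two of these measurements are easy to approximate by hand. The following Python sketch computes Shannon entropy per byte and a chi-square statistic against a uniform byte distribution. It is not ENT's implementation, only an illustration of the idea, and the sample data is made up.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    # Shannon entropy; 8.0 is the maximum for byte-oriented data.
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def chi_square(data: bytes) -> float:
    # Chi-square statistic against a uniform distribution over 256 byte values.
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

random_sample = os.urandom(65536)  # stands in for good cipher output
text_sample = b"session=alice;role=user;expires=2008-09-25;" * 1500

for name, sample in [("random", random_sample), ("text", text_sample)]:
    print(name, round(entropy_bits_per_byte(sample), 3), round(chi_square(sample), 1))
```

Data that scores close to 8 bits per byte is consistent with encryption (or compression). Markedly lower scores suggest an encoding or a weak scheme. Keep in mind that short tokens give noisy results, which is why a large sample is needed.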
Observe Characteristics
Is the length a multiple of a common block size?
- Indicates that the application may be using a block cipher
Is the length the same as a known hash algorithm?
- For example, MD5 is usually represented as 32 hex characters
- May also indicate the presence of an HMAC
- Still may be worthwhile to hash various permutations of known data in case a simple unkeyed hash is being used
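These length checks, and the brute-force hashing of known values, could be scripted along the following lines. The token, the known values (`alice`, `1001`, `example.com`), the separators, and the helper names are all hypothetical; a real test would substitute whatever data the application exposes.

```python
import hashlib
import itertools

HASH_LENGTHS = {16: "MD5", 20: "SHA-1", 32: "SHA-256"}  # digest sizes in bytes
BLOCK_SIZES = (8, 16)  # e.g. DES/3DES/Blowfish use 8 bytes, AES uses 16

def describe_length(raw: bytes) -> list:
    # Report which length-based heuristics a decoded token matches.
    notes = []
    if len(raw) in HASH_LENGTHS:
        notes.append(f"same length as {HASH_LENGTHS[len(raw)]} (or an HMAC built on it)")
    for size in BLOCK_SIZES:
        if raw and len(raw) % size == 0:
            notes.append(f"multiple of {size}-byte block size")
    return notes

def find_unkeyed_hash(token_hex: str, known_values, separators=("", ":", "|")):
    # Hash ordered combinations of known values and compare against the token.
    target = token_hex.lower()
    for r in range(1, len(known_values) + 1):
        for combo in itertools.permutations(known_values, r):
            for sep in separators:
                candidate = sep.join(combo).encode()
                for name in ("md5", "sha1", "sha256"):
                    if hashlib.new(name, candidate).hexdigest() == target:
                        return name, candidate
    return None

# Hypothetical token: 32 hex characters observed in a cookie.
token = hashlib.md5(b"alice:1001").hexdigest()
print(describe_length(bytes.fromhex(token)))
print(find_unkeyed_hash(token, ["1001", "alice", "example.com"]))
```

A match means the token is a plain unkeyed hash that the tester can recompute for any other user. No match does not rule out a hash; it may be keyed (an HMAC) or built from values the tester has not guessed.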
Stimulus, Response
Does the length of the token change based on the length of some value that you can supply?
For a block cipher, you can determine the block size by incrementing the input one byte at a time and observing when the encrypted output length jumps by multiple bytes (i.e., by the block size).
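A sketch of that probing loop in Python follows. Since there is no real application to talk to here, `simulated_token_length` is a hypothetical stand-in for "submit this input and measure the decoded token"; its field layout and PKCS#7-style padding are assumptions.

```python
def simulated_token_length(user_input: bytes, block_size: int = 8) -> int:
    # Stand-in for the application under test (hypothetical): it wraps the
    # input in fixed fields, pads to a full block, and encrypts. Only the
    # resulting ciphertext length matters for this probe.
    plaintext = b"uid=" + user_input + b";ts=1222300800"
    return (len(plaintext) // block_size + 1) * block_size  # PKCS#7-style padding

def detect_block_size(length_oracle, max_input: int = 64):
    # Grow the input one byte at a time until the output length jumps.
    base = length_oracle(b"")
    for n in range(1, max_input + 1):
        grown = length_oracle(b"A" * n)
        if grown > base:
            return grown - base  # size of the jump
    return None  # length never changed within max_input bytes

print(detect_block_size(simulated_token_length))
```

A jump of 8 or 16 points to a block cipher with that block size. A jump of 1 on the very first byte would instead suggest a stream cipher or an unpadded encoding.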
How does the token change in response to user-supplied data?
- Figure out how changing different parts of the input affects the output
- Is more than one block affected by a single character change in the input?
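One way to answer these questions is to diff two tokens block by block. The Python sketch below assumes Base64-encoded tokens and an 8-byte block size; the two sample tokens are fabricated bytes standing in for cookies captured before and after a one-character change to the input.

```python
import base64
from itertools import zip_longest

def split_blocks(data: bytes, size: int) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]

def changed_blocks(token_a: str, token_b: str, block_size: int = 8) -> list:
    # Return the indexes of the blocks that differ between two tokens.
    a = split_blocks(base64.b64decode(token_a), block_size)
    b = split_blocks(base64.b64decode(token_b), block_size)
    return [i for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]

# Fabricated example: only the middle block differs between the two tokens.
before = base64.b64encode(bytes.fromhex("11" * 8 + "aa" * 8 + "33" * 8)).decode()
after = base64.b64encode(bytes.fromhex("11" * 8 + "bb" * 8 + "33" * 8)).decode()

print(changed_blocks(before, after))  # [1]
```

If a single-character change alters exactly one block, the blocks are probably encrypted independently (as in ECB). If that block and every block after it change, a chaining mode such as CBC is more likely.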
Deeper Block Cipher Inspection
Are there any blocks of data that seem to repeat in the same token or over multiple tokens?
- Possibly ECB mode; this doesn’t just happen by coincidence
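Spotting repeats by eye gets tedious with many tokens, so a small script helps. This Python sketch assumes Base64-encoded tokens and an 8-byte block size, and the sample tokens are fabricated for the demonstration.

```python
import base64

def repeated_blocks(tokens, block_size: int = 8) -> dict:
    # Map each repeated block (as hex) to the (token index, block index)
    # positions where it occurs, within one token or across several.
    positions = {}
    for t_index, token in enumerate(tokens):
        raw = base64.b64decode(token)
        for offset in range(0, len(raw) - block_size + 1, block_size):
            block = raw[offset:offset + block_size]
            positions.setdefault(block, []).append((t_index, offset // block_size))
    return {block.hex(): where for block, where in positions.items() if len(where) > 1}

# Fabricated tokens: block A opens both tokens, block B repeats inside the first.
A, B, C, D = (bytes([v]) * 8 for v in (0x11, 0x22, 0x33, 0x44))
tokens = [
    base64.b64encode(A + B + B + C).decode(),
    base64.b64encode(A + D).decode(),
]

for block_hex, where in repeated_blocks(tokens).items():
    print(block_hex, where)
```

With a 64-bit block, identical ciphertext blocks are extremely unlikely to occur by chance in a handful of tokens, so any hit is worth investigating.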
EXAMPLE
Context: A public-facing web portal for a large ISP used an encrypted cookie (the EE cookie) to authenticate identity. A new cookie was issued on each request.
Base64-decoding the EE cookies gave lengths divisible by 8, suggesting 8-byte blocks. Some blocks repeated in the same position from cookie to cookie; the only variable blocks were the last two (possibly a “last accessed” timestamp or similar timeout mechanism).
Next, register a new account with a username of ‘c’ x 32, the maximum length permitted, and observe the value of the EE cookie.
‘c’ x 32 is Perl notation for “cccccccccccccccccccccccccccccccc”
The token is longer, meaning the username is probably stored in the cookie. The repetition in the same position is still there.
Register another account with a username of ‘c’ x 16 and compare its EE cookie to the one generated in the previous step. The cookies did not contain the expected two identical blocks for ‘c’ x 16 and four identical blocks for ‘c’ x 32. The reason is padding: the username doesn’t align perfectly with a block boundary.
The next goal is to figure out where in the cookie the username is located. Additional user accounts were created with specific usernames in order to determine whether there is any initial padding in the first block. Now you know where the username is in the ciphertext.
We were able to subvert the authentication mechanism without any knowledge of the algorithm or the key, based solely on observed patterns in the ciphertext. The root cause was the insecure cipher mode and the lack of a verification mechanism. ECB mode should not be used (use CBC instead).
EXAMPLE
Token values were observed in URLs. They changed every time we logged on to the application and were never the same for any two sessions or any two users.
We Base64-decoded the values of several different “stmt” tokens. Statement numbers were displayed in the browser, so we looked for correlations between statement number and ciphertext. Conclusion: it looks like a stream cipher.
Use XOR to calculate 10 bytes of the keystream based on the known plaintext (i.e., the statement number). Now try the same thing against one of the other collected tokens, such as the one called “Ctxt”. The result is a fragment of ASCII text from which you can infer what the rest of the value says. Use each guess to extend the known keystream, and repeat until you have enough of the keystream to decrypt anything in the application.
Through this iterative process, we can obtain the entire keystream (or rather, a sufficient amount of the keystream to encrypt and decrypt all of the ciphertext we encounter). We can then replace the statement number with another valid statement number and view the contents.
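The iterative recovery can be sketched in Python as follows. Everything concrete here is invented for the demonstration: the random `keystream` plays the part of the application's secret, and the statement number format and the “Ctxt” plaintext are hypothetical.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR two byte strings, truncating to the shorter one.
    return bytes(x ^ y for x, y in zip(a, b))

# Simulated application side: one keystream reused for every token in a session.
keystream = os.urandom(32)
stmt_token = xor_bytes(b"0000012345", keystream)
ctxt_token = xor_bytes(b"Account Summary for alice", keystream)

# Step 1: the statement number is shown in the browser, so it is known plaintext.
known = b"0000012345"
recovered = xor_bytes(stmt_token, known)  # first 10 keystream bytes

# Step 2: decrypt the start of a different token with the recovered bytes.
partial = xor_bytes(ctxt_token, recovered)
print(partial)  # b'Account Su'

# Step 3: guess how the text continues and extend the recovered keystream.
guess = b"Account Summary"
recovered = xor_bytes(ctxt_token, guess)  # now 15 keystream bytes

# Step 4: encrypt a different statement number without ever knowing the key.
forged_stmt = xor_bytes(b"0000099999", recovered)
```

Each correct guess extends the recovered keystream, and a wrong guess shows up immediately as garbage when the extended keystream is applied to another token.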
We were able to subvert the encryption mechanism without any knowledge of the algorithm or the key, based solely on observed patterns in the ciphertext. The application was using RC4 with a unique key generated for each user session. The root cause of the vulnerability is the reuse of the keystream: within a session, the same keystream encrypted every token.