Hashing vs. Encryption

Here’s something that’s bugged me for a long time… the overuse of encryption when hashing will work just fine.

For example, let’s say I build a redirect URL that carries some information that isn’t particularly sensitive, but that I don’t want anyone messing with. Say I generate a coupon code and an expiration date:

http://waa/woo/cashin?coupon-code=BSRMSIN&expiration=2011/01/31

Maybe that’s a silly example, but oh well.

But I don’t want to send it like that, because someone could intercept the URL, change the date to 2211/01/31 or something, and reuse that code for the next 200 years.

Now, the knee-jerk reaction is to say, “Encrypt it!” But that adds a lot of headaches. Encryption is slow. Encryption requires some sort of key management. Encryption bloats the size of your messages.

Instead, I can just take my generated URL, run it through a salted SHA-1 hash, encode some piece of the digest as ASCII text (hex or Base32, say), and add that to the URL as a signature parameter. Or you could encode the whole digest, but that’s probably more bytes than you need.

http://waa/woo/cashin?coupon-code=BSRMSIN&expiration=2011/01/31&signature=AE832F1
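A minimal Python sketch of the signing step, under a few assumptions. SECRET_KEY is a hypothetical placeholder for the server-side secret. The signature covers the query string exactly as it is sent, and the sketch keeps seven upper-case hex characters to match the example URL. It also uses HMAC-SHA1, the standard keyed-hash construction, in place of hashing a secret concatenated with the URL. The function name sign_query is made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical placeholder secret; a real one would be long, random, and kept out of source code.
SECRET_KEY = b"replace-with-a-long-random-secret"


def sign_query(params: dict, key: bytes = SECRET_KEY, length: int = 7) -> str:
    """Return the query string with a truncated HMAC-SHA1 signature appended."""
    # safe="/" leaves the slashes in the date unescaped, as in the example URL.
    query = urlencode(params, safe="/")
    digest = hmac.new(key, query.encode("utf-8"), hashlib.sha1).hexdigest()
    # Keep only the first `length` hex characters, upper-cased like the example.
    return f"{query}&signature={digest[:length].upper()}"


url = "http://waa/woo/cashin?" + sign_query(
    {"coupon-code": "BSRMSIN", "expiration": "2011/01/31"}
)
print(url)
```

Seven hex characters carry 28 bits. Someone guessing signatures blindly would succeed roughly once per 2^28 (about 268 million) tries. Keep more of the digest if that isn't comfortable. Swapping hashlib.sha1 for hashlib.sha256 works the same way here and is the more common choice today.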

SHA-1 is fast. Too fast, as it turns out, to be useful for some cryptographic tasks like protecting passwords, where a slow hash is what makes brute-forcing stolen hashes expensive. But in this case, fast is good.

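The server generates the hash using the input string plus a “salt” string, which here is just a secret phrase that only the systems generating and checking the signature know. A man in the middle can’t change the values and regenerate the hash, because he doesn’t know the secret. But the server does, so when the URL comes back it can recompute the hash over the values it received and check that it gets the same signature as before. Done!

Strictly speaking, a secret used this way is a key rather than a salt (salts are normally public), and the standard construction for a keyed hash like this is HMAC. Just hashing the secret concatenated with the message has known weaknesses. For example, with SHA-1 and the secret in front, an attacker can append data to the message and compute a valid hash without knowing the secret (a length-extension attack).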
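The checking side can be sketched the same way: recompute the signature over everything before the signature parameter and compare. This is a minimal sketch, assuming HMAC-SHA1 truncated to seven hex characters. SECRET_KEY, SIG_LENGTH, expected_signature, and verify_url are hypothetical names, not an established API.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Hypothetical placeholder; must be the same secret that was used to sign the URL.
SECRET_KEY = b"replace-with-a-long-random-secret"
SIG_LENGTH = 7  # hex characters kept from the digest, as in the example URL


def expected_signature(query: str) -> str:
    digest = hmac.new(SECRET_KEY, query.encode("utf-8"), hashlib.sha1).hexdigest()
    return digest[:SIG_LENGTH].upper()


def verify_url(url: str) -> bool:
    """Recompute the signature over the query string before '&signature=' and compare."""
    body, sep, signature = urlsplit(url).query.rpartition("&signature=")
    if not sep:
        return False  # no signature parameter at all
    # compare_digest avoids short-circuiting on the first mismatched byte.
    return hmac.compare_digest(
        expected_signature(body).encode("ascii"), signature.upper().encode("utf-8")
    )


unsigned = "http://waa/woo/cashin?coupon-code=BSRMSIN&expiration=2011/01/31"
good = unsigned + "&signature=" + expected_signature(urlsplit(unsigned).query)
tampered = good.replace("2011/01/31", "2211/01/31")

print(verify_url(good))      # True
print(verify_url(tampered))  # False: the date changed but the signature did not
```

A matching signature only shows that the server issued those values. The server still has to check the expiration date against today's date before honoring the coupon. The sketch also assumes the query string comes back byte-for-byte as it was signed. A proxy that re-encodes the slashes as %2F would make an honest URL fail.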

It just drives me a little batty whenever a team wants to send information around and becomes convinced that encryption is an absolute requirement. No. Think about where encryption actually makes sense. Where you don’t need to hide the data, you’re better off just matching hashes on both ends.

Here’s more information on using a cryptographic salt string:

http://en.wikipedia.org/wiki/Cryptographic_salt



Filed under opinionizing, utility

8 responses to “Hashing vs. Encryption”

  1. PM

    Came across your article when I was trying to understand the difference, and funnily enough I was looking at encrypting a URL. Having read your points, it makes sense to use hashing. One thing that escapes me is what you meant by “take some encode some piece of the digest as ASCII and add that to the URL”. What do you mean by “take some encode some piece of the digest”? Also, if I am taking a piece of the digest, how does that help, especially if I wish to compare the hashes?

    • roby2358

      lol whoops… I edited it to correct that sentence, and added: “Or you could encode the whole digest, but that’s probably more bytes than you need.” Digests are pretty wide, and unless you’re worried about someone doing a brute-force search of possible hash values, you’re probably OK just encoding 10 or so bytes of the digest. (A SHA-1 digest is 20 bytes, or 160 bits, altogether.)

      There was one cheesy application for a promotional offer I worked on where I just grabbed enough bytes to encode 5 characters for the hash code. But if it was more sensitive, or there were bigger dollars involved, then doing more, or even the whole digest, would make more sense.

      Thank you for your comment — I’m really glad the post simplified things for you!

      • PM

        Thanks for clarifying that, but I am still confused as to why you would only encode a portion of the digest when you need the entire digest to confirm whether a value the user has input matches. Am I missing something? Secondly, what do you mean by “grabbed enough bytes to encode 5 characters for the hash code”?

      • roby2358

        But you don’t need the whole digest… after you’ve hashed, the digest is just an arbitrary string of bytes that you hope is more or less uniformly distributed over possible values. (You can’t say it’s “random”, because there are special tests to determine how random something is, but you can say it’s arbitrary.)

        Now, if two hash digests match, then any subset of the bytes will match, too. So hash1[0:4] will equal hash2[0:4], and every other byte in the first hash will match every other byte in the second. You don’t need to compare the entire thing; you can just throw big chunks of it away.

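        A hash is just a throw of the dice anyway… you’re hoping that the hash doesn’t collide with another hash. If you keep 1/4 of the digest and throw the rest away, you’re increasing the chance of a collision. But that chance is still vanishingly small: 1/4 of a SHA-1 digest is still 40 bits, so the chance that two particular values collide is about 1 in 2^40, roughly 1 in 1.1 trillion.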

        What I did on that project was grab every other byte until I had 4 or so, and encode them as Base32 characters. The chance of collision was much greater than if I’d used the whole thing, but in that case I didn’t really care, since I was just generating limited-time, throw-away coupons that were tracked separately anyway.

        Also, I was just comparing the encoded bytes as a string rather than the raw digest.

  2. PM

    Thanks. Sorry for being a n00b but can you explain the following?
    1. When you say a digest is “just an arbitrary string of bytes that you hope is more or less uniformly distributed over possible values”, what do you mean exactly?

    2. I didn’t quite follow what you meant in the second paragraph.

    3. When you say you grabbed every other byte until you had 4 or so and encoded them as Base32 characters, why would you do that? Also, why would there be a greater chance of collision if you had used the entire string, when using a subset of the string yields a greater chance of a match for a set of hashed values or digests?

    • roby2358

      No worries! This is a good discussion!

      Let me dash out a longer answer later, but the short version is:

      If the whole digest matches, then any piece of the digest will match, too. So we can just take a piece of the digest and compare it (to the equivalent piece) and spare transmitting the extra bits.

      There might not be a lot of practical benefit to that, but it shrinks things down a little.

      Ya, that increases the chance of a collision, but it’s along the lines of the chance of an asteroid hitting the Earth in the next 1,000 years as opposed to the next million years 😉 That is, still vanishingly small.

      Let me get my head above water on the stuff I’m working on, and I’ll post a better reply.

      • PM

        Thanks. Sorry it took me a while to get back to you. It would be great if you could elaborate with real-world examples and explain each step with reasons.

  3. jimcolv

    This is some good information. I’m an IT instructor. Currently I teach a few certification courses (Sec+, Net+, A+, CCNA). The course material we use for most of the courses leaves out some essential info and also doesn’t 100% cover the objectives on the exams. I want to start my own tech blog that covers a lot of the concepts that are missed in the materials we use (already have a personal blog here on WordPress).

    Just wanted to come by and say good job. I’ll be following your posts from now on.
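The comment thread above describes a shorter variant used for throw-away coupons. It takes every other byte of the digest until there are four, encodes them as Base32, and compares the encoded strings rather than the raw digests. Here is a minimal Python sketch of that idea. Several details are assumptions for illustration, not details from the thread: the HMAC-SHA1 keyed hash, the secret, the "code|date" message format, and the function names short_code and code_matches.

```python
import base64
import hashlib
import hmac

# Hypothetical placeholder secret and message format, for illustration only.
SECRET_KEY = b"replace-with-a-long-random-secret"


def short_code(message: str, n_bytes: int = 4) -> str:
    """Keep every other byte of the digest until n_bytes remain, then Base32-encode them."""
    digest = hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha1).digest()  # 20 bytes
    picked = digest[::2][:n_bytes]  # with n_bytes=4: bytes 0, 2, 4, 6
    # 4 bytes = 32 bits -> 7 Base32 characters plus one "=" of padding, which we drop.
    return base64.b32encode(picked).decode("ascii").rstrip("=")


def code_matches(message: str, code: str) -> bool:
    """Compare the encoded strings (not raw digests) without short-circuiting."""
    return hmac.compare_digest(
        short_code(message).encode("ascii"), code.strip().upper().encode("utf-8")
    )


code = short_code("BSRMSIN|2011/01/31")
print(code, len(code))                           # a 7-character code, then 7
print(code_matches("BSRMSIN|2011/01/31", code))  # True
print(code_matches("BSRMSIN|2211/01/31", code))  # False, barring a 1-in-2**32 collision
```

Keeping 4 of the 20 bytes leaves 32 bits, so a blind guess at a code succeeds about once in 2^32 (roughly 4.3 billion) tries. Collisions between legitimate coupons follow the birthday bound instead. With k codes outstanding, the chance that any two share a code is roughly k²/2^33, or about 1.2% for 10,000 coupons. That was acceptable for coupons tracked separately anyway. Raising n_bytes is the knob if it isn't.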
