Tuesday, December 04, 2012

How to create random readable strings for .Net applications

Why would I want to be random?

If you need a random string, I assume you know why you're here. However, there are some common uses for random strings that I want to list for the Google juice factor:
  1. CAPTCHA codes, when not using something cool like reCAPTCHA.
  2. Email verification codes.
  3. Nonce values for challenge/response.
  4. Salt values to increase entropy on password hashes.
  5. Registration codes.

How should I generate them in .Net?

There is a simple answer, really. In .Net you can just make a call to RNGCryptoServiceProvider's GetNonZeroBytes method and convert those bytes to characters.
var random = new byte[16];               // whatever size you want
using (var rng = new RNGCryptoServiceProvider())
{
    rng.GetNonZeroBytes(random);         // fill with non-zero random bytes
}

return Convert.ToBase64String(random);   // convert to a string
If you have the MVC 4 package available, you can use the convenient Crypto.GenerateSalt method as shorthand, since it does essentially the same thing as the code above.

This, of course, limits the returned string to the Base-64 characters.

When should I care about the contents?

In general, you don't care about the contents of the random string. The one generated by the logic above is pretty useful because it draws from a wide set of all-ASCII characters that will not get you in trouble when crossing code-pages.

The biggest downside of this approach is that the string only uses a 64-character set, so you're excluding a lot of other possible characters, but in most applications that isn't a problem. In fact, quite the opposite is true. In many cases, we might want to avoid specific characters like the + character because the string might be used in a URL. In other cases, you might want to draw from a fuller character set (or a specific set, like an all-emoji string).
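If the + character is the only thing in your way, here is a minimal sketch of a URL-friendly variant. It assumes you are happy to stay with Base-64 and simply swaps out the characters that cause trouble in URLs. The class and method names (UrlSafeStrings, UrlSafeRandomString) are made up for illustration, and I'm using RandomNumberGenerator.Create() instead of RNGCryptoServiceProvider so the sketch also compiles on newer runtimes.

```csharp
using System;
using System.Security.Cryptography;

public static class UrlSafeStrings
{
    // Hypothetical helper (not part of the code in this post):
    // random bytes rendered as Base-64 with the URL-hostile characters replaced.
    public static string UrlSafeRandomString(int byteCount)
    {
        var random = new byte[byteCount];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(random);
        }

        return Convert.ToBase64String(random)
            .TrimEnd('=')        // drop the padding
            .Replace('+', '-')   // '+' can be read as a space in a query string
            .Replace('/', '_');  // '/' is a path separator
    }
}
```

Calling UrlSafeStrings.UrlSafeRandomString(16) gives a 22-character string made up only of letters, digits, - and _.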

A more common need, though, is putting something on screen for a user to type (such as a registration code), where characters should not be easy to mistake for one another. In some fonts, the characters 1, l and I or 0, o and O are very easily mistaken. For such cases, you can use a function like the following to generate a reasonably readable string:
namespace Silly
{
    using System.Security.Cryptography;

    public static partial class Helpers
    {
        // Digits and letters with the easily confused 0, 1, I, L, O, l and o left out.
        public static string RandomReadableString(int length)
        {
            return "23456789ABCDEFGHJKMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz".RandomString(length);
        }

        // Builds a string of the given length from characters picked out of characterSet.
        public static string RandomString(this string characterSet, int length)
        {
            var random = new byte[length];
            using (var rng = new RNGCryptoServiceProvider())
            {
                rng.GetNonZeroBytes(random);
            }

            var buffer = new char[length];
            var usableChars = characterSet.ToCharArray();
            var usableLength = usableChars.Length;

            for (int index = 0; index < length; index++)
            {
                // Map each random byte onto an index into the character set.
                buffer[index] = usableChars[random[index] % usableLength];
            }

            return new string(buffer);
        }
    }
}
You can call the second function against any string of characters. For example, I'm using the RandomReadableString method to generate email confirmation codes that can easily be typed if needed.
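One caveat: the modulo step favors some characters slightly whenever the number of possible byte values is not a multiple of the character set length. With the 55-character readable set and the 255 values GetNonZeroBytes can return, 255 = 4 × 55 + 35, so 35 of the characters have five byte values mapping to them and the other 20 have only four. If that matters to you, a rough sketch of a rejection-sampling variant follows. The name UnbiasedRandomString is hypothetical, and I'm assuming RandomNumberGenerator.Create() with GetBytes (all 256 values) in place of GetNonZeroBytes.

```csharp
using System;
using System.Security.Cryptography;

public static class UnbiasedHelpers
{
    // Hypothetical variant of RandomString that throws away the random bytes
    // that would make some characters more likely than others.
    public static string UnbiasedRandomString(this string characterSet, int length)
    {
        if (string.IsNullOrEmpty(characterSet) || characterSet.Length > 256)
            throw new ArgumentException("Need between 1 and 256 characters.", "characterSet");
        if (length < 0)
            throw new ArgumentOutOfRangeException("length");

        // Largest multiple of the set size that fits in a byte's 256 values.
        int limit = 256 - (256 % characterSet.Length);

        var buffer = new char[length];
        var random = new byte[Math.Max(length, 16)];
        int filled = 0;

        using (var rng = RandomNumberGenerator.Create())
        {
            while (filled < length)
            {
                rng.GetBytes(random);
                for (int i = 0; i < random.Length && filled < length; i++)
                {
                    // Only bytes below the limit map evenly onto the set.
                    if (random[i] < limit)
                    {
                        buffer[filled++] = characterSet[random[i] % characterSet.Length];
                    }
                }
            }
        }

        return new string(buffer);
    }
}
```

For the 55-character set the limit works out to 220, so each character is backed by exactly four byte values and roughly one byte in seven is discarded.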

Boring! Spice it up...

For even more fun, here are some emoji sequences that can be used for eye charts or stupid pet code tricks.
// Emoji fun
// random weather "☀☁☂☃"
// random finger pointers "☜☝☞☟"
// random zodiac "♈♉♊♋♌♍♎♏♐♑♒♓"
// random chess pieces "♔♕♖♗♘♙♚♛♜♝♞♟"
// random music notation "♩♪♫♬♭♯"
// random trigrams "☰☱☲☳☴☵☶☷"
// random planets "♃♄♅♆♇"

5 comments:

Adnan Hussain said...

I quite like your RandomReadableString() extension method.

Although is the string guaranteed to be unique?

I would have thought email validation codes, for example, would need to be unique.

Marc Brooks said...

Well, there's "absolutely unique" and "statistically unique". If you generate a long enough string, it's going to "for all practical purposes" be unique. If you NEED it to be a guarantee, then you put it in a database/store and put a unique index on the value... if you get a collision (not likely), you just loop back and generate another one.

As for EVCs being unique, you can make them be fairly long, linked to the user, and time-sensitive and that'll cure most ills...
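A minimal sketch of that "loop back and generate another one" idea, with an in-memory HashSet standing in for the database table and its unique index. The names here (UniqueCodes, NextUniqueCode, maxAttempts) are made up for illustration, and a real store would need its own locking or constraint handling.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueCodes
{
    // Hypothetical stand-in for a table with a unique index on the code column.
    private static readonly HashSet<string> IssuedCodes = new HashSet<string>();

    // Calls generate() until it produces a code that has not been issued yet.
    public static string NextUniqueCode(Func<string> generate, int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            var code = generate();

            // Add returns false on a duplicate, much like a unique-index violation.
            if (IssuedCodes.Add(code))
            {
                return code;
            }
        }

        throw new InvalidOperationException("Could not generate a unique code.");
    }
}
```

You would pass in whatever generator you like, for example UniqueCodes.NextUniqueCode(() => Helpers.RandomReadableString(12)).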

The Director said...

Ooh, pretty pretty bad strings.

Anonymous said...

It is nice, but it couldn't pass a statistical randomness test:
- you create a randomized sequence with RNGCrypto, in which each byte can take 255 different values
- later you use a MOD operator, which projects this value into a shorter range (the longer case is not important now)
- if the length is not a divisor of 255, there will be a segment at the beginning of the usable character range which will be more frequent than the others

btw. it is a good function to create almost random values :)
