The system works by generating a big number, which is a shared secret (aka key). This is encoded into the QR code you scan, and saved on your phone. Now the phone and the server share a secret.
The token which is shown and entered during login is derived by hashing the secret with the time rounded to the nearest 30 seconds (well most people use 30 seconds).
Both sides generate the token, and as long as their clocks are synchronized, the tokens will match, and you can proceed.
The hashing is such that knowing a sequence of tokens doesn’t allow you to derive the secret any faster than trying every possible value of the secret. By recommending that the secret be at least a 160 bit number (about 48 decimal digits long) trying every value is impossible in reasonable time.
Thus the scheme is resistant to someone who knows your username, password, and all the historic tokens, as long as they don’t get your current token value, they can’t login to your account.
I have found various common implementation errors with TOTP
The authors of the RFC 4226 assumed that the shared secret would be embedded in hardware security devices designed to keep their secrets. Whilst mobile phones and computers do these days typically have chips that perform this function, most of the apps and server side implementations don’t use them for TOTP. From the description above you will see the secret key has to be recoverable to plaintext on both ends. The encryption key (you are going to store the keys encrypted right?), and the plain text of the secret, need to be handled with the upmost care, as it is logically equivalent to a password (assuming a properly hashed password is the other factor in authentication).
RFC 4226 section 4 Requirement 6 requires a minimum key length of 128 bits, and a recommend of 160 bits.
The key is represented as a base32 string, which stores 5 bits per character, so the string in the QR code should be at least 26 characters long (128/5 rounded up), preferably 32 characters.
A number of implementations have used 80 bit keys. If you had a sequence of tokens from a device, perhaps from filming the user using their phone (some mobile authenticator apps, Duo’s for example, only have one code shown at a time, this seems a wise precaution against being overlooked), then the attack on the secret is brute force calculating which keys might produce those tokens at those times.
Even with the relatively cheap hash functions used in TOTP, 2^80 is unrealistic for most people to brute force, but it is of a similar magnitude to the RC5 64 bit crack given 20 years of Moore’s law, so it is bordering on possible with current hardware. Pick 160 bits key length now, save yourself revisiting too soon.
Poor Key generation
When you generate the key use a source of randomness which is cryptographic-ally sound. Do not use a cryptographic hash functions on a pseudo-random number (unless you really know what you are doing in conditioning random numbers when you wouldn’t be reading this advice).
Idiomatically I see a lot of “sha1(rand())” or “md5(rand())” around github and elsewhere (not mentioning sample code from well known security device manufacturer). Some of those code snippets have a rand() function which only produces 32 bits of output, so can only generate 2^32 values (easily brute forced, and only needs a couple of captured tokens to figure out which value is right) even if the resulting keys appear to have good randomness, and appear to have the right length.
This is a cataclysmic error, as to any black box testing that doesn’t have the first 2 billion hashes prerecorded this looks good.
[although if you think it might be 2^32 you could use the Birthday problem, any repetition of a key within the first “2^(32/2) * some safety factor” should raise the alarm as collisions in 2^160 will take ~2^80 which you aren’t going to see by chance. Always use large sample for your black box randomness testing to spot this]
Token can be reused
This is the commonest fault by far.
The RFC 4226 assumes that token values can be observed. If you assume an attacker can see or derive the token (say by key logger, or video camera watching the keyboard), they can compromise the password and the TOTP token, defeating the authentication system. See also the distinction between two step and two factor authentication.
Now this is a fairly small set of situations that are addressed. The attacker has username, password, and needs the token. They can see the token that is entered. Can they see the token before it has been entered?
Clearly the correct implementation to get the maximum benefits of TOTP is to:
Accept a valid token only once.
Issue a clear message if someone attempts to re-use a token. So that in an attack the legitimate user is aware their token has just been used seconds ahead of them.
Even then if the attacker can interfere with the network connection between the victim and the server, he may be able to see the token value (key logger), use the token first, and then stop the network connection for a short period (e.g. 30 or 60 seconds) so that the user never sees the message that the token was used previously (We can’t stop indefinitely powerful attackers, but consider U2F, or other proper two factor schemes).
Correctly maintaining the state of successful uses of TOTP tokens at scale has been reported as challenging in some discussions. However the synchronization only needs to be of the account identified and the time period the stamp was use (nothing particularly sensitive here!), and this data is only valid for at most 90 seconds in most implementations, so maybe Google and Facebook might have a challenge synchronizing it in a timely fashion, you won’t. Also this is only on authentication, and if you care about usability and the security constraints are not too onerous, that may not be on every login just new devices, or say every 30 days.
Lack of Rate Limiting
Should be obvious, but if we have a 6 digit token, it only requires a few thousand guesses every 30 seconds to get in fairly swiftly, if you already have a valid username or password. So some sort of lockout, or rate limiting is needs to be implemented such that the token can’t be brute forced.
I’ve seen TOTP rate limiting missing in systems that you really would expect to have got it right.