Top Questions on Stack Overflow

Here is a piece of C++ code that shows some very peculiar performance. For some strange reason, sorting the data miraculously speeds up the code by almost 6x:

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;
}
  • Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.
  • With the sorted data, the code runs in 1.93 seconds.

Initially I thought this might be just a language or compiler anomaly. So I tried it in Java:

import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i)
        {
            // Primary loop
            for (int c = 0; c < arraySize; ++c)
            {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

with a similar but less extreme result.


My first thought was that sorting brings the data into cache, but my next thought was how silly that is because the array was just generated.

What is going on? Why is a sorted array faster than an unsorted array? The code is summing up some independent terms, so the order should not matter.

Answered By: Mysticial ( 6432)

You are the victim of branch prediction fail.


What is Branch Prediction?

Consider a railroad junction:

Image by Mecanismo, from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Entroncamento_do_Transpraia.JPG

Now for the sake of argument, suppose this is back in the 1800s - before long distance or radio communication.

You are the operator of a junction and you hear a train coming. You have no idea which way it will go. You stop the train to ask the captain which direction he wants. And then you set the switch appropriately.

Trains are heavy and have a lot of momentum. So they take forever to start up and slow down.

Is there a better way? You guess which direction the train will go!

  • If you guessed right, it continues on.
  • If you guessed wrong, the captain will stop, back up, and yell at you to flip the switch. Then the train can restart down the other path.

If you guess right every time, the train will never have to stop.
If you guess wrong too often, the train will spend a lot of time stopping, backing up, and restarting.


Consider an if-statement: At the processor level, it is a branch instruction:

You are a processor and you see a branch. You have no idea which way it will go. What do you do? You halt execution and wait until the previous instructions are complete. Then you continue down the correct path.

Modern processors are complicated and have long pipelines. So they take forever to "warm up" and "slow down".

Is there a better way? You guess which direction the branch will go!

  • If you guessed right, you continue executing.
  • If you guessed wrong, you need to flush the pipeline and roll back to the branch. Then you can restart down the other path.

If you guess right every time, the execution will never have to stop.
If you guess wrong too often, you spend a lot of time stalling, rolling back, and restarting.


This is branch prediction. I admit it's not the best analogy since the train could just signal the direction with a flag. But in computers, the processor doesn't know which direction a branch will go until the last moment.

So how would you strategically guess to minimize the number of times that the train must back up and go down the other path? You look at the past history! If the train goes left 99% of the time, then you guess left. If it alternates, then you alternate your guesses. If it goes one way every 3 times, you guess the same...

In other words, you try to identify a pattern and follow it. This is more or less how branch predictors work.

Most applications have well-behaved branches. So modern branch predictors will typically achieve >90% hit rates. But when faced with unpredictable branches with no recognizable patterns, branch predictors are virtually useless.

Further Reading: http://en.wikipedia.org/wiki/Branch_predictor


As hinted at above, the culprit is this if-statement:

if (data[c] >= 128)
    sum += data[c];

Notice that the data is evenly distributed between 0 and 255. When the data is sorted, roughly the first half of the iterations will not enter the if-statement. After that, they will all enter the if-statement.

This is very friendly to the branch predictor since the branch consecutively goes the same direction many times. Even a simple saturating counter will correctly predict the branch except for the few iterations after it switches direction.

Quick visualization:

T = branch taken
N = branch not taken

data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 250, 251, 252, ...
branch = N  N  N  N  N  ...   N    N    T    T    T  ...   T    T    T  ...

       = NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT  (easy to predict)

However, when the data is completely random, the branch predictor is rendered useless because it can't predict random data. Thus there will probably be around 50% misprediction (no better than random guessing).

data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118,  14, 150, 177, 182, 133, ...
branch =   T,   T,   N,   T,   T,   T,   T,  N,   T,   N,   N,   T,   T,   T,   N  ...

       = TTNTTTTNTNNTTTN ...   (completely random - hard to predict)
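
To put numbers on the difference, here is a small simulation sketch (my addition, not part of the original answer) that runs a two-bit saturating counter - one of the simplest real predictor schemes - over the branch "data[c] >= 128" for a random array and a sorted copy of it. The counter and the misprediction counting are illustrative; they do not model any particular CPU.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// Two-bit saturating counter: states 0-1 predict "not taken", states 2-3 predict "taken".
struct TwoBitPredictor
{
    int state = 1;
    bool predict() const { return state >= 2; }
    void update(bool taken) { state = taken ? std::min(state + 1, 3) : std::max(state - 1, 0); }
};

// Count how often the predictor guesses the branch "value >= 128" wrong.
long long countMispredictions(const std::vector<int>& data)
{
    TwoBitPredictor predictor;
    long long mispredictions = 0;
    for (int value : data)
    {
        bool taken = (value >= 128);
        if (predictor.predict() != taken)
            ++mispredictions;
        predictor.update(taken);
    }
    return mispredictions;
}

int main()
{
    std::vector<int> data(32768);
    for (int& v : data)
        v = std::rand() % 256;

    std::vector<int> sorted = data;
    std::sort(sorted.begin(), sorted.end());

    std::cout << "random: " << countMispredictions(data)   << " mispredictions\n";
    std::cout << "sorted: " << countMispredictions(sorted) << " mispredictions\n";
}

On the sorted copy the counter is wrong only around the single N-to-T transition; on the random copy it is wrong roughly half the time, matching the ~50% figure above.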

So what can be done?

If the compiler isn't able to optimize the branch into a conditional move, you can try some hacks if you are willing to sacrifice readability for performance.

Replace:

if (data[c] >= 128)
    sum += data[c];

with:

int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

This eliminates the branch and replaces it with some bitwise operations.

(Note that this hack is not strictly equivalent to the original if-statement. But in this case, it's valid for all the input values of data[].)
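
For readers wondering why the trick works: data[c] - 128 is negative exactly when data[c] < 128, so the right shift by 31 turns t into all ones (-1) in that case and into 0 otherwise, and ~t & data[c] therefore yields either data[c] or 0 with no branch. A minimal sketch verifying that (my addition; it assumes 32-bit int and an arithmetic right shift of negative values, which is implementation-defined but what mainstream compilers do):

#include <cassert>

int main()
{
    // Check that the branchless expression matches the if-statement for every byte value.
    for (int value = 0; value < 256; ++value)
    {
        long long branched = 0, branchless = 0;

        if (value >= 128)                // original form
            branched += value;

        int t = (value - 128) >> 31;     // t == -1 (all ones) when value < 128, else 0
        branchless += ~t & value;        // adds value when t == 0, adds 0 when t == -1

        assert(branched == branchless);
    }
}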

Benchmarks: Core i7 920 @ 3.5 GHz

C++ - Visual Studio 2010 - x64 Release

//  Branch - Random
seconds = 11.777

//  Branch - Sorted
seconds = 2.352

//  Branchless - Random
seconds = 2.564

//  Branchless - Sorted
seconds = 2.587

Java - Netbeans 7.1.1 JDK 7 - x64

//  Branch - Random
seconds = 10.93293813

//  Branch - Sorted
seconds = 5.643797077

//  Branchless - Random
seconds = 3.113581453

//  Branchless - Sorted
seconds = 3.186068823

Observations:

  • With the Branch: There is a huge difference between the sorted and unsorted data.
  • With the Hack: There is no difference between sorted and unsorted data.
  • In the C++ case, the hack is actually a tad slower than with the branch when the data is sorted.

A general rule of thumb is to avoid data-dependent branching in critical loops (such as in this example).


Update:

  • GCC 4.6.1 with -O3 or -ftree-vectorize on x64 is able to generate a conditional move. So there is no difference between the sorted and unsorted data - both are fast.

  • VC++ 2010 is unable to generate conditional moves for this branch even under /Ox.

  • Intel Compiler 11 does something miraculous. It interchanges the two loops, thereby hoisting the unpredictable branch to the outer loop. So not only is it immune to the mispredictions, it is also twice as fast as whatever VC++ and GCC can generate! In other words, ICC took advantage of the test-loop to defeat the benchmark...

  • If you give the Intel Compiler the branchless code, it just out-right vectorizes it... and is just as fast as with the branch (with the loop interchange).

This goes to show that even mature modern compilers can vary wildly in their ability to optimize code...
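
For completeness, a branch-free-looking formulation that optimizers commonly lower to a conditional move (or vectorize) is the ternary form below. This is an illustrative rewrite on my part, not code from the answer; its semantics are identical to the original if-statement, but whether a given compiler actually emits cmov still has to be confirmed in the generated assembly.

// Selection expressed as a value computation instead of control flow.
for (unsigned c = 0; c < arraySize; ++c)
    sum += (data[c] >= 128) ? data[c] : 0;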

2571
Oli

I've been messing around with JSON for some time, just pushing it out as text and it hasn't hurt anybody (that I know of), but I'd like to start doing things properly.

I have seen so many purported "standards" for the JSON content type:

application/json
application/x-javascript
text/javascript
text/x-javascript
text/x-json

But which is correct, or best? I gather that there are security and browser support issues varying between them.

I know there's a similar question, What MIME type if JSON is being returned by a REST API?, but I'd like a slightly more targeted answer.

Answered By: Gumbo ( 2916)

RFC 4627:

The MIME media type for JSON text is application/json.

2521
Laurie Young

I stupidly did a Git commit while half asleep, and wrote the totally wrong thing in the commit message. How do I change the commit message?

I have not yet pushed the commit to anyone.

Answered By: bueno ( 2708)
git commit --amend -m "New commit message"

Used to amend the tip of the current branch. Prepare the tree object you would want to replace the latest commit as usual (this includes the usual -i/-o and explicit paths), and the commit log editor is seeded with the commit message from the tip of the current branch. The commit you create replaces the current tip -- if it was a merge, it will have the parents of the current tip as parents -- so the current top commit is discarded.

It is a rough equivalent for:

$ git reset --soft HEAD^
$ ... do something else to come up with the right tree ...
$ git commit -c ORIG_HEAD

but can be used to amend a merge commit.

I'm trying to amass a list of programming books that are freely available on the Internet. The books can be about a particular programming language or about computers in general.

What are some freely available programming books on the Internet?

Answered By: George Stocker ( 2487)

Meta-Lists

Graphics Programming

Language Agnostic

Android

Autotools

ASP.NET MVC

Assembly Language

Bash

C / C++

C#

  • See .NET below

Clojure

CoffeeScript

ColdFusion

DB2

Delphi / Pascal

Django

Emacs

Erlang

Flex

F#

Forth

Git

Go

Grails

Haskell

HTML / CSS

Java

JavaScript

JavaScript (Node.js specific)

LaTeX

Linux

Lisp

Lua

Mathematica

Maven

Mercurial

Nemerle

  • See .NET below

.NET (C# / VB / Nemerle / Visual Studio)

NoSQL

Oberon

Objective-C

OCaml

Oracle Server

Oracle PL/SQL

Parrot / Perl 6

Perl

PHP

PowerShell

Prolog

PostgreSQL

Python

R

Ruby

Ruby on Rails

Scala

Scheme

Sed

Smalltalk

Subversion

SQL (implementation agnostic)

Teradata

Vim

Websphere

Windows Phone

2278
grepsedawk

This question attempts to collect the few pearls among the dozens of bad C++ books that are released every year.

Unlike many other programming languages, which are often picked up on the go from tutorials found on the Internet, C++ is one that few people are able to pick up quickly without studying a good C++ book. It is far too big and complex for that. In fact, it is so big and complex that there are very many very bad C++ books out there. And we are not talking about bad style, but things like sporting glaringly obvious factual errors and promoting abysmally bad programming styles. It's even worse with online tutorials. (There is a reason nobody bothered to set up a similar question for online tutorials.)

Please provide quality books and an approximate skill level — preferably after discussing your addition in the C++ chat room. (The regulars might mercilessly undo your work if they disagree with a recommendation.) Add a short blurb/description about each book that you have personally read/benefited from. Feel free to debate quality, headings, etc. Books that meet the criteria will be added to the list. Books that have reviews by the Association of C and C++ Users (ACCU) have links to the review.

To spell it out bluntly: There is no need to add a 75th answer to this question. If you feel like a book should be added, suggest it to the community and let's discuss it.


Note: FAQs and other resources can be found in the C++ tag info. There is also a similar post for C: The Definitive C Book Guide and List.


Reference Style - All Levels

  1. The C++ Programming Language (Bjarne Stroustrup) (soon to be updated for C++11) The classic introduction to C++ by its creator. Written to parallel the classic K&R, it indeed reads very much like it and covers just about everything from the core language, to the standard library, to programming paradigms, to the language's philosophy. (Thereby making the latest editions break the 1k page barrier.) [Review] The fourth edition (to be released in 2013) will cover C++11.

  2. C++ Standard Library Tutorial and Reference (Nicolai Josuttis) (updated for C++11) The introduction and reference for the C++ Standard Library. The second edition (released on April 9, 2012) covers C++11. [Review]

  3. The C++ IO Streams and Locales (Angelika Langer and Klaus Kreft) There's very little to say about this book except that, if you want to know anything about streams and locales, then this is the one place to find definitive answers. [Review]

C++11 References:

  1. The C++ Standard (INCITS/ISO/IEC 14882-2011) This, of course, is the final arbiter of all that is or isn't C++. Be aware, however, that it is intended purely as a reference for experienced users willing to devote considerable time and effort to its understanding. As usual, the first release was quite expensive ($300+ US), but it has now been released in electronic form for $30US -- probably the least expensive of the reference books listed here.

  2. Overview of the New C++ (C++11) by Scott Meyers, who is a highly respected author on C++. Even though the list of items is short, the quality is high.


Beginner

Introductory

If you are new to programming or if you have experience in other languages and are new to C++, these books are highly recommended.

  1. C++ Primer† (Stanley Lippman, Josée Lajoie, and Barbara E. Moo) (updated for C++11) Coming in at 1k pages, this is a very thorough introduction to C++ that covers just about everything in the language in a very accessible format and in great detail. The fifth edition (released August 16, 2012) covers C++11. [Review]

  2. Accelerated C++ (Andrew Koenig and Barbara Moo) This basically covers the same ground as the C++ Primer, but does so in a fourth of the space. This is largely because it does not attempt to be an introduction to programming, but an introduction to C++ for people who've previously programmed in some other language. It has a steeper learning curve, but, for those who can cope with this, it is a very compact introduction to the language. (Historically, it broke new ground by being the first beginner's book to use a modern approach to teaching the language.) [Review]

  3. Thinking in C++ (Bruce Eckel) Two volumes; the second is more about the standard library, but still very good.

  4. Programming: Principles and Practice Using C++ (Bjarne Stroustrup) An introduction to programming using C++ by the creator of the language. A good read that assumes no previous programming experience, but it is not only for beginners.

† Not to be confused with C++ Primer Plus (Stephen Prata), with a significantly less favorable review.

Best practices

  1. Effective C++ (Scott Meyers) This was written with the aim of being the best second book C++ programmers should read, and it succeeded. Earlier editions were aimed at programmers coming from C; the third edition changes this and targets programmers coming from languages like Java. It presents ~50 easy-to-remember rules of thumb along with their rationale in a very accessible (and enjoyable) style. [Review]

  2. Effective STL (Scott Meyers) This aims to do for the part of the standard library that comes from the STL what Effective C++ did for the language as a whole: it presents rules of thumb along with their rationale. [Review]


Intermediate

  1. More Effective C++ (Scott Meyers) Even more rules of thumb than Effective C++. Not as important as the ones in the first book, but still good to know.

  2. Exceptional C++ (Herb Sutter) Presented as a set of puzzles, this has one of the best and most thorough discussions of proper resource management and exception safety in C++ through Resource Acquisition is Initialization (RAII), in addition to in-depth coverage of a variety of other topics including the pimpl idiom, name lookup, good class design, and the C++ memory model. [Review]

  3. More Exceptional C++ (Herb Sutter) Covers additional exception safety topics not covered in Exceptional C++, in addition to discussion of effective object oriented programming in C++ and correct use of the STL. [Review]

  4. Exceptional C++ Style (Herb Sutter) Discusses generic programming, optimization, and resource management; this book also has an excellent exposition of how to write modular code in C++ by using nonmember functions and the single responsibility principle. [Review]

  5. C++ Coding Standards (Herb Sutter and Andrei Alexandrescu) "Coding standards" here doesn't mean "how many spaces should I indent my code?" This book contains 101 best practices, idioms, and common pitfalls that can help you to write correct, understandable, and efficient C++ code. [Review]

  6. C++ Templates: The Complete Guide (David Vandevoorde and Nicolai M. Josuttis) This is the book about C++ templates. It covers everything from the very basics to some of the most advanced template metaprogramming, explains every detail of how templates work (both conceptually and in how they are implemented), and discusses many common pitfalls. Has excellent summaries of the One Definition Rule (ODR) and overload resolution in the appendices. [Review]


Above Intermediate

  1. Modern C++ Design (Andrei Alexandrescu) A groundbreaking book on advanced generic programming techniques. Introduces policy-based design, type lists, and fundamental generic programming idioms then explains how many useful design patterns (including small object allocators, functors, factories, visitors, and multimethods) can be implemented efficiently, modularly, and cleanly using generic programming. [Review]

  2. C++ Template Metaprogramming (David Abrahams and Aleksey Gurtovoy)

  3. C++ Concurrency In Action (Anthony Williams) A book covering C++11 concurrency support including the thread library, the atomics library, the C++ memory model, locks and mutexes, as well as issues of designing and debugging multithreaded applications.

  4. Advanced C++ Metaprogramming (Davide Di Gennaro) A pre-C++11 manual of TMP techniques, focused more on practice than theory. There are a ton of snippets in this book, some of which are made obsolete by type traits, but the techniques are nonetheless useful to know. If you can put up with the quirky formatting/editing, it is easier to read than Alexandrescu and arguably more rewarding. For more experienced developers, there is a good chance that you may pick up something about a dark corner of C++ (a quirk) that usually only comes about through extensive experience.


Classics / Older

Note: Some information contained within these books may not be up to date or no longer considered best practice.

  1. The Design and Evolution of C++ (Bjarne Stroustrup) If you want to know why the language is the way it is, this book is where you find answers. This covers everything before the standardization of C++.

  2. Ruminations on C++ - (Andrew Koenig and Barbara Moo) [Review]

  3. Advanced C++ Programming Styles and Idioms (James Coplien) A predecessor of the pattern movement, it describes many C++-specific "idioms". It's certainly a very good book and still worth a read if you can spare the time, but quite old and not up-to-date with current C++.

  4. Large Scale C++ Software Design (John Lakos) Lakos explains techniques to manage very big C++ software projects. Certainly a good read, if only it were up to date. It was written long before C++98 and misses many features (e.g. namespaces) important for large-scale projects. If you need to work on a big C++ software project, you might want to read it, although you need to take more than a grain of salt with it. For years there has been a rumor that Lakos is writing an up-to-date edition of the book.

  5. Inside the C++ Object Model (Stanley Lippman) If you want to know how virtual member functions are commonly implemented and how base objects are commonly laid out in memory in a multi-inheritance scenario, and how all this affects performance, this is where you will find thorough discussions of such topics.

Answered By: Johannes Schaub - litb ( 379)

About C++ Templates: The Complete Guide

I can only say good things about it. Written by two top experts. And you really notice that. The examples are well written, explained and placed.

They start by teaching you the basics in a few pages. Then they talk about all the pitfalls and hidden dangers in template programming and the cool stuff that can be done with templates. At the end, they explain overload resolution and the one definition rule in detail in Appendices A and B.

Strongly recommended :)

2170
Hamza Yerlikaya

I accidentally added the wrong directory containing my files in Git. Instead of adding a .java file, I added the directory containing the .class file. How can I undo this action?

Answered By: Esko Luontola ( 2020)

http://git-scm.com/docs/git-reset

Undo a commit and redo

$ git commit ...              (1)
$ git reset --soft HEAD^      (2)
$ edit                        (3)
$ git add ....                (4)
$ git commit -c ORIG_HEAD     (5)
  1. This is what you want to undo

  2. This is most often done when you remembered what you just committed is incomplete, or you misspelled your commit message, or both. Leaves working tree as it was before "reset".

  3. Make corrections to working tree files.

  4. Stage changes for commit.

  5. "reset" copies the old head to .git/ORIG_HEAD; redo the commit by starting with its log message. If you do not need to edit the message further, you can give -C option instead.

1814
GManNickG

After reading "Hidden Features and Dark Corners of C++/STL" on comp.lang.c++.moderated, I was completely surprised that it compiled and worked in both Visual Studio 2008 and G++ 4.4. The code:

#include <stdio.h>
int main()
{
     int x = 10;
     while( x --> 0 ) // x goes to 0
     {
       printf("%d ", x);
     }
}

Where in the standard is this defined, and where did it come from?

I'd assume C, since it works in GCC as well, but I put C++ on there just in case C++ has more to mention on it. On a more subjective note, I've never heard of this before; has anybody else? Is it worth using?

Answered By: Charles Salvia ( 1680)

That's not an operator -->. That's two separate operators, -- and >.

Your condition code is decrementing x, while returning x's original (not decremented) value, and then comparing the original value with 0 using the > operator.

To better understand it, the statement could be written as follows:

while( (x--) > 0 )
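
To make the sequencing explicit, here is a step-by-step expansion (my sketch, not part of the original answer) that produces the same output for this snippet:

#include <stdio.h>

int main()
{
    int x = 10;
    for (;;)
    {
        int previous = x;     // the value used in the comparison
        x = x - 1;            // the "--" part: x is decremented
        if (!(previous > 0))  // the ">" part: the *old* value is compared with 0
            break;
        printf("%d ", x);     // prints 9 8 7 6 5 4 3 2 1 0, same as the original
    }
    return 0;
}
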
1760
Michiel de Mare

Form based authentication for websites

We believe that Stack Overflow should not just be a resource for very specific technical questions, but also for general guidelines on how to solve variations on common problems. "Form based authentication for websites" should be a fine topic for such an experiment.

It should include topics such as:

  • How to log in
  • How to remain logged in
  • How to store passwords
  • Using secret questions
  • Forgotten password functionality
  • OpenID
  • "Remember me" checkbox
  • Browser autocompletion of usernames and passwords
  • Secret URLs (public URLs protected by digest)
  • Checking password strength
  • E-mail validation
  • and much more ...

It should not include things like:

  • roles and authorization
  • HTTP basic authentication

Please help us by

  1. Suggesting subtopics
  2. Submitting good articles about this subject
  3. Editing the official answer
Answered By: Jens Roland ( 1215)

PART I: How To Log In

  1. As a rule, CAPTCHAs should be a last resort. They tend to be annoying, often aren't human-solvable, most of them are ineffective against bots, all of them are ineffective against cheap third-world labor (according to OWASP, the current sweatshop rate is $12 per 500 tests), and some implementations are technically illegal in some countries (see link number 1 from the MUST-READ list). If you must use a CAPTCHA, use reCAPTCHA, since it is OCR-hard by definition (since it uses already OCR-misclassified book scans).

  2. It is possible to prevent browsers from storing/retrieving a password with the autocomplete attribute for forms/input fields. However, in the real world, your customers will have many accounts on different systems; it compromises their security if they use the same password for every site. Can you expect them to remember different passwords for every site? There are some good password managers out there; however, there are also bad ones, which will become a target for attackers.

  3. The only (currently practical) way to protect against login interception (packet sniffing) during login is by using a certificate-based encryption scheme (for example, SSL) or a proven & tested challenge-response scheme (for example, the Diffie-Hellman-based SRP). Any other method can be easily circumvented by an eavesdropping attacker. On that note: hashing the password client-side (for example, with JavaScript) is useless unless it is combined with one of the above - that is, either securing the line with strong encryption or using a tried-and-tested challenge-response mechanism (if you don't know what that is, just know that it is one of the most difficult to prove, most difficult to design, and most difficult to implement concepts in digital security). Hashing the password is effective against password disclosure, but not against replay attacks, Man-In-The-Middle attacks / hijackings, or brute-force attacks (since we are handing the attacker the username, the salt, and the hashed password).

  4. After sending the authentication tokens, the system needs a way to remember that you have been authenticated - this fact should only ever be stored server-side in the session data. A cookie can be used to reference the session data. Wherever possible, the cookie should have the secure and HttpOnly flags set when sent to the browser. The HttpOnly flag provides some protection against the cookie being read by an XSS attack. The secure flag ensures that the cookie is only sent back via HTTPS, and therefore protects against network sniffing attacks. The value of the cookie should not be predictable. Where a cookie referencing a non-existent session is presented, its value should be replaced immediately to prevent session fixation.

PART II: How To Remain Logged In - The Infamous "Remember Me" Checkbox

Persistent Login Cookies ("remember me" functionality) are a danger zone; on the one hand, they are entirely as safe as conventional logins when users understand how to handle them; and on the other hand, they are an enormous security risk in the hands of most users, who use them on public computers, forget to log out, don't know what cookies are or how to delete them, etc.

Personally, I want my persistent logins for the web sites I visit on a regular basis, but I know how to handle them safely. If you are positive that your users know the same, you can use persistent logins with a clean conscience. If not - well, then you're more like me, subscribing to the philosophy that users who are careless with their login credentials brought it upon themselves if they get hacked. It's not like we go to our users' houses and tear off all those facepalm-inducing Post-It notes with passwords they have lined up on the edge of their monitors, either. If people are idiots, then let them eat idiot cake.

Of course, some systems can't afford to have any accounts hacked; for such systems, there is no way you can justify having persistent logins.

If you DO decide to implement persistent login cookies, this is how you do it:

  1. First, follow Charles Miller's 'Best Practices' article. Do not get tempted to follow the 'Improved' Best Practices linked at the end of his article. Sadly, the 'improvements' to the scheme are easily thwarted (all an attacker has to do when stealing the 'improved' cookie is remember to delete the old one. This will require the legitimate user to re-login, creating a new series identifier and leaving the stolen one valid).

  2. And DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT! The login token is Password Equivalent, so if an attacker got his hands on your database, he/she could use the tokens to log in to any account, just as if they were cleartext login-password combinations. Therefore, use strong salted hashing (bcrypt / phpass) when storing persistent login tokens.

PART III: Using Secret Questions

Don't. Never ever use 'secret questions'. Read the paper from link number 5 from the MUST-READ list. You can ask Sarah Palin about that one, after her Yahoo! email account got hacked during the presidential campaign because the answer to her 'security' question was... (wait for it) ... "Wasilla High School"!

Even with user-specified questions, it is highly likely that most users will choose either:

  • A 'standard' secret question like mother's maiden name or favourite pet

  • A simple piece of trivia that anyone could lift from their blog, LinkedIn profile, or similar

  • Any question that is easier to answer than guessing their password. Which, for any decent password, is every question conceivable.

In conclusion, security questions are inherently insecure in all their forms and variations, and should never be employed in an authentication scheme for any reason.

A secondary question is often considered as adequate for fulfilling a requirement for two-factor authentication. While capturing some of the response via clicks rather than typing in theory provides protection against keylogger attacks, it is still just an extension to the password mechanism - and when users are presented with a text box instead of drop-downs on a phishing site, they rarely perceive this as abnormal. Note that you may be able to fulfill your two-factor obligations by using a long-lasting cookie (granted on submission of multiple authentication questions) in place of a security question asked each and every time - but at the expense of user convenience.

The only reason anyone still uses security questions by choice is that it saves the cost of a few support calls from users who can't remember their email passwords to get to their reactivation codes. At the expense of security and Sarah Palin's reputation, that is. Worth it? You be the judge.

PART IV: Forgotten Password Functionality

I already mentioned why you should never use security questions for handling forgotten/lost user passwords. There are at least two more all-too-common pitfalls to avoid in this field:

  1. Don't RESET users' passwords no matter what - 'reset' passwords are harder for the user to remember, which means he/she MUST either change it OR write it down - say, on a bright yellow Post-It on the edge of his monitor. Instead, just let users pick a new one right away - which is what they want to do anyway.

  2. Always hash the lost password code/token in the database. AGAIN, this code is another example of a Password Equivalent, so it MUST be hashed in case an attacker got his hands on your database. When a lost password code is requested, send the plaintext code to the user's email address, then hash it, save the hash in your database -- and throw away the original. Just like a password or a persistent login token.

A final note: always make sure your interface for entering the 'lost password code' is at least as secure as your login form itself, or an attacker will simply use this to gain access instead. Making sure you generate very long 'lost password codes' (for example, 16 case sensitive alphanumeric characters) is a good start, but consider adding the same throttling that you do for logins.
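
As a concrete illustration of that last point, here is a minimal sketch (my addition, not from the answer) that generates a 16-character case-sensitive alphanumeric code from a non-deterministic source. The function name and the exact charset are mine; in a real system you would e-mail the plaintext code, store only a strong salted hash of it, and throw the original away, as described above.

#include <cstddef>
#include <iostream>
#include <random>
#include <string>

// 62 case-sensitive alphanumeric symbols, ~5.95 bits each, so 16 characters
// give roughly 95 bits of entropy.
std::string makeLostPasswordCode(std::size_t length = 16)
{
    static const std::string charset =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789";

    std::random_device rd;  // non-deterministic generator
    std::uniform_int_distribution<std::size_t> pick(0, charset.size() - 1);

    std::string code;
    code.reserve(length);
    for (std::size_t i = 0; i < length; ++i)
        code += charset[pick(rd)];
    return code;
}

int main()
{
    std::cout << makeLostPasswordCode() << "\n";  // e.g. "q3ZtL0xWbR7eKpAD" (hypothetical output)
    // Send this by e-mail; persist only a salted hash (e.g. bcrypt) of it.
}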

PART V: Checking Password Strength

First, you'll want to read this small article for a reality check: The 500 most common passwords

Okay, so maybe the list isn't the canonical list of most common passwords on any system anywhere ever, but it's a good indication of how poorly people will choose their passwords when there is no enforced policy in place. Plus, the list looks frighteningly close to home when you compare it to the publicly available analyses of 40,000+ recently stolen MySpace passwords.

Well, enough MySpace-bashing for now. Moving on...

So: With no minimum password strength requirements, 2% of users use one of the top 20 most common passwords. Meaning: if an attacker gets just 20 attempts, 1 in 50 accounts on your website will be crackable.

Thwarting this requires calculating the entropy of a password and then applying a threshold. The National Institute of Standards and Technology (NIST) Special Publication 800-63 has a set of very good suggestions. That, when combined with a dictionary and keyboard layout analysis (for example, 'qwertyuiop' is a bad password), can reject 99% of all poorly selected passwords at a level of 18 bits of entropy. Simply calculating password strength and showing a visual strength meter to a user is insufficient. Unless it is enforced, users will ignore it.
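
For a rough idea of what "calculating the entropy and applying a threshold" can look like, here is a deliberately naive sketch (my own, far simpler than the NIST 800-63 heuristics): estimate entropy as length times log2 of the character pool actually used, and reject anything below a threshold. A real checker would also penalize dictionary words and keyboard walks such as 'qwertyuiop', which this simple estimate does not catch.

#include <cctype>
#include <cmath>
#include <iostream>
#include <string>

// Naive upper-bound estimate: length * log2(size of the character pool used).
double estimateEntropyBits(const std::string& password)
{
    bool lower = false, upper = false, digit = false, other = false;
    for (unsigned char ch : password)
    {
        if (std::islower(ch))      lower = true;
        else if (std::isupper(ch)) upper = true;
        else if (std::isdigit(ch)) digit = true;
        else                       other = true;
    }
    int pool = (lower ? 26 : 0) + (upper ? 26 : 0) + (digit ? 10 : 0) + (other ? 33 : 0);
    return pool == 0 ? 0.0 : password.size() * std::log2(pool);
}

int main()
{
    const double threshold = 18.0;  // bits, matching the figure quoted above
    for (const std::string pw : {"abc", "Tr0ub4dor&3"})
    {
        double bits = estimateEntropyBits(pw);
        std::cout << pw << ": ~" << bits << " bits -> "
                  << (bits >= threshold ? "accept" : "reject") << "\n";
    }
}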

PART VI: Much More - Or: Preventing Rapid-Fire Login Attempts

First, have a look at the numbers: Password Recovery Speeds - How long will your password stand up

If you don't have the time to look through the tables in that link, here's the gist of them:

  1. It takes virtually no time to crack a weak password, even if you're cracking it with an abacus

  2. It takes virtually no time to crack an alphanumeric 9-character password, if it is case insensitive

  3. It takes virtually no time to crack an intricate, symbols-and-letters-and-numbers, upper-and-lowercase password, if it is less than 8 characters long (a desktop PC can search the entire keyspace up to 7 characters in a matter of days or even hours)

  4. It would, however, take an inordinate amount of time to crack even a 6-character password, if you were limited to one attempt per second!

So what can we learn from these numbers? Well, lots, but we can focus on the most important part: the fact that preventing large numbers of rapid-fire successive login attempts (i.e. the brute force attack) really isn't that difficult. But doing it right isn't as easy as it seems.
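
To put rough figures on points 2-4 above, here is a back-of-the-envelope sketch (my arithmetic; the 10^9 guesses per second is an assumed offline cracking rate, and real rates vary enormously with the hashing scheme used):

#include <cmath>
#include <iostream>

int main()
{
    const double secondsPerDay = 86400.0;

    struct Case { const char* label; double charsetSize; double length; double guessesPerSecond; };
    const Case cases[] = {
        { "9 chars, case-insensitive alphanumeric (36 symbols), offline", 36, 9, 1e9 },
        { "7 chars, full printable ASCII (95 symbols), offline",          95, 7, 1e9 },
        { "6 chars, full printable ASCII, limited to 1 attempt/second",   95, 6, 1.0 },
    };

    for (const Case& c : cases)
    {
        double keyspace = std::pow(c.charsetSize, c.length);
        double days = keyspace / c.guessesPerSecond / secondsPerDay;
        std::cout << c.label << ": ~" << days << " days to exhaust the keyspace\n";
    }
}

This prints roughly 1.2 days, 0.8 days, and 8.5 million days (about 23,000 years) respectively - which is the whole point of throttling.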

Generally speaking, you have three choices that are all effective against brute-force attacks (and dictionary attacks, but since you are already employing a strong password policy, they shouldn't be an issue):

  • Present a CAPTCHA after N failed attempts (annoying as hell and often ineffective -- but I'm repeating myself here)

  • Locking accounts and requiring email verification after N failed attempts (this is a DoS attack waiting to happen)

  • And finally, login throttling: that is, setting a time delay between attempts after N failed attempts (yes, DoS attacks are still possible, but at least they are far less likely and a lot more complicated to pull off).

Best practice #1: A short time delay that increases with the number of failed attempts, like:

  • 1 failed attempt = no delay
  • 2 failed attempts = 2 sec delay
  • 3 failed attempts = 4 sec delay
  • 4 failed attempts = 8 sec delay
  • 5 failed attempts = 16 sec delay
  • etc.

DoS attacking this scheme would be very impractical, but on the other hand, potentially devastating, since the delay increases exponentially. A DoS attack lasting a few days could suspend the user for weeks.

To clarify: The delay is not a delay before returning the response to the browser. It is more like a timeout or refractory period during which login attempts to a specific account or from a specific IP address will not be accepted or evaluated at all. That is, correct credentials will not result in a successful login, and incorrect credentials will not trigger a delay increase.
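
A minimal sketch of the delay schedule from best practice #1 (my illustration; the per-account failure counter and the timestamp bookkeeping are assumed infrastructure and not shown):

#include <algorithm>
#include <iostream>

// No delay on the first failure, then 2, 4, 8, 16, ... seconds, doubling each time.
// Capping the exponent keeps a long-running attack from overflowing the shift.
int throttleDelaySeconds(int failedAttempts)
{
    if (failedAttempts <= 1)
        return 0;
    int exponent = std::min(failedAttempts - 1, 20);  // cap at roughly 12 days
    return 1 << exponent;
}

int main()
{
    for (int attempts = 1; attempts <= 6; ++attempts)
        std::cout << attempts << " failed attempt(s) -> "
                  << throttleDelaySeconds(attempts) << " sec delay\n";
}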

Best practice #2: A medium length time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5 failed attempts = 15-30 min delay

DoS attacking this scheme would be quite impractical, but certainly doable. Also, it might be relevant to note that such a long delay can be very annoying for a legitimate user. Forgetful users will dislike you.

Best practice #3: Combining the two approaches - either a fixed, short time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5+ failed attempts = 20 sec delay

Or, an increasing delay with a fixed upper bound, like:

  • 1 failed attempt = 5 sec delay
  • 2 failed attempts = 15 sec delay
  • 3+ failed attempts = 45 sec delay

This final scheme was taken from the OWASP best-practices suggestions (link 1 from the MUST-READ list), and should be considered best practice, even if it is admittedly on the restrictive side.

As a rule of thumb however, I would say: the stronger your password policy is, the less you have to bug users with delays. If you require strong (case-sensitive alphanumerics + required numbers and symbols) 9+ character passwords, you could give the users 2-4 non-delayed password attempts before activating the throttling.

DoS attacking this final login throttling scheme would be very impractical. And as a final touch, always allow persistent (cookie) logins (and/or a CAPTCHA-verified login form) to pass through, so legitimate users won't even be delayed while the attack is in progress. That way, the very impractical DoS attack becomes an extremely impractical attack.

Additionally, it makes sense to do more aggressive throttling on admin accounts, since those are the most attractive entry points.

PART VII: Distributed Brute Force Attacks

Just as an aside, more advanced attackers will try to circumvent login throttling by 'spreading their activities':

  • Distributing the attempts on a botnet to prevent IP address flagging

  • Rather than picking one user and trying the 50,000 most common passwords (which they can't, because of our throttling), they will pick THE most common password and try it against 50,000 users instead. That way, not only do they get around maximum-attempts measures like CAPTCHAs and login throttling, their chance of success increases as well, since the number 1 most common password is far more likely than number 49,995.

  • Spacing the login requests for each user account, say, 30 seconds apart, to sneak under the radar

Here, the best practice would be logging the number of failed logins, system-wide, and using a running average of your site's bad-login frequency as the basis for an upper limit that you then impose on all users.

Too abstract? Let me rephrase:

Say your site has had an average of 120 bad logins per day over the past 3 months. Using that (running average), your system might set the global limit to 3 times that -- i.e. 360 failed attempts over a 24-hour period. Then, if the total number of failed attempts across all accounts exceeds that number within one day (or even better, monitor the rate of acceleration and trigger on a calculated threshold), it activates system-wide login throttling - meaning short delays for ALL users (still, with the exception of cookie logins and/or backup CAPTCHA logins).
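
A sketch of that calculation (my own, using the numbers from the example; the failure log and the 24-hour sliding window are assumed to exist elsewhere, and the counts below are hypothetical):

#include <iostream>

// Global throttle trigger: allow a multiple of the running average of daily
// failed logins before switching on system-wide login throttling.
bool shouldThrottleGlobally(double averageDailyFailures, long long failuresInLast24h, double multiplier = 3.0)
{
    return failuresInLast24h > averageDailyFailures * multiplier;
}

int main()
{
    const double averageDailyFailures = 120;  // 3-month running average from the example
    const long long observedToday = 400;      // hypothetical count for the last 24 hours

    std::cout << "global limit = " << averageDailyFailures * 3.0 << " failures per day\n";  // 360
    std::cout << (shouldThrottleGlobally(averageDailyFailures, observedToday)
                      ? "activate system-wide throttling\n"
                      : "normal operation\n");
}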

I also posted a question with more details and a really good discussion of how to avoid tricky pitfalls in fending off distributed brute force attacks.

PART VIII: Two-Factor Authentication and Authentication Providers

Credentials can be compromised, whether by exploits, passwords being written down and lost, laptops with keys being stolen, or users entering logins into phishing sites. Logins can be protected with two-factor authentication, which uses out-of-band factors such as single-use codes received from a phone call, SMS message, or dongle. Several providers offer two-factor authentication services.

Authentication can be completely delegated to a single-sign-on service such as OAuth, OpenID or Persona (nee BrowserID), where another provider handles collecting credentials. This pushes the problem to a trusted third party. Twitter is an example of an OAuth provider, while Facebook provides a similar proprietary solution.

MUST-READ LINKS About Web Authentication

I have a Flash project, and it has many source files. I have a fairly heavily-used class, call it Jenine. I recently (and, perhaps, callously) relocated Jenine from one namespace to another. I thought we were ready - I thought it was time. The new Jenine was better in every way - she had lost some code bloat, she had decoupled herself from a few vestigial class relationships, and she had finally come home to the namespace that she had always secretly known in her heart was the one she truly belonged to. She was among her own kind.

Unfortunately, Flash would have none of that. Perhaps it had formed an attachment. Perhaps it didn't want Jenine to be decoupled. Either way, it clung to the old, perfect version of Jenine in its memory. It refused to move on. It ignored her (function) calls. It tried to forget her new, public interfaces. Instead, every instance of Jenine that it constructed was always a copy of the old version, down to its classpath:

var jenineInstance:Jenine = new Jenine();
trace( getQualifiedClassName(jenineInstance));
// Should print: com.newnamespace.subspace::Jenine
// Prints: com.oldnamespace.subspace::Jenine
// Ah, young love!

We fought. I'm not proud of some of the things I said or did. In the end, in a towering fit of rage, I deleted all references of Jenine completely. She was utterly, completely erased from the system. My cursor fell upon the "Empty Trash" menu option like the cold lid of a casket.

I don't think Flash ever recovered. To this day it still clings to the memory of Jenine. Her old, imperfect definitions still float through my project like abandoned ghosts. Whenever I force Flash to compile, it still lovingly inserts her into my movie, nestling her definition in amongst the other, living classes, like a small shrine. I wonder if they can see her.

Flash and I don't really talk anymore. I write my code, it compiles it. There's a new girl in town named Summer who looks almost identical to Jenine, as if someone had just copied her source-code wholesale into a new class, but Flash hasn't shown any interest. Most days it just mopes around and writes bad poetry in my comments when it thinks I'm not looking.

I hope no one else has had a similar experience, that this is just a singular, painful ripple in the horrifying dark lagoon that is the Flash code-base. If, by some fluke chance you have, or you have any idea how to erase whatever damn cache the compiler is using, please, please help.

Answered By: murzeb ( 448)

Flash still has the ASO file, which is the compiled byte code for your classes. On Windows, you can see the ASO files here:

C:\Documents and Settings\username\Local Settings\Application Data\Adobe\Flash CS4\en\Configuration\Classes\aso

On a Mac, the directory structure is similar in /Users/username/Library/Application Support/


You can remove those files by hand, or in Flash you can select Control->Delete ASO files to remove them.

1610
Philip Morton

In jQuery, suppose you have an element of some kind that you're hiding and showing, using .hide(), .show() or .toggle(). How do you test to see if that element is currently hidden or visible on the screen?

Answered By: Tsvetomir Tsonev ( 2263)

As the question refers to a single element, this code might be more suitable:

$(element).is(":visible") // Checks for display:[none|block], ignores visible:[true|false]

Same as twernt's suggestion, but applied to a single element.