Top sorting Questions

List of Tags

Yesterday I was pairing the socks from the clean laundry, and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came into mind to achieve an O(NlogN) solution.

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

So, the question is basically:

Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

I will appreciate an answer that addresses the following aspects:

  • A general theoretical solution for a huge number of socks.
  • The actual number of socks is not that large, I don't believe me and my spouse have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers, can this be utilized as well?)
  • Is it equivalent to the element distinctness problem?
Answered By: usr ( 613)

Sorting solutions have been proposed but sorting is a little too much: We don't need order, we just need equality groups.

So hashing would be enough (and faster).

  1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
  2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into a second set of piles
  3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)


What about parallelism? Can multiple humans match the socks faster?

  1. The simplest parallization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!).
  2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take from the input basket big chunks of socks (very little contention as they are doing it rarely) and they do not need to sync when distributing the socks at all (because they have thread-local piles). At the end all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

Clearly, one cannot go faster than O(N) so we have reached the optimal lower bound.

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

Favourite Chigozie Onwuemene

I have a computer with 1M of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection. The list of numbers may contain duplicates, which I must not discard. The code will be placed in ROM, so I need not subtract the size of my code from the 1M. I already have code to drive the ethernet port and handle TCP/IP connections, and it requires 2k for its state data, including a 1k buffer via which the code will read and write data. Is there a solution to this problem?

Sources Of Question And Answer:

Answered By: preshing ( 330)

Here's some working C++ code which solves the problem.

Proof that the memory constraints are satisified:

typedef unsigned int u32;

namespace WorkArea
    static const u32 circularSize = 253250;
    u32 circular[circularSize] = { 0 };         // consumes 1013000 bytes

    static const u32 stageSize = 8000;
    u32 stage[stageSize];                       // consumes 32000 bytes


Together, these two arrays take 1045000 bytes of storage. That leaves 1048576 - 1045000 - 2×1024 = 1528 bytes for remaining variables and stack space.

It runs in about 23 seconds on my Xeon W3520. You can verify that the program works using the following Python script, assuming a program name of sort1mb.exe.

from subprocess import *
import random

sequence = [random.randint(0, 99999999) for i in xrange(1000000)]

sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
for value in sequence:
    sorter.stdin.write('%08d\n' % value)

result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')

A detailed explanation of the algorithm can be found in the following series of posts:

dark fader

I have a dictionary of values read from 2 fields in a database: a string field and a numeric field. The string field is unique so that is the key of the dictionary.

I can sort on the keys, but how can I sort based on the values?

Note: I have read this post and probably could change my code to have a list of dictionaries but since I do not really need a list of dictionaries I wanted to know if there a simpler solution.

Answered By: Devin Jeanpierre ( 485)

It is not possible to sort a dict, only to get a representation of a dict that is sorted. Dicts are inherently orderless, but other types, such as lists and tuples, are not. So you need a sorted representation, which will be a list—probably a list of tuples. For instance,

import operator
x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
sorted_x = sorted(x.iteritems(), key=operator.itemgetter(1))

sorted_x will be a list of tuples sorted by the second element in each tuple. dict(sorted_x) == x.

What I want to do seems pretty simple, but I can't find any answers on the web. I have an NSMutableArray of objects, let's say they are 'Person' objects. I want to sort the NSMutableArray by Person.birthDate which is an NSDate.

I think it has something to do with this method:

NSArray *sortedArray = [drinkDetails sortedArrayUsingSelector:@selector(???)];

In Java I would make my object implement Comparable, or use Collections.sort with an inline custom on earth do you do this in Objective-C?

Answered By: Georg Schölly ( 720)

Compare method

Either you implement a compare-method for your object:

- (NSComparisonResult)compare:(Person *)otherObject {
    return [self.birthDate compare:otherObject.birthDate];

NSArray *sortedArray;
sortedArray = [drinkDetails sortedArrayUsingSelector:@selector(compare:)];

NSSortDescriptor (better)

or usually even better:

NSSortDescriptor *sortDescriptor;
sortDescriptor = [[[NSSortDescriptor alloc] initWithKey:@"birthDate"
                                              ascending:YES] autorelease];
NSArray *sortDescriptors = [NSArray arrayWithObject:sortDescriptor];
NSArray *sortedArray;
sortedArray = [drinkDetails sortedArrayUsingDescriptors:sortDescriptors];

You can easily sort by multiple keys by adding more than one to the array. Using custom comparator-methods is possible as well. Have a look at the documentation.

Blocks (shiny!)

There's also the possibility of sorting with a block since Mac OS X 10.6 and iOS 4:

NSArray *sortedArray;
sortedArray = [drinkDetails sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
    NSDate *first = [(Person*)a birthDate];
    NSDate *second = [(Person*)b birthDate];
    return [first compare:second];

I often have a Dictionary of keys & values and need to sort it by value. For example, I have a hash of words and their frequencies, and want to order them by frequency.

There's SortedList which is good for a single value (frequency), but I want to map it back to the word.

SortedDictionary orders by key, not value. Some resort to a custom class, but what's the cleanest way?

Answered By: Leon Bambrick ( 172)
List<KeyValuePair<string, string>> myList = aDictionary.ToList();

    delegate(KeyValuePair<string, string> firstPair,
    KeyValuePair<string, string> nextPair)
        return firstPair.Value.CompareTo(nextPair.Value);

Since you're targeting .net 2.0 or above, you can simplify this into lambda syntax -- it's equivalent but shorter. If you're targeting .net 2.0 you can only use this syntax if you're using the compiler from vs2008 (or above).

List<KeyValuePair<string, string>> myList = aDictionary.ToList();

myList.Sort((firstPair,nextPair) =>
        return firstPair.Value.CompareTo(nextPair.Value);

I got a list of dictionaries and want that to be sorted by a value of that dictionary.


[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]

sorted by name, should become

[{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
Answered By: Mario Fernandez ( 286)

It may look cleaner using a key instead a cmp:

newlist = sorted(list_to_be_sorted, key=lambda k: k['name']) 

or as J.F.Sebastian and others suggested,

from operator import itemgetter
newlist = sorted(list_to_be_sorted, key=itemgetter('name')) 
SaaS Developer

I'm using C# on Framework 3.5. I'm looking to quickly sort a Generic List<>. For the sake of this example lets say I have a List of a Person type with a property of lastname. How would I sort this List using a lambda expression?

List<Person> people = PopulateList();
people.OrderBy(???? => ?????)
Answered By: Marc Gravell ( 271)

If you mean an in-place sort (i.e. the list is updated):

people.Sort((x, y) => string.Compare(x.LastName, y.LastName));

If you mean a new list:

var newList = people.OrderBy(x=>x.LastName).ToList(); // ToList optional

Answering to another StackOverflow question (this one) I stumbled upon an interesting sub-problem. What is the fastest way to sort an array of 6 ints ?

As the question is very low level:

  • we can't assume libraries are available (and the call itself has its cost), only plain C
  • to avoid emptying instruction pipeline (that has a very high cost) we should probably minimize branches, jumps, and every other kind of control flow breaking (like those hidden behind sequence points in && or ||).
  • room is constrained and minimizing registers and memory use is an issue, ideally in place sort is probably best.

Really this question is a kind of Golf where the goal is not to minimize source length but execution time. I call it 'Zening` code as used in the title of the book Zen of Code optimization by Michael Abrash and its sequels.

As for why it is interesting, there is several layers:

  • the exemple is simple and easy to understand and measure, not much C skill involved
  • it show effects of choice of a good algorithm for the problem, but also effects of the compiler and underlying harware.

Here is my reference (naive, not optimized) implementation and my test set.

#include <stdio.h>

static __inline__ int sort6(int * d){

    char j, i, imin;
    int tmp;
    for (j = 0 ; j < 5 ; j++){
        imin = j;
        for (i = j + 1; i < 6 ; i++){
            if (d[i] < d[imin]){
                imin = i;
        tmp = d[j];
        d[j] = d[imin];
        d[imin] = tmp;

static __inline__ unsigned long long rdtsc(void)
  unsigned long long int x;
     __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
     return x;

int main(int argc, char ** argv){
    int i;
    int d[6][5] = {
        {1, 2, 3, 4, 5, 6},
        {6, 5, 4, 3, 2, 1},
        {100, 2, 300, 4, 500, 6},
        {100, 2, 3, 4, 500, 6},
        {1, 200, 3, 4, 5, 600},
        {1, 1, 2, 1, 2, 1}

    unsigned long long cycles = rdtsc();
    for (i = 0; i < 6 ; i++){
         * printf("d%d : %d %d %d %d %d %d\n", i,
         *  d[i][0], d[i][6], d[i][7],
         *  d[i][8], d[i][9], d[i][10]);
    cycles = rdtsc() - cycles;
    printf("Time is %d\n", (unsigned)cycles);

Raw results

As number of variants is becoming large, I gathered them all in a test suite that can be found here. The actual tests used are a bit less naive than those showed above, thanks to Kevin Stock. You can compile and execute it in your own environment. I'm quite interested by behavior on different target architecture/compilers. (OK guys, put it in answers, I will +1 every contributor of a new resultset).

I gave the answer to Daniel Stutzbach (for golfing) one year ago as he was at the source of the fastest solution at that time (sorting networks).

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O2

  • Direct call to qsort library function : 689.38
  • Naive implementation (insertion sort) : 285.70
  • Insertion Sort (Daniel Stutzbach) : 142.12
  • Insertion Sort Unrolled : 125.47
  • Rank Order : 102.26
  • Rank Order with registers : 58.03
  • Sorting Networks (Daniel Stutzbach) : 111.68
  • Sorting Networks (Paul R) : 66.36
  • Sorting Networks 12 with Fast Swap : 58.86
  • Sorting Networks 12 reordered Swap : 53.74
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.54
  • Reordered Sorting Network w/ fast swap V2 : 33.63
  • Inlined Bubble Sort (Paolo Bonzini) : 48.85
  • Unrolled Insertion Sort (Paolo Bonzini) : 75.30

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O1

  • Direct call to qsort library function : 705.93
  • Naive implementation (insertion sort) : 135.60
  • Insertion Sort (Daniel Stutzbach) : 142.11
  • Insertion Sort Unrolled : 126.75
  • Rank Order : 46.42
  • Rank Order with registers : 43.58
  • Sorting Networks (Daniel Stutzbach) : 115.57
  • Sorting Networks (Paul R) : 64.44
  • Sorting Networks 12 with Fast Swap : 61.98
  • Sorting Networks 12 reordered Swap : 54.67
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.24
  • Reordered Sorting Network w/ fast swap V2 : 33.07
  • Inlined Bubble Sort (Paolo Bonzini) : 45.79
  • Unrolled Insertion Sort (Paolo Bonzini) : 80.15

I included both -O1 and -O2 results because surprisingly for several programs O2 is less efficient than O1. I wonder what specific optimization has this effect ?

Comments on proposed solutions

Insertion Sort (Daniel Stutzbach)

As expected minimizing branches is indeed a good idea.

Sorting Networks (Daniel Stutzbach)

Better than insertion sort. I wondered if the main effect was not get from avoiding the external loop. I gave it a try by unrolled insertion sort to check and indeed we get roughly the same figures (code is here).

Sorting Networks (Paul R)

The best so far. The actual code I used to test is here. Don't know yet why it is nearly two times as fast as the other sorting network implementation. Parameter passing ? Fast max ?

Sorting Networks 12 SWAP with Fast Swap

As suggested by Daniel Stutzbach, I combined his 12 swap sorting network with branchless fast swap (code is here). It is indeed faster, the best so far with a small margin (roughly 5%) as could be expected using 1 less swap.

It is also interesting to notice that the branchless swap seems to be much (4 times) less efficient than the simple one using if on PPC architecture.

Calling Library qsort

To give another reference point I also tried as suggested to just call library qsort (code is here). As expected it is much slower : 10 to 30 times slower... as it became obvious with the new test suite, the main problem seems to be the initial load of the library after the first call, and it compares not so poorly with other version. It is just between 3 and 20 times slower on my Linux. On some architecture used for tests by others it seems even to be faster (I'm really surprised by that one, as library qsort use a more complex API).

Rank order

Rex Kerr proposed another completely different method : for each item of the array compute directly its final position. This is efficient because computing rank order do not need branch. The drawback of this method is that it takes three times the amount of memory of the array (one copy of array and variables to store rank orders). The performance results are very surprising (and interesting). On my reference architecture with 32 bits OS and Intel Core2 Quad E8300, cycle count was slightly below 1000 (like sorting networks with branching swap). But when compiled and executed on my 64 bits box (Intel Core2 Duo) it performed much better : it became the fastest so far. I finally found out the true reason. My 32bits box use gcc 4.4.1 and my 64bits box gcc 4.4.3 and the last one seems much better at optimising this particular code (there was very little difference for other proposals).

Sorting Networks 12 with reordered Swap

The amazing efficiency of the Rex Kerr proposal with gcc 4.4.3 made me wonder : how could a program with 3 times as much memory usage be faster than branchless sorting networks? My hypothesis was that it had less dependencies of the kind read after write, allowing for better use of the superscalar instruction scheduler of the x86. That gave me an idea: reorder swaps to minimize read after write dependencies. More simply put: when you do SWAP(1, 2); SWAP(0, 2); you have to wait for the first swap to be finished before performing the second one because both access to a common memory cell. When you do SWAP(1, 2); SWAP(4, 5);the processor can execute both in parallel. I tried it and it works as expected, the sorting networks is running about 10% faster.

Sorting Networks 12 with Simple Swap

One year after the original post Steinar H. Gunderson suggested, that we should not try to outsmart the compiler and keep the swap code simple. It's indeed a good idea as the resulting code is about 40% faster! He also proposed a swap optimized by hand using x86 inline assembly code that can still spare some more cycles. The most surprising (it says volumes on programmer's psychology) is that one year ago none of used tried that version of swap. Code I used to test is here. Others suggested other ways to write a C fast swap, but it yields the same performances as the simple one with a decent compiler.

The "best" code is now as follow:

static inline void sort6_sorting_network_simple_swap(int * d){
#define min(x, y) (x<y?x:y)
#define max(x, y) (x<y?y:x) 
#define SWAP(x,y) { const int a = min(d[x], d[y]); const int b = max(d[x], d[y]); d[x] = a; d[y] = b;}
    SWAP(1, 2);
    SWAP(4, 5);
    SWAP(0, 2);
    SWAP(3, 5);
    SWAP(0, 1);
    SWAP(3, 4);
    SWAP(1, 4);
    SWAP(0, 3);
    SWAP(2, 5);
    SWAP(1, 3);
    SWAP(2, 4);
    SWAP(2, 3);
#undef SWAP
#undef min
#undef max

If we believe our test set (and, yes it is quite poor, it's mere benefit is being short, simple and easy to understand what we are measuring), the average number of cycles of the resulting code for one sort is below 40 cycles (6 tests are executed). That put each swap at an average of 4 cycles. I call that amazingly fast. Any other improvements possible ?

Answered By: Daniel Stutzbach ( 68)

For any optimization, it's always best to test, test, test. I would try at least sorting networks and insertion sort. If I were betting, I'd put my money on insertion sort based on past experience.

Do you know anything about the input data? Some algorithms will perform better with certain kinds of data. For example, insertion sort performs better on sorted or almost-sorted dat, so it will be the better choice if there's an above-average chance of almost-sorted data.

The algorithm you posted is similar to an insertion sort, but it looks like you've minimized the number of swaps at the cost of more comparisons. Comparisons are far more expensive than swaps, though, because branches can cause the instruction pipeline to stall.

Here's an insertion sort implementation:

static __inline__ int sort6(int *d){
        int i, j;
        for (i = 1; i < 6; i++) {
                int tmp = d[i];
                for (j = i; j >= 1 && tmp < d[j-1]; j--)
                        d[j] = d[j-1];
                d[j] = tmp;

Here's how I'd build a sorting network. First, use this site to generate a minimal set of SWAP macros for a network of the appropriate length. Wrapping that up in a function gives me:

static __inline__ int sort6(int * d){
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP

I read about sorting ArrayLists using a Comparator but in all of the examples people used compareTo which according to some research is a method for Strings.

I wanted to sort an ArrayList of custom objects by one of their properties: a Date object (getStartDay()). Normally I compare them by item1.getStartDate().before(item2.getStartDate()) so I was wondering whether I could write something like:

public class customComparator {
    public boolean compare(Object object1, Object object2) {
        return object1.getStartDate().before(object2.getStartDate());

public class randomName {
    Collections.sort(Database.arrayList, new customComparator);

I just started with Java so please forgive my ignorance.

Answered By: Michael Myers ( 206)

Since Date implements Comparable, it has a compareTo method just like String does.

So your custom comparator could look like this:

public class CustomComparator implements Comparator<MyObject> {
    public int compare(MyObject o1, MyObject o2) {
        return o1.getStartDate().compareTo(o2.getStartDate());

(The compare() method must return an int, so you couldn't directly return a boolean like you were planning to anyway.)

Your sorting code would be just about like you wrote:

Collections.sort(Database.arrayList, new CustomComparator());

A couple of smaller points which are not directly related to the question:

  1. By convention, classes start with an upper-case letter while methods and variables start with a lower-case letter. That's why I changed the name of the comparator to CustomComparator.
  2. Use the Javadocs. They will be invaluable if you keep working with Java.
Christopher DuBois

I want to sort a data.frame by multiple columns in R. For example, with the data.frame below I would like to sort by column z (descending) then by column b (ascending):

dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), 
      levels = c("Low", "Med", "Hi"), ordered = TRUE),
      x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
      z = c(1, 1, 1, 2))
    b x y z
1  Hi A 8 1
2 Med D 3 1
3  Hi A 9 1
4 Low C 9 2
Answered By: Dirk Eddelbuettel ( 193)

You can use the order() function directly without resorting to add-on tools -- see this simpler answer which uses a trick right from the top of the example(order) code:

R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:

R> dd[ order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

rather than using the name of the column (and with() for easier/more direct access).