From: "Dann Corbit" Newsgroups: sci.math Subject: Re: Searchin for good sorting alrorithm Date: Wed, 7 Aug 1996 09:44:55 -0700 Heap Sort (average & worst case behavior O(nlogn)) generally slower than quick sort, on average. Quick Sort (average case behavior O(nlogn) ) Worst case behavior O(n^2) Singleton's Sort - ACM Algorithm 347 (average case behavior O(nlogn) ) generally faster than quick sort. Counting Sort (average and worst case behavior O(n)) Considerations: If you have a long key or a small data set, ACM 347 will probably be fastest. If you have a short key or a huge data set, counting sort will probably be fastest. For enormous data sets, Counting sort will always be faster for some large n. If you have only one or two bits in your key, you might use Radix sort instead of counting sort. > Timir Faizullov wrote in article <3208F4AD.6862@netvision.net.il>... > In my application I need to sort ASCII (or may be UNICODE) > strings in some order, lexicographical for example. > > Can anybody jelp me to find reference to high-performance > algorithm for doing this? > > =========================================================== > Timir Faizullov > > artel@netvision.net.il > ============================================================================== From: dak@neuroinformatik.ruhr-uni-bochum.de (David Kastrup) Newsgroups: sci.math Subject: Re: Searchin for good sorting alrorithm Date: 08 Aug 1996 12:08:24 +0200 In article <01bb8480.e4693100$fbe3369d@v-cnadc1> "Dann Corbit" writes: Heap Sort (average & worst case behavior O(nlogn)) generally slower than quick sort, on average. Quick Sort (average case behavior O(nlogn) ) Worst case behavior O(n^2) Singleton's Sort - ACM Algorithm 347 (average case behavior O(nlogn) ) generally faster than quick sort. Counting Sort (average and worst case behavior O(n)) You forgot heap sort: average *and* worst case behaviour O(n log n), perfectly suited for linked list sorting (good for variable sized data). A pity most implementations are suboptimal, but you could try this one: --- Cut here --- This sort routine of mine sorts linked lists in place (only pointers are copied, not elements itself) and has the following advantages over quicksort: sort is stable, order of elements comparing as equal is preserved. worst case comparisons are merely ceil(lg n) n - 2^(ceil(lg n)) + 1, which for powers of 2 becomes 1-n+n lg n, which beats quicksort's average cases for bigger n. average case is only slightly better than worst case, however, already sorted lists (reverse or normal) are done with about half the comparisons. recursion depth is absolutely limited to lg n. no additional space or data structures are needed as for quicksort. In addition, the routine has very little organizational overhead in addition to the comparisons and the link adjustments, and so is *really* fast. Besides, it is so simple that it will fit most insctruction caches. Technique is basically a mergesort, where traversal to the second sublist is done on the fly while sorting the first sublist. Details are in the code. ---- Follows listsort.h /* Include only if you use listsort.c as standalone sort binary. Otherwise include listsort.c itself. See description in listsort.c */ extern int listlink; extern int (*listcompare)(void *p1, void *p2); extern void **listsort(void **head, unsigned n); ---- Follows listsort.c /* listsort.c * Copyright (c) 1992 David Kastrup, Goethestra"se~20/22, D-52064~Aachen, * Germany * You are allowed to use this software in any form, even * in commercial software, as long as you do not restrain the right of * those using your software to obtain this code. That is, you must inform * your customer that this piece of code is in your program, and must provide * the unmodified source to him at request, at not more than a moderate * copying charge. You can save yourself this work if you include this in * source in your distribution. It is small enough. * * Other than that, you are free to use this software at will. */ /* Usage: As an include file: Define listptr via typedef as a pointer to the data structure you want to sort. Define getlink(adr) as a mechanism to fetch the link of the pointer adr, for example define getlink(adr) ((adr)->next) Define listleq(p1,p2) as a comparison operator returning true if the data to p1 compares as less than or equal to the data to p2. Something like define listleq(p1,p2) ((p1)->data <= (p2)->data) THEN include listsort.c As standalone: compile listsort.c as a standalone routine. include listsort.h in your application. Define a comparison routine with two void pointers as arguments, returning a value <= 0 for less than or equal comparison. Put a pointer to it in listcompare. Put the offset of the link-field of your data-structure into listlink. Both: Call listsort with a pointer to your head-pointer of the list. Second argument is the number of items in the list you want to habe sorted. The rest of the list is left intact. listsort returns a pointer to the new head-pointer of the rest of the list. */ /* listptr must be defined to be a pointer to an element in the list. If it is not defined by the user, it is defined here as void *, thus making possible a generic listsort routine. getlink(adr) must return a pointer to the element following the one pointed to by adr. getlink(adr) must return an lvalue. If the user does not define such a mechanism, this file uses a global integer listlink in order to determine the byte offset of the linkfield in the structure pointed to by adr. You must either define both listptr and getadr, or none. */ #ifndef getlink typedef void *listptr; int listlink; #define getlink(adr) (*(void**)((char*)(adr)+listlink)) #endif /* listleq is a comparison operator checking for the first argument of type listptr comparing as less or equal to the second argument. If you do not define such a boolean macro, this file defines it as a call via the global pointer to such a comparison routine listcompare */ #ifndef listleq int (*listcompare)(listptr p1, listptr p2); #define listleq(p1,p2) ((*listcompare)(p1,p2)<=0) #endif /* The sort routine. Arguments are a pointer to the head pointer of a list to be sorted, as well as the number of elements to sort. Only n elements will be sorted, the rest of the list will not be disturbed. listsort returns a pointer to the head pointer of the rest of the list, located in the last element of the sorted part of the list. Thus if listsort calls itself recursively to sort the first half of a list, this call returns the head pointer of the second half to be sorted, list traversal thus being done on the fly. */ listptr *listsort(listptr *head, unsigned n) { register listptr p1, p2; listptr *h2, *t2; unsigned m; switch (n) { case 0: return head; /* The trivial case of 0 was included, so that you may say for any accumulated list of n elements that is not yet NULL-ended something like: *listsort(&head, n) = NULL; even if the list is yet empty. */ case 1: return &getlink(*head); /* Sorting one element must be provided, or recursion will fail. This is still sort of trivial */ case 2: p2 = getlink(p1 = *head); /* p1 points now to first element, p2 to second */ if (listleq(p1, p2)) return &getlink(p2); /* if they were in order, return the tail link of the second */ getlink(p1) = getlink(*head=p2); /* let head point to the second, and the first to the tail of the second */ return &getlink(getlink(p2) = p1); /* and let the second point to the first, returning the taillink of the first as tail */ /* Sorting two elements is provided for efficiency reasons. You could provide more cases fixed-coded as well, but test them out completely: they should preserve order of equal elements! AND they should work cleanly. And if you provide too much cases, chances are that you LOSE efficiency because the gains do not outweigh the disadvantage that the code does no longer fit in the processors code cache. */ } /* Sorry that the default case appears outside of the switch. */ n -= m = n / 2; /* n now has length of first sublist, m of second one */ t2 = listsort(h2 = listsort(head, n), m); /* first n elements are sorted in *head, remaining m elements in *h2, rest of list hangs at *t2 */ if (listleq(p1 = *head, p2 = *h2)) { do { if (!--n) return *h2 = p2, t2; } while (listleq(p1=*(head=&getlink(p1)), p2)); } /* The above caters efficiently for the condition that some or all of the first sublist may be smaller than the second sublist */ /* The rest does a straight merge on the rest, starting with the inclusion of the first element of the second sublist which has tested as being smaller than the rest of the first sublist. */ for (;;) { *head = p2; do { if (!--m) return *h2 = *t2, *t2 = p1, h2; } while (!listleq(p1, p2=*(head=&getlink(p2)))); *head = p1; do { if (!--n) return *h2 = p2, t2; } while (listleq(p1=*(head=&getlink(p1)), p2)); } } -- David Kastrup Institut fuer Neuroinformatik, Ruhr-Universitaet Bochum Email: dak@neuroinformatik.ruhr-uni-bochum.de Telephon: +49-234-700-5570 ============================================================================== From: "Daniel Giaimo" Newsgroups: sci.math Subject: Re: Quicker!?sort Date: Fri, 13 Mar 1998 20:30:00 -0800 tric@yyy.or.jp wrote in message <6e552t$nnj$1@nnrp1.dejanews.com>... >If we sort the data clumsily, we must compare two data n(n+1)/2 times to >check their order (where n is the total of the data). Taking some smarter >methods for sort, we can reduce the order of conparisons into n log n. Now >this method can reduce the order of comparisons into almost n, namely this >sort is quicker than any other sorts which have the order of comparisons n >log n such as bubble sort or quick sort! ---really? There is a well-known set of sorting algorithms called radix sorts that have worst-case performance O(n). The problem with these sorts is that it is extremely difficult to write an efficient, general-purpose radix sort, and even when you have, it still isn't faster than quicksort unless you have enormous arrays to sort. Are you saying that you've found a way to make a *general-purpose* sort that has worst-case performance O(n) that is faster than quicksort? If so, I'd like to see your algorithm. Also, bubble sort and quicksort both have worst-case performance O(n^2), and bubble sort has average case performance O(n^2). The only sorting algorithms that I know of that have O(n*log(n)) worst-case performance are treesort and heapsort. -- --Daniel Giaimo Remove nospam. from my address to e-mail me. | rgiaimo@(nospam.)ix.netcom.com ^^^^^^^^^<-(Remove) -------------------------------------------------------------------------------- "In a race between a rock and a pig, don't varnish your clams." --A Wise Elbonian ============================================================================== From: David Kastrup Newsgroups: sci.math Subject: Re: Quicker!?sort Date: 16 Mar 1998 10:14:43 +0100 "Daniel Giaimo" writes: > Also, bubble sort and quicksort both have worst-case performance > O(n^2), and bubble sort has average case performance O(n^2). The > only sorting algorithms that I know of that have O(n*log(n)) > worst-case performance are treesort and heapsort. You forgot mergesort, the natural choice of a sort program for linked lists. It's O(n log n) in comparisons as well (worst case), best case of about half of that for basically presorted data, no data needs to be copied (only pointer changes) and you can combine sorting one sublist with scanning for the head of the next sublist, making the stuff pretty efficient. And stack depth is strictly limited to O(log n). With a bit of housekeeping logic you can even sort without explicit recursion. I have a dead-simple implementation of it that I use that beats most library qsorts. -- David Kastrup Phone: +49-234-700-5570 Email: dak@neuroinformatik.ruhr-uni-bochum.de Fax: +49-234-709-4209 Institut für Neuroinformatik, Universitätsstr. 150, 44780 Bochum, Germany