NOTES on the complexity of Binary Search
CS 3100
John Sarraille

***************************************************************

Here is some code from page 135 of the second edition of
Stubbs and Webre.

First we have the declarations for a list data type:

***************************************************************

CONST maxSize = ..... ;

TYPE  position     = 1 .. maxSize ;
      count        = 0 .. maxSize ;
      sortedList   = ^listInstance ;
      listInstance = RECORD
                       current : count ;
                       n       : count ;
                       list    : array [1..maxSize] OF stdElement
                     END ;

***************************************************************

Next comes the code for the binary search procedure, BinSearch.

***************************************************************

PROCEDURE BinSearch(sl: sortedList; tkey: keyType; var lo, hi: count);

{results: binary search of the sublist from positions lo to hi
          for an element with key value = tkey; if successful,
          then hi (=lo) is the position of the found element;
          if not, then hi (=lo) is the position of the next
          smaller key value, or possibly lo-1.}

VAR mid : count ;

BEGIN
  WITH sl^ DO
  BEGIN
    WHILE lo < hi DO
    BEGIN
      mid := (lo + hi + 1) div 2 ;
      IF tkey < list[mid].key
        THEN hi := mid - 1
        ELSE lo := mid
    END
  END
END ;

**************************************************

Finally we have the caller of BinSearch, FindKey:

**************************************************

PROCEDURE FindKey(sl: sortedList; tkey: keyType; var found: boolean);

VAR low, high : count ;

BEGIN
  WITH sl^ DO
  BEGIN
    low := 0 ; high := n ;
    BinSearch(sl, tkey, low, high) ;
    IF (high <> 0) AND (tkey = list[high].key)
      THEN BEGIN
             current := high ;
             found := true
           END
      ELSE found := false ;
  END
END ;

*****************************************************************

If the list to be searched is

    -----------------------------------------------------
    |   |   |   |   |   |   |   |   |   |   |   |   |   |
    -----------------------------------------------------
      1   2   3  ..................................   N

then the FindKey procedure initializes low=0 and high=N.
Thus the algorithm works basically as if the space to search were:

    ---------------------------------------------------------
    | X |   |   |   |   |   |   |   |   |   |   |   |   |   |
    ---------------------------------------------------------
      0   1   2   3  ..................................   N
    (not
    used)

Now let's see what "mid" turns out to be (as calculated by
BinSearch) for sublists of various sizes:

  ---------                    mid = (lo + hi + 1) div 2
  |   |mid|                        = (lo + lo + 1 + 1) div 2
  ---------                        = 2(lo + 1) div 2
   lo  hi                          = lo + 1 = hi

  -------------                mid = (lo + hi + 1) div 2
  |   |mid|   |                    = (lo + lo + 2 + 1) div 2
  -------------                    = [2(lo + 1) + 1] div 2
   lo      hi                      = lo + 1

  -----------------            mid = (lo + hi + 1) div 2
  |   |   |mid|   |                = (lo + lo + 3 + 1) div 2
  -----------------                = [2(lo + 2)] div 2
   lo          hi                  = lo + 2

  ---------------------        mid = (lo + hi + 1) div 2
  |   |   |mid|   |   |            = (lo + lo + 4 + 1) div 2
  ---------------------            = [2(lo + 2) + 1] div 2
   lo              hi              = lo + 2

  -------------------------    mid = (lo + hi + 1) div 2
  |   |   |   |mid|   |   |        = (lo + lo + 5 + 1) div 2
  -------------------------        = [2(lo + 3)] div 2
   lo                  hi          = lo + 3

According to the code of BinSearch, when tkey >= list[mid].key,
we continue the search in the sublist that runs from mid to hi
(inclusive). When tkey < list[mid].key, we continue the search
in the sublist that runs from lo to mid-1 (inclusive).

As we can see from our calculations above, the sublist from mid
to hi is one element larger than the sublist from lo to mid-1
when the size of the sublist to be searched is an odd number.
When the size is even, the two sublists are the same size.

It should be clear to the reader that searches like:

    -------------------------------------
    | X |   |   |   |mid|   |   |   |   |
    -------------------------------------
      0   1   2   3   4   5   6   7   8

                    ---------------------
                    |   |   |mid|   |   |
                    ---------------------
                      4   5   6   7   8

                            -------------
                            |   |mid|   |
                            -------------
                              6   7   8

                                ---------
                                |   |mid|
                                ---------
                                  7   8

                                    -----
                                    |mid|
                                    -----
                                      8

are the "worst case searches" for this binary search.
Such searches result in a "bigger half" sublist every time the
search space is divided, until the very end, when it is no longer
possible to have a "bigger half". Because these searches do the
least to reduce the size of the search space, they require the
most probes.

In such searches, N=2^k (2 to the power k) for some positive
integer k. The search space initially has effective size 2^k + 1,
because low is initialized to 0. If tkey is the last element of
the list, or is larger than the last element of the list, then
the sizes of the successive sublists searched are:

    2^k+1, 2^(k-1)+1, 2^(k-2)+1, ..., 2^1+1, 2^0+1=2.

Thus k+1 probes are done in getting the size of the search space
down to 1. Finally, FindKey probes the single remaining array
element one last time to check whether it matches tkey. That
brings the count to k+2 probes. Since N=2^k, we see that
k+2 = log(N)+2, where the logarithm is to the base 2.

It is now easy to see that ALL binary searches performed by
BinSearch are O(log(N)). Each probe leaves at most the "bigger
half" of the current sublist, so a search of a list of any size N
needs no more probes than the worst case search of a list whose
size is the next power of 2 at or above N, and that power of 2 is
less than 2N.

*****************************************************************
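The probe count derived above can be checked by experiment. What
follows is only a sketch, written in Python rather than Pascal so
that it is easy to run, and it is not code from Stubbs and Webre.
The names bin_search, find_key and max_probes are made up for
this illustration, the list is a plain Python list with slot 0
left unused, and the keys are bare integers rather than records
with a key field.

```python
def bin_search(keys, tkey, lo, hi):
    """Transliteration of BinSearch (assumed layout: keys[1..n] sorted,
    keys[0] unused). Returns (final position, probes made in the loop)."""
    probes = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2   # same rounded-up midpoint as the Pascal
        probes += 1                # one comparison against keys[mid]
        if tkey < keys[mid]:
            hi = mid - 1
        else:
            lo = mid
    return hi, probes


def find_key(keys, tkey):
    """Transliteration of FindKey. Returns (found, total probes)."""
    n = len(keys) - 1
    pos, probes = bin_search(keys, tkey, 0, n)
    if pos == 0:                   # tkey is smaller than every key: no final probe
        return False, probes
    probes += 1                    # FindKey's final equality test
    return keys[pos] == tkey, probes


def max_probes(n):
    """Largest probe count over every key in the list and every gap
    between, before, or after the keys (hypothetical helper)."""
    keys = [None] + [2 * i for i in range(1, n + 1)]   # keys 2, 4, ..., 2n
    return max(find_key(keys, t)[1] for t in range(1, 2 * n + 2))
```

With these definitions, max_probes(8) evaluates to 5, which agrees
with k+2 for k=3, and searching an 8-element list for its last key
takes exactly those 5 probes.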