What is algorithm notation

Randomized Algorithms

Transcript

1 Randomized Algorithms Mentored Work Sabrina Wiedersheim Supervision: Prof Dr J Hromkovic Zurich, 13 July 008

2Notation of algorithms (if and for loops, input, output,) and first experience of runtime analyzes (O (n), O (log (n)), O (n),) are advantageous, but can also be circumvented Analysis: The notation of sums using the summation sign should be known to the students Knowing the harmonic number H n would be an advantage The topics of sorting algorithms and matrices are introduced in this thesis, but only very briefly. It is therefore advantageous if the students already have certain knowledge or if they receive additional support in these introductory sections

3 TABLE OF CONTENTS 1 Table of contents 1 Introduction to the code breaker 3 3 Quicksort 5 31 Sorting algorithms 5 3 Deterministic quicksort algorithm 8 33 Randomized quicksort algorithm 1 4 Matrix multiplication test 41 Introduction / repetition of matrices 4 Verification of matrix multiplications 9

4 1 INTRODUCTION 1 Introduction What is an algorithm? An algorithm is a process that gives us (or the computer) step-by-step instructions on what to do. Such instructions do not only exist in mathematics and computer science, but we also encounter them constantly in everyday life, be it step by step Installation instructions for the new computer game, the baking recipe for grandmother's favorite cake or the construction instructions for father's office shelf - everywhere we are confronted with instructions that explain step-by-step how to achieve a certain goal can be formulated in different detail! What happens if certain decisions and steps are chosen at random in such procedures? - When you no longer prescribe the entire procedure exactly from start to finish, but let chance rule in between? Incorporating a random choice between 100g sugar or 100g salt in a cake recipe would probably not be an advantageous idea.

5 CODEKNACKER 3 Code breaker In this section we will see a first example of a randomized procedure and its advantages. We consider the following situation: Anna writes her diary on the computer and her little brother Tim wants to spy on her, but Anna is aware of this, protecting her her document with a 4-digit numeric password Exercise 1 (a) How many possible such 4-digit passwords with the digits {0, 1 ,,, 9} can Anna choose from? (b) Assuming the probability of choosing your password is the same for every possible password. What is the probability that Tim will find your password if he only has three attempts? Now the password protection of Anna's Word document is not quite as good as that of an ATM. There is no automatic lock after a certain number of attempts. So Tim can try any number of passwords. How does he have to proceed to crack Anna's code safely? Since he does not know any preferences for Anna's passwords, he has no choice but to try out one password after the other. And so that he does not forget a password by mistake, he has to decide on a certain order, such as: 0000, 0001, 000, 0003,, 9999 Exercise Assuming Anna's password is as in exercise 1 (b), it happens that Tim has enough time and is very patient: On Wednesday afternoon he sits in front of Anna's computer for 4 hours and enters passwords. He can do 0 every minute Testing passwords What is the probability that he will have cracked Anna's code in 4 hours? But now Anna knows her little brother well. She knows how Tim will proceed to find out her password. So she no longer chooses her password randomly, but just enters the last password from Tim's order. The unsuspecting Tim has no chance anymore. He will be given Sitting in front of the computer for 4 hours Because the probability that he will crack the password is 0 in this case (he cannot test all codes in the 4 hours and thus never gets to the last password in his sequence) Can we help Tim? What's his problem? Tim has a rather limited approach. Before he starts testing, he knows exactly in which order he will go through the passwords

6 CODEKNACKER 4 becomes and thus enables Anna (opponent) to make the task as difficult as possible. His procedure is deterministic (before the start of the procedure, it is clear how it will run) Tim's possible solution is to build in chance: every time he is behind Anna's password goes, should he randomly put together a new sequence of all possible sequences and test the passwords in this way Task 3 (a) From how many possible password sequences can Tim choose one at random? (b) What is the probability that any password, such as 593, is the first password in its sequence? And what is the probability that it will be the last? (c) Tim is working on Anna's computer for 4 hours again (0 passwords per minute) Anna chose the password 9999. What is the probability that Tim will find her password? How big would it be if Anna had chosen 0000? With this built-in chance, Tim can turn the tables again Anna can no longer see through his approach The probability that Tim will find her password in the 4 hours is again the same as in the case when Anna chose her code randomly and Tim tested the password according to a certain sequence This method of randomization is also called outsmarting the opponent. With a deterministic solution method / algorithm, an opponent can make the problem difficult for you. After the randomization, however, the opponent can no longer see through the procedure! We shall see another such example in the next chapter

7 3 QUICKSORT 5 3 Quicksort 31 Sorting algorithms The sorting of elements happens again and again in our everyday life: It ranges from sorting cards in hand when playing in the afternoon to sorting receipts and invoices for accounting The sorting of data records also plays a major role in IT It gives you an overview of a large amount of data and also enables you to quickly find data. What use would a telephone book be if these numbers weren't sorted by place and name? For practical applications it is of course important to have methods that can sort large amounts of data as quickly as possible.In this chapter we will deal with such methods (sorting algorithms) and optimize their runtime. Definition 1 A sorting algorithm is a method that creates a list of Sorting elements from a set It is important that this set has an order (eg a ranking (numbers), an alphabetical order)! The input of a sorting algorithm is therefore a mixed list of elements that have an order. The sorting algorithm calculates this order and outputs a sorted list of elements. So that our lists of elements are not too complex, we limit ourselves to lists of natural numbers in the following All other possible lists, such as playing cards or addresses, can be traced back to lists of natural numbers. Here is an example for the input and output of a sorting algorithm: A: Sorting algorithm B: Figure 1: Example for input and output of a sorting algorithm

8 3 QUICKSORT 6 Exercise 4 The following list of elements is given: a = [] (a) What does the sorted list look like? (b) How did you go about determining this sorted list? (c) Write an algorithm that determines the smallest element from a list a = [a 1, a ,, a 10]. If you have solved this first problem correctly, you will have noticed that it is not possible to find a smallest number out of 10 to determine any number without going through all 10 numbers at least once. The fastest algorithm goes through all numbers exactly once: First you choose the first number in the list as a candidate for the smallest number Then you go through all the elements of the list As soon as you find a smaller number this becomes the new smallest number Until you really have the smallest of all 10 numbers at the end of the 10 numbers Number in the list = 1 candidate 7 is the first number in the list that is less than 3 6 is the first number that is even less than 7 And because you won't find a smaller number even when going through the remaining numbers, 6 has to be the smallest! Figure: List of numbers with 3 candidates for a smallest number Exercise 5 a 1 = [] a = [] a 3 = [] (a) How many numbers are candidates for the smallest numbers during the execution of the algorithm described above? (b) How many such candidates can a list of n numbers have at most? And how many does it have to be at least? (write down an example for both cases) (c) write down an example of a list that has exactly two candidates for the smallest numbers and where the 0 is in the 5th position

9 3 QUICKSORT 7 Exercise 6 The numbers {0, 1 ,, 3, 4, 5, 6, 7, 8, 9} are placed in a random order Each order is equally likely (a) How many possible orders are there? (b) How many orders are there with a 0 in the first position? (c) What is the probability that the algorithm mentioned above only needs exactly one candidate for a smallest number? (d) What is the probability that he needs 10 candidates?

10

11

12 3 QUICKSORT 10 Exercise 9 (a) Go through the quicksort algorithm for a 1 = [] a = [] and sketch the tree representation of the sequence (b) Which numbers are compared with the number 4 when a 1 is run through? And which ones are compared to the number 7? (c) How many comparisons does Quicksort make in these two examples a 1 and a in total? (d) In addition, sketch the tree representation of the flow of Quicksort for a 3 = [] Why are the numbers 3 and 5 compared with each other for a 3 and not for a 1? Are there good and bad lists for quicksort? As you may have already noticed in Exercise 6, the quicksort algorithm needed a lot more comparisons for sorting a than for sorting a 1 In general, the quicksort algorithm is fastest if and only if the two sublists it compares with If, however, the list of n numbers has already been sorted, then one list of the deterministic quicksort algorithm does not contain a single number when splitting and the other does not contain a single number except for the pivot.In this bad case, the deterministic quicksort does how we'll do the math in a moment, a lot more comparisons! In other words, the algorithm runs much slower in the bad case.Let's examine these two cases a little more closely: In a good case for Quicksort, the lengths of the sublists are always at least halved from level to level (for illustration see Figure 4) In other words, in the first A sub-list has a maximum of n elements on the second level, and a maximum of n elements in the second level, and a maximum of n elements in the k-th level. How many levels does Quicksort need in this case, k until it only has sublists of length one? To calculate this, we set the maximum number of elements in the kth level equal to 1 and solve this equation for k: n k = 1 n = k k = log (n)

13 3 QUICKSORT 11 The deterministic quicksort algorithm therefore never needs more than log (n) levels in a good case!

14 3 QUICKSORT 1 at the runtime of the simpler first sorting algorithm from the last section, because for large n, n (n 1) n On average, however, the deterministic quicksort algorithm always needs a number of comparisons, which for large lists behave like nlog (n) Man also says the deterministic quicksort algorithm has an average running time of O (nlog (n)) (ie at most of the order nlog (n)) but more on this in the next section 33 Randomized quicksort algorithm In the last section we dealt with the bad case of the deterministic Quicksort set apart If you give this deterministic quicksort algorithm a sorted list as input, it does not run faster than the simple first sorting algorithm! This means that someone who aims to sabotage the deterministic quicksort algorithm knows exactly that one only has to enter a sorted list. We want to improve this now! An algorithm that calculates the longest when entering the desired result cannot be our end result. The idea of ​​improving the algorithm, like with the code breaker, is to incorporate chance! Instead of always choosing the first element of the list as the pivot element, we now select the pivot element at random again. This modification of the algorithm does not cost us much! We only have to change one line: Algorithm 3: randomquicksort Input: List a = [a 1 ,, a n] with n different numbers from N if n = 1 then (return a); r: = random number from {1 ,, n}; pivot: = a r; z 1: = 1; z: = 1; (Counting variables) for (i: = to n) do if a i pivot then r z: = a i; z = z + 1; end return detquicksort (l); return pivot; return detquicksort (r); If the pivot elements are chosen randomly, the number of comparisons for sorting no longer depends on the input list! A sorted list

15 3 QUICKSORT 13 can in this case be sorted just as quickly as any other list. So an opponent of the algorithm no longer has an advantage! Regardless of the order in which he enters the elements in the list, chance determines the duration of the runtime.It can happen that the randomized algorithm makes very unfavorable choices of the pivot element, so that it first selects the largest number as the pivot, then the second largest, and so on. You will take a closer look at this case in the next exercise. The main objective of this chapter will be to calculate the expected runtime of the randomized quicksort algorithm! How many comparisons will this algorithm take on average? Before that, however, a few general tasks on randomized quicksort, which you should look at as a warm-up for this topic: Task 10 The input of the rand quicksort is an unsorted list with the numbers {1 ,,, 10} What is the probability that the worst case occurs? That means with what probability does the algorithm choose the pivot elements in sorted order (i.e. either or)? Exercise 11 The rand quicksort algorithm receives as an input list: [] What is the probability of the following sorting sequence drawn in the tree representation? Figure 5: Tree representation of the process Tip: What is the probability that the number 4 at the beginning of the 10-element list will be chosen as the pivot? And what is the probability that the number 3 in the smaller 3-element sublist will be chosen as the pivot?

16 3 QUICKSORT 14 So first calculate all the probabilities of these pivot choices for the respective sublists. What do you have to do with these individual probabilities to get the total probability you are looking for? Exercise 1 Let the input list be the same as in Exercise 8 Find a sequence of the randomized quicksort that occurs with the highest possible probability and draw the sequence in the tree diagram Runtime analysis of the randomized quicksort We now want to show that the average runtime of the randomized quicksort depends on the The order of magnitude is O (nlog (n)) (like the good case of the deterministic quicksort). For this we define the random variable X, which stands for the number of comparisons when the randomized quicksort is run. The goal is now to calculate E [X] , the expected number of comparisons. The main idea for the whole will be the following sub-problem: What is the probability of two specific numbers in the input list being compared with one another during the calculation of the algorithm? Work on the following exercise as the first step in solving this problem: Exercise 13 The unsorted list [] is given. The randomized quicksort selects the following random pivot elements in the order: 4, 7, 5 Draw the tree of the process and fill in the white fields in the table : Go through all pairs of numbers (i, j) with ij and think about X ij X ij X 1 X 13 X 14 X 15 X 16 X 3 X 4 X 5 XX 17 X 7 X 34 X 35 X 36 X 37 X 45 X 46 X 47 X 56 X 57 X 67 Figure 6: Which pairs of numbers are compared with each other?

17 3 QUICKSORT 15 whether these two numbers i and j were ever compared with each other during sorting If this is the case write a 1 in the table at the appropriate place (X ij) If the two numbers are never compared with each other, then write a 0 Generally applies So in this notation from Exercise 10 for i, j with i

18 3 QUICKSORT 16 Exercise 14 The following unsorted list is given to the rand quicksort as input: [] (a) In which choices of the first pivot element is it clear that the numbers and 7 are never compared with each other? (b) How big is the probability that after the random choice of the first pivot element it is already clear that and 7 will never be compared with each other? Exercise 15 Still looking at the same unsorted list as in the last exercise. The randomized quicksort selects the following pivot elements in the given order: (i) 8, 4, 7, 5, (ii) 9, 1, 7, 5, 3 In In which of the cases described above are the two numbers and 7 compared? So if we want to calculate the probability that two certain numbers i and j (i

19 3 QUICKSORT 17 Sentence 1 Let i and j be two numbers (1 i

20 3 QUICKSORT 18 With the linearity of the expected value (ie with the formula: E [aX + by] = ae [x] + be [y]) the following applies: E [X] = E [X ij] {according to remark 1} = = = = 1 i

21 3 QUICKSORT 19 (1,) (1,3) (1,3) (1, n -) (1, n-1) (1, n) d = n-1 occurs 1 time (, 3) ( , 4) (, n-1) (, n) d = n- occurs times (3,4) (3, n) d = n-3 occurs 3 times (n-3, n) d = 3 occurs n-3 times before (n-, n) d = occurs n- times (n-1, n) d = 1 occurs n-1 times Figure 8: Rewriting the sum From this figure we can see that it is exactly there is a pair that has the large distance n 1 Pairs with distance n already exist and for the distance n 3 there are 3 pairs The smaller you make the distance, the more such number pairs (i, j) you can find the whole thing a little more precisely so you can see that the number of pairs of numbers at the distance d is just nd! If we want to rewrite the sum 1 i

22 3 QUICKSORT 0 Thus: 1 i

23 3 QUICKSORT 1 Number H n as ln (n) grows for increasing n That means H n is of the order of magnitude O (log (n)) So our expected number of comparisons that the randomized quicksort makes is also of the Order of magnitude O (log (n)) And it follows: Theorem 3 The randomized quicksort algorithm has an average running time of the order of magnitude O (n log (n)) Exercise 18 Let [] be an input list for the rand quicksort and let Y be a random variable which for the number of comparisons of the rand algorithm of the number 4 with any other number Calculate E [Y] Tip: Write the random variable Y again with the help of random variables X ij

24 4 MATRIX MULTIPLICATION TEST 4 Matrix multiplication test 41 Introduction / repetition matrices What is a matrix? The Matrix not only describes a well-known science fiction film, but also an important basic concept in mathematics!

25 4 MATRIX MULTIPLICATION TEST 3 Exercise 0 You get the following matrix: A = (a) What type of matrix is ​​this? (?? - matrix) (b) Determine the entries: a 11, a 4 and a 35 (c) Which entries are in the 3rd row and which in the column? Matrices that have the same number of columns as rows are also called square matrices (n n matrices) For example, matrix B at the very beginning of this section was such a matrix, namely a square matrix of the size (matrix) also its own Names are given to matrices that have only one row or only one column: They are called vectors. For us, matrix C was a vector of size 3 (a vector with 3 entries). Now we have remained on a very theoretical and formal level of Matrices Can we already draw a connection to our daily life with them? As the following example shows, we can rewrite tables from our everyday life in a matrix: Report grade table of class 4b Martina Julian Ralph Tanja Wanda Linda Roger Daniel Kathrin Lukas Michael Dt Egl Math Phs Bio Gg A = Figure 9: From the table to the matrix This is all but furthermore not interesting! Strictly speaking, it is just a slight change of notation to give the matrices an advantage over the normal ones

26 4 MATRIX MULTIPLICATION TEST Obtaining 4 tables takes more! We want to take a closer look at the addition and multiplication of matrices: Adding matrices In order to be able to add two matrices, they must be exactly the same size! If this is the case, take the entries that are in the same place (same row and same column) and add these two numbers. The sum of two matrices is again a matrix of the same size Definition 3 If A and B are two m n-matrices, the following applies: a 11 a 1n b 11 b 1n A + B = + a m1 a mn b m1 b mn a 11 + b 11 a 1n + b 1n = a m1 + b m1 a mn + b mn exercise 1 On a sports day, five groups A, B, C, D, E of 4 players each are formed.Three items must be carried out, in each of which you can achieve up to 10 points.The tables below show the results of the individual players. Write the three Tables in matrices P, K and R and calculate T = P + K + R What do the entries of the matrix T indicate? Shoot bow & arrow Player 1 Player Player 3 Player 4 ABCDE Kickboardparcour A Player 1 10 Player 10 Player 3 7 Player 4 6 BCDEA Player 1 7 Player 4 Player 3 9 Player 4 8 Ring Throwing BCDE Figure 10: Scoring table of sports day Task The following matrices are given: 4 () () A =, B =, C =

27

28

29 4 MATRIX MULTIPLICATION TEST 7 Exercise 4 Calculate: () () (a) () (b) We would now like to go into a little more specifically about vectors As already mentioned, these are matrices that consist of only one column (column vectors) or only one row ( Row vectors) So that the vectors can be clearly delimited from the rest of the matrices, we will denote them with lowercase letters from now on: u = (u 1 un) row vector, v = v 1 vn column vector If a matrix is ​​multiplied by a vector or vice versa, see below if at all possible, this results in another vector. Investigate this using the following exercise: Exercise 5 Given a 3 3-matrix A, a row vector u of size 3 and a column vector v of the same size: A = 3 0 u = (1 1) 1 v = compute, if possible: (a) v A (b) A v (c) u A (d) A u Exercise 6 Let T be the matrix from the sports day exercise (exercise 1) Compute: () T Interpret the Entries of this matrix!

30

31 4 MATRIX MULTIPLICATION TEST 9 4 Verification of matrix multiplications As we found out in the last section, the multiplication of two matrices is time-consuming You have to calculate many partial products of the matrix entries and then add them up How can we skilfully check whether our result is correct?

So that the boss can be sure that all entries of the calculated matrix C are correct, he has no choice but to recalculate all entries on the deterministic route, which means he would have to do the entire work of the accountant again Can we improve this by incorporating randomness? Is there a randomized process for the boss in which he has to calculate less? Let us first consider how much the accountant has calculated: Exercise 9 The following matrix multiplications are given: () () (a) (b) How many multiplications have to be calculated in order to determine these matrix products? (The product matrix does not necessarily have to be determined for this) So that the considerations with the stationery problem do not become unnecessarily complicated for us, we assume for the further considerations that the three variables m, n and l (number of companies, articles and ordering options) are all the same are large (= n) The accountant had to calculate nn = n matrix entries for the product matrix C And since he has for each entry

33An algorithm that, depending on the case, still supports the accountant and also claims that the result is correct without this being the case? The answer is: yes! There is a relevant difference between the algorithm and the accountant: If you let the accountant do his calculation several times, it may well be that he makes the same mistake every time, or even arbitrarily changes the same matrix entry. The algorithm, on the other hand, runs completely randomly and equally distributed! The