Data Structures and Algorithms: DSAA204

ASSESSMENT BRIEF

 
Unit Code:  DSAA204
Unit Title: Data Structure and Algorithms
Type of Assessment: Assessment 4 - Content Analysis (Reflective Journal)
Length/Duration: 2500 words
Unit Learning Outcomes addressed: Evaluate the efficiency and effectiveness of data structures and algorithms. Demonstrate reasoning about the efficiency of algorithms. Assess and apply suitable recursive data structures and algorithms to IT systems and applications.
Submission Date: Week 14
Assessment Task: Students are required to analyse the lecture materials of Weeks 1 to 11 and create a concise content analysis. This will include summaries of the theoretical concepts contained in the course lecture slides, covering definitions, analysis and examples.
Total Mark: 40 Marks
Weighting: 40%
     

ASSESSMENT DESCRIPTION:  

Students are required to analyse the weekly lecture material of weeks 1 to 11 and create concise content analysis summaries of the theoretical concepts contained in the course lecture slides. 

Where lab content, or information from technical articles on the Internet or from books, helps to fully describe the lecture slide content, discussion of that material should be included in the content analysis.

The document structure is as follows (2500 Words): 

  1. Title Page 
  2. Introduction and Background (150 words)
  3. Content analysis (reflective journals) for each week from 1 to 11 (2300 words; around 210 words per week):
    a. Theoretical Discussion
      i. Topics Covered
      ii. Definitions and/or basic plans for the Data Structure/Algorithm learnt in the lecture.
      iii. Mathematical Analysis
    b. Reflection on the Contents
      i. Demonstrate understanding of the data structure/algorithm learnt in the lecture by applying it to one of your own examples.
    c. Outcome
      i. What was the most important concept learnt?
      ii. How that concept (data structure/algorithm) improves your knowledge in relation to the concepts learnt in the earlier lectures.
  4. Conclusion (50 words)

Your report must include: 

  • At least five references, of which at least three must be from academic sources.
  • Harvard Australian referencing for any sources you use. 
  • Refer to the Academic Learning Skills student guide on Referencing.

Analysis of the Data Structures and Algorithms for Weeks 1 to 11

Algorithms are processes or sets of well-defined instructions used to solve computational problems. The study of algorithms is important in many fields, not only computer science: they play a key role in the modern world of technology, medicine, security, finance, and physics. The related field, Data Structures, is the study of how to organise data so that we can use it efficiently. Data structures and algorithms go hand in hand because the speed and efficiency of an algorithm depend on the specific data structure chosen for the task. In this class, we were introduced to various data structures, among them stacks, arrays, and queues. We were also introduced to various algorithms, how to design them, how to measure their running time, the common notations used in their analysis, and how to use those notations to calculate an algorithm's running time.

In week one we were introduced to algorithms and data structures. We talked about the importance of studying algorithms, two data structures called stacks and queues, and the Dijkstra two-stack algorithm.

I enjoyed learning about stacks: how they can be implemented using arrays or linked lists, the benefits of using them, and the operations that can be performed on them, namely pushing, popping, and peeking. With push we add an element to the top of the stack, with pop we remove the top element, and with peek we look at the top element without removing it; an isEmpty operation is also used to check whether there's at least one element in the stack. We also talked about the Dijkstra two-stack algorithm, which uses two stacks to evaluate an arithmetic expression. One stack stores numerical values while the other stores operators. The algorithm reads each token in the expression: if the token is a left parenthesis, ignore it; if it is a numerical value, push it onto the values stack; if it is an operator, push it onto the operators stack; and if it is a right parenthesis, pop one operator and two values, apply the operator, and push the result back onto the values stack.
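
Here is a small Java sketch of the two-stack evaluation, assuming a fully parenthesised expression with space-separated tokens and only + and * as operators:

    import java.util.Stack;

    public class TwoStackEvaluate {
        // Evaluates a fully parenthesised expression such as "( 1 + ( 2 * 3 ) )".
        public static double evaluate(String expression) {
            Stack<String> operators = new Stack<>();
            Stack<Double> values = new Stack<>();
            for (String token : expression.split("\\s+")) {
                if (token.equals("(")) {
                    // left parentheses are ignored
                } else if (token.equals("+") || token.equals("*")) {
                    operators.push(token);
                } else if (token.equals(")")) {
                    // pop one operator and two values, apply, push the result back
                    String op = operators.pop();
                    double v = values.pop();
                    if (op.equals("+")) v = values.pop() + v;
                    else                v = values.pop() * v;
                    values.push(v);
                } else {
                    values.push(Double.parseDouble(token)); // numeric token
                }
            }
            return values.pop();
        }

        public static void main(String[] args) {
            System.out.println(evaluate("( 1 + ( 2 * 3 ) )")); // prints 7.0
        }
    }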

In week two we talked about resizing arrays: how to add items to them, how to shrink them, the implementation, and the cost of the operations in terms of space and time, including the best and worst cases. We saw how the resize operation can be done on an array, particularly using the Java programming language. When growing a full array of size n, we create a new array of size 2n, copy the existing items into it, and then add the new element. Adding to an array usually takes constant time, but occasionally we have to double the array size and copy over all the elements, which takes O(n) time. In Big O notation we ignore leading coefficients and keep only the most significant term, and because the doubling happens rarely, the cost averaged over many insertions is O(1) constant time; this is called amortized time. Another thing we were taught is queues: how to implement them using a linked list or an array in Java, generics, iterators, and the Java collections library.
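
A sketch of the doubling (and halving) idea in Java, for a simple int stack (the class and method names are my own):

    public class ResizingStack {
        private int[] items = new int[1];
        private int n = 0; // number of items currently stored

        // Amortized O(1): most pushes are constant time, with an
        // occasional O(n) copy when the array has to double.
        public void push(int item) {
            if (n == items.length) resize(2 * items.length);
            items[n++] = item;
        }

        public int pop() {
            int item = items[--n];
            // halve the array when it is only a quarter full
            if (n > 0 && n == items.length / 4) resize(items.length / 2);
            return item;
        }

        private void resize(int capacity) {
            int[] copy = new int[capacity];
            for (int i = 0; i < n; i++) copy[i] = items[i];
            items = copy;
        }
    }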

In week three we discussed the analysis of algorithms and its importance. We saw two ways to time a program: manual and automated. The automated way involves saving a timestamp just before running the algorithm, saving another immediately after it completes, and taking the difference. We looked at the cost of various operations and observed that most primitive operations, such as variable declaration, assignment, and integer comparison, take constant time. We saw how to use mathematical models for order of growth, among them constant, logarithmic, linear, and quadratic. We were introduced to the Big Theta, Big Oh, and Big Omega notations used in the analysis of algorithms, and talked about upper and lower bounds and the best-case, average-case, and worst-case situations. We also looked at the typical memory usage of Java primitive types and arrays, and at how to calculate the amount of memory a Java object occupies by summing the space taken by each of its primitive fields plus the padding around them.
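
The automated approach can be sketched in Java with System.nanoTime; the loop here is just a stand-in for whatever algorithm we want to time:

    public class TimerExample {
        public static void main(String[] args) {
            long start = System.nanoTime();               // timestamp before the algorithm
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i; // stand-in for the algorithm
            long elapsed = System.nanoTime() - start;     // timestamp after, minus before
            System.out.println("sum = " + sum + ", elapsed ns = " + elapsed);
        }
    }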

In week four we discussed sorting, its importance, and its applications in the real world. We talked about callbacks and how the sort() method compares data without inherently knowing anything about the type of an item's key. We were also taught how to use the Comparable interface in Java to sort custom objects with the sort() method; this involves implementing the interface on the custom object. We also talked about various sorting algorithms, among them selection sort, insertion sort, shellsort, and shuffling. In selection sort we divide the items into a 'sorted' and an 'unsorted' portion. On each pass we scan the unsorted portion for its smallest item and swap it to the left end of the unsorted portion, growing the sorted portion by one. We continue taking the minimum from the unsorted portion into the sorted portion until the whole list is sorted.
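
Going back to the Comparable callback idea, here is a small, hypothetical Song class sorted with Arrays.sort (the class and field names are illustrative):

    public class Song implements Comparable<Song> {
        private final String title; // the key used for comparison

        public Song(String title) { this.title = title; }

        // The sort() callback: the sorting code compares songs by title
        // without needing to know anything else about Song itself.
        @Override
        public int compareTo(Song other) {
            return this.title.compareTo(other.title);
        }

        @Override
        public String toString() { return title; }

        public static void main(String[] args) {
            Song[] songs = { new Song("Imagine"), new Song("Aloha"), new Song("Zephyr") };
            java.util.Arrays.sort(songs); // uses compareTo as the callback
            System.out.println(java.util.Arrays.toString(songs)); // [Aloha, Imagine, Zephyr]
        }
    }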

In insertion sort we likewise divide the list into 'sorted' and 'unsorted' portions. We take the first item of the unsorted portion and compare it with the items in the sorted portion from right to left, shifting each larger item one place to the right, until we find the position where it belongs and insert it there. We repeat this until the whole list has been sorted.
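
A compact Java sketch of insertion sort on an int array:

    public class InsertionSort {
        // Sorts a in ascending order; items to the left of i are already sorted.
        public static void sort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int key = a[i]; // first item of the unsorted portion
                int j = i - 1;
                // shift larger sorted items one position to the right
                while (j >= 0 && a[j] > key) {
                    a[j + 1] = a[j];
                    j--;
                }
                a[j + 1] = key; // drop the key into its correct slot
            }
        }
    }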

In week five we looked at the Merge Sort algorithm. It is a fairly efficient algorithm that divides the array or list of items into two halves, sorts each half, and then merges the two sorted halves.

We repeat these steps recursively and eventually the whole array is sorted. We looked at how to implement it in Java. We also learned about assertions: how to test our functions using assert to detect logic errors in the program. The thing I liked about merge sort is that it is relatively faster than the other sorting algorithms we had already covered: its running time is O(n log n), unlike selection sort and insertion sort, whose running times are both O(n^2). One downside of merge sort is its space complexity; because it needs an auxiliary array for merging, it occupies more space in memory. It uses at most 6N log N array accesses to sort an array of length N. This might be undesirable in situations where memory is an issue. We saw how to improve merge sort by making it a hybrid with insertion sort or by eliminating the copy to the auxiliary array. The hybrid method works well, particularly for small sub-arrays.
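
A sketch of the standard top-down merge sort in Java, with the merge step using an auxiliary array as described:

    public class Merge {
        // Merges the sorted halves a[lo..mid] and a[mid+1..hi] using aux.
        private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
            for (int k = lo; k <= hi; k++) aux[k] = a[k]; // copy to auxiliary array
            int i = lo, j = mid + 1;
            for (int k = lo; k <= hi; k++) {
                if      (i > mid)         a[k] = aux[j++]; // left half exhausted
                else if (j > hi)          a[k] = aux[i++]; // right half exhausted
                else if (aux[j] < aux[i]) a[k] = aux[j++]; // take the smaller item
                else                      a[k] = aux[i++];
            }
        }

        private static void sort(int[] a, int[] aux, int lo, int hi) {
            if (hi <= lo) return;
            int mid = lo + (hi - lo) / 2;
            sort(a, aux, lo, mid);      // sort the left half
            sort(a, aux, mid + 1, hi);  // sort the right half
            merge(a, aux, lo, mid, hi); // merge the two sorted halves
        }

        public static void sort(int[] a) {
            sort(a, new int[a.length], 0, a.length - 1);
        }
    }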

In week six we talked about quicksort. Quicksort is recursive like merge sort. It randomly shuffles the array before sorting it, and it uses a pivot, an item we select as a starting point: in the final sorted array the pivot ends up in its correct position, with smaller items on its left and larger items on its right. The basic outline of the algorithm is: shuffle the array, partition it around a pivot at position j, then sort each sub-array recursively, scanning from the ends and placing each item on the appropriate side of the pivot. Quicksort was invented by Tony Hoare in 1959. It was refined and improved by Bob Sedgewick (the writer of the algorithms book we used in this class) in 1975, and it gained so much popularity that it became the default library sort subroutine in Unix. It is faster in practice than merge sort and insertion sort; its best case is O(n log n) while its worst case is O(n^2). In terms of space it is also more efficient than merge sort because it sorts in place. The original quicksort used O(n) extra space whereas Sedgewick's version uses O(log n).
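
The partitioning step can be sketched in Java as follows; for brevity the initial random shuffle is omitted and the first element is used as the pivot:

    public class Quick {
        // Partitions a[lo..hi] around the pivot a[lo]; returns the pivot's
        // final index, with smaller items to its left and larger to its right.
        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[lo];
            int i = lo, j = hi + 1;
            while (true) {
                while (a[++i] < pivot) if (i == hi) break; // find an item >= pivot
                while (a[--j] > pivot) ;                   // find an item <= pivot
                if (i >= j) break;
                int t = a[i]; a[i] = a[j]; a[j] = t;       // swap them
            }
            int t = a[lo]; a[lo] = a[j]; a[j] = t;         // put the pivot in place
            return j;
        }

        public static void sort(int[] a, int lo, int hi) {
            if (hi <= lo) return;
            int j = partition(a, lo, hi);
            sort(a, lo, j - 1);  // sort items smaller than the pivot
            sort(a, j + 1, hi);  // sort items larger than the pivot
        }
    }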

In week seven we looked at heap sort. A heap is an ordered binary tree; a max heap is a binary tree with the restriction that the value of each parent node is greater than the values of its child nodes. To begin heap sort, we represent the input array as a complete binary tree and call build-max-heap to turn it into a max heap. With a max heap we know the largest item (the root node), so we swap it with the last item in the heap region of the array and remove it from the heap. We heapify again and the new largest item rises to the root. We then swap the root with the item just before the one we previously placed, heapify again, and repeat the process until we have a sorted array. In terms of time, heap sort is relatively fast: heapify runs in O(log n) time and we end up calling it about n-1 times, for O(n log n) overall. It is also space-efficient as it is an in-place sorting algorithm.
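
A compact sketch of in-place heap sort in Java, assuming 0-based indexing so that node k has children 2k+1 and 2k+2 ('sink' plays the role of heapify):

    public class HeapSort {
        // Sorts a ascending using a max heap stored in the array itself.
        public static void sort(int[] a) {
            int n = a.length;
            // build the max heap: sink every internal node
            for (int k = n / 2 - 1; k >= 0; k--) sink(a, k, n);
            // repeatedly swap the root (largest) with the last heap item
            while (n > 1) {
                int t = a[0]; a[0] = a[n - 1]; a[n - 1] = t;
                n--;           // shrink the heap, leaving the item sorted
                sink(a, 0, n); // restore the heap property
            }
        }

        // Moves a[k] down until both of its children are smaller.
        private static void sink(int[] a, int k, int n) {
            while (2 * k + 1 < n) {
                int child = 2 * k + 1;
                if (child + 1 < n && a[child + 1] > a[child]) child++;
                if (a[k] >= a[child]) break;
                int t = a[k]; a[k] = a[child]; a[child] = t;
                k = child;
            }
        }
    }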

In week eight we discussed binary search trees (BSTs) and elementary symbol tables. A symbol table is a data structure used to associate a key with a value; symbol tables are also known as maps, dictionaries, or associative arrays. Binary search trees are binary trees that contain a key-value pair in each node and whose keys are in symmetric order: the key of every node is larger than all the keys in its left sub-tree and smaller than all the keys in its right sub-tree. In Java we can represent a BST by creating a class with Key, Value, and Node members, which are themselves objects with their own properties. Searching a BST recursively for a node with a specific key works as follows: if the tree is empty, terminate with failure; if the search key equals the key in the current node, terminate with success; if the search key is less than the current node's key, go left; if it is greater, go right. We repeat recursively until we find the item we are looking for or reach an empty subtree, in which case we terminate with an error code.
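
Translating those steps into a recursive Java method (the choice of int keys and String values here is arbitrary):

    public class BST {
        private static class Node {
            int key; String value;
            Node left, right;
            Node(int key, String value) { this.key = key; this.value = value; }
        }

        private Node root;

        // Recursive search following the steps above: go left for smaller
        // keys, right for larger, and stop on a match or an empty subtree.
        private String get(Node node, int key) {
            if (node == null) return null;            // empty subtree: not found
            if (key < node.key) return get(node.left, key);
            if (key > node.key) return get(node.right, key);
            return node.value;                        // keys equal: found it
        }

        public String get(int key) { return get(root, key); }
    }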

In week nine we talked about balanced binary search trees (BBSTs). BBSTs are self-balancing BSTs: they adjust themselves in order to maintain a low height so that operations such as insertion and deletion stay fast. We looked at how they can be implemented as 2-3 trees and represented as red-black BSTs. We looked at how to insert into a single 2-node, how to insert into a 2-node at the bottom, how to insert into a tree with two keys, and how to insert into a 3-node at the bottom, as well as how to search for an item. Insertion into a 2-node involves adding the new key to the 2-node to create a 3-node. Insertion into a 3-node at the bottom involves adding the new key to the 3-node, which temporarily makes it a 4-node, then moving the middle key of the 4-node up into the parent and repeating as necessary; if we reach the root and it is a 4-node, we split it into three 2-nodes. Searching turned out to be the same as in ordinary BSTs: we ignore the colours and search recursively.

BBSTs are better than BSTs because the worst case of BSTs is linear whereas the worst case of BBSTs is still logarithmic.

In week ten we looked at hash tables. A hash table is a data structure that works like a dictionary; it stores data in an associative manner. It uses a hash function to evaluate a key and give us an integer that we can map to an index in the array.

A hash function always gives us the same value for the same key. Creating a hash function that avoids collisions is the most difficult part. To handle collisions, we can attach a linked list to each index; the linked list holds all the data whose keys collided at that index from the hash function. This is called chaining. Retrieval involves passing the key to the hash function to get an index and then, if we use chaining, searching through the linked list attached to that index. We also looked at linear probing as a way of resolving collisions. In linear probing we use the hash function to find the index for a key; if that spot already contains a value, we use the next available spot, and if we reach the end of the array we wrap back to the front.
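
A minimal sketch of a linear-probing table in Java (resizing is omitted, so this assumes the table never fills; the fixed capacity of 16 is arbitrary):

    public class LinearProbingHashTable {
        private String[] keys = new String[16];
        private String[] values = new String[16];

        private int hash(String key) {
            return (key.hashCode() & 0x7fffffff) % keys.length;
        }

        // Linear probing: start at hash(key), step forward, wrap at the end.
        public void put(String key, String value) {
            int i = hash(key);
            while (keys[i] != null && !keys[i].equals(key)) {
                i = (i + 1) % keys.length; // next spot, wrapping to the front
            }
            keys[i] = key;
            values[i] = value;
        }

        public String get(String key) {
            for (int i = hash(key); keys[i] != null; i = (i + 1) % keys.length) {
                if (keys[i].equals(key)) return values[i];
            }
            return null; // hit an empty spot: the key is absent
        }
    }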

In week eleven we talked about string sorts. We looked at the computational challenges that studying string sorts helps to solve, for example genomic sequences and information processing. Programming systems also rely on sophisticated string-processing techniques throughout the program life cycle, that is, in the compiler, interpreter, and assembler.

We discussed the C char data type, which is an 8-bit integer that supports 7-bit ASCII and can represent at most 256 characters. The Java char data type, on the other hand, is a 16-bit unsigned integer. We also looked at the String data type in Java: it is an immutable sequence of characters with properties such as length and methods such as substring() and isEmpty(). We talked about string comparisons and the time complexity of those operations.

We also discussed radix sorts, the first being LSD (Least Significant Digit) radix sort and the second MSD (Most Significant Digit) radix sort, which sort items digit by digit (or character by character). Another sorting algorithm we looked at is 3-way radix quicksort, which combines radix sorting with quicksort: it partitions the strings on a single character into three groups (less than, equal to, and greater than the partitioning character) and then recurses on each group.
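
As a sketch, here is LSD radix sort in Java for strings that are all the same length w, using key-indexed counting on each character position from right to left (extended ASCII is assumed):

    public class LSD {
        // Sorts an array of strings that all have length w.
        public static void sort(String[] a, int w) {
            int R = 256; // extended ASCII alphabet size
            String[] aux = new String[a.length];
            for (int d = w - 1; d >= 0; d--) {
                int[] count = new int[R + 1];
                for (String s : a) count[s.charAt(d) + 1]++;          // frequency counts
                for (int r = 0; r < R; r++) count[r + 1] += count[r]; // cumulative counts
                for (String s : a) aux[count[s.charAt(d)]++] = s;     // distribute
                System.arraycopy(aux, 0, a, 0, a.length);             // copy back
            }
        }
    }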

Reflection on the data structures and algorithms

Based on the topics we've learned in this class, an application I would use a data structure for is a music player with a search function. The search function can use a hash table that maps artist names to their songs. Searching for an artist passes the name to the hash function, which gives the index where that artist's list of songs is stored.

Below is a minimal sketch of the code to search this hash table, using Java's built-in HashMap (the class and method names here are illustrative, since they are my own):
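
    import java.util.HashMap;
    import java.util.List;

    public class MusicSearch {
        // Maps an artist name to that artist's songs.
        private final HashMap<String, List<String>> songsByArtist = new HashMap<>();

        public void addSong(String artist, String title) {
            songsByArtist.computeIfAbsent(artist, k -> new java.util.ArrayList<>())
                         .add(title);
        }

        // Searching hashes the artist name internally and returns the songs.
        public List<String> search(String artist) {
            return songsByArtist.getOrDefault(artist, List.of());
        }
    }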

Another example of a data structure that would be useful in the music application is the stack. I would use a stack to record the most recently played songs: whenever the user presses back to go to the previous song, it is popped from the stack, and pressing forward pushes a song onto the stack.
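
A sketch of that history stack using Java's Deque (again, the names are my own):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class RecentlyPlayed {
        private final Deque<String> history = new ArrayDeque<>();

        // Pressing forward (playing a song) pushes it onto the stack.
        public void played(String song) {
            history.push(song);
        }

        // Pressing back pops the most recently played song.
        public String back() {
            return history.isEmpty() ? null : history.pop();
        }
    }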

Outcome

The merge sort and insertion sort hybrid algorithm is my best take-away from this class. Cutting over to insertion sort for small sub-arrays greatly reduces the overhead of recursive merging on tiny inputs. From this we can see how important thinking outside the box is when optimizing an algorithm.

 Conclusion

This was an enjoyable class and we learned a great deal about algorithms. I got to see the various data structures and how the choice of data structure affects the speed of a program or algorithm. I especially loved how simple day-to-day things like the English dictionary can be represented as a data structure. Algorithms are everywhere around us, and algorithm optimization could mean life or death in critical real-time systems.

Sources

Cormen, T., Leiserson, C., Rivest, R. and Stein, C., 2009. Introduction to Algorithms. 3rd ed. Cambridge, MA: MIT Press.

Sedgewick, R. and Flajolet, P., 2013. An Introduction to the Analysis of Algorithms. 2nd ed. Upper Saddle River, NJ: Addison-Wesley.

Wilf, H., 1994. Algorithms and Complexity. Internet ed.

Sedgewick, R. and Wayne, K., 2011. Algorithms. 4th ed. Upper Saddle River, NJ: Addison-Wesley.

Heineman, G., Pollice, G. and Selkow, S., 2016. Algorithms in a Nutshell. 2nd ed. Sebastopol, CA: O'Reilly Media.