Skip to content

gutscdav000/fun_with_datastructures

Repository files navigation

fun_with_datastructures

  • DNA sequence alignment

    This project is a project where a memoization dynamic programming algorithm is used to achieve efficient computation. When matching 2 DNA sequences there may be 3 types of character comparisons: a match, a mismatch, and a gap. A match and mismatch are self explanitory, but a gap is where you choose to remove one of the characters from the sequence to improve the score of the DNA match. matches receive 2 points, mismatches -2 points, and gaps -1 points. these point value rules are used to build up the memoization cache composing the next value in the cache of the bordering 3 values (e.g. (x - 1, y), (x - 1, y - 1), & (x, y - 1)). Then the algorithm traces back through the cache maximizing the score to find the optimal DNA alignment. the SequenceAligner.java class contains the memoization aspect of this algorithm.

  • knn image classifier

    This project is an image classfier that uses the MNIST dataset of handwritten digits to train itself to classify handwritten numeric values. It employes a K-nearest neighbor algorithm where the majority classification of the K most similar images are used to select the image's value. This particular algorithm uses euclidean distance of the raw pixel intensities for differentiating images and it uses a heap for efficiency.

  • palatte similarity

    This project uses a hashmap to compare the RGB pixel colors in an image to that of another. This is done at various levels of quantization that can be modified by the sliding bar provided by the GUI anywhere from 8 bits per RGB channel all the way down to one. Additionally when you click and hold your cursor on a particular pixel on the screen it will highlight all of the other matching pixels on both images using the xray effect. The hash table itself has been implemented to be self-adjusting to move the previously fetched list item to the head of the list for each array index within the table, and it uses chaining as the collision resolution strategy.

  • permutations

    This exercise was a fun recursive algorithm that I implemented first in class. It uses a helper function with two accumulator parameters to find the permutations at each index as the low index gets marched down towards the end of the string until it reaches the base case where the low index has reached the last element in the string. This algorithm though simple it is quite useful, as I used it for a freelance computer forensics job for generating wordlists.

  • twitter bot project

    This project was one of my favorites where Twitter's API was used in conjunction with string matching algorithms to efficiently search twitter data for a preselected string. In addition it contains the functionality to see what people have been tweeting about given a subject string, and can return the most popular words tweeted by someone with a given twitter handle. The string matching algorithms were a naive algorithm, Knuth-Morris-Pratt (KMP), and Boyer-Moore. The niave algorithm has no optimizations. The KMP algorithm constructs a finite state automata of a search string to be found in other strings and has a worstcase linear run time. Finally the Boyer-Moore algorithm uses a nifty shift routine for substrings that are known not to match and the Galil rule which skips substring comparisons that are known to match creating a linear runtime in the worst case and sublinear in the majority of cases. In this project I found that KMP runs well on strings with a smaller alphabet, where Boyer-Moore surpasses KMP in key comparisons when the strings are longer and the alphabett size is larger. For example KMP is much better at comparing binary and hexidecimal strings, while Boyer-Moore is much better at comparing ASCII strings.

  • wire routing project

    Another one of my favorite projects! It implements a wire routing algorithm using a breadth first search where the goal is to minimize the aggregate length of all of the wires while also minimizing the run time of the program. The breadth first search algorithm is used for laying inndividual wires and uses a priority queue ordered by shortest euclidean distance as the heuristic to determine the order in which wires are laid. Then the wires are connected in order of shortest distance between start and end points using the manhattan distance.

    This was my overall favorite project from my datastructures class for many reasons. Firstly, it was so open ended that it gave you so much freedom to implement and optimize. One could choose to use Djikstra's algorithm, the breadth first search algorithm I used, or if you have some AI background modify BFS and use A* instead. Additionally, we used so many of the data structures to implement the wire routing algorithm and represent the board that it predicated a high degree of fluency with the structures we learned during the semester, as well as their associated runtimes due to the freedom to optimize. In my project I used my DoublyLinkedList structure, PriorityQueue, HashMap, and a BFS algorithm which really encapsulated the vast majority of our semester excluding string matching algorithms and sorting algorithms. Giving us the freedom to optimize this algorithm for both runtime and wirelength led to an iterative process that made us discover nuances of the implementation such as the fact that euclidian distance is better for organizing which direction to move when optimizing one wire, but manhattan is better when determining which wire to lay first. If you are interested in a deeper explanation of how I attacked this project there is a report I wrote in the Driver.java class.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages