Theses/Dissertations - Computer Science

Recent Submissions

Now showing 1 - 20 of 24
  • Item
    Static analysis-based software architecture reconstruction and its applications in microservices.
    (2021-08-10) Das, Dipta, 1993-; Cerny, Tomas, 1979-
    Microservice architecture (MSA) is the predominant building block of modern cloud-based enterprise applications. MSA has several advantages over monolithic applications, such as scalability and maintainability, but it comes with some downsides. Modern enterprise applications consist of hundreds of individual microservices and lack a unified view. Due to this lack of a unified view and their distributed nature, security and anomaly assessments are harder to automate for MSA. Software Architecture Reconstruction (SAR) can be used to construct a centralized perspective for MSA. This thesis proposes an approach to automate the process of SAR using static code analysis. Also, we extend SAR for containerized microservices, which are typically deployed and managed using dedicated orchestration tools like Kubernetes. In addition, we demonstrate two applications of SAR in MSA: Role-Based Access Control (RBAC) inconsistency detection and code smell detection. Finally, we verify our approach through case studies on two real-world benchmark projects.
  • Item
    On the performance of convolutional neural networks initialized with Gabor filters.
    (2021-08-05) Rai, Mehang, 1996-; Rivas Perea, Pablo, 1980-
    Over the years, image recognition has been gaining popularity due to its various possible usages. Convolutional Neural Networks (CNNs) have been the classic approach taken on by many researchers because of their capability to learn through the parameter space given a sufficient amount of representative data. When observing a fully trained CNN, researchers have found that the pattern on the kernel filters (convolution window) of the receptive convolutional layer closely resembles the Gabor filters. Gabor filters have existed for a long time, and researchers have been using them for texture analysis. Given the nature and purpose of the receptive layer of CNN, Gabor filters could act as a suitable replacement strategy for the randomly initialized kernels of the receptive layer in CNN, which could potentially boost the performance without any regard to the nature of the dataset. The findings in this thesis show that when low-level kernel filters are initialized with Gabor filters, there is a boost in accuracy, Area Under ROC (Receiver Operating Characteristic) Curve (AUC), minimum loss, and speed in some cases based on the complexity of the dataset.
  • Item
    Unsupervised representational learning of hierarchical graphs with graph convolutional networks.
    (2021-04-30) Boadu, Frimpong, 1996-; Baker, Erich J.
    Interpretation of functional genomic data attempts to correlate gene and protein expression with phenotype. While direct analysis of gene relationships associated with phenotype manifestation provides reasonable correlations, their use for direct assessment often fails to capture numerous complex biological phenomena. As a result, tools that cluster sets of genes based solely on set membership often fail to capture knowledge about contextual relationships of individual gene sets. To leverage large scale gene set relationships and gene set metadata to increase accuracy, we developed an approach that uses the network relationship of genes to extract system-level relationships. We employed an approach that segregates an empirically derived gene-gene network graph using deep learning to encode the network structure into low-dimensional embeddings. By applying graph neural network approaches we can generate lower dimensional embeddings that more accurately identify sets of genes related to biological traits of interest. Using well-adopted metadata over-representation techniques, we further demonstrate that our approach produces drastically different results when compared to direct set comparison methods, and more accurate results when subjected to manual analysis.
  • Item
    Tyro : a first step towards automatically generating parallel programs from sequential programs.
    (2020-11-03) Sanjel, Arun, 1995-; Speegle, Gregory David.
    Currently, MapReduce is used as the standard for automatic parallelization of programs. However, MapReduce restricts programs to a simple framework with limited parallelism but still requires the user to understand parallelism within the framework. In this thesis, we present Tyro, a new tool that automatically translates a sequential Python program into a parallel PySpark program. Tyro identifies code fragments that can be parallelized and translates them. It uses Abstract Syntax Trees (AST) for fragment detection and gradual program synthesis to convert the Python operations into PySpark operations. Tyro also verifies the generated code against given user test cases. We evaluated Tyro by automatically converting different real-world sequential Python programs into PySpark programs. The resulting PySpark programs perform up to 9x faster (on 9 parallel machines) compared to the original. The promising result of Tyro against these benchmarks shows how Tyro can utilize gradual synthesis and operation translation to go beyond MapReduce with automatic parallelization.
  • Item
    Optimizing k-means clustering using mini-batches and distance bounds.
    (2020-10-22) Shrestha Khimbaja, Sumit, 1994-; Hamerly, Gregory James, 1977-
    Clustering is a crucial branch of machine learning that groups the input data into different clusters based on the features of the data, without training labels. K-means clustering is a widely used iterative clustering technique that groups data into k clusters by repeatedly minimizing a criterion function. The standard k-means algorithm, Lloyd’s algorithm, is simple and easy to implement, but it performs a lot of unnecessary work while computing the distances between the data points and the centers. This increases the runtime of the algorithm and makes it unsuitable for real-life applications. Previous work on k-means optimization has used the triangle inequality to generate geometric bounds for the data points, which are then used to prevent unnecessary distance computations. Another approach to accelerating the k-means algorithm is to use only a subset of the data, instead of the entire data, in a single iteration. Because less data is used for computation, this method also generates faster clustering results. In this thesis, we propose an algorithm that accelerates the k-means algorithm by using geometric bounds on mini-batches. We combine the triangle inequality based optimization technique with the mini-batch approach by running the k-means algorithm in mini-batches for multiple iterations and using the geometric bounds within the mini-batch to reduce the distance computations. The results show that there is a large speedup over the existing optimization techniques in terms of runtime while producing good quality clusters.
  • Item
    Investigating rhythmic accuracy using 3D spatial interaction for digital musical input : MoveMIDI.
    (2020-07-19) Arterbury, Timothy, 1996-; Poor, G. Michael.
    Many human-computer interfaces exist which allow users to interact with music software to create and perform music. Some interfaces allow users to interact using movements of their body to create music. This thesis describes a form of movement interaction called 3D spatial interaction and evaluates its application for human control of music software by utilizing the MoveMIDI prototype and conceptual framework. MoveMIDI interprets a user's positional body movement relative to its virtual 3D environment to control music software. In a user study, the usability of the initial MoveMIDI prototype as a rhythmic input device was evaluated by measuring rhythmic accuracy of participants using the prototype, a drum interface with sticks, and a finger-drumming interface. The study revealed initial spatial unsureness of participants using MoveMIDI due to visualization issues and lack of haptic feedback. These issues prompted the creation of a follow-up prototype using head-mounted display 3D visualization and haptic feedback.
  • Item
    Force touch gesture based interaction for virtual keyboards.
    (2018-04-20) Zhunussov, Kuanysh, 1994-; Poor, G. Michael.
    With the extreme popularity of touch screen mobile devices, the demand for effective one-handed text-entry on virtual keyboards is continually growing. To increase text-entry speed and decrease error rate, this thesis proposes force touch gesture based suggestion selection for virtual keyboards and analyzes the performance compared to standard keying and swyping keyboards. The prototype, called OctoType, was built for iOS smartphones to take advantage of the 3D-touch technology by implementing an interaction based on OctoPocus, a dynamic gesture guide. Two user studies were conducted to evaluate the performance of OctoType. The results showed that OctoType outperforms standard keying keyboards by 13.1% in terms of text-entry speed. Moreover, OctoType registered force touch gestures with an accuracy of greater than 97%. Given these advantages of force touch gesture based interaction, this thesis also introduces a framework, called Gesturizer, that provides conflict-free gesture interaction with arbitrary shapes to iOS applications running on devices with 3D-touch.
  • Item
    Alternatives using the Leap Motion to extend mid-air word-gesture keyboards.
    (2015-12-14) Benoit, Garrett.; Poor, G. Michael.
    Lately, the use of touchless, mid-air, gesture-based interactions has increased significantly due to the popularity of augmented and virtual reality and advances in other industries (e.g., medicine, gaming), and with this widespread application comes the need for efficient, mid-air text-entry. Word-gesture keyboards have garnered attention in recent years, now coming standard on most Android devices, offering efficient means of gesture-based text-entry. For the first time, Markussen et al. combined the two with the inception of Vulture, the first mid-air, word-gesture keyboard, providing the fastest means of mid-air text-entry yet. This thesis builds on the findings of Markussen et al. and presents alternative means for word separation in mid-air text-entry for word-gesture keyboards, exploring and identifying the problems of new techniques and presenting possible solutions. Of the new techniques, a bimodal approach shows great promise, reaching a mean text-entry rate of 15.8 words per minute for a single session with no training.
  • Item
    Models for rested touchless gestural interaction.
    (2015-07-31) Guinness, Darren, 1990-; Poor, G. Michael.
    Touchless mid-air gestural interaction has gained mainstream attention with the emergence of off-the-shelf commodity devices such as the Leap Motion and the Xbox Kinect. One of the issues with this form of interaction is fatigue, a problem colloquially known as the "Gorilla Arm Syndrome." However, by allowing interaction from a rested position, whereby the elbow is rested on a surface, this problem can be limited in its effect. In this paper we evaluate 3 possible methods for performing touchless mid-air gestural interaction from a rested position: a basic rested interaction, a simple calibrated interaction which models palm positions onto a hyperplane, and a more complex calibration which models the arm's interaction space using the angles of the forearm as input. The results of this work found that the two modeled interactions conform to Fitts's law and also demonstrated that implementing a simple model can improve interaction by improving performance and accuracy.
  • Item
    Giving the users a hand : towards touchless hand gestures for the desktop.
    (2014-09-05) Hari Haran, Alvin Jude.; Poor, G. Michael.; Computer Science.; Baylor University. Dept. of Computer Science.
    Touchless, mid-air, gesture-based interactions have recently moved out of laboratories and Hollywood movies and into the hands of users. There is little difference in the interaction style and techniques used today from that of the 1980s, despite advances in the technology enabling this interaction. For this interaction to achieve mainstream popularity, and to be as ubiquitous as the keyboard or the mouse, common problems such as the "Gorilla Arm Syndrome" will have to be addressed. Additionally, common use cases such as gestural navigation, selection, and manipulation will need to be improved and eventually standardized. This thesis presents solutions to existing problems and introduces possible interaction techniques that allow users to perform the actions above. This is expected to pave the way for touchless mid-air hand gestures to be a ubiquitous form of interaction on the desktop.
  • Item
    Faster k-means clustering.
    (2013-09-24) Drake, Jonathan, 1989-; Hamerly, Gregory James, 1977-; Computer Science.; Baylor University. Dept. of Computer Science.
    The popular k-means algorithm is used to discover clusters in vector data automatically. We present three accelerated algorithms that compute exactly the same clusters much faster than the standard method. First, we redesign Hamerly’s algorithm to use k heaps to avoid checking distance bounds for all n points, with little empirical gain. Second, we use an adaptive number of distance bounds to avoid redundant calculations (Drake and Hamerly 2012). Experiments show the superior performance of adaptive k-means in medium dimension (20 ≤ d ≤ 200) on uniform random data. Finally, we reformulate the triangle inequality to constrain the search space for a point’s nearest center to an annular region centered at the origin. For uniform random data, annulus k-means is competitive with or much faster than other algorithms in low dimension (d < 20), and it outperforms other algorithms on five of six naturally-clustered, real-world datasets tested (d ≤ 74).
  • Item
    Designing incentives in P2P systems.
    (2013-09-24) Berciu, Radu Mihai.; Donahoo, Michael J.; Computer Science.; Baylor University. Dept. of Computer Science.
    The goal of this thesis is to bring closer together the game theoretic approach of creating incentives with the requirements and properties of P2P systems. Briefly, we detail the P2P system context that incentive mechanisms must address, focusing on the main properties (e.g., the existence of cheap identities), types of transacted goods, common goals (e.g., maximize utilization, robustness to rational manipulations) and common problems (e.g., easy-riding) of such systems; we define the design space for P2P incentive mechanisms through the first taxonomy for such mechanisms and examine the main classes; we analyze in depth how known incentive mechanisms achieve their goals, from both P2P systems and game theory perspectives, using BitTorrent and mechanism design models; we bundle our prescriptions into a framework for designing P2P incentive mechanisms, and we use it to create an incentive mechanism for a BitTorrent-like system.
  • Item
    Performance improvements to peer-to-peer file transfers using network coding.
    (2013-09-16) Kelley, Aaron A.; Poucher, William Benjamin, 1948-; Computer Science.; Baylor University. Dept. of Computer Science.
    A common peer-to-peer approach to large data distribution is to divide the data into blocks. Peers will gather blocks from other peers in parallel. Problems with this approach are that each peer must know which blocks other peers have available, and in some instances it may not be possible to complete a download if certain blocks are not available in the network. Network coding, a method of distributing data over a peer-to-peer network by employing linear algebra, addresses these issues but comes with a substantial computational overhead. We examine possibilities for mitigating this extra computational cost by reducing the number of operations required to perform matrix multiplication in a finite field, taking advantage of the small number of elements in the field and precomputing results. We evaluate our approach through simulation and demonstrate that it may serve to allow for faster transfer times on a more robust peer-to-peer network.
  • Item
    Age classification from facial images for detecting retinoblastoma.
    (2012-11-29) Chiam, Tak Chien.; Hamerly, Gregory James, 1977-; Computer Science.; Baylor University. Dept. of Computer Science.
    Facial age estimation from images is a difficult problem, both because it is naturally difficult to tell the exact age of a person visually, and because of the variations in images, such as illumination, pose, and expression. We want to classify people into two groups, children (age ≤ 5) and adults (age > 5), to facilitate the detection of retinoblastoma, a type of pediatric cancer. Current regression based methods are ineffective, as they usually have mean absolute error of 5 years, which is too high for our purposes. We study the facial anthropometric measurements of humans at different ages, and build a system based on these growth patterns. We detect 76 facial landmarks using Active Shape Models, analyze all possible ratios computable from these landmarks, and use the best ratios as input into a Support Vector Machine. Our final system does very well on our problem, correctly classifying 85% of images.
  • Item
    Information storage capacity of genetic algorithm fitness maps.
    (2011-09-14) Montañez, George D.; Hamerly, Gregory James, 1977-; Computer Science.; Baylor University. Dept. of Computer Science.
    To accurately measure the amount of information a genetic algorithm can generate, we must first measure the amount of information one can store, using a fitness map. The amount of information generated, minus the storage capacity, gives a tighter estimate on the levels of information generated by genetic algorithms. To measure the information storage capacity of fitness maps, we use the method suggested by Abu-Mostafa et al. (Abu-Mostafa and St Jacques, 1985) for measuring the information storage capacity of general forms of memory. Additionally, we measure the information in reference to the active information metric, as developed by Dembski et al. (Dembski and Marks, 2009). Our results show that a number of bits linear in the size of the search space can be stored in a fitness map, but only a logarithmic number of bits can be extracted by a genetic algorithm with stabilizing population and fixed population size.
  • Item
    MultiKarma : a fully decentralized virtual multi-currency.
    (2011-09-14) Allen, Jon D. (Jon Douglas); Donahoo, Michael J.; Computer Science.; Baylor University. Dept. of Computer Science.
    Participant-based technologies enable users to contribute resources to a shared pool that in the aggregate provides valuable services, such as social networks, massive multiplayer online games, file exchange, etc. Such systems depend on participant contribution; however, some peers may be unwilling to contribute at a level on par with their consumption. Monetary systems incentivize participation through compensation that allows portability, asynchronous participation, granularity and misbehavior costs. The use of government-backed currencies for incentive structures in participant-based systems results in exchange barriers and high transaction costs, while centralized virtual currencies (e.g., Facebook credits) remove many of the benefits of currency. Karma proposes the use of peer-to-peer systems to create a decentralized, consensus-based currency; however, it lacks a complete specification or implementation. We provide a specification, implementation, and evaluation of Karma. Next, we extend Karma to create a multi-currency system called MultiKarma where participants can mint, manage, and distribute their own currency.
  • Item
    Studies of active information in search.
    (2010) Ewert, Winston.; Hamerly, Gregory James, 1977-; Computer Science.; Baylor University. Dept. of Computer Science.
    A search process is an attempt to locate a solution to a problem, such as an optimization problem, where the space is usually too large to exhaustively sample. To investigate this idea, this work looks at three examples of searches as case studies. The examples considered are the location of a hidden string using a Hamming distance, the encoding of a binary string using a perceptron, and developing programs using NAND gates. In all of these cases, it is shown that the search processes work by making use of problem-specific information. In addition, the algorithms used to demonstrate these search processes are often relatively inefficient at extracting the information from the available knowledge sources.
  • Item
    An OCL-based verification approach to analyzing static properties of a UML model.
    (2010-06-23) Sun, Wuliang.; Song, Eunjee.; Computer Science.; Baylor University. Dept. of Computer Science.
    There is a need for more rigorous analysis techniques that developers can use for verifying the critical properties in UML models. The UML-based Specification Environment (USE) tool supports verification of invariants, preconditions, and postconditions specified in the Object Constraint Language (OCL), which is useful when checking critical properties. However, the USE requires one to specify a model using its own textual language and does not allow one to import any model specification files created by other UML modeling tools. Hence, we often create a model with OCL constraints using a modeling tool such as the IBM Rational Software Architect (RSA) and then use the USE for the model verification. This approach, however, requires a manual transformation between two different specification formats, which diminishes the benefit of model-level verification. In this thesis, we describe our own implementation of a specification transformation engine based on the Model-Driven Architecture (MDA) framework.
  • Item
    "Two-way" obliviousness in general aspect-oriented modeling.
    (2008-10-01) Roberts, Nathan V.; Song, Eunjee.; Computer Science.; Baylor University. Dept. of Computer Science.
    A key problem in software development is producing systems that are maintainable even as the concerns at play evolve. Aspect-oriented programming (AOP) seeks to foster maintainability by isolating the specifications of cross-cutting concerns, allowing them to be modified in relative isolation from the rest of the system. Research in aspect-oriented modeling (AOM) aims to develop a model-layer analogue of AOP, allowing integration with accepted modeling practices. Aspects usually allow developers of the primary model to be oblivious to the aspects that modify the primary model; because of this, aspects can be closely coupled to potentially transient details of the primary model. When those details change, the aspects that depend on them may no longer have the desired effect. In this thesis, we examine three approaches to AOM, and introduce a novel solution to the problem of obliviousness by extending a graph-transformational approach to AOM.
  • Item
    PG-means: learning the number of clusters in data.
    (2007-03-19) Feng, Yu.; Hamerly, Gregory James, 1977-; Computer Science.; Baylor University. Dept. of Computer Science.
    We present a novel algorithm called PG-means in this thesis. This algorithm is able to determine the number of clusters in a classical Gaussian mixture model automatically. PG-means uses efficient statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we apply a statistical test to the entire model at once, not just on a per-cluster basis. We show that this method works well in difficult cases such as overlapping clusters, eccentric clusters and high dimensional clusters. PG-means also works well on non-Gaussian clusters and many true clusters. Further, the new approach provides a much more stable estimate of the number of clusters than current methods.
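
Several of the k-means abstracts listed above mention using the triangle inequality to avoid unnecessary distance computations. The sketch below is only an illustration of that general idea, not the algorithm from any of the listed theses: it uses the standard fact that if the distance between a point's current best center and another center is at least twice the point's distance to the best center, the other center cannot be closer. The function names (`dist`, `assign_with_pruning`) and the sample data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center.

    Skips a point-to-center distance whenever the triangle inequality
    proves that center cannot beat the current best one:
    if d(best, c_j) >= 2 * d(x, best), then d(x, c_j) >= d(x, best).

    Returns (assignments, number of point-to-center distances computed).
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]

    assignments = []
    computed = 0
    for p in points:
        best = 0
        best_d = dist(p, centers[0])
        computed += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                continue  # center j is provably no closer; skip it
            d = dist(p, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, computed


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups.
    points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9)]
    centers = [(0.0, 0.0), (10.0, 10.0)]
    labels, n_dist = assign_with_pruning(points, centers)
    print(labels, n_dist, "of", len(points) * len(centers), "distances computed")
```

The assignments are identical to those of a brute-force nearest-center search; only the number of distance evaluations changes, and the savings depend on how well separated the centers are.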
All items in BEARdocs are protected by original copyright, with all rights reserved, unless otherwise indicated.