Presentation
Sketch-Based Algorithmic Frameworks for Genome-Scale Mapping
Description
Sketching is a widely used class of techniques for generating compact representations of long biological sequences. Instead of comparing entire sequences, sketches let us sample a subset of their k-mers and compare those samples, saving both time and memory in the end application. A key metric here is density, the fraction of k-mers that the sketch retains. While a lower density is preferable for space, it can also affect the sensitivity of the mapping process.
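To make the notion of density concrete, here is a minimal Python sketch that uses window minimizers, one common sketching scheme. It is an illustration only: the abstract does not say which schemes MHSketch implements, and the function names, the BLAKE2b hash, and the parameters k and w are all assumptions chosen for this example.

```python
import hashlib


def kmers(seq, k):
    """Return every k-mer of seq, in order of starting position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq, k, w):
    """Return sorted start positions of the k-mers kept by the sketch.

    Each window of w consecutive k-mers contributes its smallest-hash
    k-mer; ties are broken by taking the leftmost position.
    """
    hashes = [kmer_hash(km) for km in kmers(seq, k)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add(start + offset)
    return sorted(picked)


def density(seq, k, w):
    """Fraction of the sequence's k-mers retained by the sketch.

    Assumes len(seq) >= k so that at least one k-mer exists.
    """
    total = len(seq) - k + 1
    return len(minimizer_sketch(seq, k, w)) / total
```

For example, `density(seq, k=5, w=4)` on a DNA string reports how much of the k-mer set survives sketching. Widening the window w generally lowers density, which is the space-versus-sensitivity trade-off described above.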
In this work, we study sketch-based data sparsification with high-performance computing to improve scalability in mapping. Our contributions are twofold: 1) we present a scalable parallel algorithmic framework for alignment-free mapping, called JEM-mapper, and 2) we present a sketch library called MHSketch, built by extending JEM-mapper to adopt different sequence sketching schemes. Experimental evaluation demonstrates the ability of our approach to significantly reduce density and reap performance benefits from it. In particular, results show that MHSketch achieves accurate mapping while reducing time-to-solution (speedups between 2.2x and 9.3x) and drastically reducing memory usage (>92% savings) compared to other tools.

