Table 2: Comparison of the hashmap and FM-index versions of GNUMAP using the same mapping parameters. Run-time and percentage of mapped reads for the hashmap and FM-index versions of GNUMAP on the ART-generated synthetic dataset and on the real dataset. As the k-mer size increases from 7 to 10, run-time stays about the same. For k-mer sizes 11 and 12, run-time is much larger (see bolded rows). This increase is caused by the composition of hg19: many 11-mers and 12-mers (13-mers in the real dataset, where the 11-mer and 12-mer runs did not finish) happen to have frequencies near the chosen threshold (1,000), which results in more NW alignments, and these are costly in terms of time. For 13-mers and larger k-mers, run-time decreases and eventually settles at a minimum.

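(a) Synthetic Dataset.

| k | Hashmap Run-Time (HH:MM) | Hashmap Mapped Reads | FM-Index Run-Time (HH:MM) | FM-Index Mapped Reads |
|---|---|---|---|---|
| 7 | 0:52 | 0% | 0:06 | 0% |
| 8 | 0:53 | 0% | 0:06 | 0% |
| 9 | 1:01 | 0.17% | 0:18 | 0.02% |
| 10 | 1:30 | 1.89% | 1:34 | 1.63% |
| 11 | 10:08 | 46.51% | 7:03 | 44.76% |
| 12 | 7:14 | 70.68% | 5:10 | 70.57% |
| 13 | 3:38 | 69.82% | 2:08 | 69.81% |
| 14 | 1:19 | 65.65% | 0:55 | 65.65% |
| 15 | 1:09 | 59.50% | 0:30 | 59.50% |
| 16 | - | - | 0:21 | 55.26% |
| 17 | - | - | 0:17 | 55.21% |
| 18 | - | - | 0:15 | 55.38% |
| 19 | - | - | 0:14 | 55.05% |
| 20 | - | - | 0:12 | 55.59% |
| 21 | - | - | 0:11 | 48.85% |
| 22 | - | - | 0:10 | 44.68% |
| 23 | - | - | 0:10 | 40.57% |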

(b) Real Dataset.

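| k | Hashmap Run-Time (HH:MM) | Hashmap Mapped Reads | FM-Index Run-Time (HH:MM) | FM-Index Mapped Reads |
|---|---|---|---|---|
| 7 | 3:56 | 0% | 0:30 | 0% |
| 8 | 3:45 | 0% | 0:30 | 0% |
| 9 | 4:12 | 0.65% | 0:45 | 0.08% |
| 10 | 5:04 | 6.48% | 4:43 | 5.80% |
| 11 | - | - | - | - |
| 12 | - | - | - | - |
| 13 | 20:57 | 91.04% | 10:57 | 91.03% |
| 14 | 14:36 | 91.48% | 5:04 | 91.48% |
| 15 | 8:50 | 91.71% | 2:59 | 91.71% |
| 16 | - | - | 2:10 | 91.88% |
| 17 | - | - | 1:51 | 92.07% |
| 18 | - | - | 1:43 | 92.23% |
| 19 | - | - | 1:34 | 92.32% |
| 20 | - | - | 1:29 | 92.36% |
| 21 | - | - | 1:23 | 92.36% |
| 22 | - | - | 1:19 | 92.36% |
| 23 | - | - | 1:15 | 92.31% |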
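
The caption of Table 2 attributes the run-time peak to k-mers whose frequency in the reference lies near the chosen threshold (1,000). The Python sketch below illustrates that effect on a toy sequence. It assumes that seeds occurring more often than the threshold are skipped and that every remaining seed hit becomes a candidate location for an NW alignment; the text does not spell out GNUMAP's actual filtering rule, so `build_index`, `candidate_positions`, and `max_freq` are hypothetical names and the toy reference is illustrative only.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every overlapping k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k, max_freq):
    """Collect implied read start positions from seeds that pass the filter.

    Assumption (not confirmed by the source): a k-mer is used as a seed only
    if it occurs at most max_freq times in the reference. Each surviving hit
    yields one candidate location that would need an alignment.
    """
    candidates = set()
    for j in range(len(read) - k + 1):
        hits = index.get(read[j:j + k], [])
        if len(hits) > max_freq:
            continue  # too frequent: seed is skipped entirely
        for p in hits:
            if p - j >= 0:
                candidates.add(p - j)  # where the read would start
    return candidates


if __name__ == "__main__":
    # Toy reference: "AC" occurs 5 times, "CG" and "GG" once each.
    reference = "ACACACACAC" + "GGTTA"
    index = build_index(reference, 2)
    # Threshold 4: "AC" is skipped, only the unique seeds contribute.
    print(sorted(candidate_positions("ACGG", index, 2, 4)))  # [8]
    # Threshold 5: "AC" sits exactly at the threshold and is kept.
    print(sorted(candidate_positions("ACGG", index, 2, 5)))  # [0, 2, 4, 6, 8]
```

In this toy case a single seed at the threshold raises the number of candidate locations from one to five, which mirrors the caption's explanation: k-mers just under the cutoff are the most expensive ones to keep, because each passes the filter while contributing close to the maximum number of alignments.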