Running an example command

flagstat

Once you have data converted to ADAM, you can gather statistics from the ADAM file using flagstat. This command will output stats identically to the samtools flagstat command.

./bin/adam-submit flagstat NA12878_chr20.adam

Outputs:

51554029 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
50849935 + 0 mapped (98.63%:0.00%)
51554029 + 0 paired in sequencing
25778679 + 0 read1
25775350 + 0 read2
49874394 + 0 properly paired (96.74%:0.00%)
50145841 + 0 with itself and mate mapped
704094 + 0 singletons (1.37%:0.00%)
158721 + 0 with mate mapped to a different chr
105812 + 0 with mate mapped to a different chr (mapQ>=5)

In practice, you will find that the ADAM flagstat command takes orders of magnitude less time than samtools to compute these statistics. For example, on a MacBook Pro, the command above took 17 seconds to run while samtools flagstat NA12878_chr20.bam took 55 seconds. On larger files, the difference in speed is even more dramatic. ADAM is faster because it is multi-threaded, distributed and uses a columnar storage format (with a projected schema that only materializes the read flags instead of the whole read).

Running on a cluster

We provide the adam-submit and adam-shell commands under the bin directory. These can be used to submit ADAM jobs to a spark cluster, or to run ADAM interactively.