ADAM User Guide¶
ADAM is a library and command line tool that enables the use of Apache Spark to parallelize genomic data analysis across cluster/cloud computing environments. ADAM uses a set of schemas to describe genomic sequences, reads, variants/genotypes, and features, and can be used with data in legacy genomic file formats such as SAM/BAM/CRAM, BED/GFF3/GTF, and VCF, as well as data stored in the columnar Apache Parquet format. On a single node, ADAM provides competitive performance to optimized multi-threaded tools, while enabling scale out to clusters with more than a thousand cores. ADAM’s APIs can be used from Scala, Java, Python, R, and SQL.
The ADAM/Big Data Genomics Ecosystem¶
ADAM builds upon the open source Apache Spark, Apache Avro, and Apache Parquet projects. Additionally, ADAM can be deployed for both interactive and production workflows using a variety of platforms. A diagram of the ecosystem of tools and libraries that ADAM builds on and the tools that build upon the ADAM APIs can be found below.
As the diagram shows, beyond the ADAM CLI, there are a number of tools built using ADAM’s core APIs:
- Avocado - Avocado is a distributed variant caller built on top of ADAM for germline and somatic calling.
- Cannoli - ADAM Pipe API wrappers for bioinformatics tools, (e.g., BWA, bowtie2, FreeBayes)
- DECA - DECA is a reimplementation of the XHMM copy number variant caller on top of ADAM.
- Gnocchi - Gnocchi provides primitives for running GWAS/eQTL tests on large genotype/phenotype datasets using ADAM.
- Lime - Lime provides a parallel implementation of genomic set theoretic primitives using the ADAM region join API.
- Mango - Mango is a library for visualizing large scale genomics data with interactive latencies.
For more, please see our awesome list of applications that extend ADAM.
- Deploying ADAM on AWS
- Running ADAM on CDH 5, HDP, and other YARN based Distros
- Running ADAM on Toil
- Running ADAM on Slurm
- Running ADAM’s command line tools
- Action tools
- Conversion tools
- Printing tools
- API Overview
- Loading data with the ADAMContext
- Working with genomic data using GenomicDatasets
- Using ADAM’s RegionJoin API
- Using ADAM’s Pipe API
- ADAM Python Documentation
- Building Downstream Applications
- Extend the ADAM CLI by adding new commands
- Extend the ADAM CLI by adding new commands in an external repository
- Use ADAM as a library in new applications