Use ADAM as a library in new applications

To use ADAM as a library in new applications:

Create an object with a main(args: Array[String]) method and handle command line arguments. Feel free to use the args4j library or any other argument parsing library.

object MyExample {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("at least one argument required, e.g. input.foo")
      System.exit(1)
    }
  }
}

Create an Apache Spark configuration SparkConf and use it to create a new SparkContext. The following serialization configuration needs to be present to register ADAM classes. If any additional Kryo serializers need to be registered, create a registrator that delegates to the ADAM registrator. You might want to provide your own serializer registrator if a class in your code has a structure too complex for Kryo's serializer inference to handle correctly, or if you want to require registration of all classes in your application to improve performance.

val conf = new SparkConf()
  .setAppName("MyCommand")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true")

val sc = new SparkContext(conf)
// do something
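
For example, a minimal sketch of the "do something" step: importing the ADAMContext implicits makes ADAM load methods such as loadAlignments available on the SparkContext. The import path shown here (org.bdgenomics.adam.rdd.ADAMContext) has moved between ADAM releases, so adjust it to match the version you depend on.

import org.bdgenomics.adam.rdd.ADAMContext._

// load alignments from the path given on the command line;
// loadAlignments accepts SAM, BAM, CRAM, FASTQ, and Parquet input
val reads = sc.loadAlignments(args(0))

// do something with the reads, e.g. count them
println(reads.rdd.count())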

Configure the new application build to create a fat jar artifact with ADAM and its transitive dependencies included. This assumes ADAM (for example, the adam-core artifact matching your Spark and Scala versions) is already declared as a dependency of the build. For an Apache Maven build, the following maven-shade-plugin configuration would work.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <createDependencyReducedPom>false</createDependencyReducedPom>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

Build the new application and run it via spark-submit.

spark-submit \
  --class MyCommand \
  target/my-command.jar \
  input.foo

A complete example of this pattern can be found in the heuermh/adam-examples repository.

Writing your own registrator that calls the ADAM registrator

As ADAM itself does, an application may want to provide its own Kryo serializer registrator, either to register custom serializers or because the application's configuration requires all serializers to be registered. While this registrator can manually register ADAM's serializers, it is simpler to delegate to the ADAM registrator from within it. This pattern looks like the following code:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.bdgenomics.adam.serialization.ADAMKryoRegistrator

class MyCommandKryoRegistrator extends KryoRegistrator {

  private val akr = new ADAMKryoRegistrator()

  override def registerClasses(kryo: Kryo) {

    // register adam's requirements
    akr.registerClasses(kryo)

    // ... register any other classes I need ...
  }
}
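
To use the custom registrator, point spark.kryo.registrator at it in the Spark configuration instead of at the ADAM registrator directly. The sketch below assumes the MyCommandKryoRegistrator class above; spark.kryo.registrationRequired is optional and causes Kryo to fail fast on any class that has not been registered.

val conf = new SparkConf()
  .setAppName("MyCommand")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // the delegating registrator defined above; use its fully qualified
  // name if it lives in a package
  .set("spark.kryo.registrator", "MyCommandKryoRegistrator")
  .set("spark.kryo.referenceTracking", "true")
  // optional: require every serialized class to be registered
  .set("spark.kryo.registrationRequired", "true")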