bdgenomics.adam.ds.CoverageDataset

class bdgenomics.adam.ds.CoverageDataset(jvmDataset, sc)[source]

Wraps a GenomicDataset with Coverage metadata and functions.

__init__(jvmDataset, sc)[source]

Constructs a Python CoverageDataset from a JVM CoverageDataset. Should not be called from user code; instead, go through bdgenomics.adam.adamContext.ADAMContext.

Parameters:
  • jvmDataset – Py4j handle to the underlying JVM CoverageDataset.
  • sc (pyspark.context.SparkContext) – Active Spark Context.
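
In user code, a CoverageDataset is normally obtained through the ADAMContext rather than this constructor. Below is a minimal sketch, assuming a local SparkSession, an ADAM release whose Python ADAMContext wraps a SparkSession, and a hypothetical input file name:

    from pyspark.sql import SparkSession
    from bdgenomics.adam.adamContext import ADAMContext

    # A local Spark session; adjust master/config for your own cluster.
    spark = SparkSession.builder.appName("coverage-example").getOrCreate()

    # ADAMContext is the entry point for loading genomic datasets.
    ac = ADAMContext(spark)

    # Hypothetical path: loadCoverage reads coverage from a feature file
    # (e.g. BED) and returns a CoverageDataset wrapping the JVM dataset.
    coverage = ac.loadCoverage("sample.coverage.bed")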

Methods

__init__(jvmDataset, sc) Constructs a Python CoverageDataset from a JVM CoverageDataset.
aggregatedCoverage([bpPerBin]) Gets coverage overlapping specified ReferenceRegion; when binned, each bin's coverage is the average coverage of the bases in that bin.
broadcastRegionJoin(genomicDataset[, flankSize]) Performs a broadcast inner join between this genomic dataset and another genomic dataset.
broadcastRegionJoinAndGroupByRight(…[, …]) Performs a broadcast inner join between this genomic dataset and another genomic dataset.
cache() Caches underlying RDD in memory.
collapse() Merges adjacent ReferenceRegions with the same coverage value.
coverage([bpPerBin]) Gets coverage overlapping specified ReferenceRegion; when binned, each bin's coverage is the coverage of the first base pair in that bin.
filterByOverlappingRegion(query) Runs a filter that selects data in the underlying RDD that overlaps a single genomic region.
filterByOverlappingRegions(querys) Runs a filter that selects data in the underlying RDD that overlaps several genomic regions.
flatten() Gets a flattened genomic dataset of coverage, with coverage mapped to each base pair.
fullOuterShuffleRegionJoin(genomicDataset[, …]) Performs a sort-merge full outer join between this genomic dataset and another genomic dataset.
leftOuterShuffleRegionJoin(genomicDataset[, …]) Performs a sort-merge left outer join between this genomic dataset and another genomic dataset.
leftOuterShuffleRegionJoinAndGroupByLeft(…) Performs a sort-merge left outer join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value.
persist(sl) Persists underlying RDD in memory or disk.
pipe(cmd, tFormatter, xFormatter, convFn[, …]) Pipes genomic data to a subprocess that runs in parallel using Spark.
rightOuterBroadcastRegionJoin(genomicDataset) Performs a broadcast right outer join between this genomic dataset and another genomic dataset.
rightOuterBroadcastRegionJoinAndGroupByRight(…) Performs a broadcast right outer join between this genomic dataset and another genomic dataset.
rightOuterShuffleRegionJoin(genomicDataset) Performs a sort-merge right outer join between this genomic dataset and another genomic dataset.
rightOuterShuffleRegionJoinAndGroupByLeft(…) Performs a sort-merge right outer join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value, if not null.
save(filePath[, asSingleFile, disableFastConcat]) Saves coverage as a feature file.
shuffleRegionJoin(genomicDataset[, flankSize]) Performs a sort-merge inner join between this genomic dataset and another genomic dataset.
shuffleRegionJoinAndGroupByLeft(genomicDataset) Performs a sort-merge inner join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value.
sort() Sorts our genome aligned data by reference positions, with contigs ordered by index.
sortLexicographically() Sorts our genome aligned data by reference positions, with contigs ordered lexicographically.
toDF() Converts this GenomicDataset into a DataFrame.
toFeatures() Converts CoverageDataset to FeatureDataset.
transform(tFn) Applies a function that transforms the underlying DataFrame into a new DataFrame using the Spark SQL API.
transmute(tFn, destClass[, convFn]) Applies a function that transmutes the underlying DataFrame into a new genomic dataset of a different type.
union(datasets) Unions together multiple genomic datasets.
unpersist() Unpersists underlying RDD from memory or disk.
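
A short usage sketch of several of the methods above, assuming coverage is the CoverageDataset loaded in the earlier example; the output path, the bdgenomics.adam.models import path, and the ReferenceRegion constructor arguments are assumptions:

    from bdgenomics.adam.models import ReferenceRegion

    # Keep only coverage overlapping a region of interest (assumed import path
    # and ReferenceRegion(referenceName, start, end) signature).
    roi = coverage.filterByOverlappingRegion(ReferenceRegion("chr1", 0, 1000000))

    # Merge adjacent regions that share the same coverage value.
    collapsed = roi.collapse()

    # Bin coverage, averaging the per-base coverage within each 1000 bp bin.
    binned = roi.aggregatedCoverage(bpPerBin=1000)

    # Inspect the binned coverage as a Spark SQL DataFrame.
    binned.toDF().show(10)

    # Save the collapsed coverage as a single feature file (hypothetical path).
    collapsed.save("sample.collapsed.bed", asSingleFile=True)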