bdgenomics.adam.ds.SliceDataset

class bdgenomics.adam.ds.SliceDataset(jvmDataset, sc)

__init__(jvmDataset, sc)

Constructs a Python SliceDataset from a JVM SliceDataset. Do not call this from user code; instead, obtain a SliceDataset through bdgenomics.adam.adamContext.ADAMContext.

Parameters:	jvmDataset – Py4j handle to the underlying JVM SliceDataset.
	sc (pyspark.context.SparkContext) – Active Spark Context.

Methods

`__init__`(jvmDataset, sc)	Constructs a Python SliceDataset from a JVM SliceDataset.
`broadcastRegionJoin`(genomicDataset[, flankSize])	Performs a broadcast inner join between this genomic dataset and another genomic dataset.
`broadcastRegionJoinAndGroupByRight`(…[, …])	Performs a broadcast inner join between this genomic dataset and another genomic dataset.
`cache`()	Caches underlying RDD in memory.
`countKmers`(kmerLength)	Counts the k-mers contained in a slice.
`filterByOverlappingRegion`(query)	Runs a filter that selects data in the underlying RDD that overlaps a single genomic region.
`filterByOverlappingRegions`(querys)	Runs a filter that selects data in the underlying RDD that overlaps several genomic regions.
`flankAdjacentFragments`(flankLength)	For all adjacent records in the genomic dataset, we extend the records so that the adjacent records now overlap by _n_ bases, where _n_ is the flank length.
`fullOuterShuffleRegionJoin`(genomicDataset[, …])	Performs a sort-merge full outer join between this genomic dataset and another genomic dataset.
`leftOuterShuffleRegionJoin`(genomicDataset[, …])	Performs a sort-merge left outer join between this genomic dataset and another genomic dataset.
`leftOuterShuffleRegionJoinAndGroupByLeft`(…)	Performs a sort-merge left outer join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value.
`persist`(sl)	Persists underlying RDD in memory or disk.
`pipe`(cmd, tFormatter, xFormatter, convFn[, …])	Pipes genomic data to a subprocess that runs in parallel using Spark.
`rightOuterBroadcastRegionJoin`(genomicDataset)	Performs a broadcast right outer join between this genomic dataset and another genomic dataset.
`rightOuterBroadcastRegionJoinAndGroupByRight`(…)	Performs a broadcast right outer join between this genomic dataset and another genomic dataset.
`rightOuterShuffleRegionJoin`(genomicDataset)	Performs a sort-merge right outer join between this genomic dataset and another genomic dataset.
`rightOuterShuffleRegionJoinAndGroupByLeft`(…)	Performs a sort-merge right outer join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value, if not null.
`save`(fileName)	Save slices as Parquet or FASTA.
`shuffleRegionJoin`(genomicDataset[, flankSize])	Performs a sort-merge inner join between this genomic dataset and another genomic dataset.
`shuffleRegionJoinAndGroupByLeft`(genomicDataset)	Performs a sort-merge inner join between this genomic dataset and another genomic dataset, followed by a groupBy on the left value.
`sort`()	Sorts our genome-aligned data by reference positions, with contigs ordered by index.
`sortLexicographically`()	Sorts our genome-aligned data by reference positions, with contigs ordered lexicographically.
`toDF`()	Converts this GenomicDataset into a DataFrame.
`transform`(tFn)	Applies a function that transforms the underlying DataFrame into a new DataFrame using the Spark SQL API.
`transmute`(tFn, destClass[, convFn])	Applies a function that transmutes the underlying DataFrame into a new genomic dataset of a different type.
`union`(datasets)	Unions together multiple genomic datasets.
`unpersist`()	Unpersists underlying RDD from memory or disk.
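
To make a few of the one-line summaries above more concrete, here is a minimal plain-Python sketch of what k-mer counting, overlap filtering, and the two sort orders mean. It is an illustration only, not ADAM's implementation, and it does not call the ADAM or Spark APIs. The `Region` type and every function name below are hypothetical, and the sketch assumes regions are 0-based and half-open, i.e. `[start, end)`.

```python
from collections import Counter
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for a genomic record: 0-based, half-open [start, end)."""
    name: str   # contig (reference sequence) name
    start: int
    end: int


def count_kmers(sequence, kmer_length):
    """Count every k-mer seen by a sliding window over one sequence."""
    return Counter(
        sequence[i:i + kmer_length]
        for i in range(len(sequence) - kmer_length + 1)
    )


def overlaps(a, b):
    """True when both regions are on the same contig and share at least one base."""
    return a.name == b.name and a.start < b.end and b.start < a.end


def filter_by_overlapping_regions(records, queries):
    """Keep the records that overlap at least one query region."""
    return [r for r in records if any(overlaps(r, q) for q in queries)]


def sort_by_index(records, contig_order):
    """Order by each contig's position in contig_order, then by start."""
    index = {name: i for i, name in enumerate(contig_order)}
    return sorted(records, key=lambda r: (index[r.name], r.start))


def sort_lexicographically(records):
    """Order by contig name as a string, then by start."""
    return sorted(records, key=lambda r: (r.name, r.start))


if __name__ == "__main__":
    records = [
        Region("chr10", 0, 100),
        Region("chr2", 50, 150),
        Region("chr1", 200, 300),
        Region("chr1", 0, 100),
    ]
    print(count_kmers("ACGTACG", 3))
    print(filter_by_overlapping_regions(records, [Region("chr1", 90, 210)]))
    print(sort_by_index(records, ["chr1", "chr2", "chr10"]))
    print(sort_lexicographically(records))
```

The two sort functions differ only in how contigs are ranked: with the assumed contig order `chr1, chr2, chr10`, index ordering places `chr2` before `chr10`, while lexicographic ordering places `chr10` before `chr2` because the names are compared as strings.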