|
@@ -18,11 +18,10 @@ CALDER is a Hi-C analysis tool that allows: (1) compute chromatin domains from w
|
|
|
Due to factors such as low data quality or large-scale structural variation, compartments can be called unreliably at one `bin_size` (equivalent to `resolution` in the literature) but captured reliably at another. We added an optimized `bin_size` selection strategy to call reliable compartments. It is based on the observation from our large-scale compartment analysis (https://www.nature.com/articles/s41467-021-22666-3) that, although compartments can change between conditions, their overall correlation `cor(compartment_rank_1, compartment_rank_2)` remains high (> 0.4).
|
|
|
<br>
|
|
|
<br>
|
|
|
-Given a `bin_size` specified by user, we call compartment with extended `bin_sizes` and choose the smallest `bin_size` such that no bigger `bin_size` can increase the correclation with a reference compartment more than 0.05. For example, if correclation for `bin_size=10000` is 0.2 while for `bin_size=50000` is 0.6, we are more confident the latter is more reliable; if correclation for `bin_size=10000` is 0.5 while for `bin_size=50000` is 0.52, we would choose the former as it has higher resolution.
|
|
|
+Given a `bin_size` specified by the user, we call compartments at extended `bin_sizes` and choose the smallest `bin_size` such that no larger `bin_size` increases the correlation with a reference compartment by more than 0.05. For example, if the correlation for `bin_size=10000` is 0.2 while that for `bin_size=50000` is 0.6, we consider the latter more reliable; if the correlation for `bin_size=10000` is 0.5 while that for `bin_size=50000` is 0.52, we choose the former because it has higher resolution.
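
The selection rule described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not CALDER's actual implementation: `select_bin_size`, its arguments, and the `min_gain` parameter are hypothetical names, and the correlations with the reference compartments are assumed to be already computed for each candidate `bin_size`.

```r
# Hypothetical helper (not part of the CALDER API): choose the smallest bin_size
# such that no larger bin_size raises the correlation with the reference
# compartments by more than `min_gain`.
select_bin_size <- function(bin_sizes, cors, min_gain = 0.05) {
  stopifnot(length(bin_sizes) == length(cors), length(bin_sizes) > 0)

  # Sort candidates from highest resolution (smallest bin) to lowest.
  ord <- order(bin_sizes)
  bin_sizes <- bin_sizes[ord]
  cors <- cors[ord]

  for (i in seq_along(bin_sizes)) {
    larger <- cors[seq_along(cors) > i]  # correlations at all larger bin sizes
    # Accept this bin_size if no larger one improves on it by more than min_gain.
    if (length(larger) == 0 || max(larger) - cors[i] <= min_gain) {
      return(bin_sizes[i])
    }
  }
}

# The two cases from the text:
select_bin_size(c(10E3, 50E3), c(0.2, 0.6))   # returns 50000
select_bin_size(c(10E3, 50E3), c(0.5, 0.52))  # returns 10000
```

The largest candidate is always accepted as a fallback, because no larger `bin_size` remains to compare against.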
|
|
|
<br>
|
|
|
<br>
|
|
|
-High quality compartment calls were generated for `hg19` (hic data from GSE63525), `hg38` (hic data from https://data.4dnucleome.org/files-processed/4DNFI1UEG1HD/), `mm9` (hic data from GSM3959427), `mm10` (hic data from http://hicfiles.s3.amazonaws.com/external/bonev/CN_mapq30.hic)
|
|
|
-
|
|
|
+`bin_size` is extended in the following way, so that the input contact matrix can be aggregated directly into larger `bin_sizes`:
|
|
|
```
|
|
|
if(bin_size==5E3) bin_sizes = c(5E3, 10E3, 50E3, 100E3)
|
|
|
if(bin_size==10E3) bin_sizes = c(10E3, 50E3, 100E3)
|
|
@@ -31,6 +30,10 @@ if(bin_size==25E3) bin_sizes = c(25E3, 50E3, 100E3)
|
|
|
if(bin_size==40E3) bin_sizes = c(40E3, 80E3)
|
|
|
if(bin_size==50E3) bin_sizes = c(50E3, 100E3)
|
|
|
```
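
Each extended `bin_size` listed above is an integer multiple of the input `bin_size`, so a coarser matrix can be obtained by summing blocks of the input matrix. The following is a minimal sketch of that idea for a dense, square contact matrix; `aggregate_contacts` and its arguments are hypothetical, and CALDER's own aggregation code may differ.

```r
# Hypothetical helper (not part of the CALDER API): aggregate a dense square
# contact matrix into bins that are `factor` times larger by summing blocks.
aggregate_contacts <- function(mat, factor) {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat), factor >= 1)

  n <- nrow(mat)
  # Coarse bin index of every fine bin, e.g. factor = 2 gives 1, 1, 2, 2, ...
  groups <- (seq_len(n) - 1) %/% factor + 1

  row_sums <- rowsum(mat, groups)      # sum the rows within each coarse bin
  t(rowsum(t(row_sums), groups))       # then sum the columns the same way
}

# Example: going from 5 kb to 10 kb bins corresponds to factor = 2.
fine <- matrix(1, nrow = 4, ncol = 4)
coarse <- aggregate_contacts(fine, factor = 2)
dim(coarse)  # 2 2
```

If the number of bins is not a multiple of `factor`, the last coarse bin simply covers fewer fine bins.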
|
|
|
+<br>
|
|
|
+<br>
|
|
|
+High-quality reference compartments were generated for `hg19` (Hi-C data from GSE63525), `hg38` (Hi-C data from https://data.4dnucleome.org/files-processed/4DNFI1UEG1HD/), `mm9` (Hi-C data from GSM3959427), and `mm10` (Hi-C data from http://hicfiles.s3.amazonaws.com/external/bonev/CN_mapq30.hic).
|
|
|
+
|
|
|
|
|
|
# Installation
|
|
|
|