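1. Use a boxplot to compare the proportion of one cell type between two groups of samples
This is fairly intuitive. The drawback is that when there are only a few single-cell samples and they are highly heterogeneous, it is hard to reach statistical significance.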
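library(ggpubr)
# Toy data: T cell proportion per sample in each group (one value per sample)
data <- data.frame(Cancer = c(0.5, 0.6, 0.8, 0.2),
                   Normal = c(0.2, 0.3, 0.7, 0.4),
                   Celltype = "T cells")
# Reshape to long format: columns Celltype, variable (group), value (proportion)
mydata <- reshape2::melt(data, id.vars = c("Celltype"))
# Boxplot per group with jittered points; stat_compare_means() adds the test p-value
ggboxplot(mydata, x = "Celltype", y = "value",
          color = "variable", palette = "jama",
          add = "jitter") + stat_compare_means(aes(color = variable))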
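To see why a small number of samples is a problem, here is a minimal base-R sketch. It assumes the test being run is the Wilcoxon rank-sum test, which is the default method of `stat_compare_means()`. The perfectly separated 3-vs-3 values are made up for illustration.

```r
# Same toy proportions as in the boxplot example
cancer <- c(0.5, 0.6, 0.8, 0.2)
normal <- c(0.2, 0.3, 0.7, 0.4)

# Wilcoxon rank-sum test; exact = FALSE because the two groups share a tied value (0.2)
p_toy <- wilcox.test(cancer, normal, exact = FALSE)$p.value

# Hypothetical best case with 3 samples per group: the groups do not overlap at all
p_best_3v3 <- wilcox.test(c(0.7, 0.8, 0.9), c(0.1, 0.2, 0.3))$p.value

p_toy        # well above 0.05
p_best_3v3   # 0.1: the smallest two-sided p-value attainable with 3 vs 3
```

Even with complete separation, 3 samples per group can never get below p = 0.1 with this test, so a non-significant boxplot comparison may simply reflect the sample size.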
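2. Ro/e ratio
Many papers use this. My understanding is that it is the observed count divided by the expected count, both taken from a chi-squared test on a 2x2 contingency table.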
Cell_type | Cancer | Normal |
---|---|---|
Tcell | 80 | 200 |
Bcell | 100 | 120 |
Tam | 200 | 100 |
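Take the data above as an example: there are three cell types, with their cell counts in Cancer and Normal shown in the table. To compute Ro/e, first collapse the data into a 2x2 table (the cell type of interest versus all others). For T cells: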
Cell_type | Cancer | Normal |
---|---|---|
Tcell | 80 | 200 |
Others | 300 | 220 |
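Under this reading (my understanding, not a definition quoted from a specific paper), the expected count of each cell comes from the row and column totals of the 2x2 table, and Ro/e is the observed count divided by it. Worked through for T cells in Cancer, where the T cell row total is 280, the Cancer column total is 380, and the grand total is 800:

$$
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N},
\qquad
R_{o/e,\,ij} = \frac{O_{ij}}{E_{ij}}
$$

$$
E_{\text{Tcell},\,\text{Cancer}} = \frac{280 \times 380}{800} = 133,
\qquad
R_{o/e} = \frac{80}{133} \approx 0.60
$$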
## Run the chi-squared test to get the statistic plus the expected and observed counts
x <- chisq.test(matrix(c(80,300,200,220),ncol = 2))
Roe <- x$observed / x$expected
Roe
## [,1] [,2]
## [1,] 0.6015038 1.3605442
## [2,] 1.2145749 0.8058608
# p-value
paste0("P-value = ",x$p.value)
## [1] "P-value = 6.55065992002061e-15"
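In the first row (T cells), Roe is above 1 in the Normal group (1.36) and below 1 in the Cancer group (0.60), so T cells make up a relatively larger share of cells in Normal than in Cancer.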
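The same one-vs-rest steps can be repeated for every cell type. A minimal sketch, using the counts from the first table; the names `counts`, `roe_one_vs_rest`, and `result` are made up for illustration.

```r
# Cell counts from the first table (rows: cell types, columns: groups)
counts <- matrix(c(80, 100, 200,
                   200, 120, 100),
                 ncol = 2,
                 dimnames = list(c("Tcell", "Bcell", "Tam"),
                                 c("Cancer", "Normal")))

# For one cell type: collapse to a 2x2 table (that type vs. all others),
# run the chi-squared test, and return Ro/e per group plus the p-value
roe_one_vs_rest <- function(counts, cell_type) {
  others <- colSums(counts[rownames(counts) != cell_type, , drop = FALSE])
  tab <- rbind(counts[cell_type, ], Others = others)
  res <- chisq.test(tab)
  roe <- res$observed[1, ] / res$expected[1, ]
  c(roe, p.value = res$p.value)
}

# One row per cell type: Ro/e in Cancer, Ro/e in Normal, p-value
result <- t(sapply(rownames(counts), function(ct) roe_one_vs_rest(counts, ct)))
result
```

In each row, a value above 1 suggests the cell type is over-represented in that group, and a value below 1 suggests it is under-represented.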
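3. Odds ratio (OR)
This serves much the same purpose as Roe, except that the OR is obtained from Fisher's exact test. An OR above 1 means the cell type is enriched in that group, and an OR below 1 means it is depleted.
### OR of T cells in the Cancer group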
fisher.test(matrix(c(80,300,200,220),ncol = 2))
##
## Fisher's Exact Test for Count Data
##
## data: matrix(c(80, 300, 200, 220), ncol = 2)
## p-value = 2.184e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.2117305 0.4052156
## sample estimates:
## odds ratio
## 0.2938061
### OR of T cells in the Normal group
fisher.test(matrix(c(200,220,80,300),ncol = 2))
##
## Fisher's Exact Test for Count Data
##
## data: matrix(c(200, 220, 80, 300), ncol = 2)
## p-value = 2.184e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.467822 4.722986
## sample estimates:
## odds ratio
## 3.403605
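As a sanity check on the two calls above, here is a short sketch. The sample odds ratio is just the cross-product ratio of the 2x2 table, and swapping the two columns inverts it. `fisher.test()` reports a conditional maximum-likelihood estimate, so its value is close to, but not exactly, the cross-product ratio. The variable names are made up for illustration.

```r
tab <- matrix(c(80, 300, 200, 220), ncol = 2,
              dimnames = list(c("Tcell", "Others"), c("Cancer", "Normal")))

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2])

# Conditional MLE reported by Fisher's exact test
ft_cancer <- fisher.test(tab)
ft_normal <- fisher.test(tab[, c("Normal", "Cancer")])  # columns swapped

or_sample                       # 17600 / 60000, about 0.293
unname(ft_cancer$estimate)      # about 0.294, as in the output above
1 / unname(ft_normal$estimate)  # same value again: the two ORs are reciprocals
```

So a single call is enough in practice: the p-value is the same either way, and the OR for the other group is the reciprocal.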
Final note: I wrote this purely to record the process so that I don't forget it later. It is only my own humble take.