# 1. Installation

## 1.1 Install dependencies
1. R
2. R packages: argparser, tidyverse, formattable, AnnotationForge, seqinr, clusterProfiler
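
The packages listed above could be installed from the command line roughly as follows. This is a minimal sketch, not part of emcp itself: the function name `install_emcp_deps` and the `RSCRIPT` override variable are hypothetical, and it assumes the first four packages come from CRAN while AnnotationForge and clusterProfiler are installed through BiocManager from Bioconductor.

```shell
# Hypothetical helper: install the R packages that emcp depends on.
# RSCRIPT can be set to point at a specific Rscript binary (assumption).
install_emcp_deps() {
    "${RSCRIPT:-Rscript}" -e '
      # CRAN packages, plus BiocManager for the Bioconductor ones
      cran <- c("argparser", "tidyverse", "formattable", "seqinr", "BiocManager")
      missing <- cran[!vapply(cran, requireNamespace, logical(1), quietly = TRUE)]
      if (length(missing) > 0) {
        install.packages(missing, repos = "https://cloud.r-project.org")
      }
      # Bioconductor packages
      BiocManager::install(c("AnnotationForge", "clusterProfiler"),
                           update = FALSE, ask = FALSE)
    '
}

# Usage:
#   install_emcp_deps
```

Wrapping the call in a function keeps the snippet from installing anything until it is invoked explicitly.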

## 1.2 Install emcp

```
git clone http://git.genek.cn:3333/zhxd2/emcp.git
```

# 2. Usage

## 2.1 Prepare the input file

The input file holds the protein sequences; see `proteins.fa` in the `example_data` directory for an example.
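
Before uploading, it can help to check how many sequences the input file contains. The sketch below assumes the file is in FASTA format, where every record starts with a `>` header line; the function name `count_fasta_records` is hypothetical.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage (from the example_data directory):
#   count_fasta_records proteins.fa
```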

## 2.2 Step 1: Online annotation
Log in to http://eggnog-mapper.embl.de/ and upload the protein sequences to annotate them online.  
When the job finishes, download `out.emapper.annotations`.
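
A quick sanity check on the downloaded file is to count how many query proteins received an annotation. This sketch assumes that metadata and column-header lines in `out.emapper.annotations` start with `#` and that every other non-empty line is one annotated query; the function name `count_annotated_queries` is hypothetical.

```shell
# Hypothetical helper: count annotation rows, skipping lines that start
# with '#' (assumed to be metadata or the column header) and blank lines.
count_annotated_queries() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_annotated_queries out.emapper.annotations
```

Comparing this number with the number of sequences uploaded gives a rough idea of the annotation rate.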


## 2.3 Step 2: Run emapperx.R

Using the test data as an example:
```
cd example_data
Rscript ../emapperx.R out.emapper.annotations proteins.fa
```

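This step does two things:
1. Summarizes and plots the emapper annotation results
2. Builds an OrgDB for downstream analyses such as enrichment analysis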

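## 2.4 An alternative for large datasets: emapperx_split.R

For large datasets, use the `emapperx_split.R` script instead. Each run builds the OrgDB package for only one GO ontology (MF, BP, or CC), which avoids running out of memory.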

```
cd example_data

# Build the OrgDB for Molecular Function (MF)
Rscript ../emapperx_split.R out.emapper.annotations proteins.fa MF

# Build the OrgDB for Biological Process (BP)
Rscript ../emapperx_split.R out.emapper.annotations proteins.fa BP

# Build the OrgDB for Cellular Component (CC)
Rscript ../emapperx_split.R out.emapper.annotations proteins.fa CC
```

This approach can significantly reduce memory usage and is well suited to large genome datasets.
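
The three runs can also be wrapped in a loop. This is a minimal sketch that only repeats the command shown for each ontology; the function name `build_split_orgdb` and the `RSCRIPT` override variable are hypothetical, and it assumes the function is called from the `example_data` directory so that `../emapperx_split.R` resolves.

```shell
# Hypothetical helper: run emapperx_split.R once per GO ontology,
# stopping at the first run that fails.
build_split_orgdb() {
    annotations="$1"
    proteins="$2"
    for ontology in MF BP CC; do
        "${RSCRIPT:-Rscript}" ../emapperx_split.R \
            "$annotations" "$proteins" "$ontology" || return 1
    done
}

# Usage:
#   cd example_data
#   build_split_orgdb out.emapper.annotations proteins.fa
```

The runs are sequential on purpose: only one ontology is built at a time, which is what keeps memory use down.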