Step 3-1: Clumping (remove linked SNPs)

Function: gprs clump

This option wraps the plink1.9 clump function:

plink --bfile [bfiles] --clump [qc snpslists] --clump-p1  --clump-p2  --clump-r2  --clump-kb  --clump-field  --clump-snp-field  --out 

The script fills in plink_bfiles_dir, the QC SNP list, and clump_output_dir automatically. Users only need to specify the options below.

How to use it?

Shell:

$ gprs clump --ref [str] --data_dir [str] --clump_kb [int] --clump_p1 [float/scientific notation] --clump_p2 [float/scientific notation] --clump_r2 [float] --clump_field [str] --clump_snp_field [str] --plink_bfile_name [str] --qc_file_name [str] --output_name [output name]

Python:

from gprs.gene_atlas_model import GeneAtlasModel

if __name__ == '__main__':
    geneatlas = GeneAtlasModel( ref='1000genomes/hg19',
                    data_dir='data/2014_GWAS_Height' )

    geneatlas.clump(output_name='2014height',
                    clump_kb='250',
                    clump_p1='0.02', clump_p2='0.02',
                    qc_file_name='2014height',
                    plink_bfile_name='2014height')
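
For orientation, the mapping from the gprs options to the plink flags listed above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the helper name build_clump_command, the file names, and the values passed for clump_r2, clump_field and clump_snp_field are all assumptions chosen for the example.

```python
def build_clump_command(bfile, qc_snplist, out, clump_p1, clump_p2,
                        clump_r2, clump_kb, clump_field, clump_snp_field):
    """Return the plink argument list for one clumping run.

    Hypothetical helper: it only assembles the flags shown in the
    plink command above and does not run plink.
    """
    return [
        "plink",
        "--bfile", str(bfile),
        "--clump", str(qc_snplist),
        "--clump-p1", str(clump_p1),
        "--clump-p2", str(clump_p2),
        "--clump-r2", str(clump_r2),
        "--clump-kb", str(clump_kb),
        "--clump-field", str(clump_field),
        "--clump-snp-field", str(clump_snp_field),
        "--out", str(out),
    ]


# Example values; the paths and column names are placeholders.
cmd = build_clump_command(
    bfile="2014height",
    qc_snplist="2014height.QC.csv",
    out="2014height",
    clump_p1="0.02",
    clump_p2="0.02",
    clump_r2="0.1",
    clump_kb="250",
    clump_field="Pvalue",
    clump_snp_field="SNP",
)
print(" ".join(cmd))
```

Building the command as a list keeps each flag next to its value, and the list can be passed to subprocess.run directly if plink is on the PATH.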


Output files

  • *.clump

CHR F SNP BP P TOTAL NSIG S05 S01 S001 S0001 SP2
1 1 rs1967017 145723645 3.72e-16 19 0 2 6 3 8 rs11590105(1),rs17352281(1),rs9728345(1),rs11587821(1)
1 1 rs760077 155178782 7.45e-10 12 0 2 2 1 7 rs11589479(1),rs3766918(1),rs4625273(1),rs4745(1),rs12904(1)
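
The .clump report is whitespace-delimited, so it can be inspected with a few lines of Python. The sketch below is an assumption about how one might read it, not part of gprs; read_clump is a hypothetical name. It uses the two rows shown above as sample input and treats an SP2 value of NONE as an empty list.

```python
import io


def read_clump(handle):
    """Parse a plink .clump report into a list of dicts, one per index SNP."""
    header = None
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank padding lines
        if header is None:
            header = fields  # first non-empty line holds the column names
            continue
        row = dict(zip(header, fields))
        # SP2 lists the clumped SNPs as "rsID(1),rsID(1),..." or "NONE"
        sp2 = row.get("SP2", "NONE")
        row["SP2"] = [] if sp2 == "NONE" else [s.split("(")[0] for s in sp2.split(",")]
        rows.append(row)
    return rows


sample = io.StringIO(
    "CHR F SNP BP P TOTAL NSIG S05 S01 S001 S0001 SP2\n"
    "1 1 rs1967017 145723645 3.72e-16 19 0 2 6 3 8 "
    "rs11590105(1),rs17352281(1),rs9728345(1),rs11587821(1)\n"
    "1 1 rs760077 155178782 7.45e-10 12 0 2 2 1 7 "
    "rs11589479(1),rs3766918(1),rs4625273(1),rs4745(1),rs12904(1)\n"
)
clumps = read_clump(sample)
index_snps = [row["SNP"] for row in clumps]
print(index_snps)  # ['rs1967017', 'rs760077']
```

The SNP column holds the index SNP of each clump, and SP2 holds the SNPs that were clumped with it.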

Step 3-2: Filter SNPs based on the .clump file

After clumping, filter the SNPs again to remove linked SNPs. This step produces a new SNP list, which is used to generate the PRS model.

Function: gprs select-clump-snps

How to use it?

Shell:

$ gprs select-clump-snps --result_dir [str] --qc_file_name [str] --clumpfolder_name [str] --clump_file_name [str] --clump_kb [int] --clump_p1 [float/scientific notation] --clump_r2 [float] --output_name [output name]

Python:

from gprs.gene_atlas_model import GeneAtlasModel

if __name__ == '__main__':
    geneatlas = GeneAtlasModel( ref='1000genomes/hg19',
                    data_dir='data/2014_GWAS_Height' )

    geneatlas.select_clump_snps(output_name='2014height',
                    clump_file_name='2014height',
                    qc_file_name='2014height',
                    clumpfolder_name='',
                    clump_kb='250',
                    clump_p1='0.02', clump_r2='0.02')

Output files

  • *.qc_clump_snpslist.csv

With Chromosome information:

CHR SNP Allele Beta SE Pvalue
1 rs1967017 A -0.0736 0.0315 0.01938
1 rs760077 T -0.1603 0.0543 0.003139

Without Chromosome information:

SNPID Allele Beta SE Pvalue
9:98316375:A:G A -0.0736 0.0315 0.01938
9:105570921:T:C T -0.1603 0.0543 0.003139
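
The filtering idea behind this step can be sketched as keeping only the QC rows whose SNP ID appears as an index SNP in the .clump report. The code below is an illustration under stated assumptions, not the gprs implementation: filter_by_index_snps is a hypothetical helper, the space delimiter is assumed from the listing above, and the rs11590105 row uses made-up statistics purely to show a linked SNP being dropped.

```python
import csv
import io


def filter_by_index_snps(qc_rows, index_snps, id_column="SNP"):
    """Keep only the QC rows whose ID is one of the clump index SNPs."""
    keep = set(index_snps)
    return [row for row in qc_rows if row[id_column] in keep]


# QC SNP list in the "with chromosome information" layout.
# The rs11590105 line is a hypothetical row with invented values.
qc_text = (
    "CHR SNP Allele Beta SE Pvalue\n"
    "1 rs1967017 A -0.0736 0.0315 0.01938\n"
    "1 rs11590105 G 0.0100 0.0300 0.5\n"
    "1 rs760077 T -0.1603 0.0543 0.003139\n"
)
qc_rows = list(csv.DictReader(io.StringIO(qc_text), delimiter=" "))

# Index SNPs as they appear in the SNP column of the .clump report.
index_snps = ["rs1967017", "rs760077"]

kept = filter_by_index_snps(qc_rows, index_snps)
print([row["SNP"] for row in kept])  # ['rs1967017', 'rs760077']
```

For the layout without chromosome information, the same helper would be called with id_column="SNPID", provided the .clump report uses the same ID format.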