High Performance Computing (HPC)

The main computing infrastructure of the Bioinformatics Unit is an HPC cluster with 38 nodes: 34 physical nodes (master and compute nodes) and 4 virtual nodes (shadow master and submit nodes), with a total of 1590 vCPUs and 4447 GB of RAM. All nodes are interconnected by an isolated, secure 40 Gb/s network.

The second cluster is a Hadoop cluster with 5 nodes: 4 physical nodes (master and compute nodes) and 1 virtual node (Ambari), with a total of 104 vCPUs and 377 GB of RAM. It runs the Hortonworks HDP 3.1.4.0 distribution, managed with Ambari 2.7.4.0. As with the HPC cluster, all nodes are interconnected by an isolated, secure 40 Gb/s network.

The clusters mount, over NFSv4 or HDFS, a shared file system from a dedicated ultra-low-latency HPC storage system (EMC Isilon) with 110 TB of capacity.
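As a quick cross-check of the figures above, the following minimal Python sketch tabulates both clusters and derives the node totals and the average RAM per vCPU. The class and function names (`Cluster`, `summarize`) are illustrative assumptions, not part of the Unit's tooling; only the node, vCPU and RAM counts come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Headline figures for one cluster (hypothetical helper type)."""
    name: str
    physical_nodes: int
    virtual_nodes: int
    vcpus: int
    ram_gb: int

    @property
    def total_nodes(self) -> int:
        # Physical plus virtual nodes.
        return self.physical_nodes + self.virtual_nodes

    @property
    def ram_per_vcpu_gb(self) -> float:
        # Average RAM available per vCPU, in GB.
        return self.ram_gb / self.vcpus


# Figures taken from the infrastructure description.
CLUSTERS = [
    Cluster("HPC", physical_nodes=34, virtual_nodes=4, vcpus=1590, ram_gb=4447),
    Cluster("Hadoop", physical_nodes=4, virtual_nodes=1, vcpus=104, ram_gb=377),
]


def summarize(clusters):
    """Return one human-readable summary line per cluster."""
    return [
        f"{c.name}: {c.total_nodes} nodes, {c.vcpus} vCPUs, "
        f"{c.ram_gb} GB RAM, {c.ram_per_vcpu_gb:.1f} GB RAM per vCPU"
        for c in clusters
    ]


if __name__ == "__main__":
    for line in summarize(CLUSTERS):
        print(line)
```

The per-vCPU ratio is only an average across each cluster; the description does not state how memory is distributed across individual nodes.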

Methods and tools

https://bioinfo.cnic.es/Apps

Services

1. Omics Data Analysis

A) Data preprocessing:

The Unit has implemented pipelines for the preprocessing and first-level analysis of data generated by the following techniques:

  • Bulk Transcriptomics and gene expression regulation:
    • RNA-Seq
    • miRNA-Seq
    • ATAC-Seq
    • ChIP-Seq
    • ChIRP-Seq
    • SLAM-Seq
    • GRO-Seq
  • scRNA-Seq:
  • Genome-wide methylomic analysis:
    • MethylationEPIC BeadChip
    • Whole Genome Bisulfite Sequencing
  • Somatic and Germline variant detection by:
    • Targeted gene sequencing
    • Whole Exome Sequencing
    • Whole Genome Sequencing
    • SNP arrays

B) Data visualization tools:

C) Probabilistic models:

2. Cardiovascular Data Science

This emerging area deals with the integrative analysis of phenotypic, exposure and molecular profiling data from large human cohorts, for a better understanding of cardiovascular and vascular-related diseases.

The Bioinformatics Unit aims to provide complete solutions for this type of project:

  1. Implementation of data lake tools such as i2b2/tranSMART:
  2. Implementation of a Laboratory Information Management System (LIMS), openBIS
  3. Application of machine learning algorithms for the integration of large amounts of data:

Ongoing projects:

  1. Progression of Early Subclinical Atherosclerosis (PESA):
  2. IM-Joven:
  3. Aragon Workers Health Study:

3. Protein Structure Prediction

  1. In silico modeling and docking of proteins and complexes with a local implementation of the Rosetta suite and PyRosetta
  2. In silico modeling of proteins with a local implementation of I-TASSER
  3. In silico modeling of mutant proteins and complexes with local implementations of STRUM, DAMpred and Rosetta tools