Computational Science and Engineering

Scientific Computing

QCRI’s Computational Science and Engineering center conducts research in bioinformatics and high performance computing.


Bioinformatics is the field in which biology, computer science, statistics, and information technology merge into a single discipline.

This melding of molecular biology with computer science is essential to the use of genomic and proteomic information in understanding human diseases and in identifying new molecular targets for drug discovery.

Over the past 10 years, a central concern of bioinformatics was the creation of large databases to store biological and biomedical information, such as nucleotide and amino acid sequences.

Development of this type of database involved not only design issues but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.

However, the field of Bioinformatics has evolved to the point that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures.

Important sub-disciplines within bioinformatics include:

  • The development and implementation of tools that enable efficient access to, and use and management of, various types of information. The data to be stored are typically large, often comprising on the order of 100,000-120,000 variables.
  • The ability to visualize data. Biologists cannot interpret by eye the huge volumes of data produced by protein or DNA microarray projects, so more sophisticated tools are needed to detect patterns in the data and to visualize, classify, and store them. Interdisciplinary data-mining techniques provide one such set of tools, and basic statistical methods and inference are also useful, including cluster analysis, Bayesian modeling, classification and discrimination, neural networks, and graphical models. The basic idea behind these approaches is to learn (classification, neural networks, principal component analysis (PCA), support vector machines); to predict (regression, regression trees); and to cluster (hierarchical clustering, Bayesian clustering, k-means, mixture models fit with a Gibbs sampler or the EM algorithm), to name a few.
  • The development of new algorithms and statistical methods with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.
  • The ability to capitalize on the emerging technology of database mining. Because a single analysis generates a large array of data, it is essential to use algorithms that can detect, across multiple samples, expression patterns that correlate with a given biological or pathological phenotype. At the protein level, bioinformatics can enable the identification of validated biomarkers that correlate strongly with the progression of diseases such as cancer. This would not only classify cancerous and non-cancerous tissues according to their molecular profiles, but could also focus attention on a relatively small number of molecules that warrant further biochemical and molecular characterization to assess their suitability as potential therapeutic targets.
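As one concrete illustration of the clustering methods listed above, the sketch below runs k-means (Lloyd's algorithm) in pure Python on a toy data set standing in for two groups of expression profiles. The data, the two-dimensional setting, and the deterministic seeding are illustrative assumptions, not part of any particular QCRI pipeline:

```python
def kmeans(points, k, iters=20):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    # Deterministic seeding for this sketch: spread the initial
    # centers across the data instead of sampling at random.
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, clusters

# Toy "expression profiles": two well-separated groups of samples.
data = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
        (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centers, clusters = kmeans(data, k=2)
```

The same alternation of assignment and update steps underlies the mixture-model variants mentioned above, where the hard assignment is replaced by probabilistic responsibilities (EM) or posterior sampling (Gibbs).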

Work at QCRI’s bioinformatics center involves collaboration across diverse disciplines such as mathematics, computer science, biology, statistics, and economics. We also aim to develop genomic, proteomic, and bioinformatic tools that can be applied to the study of infectious diseases as well as drug discovery.

High-Performance Computing (HPC) is a key enabler of simulation-based science and engineering. Through Visualization, researchers are able to synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data; detect the expected and discover the unexpected; provide timely, defensible, and understandable assessments; and communicate assessments effectively for action. Simulation techniques allow researchers to design a model of a real-world system and conduct experiments on this model to understand the behavior of the system and evaluate various strategies for the operation of the system.
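The simulation workflow described above (design a model of a real-world system, then experiment on the model to compare operating strategies) can be sketched with a minimal single-server queue: exponential interarrival and service times, with average customer waiting time as the metric. The rates and customer count are illustrative assumptions:

```python
import random

def simulate_queue(arrival_rate, service_rate, n_customers=10000, seed=1):
    """Model: a single-server queue with exponential interarrival
    and service times. Returns the average waiting time."""
    rng = random.Random(seed)
    t_arrival = 0.0      # arrival time of the current customer
    server_free_at = 0.0 # time at which the server next becomes idle
    total_wait = 0.0
    for _ in range(n_customers):
        t_arrival += rng.expovariate(arrival_rate)
        start = max(t_arrival, server_free_at)  # wait if server is busy
        total_wait += start - t_arrival
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / n_customers

# Experiment on the model: compare two operating strategies.
# A faster server should cut the average waiting time.
slow = simulate_queue(arrival_rate=0.8, service_rate=1.0)
fast = simulate_queue(arrival_rate=0.8, service_rate=2.0)
```

Replacing the queue with a model of a physical or biological system, and the two service rates with candidate operating strategies, gives the general pattern of simulation-based experimentation.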

Another area of HPC of interest to QCRI is Data Intensive Computing, which enables the handling of vast amounts of data. This is especially important in light of the fact that the volume of digital data grew by 50 percent between 2009 and 2010 to 1.2 zettabytes (ZB) and is expected to reach 35 ZB by 2020.

HPC means more than just high-end computing. Software and hardware techniques, such as parallel processing, that have been developed over the past two decades are now essential for mainstream computing. As this technology advances, QCRI plans to be at the forefront of HPC developments of the future.
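The parallel-processing idea mentioned above can be sketched as the classic map/reduce decomposition: split the data into chunks, process chunks concurrently, and combine the partial results. This example uses Python's thread pool purely to illustrate the decomposition; a real HPC workload would use processes, MPI, or accelerators to get true parallel speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work unit: sum of squares over one slice of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_squares(data, workers=4):
    # Decompose the data into roughly equal chunks, one per worker,
    # then combine the partial results (the map/reduce pattern).
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

total = parallel_sum_squares(list(range(1000)))
```

The decomposition, not the thread pool, is the point: the same chunk-and-combine structure maps directly onto distributed-memory clusters.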

For technical or informational questions, please send an email to QCRI Careers with the name of the group to which you’re directing your question, e.g. ALT, CS&E, Cyber Security, Data Analytics, Distributed Systems or Social Computing, in the subject line.
