Alexandria Engineering Libraries: Fast Similarity Search for Protein and DNA Sequences

Tuesday, 5 August 2014

Fast Similarity Search for Protein and DNA Sequences

Author : Mohamed Gamal Mohamed Badr

Degree : M.Sc. Computer

Title: Fast Similarity Search for Protein and DNA Sequences

Abstract

Protein function prediction is a fundamental task in computational biology with many practical applications. It plays a critical role in drug design, where a target protein is identified based on its function and that function is then moderated or blocked. Advances in genome sequencing technology have led to rapid growth in the size of protein sequence databases, and the functions of a significant portion of the sequences in these databases have not yet been explored. Manual analysis techniques usually predict protein function with high accuracy, but the huge amount of sequence data has made manual analysis tedious and cumbersome. Hence, a number of computational methods have been developed for predicting protein function. These methods draw on different sources of information, including protein homology, protein interaction network analysis, gene expression analysis, and text mining of the literature. The most prevalent methods for protein function prediction are those based on protein homology. The idea is that, given a newly sequenced protein (a query), we search a database of well-characterized proteins (proteins whose function and other information are recorded) and retrieve the database proteins that are homologous to the query. Homology is usually inferred from protein sequence similarity, so homologous proteins are detected by scoring the similarity of the query against the database sequences. Once homologous proteins are detected, functional information is transferred from the database sequences to the query according to the level of homology. An important challenge is detecting homology when pairwise similarity is low; this problem is called remote homology detection, and many methods have been developed to solve it.
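The homology-based workflow described above (score the query against each database sequence, then transfer the annotation of the best hit) can be sketched as follows. This is a minimal illustration, not the method developed in the thesis: it uses a plain Smith-Waterman local alignment with a simple match/mismatch/gap scheme instead of a substitution matrix, and the names `local_alignment_score`, `predict_function`, and `min_score` are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; real tools use
    substitution matrices such as BLOSUM62 and affine gap penalties.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def predict_function(query, database, min_score):
    """Transfer the annotation of the best-scoring database sequence.

    `database` is a list of (sequence, annotation) pairs. Returns
    (annotation, score), or (None, score) when the best hit scores below
    `min_score` (a hypothetical threshold standing in for "level of homology").
    """
    best_score, best_annotation = 0, None
    for sequence, annotation in database:
        score = local_alignment_score(query, sequence)
        if score > best_score:
            best_score, best_annotation = score, annotation
    if best_score >= min_score:
        return best_annotation, best_score
    return None, best_score


# Toy data, invented for illustration only.
toy_database = [("MKTAYIAKQR", "kinase"), ("GGGGGGGGGG", "structural")]
result = predict_function("KTAYIAK", toy_database, min_score=10)
```

When the best score falls below the threshold, the sketch returns no annotation; that low-similarity case is exactly where remote homology detection methods are needed.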
Profile-based methods are usually used for remote homology detection. In these methods, a profile is created for the query and scored against the database sequences. Profile-profile methods extend this idea: a profile is created for the query and for each cluster of closely related sequences in the database, and these profiles are then compared. HHsearch, a remote protein homology detection method based on comparing two profile hidden Markov models (HMMs), achieves higher sensitivity than other remote homology detection methods in the literature. However, HHsearch uses a dynamic programming algorithm to compare two HMMs, which makes it computationally intensive. To address this, we have developed SHsearch, a faster alternative to HHsearch that significantly reduces computation time with minimal sensitivity loss. SHsearch compares only the most important sub-models rather than the two complete models, as HHsearch does. The results show an 88X speedup for SHsearch relative to HHsearch with a sensitivity loss of 8.2 at an error rate of 10, which is deemed acceptable.
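To make the notion of a profile concrete, the sketch below builds a simple position-specific log-odds profile from a few aligned sequences and scores a database sequence against it with an ungapped sliding window. It illustrates plain profile-sequence scoring only; it is not HHsearch's HMM-HMM comparison or SHsearch's sub-model selection, neither of which is specified in this abstract. The function names, the uniform background, and the pseudocount are assumptions.

```python
import math
from collections import Counter


def build_profile(aligned, alphabet, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences.

    Assumes a uniform background distribution and additive pseudocounts;
    both are simplifications chosen for illustration.
    """
    background = 1.0 / len(alphabet)
    total = len(aligned) + pseudocount * len(alphabet)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            symbol: math.log(((counts[symbol] + pseudocount) / total) / background)
            for symbol in alphabet
        })
    return profile


def best_window_score(profile, sequence):
    """Best ungapped score of the profile over all windows of `sequence`.

    Assumes every character of `sequence` is in the profile's alphabet.
    Returns -inf when the sequence is shorter than the profile.
    """
    width = len(profile)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(profile[k][sequence[start + k]] for k in range(width))
        best = max(best, score)
    return best


# Toy DNA alignment, invented for illustration only.
toy_profile = build_profile(["ACGT", "ACGT", "ACGA"], "ACGT")
hit_score = best_window_score(toy_profile, "TTACGTTT")
miss_score = best_window_score(toy_profile, "TTTTTTTT")
```

Because the profile pools evidence from several related sequences, a database sequence can score well even when it is not highly similar to any single one of them, which is why profiles help in the remote homology setting.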
