Author : Mohamed Gamal Mohamed Badr
Degree : M.Sc. Computer
Title: Fast Similarity Search for Protein and DNA Sequences
Abstract
Protein function prediction is a
fundamental task in computational biology and has many practical applications.
This task plays a critical role in the process of drug design, which involves
identifying a target protein based on its function and then moderating or
blocking that function. Advances in genome sequencing technology have led to
rapid growth in the size of protein sequence databases. The functions of a
significant portion of the protein sequences in these databases remain
unexplored. A number of methods have been developed for
protein function prediction. Manual analysis techniques usually provide high
accuracy for predicting protein function. However, the huge amount of sequence
data has made manual analysis tedious and cumbersome. Hence, a number of
computational methods have been developed for predicting protein function. These
computational methods usually depend on different sources of information. These
sources of information include protein homology, protein interaction network
analysis, gene expression analysis, and literature text mining. The most
prevalent methods used for protein function prediction are those based on
protein homology. The idea behind these methods is that, given a newly sequenced
protein (a query), we search a database of well-characterized proteins
(proteins with their function and other information recorded) and retrieve
database proteins homologous to this newly sequenced protein. Homology is
usually inferred via protein sequence similarity. Hence, homologous proteins are
detected by scoring the similarity of the query against database sequences. After
detecting homologous proteins, functional information is transferred from the
database to the query sequence based on the level of homology. An important challenge
is detecting homologies in cases of low pairwise similarity; this problem is
called remote homology detection. Many methods have been developed for solving
this problem. Profile-based methods are usually used for remote homology
detection. In this type of method, a profile is created for the query and
scored against database sequences. An extension of profile-based methods is
profile-profile methods, in which profiles are created for the query and for
clusters of closely related sequences in the database, and then these profiles
are compared. HHsearch, a remote protein homology detection method based on
comparing two profile hidden Markov models (HMMs), achieves higher sensitivity
than other remote homology detection methods in the literature. However,
HHsearch uses a dynamic programming algorithm for comparing two HMMs, which
makes HHsearch a computationally intensive
method. To solve this problem, we have developed SHsearch as a faster
alternative to HHsearch that significantly reduces computation time with
minimal sensitivity loss. SHsearch focuses on comparing the most important
sub-models instead of comparing the two complete models, as in HHsearch. The
results show a speedup of 88X for SHsearch relative to HHsearch, with a
sensitivity loss of 8.2 at an error rate of 10, which is deemed acceptable.