The emergence of, and need for, the analysis of the different types of data generated through biological research has given rise to the field of bioinformatics. Molecular sequence and structure data of DNA, RNA and proteins, gene expression profiles or microarray data, and metabolic pathway data are some of the major types of data analysed in bioinformatics. Among them, sequence data is growing at an exponential rate owing to the advent of next-generation sequencing technologies. Since the origin of bioinformatics, sequence analysis has remained a major area of research, with a wide range of applications in database searching, genome annotation, comparative genomics, molecular phylogeny and gene prediction. The pioneering approaches to sequence analysis were based on sequence alignment, whether global or local, pairwise or multiple. Alignment-based approaches generally give excellent results when the sequences under study are closely related and can be reliably aligned, but when the sequences are divergent, a reliable alignment cannot be obtained, which limits the applications of sequence alignment. Alignment-free approaches to the analysis of molecular sequence and structure data therefore provide an alternative to alignment-based approaches.
Again a case where I feel unable to help a colleague of mine, but I am sure that somebody here has an easy answer. What we need is a program for multiple alignment of DNA sequences. I am a protein person, so I had a look at Muscle and Mafft, which can both handle DNA sequences as well. However, this did not work well, since I don't see a way of tweaking the match/gap parameters the way we need them. There are several alignment scenarios that need to be covered, one of them being DNA sequences with a relatively modest overlap. In this case, the alignment programs (treating end gaps like normal gaps) found some non-existent 'similarity' between the DNAs and aligned them accordingly, rather than producing the correct alignment with enormous end gaps. It didn't even help to add a reference sequence to the alignment (note that the reference is not necessarily from the same species, so there are some mismatches, but clearly enough similarity to guide the alignment process).
As this problem resembles the 'assembly problem' common to the sequencing community (of which I am not a member), I had a look at things like phred/phrap (which is much too expensive for us) or bwa (which uses lots of funny terminology like 'color space', which is beyond my horizon). Besides, our sequences are not exactly genome-size but typically 1-10 kb pieces of genomic and mRNA sequence, and the 'assembly-type' software does not return anything that looks like a multiple alignment.
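The modest-overlap case is what an overlap (semi-global) alignment is designed for: gaps hanging off either end of either sequence cost nothing, so two reads that share only a suffix/prefix are not forced into a spurious full-length match. Below is a minimal pairwise sketch of that idea in Python. The function name `overlap_align` and the scoring values (match +2, mismatch -1, linear gap -2) are hypothetical choices for illustration. This is not a multiple aligner and is not how Muscle or Mafft work internally.

```python
from typing import Tuple


def overlap_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                  gap: int = -2) -> Tuple[int, str, str]:
    """Overlap (semi-global) alignment of two DNA strings.

    Leading and trailing end gaps are free; only internal gaps cost `gap`.
    Scoring values are illustrative assumptions, not taken from the post.
    """
    n, m = len(a), len(b)
    # First row and column stay 0, so leading end gaps cost nothing.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Trailing end gaps are free: the best cell may lie anywhere
    # on the last row or the last column.
    candidates = [(S[n][j], n, j) for j in range(m + 1)]
    candidates += [(S[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(candidates)

    # Unaligned tail (overhang) of whichever sequence extends further.
    tail_a = a[i:] + "-" * (m - j)
    tail_b = "-" * (n - i) + b[j:]

    # Trace back through the scored region.
    ra, rb = [], []
    while i > 0 and j > 0:
        s = S[i][j]
        if s == S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif s == S[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1

    # Unaligned head (leading overhang); one of i, j is now 0.
    head_a = a[:i] + "-" * j
    head_b = "-" * i + b[:j]

    top = head_a + "".join(reversed(ra)) + tail_a
    bottom = head_b + "".join(reversed(rb)) + tail_b
    return best, top, bottom


if __name__ == "__main__":
    score, top, bottom = overlap_align("AAAACGTACGT", "CGTACGTCCCC")
    print(score)
    print(top)
    print(bottom)
```

For the two example reads, which share a 7-base overlap, the sketch reports a score of 14 and places the non-overlapping ends against long end gaps (`AAAACGTACGT----` over `----CGTACGTCCCC`) instead of pulling the sequences into a full-length alignment. Extending this to many sequences would still require a progressive or assembly-style strategy on top.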
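The introduction mentions alignment-free analysis as the alternative when sequences are too divergent to align reliably. One widely used family of such methods compares k-mer (short word) frequency profiles rather than aligning residues. The sketch below is one minimal instance of that idea, assuming k = 3 and cosine distance; the function names `kmer_profile`, `cosine_distance` and `kmer_distance` are hypothetical, and other word lengths and distance measures are equally common.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p: Counter, q: Counter) -> float:
    """Return 1 - cosine similarity between two k-mer count vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    if norm == 0:
        # At least one sequence is shorter than k: treat it as maximally distant.
        return 1.0
    return 1.0 - dot / norm


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance between two sequences (k is an assumption)."""
    return cosine_distance(kmer_profile(a, k), kmer_profile(b, k))


if __name__ == "__main__":
    print(kmer_distance("ACGTACGT", "TACGTACG"))  # shifted copy: small distance
    print(kmer_distance("AAAAA", "CCCCC"))        # no shared 3-mers: 1.0
```

Because no alignment is computed, building the profiles takes time roughly linear in sequence length; the trade-off is that positional information (where in the sequence a word occurs) is discarded.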