Semiautomated improvement of RNA alignments

  1. Ebbe S. Andersen¹,
  2. Allan Lind-Thomsen²,⁷,
  3. Bjarne Knudsen³,
  4. Susie E. Kristensen²,
  5. Jakob H. Havgaard²,
  6. Elfar Torarinsson²,⁴,
  7. Niels Larsen⁵,
  8. Christian Zwieb⁶,
  9. Peter Sestoft⁴,
  10. Jørgen Kjems¹, and
  11. Jan Gorodkin²
  1. Department of Molecular Biology, University of Aarhus, DK-8000 Århus C, Denmark
  2. Division of Genetics and Bioinformatics, IBHV, and Center for Bioinformatics, University of Copenhagen, DK-1870 Frederiksberg C, Denmark
  3. CLC bio A/S, DK-8000 Århus C, Denmark
  4. Center for Bioinformatics and Department of Natural Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
  5. Danish Genome Institute, DK-8000 Århus C, Denmark
  6. Department of Molecular Biology, The University of Texas Health Science Center at Tyler, Tyler, Texas 75708-3154, USA

Abstract

We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor uses color to highlight different properties of the alignment, and its integrated analysis tools help prevent errors from being introduced during alignment editing. SARSE readily connects to external tools, providing a flexible semiautomatic editing environment. We also introduce a new method, Pcluster, which divides the sequences of an RNA alignment into subgroups that differ in secondary structure. We used Pcluster to evaluate 574 seed alignments obtained from the Rfam database and identified 71 alignments with significant predictions of inconsistent base pairs and 102 alignments with significant predictions of novel base pairs. Four RNA families illustrate how SARSE can be used to correct, manually or automatically, the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomerase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general applicability of the method is illustrated by its ability to accommodate pseudoknots and to handle even large and divergent RNA families. The open architecture of SARSE makes it a flexible tool for improving RNA alignments with relatively little human intervention. Online documentation and software are available at http://sarse.ku.dk.
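To make the notion of an "inconsistent base pair" concrete, here is a minimal sketch, assuming an alignment annotated with a consensus secondary structure in dot-bracket notation and assuming that only Watson-Crick and G-U wobble pairs count as canonical. It is *not* Pcluster, whose clustering procedure the abstract does not describe; it only flags, for each annotated pair of columns, the sequences whose two nucleotides cannot form a canonical pair. The function names `parse_dot_bracket` and `inconsistent_pairs` are hypothetical, and extra bracket types (`[]`, `<>`, `{}`) are accepted so that pseudoknotted pairs can be annotated.

```python
# Hypothetical sketch, not the SARSE or Pcluster implementation.
# Assumption: canonical pairs are Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

OPENERS = {"(": ")", "[": "]", "<": ">", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) column pairs; separate bracket types allow pseudoknots."""
    stacks: dict[str, list[int]] = {b: [] for b in OPENERS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unbalanced {ch!r} at column {j}")
            pairs.append((stack.pop(), j))
    for bracket, stack in stacks.items():
        if stack:
            raise ValueError(f"unclosed {bracket!r} at column {stack[-1]}")
    return sorted(pairs)


def inconsistent_pairs(alignment: dict[str, str], structure: str) -> dict[tuple[int, int], list[str]]:
    """Map each annotated pair to the sequences that cannot form a canonical pair there.

    A sequence with gaps in both columns is skipped (the pair is simply absent);
    a gap opposite a nucleotide is counted as inconsistent.
    """
    result = {}
    for i, j in parse_dot_bracket(structure):
        bad = []
        for name, seq in alignment.items():
            a = seq[i].upper().replace("T", "U")
            b = seq[j].upper().replace("T", "U")
            if a in "-." and b in "-.":
                continue
            if (a, b) not in CANONICAL:
                bad.append(name)
        if bad:
            result[(i, j)] = bad
    return result
```

For example, with `aln = {"seq1": "GGGAAACCC", "seq2": "GGGAAACAC", "seq3": "GAGAAACCC"}`, calling `print(inconsistent_pairs(aln, "(((...)))"))` prints `{(1, 7): ['seq2', 'seq3']}`, because those two sequences carry G-A and A-C at the second helix position. An editor could color such columns to direct the curator's attention to them.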


Footnotes

  • 7 Present address: Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, The Panum Institute, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen N, Denmark.

  • Reprint requests to: Jan Gorodkin, Division of Genetics and Bioinformatics, IBHV, and Center for Bioinformatics, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; e-mail: gorodkin{at}genome.ku.dk; fax: 45 3528 3042.

  • Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.215407.

    • Received July 11, 2007.
    • Accepted August 2, 2007.