1 Department of Medicine, 2 Department of Ecology and Evolution, 3 Ben May Institute for Cancer Research, and 4 Department of Computer Science, University of Chicago, Chicago, Illinois 60637, USA
5 Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China
6 Department of Biological Sciences, Purdue University Calumet, Hammond, Indiana 46323, USA
7 Center for Functional Genomics, ENH Research Institute, Northwestern University, Evanston, Illinois 60201, USA
Reprint requests to: San Ming Wang, Center for Functional Genomics, ENH Research Institute, Northwestern University, Evanston, IL 60201, USA; e-mail: swang1{at}northwestern.edu; fax: (224) 364-5003.
| ABSTRACT |
|---|
Keywords: low abundant; transcript; SAGE; EST; Drosophila; genome
| INTRODUCTION |
|---|
Low-abundant transcripts may also be a driving force in the evolutionary process (Alvarez 2001). Answers to some fundamental biological questions may emerge from the systematic study of low-abundant transcripts. Low-abundant transcripts must be isolated before their biological roles can be determined. Despite intensive efforts on transcript identification, however, little is known about the prevalence of low-abundant transcripts, primarily because of isolation difficulties due to their small mass and high heterogeneity (Bishop et al. 1974; Holland 2002; Czechowski et al. 2004).
SAGE is a method for genome-level transcript analysis (Velculescu et al. 1995). Through isolating a short tag from a transcript and concatemerizing multiple tags for a single sequencing reaction, SAGE provides high sensitivity for transcript detection. SAGE can detect both known and novel transcripts, and provides quantitative information about the detected transcripts (Zhou et al. 2001; Saha et al. 2002). Drosophila is a well-established eukaryotic animal model. The Drosophila genome has been well sequenced and annotated, and its transcriptome has been extensively characterized by the large-scale EST approach (Adams et al. 2000; Rubin et al. 2000; Stapleton et al. 2002). Compared with higher eukaryotic genomes, the smaller size of the Drosophila genome enables Drosophila SAGE tags to represent their original transcripts and to map in the Drosophila genome with high specificity (Jasper et al. 2001, 2002; Fujii and Amrein 2002; Pleasance et al. 2003).
Taking advantage of the vast information known about the Drosophila genome and the high sensitivity of the SAGE technique for transcript detection, we performed a thorough Drosophila transcriptome analysis using the SAGE method to investigate the prevalence of low-abundant transcripts. We expected to detect low-abundant transcripts as long as a significant quantity of low-abundant transcripts exists and a sufficient number of SAGE tags could be collected. Here we report the results from this study.
| RESULTS |
|---|
Taken together, the results from these two comparisons show that over half of the SAGE tags do not match the known Drosophila transcripts.
Verification of the origins of unmatched SAGE tags
We performed four types of experiments to verify the origins of the unmatched SAGE tags.
Location of novel SAGE tags in the Drosophila genome
To investigate the correlation between the novel transcripts detected from novel SAGE tags and known genes, we mapped the novel SAGE tags in the Drosophila genome. To provide high mapping specificity, we focused on the 18,913 unique SAGE tags that map only to a single locus in the Drosophila genome. These tags include 7,106 matched SAGE tags and 11,807 novel SAGE tags. Of the 7,106 matched SAGE tags, 88% are located in the annotated exons, and 12% are mapped in the unannotated loci. In contrast, of the 11,807 novel SAGE tags, only 1% are mapped within the annotated exons, while 99% are mapped in the unannotated loci (Table 4A). Further analysis revealed that 48% of those mapped in the unannotated loci are located in the intergenic regions, 16% are located in the intragenic regions (most of which are intronic), and 36% are antisense of the intragenic regions (two-thirds of which are exonic). Since some annotated genes may end at a translational stop codon without 3' UTR sequences, we further mapped the 11,807 novel SAGE tags to the genomic sequences 1,000 bp downstream of the annotated genes. Only 7.5% of these novel tags mapped within the region. The 11,807 novel SAGE tags were rather uniformly distributed among different chromosomes (Table 4B), although many tags mapped in particular chromosomes tend to be clustered. In conclusion, most novel transcripts detected by novel SAGE tags were expressed outside the annotated exons or genes, or were antisense of annotated exons or genes in the Drosophila genome.
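The single-locus filter used for this mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `tag_hits` mapping (tag sequence to the list of genomic loci where it matches), the `Locus` layout, and the function name are hypothetical, and the toy tags are made up. Only the three tag counts in the final check come from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical locus layout: (chromosome arm, position, strand)
Locus = Tuple[str, int, str]


def single_locus_tags(tag_hits: Dict[str, List[Locus]]) -> Dict[str, Locus]:
    """Keep only tags that map to exactly one genomic locus."""
    return {tag: loci[0] for tag, loci in tag_hits.items() if len(loci) == 1}


# Toy input (made-up tags and loci, for illustration only)
toy_hits = {
    "AAAAACCCCC": [("2L", 1200, "+")],                  # unique: kept
    "GGGGGTTTTT": [("3R", 500, "-"), ("X", 9100, "+")],  # multi-locus: dropped
    "ACGTACGTAC": [],                                    # unmapped: dropped
}
unique = single_locus_tags(toy_hits)

# Counts reported in the text: matched + novel = all single-locus tags
MATCHED, NOVEL, TOTAL = 7_106, 11_807, 18_913
assert MATCHED + NOVEL == TOTAL
```

Restricting the analysis to tags with exactly one hit trades coverage for specificity: tags that match several loci are discarded rather than assigned arbitrarily.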
| DISCUSSION |
|---|
We consider it unlikely that the following possibilities could be the major source of novel SAGE tags detected in this study.
Identification of low-abundant transcripts is more difficult than identification of high-abundant transcripts due to the redundancy and complexity of transcripts. In the last decade, new technologies with increased sensitivity, such as EST, subtraction/normalization EST, SAGE, and MPSS, have been developed and applied in transcriptome studies (Adams et al. 1992; Velculescu et al. 1995; Bonaldo et al. 1996; Brenner et al. 2000). When techniques with higher sensitivity are used, greater numbers of less abundant novel transcripts are identified. However, identification of all low-abundant transcripts remains a challenge, as evidenced by our current study. Although mathematical calculations can be used to estimate the scope of transcript collection for identification of full-set transcripts (Stern et al. 2003; Reverter et al. 2005), the final answer will likely come from experimental data showing that few or no novel transcripts could be identified. Thus far, this stage of transcript identification has not been achieved in most of the genomes studied (Kapranov et al. 2002; Okazaki et al. 2002; Seki et al. 2002; Bertone et al. 2004; Imanishi et al. 2004; Schadt et al. 2004; Scheetz et al. 2004).
The genomic locations of the novel SAGE tags are interesting. Among the novel SAGE tags, nearly half are located intergenically, implying that more novel transcribed regions than current annotated ones are present in the Drosophila genome. Using a tiling array technique, a recent study also detected transcriptional activities in 41% of the intergenic region and 43% of the intronic region in the Drosophila genome (Stolc et al. 2004). Furthermore, a third of the novel SAGE tags are antisense transcripts of the annotated genes, most of which are located in the known exons. The wide presence of antisense novel transcripts for known genes revealed in this study supports the concept that antisense transcription is one of the major means for gene expression regulation (Yelin et al. 2003).
In conclusion, our study demonstrates the presence of a large quantity of low-abundant transcripts in Drosophila, which may also occur in other species (Bertone et al. 2004). Systematic identification of low-abundant transcripts in model species is an important step toward the elucidation of the biological roles of low-abundant transcripts.
| MATERIALS AND METHODS |
|---|
radiation), larvae (second and third day), pupae (third day), adults (10 males and 10 females, up to 10 d), and testis from testicular tissue of adult males. Four SAGE libraries were constructed: (1) pooled sample that included an equal amount of total RNA from embryo, larvae, pupae, and young and aged adults; (2) embryo; (3) irradiated embryo; and (4) testis. SAGE libraries were constructed following the procedures (Lee et al. 2001).
Construction of SAGE reference databases
The Drosophila genome sequences used were Drosophila Release 3.1 (http://www.fruitfly.org/cgi-bin/seq_tools/fasta_download.cgi). The genomic tag reference database was generated by extracting 10-base tags from all CATG sites in the genomic sequences, including the tags from the sense strand immediately adjacent to the CATGs and the tags from the antisense before CATG with reverse/complementary sequences. The SAGE tag reference database from the physically isolated known transcripts was constructed by extracting 10 bases adjacent to CATGs in the full-length cDNA, 3' ESTs, and 5' ESTs (UniGene Drosophila melanogaster database release 17, http://www.ncbi.nlm.nih.gov/). Of these sequences, 94% of mRNA, 78% of 3' ESTs, and 80% of 5' ESTs contain CATG and are therefore detectable by SAGE. The SAGE tag reference database from the Drosophila annotated transcripts was constructed by extracting 10 bases adjacent to all CATGs in each annotated "transcript" sequence in Release 3.1 (http://flybase.net/annot/download_sequences.html).
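The genomic tag extraction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and the `(position, strand, tag)` output layout are hypothetical. The sense tag is taken as the 10 bases immediately 3' of each CATG, and the antisense tag as the reverse complement of the 10 bases immediately 5' of it. Because CATG is its own reverse complement, every site anchors a tag on both strands.

```python
from typing import List, Tuple

# Translation table for complementing A/C/G/T; other symbols (e.g. N) pass through
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def genomic_tags(seq: str, tag_len: int = 10,
                 anchor: str = "CATG") -> List[Tuple[int, str, str]]:
    """Extract (anchor_position, strand, tag) for every anchor site in seq."""
    seq = seq.upper()
    tags = []
    pos = seq.find(anchor)
    while pos != -1:
        # Sense strand: bases immediately downstream of the anchor
        sense = seq[pos + len(anchor): pos + len(anchor) + tag_len]
        if len(sense) == tag_len:
            tags.append((pos, "+", sense))
        # Antisense strand: reverse complement of bases upstream of the anchor
        if pos >= tag_len:
            tags.append((pos, "-", reverse_complement(seq[pos - tag_len: pos])))
        pos = seq.find(anchor, pos + 1)  # step by 1 so no site is missed
    return tags


example = genomic_tags("AAAAAAAAAA" + "CATG" + "CCCCCCCCCC")
```

Sites too close to either end of the sequence to yield a full 10-base tag are skipped on that strand; how the original analysis handled such edge cases is not stated.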
Conversion of SAGE tags into 3' cDNAs
A set of unmatched SAGE tags was randomly selected from the total unmatched SAGE tag list and converted into 3' cDNAs using the GLGI method (generation of longer 3' cDNA from SAGE tags for gene identification) (Chen et al. 2002; Supplementary Table 5 [http://www.biochem.northwestern.edu/ibis/faculty/smwang.htm]). The 3' cDNA sequences were deposited in GenBank with accession numbers CB305186–CB305318.
RT-PCR confirmation of novel transcripts detected by novel SAGE tags
RT-PCR was used to confirm novel transcripts detected by novel SAGE tags. Sense primers and antisense primers were designed based on the 3' cDNAs converted from novel SAGE tags. Total RNA samples from embryonic, early larval, later larval, pupal, male and female adult, and pooled tissues were used as the templates for the analysis (Supplementary Table 6 [http://www.biochem.northwestern.edu/ibis/faculty/smwang.htm]). RNase A-treated RNA samples were used as negative controls.
Northern blot confirmation of novel transcripts detected by novel SAGE tags
Northern blot was used to confirm novel transcripts detected by novel SAGE tags. RNA samples from whole adults were used for the detection. The 3' cDNAs converted from novel SAGE tags were used as probes. Probe labeling, hybridization, and signal detection were performed using the Bright Star Bio-Detection system (Ambion) following the protocol.
RT-PCR detection of novel transcripts expressed from novel SAGE tag-mapped, unannotated genomic regions
Genomic segments mapped by novel SAGE tags were used for the test. Each segment starts at the novel SAGE tag-mapped location and extends downstream to the polyA signal sequence AATAAA or ATTAAA. Sense primers were designed based on the mapped novel SAGE tag; antisense primers were designed based on the genomic sequences upstream of AATAAA or ATTAAA (Supplementary Table 8 [http://www.biochem.northwestern.edu/ibis/faculty/smwang.htm]). The pooled total RNA samples were used as the templates for the detection. RNase A-treated RNA samples were used as controls for monitoring genomic DNA contamination.
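The segment definition above can be sketched as a search from the tag-mapped position to the first downstream polyA signal. This is a minimal illustration under stated assumptions: the function name is hypothetical, only the forward strand is handled, and the returned segment is assumed to stop just before the signal (the text does not say whether the signal itself is included).

```python
import re
from typing import Optional

# Matches either polyA signal named in the text: AATAAA or ATTAAA
POLYA_SIGNAL = re.compile(r"A[AT]TAAA")


def segment_to_polya(genome: str, tag_start: int) -> Optional[str]:
    """Return genome[tag_start : first downstream polyA signal], or None.

    Assumption: the segment ends immediately before the signal, so the
    antisense primer region lies at the 3' end of the returned string.
    """
    match = POLYA_SIGNAL.search(genome, tag_start)
    if match is None:
        return None  # no polyA signal downstream of the tag
    return genome[tag_start:match.start()]


# Toy sequence (made up): 3 bp flank, CATG + 10-base tag, spacer, signal, tail
toy = "GGG" + "CATGACGTACGTAC" + "GGGG" + "AATAAA" + "CC"
segment = segment_to_polya(toy, 3)
```

In practice one would likely also cap the search distance, since a signal-like hexamer far downstream is unlikely to belong to the same transcript; no such limit is given in the text.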
| ACKNOWLEDGMENTS |
|---|
| Footnotes |
|---|
Received November 17, 2004; accepted February 23, 2005.
| REFERENCES |
|---|
Adams, M.D., Dubnick, M., Kerlavage, A.R., Moreno, R., Kelley, J.M., Utterback, T.R., Nagle, J.W., Fields, C., and Venter, J.C. 1992. Sequence identification of 2,375 human brain genes. Nature 355: 632–634.
Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., et al. 2000. The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.
Alvarez, L.H. 2001. Does increased stochasticity speed up extinction? J. Math. Biol. 43: 534–544.
Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., et al. 2004. Global identification of human transcribed sequences with genome tiling arrays. Science 306: 2242–2246.
Bishop, J.O., Morton, J.G., Rosbach, M., and Richardson, M. 1974. Three abundance classes in HeLa cell messenger RNA. Nature 250: 199–204.
Blake, W.J., Kaern, M., Cantor, C.R., and Collins, J.J. 2003. Noise in eukaryotic gene expression. Nature 422: 633–637.
Bonaldo, M.F., Lennon, G., and Soares, M.B. 1996. Normalization and subtraction: Two approaches to facilitate gene discovery. Genome Res. 6: 791–806.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M., Ewan, M., et al. 2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18: 630–634.
Chen, J., Lee, S., Zhou, G., and Wang, S.M. 2002. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3' complementary DNAs. Genes Chromosomes Cancer 33: 252–261.
Czechowski, T., Bari, R.P., Stitt, M., Scheible, W.R., and Udvardi, M.K. 2004. Real-time RT-PCR profiling of over 1400 Arabidopsis transcription factors: Unprecedented sensitivity reveals novel root- and shoot-specific genes. Plant J. 38: 366–379.
Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. 2002. Stochastic gene expression in a single cell. Science 297: 1183–1186.
Fujii, S. and Amrein, H. 2002. Genes expressed in the Drosophila head reveal a role for fat cells in sex-specific physiology. EMBO J. 21: 5353–5363.
Holland, M.J. 2002. Transcript abundance in yeast varies over six orders of magnitude. J. Biol. Chem. 277: 14363–14366.
Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M., et al. 2004. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2: 856–875.
Jasper, H., Benes, V., Schwager, C., Sauer, S., Clauder-Munster, S., Ansorge, W., and Bohmann, D. 2001. The genomic response of the Drosophila embryo to JNK signaling. Dev. Cell 1: 579–586.
Jasper, H., Benes, V., Atzberger, A., Sauer, S., Ansorge, W., and Bohmann, D. 2002. A genomic switch at the transition from cell proliferation to terminal differentiation in the Drosophila eye. Dev. Cell 3: 511–521.
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P., and Gingeras, T.R. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296: 916–919.
Kuznetsov, V.A., Knott, G.D., and Bonner, R.F. 2002. General statistics of stochastic process of gene expression in eukaryotic cells. Genetics 161: 1321–1332.
Lee, S., Chen, J., Zhou, G., and Wang, S.M. 2001. Generation of high-quantity and quality tag/ditag cDNAs for SAGE analysis. Biotechniques 31: 348–354.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420: 563–573.
Ozbudak, E.M., Thattai, M., Kurtser, I., Grossman, A.D., and van Oudenaarden, A. 2002. Regulation of noise in the expression of a single gene. Nat. Genet. 31: 69–73.
Paulsson, J. 2004. Summing up the noise in gene networks. Nature 427: 415–418.
Pleasance, E.D., Marra, M.A., and Jones, S.J. 2003. Assessment of SAGE in transcript identification. Genome Res. 13: 1203–1215.
Reanney, D.C., MacPhee, D.G., and Pressing, J. 1983. Intrinsic noise and the design of the genetic machinery. Aust. J. Biol. Sci. 36: 77–90.
Reverter, A., McWilliam, S.M., Barris, W., and Dalrymple, B.P. 2005. A rapid method for computationally inferring transcriptome coverage and microarray sensitivity. Bioinformatics 21: 80–89.
Rubin, G.M., Hong, L., Brokstein, P., Evans-Holm, M., Frise, E., Stapleton, M., and Harvey, D.A. 2000. A Drosophila complementary DNA resource. Science 287: 2222–2224.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W., and Velculescu, V.E. 2002. Using the transcriptome to annotate the genome. Nat. Biotechnol. 20: 508–512.
Schadt, E.E., Edwards, S.W., GuhaThakurta, D., Holder, D., Ying, L., Svetnik, V., Leonardson, A., Hart, K.W., Russell, A., Li, G., et al. 2004. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol. 5: R73.
Scheetz, T.E., Laffin, J.J., Berger, B., Holte, S., Baumes, S.A., Brown, R., Chang, S., Coco, J., Conklin, J., Crouch, K., et al. 2004. High-throughput gene discovery in the rat. Genome Res. 14: 733–741.
Seki, M., Narusaka, M., Kamiya, A., Ishida, J., Satou, M., Sakurai, T., Nakajima, M., Enju, A., Akiyama, K., Oono, Y., et al. 2002. Functional annotation of a full-length Arabidopsis cDNA collection. Science 296: 141–145.
Stapleton, M., Liao, G., Brokstein, P., Hong, L., Carninci, P., Shiraki, T., Hayashizaki, Y., Champe, M., Pacleb, J., Wan, K., et al. 2002. The Drosophila gene collection: Identification of putative full-length cDNAs for 70% of D. melanogaster genes. Genome Res. 12: 1294–1300.
Stern, M.D., Anisimov, S.V., and Boheler, K.R. 2003. Can transcriptome size be estimated from SAGE catalogs? Bioinformatics 19: 443–448.
Stolc, V., Gauhar, Z., Mason, C., Halasz, G., van Batenburg, M.F., Rifkin, S.A., Hua, S., Herreman, T., Tongprasit, W., Barbano, P.E., et al. 2004. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306: 655–660.
Sun, M., Zhou, G., Lee, S., Chen, J., Shi, R.Z., and Wang, S.M. 2004. SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 5: 14.
Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270: 484–487.
Wang, S.M., Fears, S.C., Zhang, L., Chen, J.J., and Rowley, J.D. 2000. Screening poly(dA/dT)-cDNAs for gene identification. Proc. Natl. Acad. Sci. 97: 4162–4167.
Yelin, R., Dahary, D., Sorek, R., Levanon, E.Y., Goldstein, O., Shoshan, A., Diber, A., Biton, S., Tamir, Y., Khosravi, R., et al. 2003. Widespread occurrence of antisense transcription in the human genome. Nat. Biotechnol. 21: 379–386.
Zhou, G., Chen, J., Lee, S., Clark, T., Rowley, J.D., and Wang, S.M. 2001. The pattern of gene expression in human CD34+ hematopoietic stem/progenitor cells. Proc. Natl. Acad. Sci. 98: 13966–13971.