I am interested in understanding gene regulation through computational analysis of DNA sequences. In particular, I have studied the regions upstream of the transcription start sites of human genes, which regulate transcription. I have carried out large-scale analyses of all known genes in the human genome in order to extract patterns and non-random sequence characteristics for these regions. I am also interested in intronic sequences and their role in gene regulation. My research is highly exploratory and creative. I address key issues surrounding the central dogma of biology. These have profound and far-reaching implications for our understanding of cell biology, with many practical applications, including the treatment of human diseases such as cancer.
Background Information: Genes control and determine the growth and development of each cell in the human body. There are many very different cell types, such as those of the eyes, brain and skin. Yet each individual cell contains a complete copy of the genome, so the same complement of genes codes for different cells with different functions. There must be a program that controls which genes are expressed in each cell, when, and at what levels. This program is contained in the regulatory regions of the genome, and is “coded for” within these non-coding sequences.
The genetic code is known, and protein coding regions and their general function are well understood. However, the coding sequences comprise less than 5% of the total genome, and non-coding regions make up the remainder. It is the non-coding regions that are relatively unknown, as are their codes and functioning rules.
Non-coding regulatory sequences of genes contain small motifs that are recognised by, and bind, regulatory proteins. These motifs and their surrounding sequences are essential for gene regulation. They determine when and where a particular gene is expressed. However, the code (or codes) for gene regulation still eludes us.
Results so Far: During my research, I have made use of the inherent connection between sequence, structure and function, and of the principle that it is possible to derive biological meaning purely from sequence data.
My most significant accomplishment was the discovery that DNA sequence patterns and signatures from non-coding regulatory regions (upstream of human genes) are very different from those of the protein coding regions. I found that the regulatory regions contain statistically significant purine-pyrimidine patterns, which are not present in protein coding sequences.
I have compared DNA sequences to random models, and found that the purine/pyrimidine property of the sequence becomes less random moving from the intergenic sequence towards the transcription start site. This feature can be attributed to the importance of purines and pyrimidines in determining DNA structure, since the purine/pyrimidine sequence determines the relative stiffness versus flexibility, and the curvature, of the DNA. This also points towards a potential structural code within regulatory DNA. In addition, I have found that transcription factor binding motifs have a highly non-random presence within only certain locations of the non-coding regions.
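As an illustration only, the comparison against a random model described above can be sketched in Python. This is not the method used in the research itself: the choice of statistic (Shannon entropy of overlapping purine/pyrimidine k-mers), the shuffle-based null model, and all function names and parameters below are assumptions made for the example.

```python
import random
from collections import Counter
from math import log2

PURINES = set("AG")
PYRIMIDINES = set("CT")


def to_ry(seq):
    """Map a DNA string to the two-letter purine (R) / pyrimidine (Y) alphabet."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        # Any other symbol (for example N) is skipped.
    return "".join(out)


def kmer_entropy(ry, k=3):
    """Shannon entropy, in bits, of the overlapping k-mer frequencies."""
    counts = Counter(ry[i:i + k] for i in range(len(ry) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def shuffle_p_value(seq, k=3, n_shuffles=1000, seed=0):
    """Compare the observed entropy with that of shuffled copies.

    Hypothetical null model: shuffling keeps the R/Y composition but
    destroys the order. Returns the observed entropy and an empirical
    p-value, i.e. the share of shuffles at least as ordered (entropy
    no higher) as the real sequence.
    """
    rng = random.Random(seed)
    ry = to_ry(seq)
    observed = kmer_entropy(ry, k)
    letters = list(ry)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if kmer_entropy("".join(letters), k) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Toy input: a strictly alternating purine/pyrimidine tract.
    entropy, p = shuffle_p_value("AC" * 50, k=3, n_shuffles=200, seed=1)
    print(f"observed entropy = {entropy:.3f} bits, empirical p = {p:.4f}")
```

A low empirical p-value means the purine/pyrimidine order is more regular than expected from composition alone. Applying the same test to windows at increasing distance from the transcription start site would be one way to look for the kind of trend described above.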
Future Work: Since it is possible to derive patterns from coding sequences that are consistent with the genetic code, it should in theory be possible to do the same with non-coding sequences. It is likely that there is a code for gene regulation, albeit cryptic and complex. This information is embedded within DNA.
My future goal is to decipher these codes for gene regulation. I will do this by discovering patterns and non-random characteristics within non-protein coding regions of the genome. Ultimately, this will lead, among other things, to protein-DNA recognition patterns and the extraction of a complex code. This includes binding recognition and its structural characteristics. Whilst this is not straightforward, the answer lies within the DNA sequence, and computational code breaking techniques will be utilised to yield results.
E-mail Dana at email@example.com