Anne Thessen

My vision is of a future in which data sharing and reuse are a normal part of the research workflow. This will enable a new type of data-centric science that focuses on holistic analysis of data to answer large-scale questions. My place in this vision is as a leader within this new, interdisciplinary field, with a focus on applying data-driven methods to ecological systems. I wish to continue and expand my current participation in big-data projects while maintaining an ecology-centric research program that can benefit from those projects. Gradually, these two lines of study will merge as these cyberinfrastructure projects mature and begin to provide advanced services.

Research Momentum

My research trajectory is based on using innovative computing techniques to study emergent properties of complex biological systems. I am driven by the belief that large amounts of knowledge are hiding in existing data sets. Recent developments in machine learning and semantics can reveal connections to which we were previously blind. My current pursuits involve bringing biology to the semantic web. The major bottleneck is transforming our existing knowledge into a semantic framework in a way that allows it to evolve over time, gives attribution and establishes trust. The real challenge to adopting new technology is effecting cultural change among the practitioners of science. My emerging computing skills complement my formal training in biological oceanography, allowing me both to create and to advocate for tools and services that foster a more data-centric research practice within the biology community. The best way to advocate for data-driven discovery is to apply these methods in my own research and publish the results.

I base my research on a holistic approach to studying marine ecosystems in order to reveal their system-level properties. Small changes in gene sequences or cell physiology can result in big changes in how ecosystems function and thus in the goods and services they provide. My research is not solely genome-focused; it requires integrating “omics” data with environmental, ecological and physiological data to explain an ecosystem more comprehensively. Thus far, I have had to use mostly manual, brute-force methods for studies that require integration of interdisciplinary data sets, because current tools and systems either do not exist or lack essential functionality. I believe that the science of the future will rely less on single-investigator studies using discrete data silos and more on interdisciplinary collaborations using distributed data.

Current Research Activities

I am currently pursuing funding to study the marine environment near Kotzebue, AK, which is being impacted by climate change. We plan to model and discover emergent properties of the microbial community using a combination of high-throughput molecular techniques, in situ experimentation, lab experimentation and field monitoring. Understanding this highly complicated and dynamic system in the context of climate change will require a data-centric approach that combines computational and empirical methods. I hope and expect to establish a long-term study site in Kotzebue, AK.

While practicing biology, I have become involved in several data-centered infrastructure projects, including the Encyclopedia of Life, the International Census of Marine Microbes, the Data Conservancy and the NSF EarthCube project. Within EarthCube, I am a member of the Semantics and Ontologies Group and the Integration of Biology and Earth Sciences Data Group. I currently have funding (in collaboration with a semantic web specialist) to extract text from the Encyclopedia of Life, perform natural language processing on that text, assign URIs to terms and create triples describing species associations.

Contact Anne at anne [dot] thessen /at/ ronininstitute {dot} org