Research overview


This summary is usually badly out-of-date simply because I forget to update it. The publications list is often better. As time goes on, older research topics become historical reviews. My main research interests are systems biology, voting technology, and formal verification. Most of my past research work was on various aspects of formal verification, ranging from theory to software tool design. In the last five years, I have been reducing my focus on that area, and increasing the focus on systems biology and voting technology.

Systems biology

My perspective on systems biology is that biological systems have to solve many of the same problems as computational systems, ranging from circuits to algorithms. For example, cells have to execute complex procedures and make robust decisions in the face of a variety of environments, some of which are actively hostile, in the presence of large amounts of noise. My hope is to exploit the understanding of principles, mathematical tools, and software tools that have been developed over the last six decades to understand how cells might be able to do these things. The ultimate goal is to add a higher layer of abstraction that can be used to understand biological systems, just as high-level abstractions (such as Boolean algebra and automata and complexity theory) help us to understand digital systems. Currently, my group's in systems biology can be subdivided into two categories, which I hope will eventually converge: model analysis and data analysis . The unifying theme of these research is the use of Boolean models, focusing on values that can be considered as have two qualitative values (true and false, low or high, inactive or active, etc.) (In same cases, it may be appropriate to use a small number of discrete values, but more than two values. That can be done without major changes to most theories and methods, so I'll use the word "Boolean" to encompass such methods, too.)

Why Boolean models?

Boolean models have advantages and disadvantages. The obvious disadvantage is that some accuracy is lost. If a phenomenon depends crucially on the continuous nature of sub-system, then Boolean models are not appropriate. However, there are good reasons to believe that Boolean models are highly applicable to biology. For example, many biological phenomena are ultimately Boolean. A cell is either alive or dead. Cells often have to make binary decisions about their phenotypes and process (does the cell commit to apoptosis or not?). Most importantly, biologists often speak in Boolean terms. From a circuits point of view, the only way we know how to do reliable computation with many unreliable components is to use digital logic. Although cells in higher organism do make errors (e.g., become cancerous), there are trillions of them in large animals that live for many decades, so the cells must be highly reliable. Why there is so much more digital than analog computation in the world today? The basic circuit elements (logic gates, flip-flops, etc.) are designed to avoid propagating incremental errors through multiple stages of logic. Thus, computers can literally do quadrillions of complex calculations without making a single observable error, something that would be impossible to achieve with analog computation. Of course, this idea is attractive to computer scientists and electrical engineers because there is a vast collection of intellectual and software tools for understanding digital systems. If we could understand cellular processes as Boolean systems, we could avoid getting bogged down in the details of reaction rates, etc. and understand the essence of how the system functions. We could also understand the behavior of the systems at a much larger scale.

Model analysis

This assumes that there exists a functional model, similar to a circuit, that defines the possible temporal behaviors of the system, possibly in the presence of a changing environment. A conventional way to do this is to describe the system with a set of ordinary differential equations (ODEs), and then numerically integrate to compute the behavior of the system. However, instead of ODEs we use Boolean functional models . One advantage of Boolean models is that there are analysis techniques that can explore all possible behaviors of a system (multiple possible behaviors may arise because of variation in noise, variation in reaction rates, or variations in environmental conditions).

Pathway logic viewer

My first work along this lines was as part of the Pathway Logic project at SRI, International, where I spent a sabbatical from mid-2002 to mid-2003. At that time, the SRI Researchers were building a large model of the EGF (epidermal growth factor) pathway in Maude, a programming language based on rewriting logic. One of the advantages of Maude for this application was that it had a built-in model checker that could, in theory, answer queries such as: "Is it possible for the cell to reach a state where proteins A and B are activated at the same time?" Unfortunately, the network had grown to the point where answering such took many hours, if indeed they completed at all.

Fortunately, the Maude representation bore a close resemblance to a safe Petri net , and there are heuristics for solving the "reachability problem" mentioned above that were very fast in practice. The same computation that took Maude hours could be completed in a fraction of a second, making it possible to explore the EGF network interactively. Furthermore, Petri nets have a nice graphical representation that could be used to display how it was possible to reach a state where proteins A and B are activated at the same time. A paper describing this system was published in 2005. Another paper , examining more theoretical issues, was published in 2006. There is an online demo of an old version of this system. A new version, which has been extensively improved by SRI, can be downloaded from SRI.

Model checking the Caulobacter Crescentus cell cycle

"Robustness" is an important topic in systems biology. Biological processes are selected to perform their function in spite of all kinds of noise, failures, and disruptions. One kind of robustness that has not been extensively explored is "timing robustness" - can a cell maintain its phenotype when delays in sub-processes are altered?

The process of cell division (the cell cycle) has many steps, some of which execute in parallel, that must be precisely coordinated. But the timing of reactions in a cell cannot be precise, because of signal noise and environmental factors. So the cell cycle has engineering constraints similar to those of asynchronous circuits, which must function reliably despite uncertainty in signal delays and in the environment. In other words, evolution might design robust asynchronous circuits.

Based on this reasoning, we developed a Boolean model of the key regulators of the Caulobacter cell cycle. Concentrations of proteins were modeled with Boolean values, and combinations of reactions were modeled as Boolean gates with arbitrary output delays. Physical processes such as cytokinesis were modeled as finite-state machines.

We wrote a specification in CTL (a temporal logic) that said that the resulting model progressed through the major milestones of the cell cycle in the correct order, forever, and used the model checker NuSMV to determine whether the specification was met for all possible delays.

The result was that the specification was met in almost all cases. There were only two "hazards", where inopportune timing of reactions could result in a failure of the specification (in both cases, the cell cycle would get "stuck"). In each case, it was possible to add a plausible constraint to the model that one reaction always occured before another. The modified model met the specification.

A paper describing this result has been accepted by "Proceedings of the National Academy of Sciences" and will be published soon.

Data analysis

Another component of my research is using Boolean concepts in data analysis. Boolean data analysis is attractive because it can scale up to relatively large systems, and, in some case, it describes the data in ways that are easily understood.

Analyzing time courses

We devised an algorithm called "StepMiner" for finding abrupt changes in time course data between two distinct levels. For example, suppose an experimentor perturbs a tissue and collects gene expression data, using microarrays, for thousands of genes at several time points before and after the perturbation. How do we analyze this data? One of the simplest questions might be: "What genes go up or down and when do they change?" That is what StepMiner answers.

StepMiner works by fitting a step function to the pattern of expression of each individual gene. The algorithm tries each step position (between consecutive time points), and sets the low and high levels of the step function to minimize the sum-of-squares error from the actual data. The best result is assigned a p-value using an F-statistic.

In longer time courses, StepMiner can also find Boolean signals with two steps, where the second step returns to the original level before the first step.

A paper on StepMiner was published in 2007. Free binaries, source code, and examples are also available.

StepMiner has been used in two other studies at Stanford. In one, we analyzed a gene expression microarray time course of the response of osteosarcoma tumor cells to inactivation and reactivation of an additional MYC oncogene from transgenic mice. To be continued...

Formal verification of hardware and protocols

This work encompasses quite a variety of approaches and application domains.

Murphi

(the "phi" should be the Greek letter)

Murphi is a description language and automatic formal verification system for the high-level description of finite-state systems, including many protocols and distributed algorithms. We have used the system both as a verification tool and as a platform for studying state reduction methods. Several new methods have been discovered and implemented in Murphi.

Bus Protocol (PCI) Specification

We have formally specified and verified the core subset of the PCI bus protocol. This website contains a downloadable specification, information on how the specification was written and tested, and a brief desription of the bugs discovered in the official PCI speicfication through our methodology.

Predicate Abstraction

We have applied Predicate Abstraction to a few interesting problems. The web page contains a description of the protocols that we have verified.

BDD-based verification of high-level designs

Another area of research is using BDDs for verification of protocols described in higher-level programming notation. The main ideas are how to translate the high-level notation into BDDs, and how to eliminate redundant information in the BDD representation for applications where they do not otherwise perform well.

BDD based verification

Recently, we have resumed investigation of using BDDs to verify designs in synthesizable HDL (e.g. Verilog or VHDL). In order to contain state explosion problems, we have been looking at new methods to compute reasonably accurate overapproximations (superset) of the reachable state space. The method represents the state space using a collection of BDDs which constrain non-disjoint subsets of the state variables of the system (we call these "overlapping projections"). The results seem to be significantly better than previous approximation schemes, which used required the use of disjoint subsets of the state variables. Allowing even a few overlap bits makes it possible to capture some of the communication and interactions between state machines. Draft to appear here shortly.

SVC: A Decision procedure for quantifier-free first-order logic

SVC is an automatic theorem-prover for a decidable fragment of first-order logic which excludes quantifiers, but includes equality, uninterpreted functions and constants, arrays, records, and bit-vectors, as well as propositional connectives. SVC is a general-purpose tool that is a core component of much of our verification research. In particular, it is a central piece of our micro-processor verification technology. More recently, we have been using it for checking specifications of safety-critical systems. We have also experimented with SVC as an alternative ground decision procedure in the PVS theorem-prover, with promising results.

More details, including an on-line demonstration, are available here.

Symbolic simulation and quantifier-free first-order logic

We have been formally verifying high-level processor descriptions using a symbolic simulator and a decision procedure for a quantifier-free fragment of first-order logic with equality and uninterpreted functions.

Verification of transaction-based protocols using PVS

Certain protocols are designed to simulate a sequence of atomic transactions. However, because the implementation is distributed, it breaks up each transaction into a sequence of smaller steps. The steps of a single transaction may be executed by different nodes of the distributed system. We have developed a general method, which we call aggregation, for using an automated theorem-prover to verify such transaction-oriented protocols.


Checking specification of safety critical systems

coming soon


Automatic formal verification of hard real-time systems

We're not working actively in this area at the moment. From 1989 to 1994, I worked with Rajeev Alur, Costas Courcoubetis, Nicholas Halbwachs, and Howard Wong-Toi on extending finite automata and temporal logic to specify and verify systems with quantitative timing bounds, with an underlying model of time as a dense total order (as opposed to a discrete-time model, where time is considered to advance in individual "ticks").


Group's Home Page
Last updated sometime 2008