CS292C-13: Computer-Aided Reasoning for Software
Lecture 13: Symbolic Execution

by Yu Feng

Hoare Logic and Symbolic Execution
1
Symbolic Execution as an Instance
Symbolic execution is an instantiation of Hoare logic's formal verification approach
2
Operational Implementation
Symbolic execution implements Hoare logic by tracking path conditions through program execution
3
Automated Reasoning
While Hoare logic provides the theory, symbolic execution automates the verification process
4
Practical Application
Symbolic execution turns Hoare logic's formal reasoning into mechanized program analysis
Symbolic execution is Hoare logic with forward reasoning (strongest post-conditions)
Why Care About Symbolic Execution?
Test Generation
KLEE, SAGE, Pex create systematic tests
Bug Finding
Detects assert violations and crashes
Security
Tools like angr and Triton analyze vulnerabilities
Verification
Path summaries enable formal verification
The Spectrum of Program Validation Tools
From lightweight to formal techniques: increasing confidence, but also increasing manual effort
Manual Testing
Human-driven exploration
  • Ad-hoc testing
  • Limited coverage
  • Domain knowledge
Fuzzing
Automated random inputs
  • Coverage-guided
  • Mutation-based
  • Crash detection
Symbolic Execution
Path-sensitive analysis
  • Constraint solving
  • High path coverage
  • Precise bug finding
Formal Verification
Mathematical guarantees
  • Complete proofs
  • Specification checking
  • Exhaustive analysis
Each technique offers different trade-offs between scalability, precision, and automation as we move from lightweight methods (faster but less thorough) to formal techniques (more thorough but labor-intensive)
History of Symbolic Execution
1
1976: First Foundations
Lori Clarke develops a system to generate test data and symbolically execute programs, establishing early theoretical foundations
2
1976: Formalization
James King publishes "Symbolic execution and program testing," providing formal methodology and algorithms
3
2005-Present: Practical Implementation
  • Integration with SMT solvers enables practical applications
  • Development of heuristics to control exponential path explosion
  • Advanced heap modeling and pointer reasoning techniques
Running Toy Example
def f(x, y):
if (x > y):
x = x + y
y = x - y
x = x - y
if (x - y > 0):
assert False
return (x, y)
Execute the program using symbolic values.
Begin with x=A, y=B
The symbolic state maps variables to symbolic expressions.
The path condition is a quantifier-free formula over symbolic inputs that encodes all branch decisions taken so far.
All paths in the program form its execution tree, with both feasible and infeasible paths.
Key Data Structures
Practical Obstacles
1
Loops & Recursion
Create infinite execution trees
2
Path Explosion
Exponential growth in states
3
Complex Structures
Heap & pointers hard to model
4
Solver Limits
Non-linear arithmetic, strings challenge solvers
Handling Loops
Dealing with infinite execution trees:
  • Finitize paths by unrolling loops and recursion (bounded verification)
  • Finitize paths by limiting the size of PCs (bounded verification)
  • Use loop invariants (verification)
Random Testing
Random testing is a software testing technique that generates random inputs to discover unexpected program behaviors and bugs without requiring detailed test specifications.
Generate Random Inputs
Create randomized test cases across the input domain
Execute Program
Run the program with random inputs
Observe Results
Check for crashes, unexpected outputs, or assertion failures
Analyze Failures
Investigate and fix bugs found through random execution
Advantages
Simple to implement, requires minimal specifications, can find unexpected edge cases, and scales well to large programs
Limitations
May miss structured inputs, lacks systematic coverage, and can be inefficient for rare conditions
Concolic Execution
Concolic execution (a portmanteau of concrete and symbolic) combines concrete execution with symbolic analysis to overcome limitations of pure symbolic execution.
How It Works
  1. Execute program with concrete inputs
  1. Simultaneously collect symbolic constraints
  1. Negate path constraints to explore new paths
  1. Use SMT solvers to generate new concrete inputs
This approach leverages the speed of concrete execution while gaining the path exploration benefits of symbolic analysis.
Advantages
  • Handles complex code that pure symbolic execution struggles with
  • Provides concrete test cases for each detected issue
  • Reduces path explosion by using concrete values for complex operations
  • Can interface with real environments for library/API calls
Concolic testing bridges the gap between systematic symbolic analysis and practical testing concerns, making it particularly valuable for real-world software validation.
Concolic Execution: Strengths & Limitations
Strengths
Systematic Path Coverage
More thorough than pure fuzzing approaches
Avoids Infeasible Paths
Concrete state helps avoid many infeasible path conditions that pure symbolic execution would explore
Wide Industry Adoption
Implemented in numerous tools including DART, CUTE, SAGE (Windows), KLEE, Angr, and Pex
Limitations
Path Explosion
Still encounters path explosion problems when handling deep loops
Solver Constraints
Puts significant strain on solvers when dealing with complex arithmetic or heap operations
⚠️ Needs Concrete Stubs
Requires concrete implementations for OS/environment calls that cannot be symbolically executed
Program Validation Strategies
Different approaches to validating software correctness and finding bugs, from simple to sophisticated:
Random Testing
Uses randomly generated inputs to find bugs with minimal setup
Symbolic Execution
Analyzes code paths using symbolic values and constraint solving
Concolic Execution
Combines concrete and symbolic execution for practical path exploration
Each strategy offers different tradeoffs between implementation complexity, coverage thoroughness, and scalability to real-world programs.
Heap & Pointer Modelling
Key approaches to handling memory in symbolic execution:
Bit-precise memory modeling with the theory of arrays
Used in EXE, Klee, and SAGE to model memory at the bit level, enabling precise reasoning about memory operations and pointer arithmetic while maintaining symbolic relationships
Lazy concretization
Implemented in Java PathFinder (JPF) to delay concrete evaluation of memory addresses until absolutely necessary, reducing the number of paths to explore
Concolic lazy concretization
Employed by CUTE to combine concrete execution with selective symbolic tracking of pointer constraints, balancing precision with practical performance
These techniques address the fundamental challenge of modeling pointer relationships and heap structures symbolically while maintaining scalability.
Heap Modeling: Lazy Concretization
class Node {
int elem;
Node next;
}
n = symbolic(Node);
x = n.next;
Heap Modeling: Concolic Testing
Solver Optimisations

Expression Simplification
Reduce complexity on-the-fly
Incremental Solving
Use assumption stacks for efficiency
Constraint Caching
Reuse results across sibling paths
Concretisation
Simplify opaque parts like string libraries
Environment Modelling
POSIX Models
KLEE implements open, read functions
Math Libraries
Stubs return unconstrained symbolic values
System Calls
Replay for binary analysis in angr
Take-Aways
Symbolic execution = execute with symbols + constraints
Finds deep bugs, auto-generates tests, feeds verification & synthesis
Key challenges: loops, path explosion, environment
Modern tools combine SE with fuzzing, ML, and SMT advances
Further Reading
  • King, "Symbolic Execution & Program Testing" (1976)
  • KLEE (OSDI 08)
  • DART (PLDI 05), CUTE (ASE 05)
  • Baldoni et al., "Survey of Symbolic Execution" (2018)
  • angr documentation & demos