This project focuses on automating and optimizing the task of codebase refactoring and cleanup, which is a time consuming software engineering activity.
The engine treats code refactoring as an optimisation problem:
- Evaluate the source file across six quality metrics (fitness function)
- Search for better versions by applying refactoring operators (move operators) to the code
- Accept or reject each candidate based on whether it improves the fitness score — and whether it still passes your test suite (hard constraint)
Requirements: Python 3.10+
# Install dependencies
pip install -r requirements.txt# Run with defaults (Hill Climbing, 20 iterations, no test constraint)
python main.py sample_code.py
# Run with test-case constraint — any refactoring that breaks tests is discarded
python main.py sample_code.py --tests test_sample_code.py
# Save the refactored output to a file
python main.py sample_code.py --output refactored.py --tests test_sample_code.pyEvaluates random operators independently at each iteration. Tracks the global best but has no memory of direction. Good as a baseline.
python main.py sample_code.py -a random_search
python main.py sample_code.py -a random_search --iterations 30 --seed 42Greedy local search — only accepts moves that improve the score. Accumulates improvements step by step.
# Default (first-improvement, no restarts)
python main.py sample_code.py -a hill_climbing --iterations 30
# Steepest-ascent: evaluate all operators per step, pick the best
python main.py sample_code.py -a hill_climbing --steepest
# Random restarts: restart from a perturbation when stuck
python main.py sample_code.py -a hill_climbing --iterations 40 --restarts 2Probabilistically accepts worse solutions early on to escape local optima, then converges as the temperature cools.
# Default (geometric cooling)
python main.py sample_code.py -a simulated_annealing --iterations 50
# Linear cooling schedule
python main.py sample_code.py -a simulated_annealing --sa-schedule linear --iterations 50
# Adaptive cooling (adjusts rate based on recent acceptance rate)
python main.py sample_code.py -a simulated_annealing --sa-schedule adaptive --iterations 60
# Fine-grained temperature control
python main.py sample_code.py -a simulated_annealing --sa-t-start 2.0 --sa-t-end 0.001 --sa-alpha 0.90 --iterations 100Evolves a population using tournament selection, function-level crossover, and mutation. Supports elitism.
# Default (pop=8, 20 generations)
python main.py sample_code.py -a genetic_algorithm
# Larger population, more generations
python main.py sample_code.py -a genetic_algorithm --population 12 --generations 30
# High crossover rate, lower mutation
python main.py sample_code.py -a genetic_algorithm --ga-crossover 0.9 --ga-mutation 0.2 --ga-tournament-k 3Multi-objective evolutionary algorithm. Maintains a population ranked by Pareto dominance across all six fitness objectives simultaneously.
# Default (pop=6, 10 generations)
python main.py sample_code.py -a nsga2
# Larger population and more generations
python main.py sample_code.py -a nsga2 --population 10 --generations 20The most important flag for correctness. When --tests is supplied, every candidate the algorithm evaluates is first run against the test suite in a subprocess. Any candidate that fails even one test receives score = inf and is immediately discarded — it will never be accepted as the best result, no matter how good its metrics look.
# Enable the hard constraint
python main.py sample_code.py --tests test_sample_code.py
# Works with all algorithms
python main.py sample_code2.py -a nsga2 --tests test_sample_code2.py --population 8 --generations 15
# Increase timeout if your tests are slow
python main.py sample_code.py --tests test_sample_code.py --test-timeout 60Running against the refactored output to verify (assuming the the refactored code is named as refactored.py):
python test_sample_code.py --module refactored.py -vSix metrics are computed for every candidate. All are minimised except naming quality (which is inverted before combining).
| Metric | Default Weight | What it measures |
|---|---|---|
cyclomatic_complexity |
0.20 | Average decision points per function |
lines_of_code |
0.10 | Non-blank, non-comment line count |
duplication_ratio |
0.20 | Fraction of duplicate non-trivial lines |
naming_quality |
0.10 | Identifier naming convention score (maximise) |
lack_of_cohesion |
0.20 | Disconnected variable groups per function (LCOM-inspired) |
coupling |
0.20 | Average fan-out: distinct cross-function calls per function |
# Focus heavily on cohesion and coupling
python main.py sample_code.py \
--w-cohesion 0.35 --w-coupling 0.35 \
--w-complexity 0.15 --w-duplication 0.10 \
--w-loc 0.025 --w-naming 0.025
# Focus only on reducing duplication
python main.py sample_code.py \
--w-duplication 0.8 --w-complexity 0.1 \
--w-cohesion 0.05 --w-coupling 0.05 \
--w-loc 0.0 --w-naming 0.0