SBSE_Project

This project focuses on automating and optimizing the task of codebase refactoring and cleanup, which is a time consuming software engineering activity.

How It Works

The engine treats code refactoring as an optimisation problem:

Evaluate the source file across six quality metrics (fitness function)
Search for better versions by applying refactoring operators (move operators) to the code
Accept or reject each candidate based on whether it improves the fitness score — and whether it still passes your test suite (hard constraint)

Setup

Requirements: Python 3.10+

# Install dependencies
pip install -r requirements.txt

Quick Start

# Run with defaults (Hill Climbing, 20 iterations, no test constraint)
python main.py sample_code.py

# Run with test-case constraint — any refactoring that breaks tests is discarded
python main.py sample_code.py --tests test_sample_code.py

# Save the refactored output to a file
python main.py sample_code.py --output refactored.py --tests test_sample_code.py

Running Each Algorithm

Random Search

Evaluates random operators independently at each iteration. Tracks the global best but has no memory of direction. Good as a baseline.

python main.py sample_code.py -a random_search
python main.py sample_code.py -a random_search --iterations 30 --seed 42

Hill Climbing

Greedy local search — only accepts moves that improve the score. Accumulates improvements step by step.

# Default (first-improvement, no restarts)
python main.py sample_code.py -a hill_climbing --iterations 30

# Steepest-ascent: evaluate all operators per step, pick the best
python main.py sample_code.py -a hill_climbing --steepest

# Random restarts: restart from a perturbation when stuck
python main.py sample_code.py -a hill_climbing --iterations 40 --restarts 2

Simulated Annealing

Probabilistically accepts worse solutions early on to escape local optima, then converges as the temperature cools.

# Default (geometric cooling)
python main.py sample_code.py -a simulated_annealing --iterations 50

# Linear cooling schedule
python main.py sample_code.py -a simulated_annealing --sa-schedule linear --iterations 50

# Adaptive cooling (adjusts rate based on recent acceptance rate)
python main.py sample_code.py -a simulated_annealing --sa-schedule adaptive --iterations 60

# Fine-grained temperature control
python main.py sample_code.py -a simulated_annealing --sa-t-start 2.0 --sa-t-end 0.001 --sa-alpha 0.90 --iterations 100

Genetic Algorithm

Evolves a population using tournament selection, function-level crossover, and mutation. Supports elitism.

# Default (pop=8, 20 generations)
python main.py sample_code.py -a genetic_algorithm

# Larger population, more generations
python main.py sample_code.py -a genetic_algorithm --population 12 --generations 30

# High crossover rate, lower mutation
python main.py sample_code.py -a genetic_algorithm --ga-crossover 0.9 --ga-mutation 0.2 --ga-tournament-k 3

NSGA-II

Multi-objective evolutionary algorithm. Maintains a population ranked by Pareto dominance across all six fitness objectives simultaneously.

# Default (pop=6, 10 generations)
python main.py sample_code.py -a nsga2

# Larger population and more generations
python main.py sample_code.py -a nsga2 --population 10 --generations 20

Test-Case Constraint

The most important flag for correctness. When --tests is supplied, every candidate the algorithm evaluates is first run against the test suite in a subprocess. Any candidate that fails even one test receives score = inf and is immediately discarded — it will never be accepted as the best result, no matter how good its metrics look.

# Enable the hard constraint
python main.py sample_code.py --tests test_sample_code.py

# Works with all algorithms
python main.py sample_code2.py -a nsga2 --tests test_sample_code2.py --population 8 --generations 15

# Increase timeout if your tests are slow
python main.py sample_code.py --tests test_sample_code.py --test-timeout 60

Running against the refactored output to verify (assuming the the refactored code is named as refactored.py):

python test_sample_code.py --module refactored.py -v

Fitness Objectives and Weights

Six metrics are computed for every candidate. All are minimised except naming quality (which is inverted before combining).

Metric	Default Weight	What it measures
`cyclomatic_complexity`	0.20	Average decision points per function
`lines_of_code`	0.10	Non-blank, non-comment line count
`duplication_ratio`	0.20	Fraction of duplicate non-trivial lines
`naming_quality`	0.10	Identifier naming convention score (maximise)
`lack_of_cohesion`	0.20	Disconnected variable groups per function (LCOM-inspired)
`coupling`	0.20	Average fan-out: distinct cross-function calls per function

# Focus heavily on cohesion and coupling
python main.py sample_code.py \
    --w-cohesion 0.35 --w-coupling 0.35 \
    --w-complexity 0.15 --w-duplication 0.10 \
    --w-loc 0.025 --w-naming 0.025

# Focus only on reducing duplication
python main.py sample_code.py \
    --w-duplication 0.8 --w-complexity 0.1 \
    --w-cohesion 0.05 --w-coupling 0.05 \
    --w-loc 0.0 --w-naming 0.0

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
algorithms		algorithms
.gitignore		.gitignore
README.md		README.md
display.py		display.py
fitness.py		fitness.py
main.py		main.py
operators.py		operators.py
requirements.txt		requirements.txt
sample_code.py		sample_code.py
test_sample_code.py		test_sample_code.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SBSE_Project

How It Works

Setup

Quick Start

Running Each Algorithm

Random Search

Hill Climbing

Simulated Annealing

Genetic Algorithm

NSGA-II

Test-Case Constraint

Fitness Objectives and Weights

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SBSE_Project

How It Works

Setup

Quick Start

Running Each Algorithm

Random Search

Hill Climbing

Simulated Annealing

Genetic Algorithm

NSGA-II

Test-Case Constraint

Fitness Objectives and Weights

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages