gpu-optimization

Here are 96 public repositories matching this topic...

alternbits / awesome-cuda-books

A curated list of the best CUDA programming books

cpp gpu cuda nvidia gpu-computing cuda-basics gpu-programming gpu-optimization cuda-programming cuda-tutorial cuda-cpp cuda-book

Updated May 19, 2026

tensorforger / FluxRT

Real-time stream editing pipeline powered by the FLUX.2-klein-4B model, optimized for consumer GPUs

gpu-optimization diffusion-models real-time-ai

Updated May 16, 2026
Python

MIT-Lu-Lab / cuPDLPx

A GPU-Accelerated First-Order LP Solver

gpu optimization solver linear-programming gpu-acceleration operations-research first-order-methods mathematical-programming gpu-optimization

Updated Apr 1, 2026
Cuda

ind4skylivey / 0ptiscaler4linux

The intelligent OptiScaler installer Linux gamers needed. Automates FSR4, XeSS & DLSS configuration with GPU-optimized profiles for RDNA3/4, Arc & RTX cards.

vulkan proton linux-tools shell-scripting upscaling mesa gaming-performance gpu-optimization dlss linux-gaming xess steam-deck rdna3 frame-generation rdna4 optiscaler fsr4 amd-fsr

Updated Jan 27, 2026
Shell

GVProf / GVProf

GVProf: A Value Profiler for GPU-based Clusters

machine-learning patterns profiler gpu cuda data-flow instrumentation binary-analysis clusters redundancy gpu-optimization value-profiler

Updated Mar 24, 2024
Python

ai-infra-curriculum / ai-infra-performance-learning

AI Infrastructure Performance Engineer Learning Track - GPU optimization, inference optimization, and cost reduction

learning machine-learning performance curriculum advanced inference profiling tensorrt cost-optimization gpu-optimization ai-infrastructure

Updated May 29, 2026
Python

RobThePCGuy / Performance-Mod-Guide-For-Valheim

Boost Valheim's FPS to forge a smoother Viking journey!

game-configuration gpu-optimization optimization-techniques tech-guide valheim-mods gaming-mods high-priority-mode cpu-optimization steam-guide viking-game performance-tweaking gaming-efficiency valheim-tips valheim-performance valheim-tricks

Updated Nov 25, 2024
PowerShell

yui0 / waifu2x-glsl

Fast waifu2x converter with GPU optimization

macos linux resolution glsl waifu2x gpgpu glew waifu2x-glsl fast-waifu2x-converter gpu-optimization nyanko

Updated Dec 21, 2020
C

yui0 / waifu2x-ocl

Fast waifu2x converter with GPU optimization

windows macos linux resolution opencl waifu2x fast-waifu2x-converter gpu-optimization waifu2x-ocl nyanko

Updated May 11, 2020
C

fms-zth / BlackFlash

Handwritten Flash Attention 2 CUDA kernel for Blackwell (SM120) with TMA, swizzle, double buffering & warp specialization

gpu-optimization tma tensor-cores cuda-flash-attention

Updated May 25, 2026
Cuda

ai-infra-curriculum / ai-infra-senior-engineer-learning

AI Infrastructure Senior Engineer Learning Track - Advanced ML infrastructure and technical leadership

kubernetes learning distributed-systems machine-learning performance curriculum advanced gpu-optimization mlops senior-engineer ai-infrastructure

Updated Jun 1, 2026
Python

philtimmes / KeSSie

KeSSie: huge-context semantic recall for large language models

Updated Feb 21, 2026
Python

raj200501 / GPUOptimizerML

The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization. @NVIDIA

model-management gpu-optimization real-time-monitoring secure-api big-data-integration gpu-scheduling

Updated Dec 28, 2025
Python

paralleliq / piqc-knowledge-base

Production-ready checklists and frameworks for deploying LLMs, GenAI models, and AI infrastructure. Covers vLLM, Kubernetes, GPU optimization, observability, compliance, and Day-0 to Day-2 operations.

kubernetes machine-learning deployment optimization best-practices checklists model-serving gpu-optimization mlops production-readiness ai-governance vllm genai llm-deployment ai-infrastructure

Updated Apr 15, 2026
Shell

LessUp / sgemm-optimization

Bilingual CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA

tutorial cuda matrix-multiplication high-performance-computing cuda-kernels shared-memory gemm sgemm gpu-optimization bank-conflict tensor-cores wmma

Updated May 28, 2026
Cuda

VoidYogendra / Face-Point

First open-source real-time face filter app using MediaPipe FaceMesh for high-performance, GPU-accelerated effects.

android ndk video-processing face-detection opengl-es glsl-shaders mediacodec face-landmarks gpu-optimization face-filters mediapipe mediapipe-facemesh real-time-video-processing

Updated May 25, 2026
Kotlin

AMD-AGI / GPU-Optimization-for-LLM-Inference

This is a short course covering GPU optimization techniques for LLM inference

llamas gpu-optimization llm-inference

Updated May 11, 2026
Python

OriginNeuralAI / OriginNeuralAI

Physics-based computation at scale — Hamiltonian dynamics, spectral theory, and statistical mechanics powering optimization, drug discovery, genomics, molecular proof, and agentic commerce.

genomics drug-discovery ising-model post-quantum-cryptography hamiltonian-dynamics gpu-optimization simulated-bifurcation blockchain-verification spectral-theory physics-based-computation

Updated May 28, 2026
Python

AnubhabBanerjee / WarpGroup-backend

A high-performance C++ backend for extreme-context LLM inference. It replaces item-count batching with dynamic, VRAM-aware First-Fit Decreasing (FFD) bin packing. By using PyBind11 for async queueing, 16-token alignment, and `cudaHostAlloc` for zero-copy FlashAttention-2 transfers, it mathematically eliminates OOMs and maximizes GPU throughput.

zero-copy bin-packing cpp17 memory-management pybind11 gpu-optimization llm-inference flash-attention-2

Updated Jun 5, 2026
Python
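
The WarpGroup-backend description above names First-Fit Decreasing (FFD) bin packing with 16-token alignment. As a rough illustration of that general technique only, and not the repository's actual code, the sketch below packs sequence lengths into batches under a fixed capacity. The names `align_up` and `ffd_pack`, the use of a token budget as a stand-in for a VRAM limit, and the sample lengths are all assumptions made for this example.

```python
from typing import List

ALIGN = 16  # alignment granularity named in the description


def align_up(n: int, multiple: int = ALIGN) -> int:
    """Round n up to the next multiple (hypothetical helper)."""
    return ((n + multiple - 1) // multiple) * multiple


def ffd_pack(lengths: List[int], budget: int) -> List[List[int]]:
    """First-Fit Decreasing: sort items largest first, place each one in
    the first bin that still has room, and open a new bin when none fits.

    `budget` is a token capacity used as a stand-in for a VRAM limit
    (an assumption; a real backend would estimate bytes per token).
    Returns the bins as lists of aligned lengths.
    """
    bins: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each bin
    for size in sorted((align_up(n) for n in lengths), reverse=True):
        if size > budget:
            raise ValueError(f"sequence of {size} tokens exceeds budget {budget}")
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(size)
                free[i] -= size
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
            free.append(budget - size)
    return bins


if __name__ == "__main__":
    print(ffd_pack([100, 900, 30, 500, 480], budget=1024))
```

Sorting in decreasing order places the largest sequences first, so smaller ones can fill the leftover space in bins that are already open. Batching by item count ignores that leftover space.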

ZeroKernel798 / Triton-CUDA-Lab

Reproduces and optimizes common deep learning operators using both CUDA and Triton implementations, intended for learning and reference.

triton gpu-optimization cuda-programming

Updated Jun 1, 2026
Python
