MLIR Compilation Framework
MLIR Compilation Framework
A production-grade compiler built on LLVM/MLIR that compiles PyTorch models and C/C++ code to domain-specific hardware.
Architecture
The compiler follows a three-layer design:
- Frontend: Clang-based C/C++ compilation + PyTorch TorchDynamo capture with 350+ op lowerings
- Midend: 18-pass MLIR optimization pipeline with cost-model-driven tiling, DMA insertion, and register optimization
- Backend: LLVM SelectionDAG instruction selection and assembly generation
Key Technical Contributions
- Cost-model-driven tiling: Automatic vectorization dimension selection based on access pattern analysis and instruction latency costs
- Multi-level memory tiling: Two-level tiling for explicit memory hierarchy management with auto-inserted DMA operations
- Accumulator promotion: Automatic conversion of load/store patterns to register-carried values across reduction loops
- Store-to-load forwarding: Epilogue fusion that eliminates intermediate memory round-trips
- Pure Python PyTorch frontend: TorchDynamo integration with TOSA/Linalg lowering, no C++ dependencies
Scale
- 25,000+ lines of code across frontend, midend, and backend
- 205+ custom MLIR dialect operations
- 141 tests covering backend instruction selection and midend pass validation