MLIR Compilation Framework

A production-grade compiler built on LLVM/MLIR that compiles PyTorch models and C/C++ code for domain-specific hardware targets.

Architecture

The compiler follows a three-layer design:

Frontend: Clang-based C/C++ compilation, plus PyTorch graph capture via TorchDynamo with 350+ op lowerings
Midend: 18-pass MLIR optimization pipeline with cost-model-driven tiling, DMA insertion, and register optimization
Backend: LLVM SelectionDAG instruction selection and assembly generation
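
To make the midend's cost-model-driven tiling more concrete, here is a minimal sketch of how a vectorization dimension could be chosen from access strides and per-access costs. It is not the compiler's actual implementation: the `Access` class, the `pick_vector_dim` helper, and the cycle costs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-access costs in cycles. A real cost model would take
# these from the target's instruction latency tables.
COST_CONTIGUOUS = 1  # unit-stride vector load/store
COST_BROADCAST = 1   # address does not depend on the vectorized loop
COST_STRIDED = 8     # non-unit stride (gather/scatter or scalarized access)


@dataclass(frozen=True)
class Access:
    """A memory access described by its element stride along each loop."""
    name: str
    strides: dict  # loop name -> stride in elements; missing means invariant


def access_cost(access, loop):
    """Cost of one access if `loop` were chosen as the vector dimension."""
    stride = access.strides.get(loop, 0)
    if stride == 0:
        return COST_BROADCAST
    if stride == 1:
        return COST_CONTIGUOUS
    return COST_STRIDED


def pick_vector_dim(loops, accesses):
    """Return the cheapest loop to vectorize and the cost of every candidate."""
    costs = {loop: sum(access_cost(a, loop) for a in accesses) for loop in loops}
    best = min(loops, key=lambda loop: costs[loop])
    return best, costs


# Row-major matmul C[i, j] += A[i, k] * B[k, j] with N = K = 64.
N = K = 64
matmul = [
    Access("A", {"i": K, "k": 1}),
    Access("B", {"k": N, "j": 1}),
    Access("C", {"i": N, "j": 1}),
]

best, costs = pick_vector_dim(["i", "j", "k"], matmul)
print(best, costs)
```

In this toy model `j` wins because every access is either contiguous or invariant along it. A fuller model would presumably also account for the cost of a horizontal reduction when the candidate is a reduction loop such as `k`.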

Key Features

Cost-model-driven tiling: Automatic vectorization dimension selection based on access pattern analysis and instruction latency costs
Multi-level memory tiling: Two-level tiling for explicit memory hierarchy management with auto-inserted DMA operations
Accumulator promotion: Automatic conversion of load/store patterns to register-carried values across reduction loops
Store-to-load forwarding: Epilogue fusion that eliminates intermediate memory round-trips
Pure Python PyTorch frontend: TorchDynamo integration with TOSA/Linalg lowering and no C++ dependencies
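
The accumulator promotion item above can be illustrated with a small before/after sketch. It models memory as a buffer that counts loads and stores, then compares a reduction loop that updates memory on every iteration with one that carries the running sum in a local variable (standing in for a register). The `CountingBuffer` class and both functions are hypothetical illustrations, not code from this compiler.

```python
class CountingBuffer:
    """A flat buffer that counts loads and stores (a stand-in for memory)."""

    def __init__(self, size):
        self.data = [0] * size
        self.loads = 0
        self.stores = 0

    def load(self, idx):
        self.loads += 1
        return self.data[idx]

    def store(self, idx, value):
        self.stores += 1
        self.data[idx] = value


def matvec_naive(a, b, out, rows, cols):
    """Before promotion: out[i] is loaded and stored on every k iteration."""
    for i in range(rows):
        out.store(i, 0)
        for k in range(cols):
            acc = out.load(i)
            out.store(i, acc + a[i * cols + k] * b[k])


def matvec_promoted(a, b, out, rows, cols):
    """After promotion: the sum is carried in a local and stored once."""
    for i in range(rows):
        acc = 0  # loop-carried value, would live in a register
        for k in range(cols):
            acc += a[i * cols + k] * b[k]
        out.store(i, acc)


rows, cols = 4, 8
a = list(range(rows * cols))
b = list(range(cols))

naive = CountingBuffer(rows)
promoted = CountingBuffer(rows)
matvec_naive(a, b, naive, rows, cols)
matvec_promoted(a, b, promoted, rows, cols)

print("naive:   ", naive.loads, "loads,", naive.stores, "stores")
print("promoted:", promoted.loads, "loads,", promoted.stores, "stores")
```

Both versions produce the same output vector. The promoted form touches the output buffer once per row instead of twice per inner iteration, which is the kind of traffic the transformation is meant to remove.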