Convolution is one of the most computationally intensive operations in CNNs. The traditional approach to computing convolutions is known as the Im2col + BLAS method. This presentation talks about SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that uses vectorization and matrix-multiplication ISA extensions to improve convolution performance, surpassing Im2col + BLAS on Intel x86 and IBM POWER10. We also describe a vector-based convolution packing routine that reduces total packing time, on full model inference, by 2.0x -- 3.9x on Intel x86 and 3.6x -- 7.2x on IBM POWER10. SConv's convolution speedup over an Im2col + BLAS method based on current BLAS implementations is 12% -- 27% on Intel x86 and 26% -- 46% on IBM POWER10. The final speedup for end-to-end machine-learning model inference ranges from 9% -- 25% on Intel x86 and 10% -- 42% on IBM POWER10. At the end of the talk, we lay out a plan to port SConv to RISC-V architectures.
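For context, the Im2col + BLAS baseline mentioned above lowers convolution to a single matrix multiplication by unfolding ("packing") input patches into the columns of a matrix. The sketch below is a minimal single-channel NumPy illustration (stride 1, no padding); the function names `im2col`, `conv_im2col`, and `conv_direct` are illustrative only and are not SConv's or any BLAS library's actual API.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold each kh x kw patch of a 2D input into one column (stride 1, no padding)."""
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_im2col(x, k):
    """Convolution (as cross-correlation) expressed as a single matrix multiply,
    which a BLAS GEMM routine would execute in a real implementation."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (k.ravel() @ im2col(x, kh, kw)).reshape(oh, ow)

def conv_direct(x, k):
    """Naive direct convolution, the style of computation SConv generates code for."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Both paths compute the same result; im2col trades extra memory and packing
# time for the ability to call a highly tuned GEMM.
x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
assert np.allclose(conv_im2col(x, k), conv_direct(x, k))
```

The packing step (`im2col`) duplicates overlapping input pixels, which is exactly the overhead the talk's vector-based packing routine and direct-convolution approach aim to reduce.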
Presenter: Guido Araújo, Full Professor of Computer Science and Engineering at the University of Campinas, Brazil.