research-article
Open Access
Autotuning Convolutions Is Easier Than You Think
Article No.: 20, pp 1–24, https://doi.org/10.1145/3570641

A wide range of scientific and machine learning applications depend on highly optimized implementations of tensor computations. Exploiting the full capacity of a given processor architecture remains a challenging task, due to the complexity of the ...

research-article
Open Access
User-driven Online Kernel Fusion for SYCL
Article No.: 21, pp 1–25, https://doi.org/10.1145/3571284

Heterogeneous programming models are becoming increasingly popular to support the ever-evolving hardware architectures, especially for new and emerging specialized accelerators optimizing specific tasks. While such programs provide performance portability ...

research-article
Open Access
Source Matching and Rewriting for MLIR Using String-Based Automata
Article No.: 22, pp 1–26, https://doi.org/10.1145/3571283

A typical compiler flow relies on a uni-directional sequence of translation/optimization steps that lower the program abstract representation, making it hard to preserve higher-level program information across each transformation step. On the other hand, ...

research-article
Open Access
An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform
Article No.: 23, pp 1–24, https://doi.org/10.1145/3571856

Matrix factorization functions are used in many areas and often play an important role in the overall performance of the applications. In the LAPACK library, matrix factorization functions are implemented with a blocked factorization algorithm, shifting ...

research-article
Open Access
HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy
Article No.: 24, pp 1–19, https://doi.org/10.1145/3572839

In this article, we propose a “full-stack” solution to designing high-capacity and low-latency on-chip cache hierarchies by starting at the circuit level of the hardware design stack. We propose a novel half VDD precharge 2T Gain Cell (GC) design for the ...

research-article
Open Access
ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures
Article No.: 25, pp 1–19, https://doi.org/10.1145/3572911

Chip multiprocessors (CMP) with more cores have more traffic to the last-level cache (LLC). Without a corresponding increase in LLC bandwidth, such traffic cannot be sustained, resulting in performance degradation. Previous research focused on data ...

research-article
Open Access
Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators
Article No.: 26, pp 1–26, https://doi.org/10.1145/3572908

Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. ...

research-article
Open Access
Scale-out Systolic Arrays
Article No.: 27, pp 1–25, https://doi.org/10.1145/3572917

Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systolic arrays to maximize effective throughput/Watt—i.e., throughput/Watt adjusted when accounting for array ...

research-article
Open Access
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications
Article No.: 28, pp 1–25, https://doi.org/10.1145/3575861

The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Performance ...

research-article
Open Access
FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers
Article No.: 30, pp 1–24, https://doi.org/10.1145/3579854

Page-based virtual memory relies on TLBs to accelerate the address translation. Nowadays, the gap between application workloads and the capacity of TLB continues to grow, bringing many costly TLB misses and making the TLB a performance bottleneck. ...
