research-article
Open Access
Symbolic Analysis for Data Plane Programs Specialization
Article No.: 1, pp 1–21 https://doi.org/10.1145/3557727

Programmable network data planes have extended the capabilities of packet processing in network devices by allowing custom processing pipelines and agnostic packet processing. While a variety of applications can be implemented on current programmable data ...

research-article
Open Access
BullsEye: Scalable and Accurate Approximation Framework for Cache Miss Calculation
Article No.: 2, pp 1–28 https://doi.org/10.1145/3558003

For Affine Control Programs or Static Control Programs (SCoP), symbolic counting of reuse distances could induce polynomials for each reuse pair. These polynomials along with cache capacity constraints lead to non-affine (semi-algebraic) sets; and ...

research-article
Open Access
As-Is Approximate Computing
Article No.: 3, pp 1–26 https://doi.org/10.1145/3559761

Although approximate computing promises better performance for applications allowing marginal errors, a dearth of hardware support and a lack of run-time accuracy guarantees make it difficult to adopt. We present As-Is, an Anytime Speculative Interruptible ...

research-article
Open Access
TokenSmart: Distributed, Scalable Power Management in the Many-core Era
Article No.: 4, pp 1–26 https://doi.org/10.1145/3559762

Centralized power management control systems are hitting a scalability limit. In particular, enforcing a power cap in a many-core system in a performance-friendly manner is quite challenging. Today’s on-chip controller reduces the clock speed of compute ...

research-article
Open Access
Lock-Free High-performance Hashing for Persistent Memory via PM-aware Holistic Optimization
Article No.: 5, pp 1–26 https://doi.org/10.1145/3561651

Persistent memory (PM) provides large-scale non-volatile memory (NVM) with DRAM-comparable performance. The non-volatility and other unique characteristics of the PM architecture bring new opportunities and challenges for efficient storage system design. ...

research-article
Open Access
Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance
Article No.: 6, pp 1–23 https://doi.org/10.1145/3561652

GraphBLAS is a recent standard that allows the expression of graph algorithms in the language of linear algebra and enables automatic code parallelization and optimization. GraphBLAS operations are memory bound and may benefit from data locality ...

research-article
Open Access
SSD-SGD: Communication Sparsification for Distributed Deep Learning Training
Article No.: 7, pp 1–25 https://doi.org/10.1145/3563038

Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that Synchronous SGD (SSGD) obtains good convergence accuracy while asynchronous ...

research-article
Open Access
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM
Article No.: 8, pp 1–31 https://doi.org/10.1145/3563697

Commodity DRAM-based processing-using-memory (PuM) techniques that are supported by off-the-shelf DRAM chips present an opportunity for alleviating the data movement bottleneck at low cost. However, system integration of these techniques imposes non-...

research-article
Open Access
Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks
Article No.: 9, pp 1–24 https://doi.org/10.1145/3563695

MicroScope and other similar microarchitectural replay attacks take advantage of the characteristics of speculative execution to trap the execution of the victim application in a loop, enabling the attacker to amplify a side-channel attack by executing it ...

research-article
Open Access
Quantifying Resource Contention of Co-located Workloads with the System-level Entropy
Article No.: 10, pp 1–25 https://doi.org/10.1145/3563696

Workload co-location, such as deploying offline analysis workloads alongside online service workloads on the same node, has become common in modern data centers. Co-located deployment significantly improves data center resource utilization. ...

research-article
Open Access
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks
Article No.: 11, pp 1–24 https://doi.org/10.1145/3564606

Deep neural networks (DNNs) have become key solutions in the natural language processing (NLP) domain. However, the existing accelerators customized for their narrow target models cannot support diverse NLP models. Therefore, naively running complex NLP ...

research-article
Open Access
Occam: Optimal Data Reuse for Convolutional Neural Networks
Article No.: 12, pp 1–25 https://doi.org/10.1145/3566052

Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. While CNNs are highly amenable to prefetching ...

research-article
Open Access
FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations
Article No.: 13, pp 1–26 https://doi.org/10.1145/3565885

With the rapid development of cloud computing, numerous cloud services, containers, and virtual machines have been bringing tremendous demands on high-performance memory resources to modern data centers. Heterogeneous memory, especially the newly released ...

research-article
Open Access
RegCPython: A Register-based Python Interpreter for Better Performance
Article No.: 14, pp 1–25 https://doi.org/10.1145/3568973

Interpreters are widely used in the implementation of many programming languages, such as Python, Perl, and Java. Even though various JIT compilers emerge in an endless stream, interpretation efficiency still plays a critical role in program performance. ...

research-article
Open Access
SpecTerminator: Blocking Speculative Side Channels Based on Instruction Classes on RISC-V
Article No.: 15, pp 1–26 https://doi.org/10.1145/3566053

In modern processors, speculative execution has significantly improved performance, but it has also introduced speculative execution vulnerabilities. Recent defenses are based on delayed execution to block various speculative side ...

research-article
Open Access
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration
Article No.: 16, pp 1–26 https://doi.org/10.1145/3566054

This article presents a code generator for sparse tensor contraction computations. It leverages a mathematical representation of loop nest computations in the sparse polyhedral framework (SPF), which extends the polyhedral model to support non-affine ...

research-article
Open Access
XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments
Article No.: 17, pp 1–25 https://doi.org/10.1145/3568956

Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Despite the option of keeping those dependencies in memory until they are reused in ...

research-article
Open Access
YaConv: Convolution with Low Cache Footprint
Article No.: 18, pp 1–18 https://doi.org/10.1145/3570305

This article introduces YaConv, a new algorithm to compute convolution using GEMM microkernels from a Basic Linear Algebra Subprograms library that is efficient for multiple CPU architectures. Previous approaches either create a copy of each image element ...

research-article
Open Access
Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy
Article No.: 19, pp 1–25 https://doi.org/10.1145/3570304

Over the years, processor throughput has steadily increased. However, memory throughput has not increased at the same rate, which has led to the memory wall problem, in turn increasing the gap between effective and theoretical peak processor ...
