PIs: Hsiang-Yun Cheng and Meng-Tsung Tsai
Computational problems can be classified as CPU-bound or I/O-bound. The time to complete a CPU-bound computation is determined primarily by the time spent on CPU calculations. In contrast, the bottleneck in solving an I/O-bound problem is waiting for data to transfer between the storage and computation units. Many papers in the literature discuss how to speed up I/O-bound problems. These approaches usually keep the algorithm for the target problem unchanged, so the hardware design plays the central role in obtaining the speedup. We instead consider redesigning the algorithm when the hardware design changes.
Much progress has been made recently in placing lightweight computation units near storage units to offload part of the computation. This approach can lead to algorithm designs similar to those in distributed computing: such algorithms usually run faster, but they also consume more resources. Our approach is to compute a small sketch of the data near the storage unit and then transfer only the sketch to the computation unit, which completes the subsequent (more complicated) computations; a minimal illustration of this two-stage flow follows.
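The sketch below makes the split concrete under simple assumptions: the near-storage stage performs a plain uniform (reservoir) sample in one pass, with O(1) work per record and O(k) memory, and the host stage then operates only on the sample. The function names, the choice of a uniform sample, and the parameter k are illustrative assumptions, not the specific sketches developed in our work.

```python
import random

def near_storage_sketch(records, k, seed=0):
    """Toy near-storage stage: reservoir-sample k records in one pass.

    O(1) work per record and O(k) memory, so it fits a lightweight unit
    placed next to storage.  Illustrative only, not one of our sketches.
    """
    rng = random.Random(seed)
    sample = []
    for i, r in enumerate(records):
        if len(sample) < k:
            sample.append(r)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = r
    return sample

def host_compute(sketch):
    """Host stage: the (possibly complicated) computation now touches only
    the small sketch, so the storage-to-host transfer stays tiny."""
    return sorted(sketch)  # placeholder for the real computation

if __name__ == "__main__":
    stream = range(10**6)                        # stands in for data resident in storage
    sketch = near_storage_sketch(stream, k=1024) # computed next to storage
    result = host_compute(sketch)                # only 1024 items cross the bus
```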
We have designed sketching algorithms for the input data of many fundamental graph problems, including BFS trees, DFS trees, spanning trees with few leaves, sparse spanning Eulerian subgraphs, partitioning graphs into few forests, and partitioning graphs into balanced connected subgraphs. Since computing the sketches usually involves only simple sampling procedures, it can be realized by the lightweight computation units near storage. Once the small sketches are available, sending them to the computation units and completing the subsequent computations is no longer I/O-bound. A toy example of such a graph sketch is given below.
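As a concrete stand-in for the graph sketches above (the sketches in our papers are sampling-based and problem-specific; the class name and interface here are purely hypothetical), the following one-pass certificate retains a spanning forest of the edges seen so far. It keeps at most n - 1 edges regardless of how dense the graph is, so the data eventually shipped to the host stays small.

```python
class SpanningForestSketch:
    """Toy graph sketch: maintain one spanning forest of the edges seen so far.

    At most n - 1 edges are retained, so the data shipped to the host is
    small even when the graph itself is dense.  This is an illustrative
    stand-in, not one of the sampling-based sketches mentioned above.
    """

    def __init__(self, n):
        self.parent = list(range(n))   # union-find over the n vertices
        self.kept = []                 # edges of the maintained forest

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_edge(self, u, v):
        ru, rv = self._find(u), self._find(v)
        if ru != rv:                   # edge joins two components: keep it
            self.parent[ru] = rv
            self.kept.append((u, v))

    def sketch(self):
        return self.kept               # at most n - 1 edges sent to the host
```

From the retained edges alone, the host can, for example, output a spanning tree of each connected component without ever receiving the full edge list.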
Though small sketches can mitigate the data congestion between storage and computation units, they require hardware support. We also consider how to mitigate data congestion with no new hardware design. We begin with a set of basic instructions that suffice to solve a number of time-consuming problems and discuss how to implement them using a one-way data flow. To be specific, can a counter be incremented without reading its current value? We answer this question in the affirmative. The main advantage of such a design is that it can significantly reduce the waiting time for data fetches: the data flow now consists only of writes, which can be batched and therefore save I/O time. Though it sounds unrealistic, this is possible for many basic instructions, in particular the ones mentioned above.
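To make the idea concrete, one folklore construction of a read-free approximate counter (shown only for illustration; it is not necessarily the construction we develop) sets a uniformly random cell of an m-cell array on every increment and later estimates the count from the number of set cells, using E[set cells] = m(1 - (1 - 1/m)^n). Every increment is a blind write, so increments can be buffered and flushed in batches.

```python
import math
import random

class WriteOnlyCounter:
    """Approximate counter whose increment issues only a write.

    Illustrative construction only: each increment sets a uniformly random
    cell of an m-cell bit array to 1, without reading anything.  The count
    is later estimated from the number of set cells.  Accuracy degrades as
    the array saturates, so m should be comfortably larger than the count.
    """

    def __init__(self, m):
        self.m = m
        self.bits = [0] * m          # storage-side array; receives writes only

    def increment(self):
        # Blind write: no read of the current value is required, so many
        # increments can be batched and flushed together.
        self.bits[random.randrange(self.m)] = 1

    def estimate(self):
        # E[#set cells] = m * (1 - (1 - 1/m)^n); invert to estimate n.
        k = sum(self.bits)
        if k >= self.m:
            return float('inf')      # saturated: m was chosen too small
        return math.log(1 - k / self.m) / math.log(1 - 1 / self.m)

if __name__ == "__main__":
    c = WriteOnlyCounter(m=1_000_000)
    for _ in range(50_000):
        c.increment()
    print(c.estimate())              # close to 50,000 with high probability
```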
The main direction of our lab is to mitigate data congestion by integrating hardware and software design. We have obtained very positive results for many fundamental graph problems.