RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images with Diverse Sizes and Imbalanced Categories
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023
Yuan-Chih Chen and Chun-Shien Lu
Whole Slide Images (WSIs) are usually gigapixel in size and lack pixel-level annotations. WSI datasets are also imbalanced in categories. These unique characteristics, significantly different from those of natural images, make WSI classification a challenging weakly supervised learning problem. In this study, we propose RankMix, a data augmentation method that mixes ranked features in a pair of WSIs. RankMix introduces the concepts of pseudo labeling and ranking in order to extract the key WSI regions that contribute to the WSI classification task. A two-stage training scheme is further proposed to improve training stability and model performance. To our knowledge, we are the first to investigate weakly supervised learning from the perspective of data augmentation to deal with the WSI classification problem, which suffers from a lack of training data and imbalanced categories.
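To make the idea of mixing ranked features concrete, here is a minimal sketch. It assumes each WSI is represented as a bag of patch feature vectors with one importance score per patch (for example, from a pseudo-labeling model). The function names, the truncation to the smaller bag size, and the mixing coefficient `lam` are assumptions made for illustration; the abstract does not specify these details, so this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def rank_features(features: Sequence[Vector], scores: Sequence[float]) -> List[Vector]:
    """Return patch features sorted by descending score (hypothetical ranking step)."""
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return [list(features[i]) for i in order]


def rank_mix(
    feats_a: Sequence[Vector], scores_a: Sequence[float], label_a: Sequence[float],
    feats_b: Sequence[Vector], scores_b: Sequence[float], label_b: Sequence[float],
    lam: float = 0.5,
) -> Tuple[List[Vector], List[float]]:
    """Mix two bags rank by rank; labels are mixed with the same coefficient."""
    ranked_a = rank_features(feats_a, scores_a)
    ranked_b = rank_features(feats_b, scores_b)
    # Assumption: bags of different sizes are aligned by keeping the top-k patches of each.
    k = min(len(ranked_a), len(ranked_b))
    mixed = [
        [lam * x + (1.0 - lam) * y for x, y in zip(fa, fb)]
        for fa, fb in zip(ranked_a[:k], ranked_b[:k])
    ]
    mixed_label = [lam * p + (1.0 - lam) * q for p, q in zip(label_a, label_b)]
    return mixed, mixed_label


if __name__ == "__main__":
    bag_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    bag_b = [[4.0, 4.0], [0.0, 0.0]]
    mixed, label = rank_mix(bag_a, [0.1, 0.9, 0.5], [1.0, 0.0],
                            bag_b, [0.2, 0.8], [0.0, 1.0], lam=0.5)
    print(mixed, label)
```

Ranking before mixing means the highest-scoring patches of one slide are paired with the highest-scoring patches of the other, which is one simple way to handle slides with very different patch counts.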
Self-Adapted Utterance Selection for Suicidal Ideation Detection in Lifeline Conversations
The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), January 2023
Zhong-Ling Wang, Po-Hsien Huang, Wen-Yau Hsu and Hen-Hsen Huang
This paper investigates a crucial aspect of mental health by exploring the detection of suicidal ideation in spoken phone conversations between callers and counselors at a suicide prevention hotline. These conversations can be lengthy, noisy, and cover a broad range of topics, making it challenging for NLP models to accurately identify the caller's suicidal ideation. To address these difficulties, we introduce a novel, self-adaptive approach that identifies the most critical utterances that the NLP model can more easily distinguish. The experiments use real-world Lifeline transcriptions, expertly labeled, and show that our approach outperforms the baseline models in overall performance with an F-score of 66.01%. In detecting the most dangerous cases, our approach achieves a significantly higher F-score of 65.94% compared to the baseline models, an improvement of 8.9%. The selected utterances can also provide valuable insights for suicide prevention research. Furthermore, our approach demonstrates its versatility by showing its effectiveness in sentiment analysis, making it a valuable tool for NLP applications beyond the healthcare domain.
Planting Fast-growing Forest by Leveraging the Asymmetric Read/Write Latency of NVRAM-based Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), October 2022
Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Yi-Da Huang, and Wei-Kuan Shih
Owing to its cell density and low static power consumption, nonvolatile random-access memory (NVRAM) has been a promising candidate for collaborating with dynamic random-access memory (DRAM) as the main memory in modern computer systems. As NVRAM also brings technical challenges (e.g., limited endurance and high writing cost) to computer system developers, write reduction has become a well-known doctrine in NVRAM-based system design. Unfortunately, a well-known machine learning algorithm, random forest, generates a massive amount of write traffic to the main memory space during its construction phase. In other words, a random forest hits the Achilles’ heel of NVRAM-based systems. To remedy this, our work proposes an NVRAM-friendly random forest algorithm, namely, Amine, for NVRAM-based systems. The design principle of Amine is to replace write operations with read accesses without raising the read complexity of the random forest algorithm. According to experimental results, Amine can effectively decrease the latency of random forest construction by 64%, compared with the original random forest algorithm.
On Minimizing the Read Latency of Flash Memory to Preserve Inter-tree Locality in Random Forest
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022
Yu-Cheng Lin, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Shuo-Han Chen, and Wei-Kuan Shih
Many prior works have discussed how to bring machine learning algorithms to embedded systems. Because of resource constraints, embedded platforms for machine learning applications play the role of a predictor. That is, an inference model is constructed on a personal computer or a server platform, and then integrated into embedded systems for just-in-time inference. Given the limited main memory space in embedded systems, an important problem for embedded machine learning systems is how to efficiently move the inference model between the main memory and secondary storage (e.g., flash memory). To tackle this problem, we need to consider how to preserve the locality inside the inference model during model construction. Therefore, we propose a solution, namely locality-aware random forest (LaRF), to preserve the inter-tree locality of all decision trees within a random forest model during the model construction process. Owing to the locality preservation, LaRF can improve the read latency by at least 81.5%, compared to the original random forest library.
SGIRR: Sparse Graph Index Remapping for ReRAM Crossbar Operation Unit and Power Optimization
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022
Cheng-Yuan Wang, Yao-Wen Chang, and Yuan-Hao Chang
Resistive Random Access Memory (ReRAM) Crossbars are a promising process-in-memory technology to reduce enormous data movement overheads of large-scale graph processing between computation and memory units. ReRAM cells can combine with crossbar arrays to effectively accelerate graph processing, and partitioning ReRAM crossbar arrays into Operation Units (OUs) can further improve computation accuracy of ReRAM crossbars. The operation unit utilization was not optimized in previous work, incurring extra cost. This paper proposes a two-stage algorithm with a crossbar OU-aware scheme for sparse graph index remapping for ReRAM (SGIRR) crossbars, mitigating the influence of graph sparsity. In particular, this paper is the first to consider the given operation unit size with the remapping index algorithm, optimizing the operation unit and power dissipation. Experimental results show that our proposed algorithm reduces the utilization of crossbar OUs by 31.4%, improves the total OU block usage by 10.6%, and saves energy consumption by 17.2%, on average.
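The effect of index remapping on Operation Unit usage can be illustrated with a small sketch. It assumes the graph is stored as a sparse adjacency matrix, that the crossbar is tiled into fixed-size OUs, and that an OU only needs to be activated if it contains at least one nonzero entry. The helper names and the hand-picked vertex mapping are hypothetical; this is not the two-stage SGIRR algorithm, only a demonstration of why the vertex ordering matters.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def active_ous(edges: Iterable[Edge], ou_rows: int, ou_cols: int) -> int:
    """Count the OU tiles that contain at least one nonzero adjacency entry."""
    blocks: Set[Tuple[int, int]] = {(r // ou_rows, c // ou_cols) for r, c in edges}
    return len(blocks)


def remap(edges: Iterable[Edge], mapping: Dict[int, int]) -> List[Edge]:
    """Relabel vertex indices; vertices missing from the mapping keep their index."""
    return [(mapping.get(r, r), mapping.get(c, c)) for r, c in edges]


if __name__ == "__main__":
    edges = [(0, 0), (2, 2), (0, 2), (2, 0)]
    print(active_ous(edges, 2, 2))                       # nonzeros spread over the tiles
    print(active_ous(remap(edges, {1: 2, 2: 1}), 2, 2))  # nonzeros packed together
```

In this toy case, swapping the indices of vertices 1 and 2 packs all nonzero entries into a single 2x2 OU.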
D4AM: A General Denoising Framework for Downstream Acoustic Models
The Eleventh International Conference on Learning Representations, ICLR 2023, May 2023
Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to serve automatic speech recognition (ASR) systems. However, the training objectives of existing SE approaches do not consider the generalization ability to unseen ASR systems. In this study, we propose a general denoising framework for various downstream acoustic models, called D4AM. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. At the same time, our method aims to take the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients instead of going through a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. Experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. To the best of our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
Qian-Bei Hong, Chung-Hsien Wu, and Hsin-Min Wang
The ability to generalize to mismatches between training and testing conditions and resist interference from other speakers is crucial for the performance of speaker verification. In this paper, we propose two novel approaches to improve the generalization ability to deal with the mismatched recorded scenarios and languages in test conditions and to reduce the influence of interference from other speakers on the similarity measurement of two speaker embeddings. First, parent embedding learning (PEL) is used for model training, which exploits the generalization ability of the shared structure to improve the representation of speaker embeddings. Second, partial adaptive score normalization (PAS-Norm) is used to reduce the influence of interference from other speakers on embedding-based similarity measures. In the experiments, the speaker embedding models are trained using the VoxCeleb2 dataset, and the performance is evaluated on four other datasets under different conditions, including VoxCeleb1, Librispeech, SITW, and CN-Celeb datasets. In the experiments on VoxCeleb1, evaluation results considering a large number of verification speakers and identity restrictions show that the proposed PEL-based system reduces the EER by 6.0% and 4.9% in these two cases, respectively, compared to the state-of-the-art (SOTA) system. Furthermore, in the experiments evaluating speaker verification in mismatch conditions on SITW and CN-Celeb, the proposed PEL-based system also outperforms the SOTA system. In the language mismatched conditions, the EER is reduced by 8.3%. For the evaluation of the influence of interference from other speakers, the EER is significantly reduced by 24.4% when PAS-Norm is used instead of the baseline AS-Norm score normalization method.
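For context, the baseline adaptive score normalization (AS-Norm) mentioned above is commonly computed by normalizing a raw trial score against the top-N cohort scores of the enrollment and test embeddings and averaging the two results. The sketch below follows that commonly used form; the function names, the use of the population standard deviation, and the small variance floor are assumptions. The abstract does not describe how PAS-Norm modifies this, so only the baseline is shown.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def top_n_stats(cohort_scores: Sequence[float], top_n: int) -> Tuple[float, float]:
    """Mean and standard deviation of the top-N highest cohort scores."""
    top = sorted(cohort_scores, reverse=True)[:top_n]
    # Assumption: floor the deviation to avoid division by zero.
    return mean(top), max(pstdev(top), 1e-8)


def as_norm(raw: float, enroll_cohort: Sequence[float],
            test_cohort: Sequence[float], top_n: int) -> float:
    """Symmetric adaptive score normalization of one trial score."""
    mu_e, sd_e = top_n_stats(enroll_cohort, top_n)
    mu_t, sd_t = top_n_stats(test_cohort, top_n)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


if __name__ == "__main__":
    score = as_norm(0.8, [0.1, 0.5, 0.3, 0.7], [0.2, 0.4, 0.0], top_n=2)
    print(round(score, 3))
```

Here `enroll_cohort` and `test_cohort` stand for the similarity scores of the enrollment and test embeddings against a cohort of impostor speakers.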
Performance Enhancement of SMR-based Deduplication Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), September 2022
Chun-Feng Wu, Martin Kuo, Ming-Chang Yang, and Yuan-Hao Chang
Due to the fast-growing amount of data and cost consideration, shingled-magnetic-recording (SMR) drives are developed to provide low-cost and high-capacity data storage by enhancing the areal-density of hard disk drives, and (data) deduplication techniques are getting popular in data-centric applications to reduce the amount of data that need to be stored in storage devices by eliminating the duplicate data chunks. However, directly applying deduplication techniques on SMR drives could significantly decrease the runtime performance of the deduplication system because of the time-consuming SMR space reclamation caused by the sequential write constraint of SMR drives. In this article, an SMR-aware deduplication scheme is proposed to improve the runtime performance of SMR-based deduplication systems with the consideration of the sequential write constraint of SMR drives. Moreover, to bridge the information gap between the deduplication system and the SMR drive, the lifetime information of data chunks is extracted to separate data chunks of different lifetimes in different places of SMR drives, so as to further reduce the SMR space reclamation overhead. A series of experiments was conducted with a set of realistic deduplication workloads. The results show that the proposed scheme can significantly improve the runtime performance of the SMR-based deduplication system with limited system overheads.
Accelerating Convolutional Neural Networks via Inter-operator Scheduling
IEEE International Conference on Parallel and Distributed Systems (ICPADS), Best Paper Runner-up, December 2022
Yi You, Pangfeng Liu, Ding-Yong Hong, Jan-Jan Wu and Wei-Chung Hsu
Convolutional neural networks (CNNs) are essential in many machine learning tasks. Current deep learning frameworks and compilers usually treat the neural network as a DAG (directed acyclic graph) of tensor operations and execute them one at a time according to a topological order, which respects the dependencies in the DAG. There are two issues with this general approach. First, new CNNs have branch structures, and they form complex DAGs. These DAGs make it hard to find a good topological sort order that schedules operators within a GPU. Second, modern hardware has high computational power, so running operators sequentially under-utilizes its resources. These two issues open the possibility of exploiting inter-operator parallelism, i.e., parallelism among independent operators in the DAG, to utilize the hardware resources more efficiently. In this work, we formally define the DAG scheduling problem that addresses resource contention and propose an early-start-time-first algorithm with two heuristic rules for exploiting parallelism between independent operators. Experimental results show that our method improves the performance by up to 3.76× on RTX 3090 compared to sequential execution.
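The general idea of scheduling independent operators by earliest start time can be sketched as a simple list scheduler. This sketch assumes known operator durations, a fixed number of parallel execution streams, and no resource contention between streams. The two heuristic rules of the paper are not given in the abstract and are not reproduced here; the function names and the alphabetical tie-break are hypothetical.

```python
from typing import Dict, List, Tuple

Plan = Dict[str, Tuple[int, float, float]]  # op -> (stream, start, finish)


def schedule(durations: Dict[str, float], deps: Dict[str, List[str]],
             num_streams: int) -> Plan:
    """Greedy list scheduling: repeatedly place the ready op that can start earliest."""
    finish: Dict[str, float] = {}
    stream_free = [0.0] * num_streams
    plan: Plan = {}
    remaining = set(durations)
    while remaining:
        ready = [op for op in remaining
                 if all(p in finish for p in deps.get(op, []))]
        if not ready:
            raise ValueError("dependency cycle detected")
        # Use the stream that becomes free first.
        s = min(range(num_streams), key=lambda i: stream_free[i])

        def start_time(op: str) -> float:
            dep_ready = max((finish[p] for p in deps.get(op, [])), default=0.0)
            return max(dep_ready, stream_free[s])

        # Assumption: ties are broken alphabetically by operator name.
        op = min(ready, key=lambda o: (start_time(o), o))
        start = start_time(op)
        finish[op] = start + durations[op]
        stream_free[s] = finish[op]
        plan[op] = (s, start, finish[op])
        remaining.remove(op)
    return plan


def makespan(plan: Plan) -> float:
    """Total time until the last operator finishes."""
    return max(f for _, _, f in plan.values())


if __name__ == "__main__":
    durations = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
    deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
    print(makespan(schedule(durations, deps, num_streams=2)))
```

For a diamond-shaped DAG like the one in the usage example, the two branches `b` and `c` run concurrently on separate streams, so the makespan is shorter than the sum of all operator durations.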
Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Ya-Hui Yang, Shuo-Han Chen, and Yuan-Hao Chang
Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to static random-access memory (SRAM) as a large on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs), data bits of SK-RM can only be altered or detected at access ports, and shift operations are required to move data bits across access ports along the racetrack. Owing to these special characteristics, word-based mapping and bit-interleaved mapping architectures have been proposed to facilitate reading and writing on SK-RM with different data layouts. Nevertheless, when SK-RM is used as an on-chip cache device, existing mapping architectures raise concerns of unpredictable access performance or excessive energy consumption during both data reads and writes. To resolve such concerns, this paper proposes combining the merits of existing mapping architectures so that SK-RM can seamlessly switch its data update policy according to the write latency requirement of cache accesses. Promising results have been demonstrated through a series of benchmark-driven experiments.
Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Yi-Shen Chen, Yuan-Hao Chang, and Tei-Wei Kuo
Phase-Change Memory (PCM) has emerged as a promising memory and storage technology in recent years, and Multi-Level-Cell (MLC) PCM further reduces the per-bit cost to improve its competitiveness by storing multiple bits in each PCM cell. However, MLC PCM suffers from high energy consumption in its write operations. In contrast to existing works that try to enhance the energy efficiency of the physical program-and-verify strategy for MLC PCM, this work proposes a drift-tolerant coding scheme to enable fast write operations on MLC PCM without sacrificing any data accuracy. By exploiting the resistance drift and asymmetric write characteristics of PCM cells, the proposed scheme can significantly reduce the write energy consumption of MLC PCM. Meanwhile, a segmentation strategy is proposed to further improve the write performance with our coding scheme. A series of analyses and experiments was conducted to evaluate the capability of the proposed scheme. The results show that the proposed scheme can reduce energy consumption by 6.2–17.1% and write latency by 3.2–11.3% under six representative benchmarks, compared with existing well-known schemes.
A Deconvolution Approach to Unveiling the Immune Microenvironment of Complex Tissues and Tumors in Transcriptomics
BMC Bioinformatics, To Appear
Shu-Hwa Chen, Bo-Yi Yu, Wen-Yu Kuo, Ya-Bo Lin, Sheng-Yao Su, Wei-Hsuan Chuang, I-Hsuan Lu, Chung-Yen Lin
Resolving the composition of tumor-infiltrating leukocytes is essential for expanding the cancer immunotherapy strategy, which has witnessed dramatic success in some clinical trials but remained elusive and limited in its application. In this study, we developed a two-step streamed workflow to manage the complex bioinformatic processes involved in immune cell composition analysis. We developed a dockerized toolkit (DOCexpress_fastqc, https://hub.docker.com/r/lsbnb/docexpress_fastqc) to perform gene expression profiling from RNA sequencing raw reads by integrating the hisat2-stringtie pipeline and our scripts with Galaxy/Docker images. Then the output of DOCexpress_fastqc fits the input format of mySORT web, a web application that employs the deconvolution algorithm to determine the immune content of 21 cell subclasses. The usage of mySORT was also demonstrated using a pseudo-bulk pool through single-cell datasets. Additionally, the consistency between the estimated values and the ground-truth immune-cell composition from the single-cell datasets confirmed the exceptional performance of mySORT. The mySORT demo website and Docker image can be accessed for free at https://mysort.iis.sinica.edu.tw and https://hub.docker.com/r/lsbnb/mysort_2022.
SACS: A Self-Adaptive Checkpointing Strategy for Microkernel-Based Intermittent Systems
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2022
Yen-Ting Chen, Han-Xiang Liu, Yuan-Hao Chang, Yu-Pei Liang, and Wei-Kuan Shih
Intermittent systems are usually energy-harvesting embedded systems that harvest energy from the ambient environment and perform computation intermittently. Due to the unreliable power supply, these intermittent systems typically adopt different checkpointing strategies to ensure data consistency and execution progress after the systems resume from unpredictable power failures. Existing checkpointing strategies are usually suitable for bare-metal intermittent systems with short run times. With the improvement of energy-harvesting techniques, intermittent systems now have longer run times and better computation power, so more and more intermittent systems run a microkernel to handle multiple tasks at the same time. However, existing checkpointing strategies were not designed for (or aware of) such microkernel-based intermittent systems that support running multiple tasks, and thus perform poorly at preserving the execution progress. To tackle this issue, we propose a design, called self-adaptive checkpointing strategy (SACS), tailored for microkernel-based intermittent systems. By leveraging the time-slicing scheduler, the proposed design dynamically adjusts the checkpointing interval at both run time and reboot time, so as to improve the system performance by achieving a good balance between the execution progress and the number of performed checkpoints. A series of experiments was conducted on a Texas Instruments (TI) development board with well-known benchmarks. Compared to the state-of-the-art designs, experimental results show that our design could reduce the execution time by at least 46.8% under different ambient conditions while keeping the number of performed checkpoints at an acceptable scale.
Rethinking the Interactivity of OS and Device Layers in Memory Management
ACM Transactions on Embedded Computing Systems (TECS), July 2022
Tse-Yuan Wang, Chun-Feng Wu, Che-Wei Tsao, Yuan-Hao Chang, Tei-Wei Kuo, and Xue Liu
Recently, the requirement of storing digital data has been growing rapidly; however, the conventional storage medium cannot satisfy these huge demands. Fortunately, thanks to biological technology development, storing digital data into deoxyribonucleic acid (DNA) has become possible in recent years. Furthermore, because of the attractive features (e.g., high storing density, long-term durability, and stability), DNA storage has been regarded as a potential alternative storage medium to store massive digital data in the future. Nevertheless, reading and writing digital data over DNA requires a series of extremely time-consuming processes (i.e., DNA sequencing and DNA synthesis). More specifically, among the two costs, the writing cost is the predominant cost of a DNA data storage system. Therefore, to enable efficient DNA storage, this article proposes an index management scheme for reducing the number of accesses to DNA storage. Additionally, this article introduces a new DNA data encoding format with VERA (Version Editing Recovery Approach) to reduce the total writing bits while inserting and deleting the data. To the best of our knowledge, this work is the first work to provide a total data management solution for DNA storage. According to the experimental results, the proposed design with VERA can reduce the cost by 77% and improve the performance by 71% compared to the append-only methods.
SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), December 2022
You-En Lin, An-Zi Yen, Hen-Hsen Huang and Hsin-Hsi Chen
When recalling life experiences, people often forget or confuse life events, which necessitates information recall services. Previous work on information recall focuses on providing such assistance reactively, i.e., by retrieving the life event of a given query. Proactively detecting the need for information recall services is rarely discussed. In this paper, we use a human-annotated life experience retelling dataset to detect the right time to trigger the information recall service. We propose a pilot model, structured event enhancement network (SEEN), that detects life event inconsistency, additional information in life events, and forgotten events. A fusing mechanism is also proposed to incorporate event graphs of stories and enhance the textual representations. To explain the need detection results, SEEN simultaneously provides supporting evidence by selecting the related nodes from the event graph. Experimental results show that SEEN achieves promising performance in detecting information needs. In addition, the extracted evidence can serve as complementary information to remind users what events they may want to recall.
Enrichment of Prevotella intermedia in human colorectal cancer and its additive effects with Fusobacterium nucleatum on the malignant transformation of colorectal adenomas
Journal of Biomedical Science, October 2022
Chia-Hui Lo, Deng-Chyang Wu, Shu-Wen Jao, Chang-Chieh Wu, Chung-Yen Lin, Chia-Hsien Chuang, Ya-Bo Lin, Chien-Hsiun Chen, Ying-Ting Chen, Jiann-Hwa Chen, Koung-Hung Hsiao, Ying-Ju Chen, Yuan-Tsong Chen, Jaw-Yuan Wang, Ling-Hui Li
Owing to the heterogeneity of microbiota among individuals and populations, only Fusobacterium nucleatum and Bacteroides fragilis have been reported to be enriched in colorectal cancer (CRC) in multiple studies. Thus, the discovery of additional bacteria contributing to CRC development in various populations can be expected. We aimed to identify bacteria associated with the progression of colorectal adenoma to carcinoma and determine the contribution of these bacteria to malignant transformation in patients of Han Chinese origin.
Microbiota composition was determined through 16S rRNA V3–V4 amplicon sequencing of autologous adenocarcinomas, adenomatous polyps, and non-neoplastic colon tissue samples (referred to as “tri-part samples”) in patients with CRC. Enriched taxa in adenocarcinoma tissues were identified through pairwise comparison. The abundance of candidate bacteria was quantified through genomic quantitative polymerase chain reaction (qPCR) in tissue samples from 116 patients. Associations of candidate bacteria with clinicopathological features and genomic and genetic alterations were evaluated through odds ratio tests. Additionally, the effects of candidate bacteria on CRC cell proliferation, migration, and invasion were evaluated through the co-culture of CRC cells with bacterial cells or with conditioned media from bacteria.
Prevotella intermedia was overrepresented in adenocarcinomas compared with paired adenomatous polyps. Furthermore, co-abundance of P. intermedia and F. nucleatum was observed in tumor tissues. More notably, the coexistence of these two bacteria in adenocarcinomas was associated with lymph node involvement and distant metastasis. These two bacteria also exerted additive effects on the enhancement of the migration and invasion abilities of CRC cells. Finally, conditioned media from P. intermedia promoted the migration and invasion of CRC cells.
This report is the first to demonstrate that P. intermedia is enriched in colorectal adenocarcinoma tissues and enhances the migration and invasion abilities of CRC cells. Moreover, P. intermedia and F. nucleatum exert additive effects on the malignant transformation of colorectal adenomas into carcinomas. These findings can be used to identify patients at a high risk of malignant transformation of colorectal adenomas or metastasis of CRC, and they can accordingly be provided optimal clinical management.
AI4AVP: An Antiviral Peptides Predictor in Deep Learning Approach with Generative Adversarial Network Data Augmentation
Bioinformatics Advances, October 2022
Tzu-Tang Lin, Yih-Yun Sun, Ching-Tien Wang, Wen-Chih Cheng, I-Hsuan Lu, Chung-Yen Lin*, Shu-Hwa Chen*
Antiviral peptides from various sources suggest the possibility of developing peptide drugs for treating viral diseases. Because of the increasing number of identified antiviral peptides and the advances in deep-learning theory, it is reasonable to experiment with peptide drug design using in-silico methods.
We collected the most up-to-date antiviral peptides and used deep learning to construct a sequence-based binary classifier. A generative adversarial network was employed to augment the number of antiviral peptides in the positive training dataset and enable our deep-learning convolutional neural network model to learn from the negative dataset. Our classifier outperformed other state-of-the-art classifiers when using the testing dataset. We have placed the trained classifiers on a user-friendly web server, AI4AVP, for the research community.
Availability and implementation:
AI4AVP is freely accessible at http://axp.iis.sinica.edu.tw/AI4AVP/; codes and datasets for the peptide GAN and the AVP predictor CNN are available at https://github.com/lsbnb/amp_gan and https://github.com/LinTzuTang/AI4AVP_predictor.