Institute of Information Science, Academia Sinica



Press Ctrl+P to print from browser

Recent Research Results


REFROM: Responsive, Energy-efficient Frame Rendering for Mobile Devices

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2023

Tsung-Yen Hsu, Yi-Shen Chen, Yun-Chih Chen, Yuan-Hao Chang, and Tei-Wei Kuo

Yuan-Hao Chang


The increasing demand for high-quality graphics on mobile devices necessitates a high frame rate for display refresh. However, current process scheduling and memory management policies fail to consider the computation demands of frame rendering because they are optimized for saving energy and resource utilization. This leads to unresponsive displays for mobile users due to rendering delays. Accurately estimating computation demands is challenging for the mobile operating system, particularly under memory pressure, without displayspecific semantics from user space. Moreover, the complexity of frame rendering makes it infeasible to schedule them with real-time policies. To address these issues, we propose a new framework called REFROM that utilizes a history-based frame time estimator to analyze frame time samples from UI threads and predict the computation requirements of upcoming frames. Experimental results demonstrate that REFROM reduces the number of delayed frames by up to 40% and improves up to 4% energy efficiency compared to the existing approaches.

Extreme Event Discovery with Self-Attention for PM2.5 Anomaly Prediction

IEEE Intelligent Systems, To Appear

Hsin-Chih Yang, Ming-Chuan Yang, Guo-Wei~Wong, Meng Chang Chen

Ming-Chuan Yang Meng-Chang Chen


Fine particulate matter (PM2.5) values of a particular location form a time series, whose prediction is challenging due to the complicated interactions between numerous factors from meteorological measurements, terrain conditions, and industry and human habitation activities, and their predictions have attracted considerable attention from the deep learning community. Although the deep learning approach for PM2.5 prediction generally has an acceptable accuracy, it has difficulty in PM2.5 anomaly prediction, while mispredictions prevent the authority from issuing proper instructions to reduce the impact on general health. We use extreme value theory (EVT) to formulate the PM2.5 prediction problem with a self-attention-based neural network implementation. EVT-based loss accounts for the rarity of anomalous data, and self-attention captures global information. Experiments demonstrate that the proposed model obtains an improved performance of 478% in F1 score and 286% in Matthews correlation coefficient (MCC) over the fully connected network, and 229% in F1 and 148% in MCC over the typical Transformer trained with the traditional loss function.

Attention Discriminant Sampling for Point Clouds

International Conference on Computer Vision (ICCV), October 2023

Cheng-Yao Hong, Yu-Ying Chou and Tyng-Luh Liu

Tyng-Luh Liu


This paper describes an attention-driven approach to 3-D point cloud sampling. Our method is established based on a structure-aware attention discriminant analysis that explores geometric and semantic relations embodied among points and their clusters. The proposed {\em attention discriminant sampling} (ADS) starts by efficiently decomposing a given point cloud into clusters to implicitly encode its structural and geometric relatedness among points. By treating each cluster as a structural component, ADS then draws on evaluating two levels of self attention: within-cluster and between-cluster. The former reflects the semantic complexity entailed by the learned features of points within each cluster, while the latter reveals the semantic similarity between clusters. Driven by structurally preserving the point distribution, these two aspects of self attention help avoid sampling redundancy and decide the number of sampled points in each cluster.   Extensive experiments demonstrate that ADS significantly improves classification performance to \textbf{95.1\%} on ModelNet40 and \textbf{87.5\%} on ScanObjectNN and achieves \textbf{86.9\%} mIoU on ShapeNet Part Segmentation. For scene segmentation, ADS yields \textbf{91.1\%}  accuracy on S3DIS with higher mIoU to the state-of-the-art and \textbf{75.6\%} mIoU on ScanNetV2. Furthermore, ADS surpasses the state-of-the-art with \textbf{55.0\%} mAP$_{50}$ on ScanNetV2 object detection. 

Shape-Guided Dual-Memory Learning for 3D Anomaly Detection

40th International Conference on Machine Learning (ICML), July 2023

Yu-Min Chu, Chieh Liu, Ting-I Hsieh, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu


We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection. Our method is established on the effectiveness of two specialized expert models and their synergy to localize anomalous regions from color and shape modalities. The first expert utilizes geometric information to probe 3D structural anomalies by modeling the implicit distance fields around local shapes. The second expert considers the 2D RGB features associated with the first expert to identify color appearance irregularities on the local shapes. We use the two experts to build the dual memory banks from the anomaly-free training samples and perform shape-guided inference to pinpoint the defects in the testing samples. Owing to the per-point 3D representation and the effective fusion scheme of complementary modalities, our method efficiently achieves state-of-the-art performance on the MVTec 3D-AD dataset with better recall and lower false positive rates, as preferred in real applications. 

ABC-Norm Regularization for Fine-Grained and Long-Tailed Image Classification

IEEE Transactions on Image Processing, July 2023

Yen-Chi Hsu, Cheng-Yao Hong, Ming-Sui Lee, Davi Geiger and Tyng-Luh Liu

Tyng-Luh Liu


Image classification for real-world applications often involves complicated data distributions such as fine-grained and long-tailed. To address the two challenging issues simultaneously, we propose a new regularization technique that yields an adversarial loss to strengthen the model learning. Specifically, for each training batch, we construct an {\em adaptive batch prediction} (ABP) matrix and establish its corresponding {\em adaptive batch confusion norm} (ABC-Norm). The ABP matrix is a composition of two parts, including an adaptive component to class-wise encode the imbalanced data distribution, and the other component to batch-wise assess the softmax predictions. The ABC-Norm leads to a norm-based regularization loss, which can be theoretically shown to be an upper bound for an objective function closely related to rank minimization. By coupling with the conventional cross-entropy loss, the ABC-Norm regularization could introduce adaptive classification confusion and thus trigger adversarial learning to improve the effectiveness of model learning. Different from most of state-of-the-art techniques in solving either fine-grained or long-tailed problems, our method is characterized with its simple and efficient design, and most distinctively, provides a unified solution. In the experiments, we compare ABC-Norm with relevant techniques and demonstrate its efficacy on several benchmark datasets, including (CUB-LT, iNaturalist2018); (CUB, CAR, AIR); and (ImageNet-LT), which respectively correspond to the real-world, fine-grained, and long-tailed scenarios. 

A feature extraction free approach for protein interactome inference from co-elution data

Briefings in Bioinformatics, June 2023

Chen, Y.H., Chao, K.H., Wong, J.Y., Liu, C.F., Leu, J.Y. and Tsai, H.K.

Yu-Hsin Chen Jin Yung Wong Chien-Fu Liu Huai-Kuang Tsai


Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic PPI networks. Current methods usually first infer protein-protein interactions (PPIs) based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods suffer from the potential bias of handcrafted features and severely imbalanced data distribution. However, the handcrafted features based on domain knowledge might introduce bias, and current methods also tend to overfit due to the severely imbalanced PPI data. To address these issues, we present a balanced end-to-end learning architecture, SPIFFED, to integrate feature representation from raw CF-MS data and interactome prediction by convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS data. Using the clustering software (i.e. ClusterONE), SPIFFED allows users to infer high confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at:

Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

Qian-Bei Hong, Chung-Hsien Wu, and Hsin-Min Wang

Chung-Hsien Wu Hsin-Min Wang


Speech content is closely related to the stability of speaker embeddings in speaker verification tasks. In this paper, we propose a novel architecture based on self-constraint learning (SCL) and reconstruction task (RT) to remove the influence of phonetic information on speaker embedding generation. First, SCL is used to reduce the divergence of frame-level features, which can avoid ambiguity between the resulting embeddings of the two utterances being compared. Second, RT is used to further remove phonetic information in frame-level layers, focusing on speaker-discriminative feature transformation. In our experiments, the speaker embedding models were trained on the VoxCeleb2 dataset and evaluated on the VoxCeleb1, Librispeech, SITW and VoxMovies datasets. Experimental results on VoxCeleb1 show that the proposed DROP-TDNN system reduced the EER by 7.5%, compared to the state-of-the-art ECAPA-TDNN system. Furthermore, the proposed DROP-TDNN system also outperformed the ECAPA-TDNN system in the experiments on SITW, Librispeech and VoxMovies under cross-dataset conditions. In the experiments on SITW, the proposed system reduced the EER by 3.4% compared to the ECAPA-TDNN system. In the experiments on Librispeech, the proposed system demonstrated the advantage of removing phonetic information under the clean speech condition, with a significant reduction of 25.5% in EER compared to the ECAPA-TDNN system. In the experiments on VoxMovies, the proposed system reduced the EER by up to 7.9% compared to the ECAPA-TDNN system under different pronunciation and background conditions.

Multi-target Extractor and Detector for Unknown-number Speaker Diarization

IEEE Signal Processing Letters, 2023

Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Yu Tsao Hsin-Min Wang


Strong representations of target speakers can help extract important information about speakers and detect corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis regardless of the number of speakers in a conversation. A speaker representation (called z-vector) extractor and a time-speaker contextualizer, implemented by a residual network and processing data in both temporal and speaker dimensions, are integrated into a unified framework. Tests on the CALLHOME corpus show that our model outperforms most of the methods proposed so far. Evaluations in a more challenging case with simultaneous speakers ranging from 2 to 7 show that our model achieves 6.4% to 30.9% relative diarization error rate reductions over several typical baselines.

LLSM: A Lifetime-Aware Wear-Leveling for LSM-Tree on NAND Flash Memory

ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), October 2022

Dharamjeet, Yi-Shen Chen, Tseng-Yi Chen, Yuan-Hung Kuan, and Yuan-Hao Chang

Tseng-Yi Chen Yuan-Hao Chang


The advancement of nonvolatile memory (NVM) technology reduces the cost-per-unit of solid-state drives (SSDs). Flash memory-based SSDs have become ubiquitous because they provide better performance and energy efficiency than hard disk drives. However, it suffers from wear-out problems caused by the out-of-place updates that limit its lifetime. Log-structured merge tree (LSM-tree) is a level-based data structure that is widely used in many database systems because it eliminates the random write operations to the storage devices. By transferring the random write operations into sequential write operations, the write performance of hard disk drives can be improved. However, LSM-tree is not efficient for SSDs because it is not aware of the access characteristics of flash memory. Moreover, the level-based indexing strategy of the LSM-tree significantly shortens the lifetime of SSDs because the data must be frequently updated due to the compaction operations between different levels. In contrast to many previous works that focus on alleviating the write amplification on SSDs for the database systems implemented by LSM-tree, we propose LLSM, a lifetime-aware wear-leveling for LSM-tree on NAND flash memory with open-channel SSD. By considering the data access frequency of the LSM-tree between different levels, LLSM rethinks the block allocation strategy during the compaction to evenly erase all the blocks of SSD storage devices, prolonging the SSD lifetime. Moreover, a proactive swapping strategy is designed to reorganize the data blocks for resolving the potential wear-leveling issues caused by the behaviors of the LSM-tree. The extensive experiments show that the results of lifetime improvement are encouraging.

Designing Network Design Strategies Through Gradient Path Analysis

Journal of Information Science and Engineering, July 2023

C. Y. Wang, H. Y. Mark Liao, and I-Hau Yeh

Chien-Yao Wang Hong-Yuan Mark Liao


Designing a high-efficiency and high-quality expressive network architecture has always been the most important research topic in the field of deep learning. Most of today’s network design strategies focus on how to integrate features extracted from different layers, and how to design computing units to effectively extract these features, thereby enhancing the expressiveness of the network. This paper proposes a new network design strategy, i.e., to design the network architecture based on gradient path analysis. On the whole, most of today’s mainstream network design strategies are based on feed forward path, that is, the network architecture is designed based on the data path. In this paper, we hope to enhance the expressive ability of the trained model by improving the network learning ability. Due to the mechanism driving the network parameter learning is the backward propagation algorithm, we design network design strategies based on back propagation path. We propose the gradient path design strategies for the layer-level, the stage-level, and the network-level, and the design strategies are proved to be superior and feasible from theoretical analysis and experiments. The source code of this work is at:

Differential Hsp90-dependent gene expression is strain-specific and common among yeast strains

iScience, May 2023

Hung, P.H., Liao, C.W., Ko, F.H., Tsai, H.K.* and Leu, J.Y.*

Huai-Kuang Tsai


Enhanced phenotypic diversity increases a population’s likelihood of surviving catastrophic conditions. Hsp90, an essential molecular chaperone and a central network hub in eukaryotes, has been observed to suppress or enhance the effects of genetic variation on phenotypic diversity in response to environmental cues. Because many Hsp90-interacting genes are involved in signaling transduction pathways and transcriptional regulation, we tested how common Hsp90-dependent differential gene expression is in natural populations.Many genes exhibited Hsp90-dependent strain-specific differential expression in five diverse yeast strains. We further identified transcription factors (TFs) potentially contributing to variable expression. We found that on Hsp90 inhibition or environmental stress, activities or abundances of Hsp90-dependent TFs varied among strains, resulting in differential strain-specific expression of their target genes, which consequently led to phenotypic diversity. We provide evidence that individual strains can readily display specific Hsp90-dependent gene expression, suggesting that the evolutionary impacts of Hsp90 are widespread in nature.

Short human eccDNAs are predictable from sequences

Briefings in Bioinformatics, April 2023

Chang, K.L., Chen, J.H., Lin, T.C., Leu, J.Y., Kao, C.F., Wong, J.Y.*, Tsai, H.K.*

Kai-Li Chang Tzu-Chi Lin Jin Yung Wong Huai-Kuang Tsai


Ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins in the genome lacking apparent specificity led some studies to conclude their formation as random or near-random. Despite this, the search for specific formation of short eccDNA continues with a recent surge of interest in biomarker development.
To shed new light on the conflicting views on short eccDNAs’ randomness, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-datasets predictions (accuracy: convolution-based models: 79.65±4.7%, attention-based models: 83.31±4.18%).
The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be re-assessed by deep learning models to uncover unexpected similarity.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023

C. Y. Wang, Alexey Bochkovskiy, H. Y. Mark Liao

Chien-Yao Wang Hong-Yuan Mark Liao


YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 120 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (54 FPS V100, 55.9% AP) outperforms both transformer-based detector SWINL Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 587% in speed and 2% in accuracy, and convolutionalbased detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 628% in speed and 0.7% AP in accuracy, as well as YOLOv7 outperforms: YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights.

You Only Learn One Representation: Unified Network for Multiple Tasks

Journal of Information Science and Engineering, May 2023

C. Y. Wang, H. Y. Mark Liao, and I-Hau Yeh

Chien-Yao Wang Hong-Yuan Mark Liao


People “understand” the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously will be encoded and stored in the brain. Using these abundant experience, as a huge database, human beings can effectively process data, even they were unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just like the human brain can learn knowledge from normal learning as well as subconsciousness learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, and it shows great capability on catching the physical meaning of different tasks. The source code of this work is at :

Intelligent De Novo Design of Novel Antimicrobial Peptides Against Antibiotic-Resistant Bacteria Strains

International Journal of Molecular Sciences, April 2023

Tzu-Tang Lin, Li-Yen Yang, Chung-Yen Lin, Ching-Tien Wang, Chia-Wen Lai, Chi-Fong Ko, Yang-Hsin Shih, Shu-Hwa Chen

Tzu-Tang Lin Li-Yen Yang Chung-Yen Lin Shu-Hwa Chen


Because of the growing number of clinical antibiotic resistance cases in recent years, novel anti-microbial peptides (AMPs) may be ideal for next-generation antibiotics. This study trained a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) based on known AMPs to generate novel AMP candidates. The quality of the GAN-designed peptides was evalu-ated in silico, and eight of them, named GAN-pep 1–8, were selected by AMP Artificial Intelligence (A.I.) classifier and synthesized for further experiments. Disc diffusion testing and minimum in-hibitory concentration (MIC) determinations were used to identify the antibacterial effects of the synthesized GAN-designed peptides. Seven of the eight synthesized GAN-designed peptides displayed antibacterial activity. Additionally, GAN-pep 3 and GAN-pep 8 presented a broad spectrum of antibacterial effects and effectively against antibiotic-resistant bacteria strains, such as methicillin-resistant Staphylococcus aureus and carbapenem-resistant Pseudomonas aeruginosa. GAN-pep 3, the most promising GAN-designed peptide candidate, had low MICs against all the tested bacteria. In brief, our approach shows an efficient way to discover effective AMPs against general and antibiotic-resistant bacteria strains. And such a strategy also allows other novel func-tional peptides to be quickly designed, identified and synthesized for validation on the wet bench.

Identification of Sexually Dimorphic Genes in Pectoral Fin as Molecular Markers for Assessing the Sex of Japanese Silver Eels (Anguilla japonica)

Zoological Studies, March 2023

Hsiang-Yi Hsu, Chia-Hsien Chuang, I-Hsuan Lu, Chung-Yen Lin, and Yu-San Han

Chia-Hsien Chuang I-HSUAN Lu Chung-Yen Lin


The Japanese eel (Anguilla japonica) is an important species in East Asian aquaculture. However, the production of seedlings for this purpose still depends on natural resources, as the commercial production of glass eels is not yet possible. Confusion about the sex of silver eels is one of the factors affecting the success rate of artificial maturation. This study sought to devise a harmless method to precisely assess the sex of silver eels. Partial pectoral fins were collected from females and males and the total RNA was extracted for transcriptomic analysis to identify sexually dimorphic genes as molecular markers for sex typing. An online database was constructed to integrate the annotations of transcripts and perform comparative transcriptome analysis. This analysis identified a total of 29 candidate sexually dimorphic genes. Ten were selected for a real-time quantitative polymerase chain reaction (RT-qPCR) to validate the transcriptomic data and evaluate their feasibility as markers. The transcriptomic analysis and RT-qPCR data implicated three potential markers (LOC111853410, kera, and dcn) in sex typing. The expression of LOC111853410 was higher in females than in males. In contrast, the expression of kera and dcn was higher in males than in females. The ΔCT values of three markers were analyzed to determine their inferred thresholds, which can be used to determine the sex of Japanese eels. The results suggested that if a silver eel had a pectoral fin with the pectoral fin having the ΔCT of LOC111853410 < 11.3, the ΔCT of kera > 11.4, or the ΔCT of dcn > 6.5 can be assessed it could be assessed as female. Males could be assessed by the ΔCT of LOC111853410 > 11.3, the ΔCT of kera < 11.4, or the ΔCT of dcn < 6.5 in their pectoral fins. The molecular functions of these markers and the biological significance of their differential expression require further exploration.

Exploring Synchronous Page Fault Handling

ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), October 2022

Yin-Chiuan Chen, Chun-Feng Wu, Yuan-Hao Chang, and Tei-Wei Kuo

Chun-Feng Wu Yuan-Hao Chang


The advance of nonvolatile memory in storage technology has presented challenges in redefining the ways in handling the main memory and the storage. This work is motivated by the strong demands in effective handling of page faults over ultralow-latency storage devices. In particular, we propose synchronous and asynchronous prefetching strategies to satisfy process executions with different memory demands in supporting of synchronous page fault handling. An adaptive CPU scheduling strategy is also proposed to cope with the needs of processes in maintaining their working sets in the main memory. Six representative benchmarks and applications were evaluated. It was shown that our strategy can effectively save 12.33% of the total execution time and reduce 13.33% of page faults, compared to the conventional demand paging strategy with nearly no sacrificing of process fairness.

GraphRC: Accelerating Graph Processing on Dual-addressing Memory with Vertex Merging

ACM/IEEE International Conference on Computer-Aided Design (ICCAD), October 2022

Wei Cheng, Chun-Feng Wu, Yuan-Hao Chang, and Ing-Chao Lin

Chun-Feng Wu Yuan-Hao Chang


Architectural innovation in graph accelerators attracts research attention due to foreseeable inflation in data sizes and the irregular memory access pattern of graph algorithms. Conventional graph accelerators ignore the potential of Non-Volatile Memory (NVM) crossbar as a dual-addressing memory and treat it as a traditional single-addressing memory with higher density and better energy efficiency. In this work, we present GraphRC, a graph accelerator that leverages the power of dual-addressing memory by mapping in-edge/out-edge requests to column/row-oriented memory accesses. Although the capability of dual-addressing memory greatly improves the performance of graph processing, some memory accesses still suffer from low-utilization issues. Therefore, we propose a vertex merging (VM) method that improves cache block utilization rate by merging memory requests from consecutive vertices. VM reduces the execution time of all 6 graph algorithms on all 4 datasets by 24.24% on average. We then identify the data dependency inherent in a graph limits the usage of VM, and its effectiveness is bounded by the percentage of mergeable vertices. To overcome this limitation, we propose an aggressive vertex merging (AVM) method that outperforms VM by ignoring the data dependency inherent in a graph. AVM significantly reduces the execution time of ranking-based algorithms on all 4 datasets while preserving the correct ranking of the top 20 vertices.