Institute of Information Science, Academia Sinica

Research

Recent Research Results

Interpretation of Machine Learning-Based Prediction Models and Functional Metagenomic Approach to Identify Critical Genes in HBCD Degradation

Journal of Hazardous Materials, March 2025

Yu-Jie Lin, Ping-Heng Hsieh, Chun-Chia Mao, Yang-Hsin Shih, Shu-Hwa Chen, Chung-Yen Lin

Abstract

Hexabromocyclododecane (HBCD) poses significant environmental risks, and identifying HBCD-degrading microbes and their enzymatic mechanisms is challenging due to the complexity of microbial interactions and metabolic pathways. This study aimed to identify critical genes involved in HBCD biodegradation through two approaches: functional annotation of metagenomes and the interpretation of machine learning-based prediction models. Our functional analysis revealed a rich metabolic potential in Chiang Chun soil (CCS) metagenomes, particularly in carbohydrate metabolism. Among the machine learning algorithms tested, random forest models outperformed others, especially when trained on datasets reflecting the degradation patterns of species like Dehalococcoides mccartyi and Pseudomonas aeruginosa. These models highlighted enzymes such as EC 1.8.3.2 (thiol oxidase) and EC 4.1.1.43 (phenylpyruvate decarboxylase) as inhibitors of degradation, while EC 2.7.1.83 (pseudouridine kinase) was linked to enhanced degradation. This dual-methodology approach not only deepens our understanding of microbial functions in HBCD degradation but also provides an unbiased view of the microbial and enzymatic interactions involved, offering a more targeted and effective bioremediation strategy.
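
To make the model-interpretation step concrete, here is a minimal sketch of training a random forest on enzyme-level features and ranking them by importance. The feature matrix, labels, and the use of impurity-based importances are illustrative assumptions, not the study's exact pipeline; the EC numbers are taken from the abstract.

```python
# Minimal sketch: interpret a random forest trained on enzyme (EC-number)
# abundance features to flag candidates linked to HBCD degradation.
# The feature matrix X and labels y are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
ec_numbers = ["EC 1.8.3.2", "EC 4.1.1.43", "EC 2.7.1.83"]  # candidates from the study
X = rng.random((60, len(ec_numbers)))  # per-sample EC abundances (placeholder)
y = rng.random(60)                     # observed degradation rates (placeholder)

model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Rank enzymes by impurity-based importance. The direction of the effect
# (enhancer vs. inhibitor) would come from a partial-dependence or SHAP
# analysis in practice, not from the importance score alone.
for ec, imp in sorted(zip(ec_numbers, model.feature_importances_),
                      key=lambda t: -t[1]):
    print(f"{ec}: importance {imp:.3f}")
```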

Exploring Conversational Adaptability: Assessing the Proficiency of Large Language Models in Dynamic Alignment with Updated User Intent

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2025), February 2025

Yu-Chuan Chen, Hen-Hsen Huang

Abstract

This paper presents a practical problem in dialogue systems: the capability to adapt to changing user intentions and resolve inconsistencies in conversation histories. It is crucial in scenarios like train ticket booking, where travel plans often change dynamically. Notwithstanding the advancements in NLP and large language models (LLMs), these systems struggle with real-time information updates during conversations. We introduce a specialized dataset to evaluate LLM-based chatbots on such conversational adaptability by asking a broad range of open-domain questions, focusing on scenarios where users modify their requests mid-conversation. Additionally, as LLMs are susceptible to generating superfluous sentences, we propose a novel, Chain-of-Thought-free evaluation framework to distill the user intention from their responses. Through extensive investigations on four LLMs, we observe that these contemporary LLMs are not well-aligned with the latest user intent in long-term conversations; they often fail to capture the nuances of natural conversations in a zero-shot setting. Interestingly, the results demonstrate that GPT-4, widely recognized as having the most advanced reasoning capabilities to date, is bested by GPT-3.5 in this task. This work aims to improve the practicality of LLM-based chatbots, bridging the gap between the current capabilities of dialogue systems and the fluidity of human interactions.
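
The core evaluation question can be illustrated with a toy check: after the user revises a request mid-conversation, does the model's reply reflect the latest intent or the stale one? The sketch below uses simple keyword matching as a stand-in for the paper's Chain-of-Thought-free intent distillation, and `ask_llm` is a hypothetical placeholder for any chat-completion API.

```python
# Toy check: does the model's reply track the *latest* user intent after a
# mid-conversation change? Keyword matching here is only a stand-in for the
# paper's intent-distillation framework.
from typing import Callable, Dict, List

def aligned_with_latest_intent(ask_llm: Callable[[List[Dict[str, str]]], str],
                               history: List[Dict[str, str]],
                               latest_keyword: str,
                               stale_keyword: str) -> bool:
    reply = ask_llm(history).lower()
    return latest_keyword in reply and stale_keyword not in reply

history = [
    {"role": "user", "content": "Book a train ticket to Taipei on Friday."},
    {"role": "assistant", "content": "Sure, one ticket to Taipei on Friday."},
    {"role": "user", "content": "Actually, change that to Kaohsiung on Saturday."},
]
# Usage (with any chat API wrapped as `my_llm`):
# aligned_with_latest_intent(my_llm, history, "kaohsiung", "taipei")
```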

Multimodal Promptable Token Merging for Diffusion Models

The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), February 2025

Cheng-Yao Hong and Tyng-Luh Liu

SLIP: Spoof-aware One-class Face Anti-Spoofing with Language Image Pretraining

The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), February 2025

Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang, Tzu-Hsien Chen, Tyng-Luh Liu and Chiou-Ting Hsu

Tracking Everything Everywhere across Multiple Cameras

The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), February 2025

Li-Heng Wang, YuJu Cheng and Tyng-Luh Liu

Optimizing Compute Core Assignment for Dynamic Batch Inference in AI Inference Accelerator

ACM Symposium on Applied Computing (SAC), March 2025

Ze-Wei Liou and Ding-Yong Hong

Abstract

Modern AI inference accelerators offer high-performance and power-efficient computation for machine learning models. Most accelerators employ static inference to enhance performance, which requires models to be compiled with predetermined input batch sizes and intermediate tensor shapes. However, static inference can lead to program failures or inefficient execution when processing batched data of varying sizes, a scenario known as dynamic batch inference. This work addresses the challenge by focusing on emerging multicore AI inference accelerators that offer flexible compute core assignment. We propose to dynamically partition the input batch data into smaller batches and create multiple model instances to process each partition in parallel. The challenge lies in determining the optimal number of model instances, the proper batch size for each instance, and the assignment of compute cores among the instances, so as to minimize the inference time. To solve the problem, we construct an accurate profiling-based cost model and devise a dynamic programming algorithm to determine the best configuration. Experimental results indicate that our method achieves 3.05× higher throughput on average on multi-person pose estimation benchmarks, compared to an EdgeTPU-like inference strategy.
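
The dynamic program described above can be sketched as follows: split a batch of size B across model instances that run in parallel on C cores, minimizing the latency of the slowest instance. The synthetic `cost` function is a placeholder assumption; in the paper the cost model is built from profiling measurements.

```python
# Sketch of the batch-partition / core-assignment DP. Instances run in
# parallel, so total latency is the maximum over instances.
from functools import lru_cache

B, C = 8, 4  # total batch size and total compute cores (illustrative)

def cost(b: int, c: int) -> float:
    # Placeholder latency: larger batches cost more; extra cores help
    # sublinearly. A real cost model is profiled, not computed.
    return b / (c ** 0.7)

@lru_cache(maxsize=None)
def best(b: int, c: int) -> float:
    if b == 0:
        return 0.0            # nothing left to schedule
    if c == 0:
        return float("inf")   # leftover work but no cores
    t = float("inf")
    for b1 in range(1, b + 1):      # batch size of the next instance
        for c1 in range(1, c + 1):  # cores assigned to it
            t = min(t, max(cost(b1, c1), best(b - b1, c - c1)))
    return t

print(f"optimal latency for B={B}, C={C}: {best(B, C):.3f}")
```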

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

The Journal of the Acoustical Society of America, November 2024

Hsin-Tien Chiang, Szu-Wei Fu, Hsin-Min Wang, Yu Tsao, and John H. L. Hansen

Abstract

Because a reference signal is often unavailable in real-world scenarios, reference-free speech quality and intelligibility assessment models are important for many speech processing applications. Although many deep-learning models have been applied to build non-intrusive speech assessment approaches with promising performance, studies focusing on hearing-impaired (HI) subjects remain limited. This paper presents HASA-Net+, a multi-objective non-intrusive hearing-aid speech assessment model, building upon our previous work, HASA-Net. HASA-Net+ improves on HASA-Net in several ways: (1) inclusivity for both normal-hearing and HI listeners, (2) integration with pre-trained speech foundation models and fine-tuning techniques, (3) expansion of predictive capabilities to cover speech quality and intelligibility in diverse conditions, including noisy, denoised, reverberant, dereverberated, and vocoded speech, thereby evaluating its robustness, and (4) validation of its generalization capability using an out-of-domain dataset.
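
As a rough illustration of a multi-objective non-intrusive head of this kind, the sketch below pools frame-level embeddings from a pre-trained speech foundation model, conditions on a listener hearing-loss vector (e.g., an audiogram), and emits separate quality and intelligibility scores. All dimensions and the pooling scheme are illustrative assumptions, not the HASA-Net+ architecture itself.

```python
# Minimal sketch of a multi-objective, reference-free assessment head.
import torch
import torch.nn as nn

class AssessmentHead(nn.Module):
    def __init__(self, feat_dim=768, audiogram_dim=6, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + audiogram_dim, hidden,
                           batch_first=True, bidirectional=True)
        self.quality = nn.Linear(2 * hidden, 1)          # quality score head
        self.intelligibility = nn.Linear(2 * hidden, 1)  # intelligibility head

    def forward(self, feats, audiogram):
        # feats: (batch, frames, feat_dim); audiogram: (batch, audiogram_dim)
        a = audiogram.unsqueeze(1).expand(-1, feats.size(1), -1)
        h, _ = self.rnn(torch.cat([feats, a], dim=-1))
        pooled = h.mean(dim=1)  # simple temporal pooling over frames
        return self.quality(pooled), self.intelligibility(pooled)

q, i = AssessmentHead()(torch.randn(2, 100, 768), torch.randn(2, 6))
```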

Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning

BMC Genomics, September 2024

Lin, C.H., Tsai, C.H., Shiau, C.K., Huang, J.H. and Tsai, H.K.*

Abstract

Background: Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulations around exon/intron regions are known, the relationship between promoter-bound transcription factors and downstream alternative splicing remains largely unexplored.

Results: In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and splicing patterns. We curated a dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns from TFBS occupancies, careful examination using different cross-validation methods revealed limitations in generalizing predictions to the splicing forms of singleton genes across diverse tissues. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that convolutional neural network (CNN) models, trained on TF binding changes in the promoters, can predict changes in splicing patterns. Furthermore, a systematic in silico substitution analysis on the CNN models highlighted several potential splicing regulators. Notably, through empirical validation using K562 CTCFL shRNA knock-down data, we showed the significant role of CTCFL in splicing regulation.

Conclusion: Our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulators.
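
The in silico substitution idea can be sketched as a perturbation loop: knock out one TF's binding sites at a time in the promoter representation and measure the shift in the model's predicted splicing efficiency. The encoding, `model_predict` callable, and knock-out scheme below are illustrative assumptions, not the paper's exact analysis.

```python
# Sketch of in silico substitution analysis over a trained CNN predictor.
import numpy as np

def substitution_effects(model_predict, tfbs_matrix: np.ndarray) -> np.ndarray:
    """tfbs_matrix: (n_tfs, promoter_bins) binary TFBS occupancy.
    Returns the per-TF change in predicted splicing efficiency."""
    baseline = model_predict(tfbs_matrix)
    deltas = np.zeros(tfbs_matrix.shape[0])
    for tf in range(tfbs_matrix.shape[0]):
        perturbed = tfbs_matrix.copy()
        perturbed[tf, :] = 0  # remove all binding sites of this TF
        deltas[tf] = model_predict(perturbed) - baseline
    # A large |delta| flags a candidate splicing regulator (e.g., CTCFL).
    return deltas
```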

Speculative Monte-Carlo Tree Search

Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024

Scott Cheng, Mahmut Kandemir, Ding-Yong Hong

Abstract

Monte-Carlo tree search (MCTS) is an influential sequential decision-making algorithm notably employed in AlphaZero. Despite its success, the primary challenge in AlphaZero training lies in its prolonged time-to-solution due to the high latency imposed by the sequential MCTS process. To address this challenge, this paper proposes and evaluates an inter-decision parallelization strategy called speculative MCTS, a new type of parallelism in AlphaZero which implements speculative execution. This approach allows for the parallel execution of future moves before the current MCTS computations are completed, thus reducing the latency. Additionally, we analyze factors contributing to the overall speedup by studying the synergistic effects of speculation and neural network caching in MCTS. We also provide an analytical model that can be used to evaluate the potential of different speculation strategies before they are implemented and deployed. Our empirical findings indicate that the proposed speculative MCTS can reduce training latency by 5.81x in 9x9 Go games. Moreover, our study shows that speculative execution can enhance the NN cache hit rate by 26% during midgame. Overall, our end-to-end evaluation indicates 1.91x speedup in 19x19 Go training time, compared to the state-of-the-art KataGo program.
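
A toy sketch of the speculation loop: while the search for the current position runs, speculatively launch searches for the most likely next positions; on a hit, the next result is already available, and on a miss, the speculative work is discarded. The `search` and `predict_next` callables and the tuple-based position encoding are placeholder assumptions, not the paper's implementation.

```python
# Toy speculative-MCTS step. `search(position)` returns the chosen move;
# `predict_next(position)` returns likely next positions (each encoded as
# the current position tuple extended by a move).
from concurrent.futures import ThreadPoolExecutor

def speculative_step(search, predict_next, position):
    with ThreadPoolExecutor() as pool:
        current = pool.submit(search, position)             # the real search
        spec = {g: pool.submit(search, g)                   # speculative searches,
                for g in predict_next(position)}            # started in parallel
        move = current.result()
        nxt = position + (move,)
        if nxt in spec:
            return move, spec[nxt].result()  # hit: speculative result is ready
        return move, search(nxt)             # miss: fall back to a fresh search
```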

Effective Compression of Language Models by Combining Pruning and Knowledge Distillation

IEEE International Conference on Computers, Software, and Applications (COMPSAC), April 2024

Chi-Yu Chiu, Ding-Yong Hong, Pangfeng Liu and Jan-Jan Wu

Abstract

Weight pruning is a prominent model compression technique that removes some weights from a model. However, pruning transformer models faces a challenge: after pruning, the models require repeating the whole training process, including pre-training on a large generalized dataset and fine-tuning on a small downstream dataset, to recover the accuracy. This retraining takes a long time and substantial computational resources. To address the challenge, we propose a pruning method combined with knowledge distillation that avoids a long re-training time while recovering the accuracy. We use 2:4 pruning as our basic pruning method. 2:4 pruning, proposed by NVIDIA, keeps the two largest absolute values in every four consecutive elements of each row of a weight matrix. We generalize 2:4 pruning to N:M pruning, which keeps the N largest absolute values in every M consecutive elements of each row of a weight matrix. Knowledge distillation is another model compression method in which a small model (the student) learns from a large model (the teacher). With our method, we use N:M pruning to uniformly prune the model into an N:M structure. Next, we apply two-stage fine-tuning on the downstream dataset with knowledge distillation. Using our method, pruned models can achieve comparable accuracy with only downstream datasets, taking much less time than traditional retraining. We run our experiments on the SQuAD and GLUE datasets using DistilBERT. The experimental results show that DistilBERT in a 1:4 structure achieves comparable accuracy on the SQuAD v1.1 and SQuAD v2.0 datasets and a 1.7x inference speedup compared to the original dense model.
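
The N:M pattern described above is simple to state in code: in every M consecutive elements of each row, keep the N entries with the largest absolute value and zero the rest. The sketch below is a NumPy illustration of the pattern only, not the paper's training or distillation pipeline.

```python
# N:M structured pruning; n=2, m=4 reproduces NVIDIA's 2:4 pattern.
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    rows, cols = weights.shape
    assert cols % m == 0, "row length must be a multiple of M"
    groups = weights.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

w = np.random.randn(4, 8)
print(nm_prune(w))  # each row keeps 2 of every 4 consecutive weights
```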

A Study on Incorporating Whisper for Robust Speech Assessment

IEEE International Conference on Multimedia and Expo (ICME), July 2024

Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, and Chiou-Shann Fuh

Abstract

This research introduces an enhanced version of the multi-objective speech assessment model, MOSA-Net+, by leveraging acoustic features from Whisper, a large-scale weakly supervised model. We first investigate the effectiveness of Whisper in deploying a more robust speech assessment model. After that, we explore combining representations from Whisper and SSL models. The experimental results reveal that Whisper's embedding features contribute to the more accurate predictions achieved by MOSA-Net+, whereas combining the embedding features from Whisper and SSL models leads to only marginal improvement. Compared to intrusive methods, MOSA-Net, and other SSL-based speech assessment models, MOSA-Net+ yields notable improvements in estimating subjective quality and intelligibility scores across all evaluation metrics on the Taiwan Mandarin Hearing in Noise Test - Quality & Intelligibility (TMHINT-QI) dataset. To further validate its robustness, MOSA-Net+ was tested in the noisy-and-enhanced track of the VoiceMOS Challenge 2023, where it achieved the top-ranked performance among nine systems.
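
The feature-extraction step can be sketched with the openai-whisper package: run the encoder on a log-mel spectrogram and pool the frame embeddings before a downstream assessment head. The file name and mean pooling are illustrative assumptions; this is not the MOSA-Net+ architecture itself.

```python
# Sketch: Whisper's encoder as an acoustic feature extractor.
import torch
import whisper

model = whisper.load_model("base")
audio = whisper.pad_or_trim(whisper.load_audio("utterance.wav"))  # hypothetical file
mel = whisper.log_mel_spectrogram(audio).unsqueeze(0)  # (1, n_mels, frames)

with torch.no_grad():
    feats = model.encoder(mel)   # (1, frames, d_model) encoder embeddings

utterance_vec = feats.mean(dim=1)  # simple pooling before an assessment head
```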