Institute of Information Science, Academia Sinica

Recent Research Results

DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition

IEEE Transactions on Information Forensics and Security, April 2026

Hanrui Wang, Shuo Wang, Chun-Shien Lu, and Isao Echizen

Abstract

Face recognition poses serious privacy risks due to its reliance on sensitive and immutable biometric data. While modern systems mitigate privacy risks by mapping facial images to embeddings (commonly regarded as privacy-preserving), model inversion attacks reveal that identity information can still be recovered, exposing critical vulnerabilities. However, existing attacks are often computationally expensive and lack generalization, especially those requiring target-specific training. Even training-free approaches suffer from limited identity controllability, hindering faithful reconstruction of nuanced or unseen identities. In this work, we propose DiffUMI, the first diffusion-driven, training-free model inversion attack. DiffUMI introduces a novel pipeline combining robust latent code initialization, a ranked adversarial refinement strategy, and a statistically grounded, confidence-aware optimization objective. DiffUMI applies directly to unseen target identities and face recognition models, offering greater adaptability than training-dependent approaches while significantly reducing computational overhead. Our method achieves 84.42%–92.87% attack success rates against inversion-resilient systems and outperforms the best prior training-free GAN-based approach by 4.01%–9.82%. The implementation is available at https://github.com/azrealwang/DiffUMI.

Harnessing Sequence Embedding and Ensemble Learning to Identify Antifungal Peptides with Low Hemolytic Risk

ACS OMEGA, April 2026

Chung-Yen Lin, Wen-Chih Cheng, U-Lin Chen, Tzu-Tang Lin, Li-Hang Hsu, Yang-Hsin Shih, I-Hsuan Lu, Ying-Lien Chen, Shu-Hwa Chen

Abstract

The increasing prevalence of fungal infections represents a growing threat to human health, driven in part by the misuse of antibiotics and the rising incidence of resistance to conventional antifungal agents. Antifungal peptides (AFPs) have emerged as promising alternatives due to their diverse mechanisms of action and their relatively low propensity to develop resistance. To facilitate the systematic discovery of AFPs, we developed AI4AFP. This computational framework integrates curated antifungal peptide resources with advanced machine learning approaches to predict antifungal potential directly from peptide sequences.

Using a comprehensive dataset, we constructed a seven-model ensemble that combines multiple sequence encoding strategies, including ProtBERT-BFD, PC6, and Doc2Vec, with diverse learning algorithms, including random forests, support vector machines, convolutional neural networks, and fine-tuned BERT models. This ensemble demonstrated robust performance on an independent test set, achieving 0.94 in accuracy and 0.89 in Matthews correlation coefficient, outperforming existing AFP prediction methods. Importantly, the predicted AFP score is intended to reflect the general antifungal potential rather than species-specific potency.

Experimental validation against representative fungal pathogens, including Candida albicans, Candida glabrata, and Cryptococcus neoformans, revealed that peptides with high predicted AFP scores exhibited context-dependent antifungal activity. Several candidates displayed pronounced inhibitory effects against specific species, despite limited activity against others, highlighting the inherent species-dependence of antifungal efficacy and supporting the role of AI4AFP as a prioritization tool rather than a species-specific predictor.

To complement antifungal prediction, we further developed a hemolysis classifier that incorporates both peptide sequence and applied concentration as continuous inputs, enabling explicit modeling of the dose-dependent nature of hemolytic toxicity. Experimental determination of the minimum concentration inducing 10% hemolysis (MHC₁₀) provided an empirical safety reference, allowing antifungal activity to be interpreted alongside concentration-dependent toxicity. All models and validation results are implemented on a user-friendly web server, AI4AFP (https://axp.iis.sinica.edu.tw/AI4AFP), providing an accessible platform for the discovery and prioritization of antifungal peptides, with consideration of both efficacy and safety.

Overcoming Copyright Barriers in Corpus Distribution Through Non-Reversible Hashing

The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Main Conference, July 2026

Arthur Amalvy, Vincent Labatut, Xavier Bost, and Hen-Hsen Huang

Abstract

While annotated corpora are crucial in the field of natural language processing (NLP), those containing copyrighted material are difficult to exchange among researchers. Yet, such corpora are necessary to fully represent the diversity of data found in the wild in the context of NLP tasks. We tackle this issue by proposing a method to lawfully and publicly share the annotations of copyrighted literary texts. The corpus creator shares the annotations in the clear, along with a non-reversible hashed version of the source material. The corpus user must own the source material, and apply the same hash function to their own tokens, in order to match them to the shared annotations. Crucially, our method is robust to reasonable divergences in the version of the copyrighted data owned by the user. As an illustration, we present alignment experiments on different editions of novels. Our results show that our method is able to correctly align 98.7% to 99.79% of tokens depending on the novel, provided the user version is sufficiently close to the corpus creator's version. We publicly release novelshare, a Python implementation of our method.
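As a rough illustration of the hash-and-match scheme the abstract describes (the salt parameter, 16-character digest, and exact-match lookup are simplifying assumptions here, not novelshare's actual interface):

```python
import hashlib

def hash_token(token, salt="demo"):
    # Non-reversible hash of a token; "salt" is a hypothetical parameter.
    return hashlib.sha256((salt + token).encode("utf-8")).hexdigest()[:16]

def share_annotations(tokens, annotations):
    # Corpus creator: release (hash, annotation) pairs, never the raw text.
    return [(hash_token(t), a) for t, a in zip(tokens, annotations)]

def align(user_tokens, shared):
    # Corpus user: hash their own edition and look up the shared annotations.
    table = dict(shared)
    return [table.get(hash_token(t)) for t in user_tokens]

creator_tokens = ["It", "was", "the", "best", "of", "times"]
annotations    = ["O", "O", "O", "ADJ", "O", "NOUN"]
shared = share_annotations(creator_tokens, annotations)

# A user edition that diverges at one position loses only that annotation.
user_tokens = ["It", "was", "the", "best", "of", "Times"]
print(align(user_tokens, shared))  # → ['O', 'O', 'O', 'ADJ', 'O', None]
```

Unmatched tokens come back as `None`, which is where the paper's robustness to divergent editions (beyond exact matching) becomes necessary.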

Rethinking Forgery Attacks on Semantic Watermarks in Black-Box Settings: A Geometric Distortion Perspective

Forty-third International Conference on Machine Learning (ICML), July 2026

Cheng-Yi Lee, Yichi Zhang, Yuchen Yang, Chun-Shien Lu, and Jun-Cheng Chen

Abstract

Recent studies have shown that semantic watermarks, which embed information into the initial noise of latent diffusion models (LDMs), are vulnerable to black-box forgery attacks. However, existing methods primarily rely on empirical evidence and lack a rigorous theoretical understanding of the conditions under which such attacks succeed or fail. To bridge this gap, we rethink the nature of such attacks through the lens of rate-distortion in the latent space. Our analysis identifies an irreducible distortion floor due to structural mismatches between proxy and target models, which fundamentally limits the fidelity of forged watermarks. We further characterize this distortion as structured geometric deviations on the latent manifold, in the form of global drift and local deformation rather than stochastic noise. Leveraging these insights, we propose a scheme-agnostic detection method that distinguishes forged samples before watermark verification. Extensive experiments demonstrate the effectiveness of our method across diverse black-box scenarios, while preserving robustness to common distortions.

Submodular Optimization for Minimal Augmentation in Robust Language Model Alignment

Forty-third International Conference on Machine Learning (ICML), July 2026

Ching-Chia Kao, Chia-Mu Yu, Chun-Shien Lu, and Chu-Song Chen

Abstract

Safety alignment of large language models is fragile: even small fine-tuning perturbations elastically revert behaviors toward those of the pretraining, with degradation inversely proportional to the size of the alignment set. We ask how to achieve safety alignment with minimal augmentation. To this end, we model augmentation as a set of group actions on sequences and formalize robustness gains as a normalized, monotone submodular function over transformations. We then leverage submodular optimization to select minimal augmentations that provably improve robustness. Experiments confirm that our approach efficiently restores safety alignment while minimizing the overhead of augmentation.
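The selection step the abstract describes (maximizing a normalized, monotone submodular function over candidate transformations) can be sketched with classic greedy maximization; the augmentation names, covered-behavior sets, and coverage objective below are all illustrative, not drawn from the paper:

```python
def coverage_gain(covered, candidate_cover):
    # Marginal gain of adding a candidate's coverage to the current set.
    return len(candidate_cover - covered)

def greedy_select(candidates, budget):
    """Greedy maximization of a monotone submodular coverage function.

    For such functions, greedy selection enjoys the classic (1 - 1/e)
    approximation guarantee, which motivates its use for picking a
    minimal set of augmentations.
    """
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(candidates, key=lambda c: coverage_gain(covered, candidates[c]))
        if coverage_gain(covered, candidates[best]) == 0:
            break  # no remaining augmentation adds anything new
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Hypothetical augmentations and the failure modes each one covers.
augmentations = {
    "paraphrase": {"jailbreak_a", "jailbreak_b"},
    "token_swap": {"jailbreak_b"},
    "roleplay":   {"jailbreak_c", "jailbreak_a"},
}
chosen, covered = greedy_select(augmentations, budget=2)
print(chosen)  # → ['paraphrase', 'roleplay']
```

Note how `token_swap` is never chosen: its coverage is subsumed, which is exactly the kind of redundancy a submodular objective prunes away.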

Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors, and Perceptual Insights

IEEE Computational Intelligence Magazine, May 2026

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, and Hsin-Min Wang

Abstract

Deep learning has been successfully applied in various fields, and its impact on deepfake detection is no exception. Deepfakes are fake, yet realistic synthetic content that can be used deceitfully for political impersonation, phishing, slander, or the spread of misinformation. Despite extensive research on unimodal deepfake detection, the identification of complex deepfakes through joint analysis of audio and visual streams remains relatively unexplored. To fill this gap, this survey first provides an overview of audiovisual deepfake generation techniques, applications, and their consequences, and then provides a comprehensive review of state-of-the-art methods that combine audio and visual modalities to increase detection accuracy, summarizing and critically analyzing their strengths and limitations. Furthermore, we discuss existing open source datasets for a deeper understanding, which can contribute to the research community and provide necessary information for beginners who want to analyze deep learning-based audiovisual methods for video forensics. By bridging the gap between unimodal and multimodal approaches, this paper aims to improve the effectiveness of deepfake detection strategies and guide future research on cybersecurity and media integrity.

Telomere-to-Telomere, Haplotype-Resolved Chromosome-Level Genome Assembly and Annotation of Taiwan Hard Clam (Meretrix taiwanica)

Scientific Data, May 2026

Ching-Huei Huang, Po-Cheng Hsu, San-Tzu Hsieh, Fu-Shen Tseng, Chung-Yen Lin

Abstract

Taiwan Hard Clam (Meretrix taiwanica) is an economically important aquaculture species in Taiwan, yet genomic resources for this species have remained fragmented. We present a telomere-to-telomere (T2T), haplotype-resolved, chromosome-level genome assembly for M. taiwanica, generated using PacBio HiFi long reads and Hi-C sequencing. The two haploid assemblies (hap1 and hap2) span 1,006.48 Mb and 1,007.28 Mb, comprising 126 and 66 sequences, respectively, and each containing 19 chromosomes. Hap1 and hap2 exhibit sequence N50 values of 53.87 Mb and 51.57 Mb, with average scaffold lengths of 7.99 Mb and 15.26 Mb, and contain 0.0176% and 0.1313% ambiguous bases. Comparative analyses revealed 81.59% and 83.78% syntenic regions between haplotypes and identified 10,175 structural variations. Repetitive elements constitute 47.06% and 47.02% of the hap1 and hap2 genomes. We annotated 23,320 and 23,598 protein-coding gene models, with median gene lengths of 7,721 bp and 7,657.5 bp, respectively. The mitochondrial genome was assembled at 21,164 bp and encodes 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. Functional annotation covered 16.23% and 16.33% of the nuclear and mitochondrial gene sets. BUSCO analysis indicated genome completeness of 92.4% and 92.5%, and proteome completeness of 95.4% and 94.5% for hap1 and hap2. By providing the first T2T-level reference, this dataset enables precise identification of trait-associated markers for marker-assisted selection (MAS), thereby facilitating genetic improvement of growth and stress-resistance traits. Furthermore, it serves as a robust genomic framework for conservation genomics to assess the genetic diversity of both wild and hatchery populations of this economically vital species.


Regret-Guided Search Control for Efficient Learning in AlphaZero

The Fourteenth International Conference on Learning Representations (ICLR), April 2026

Yun-Jui Tsai, Wei-Yu Chen, Yan-Ru Ju, Yu-Hung Chang, Ti-Rong Wu

Abstract

Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensive self-play games to extract useful signals, humans often need only a few games, improving rapidly by repeatedly revisiting states where mistakes occurred. This idea, known as search control, aims to restart from valuable states rather than always from the initial state. In AlphaZero, prior work Go-Exploit applies this idea by sampling past states from self-play or search trees, but it treats all states equally, regardless of their learning potential. We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent's evaluation diverges most from the actual outcome. These states are collected from both self-play trajectories and MCTS nodes, stored in a prioritized regret buffer, and reused as new starting positions. Across 9x9 Go, 10x10 Othello, and 11x11 Hex, RGSC outperforms AlphaZero and Go-Exploit by an average of 77 and 89 Elo, respectively. When training on a well-trained 9x9 Go model, RGSC further improves the win rate against KataGo from 69.3% to 78.2%, while both baselines show no improvement. These results demonstrate that RGSC provides an effective mechanism for search control, improving both efficiency and robustness of AlphaZero training. Our code is available at https://rlg.iis.sinica.edu.tw/papers/rgsc.
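A minimal sketch of regret-proportional sampling from a buffer of candidate starting states; the state names, regret values, and roulette-wheel scheme below are illustrative only, and RGSC's actual prioritized buffer and regret network are more involved:

```python
import random

def sample_start_state(buffer, rng):
    # Sample a state with probability proportional to its stored regret,
    # so high-regret states are revisited more often as starting positions.
    total = sum(regret for _, regret in buffer)
    r = rng.uniform(0, total)
    acc = 0.0
    for state, regret in buffer:
        acc += regret
        if r <= acc:
            return state
    return buffer[-1][0]  # guard against floating-point round-off

rng = random.Random(0)
# Hypothetical (state, regret) entries, e.g. positions where the agent's
# evaluation diverged most from the game outcome.
buffer = [("state_a", 0.1), ("state_b", 0.7), ("state_c", 0.2)]
counts = {s: 0 for s, _ in buffer}
for _ in range(1000):
    counts[sample_start_state(buffer, rng)] += 1
print(counts)  # state_b, the high-regret state, dominates
```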

HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection

The Fourteenth International Conference on Learning Representations (ICLR), April 2026

Chin-Chia Yang, Yung-Yu Chuang, Hwann-Tzong Chen, and Tyng-Luh Liu

Abstract

Synthetic image generators evolve rapidly, challenging detectors to generalize across current methods and adapt to new ones. We study domain-incremental synthetic image detection with a two-phase evaluation. Phase I trains on either diffusion- or GAN-based data and tests on the combined group to quantify bidirectional cross-generator transfer. Phase II sequentially introduces renders from 3D Gaussian Splatting (3DGS) head avatar pipelines, requiring adaptation while preserving earlier performance. We observe that CLIP-based detectors inherit text-image alignment semantics that are irrelevant to authenticity and hinder generalization. We introduce a Hilbert-Schmidt Independence Criterion (HSIC) bottleneck loss on intermediate CLIP ViT features, encouraging representations predictive of real versus synthetic while independent of generator identity and caption alignment. For domain-incremental learning, we propose HSIC-Guided Replay (HGR), which selects per-class exemplars via a hybrid score combining HSIC relevance with k-center coverage, yielding compact memories that mitigate forgetting. Empirically, the HSIC bottleneck improves transfer between diffusion and GAN families, and HGR sustains prior accuracy while adapting to 3DGS renders. These results underscore the value of information-theoretic feature shaping and principled replay for resilient detection under shifting generative regimes.
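For readers unfamiliar with HSIC, the plain biased empirical estimator (not the paper's bottleneck loss itself) can be computed as follows; the 1-D samples and Gaussian kernel bandwidth are arbitrary choices for illustration:

```python
import math

def gaussian_kernel(xs, sigma=1.0):
    # Pairwise Gaussian kernel matrix for 1-D samples.
    return [[math.exp(-(a - b) ** 2 / (2 * sigma ** 2)) for b in xs] for a in xs]

def center(K):
    # Double-center a kernel matrix: H K H with H = I - (1/n) * 1 1^T.
    n = len(K)
    row = [sum(K[i]) / n for i in range(n)]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: (1/n^2) * <HKH, HLH>_F.

    Near zero when xs and ys are independent, larger when they are
    dependent; minimizing it w.r.t. a nuisance variable (e.g. generator
    identity) pushes features toward independence from that variable.
    """
    Kc = center(gaussian_kernel(xs, sigma))
    Lc = center(gaussian_kernel(ys, sigma))
    n = len(xs)
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# A constant signal carries no information, so its centered kernel is
# identically zero and HSIC with it vanishes exactly.
print(hsic(x, x) > hsic(x, [1.0] * len(x)))  # → True
```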

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement

IEEE Transactions on Audio, Speech and Language Processing, February 2026

Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, and Berlin Chen

Abstract

Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models often suffer from severe performance degradation when confronted with domain shifts, particularly in the presence of unseen noise and channel distortions. In view of this, we in this paper present URSA-GAN, a unified and domain-aware generative framework specifically designed to mitigate mismatches in both noise and channel conditions. URSA-GAN leverages a dual-embedding architecture that consists of a noise encoder and a channel encoder, each pre-trained with limited in-domain data to capture domain-relevant representations. These embeddings condition a GAN-based speech generator, facilitating the synthesis of speech that is acoustically aligned with the target domain while preserving phonetic content. To enhance generalization further, we propose dynamic stochastic perturbation, a novel regularization technique that introduces controlled variability into the embeddings during generation, promoting robustness to unseen domains. Empirical results demonstrate that URSA-GAN effectively reduces character error rates in ASR and improves perceptual metrics in SE across diverse noisy and mismatched channel scenarios. Notably, evaluations on compound test conditions with both channel and noise degradations confirm the generalization ability of URSA-GAN, yielding relative improvements of 16.16% in ASR performance and 15.58% in SE metrics.

Cross-Attention Reprogramming for ASR: Bridging Discrete Speech Units and Pretrained Language Models

IEEE Access, January 2026

Pei-Jun Liao, Hung-Yi Lee, and Hsin-Min Wang

Abstract

In automatic speech recognition (ASR), an emerging trend involves converting continuous speech features into sequences of discrete speech units (DSUs) via quantization. A key advantage of DSU representations is their compatibility with pretrained language models (PLMs), where DSUs are directly mapped to PLM token indices and the embedding layer is fine-tuned. However, this conventional strategy often relies heavily on large-scale training data to mitigate the inherent modality mismatch. In light of this, we explore a more effective way to exploit the PLM embedding dictionary. Drawing inspiration from Time-LLM, a recent time-series forecasting model, we propose a cross-attention reprogramming mechanism that incorporates codebook information from the DSU quantizer to better align the DSUs with the PLM embeddings. Compared to direct fine-tuning of PLM embeddings, our method consistently achieves improvements on the Discrete Audio and Speech Benchmark (DASB), reaching state-of-the-art performance across most DASB-style settings. We also evaluate our method on LibriSpeech-960, LibriLight-10, and Swedish, Czech, and Hungarian data from Common Voice, and observe similar trends. Notably, the proposed reprogramming method demonstrates significant gains over the fine-tuning baseline, particularly in cross-lingual and low-resource scenarios. This study proposes a new approach to using PLM embedding dictionaries in DSU-based ASR, and lays a foundation for combining speech representations with large language models in other discriminative tasks of speech processing such as speech emotion recognition and spoken question answering.

Can We Formalise Type Theory Intrinsically without Any Compromise? A Case Study in Cubical Agda

Proceedings of the 15th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP '26), January 2026

Liang-Ting Chen, Fredrik Nordvall Forsberg, Tzu-Chun Tsai

Abstract

We present an intrinsic representation of type theory in the proof assistant Cubical Agda, inspired by Awodey’s natural models of type theory. The initial natural model is defined as quotient inductive-inductive-recursive types, leading us to a syntax accepted by Cubical Agda without using any transports, postulates, or custom rewrite rules. We formalise some meta-properties such as the standard model, normalisation by evaluation for typed terms, and strictification constructions. Since our formalisation is carried out using Cubical Agda's native support for quotient inductive types, all our constructions compute at a reasonable speed. When we try to develop more sophisticated metatheory, however, the 'transport hell' problem reappears. Ultimately, it remains a considerable struggle to develop the metatheory of type theory using an intrinsic representation that lacks strict equations. The effort required is about the same whether or not the notion of natural model is used.

Efficient Column-Wise N:M Pruning on RISC-V CPU

Journal of Systems Architecture (JSA), March 2026

Chi-Wei Chu, Ding-Yong Hong, Jan-Jan Wu

Abstract

In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate’s profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach effectively increases ResNet inference throughput by as much as 4×, and preserves ImageNet top-1 accuracy within 2.1% of the dense baseline.
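The general N:M sparsity pattern the paper applies column-wise can be sketched on a single column of weights; the tile-level layout, im2col/packing fusion, and XNNPACK integration described above are not modeled here:

```python
def prune_column_nm(column, n, m):
    """Column-wise N:M pruning sketch: within every group of m consecutive
    weights along a column, keep only the n largest-magnitude entries and
    zero the rest, yielding a regular sparsity pattern that hardware
    (e.g. vector units) can exploit.
    """
    pruned = []
    for start in range(0, len(column), m):
        group = column[start:start + m]
        # Indices of the n entries with the largest absolute values.
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]),
                      reverse=True)[:n]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

col = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02, -0.7, 0.4]
print(prune_column_nm(col, n=2, m=4))  # keep 2 of every 4 weights
# → [0.9, 0.0, 0.0, -1.2, 0.0, 0.0, -0.7, 0.4]
```

Because every m-element group has exactly n nonzeros, the pruned column compresses to a dense n/m-sized buffer plus a small index mask, which is what makes the pattern friendly to vectorized kernels.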

Complete end-to-end learning from protein feature representation to protein interactome inference

GigaScience, November 2025

Yu-Hsin Chen, Chien-Fu Liu, Jun-Yi Leu*, and Huai-Kuang Tsai*

Abstract

Co-fractionation coupled with mass spectrometry (CF-MS) is a powerful strategy for mapping protein-protein interactions (PPIs) under near-physiological conditions. Despite recent progress, existing analysis pipelines remain constrained by reliance on handcrafted features, sensitivity to experimental noise, and an inherent focus on pairwise interactions, which limit their scalability and generalizability. To address these difficulties, we introduce FREEPII (Feature Representation Enhancement End-to-End Protein Interaction Inference), a unified deep learning framework that integrates CF-MS data with sequence-derived features to learn biologically meaningful protein-level representations for accurate and efficient inference of PPIs and protein complexes. FREEPII employs a convolutional neural network (CNN) architecture to learn protein-level representations directly from raw data, enabling feature sharing across interaction pairs and reducing computational complexity. To enhance robustness against CF-MS noise, protein sequences are introduced as auxiliary input to enrich the feature space with complementary biological cues. The supervised protein embeddings further encode network-level context derived from complex annotations, allowing the model to capture higher-order interactions and enhance the expressive power of protein representations. Extensive benchmarking demonstrates that FREEPII consistently outperforms state-of-the-art CF-MS analysis tools, capturing more biologically coherent and discriminative protein features. Cross-dataset evaluations further reveal that integrating multi-modal data from diverse experimental contexts substantially improves the generalization and sensitivity of data-driven models, offering a scalable, cross-species strategy for reliable protein interaction inference.

GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm

IEEE Transactions on Information Forensics and Security, November 2025

Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Christopher Leckie, and Isao Echizen

Abstract

Deep neural networks are highly vulnerable to adversarial examples, which are inputs with small, carefully crafted perturbations that cause misclassification, making adversarial attacks a critical tool for evaluating robustness. Existing black-box methods typically entail a trade-off between precision and flexibility: pixel-sparse attacks (e.g., single- or few-pixel attacks) provide fine-grained control but lack adaptability, whereas patch- or frequency-based attacks improve efficiency or transferability, but at the cost of producing larger and less precise perturbations. We present GreedyPixel, a fine-grained black-box attack method that performs brute-force-style, per-pixel greedy optimization guided by a surrogate-derived priority map and refined by means of query feedback. It evaluates each coordinate directly without any gradient information, guaranteeing monotonic loss reduction and convergence to a coordinate-wise optimum, while also yielding near white-box-level precision, pixel-wise sparsity, and perceptual quality. On the CIFAR-10 and ImageNet datasets, spanning convolutional neural networks (CNNs) and Transformer models, GreedyPixel achieved state-of-the-art success rates with visually imperceptible perturbations, effectively bridging the gap between black-box practicality and white-box performance. The implementation is available at https://github.com/azrealwang/greedypixel.
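The greedy per-coordinate loop can be sketched on a toy objective; the fixed priority ordering stands in for the surrogate-derived priority map, and the quadratic loss is a placeholder for a real model's query-based classification loss:

```python
def greedy_coordinate_attack(x, loss, priority, budget, step=0.1):
    """Greedy per-coordinate optimization on a black-box loss.

    Coordinates are visited in priority order; each trial perturbation is
    kept only if it strictly reduces the loss, so the loss decreases
    monotonically, mirroring the guarantee stated in the abstract.
    """
    x = list(x)
    best = loss(x)
    for i in priority[:budget]:
        for delta in (step, -step):
            trial = list(x)
            trial[i] += delta
            val = loss(trial)
            if val < best:   # keep only improving moves
                x, best = trial, val
                break
    return x, best

# Toy stand-in: drive coordinates toward a hypothetical adversarial target.
goal = [0.4, -0.2, 0.0]
loss = lambda v: sum((a - b) ** 2 for a, b in zip(v, goal))
x0 = [0.5, -0.1, 0.0]
adv, final = greedy_coordinate_attack(x0, loss, priority=[0, 1, 2], budget=3)
print(final < loss(x0))  # → True
```

Each coordinate costs at most two queries here; the real attack spends its query budget the same way, concentrating changes on the pixels the priority map ranks highest.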

Chromosome-Level Genome Assembly and Annotation of the Japanese Cutlassfish (Trichiurus japonicus): A High-Quality Genomic Resource Featuring Nuclear and Mitochondrial Completeness for Future Studies

Scientific Data, November 2025

Po-Cheng Hsu, Chung-Yen Lin, Ping-Heng Hsieh, Wei-Hsuan Chuang, Mei-Yeh Lu, Chaolun Allen Chen, Shu-Hwa Chen

Abstract

The Japanese cutlassfish (Trichiurus japonicus) is a commercially important marine species across Asia. Here, we present a high-quality, chromosome-level genome assembly generated using PacBio HiFi, Hi-C, and Nanopore ONT reads. The nuclear genome comprised 24 chromosomes with 160 scaffolds totaling 1,138 Mb, with a scaffold N50 of 47.10 Mb and an average scaffold length of 6.18 Mb. A complete mitochondrial genome of 16,796 bp was also assembled, comprising 13 protein-coding and 23 non-coding RNA (ncRNA) genes, with 99.32% sequence identity to the reference in the NCBI database. The nuclear genome encodes 26,541 protein-coding genes (median length: 7,391 base pairs) and 16,383 non-coding RNA (ncRNA) genes. The ncRNA genes account for approximately 0.1694% of the genome's total length. BUSCO analysis indicated 99.4% and 99.2% completeness against the Actinopterygii ortholog set for the genome and proteome. Functional annotation covered 98.15% of genes. Recognized repeat elements and ncRNA regions accounted for 61.10% of the nuclear genome. With high mapping rates from external datasets, this assembly offers a valuable foundation for future sequencing-based studies.