Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification
European Conference on Computer Vision (ECCV), September 2024
Cheng-Chang Tsai, Yuan-Chih Chen, and Chun-Shien Lu
Abstract
Stain shifts are prevalent in histopathology images and are typically dealt with by normalization or augmentation. Because training-time methods are limited in handling unseen stains, we propose a test-time stain adaptation method (TT-SaD) with diffusion models that achieves stain adaptation by solving a nonlinear inverse problem during testing. TT-SaD is promising in that it needs only a single domain for training yet adapts well to other domains during testing, avoiding retraining whenever new data become available. For tumor classification, stain adaptation by TT-SaD outperforms state-of-the-art diffusion model-based test-time methods. Moreover, TT-SaD beats training-time methods when testing on data that are inaccessible during training. To our knowledge, test-time stain adaptation with diffusion models remains relatively unexplored.
Transcriptomics and gut microbiome analysis of the edible herb Bidens pilosa as a functional feed additive to promote growth and metabolism in tilapia (Oreochromis spp.)
BMC Genomics, August 2024
Che-Chun Chen, Chung-Yen Lin, Hsin-Yun Lu, Chyng-Hwa Liou, Ying-Ning Ho, Chang-Wen Huang, Zhong-Fu Zhang, Chih-Hsin Kao, Wen-Chin Yang, Hong-Yi Gong
Abstract
Background
To reduce the use of antibiotics and chemicals in aquaculture, the edible herb B. pilosa was selected as a multifunctional feed additive to address this issue. Although there has been considerable research into the effects of B. pilosa on poultry, its wider effects, particularly on growth and the gut microbiota in fish, remain largely unexplored. We aimed to investigate the interactive effects between host growth and the gut microbiota in B. pilosa-fed tilapia using transcriptomics and gut microbiota analysis.
Results
In this study, we added 0.5% and 1% B. pilosa to the diet and observed that the growth performance of tilapia was significantly increased after 8 weeks of feeding. Comparative transcriptome analysis was performed on RNA sequence profiles obtained from liver and muscle tissues. Functional enrichment analysis showed that B. pilosa regulates several pathways and genes including amino acid metabolism, lipid metabolism, carbohydrate metabolism, endocrine system, signal transduction and metabolism of other amino acids. The expression of selected growth-associated genes was validated by qRT-PCR. The qRT-PCR result indicated that B. pilosa may enhance growth performance by activating the expression of liver igf1 and muscle igf1rb genes and inhibiting the expression of the muscle negative regulator myostatin b (mstnb). Enhancement of endocrine Igf1/Igf1rb signaling and suppression of Mstn signaling both induced the expression of myogenic regulatory factors (MRFs), myod1, myogenin and mrf4, to promote muscle growth in tilapia. The predicted function of the gut microbiota showed several significantly different pathways that overlapped with the KEGG enrichment results of differentially expressed genes in the liver transcriptomes. This suggests that gut microbiota may be able to influence liver metabolism through the gut-liver axis in B. pilosa-fed tilapia.
Conclusions
Dietary B. pilosa can regulate endocrine igf1 signaling and myostatin signaling to activate the expression of MRFs to promote muscle growth, and alter the composition of gut bacteria, which can then affect liver amino acid metabolism, carbohydrate metabolism, the endocrine system, lipid metabolism, metabolism of other amino acids, and signal transduction in the host, ultimately enhancing growth performance. Our results suggest that B. pilosa has the potential to be a functional additive that can serve as an alternative to antibiotic growth promoters in aquaculture.
Learnable Layer Selection and Model Fusion for Speech Self-Supervised Learning Models
Interspeech 2024, September 2024
Sheng-Chieh Chiu, Chia-Hua Wu, Jih-Kang Hsieh, Yu Tsao, and Hsin-Min Wang
Abstract
In this paper, we investigate methods for fusing feature representations derived from multiple speech self-supervised learning (SSL) models, along with techniques to determine the optimal layer within each model. We evaluate five fusion strategies, finding that temporal interleaved concatenation is the most robust and effective for the SUPERB ASR task. Additionally, we demonstrate that Gumbel layer selection can automatically select the most appropriate SSL layer with better performance than the commonly used weighted-sum method. Furthermore, dimension-wise Gumbel layer selection shows promise in adaptively combining the layers of a single SSL model. Finally, we show that joint SSL model fusion and dimension-wise Gumbel layer selection further enhances effectiveness.
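The Gumbel layer-selection idea described above can be sketched concretely. The following is a minimal NumPy illustration under assumed shapes (12 layers, 768-dimensional features), not the paper's implementation: in a real model `select_logits` would be learnable parameters trained with a straight-through Gumbel-softmax estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0, hard=True):
    """Sample a (nearly) one-hot selection over layers via Gumbel-softmax."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = np.exp((logits + g) / tau)
    y = y / y.sum()
    if hard:
        # straight-through trick: discrete argmax in the forward pass
        one_hot = np.zeros_like(y)
        one_hot[np.argmax(y)] = 1.0
        return one_hot
    return y

# hypothetical stand-in for the stacked hidden states of an SSL model:
# shape (num_layers, time, dim)
layers = rng.normal(size=(12, 50, 768))
select_logits = np.zeros(12)  # learnable in a real model

# Gumbel layer selection: pick (approximately) one layer
w = gumbel_softmax(select_logits, tau=0.5)
selected = np.tensordot(w, layers, axes=(0, 0))  # (time, dim)

# the commonly used alternative: a dense weighted sum over all layers
weights = np.exp(select_logits) / np.exp(select_logits).sum()
weighted = np.tensordot(weights, layers, axes=(0, 0))
```

The dimension-wise variant mentioned in the abstract would carry one selection per feature dimension instead of a single scalar choice per layer.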
Learning Diffusion Models for Multi-View Anomaly Detection
18th European Conference on Computer Vision (ECCV), September 2024
Chieh Liu, Yu-Min Chu, Ting-I Hsieh, Hwann-Tzong Chen and Tyng-Luh Liu
Abstract
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Interspeech 2024, September 2024
Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao
Abstract
Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper’s embeddings to create cross-domain acoustic features and includes metadata from speech signals by using a classifier that distinguishes different enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric into the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings, speech metadata, and related complementary metrics to improve prediction performance for HA.
Pseudo-Embedding for Generalized Few-Shot 3D Segmentation
18th European Conference on Computer Vision (ECCV), September 2024
Chih-Jung Tsai, Hwann-Tzong Chen and Tyng-Luh Liu
Abstract
Existing generalized few-shot 3D segmentation (GFS3DS) methods typically prioritize enhancing the training of base-class prototypes while neglecting the rich semantic information within background regions for future novel classes. We introduce a novel GFS3DS learner that strategically leverages background context to improve both base prototype training and few-shot adaptability. Our method employs foundation models to extract semantic features from background points and grounds on text embeddings to cluster background points into pseudo-classes. This approach facilitates clearer base/novel class differentiation and generates pseudo prototypes that effectively mimic novel support samples. Comprehensive experiments on S3DIS and ScanNet datasets demonstrate the state-of-the-art performance of our method in both 1-shot and 5-shot tasks. Our approach significantly advances GFS3DS by unlocking the potential of background context, offering a promising avenue for broader applications.
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Interspeech 2024, September 2024
Chun Yin, Tai-Shih Chi, Yu Tsao, and Hsin-Min Wang
Abstract
Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance in assessing speaker voice similarity. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ incorporating WavLM representations shows significant improvements compared to baseline models. In addition, while fine-tuning WavLM with a small dataset of the downstream task does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen, Jun-Xiong Chong, Tyng-Luh Liu and Chiou-ting Hsu
Abstract
Bridging Actions: Generate 3D Poses and Shapes In-between Photos
IEEE Transactions on Pattern Analysis and Machine Intelligence, April 2024
Wen-Li Wei and Jen-Chun Lin
Abstract
Generating realistic 3D human motion has been a fundamental goal of the game/animation industry. This work presents a novel transition generation technique that can bridge the actions of people in the foreground by generating 3D poses and shapes in-between photos, allowing 3D animators/novice users to easily create/edit 3D motions. To achieve this, we propose an adaptive motion network (ADAM-Net) that effectively learns human motion from masked action sequences to generate kinematically compliant 3D poses and shapes in-between given temporally-sparse photos. Three core learning designs underpin ADAM-Net. First, we introduce a random masking process that randomly masks images from an action sequence and fills masked regions in latent space by interpolation of unmasked images to simulate various transitions under given temporally-sparse photos. Second, we propose a long-range adaptive motion (L-ADAM) attention module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in a sequence, along with a multi-head cross-attention. Third, we develop a short-range adaptive motion (S-ADAM) attention module that weightedly selects and integrates adjacent feature representations at different levels to strengthen temporal correlation. By coupling these designs, ADAM-Net excels not only in generating 3D poses and shapes in-between photos, but also in classic 3D human pose and shape estimation.
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024
Cheng-Yao Hong, Yen-Chi Hsu and Tyng-Luh Liu
Abstract
We propose a unified approach to simultaneously addressing the conventional setting of binary deepfake classification and a more challenging scenario of uncovering what facial components have been forged as well as the exact order of the manipulations. To solve the former task, we consider multiple instance learning (MIL) that takes each image as a bag and its patches as instances. A positive bag corresponds to a forged image that includes at least one manipulated patch (i.e., a pixel in the feature map). The formulation allows us to estimate the probability of an input image being a fake one and establish the corresponding contrastive MIL loss. On the other hand, tackling the component-wise deepfake problem can be reduced to solving multi-label prediction, but the requirement to recover the manipulation order further complicates the learning task into a multi-label ranking problem. We resolve this difficulty by designing a tailor-made loss term to enforce that the rank order of the predicted multi-label probabilities respects the ground-truth order of the sequential modifications of a deepfake image. Through experiments and comparisons with other relevant techniques, we provide extensive results and ablation studies to demonstrate that the proposed method is an overall more comprehensive solution to deepfake detection.
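The bag/instance formulation above can be illustrated with a generic MIL sketch. This NumPy example uses a noisy-OR aggregation of per-patch fake probabilities into a bag probability; it is a textbook MIL illustration, not the paper's contrastive MIL loss, and the patch probabilities are hypothetical stand-ins for a network's feature-map outputs.

```python
import numpy as np

def bag_probability(patch_probs):
    """A bag (image) is positive if at least one instance (patch) is
    manipulated; noisy-OR aggregates per-patch fake probabilities."""
    return 1.0 - np.prod(1.0 - patch_probs)

def mil_loss(patch_probs, is_fake):
    """Binary cross-entropy on the aggregated bag probability."""
    p = np.clip(bag_probability(patch_probs), 1e-7, 1 - 1e-7)
    y = 1.0 if is_fake else 0.0
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

real_patches = np.full(16, 0.01)   # all patches look pristine
fake_patches = real_patches.copy()
fake_patches[3] = 0.95             # a single manipulated patch

# one confident patch is enough to flag the whole image
assert bag_probability(fake_patches) > 0.9
assert mil_loss(fake_patches, True) < mil_loss(real_patches, True)
```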
Effective Compression of Language Models by Combining Pruning and Knowledge Distillation
IEEE International Conference on Computers, Software, and Applications (COMPSAC), July 2024
Chi-Yu Chiu, Ding-Yong Hong, Pangfeng Liu and Jan-Jan Wu
Abstract
Weight pruning is a prominent model compression technique that removes some weights from a model. However, pruning transformer models faces a challenge: after pruning, they require repeating the whole training process, including pre-training on a large general dataset and fine-tuning on a small downstream dataset, to recover accuracy. The whole training process takes a long time and many computational resources. To address the challenge, we propose a pruning method combined with knowledge distillation that avoids long re-training while recovering accuracy. We use 2:4 pruning, a method proposed by NVIDIA that keeps the two largest absolute values in every four consecutive elements of each row of a weight matrix, as our basic pruning method, and generalize it to N:M pruning, which keeps the N largest absolute values in every M consecutive elements of each row. Knowledge distillation is another model compression method in which a small model (the student) learns from a large model (the teacher). With our method, we use N:M pruning to uniformly prune the model into an N:M structure, and then apply two-stage fine-tuning on the downstream dataset with knowledge distillation. The pruned models achieve comparable accuracy using only downstream datasets and take much less time than traditional retraining. We run our experiments on the SQuAD and GLUE datasets using DistilBERT. The experimental results show that DistilBERT in a 1:4 structure achieves comparable accuracy on the SQuAD v1.1 and SQuAD v2.0 datasets and a 1.7x inference speedup compared to the original dense model.
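The N:M rule described above is easy to state concretely. The following is a minimal NumPy sketch of magnitude-based N:M pruning (not the paper's code, and without the sparse-kernel speedup that NVIDIA hardware provides for the resulting structure):

```python
import numpy as np

def nm_prune(weight, n=2, m=4):
    """Keep the n largest-magnitude values in every group of m
    consecutive elements of each row; zero out the rest."""
    rows, cols = weight.shape
    assert cols % m == 0, "row length must be divisible by m"
    groups = weight.reshape(rows, cols // m, m)
    # indices of the (m - n) smallest |w| in each group are pruned
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=-1)
    return (groups * mask).reshape(rows, cols)

w = np.array([[0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]])
pruned = nm_prune(w, n=2, m=4)       # 2:4 structure
# exactly 2 non-zeros survive in each group of 4
assert (pruned.reshape(-1, 4) != 0).sum(axis=1).tolist() == [2, 2]
```

The 1:4 configuration reported in the experiments corresponds to `nm_prune(w, n=1, m=4)`, i.e. a single surviving weight per group of four.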
A Study on Incorporating Whisper for Robust Speech Assessment
IEEE International Conference on Multimedia and Expo (ICME), July 2024
Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, and Chiou-Shann Fuh
Abstract
This research introduces an enhanced version of the multi-objective speech assessment model, MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scale weakly supervised model. We first investigate the effectiveness of Whisper in deploying a more robust speech assessment model. After that, we explore combining representations from Whisper and SSL models. The experimental results reveal that Whisper's embedding features contribute to more accurate prediction performance by MOSA-Net+. Moreover, combining the embedding features from Whisper and SSL models leads to only marginal improvement. Compared to intrusive methods, MOSA-Net, and other SSL-based speech assessment models, MOSA-Net+ yields notable improvements in estimating subjective quality and intelligibility scores across all evaluation metrics on the Taiwan Mandarin Hearing In Noise test - Quality & Intelligibility (TMHINT-QI) dataset. To further validate its robustness, MOSA-Net+ was tested in the noisy-and-enhanced track of the VoiceMOS Challenge 2023, where it obtained the top-ranked performance among nine systems.
Generating Attractive and Authentic Copywriting from Customer Reviews
NAACL 2024, June 2024
Yu-Xiang Lin, Wei-Yun Ma
Abstract
The goal of product copywriting is to capture the interest of users and enhance their experience by emphasizing the features of products through text descriptions. As e-commerce platforms offer a wide range of services, it is becoming essential to dynamically adjust the styles of these auto-generated descriptions. Traditional approaches to copywriting generation often rely solely on specified product attributes, which may result in dull and repetitive content. To tackle this issue, we propose to generate copywriting based on customer reviews, as they provide firsthand practical experiences with the product, offering a richer source of information than product attributes alone. We have developed a sequence-to-sequence framework, enhanced with reinforcement learning, to produce copywriting that is attractive, authentic, and rich in information. Our framework outperforms all existing baseline and zero-shot large language models, including LLaMA-2-chat-7B and GPT-3.5 (gpt-3.5-turbo-0613 in this work), in terms of both attractiveness and faithfulness. Furthermore, this work features the use of LLMs for aspect-based summary collection and argument allure assessment. Experiments demonstrate the effectiveness of using LLMs for marketing domain corpus construction. The dataset will be made available in the future.
Plug-in Language Model: Controlling Text Generation with a Simple Regression Model
NAACL 2024, June 2024
Nai-Chi Yang, Wei-Yun Ma, Pu-Jen Cheng
Abstract
Large-scale pre-trained language models have displayed unrivaled capacity in generating text that closely resembles human-written text. Nevertheless, generating texts adhering to specific conditions without fine-tuning or adding new parameters can be challenging.
Contemporary approaches commonly rely on either prompts or auxiliary models to avoid modifying the language models. These auxiliary models are designed to assess whether a generated token contributes to meeting the desired requirements. These approaches adjust the distribution of the next token during the inference phase by leveraging the prediction score of the desired attribute to calculate gradients. However, these auxiliary models typically require the language model's latent states. This prerequisite challenges integrating various existing black box attribute models or tools.
We present the Plug-in Language Model (PiLM) as a solution to address the limitations. PiLM leverages reinforcement learning to utilize black box tools directly, adjusting the latent state to control text generation. However, performing backpropagation during the inference phase is time-consuming for PiLM. By replacing backpropagation with a simple regression model, PiLM can achieve an inference time comparable to that of the original LLM.
Experimental results show that our approach outperforms existing state-of-the-art methods that rely on gradient-based, weighted-decoding, or prompt-based methodologies.
Automatic Construction of a Chinese Review Dataset for Aspect Sentiment Triplet Extraction via Iterative Weak Supervision
LREC-COLING 2024, May 2024
Chia-Wen Lu, Ching-Wen Yang, Wei-Yun Ma
Abstract
Aspect Sentiment Triplet Extraction (ASTE), introduced in 2020, is a task that involves the extraction of three key elements: target aspects, descriptive opinion spans, and their corresponding sentiment polarity. This task, however, faces a significant hurdle when applied to Chinese, owing to the lack of sufficient datasets for model training, largely attributable to the arduous manual labeling process. To address this issue, we present an innovative framework that facilitates the automatic construction of an ASTE dataset via Iterative Weak Supervision, negating the need for manual labeling, aided by a discriminator that weeds out subpar samples. The objective is to successively improve the quality of the raw data and generate supplementary data. The effectiveness of our approach is underscored by our results, which include the creation of a substantial Chinese review dataset. This dataset encompasses over 60,000 Google restaurant reviews in Chinese and features more than 200,000 extracted triplets. Moreover, we have also established a robust baseline model by leveraging a novel method of weak supervision. Both our dataset and model are openly accessible to the public.
Decoding the genome of bloodsucking midge Forcipomyia taiwana (diptera: Ceratopogonidae): Insights into odorant receptor expansion
Insect Biochemistry and Molecular Biology, May 2024
Ming-Der Lin, Chia-Hsien Chuang, Chih-Hsin Kao, Shu-Hwa Chen, Szu-Chieh Wang, Ping-Heng Hsieh, Guan-Yu Chen, Chun-Chia Mao, Jeng-Yi Li, Mei-Yeh Jade Lu, Chung-Yen Lin
Abstract
Biting midges, notably those within the Ceratopogonidae family, have long been recognized for their epidemiological significance, both as nuisances and vectors for disease transmission in vertebrates. Despite their impact, genomic insights into these insects, particularly beyond the Culicoides genus, remain limited. In this study, we assembled the Forcipomyia taiwana (Shiraki) genome, comprising 113 scaffolds covering 130.4 Mbps—with the longest scaffold reaching 7.6 Mbps and an N50 value of 2.6 Mbps—marking a pivotal advancement in understanding the genetic architecture of ceratopogonid biting midges. Phylogenomic analyses reveal a shared ancestry between F. taiwana and Culicoides sonorensis Wirth & Jones, dating back approximately 124 million years, and highlight a dynamic history of gene family expansions and contractions within the Ceratopogonidae family. Notably, a substantial expansion of the odorant receptor (OR) gene family was observed, which is crucial for the chemosensory capabilities that govern biting midges' interactions with their environment, including host seeking and oviposition behaviors. The distribution of OR genes across the F. taiwana genome displays notable clusters on scaffolds, indicating localized tandem gene duplication events. Additionally, several collinear regions were identified, hinting at segmental duplications, inversions, and translocations, contributing to the olfactory system's evolutionary complexity. Among the 156 ORs identified in F. taiwana, 134 are biting midge-specific ORs, distributed across three distinct clades, each exhibiting unique motif features that distinguish them from the others. Through weighted gene co-expression network analysis, we correlated distinct gene modules with sex and reproductive status, laying the groundwork for future investigations into the interplay between gene expression and adaptive behaviors in F. taiwana. 
In conclusion, our study not only highlights the unique olfactory repertoire of ceratopogonid biting midges but also sets the stage for future studies into the genetic underpinnings of their unique biological traits and ecological strategies.
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP2024), April 2024
Ryandhimas Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao
Abstract
This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss function. Experimental results first demonstrate the advantages of MPL compared to training a model from scratch and using a direct knowledge transfer mechanism. Second, the benefit of the Huber loss for improving the predictive ability of MTQ-Net is verified. Finally, MTQ-Net with the MPL approach exhibits higher overall predictive power compared to other SSL-based speech assessment models.
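The combined objective described above, a supervised Huber term plus a semi-supervised Huber term on pseudo labels, can be sketched as follows. All numbers are hypothetical illustrations, and the weighting factor `alpha` is an assumption of this sketch, not a value from the paper:

```python
import numpy as np

def huber(err, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * err**2, delta * (a - 0.5 * delta))

def mpl_loss(pred_true, label, pred_pseudo, pseudo, alpha=1.0):
    """Supervised Huber loss on ground-truth targets plus a
    semi-supervised Huber loss on pseudo labels from a pretrained model."""
    sup = huber(pred_true - label).mean()
    semi = huber(pred_pseudo - pseudo).mean()
    return sup + alpha * semi

# hypothetical targets: 3QUEST scores and pseudo labels a la MOSA-Net
label = np.array([4.1, 3.2, 4.5])     # e.g. S-MOS, N-MOS, G-MOS
pseudo = np.array([3.8, 0.91, 0.12])  # e.g. PESQ, STOI, SDI estimates
pred_t = np.array([4.0, 3.0, 4.4])    # model's predictions of the targets
pred_p = np.array([3.5, 0.90, 0.20])  # model's predictions of the pseudo labels
loss = mpl_loss(pred_t, label, pred_p, pseudo)
```

The Huber loss keeps gradients bounded for outlier ratings while behaving like mean squared error near the target, which is one plausible reason it helps here.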
A formal treatment of bidirectional typing
33rd European Symposium on Programming (ESOP 2024), April 2024
Liang-Ting Chen and Hsiang-Shang Ko
Abstract
There has been much progress in designing bidirectional type systems and associated type synthesis algorithms, but mainly on a case-by-case basis. To remedy the situation, this paper develops a general and formal theory of bidirectional typing and, as a by-product of our formalism, provides a verified generator of proof-relevant type synthesisers for simply typed languages: for every signature that specifies a mode-correct bidirectionally typed language, there exists a proof-relevant type synthesiser that, for an input abstract syntax tree, constructs a typing derivation if one exists, gives its refutation if not, or reports that the input does not have enough type annotations. Soundness, completeness, and mode-correctness are studied universally for all signatures and are sufficient conditions for deriving a type synthesiser. We propose a preprocessing step called mode decoration, which helps the user deal with missing type annotations in a given abstract syntax tree. The entire development is formalised in Agda and can be further integrated with other language-formalisation frameworks.
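The synthesis/checking split at the heart of bidirectional typing can be illustrated with a toy checker for the simply typed lambda calculus. This Python sketch is an informal, single-language illustration, nowhere near the paper's Agda formalisation over arbitrary signatures, but it shows the two modes and why an unannotated lambda cannot be synthesised:

```python
# Terms: ('var', name) | ('lam', x, body) | ('app', f, a) | ('ann', t, type)
# Types: 'b' (a base type) | ('->', domain, codomain)

def synth(ctx, term):
    """Synthesis mode: compute a type from the term, or fail."""
    tag = term[0]
    if tag == 'var':
        return ctx[term[1]]
    if tag == 'ann':                 # annotation switches checking to synthesis
        check(ctx, term[1], term[2])
        return term[2]
    if tag == 'app':
        ft = synth(ctx, term[1])
        assert ft[0] == '->', "applying a non-function"
        check(ctx, term[2], ft[1])   # argument is checked against the domain
        return ft[2]
    raise TypeError(f"cannot synthesise a type for {tag}; add an annotation")

def check(ctx, term, ty):
    """Checking mode: verify the term against an expected type."""
    if term[0] == 'lam':
        assert ty[0] == '->', "lambda must have an arrow type"
        return check({**ctx, term[1]: ty[1]}, term[2], ty[2])
    # mode switch: fall back to synthesis and compare
    assert synth(ctx, term) == ty, "type mismatch"

# the identity function needs an annotation to be synthesisable:
ident = ('ann', ('lam', 'x', ('var', 'x')), ('->', 'b', 'b'))
assert synth({}, ident) == ('->', 'b', 'b')
```

A bare `('lam', 'x', ('var', 'x'))` makes `synth` raise, which mirrors the paper's third outcome: the input does not carry enough type annotations.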