<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Proceedings of Machine Learning Research</title>
    <description>Proceedings of the 17th Asian Conference on Machine Learning
  Held in HNBK International Convention Center, Taipei, Taiwan on 09-12 December 2025

Published as Volume 304 by the Proceedings of Machine Learning Research on 13 November 2025.

Volume Edited by:
  Hung-yi Lee
  Tongliang Liu

Series Editors:
  Neil D. Lawrence
</description>
    <link>https://proceedings.mlr.press/v304/</link>
    <atom:link href="https://proceedings.mlr.press/v304/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Mon, 06 Apr 2026 08:18:35 +0000</pubDate>
    <lastBuildDate>Mon, 06 Apr 2026 08:18:35 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>CAD-HLLM: Generating Executable CAD from Text with Hierarchical LLM Planning</title>
        <description>Translating natural language into precise and executable Computer-Aided Design (CAD) programs remains a challenging task, requiring both semantic understanding and geometric fidelity. In this paper, we present CAD-HLLM, a hierarchical LLM framework for structured CAD command generation. Our approach decomposes the task into two stages: a Plan Generator that infers high-level symbolic plans from text, and a Parameter Completor that generates detailed parametric commands conditioned on both the original description and the inferred plan. To enhance robustness, we introduce a lightweight ensemble selection mechanism that ranks and selects among multiple candidates based on model log-likelihoods. Experiments on benchmark datasets show that our method outperforms existing baselines in both parametric precision and 3D shape similarity, demonstrating the effectiveness of hierarchical reasoning and LLM-based planning in bridging the gap between human design intent and executable CAD sequences.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zuo25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zuo25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Direct Quantized Training of Language Models with Stochastic Rounding</title>
        <description>Although recent quantized Large Language Models, such as BitNet, have paved the way for significant reduction in memory usage during deployment with binary or ternary weights, training these models still demands a substantial memory footprint. This is partly because high-precision (i.e., unquantized) weights required for straight-through estimation must be maintained throughout the whole training process. To address this, we explore directly updating the quantized low-precision weights without relying on straight-through estimation during backpropagation, aiming to save memory usage during training. Specifically, we employ a stochastic rounding technique to minimize the information loss caused by the use of low-bit weights throughout training. Experimental results on our LLaMA-structured models of various sizes indicate that (1) training with only low-precision weights is feasible even when they are constrained to ternary values; (2) extending the bit width to 8 bits achieves performance on par with BitNet b1.58; (3) our models remain robust to precision scaling and memory reduction, showing minimal performance degradation when moving from FP32 to lower-memory environments (BF16/FP8); and (4) our models also support inference using ternary weights, showcasing their flexibility in deployment.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zhao25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zhao25b.html</guid>
        
        
      </item>
    
      <item>
        <title>RSTSIC: Reparameterized Swin Transformer Stereo Image Compression</title>
        <description>Stereo image compression (SIC) aims to enhance compression performance and efficiency by exploiting cross-view redundancy in overlapping fields between stereo images. However, current SIC methods face practical limitations in adequately exploiting inter-view correlations and contextual information due to occlusions, disparity variations, and computational overhead. To effectively extract contextual information and efficiently model cross-view dependencies in stereo images, we propose a novel distributed stereo image compression framework, Reparameterized Swin Transformer Stereo Image Compression (RSTSIC), which integrates a Reparameterized Swin Block (RSB) and Cross Feature Enhancement Modules (CFEMs) in the joint decoder. CFEMs progressively aggregate cross-view dependencies and enhance cross feature interaction efficiency. RSB integrates window-based self-attention with convolutional operations to effectively leverage non-local contextual information, while maintaining inference efficiency through structural reparameterization. RSTSIC outperforms traditional codecs and deep stereo compression methods on both Cityscapes and InStereo2K datasets, with at least a 58.57% reduction in model parameters and a 36.43% decrease in FLOPs compared to state-of-the-art compression models. Ablation studies confirm the necessity of CFEMs and RSB for efficient compression and perceptual fidelity. Our code is available at https://github.com/SnowBlind0/RSTSIC.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zhao25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zhao25a.html</guid>
        
        
      </item>
    
      <item>
        <title>FG-MSTGNN: Cross-subject EEG Emotion Recognition via Frequency-guided Multi-period Spatial-temporal Graph Neural Network</title>
        <description>Accurate decoding of emotional EEG signals constitutes a critical challenge for developing affective brain-computer interfaces. Contemporary methods for cross-subject EEG-based emotion recognition confront two critical challenges: 1) inadequate investigation of the distinct affective features of the EEG rhythm; 2) insufficient capability to extract the various neurophysiological connectivity patterns across subjects in the same experimental setting. To address these limitations, we propose FG-MSTGNN, a dual-stage adaptive learning framework comprising the Frequency-guided Multi-period Spatial-temporal Graph Neural Network. The Feature Learning Stage utilizes a Multi-period Time-Frequency Cooperative Encoder Module to hierarchically extract cross-frequency rhythmic dynamics. The Topology Optimization Stage utilizes a Dual-Phase Graph Pooling Module to dynamically generate personalized sparse neurophysiological connectivity patterns. Systematic evaluation under cross-subject experiments demonstrates that the framework achieves average classification accuracies of 94.67% and 85.28% on SEED and SEED-IV respectively, showing statistically significant improvements over state-of-the-art EEG emotion recognition methods. The proposed framework reveals that both functional brain network topology and EEG spectral dynamics vary across different emotional states.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zhang25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zhang25c.html</guid>
        
        
      </item>
    
      <item>
        <title>Data-Centric Graph Condensation via Diffusion Matching</title>
        <description>This paper introduces Data-Centric Graph Condensation (named DCGC), a task- and model-agnostic method for condensing a large graph into a smaller one by matching the distribution between two graphs. DCGC defines the distribution of a graph as the trajectories of its node signals (such as node features and node labels) induced by a diffusion process over the geometric structure, which accommodates multi-order structural information. Built upon this, DCGC compresses the topological knowledge of the original graph into the orders-of-magnitude smaller synthetic one by aligning their distributions in input space. Compared with existing methods that stick to particular GNN architectures and require solving complicated optimization, DCGC can be flexibly applied to arbitrary off-the-shelf GNNs and achieve graph condensation with a much faster speed. Apart from the cross-architecture generalization ability and training efficiency, experiments demonstrate that DCGC yields consistently superior performance to existing methods on datasets with varying scales and condensation ratios.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zhang25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zhang25b.html</guid>
        
        
      </item>
    
      <item>
        <title>JurisGraph Insight Engine 1.0v: A Legal Question Answering System Based on Large Language Models and Knowledge Graphs</title>
        <description>The extraction and effective utilization of judicial data remains a major challenge in the legal domain. There is a growing mismatch between the public’s demand for accessible legal services and the high cost and complexity of legal consultations, which also affects the efficiency of legal professionals when handling case inquiries. Traditional keyword-based search methods lack professionalism, interpretability, and scalability. In this paper, we propose JurisGraph Insight Engine 1.0v, an intelligent legal question-answering (QA) system that integrates large language models (LLMs) and domain-specific knowledge graphs. We first construct a comprehensive Criminal Law Knowledge Graph (CLKG) containing 483 types of criminal offenses, and develop two unified heterogeneous subgraphs for theft and drug-related cases. Then, we fine-tune a domain-specific legal LLM, LawM, using a curated corpus of over 280,000 Chinese legal records covering multiple legal NLP tasks. Finally, we design and implement a QA system that leverages both the knowledge graph and LawM to deliver accurate and interpretable answers to legal questions. Experimental results show that our system achieves 95% accuracy, effectively lowering the barrier to legal knowledge access for the general public while improving decision efficiency for legal practitioners.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/zhang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/zhang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks</title>
        <description>Adversarial attacks on Graph Neural Networks aim to perturb the performance of the learner by carefully modifying the graph topology and node attributes. Existing methods achieve attack stealthiness by constraining the modification budget and differences in graph properties. However, these methods typically disrupt task-relevant primary semantics directly, which makes the attack easy to defend against and to detect. In this paper, we propose an Adversarial Attack on High-level Semantics for Graph Neural Networks (AHSG), which is a graph structure attack model that ensures the retention of primary semantics. By combining latent representations with shared primary semantics, our model retains detectable attributes and relational patterns of the original graph while leveraging more subtle changes to carry out the attack. Then we use the Projected Gradient Descent algorithm to map the latent representations with attack effects to the adversarial graph. Through experiments on robust graph deep learning models equipped with defense strategies, we demonstrate that AHSG outperforms other state-of-the-art methods in attack effectiveness. Additionally, using Contextual Stochastic Block Models to detect the attacked graph further validates that our method preserves the primary semantics of the graph.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/yuan25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/yuan25a.html</guid>
        
        
      </item>
    
      <item>
        <title>MagicMask: A Fast and High-fidelity Face Swapping Method Robust to Face Pose</title>
        <description>Recent face-swapping methods excel under controlled conditions but often fail when presented with extreme facial poses. Diffusion-based approaches may overcome these issues, but they incur significant computational costs. This paper introduces MagicMask, a novel face-swapping framework that robustly handles various poses in real time by fusing visual and geometric information. Our method incorporates explicit, identity-adapted geometric cues into the latent feature space via a multi-head attention mechanism. It employs an Adversarial Facial Silhouette Alignment (AFSA) loss to preserve detailed facial boundaries adapted to the source identity. Comprehensive experiments on multiple benchmarks demonstrate that MagicMask competes with state-of-the-art methods under standard conditions and significantly outperforms them in extreme pose scenarios. The source code for the demonstration of MagicMask is attached as supplementary materials.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/yu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/yu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Explainable Dynamic Graph Neural Networks for Predictive Maintenance in Vehicle Chassis Systems</title>
        <description>Predictive maintenance is essential for commercial vehicle fleets to reduce unexpected downtime and emergency repair costs. While standardized fault codes (SPN/FMI, representing Suspect Parameter Numbers and Failure Mode Indicators) assist in diagnosis, their temporal and spatial inconsistency limits the effectiveness of conventional time-series models in identifying high-cost failures. We propose a Hybrid Node-level Relationship-based Graph Convolutional Network with Random Forest (NRP-GCN-RF), which encodes fault interactions as graphs to capture non-temporal dependencies. Built on a real-world dataset from Dongfeng Motor Corporation, our study follows a dual-task design. (1) We construct a predictive model using graph convolutional networks (GCN) and random forests (RF) to forecast emergency repair costs and fault categories based on chassis-level fault sequences. (2) In parallel, we apply the Apriori algorithm to mine frequent co-occurring SPN-FMI pairs, revealing interpretable fault patterns and subsystem-level dependencies. This interpretable analysis complements the graph-based model by supporting feature design and failure diagnostics. Experiments show that our approach achieves 98.93% accuracy, raises high-cost failure precision from 60% to 95%, and improves recall by 25%, offering a robust and explainable solution for predictive maintenance in commercial fleets.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/you25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/you25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Sampling Boundary for Causal Effect Estimation</title>
        <description>In causal effect estimation, determining the appropriate sampling size is critical for ensuring reliability and validity in both experimental and observational studies, a challenge closely tied to robust model generalization under limited data conditions in machine learning. This paper tackles these challenges by leveraging the Probably Approximately Correct (PAC) theory to establish a theoretically grounded framework for determining sampling boundaries. We utilize Hoeffding’s inequality and Vapnik–Chervonenkis (VC) dimension to set upper boundaries for dataset adequacy in diverse scenarios: no confounders, confounders with a finite hypothesis space, and confounders with an infinite hypothesis space. Our work ensures that if the dataset size exceeds the upper boundary, the error probability for the estimated causal effect stays within a specified threshold at the given confidence level. Additionally, we demonstrate that when the dataset size is inadequate, the error of the estimated average treatment effects is bounded by the estimation of the outcome variable, which forms the theoretical basis for data augmentation strategies to improve the accuracy of causal effect estimation. Extensive experiments on synthetic and semi-synthetic datasets validate the correctness of our derived sampling upper bounds under different error and confidence level constraints. Our findings not only offer a systematic and reliable method for determining sample size in causal effect estimation but also provide actionable guidance for developing causal inference models in data-scarce environments, enhancing their applicability and robustness across fields such as healthcare, social sciences, and policy evaluation.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/yin25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/yin25a.html</guid>
        
        
      </item>
    
      <item>
        <title>SEMINAR: SEMantic InformatioN Augmented JailbReak Attack in LLM</title>
        <description>Large Language Models (LLMs) have been widely adopted in real-world applications, yet their safety remains a major concern, particularly regarding jailbreak attacks that bypass alignment safeguards to elicit harmful outputs. Among various attack strategies, optimization-based jailbreak attacks have emerged as a primary approach by designing specialized loss functions to optimize adversarial suffixes added after the harmful question. However, existing methods often suffer from poor generalization and over-refusal issues due to overly fixed optimization targets, which significantly undermine the utility of jailbreak attempts by yielding generic denials (e.g., &quot;Sorry, I can’t assist with that&quot;) rather than harmful completions. These issues fundamentally stem from the rigid exact match constraint in their loss design. To address this, we propose SEMINAR, a novel semantic information-augmented optimization framework that promotes diverse and semantically aligned affirmative responses. Specifically, we leverage semantic-level supervision to guide the optimization toward intent-consistent outputs rather than rigid templates by introducing a non-exact match loss based on semantic similarity. Furthermore, we mitigate the token shift problem—an LLM’s generation depends heavily on the correctness of the first few tokens, yet the loss is averaged over the entire sequence, so the early tokens receive insufficient attention during optimization—by introducing a cosine decay scheduling mechanism that emphasizes the early tokens of the sequence in the optimization process. As a result, SEMINAR not only enhances the diversity of affirmative responses generated by LLMs but also significantly improves overall attack effectiveness. Extensive experiments demonstrate the superiority of SEMINAR over baseline methods, along with its strong transferability across different models.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/yang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/yang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Local Shuffled Skeleton Position Embedding Vision Transformer for Human Activity Recognition</title>
        <description>Vision Transformers (ViTs) in human activity recognition tasks suffer from inadequate spatial modeling through conventional position embeddings, leading to over-reliance on fixed positional information. This paper proposes Shuffled Positional Embedding (SPE), a mechanism that randomly disrupts the order of positional encoding during each forward propagation, reducing model dependence on position embedding and encouraging exploration of intrinsic spatial relationships. While SPE enhances general spatial awareness, it lacks targeted guidance for human-centric modeling. To address this limitation, Local Shuffled Skeleton Position Embedding (LSSPE) is developed, which leverages 2D skeleton data to provide human body structure-aware spatial representation. LSSPE computes attention weights based on spatial distances between image patches and skeleton keypoints, incorporating joint motion amplitudes for enhanced modeling. To further utilize skeleton data, a dual-stream architecture is designed combining TimeSFormer with LSSPE (LSSPE-TimeSFormer) for RGB processing and SkateFormer for skeleton processing. The proposed dual-stream model achieves outstanding performance of 95.8% and 98.7% accuracy on NTU RGB+D cross-subject and cross-view settings, establishing the effectiveness of skeleton-aware position embedding for human activity recognition.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/yan25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/yan25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Learning Curves of Classification Metrics based on  Confusion Matrices</title>
        <description>Learning curves of classification metrics, including test error, precision (P), recall (R), and the F$_1$ score, with regard to training set size are a recent hot topic in developing an advanced methodology of model selection and hyperparameter optimization. Existing studies have concentrated on formulating the functional shapes of the well-behaved learning curves of test error by using a normality assumption. However, the normality assumption is unreasonable for learning curves of classification metrics because the distributions of most classification metrics, such as P, R, and the F$_1$ score, are skewed, and interval estimations of the metrics based on the normality assumption may exceed [0,1]. In this study, considering that most classification metrics are obtained from confusion matrices, we develop a novel method to formulate the learning curves of classification metrics by assuming that the four entries in a confusion matrix jointly follow a multinomial distribution rather than a normal distribution. Furthermore, the function of each entry in a confusion matrix with regard to training set size is formulated with an exponential form. Thus, the learning curve of a classification metric can be naturally obtained by transforming the functions of a confusion matrix in terms of the definition of the metric. Moreover, reasonable confidence bands of several popular metrics, including test error, P, R, and the F$_1$ score, are derived in this study based on the assumed multinomial distribution of a confusion matrix. Extensive experiments are conducted on several synthetic and real-world data sets coupled with multiple typical non-neural and neural classification algorithms. Experimental results illustrate the improvements of the proposed learning curves of test error, P, R, and the F$_1$ score and the superiority of the confidence bands.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/xue25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/xue25a.html</guid>
        
        
      </item>
    
      <item>
        <title>EIKEA: Enhancing In-Context Knowledge Editing by Agents</title>
        <description>Recent knowledge editing methods have predominantly concentrated on modifying structured triplet knowledge within large language models. Compared to triplet-based knowledge, unstructured knowledge contains richer and more interrelated information, which increases the difficulty of editing. When relying solely on parameter-based editing methods, similar knowledge may interfere with each other due to their semantic overlap. Although previous studies have shown that directly applying in-context editing to unstructured knowledge yields better results than parameter-based approaches, there is still considerable room for improvement. Previous studies have also found that large language models are highly sensitive to the sequence of long text information, and even the core content of the text may be masked due to positional influence. This indicates that, after rewriting unstructured facts, LLMs (Large Language Models) are better able to process and utilize the rewritten facts than the original facts. Inspired by this idea, we propose EIKEA (Enhancing In-Context Knowledge Editing by Agents), a novel method that combines a rewriting agent with IKE (In-Context Knowledge Editing), enabling language models to effectively internalize unstructured factual updates without modifying model parameters. We conduct comprehensive experiments on the WIKIUPDATE subset of the AKEW benchmark, demonstrating that our method significantly improves editing accuracy over baseline IKE and parameter-editing methods. Our method provides a practical, lightweight, and scalable solution to unstructured knowledge editing.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/xu25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/xu25c.html</guid>
        
        
      </item>
    
      <item>
        <title>MKD: Multi-Knowledge Distillation for Real-Time Object Detection on Edge Devices</title>
        <description>Real-time object detection on resource-constrained edge devices presents a significant challenge in balancing performance and efficiency. This paper introduces a novel knowledge distillation framework designed to enhance the capabilities of lightweight student models for object detection tasks. Our approach, Multi-Scale Frequency-Aware Distillation (MSFAD), integrates three key components: multi-scale distillation, frequency domain mask distillation, and feature alignment distillation. Multi-scale distillation enables the student to learn feature representations at various levels of granularity. Frequency domain mask distillation improves the student’s ability to focus on relevant regions. Feature alignment distillation facilitates the transfer of channel-wise knowledge from teacher to student. We combine these techniques with a traditional detection loss to form a comprehensive loss function, balanced by a hyperparameter $\alpha$. Experimental results across various scenarios demonstrate that MSFAD significantly improves detection accuracy while reducing computational and storage costs. Our approach clearly presents significant performance gains and faster inference speeds.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/xu25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/xu25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Overcoming Domain Knowledge Forgetting in Continual Test-Time Adaptation via Siamese Networks</title>
        <description>Test-Time Adaptation (TTA) requires adapting a source-domain model to the target domain using online test data inputs. Existing methods that focus on adjusting normalization layers to swiftly adapt to a new domain often neglect the problem of domain knowledge forgetting, which hinders the model’s generalization capability. To address this, we propose a novel Anti-forgetting Test-time Adaptation Network (ATAN) which consists of three Siamese networks—Forerunner, Bridge and Momentum. The bridge network transfers domain-specific knowledge from the forerunner network to the momentum network which effectively overcomes forgetting by integrating cross-domain knowledge. To further enhance the adaptability of the forerunner network, we propose reconstructing its loss function based on the voting information from the Siamese networks. To strengthen the learning of domain-invariant features, we introduce a weak augmentation consistency loss for the bridge network. Extensive experiments on corruption and natural shift datasets demonstrate the effectiveness and generalization of ATAN in long-term test-time domain adaptation scenarios.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/xu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/xu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>CrossPyEval: Enhancing LLM-based Evaluation of Low-Resource Code via Code Translation</title>
        <description>Large language models (LLMs) have demonstrated remarkable performance in code generation and evaluation tasks, particularly for Python, which dominates the pre-training corpora. However, the evaluation of code in low-resource programming languages remains challenging due to limited data and suboptimal model alignment. In this paper, we propose CrossPyEval, a novel cross-language code evaluation framework that uses an LLM to translate code from other languages into Python, verifies consistency with an SMT solver, and then analyzes the translated code via abstract syntax trees before performing the final evaluation. Experiments on public benchmarks and our custom low-resource datasets demonstrate that CrossPyEval substantially boosts evaluation accuracy for non-Python languages, achieving up to an 8.83% improvement, and significantly enhances alignment with human judgments, with the Kendall correlation rising to as high as 0.689.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wu25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wu25c.html</guid>
        
        
      </item>
    
      <item>
        <title>Data-dependent Algorithmic Robustness Analysis of Pairwise Learning</title>
        <description>This paper develops a new framework to understand generalization for pairwise learning problems, which covers many popular machine learning problems as specific examples. By integrating robust optimization principles with pairwise loss structures, we establish data-dependent generalization bounds that significantly improve over existing approaches. Our method overcomes key limitations of prior work by leveraging observable training data properties rather than restrictive theoretical assumptions. This results in tighter performance guarantees that better reflect real-world learning behavior, particularly for complex datasets with dependent training pairs.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wu25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wu25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Model</title>
        <description>Vision Language Models (VLMs) have demonstrated impressive capabilities in integrating visual and textual information for understanding and reasoning, but remain highly vulnerable to adversarial attacks. While activation steering has emerged as a promising defense, existing approaches often rely on task-specific contrastive prompts to extract harmful directions, which exhibit suboptimal performance and can degrade visual grounding performance. To address these limitations, we propose Sequence-Level Preference Optimization for VLM (SPO-VLM), a novel two-stage defense framework that combines activation-level intervention with policy-level optimization to enhance model robustness. In Stage I, we compute adaptive layer-specific steering vectors from diverse data sources, enabling generalized suppression of harmful behaviors during inference. In Stage II, we refine these steering vectors through a sequence-level preference optimization process. This stage integrates automated toxicity assessment, as well as visual-consistency rewards based on caption-image alignment, to achieve safe and semantically grounded text generation. The two-stage structure of SPO-VLM balances efficiency and effectiveness by combining a lightweight mitigation foundation in Stage I with deeper policy refinement in Stage II. Extensive experiments show that SPO-VLM enhances safety against attacks via activation steering and preference optimization, while maintaining strong performance on benign tasks without compromising visual understanding capabilities. We will release our code, model weights, and evaluation toolkit to support reproducibility and future research. Warning: This paper may contain examples of offensive or harmful text and images.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Kernel-Based Enhanced Oversampling Method for Imbalanced Classification</title>
        <description>This paper introduces a novel oversampling technique designed to improve classification performance on imbalanced datasets. The proposed method, Kernel-Weighted SMOTE (KWSMOTE), enhances the traditional SMOTE algorithm by employing a kernel-based weighting scheme to prioritize closer neighbors, which guides a convex combination that ensures the generated samples are geometrically bounded. This dual-mechanism approach generates synthetic samples that better represent the minority class. Through experiments on multiple real-world datasets, we demonstrate that KWSMOTE outperforms existing methods in terms of F1-score, G-mean, and AUC, providing a robust solution for handling imbalanced datasets in classification tasks.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wenjie25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wenjie25a.html</guid>
        
        
      </item>
    
      <item>
        <title>FeelNet: A Lightweight Fast Fourier Transform EEG-based Emotion Recognition Network</title>
        <description>Emotion recognition using Electroencephalography (EEG) is challenging due to low signal-to-noise ratios and high-dimensional sparsity. We propose FeelNet, a novel Fast Fourier Transform (FFT)-based architecture that simultaneously extracts global and local features across joint frequency-time domains. FeelNet incorporates an adaptive Rhythm Spectral Block (RSB) for capturing key frequency patterns and filtering task-irrelevant noise through power spectral thresholding. Additionally, the Multi-scale Temporal Conv Block (MTCB) enhances the model’s ability to decode complex temporal dynamics. Extensive evaluations on the DEAP and DREAMER datasets demonstrate that FeelNet outperforms existing state-of-the-art methods in accuracy and flexibility, even under noise-contaminated conditions. Owing to its computational efficiency and noise resilience, FeelNet provides an alternative perspective for EEG-based affective computing.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wang25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wang25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Relaxed Transition Kernels can Cure Underestimation in Adversarial Offline Reinforcement Learning</title>
        <description>Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and true environment—particularly in the state transition kernel—can degrade policy performance. To simulate environment shifts without being overly conservative, we introduce a relaxed state-adversarial method that perturbs the policy while applying a controlled relaxation mechanism. This method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a performance lower bound; empirically, we show improved results across challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains like Adroit and AntMaze.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/wang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/wang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Beyond UDA: Examining Temporal and Frequency Representations in Time Series Transfer</title>
        <description>In time-series unsupervised domain adaptation (UDA), the adaptation between temporal and frequency domain features has been relatively underexplored. To address this gap, we conduct a comprehensive series of experiments to revisit the roles of these domains in UDA. Our findings reveal that the temporal domain contains more diverse features, offering higher discriminability, while the frequency domain is more domain-invariant, providing better transferability. Combining the strengths of both domains, we propose TF-DAN, a UDA framework that synergistically integrates temporal and frequency domain features. TF-DAN enhances feature extraction and captures subtle, class-specific features without relying on traditional alignment strategies. By utilizing simple hyperparameter adjustments and using frequency embeddings from the source domain as reference points for domain adaptation, TF-DAN achieves nearly a 10% improvement across five benchmark datasets in time-series UDA. This research highlights the unique strengths of both domains and marks a paradigm shift in UDA methods, showcasing TF-DAN’s robust performance in real-world applications. Code can be found in the supplementary material.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/tsao25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/tsao25a.html</guid>
        
        
      </item>
    
      <item>
        <title>TLSD: Breaking the Limit of Topological Lane Mapping with Graph Knowledge and Distance Awareness</title>
        <description>High-Definition (HD) maps are essential for both Advanced Driver-Assistance Systems (ADAS) and autonomous driving. However, offline HD map construction remains costly and challenging to maintain due to the dynamic nature of real-world environments. Consequently, online HD map generation using onboard sensors has become a key area of research. Despite recent advancements, existing deep learning-based methods often provide inaccurate output even using computationally heavy architectures, limiting their practicality for real-world applications. We introduce TLSD, an efficient end-to-end neural network that generates HD maps, incorporating both topological and geometric road information. To enhance both accuracy and efficiency, we introduce four key innovations: (1) an iterative refinement scheme within the decoder to progressively improve map predictions, (2) a group-wise one-to-many assignment strategy that accelerates training convergence, (3) a graph neural network (GNN) module that integrates lane segment coordinates for improved spatial reasoning, and (4) a distance-aware topological post-processing method that enhances the quality of connectivity outputs. We performed extensive experiments on the widely used OpenLane-V2 benchmark and showed that TLSD achieves a significant improvement in OLUS score compared to existing methods, setting a new state of the art and producing accurate HD maps and connectivity graphs. In particular, TLSD outperforms previous methods on the lane segment perception task (+3.13 in OLUS) and the lane centerline perception task (+3.20 in OLS), demonstrating superior performance in lane-based HD map generation. In addition, we introduce an efficient version, eTLSD, which incorporates a lightweight ResNet-18 backbone and still achieves competitive results, outperforming previous ResNet-50-based methods.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/trong25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/trong25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Label-Perceptive Adversarial Domain Adaptation for Named Entity Recognition in Traditional Chinese Medicine: Dataset and Approach</title>
        <description>In the field of Traditional Chinese Medicine (TCM), Named Entity Recognition (NER) is a crucial task. However, the scarcity of NER datasets in TCM significantly hampers the performance of models in this domain. A promising approach to addressing this low-resource issue is through domain adaptation techniques. Current domain adaptation methods typically leverage large amounts of labeled data from a source domain to bridge the gap between the source and target domains, making the features of the generated target domain data as similar as possible to those of the source domain, thereby enhancing model performance in the target domain. However, existing methods primarily focus on aligning textual features and neglect the importance of label information. In the NER task, labels not only indicate categories but also carry important categorical information. Therefore, this paper proposes a Label-Perceptive Adversarial Domain Adaptation (LPADA) method that integrates label information with textual features, providing additional contextual information for the domain adaptation process, thus enhancing the model’s performance in the TCM domain. Furthermore, we annotate medical case records to construct a dataset, TCMNER2024, and establish a baseline. The TCMNER2024 dataset can be accessed via https://github.com/TCMNER/TCMNER2024. The evaluation demonstrates that our approach significantly outperforms existing methods.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/tong25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/tong25a.html</guid>
        
        
      </item>
    
      <item>
        <title>CAP: Conformalized Abstention Policies for Context-Adaptive Risk Management for LLMs and VLMs</title>
        <description>Large Language and Vision-Language Models (LLMs/VLMs) are increasingly deployed in high-stakes domains where predictive failures can be costly. Conformal Prediction (CP) offers distribution-free uncertainty quantification with finite-sample coverage guarantees, but its reliance on a globally fixed risk level enforces a uniform trade-off between coverage and informativeness, misaligned with the instance-specific uncertainty patterns of modern foundation models. We propose Conformalized Abstention Policies (CAP), a novel framework that integrates CP with deep Reinforcement Learning (RL) to learn per-instance abstention policies. CAP trains a utility-driven policy to dynamically select the conformal risk level for each input, balancing point prediction, set prediction, and full abstention based on downstream utility. We specifically introduce Policy-Calibrated Coverage, a theoretical guarantee ensuring that the empirical coverage of the learned policy reliably estimates its true expected performance. Extensive experiments show that CAP maintains the 90% target coverage while substantially outperforming static CP baselines: improving hallucination detection AUROC by up to 22.2%, uncertainty-guided selective generation AUARC by 21.2%, and reducing calibration error by over 70%. CAP also extends to free-form generation by managing the trade-off between a detailed and factual response on a per-instance basis by learning an optimal risk level for sub-claim retention.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/tayebati25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/tayebati25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Target Return Optimizer for Multi-Game Decision Transformer</title>
        <description>Achieving autonomous agents with robust generalization capabilities across diverse games and tasks remains one of the ultimate goals in AI research. Recent advancements in transformer-based offline reinforcement learning, exemplified by the Multi-Game Decision Transformer, have shown remarkable performance across various games or tasks. However, these approaches depend heavily on human expertise, presenting substantial challenges for practical deployment, particularly in scenarios with limited prior game-specific knowledge. In this paper, we propose an algorithm called Multi-Game Target Return Optimizer (MTRO) to autonomously determine game-specific target returns within the Multi-Game Decision Transformer framework using solely offline datasets. MTRO addresses the existing limitations by automating the target return configuration process, leveraging environmental reward information extracted from offline datasets. Notably, MTRO does not require additional training, enabling seamless integration into existing Multi-Game Decision Transformer architectures. Our experimental evaluations on Atari games demonstrate that MTRO enhances the performance of RL policies across a wide array of games, underscoring its potential to advance the field of autonomous agent development.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/tatematsu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/tatematsu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference</title>
        <description>The increasing context window size in large language models (LLMs) has improved their ability to handle complex, long-text tasks. However, as conversation rounds continue, a large amount of KV cache must be stored in GPU memory, which significantly affects the efficiency and even the availability of model serving systems. This paper analyzes dialogue data from real users at the granularity of rounds and discovers that LLM inference manifests a watershed layer, after which the distribution of round-level attention shows notable similarity. Based on this, we propose Round Attention - a novel round-level attention mechanism that selectively processes the KV cache of the top-k relevant rounds, where k is dynamically determined through the attention matrix in the watershed layer. Theoretical analysis demonstrates that our method reduces memory usage by 54% to 82%, while experimental results confirm that loading the sparse critical-round KV cache maintains answer accuracy without performance degradation.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/tang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/tang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Graph Mediator Networks Bridging Local and Global Semantics via Serial Message Passing</title>
        <description>Graph Neural Networks (GNNs) have achieved remarkable success in modeling structured data through local message passing. However, their effectiveness diminishes on graphs with low homophily or irregular structures, where long-range dependencies are hard to capture and features tend to suffer from over-smoothing and noise amplification. To address these limitations, we propose GMN, a novel dual-path Graph Mediator Network that explicitly enhances both global information propagation and spectral stability. In the spatial path, GMN introduces a lightweight Mediator node connected to all graph nodes, allowing long-range communication to occur in a single hop without increasing network depth. In parallel, the spectral path leverages multi-scale Chebyshev filtering along with a spectral energy regularization term that suppresses high-frequency noise, leading to smoother and more stable node embeddings. These two complementary pathways are adaptively integrated via a gated fusion mechanism, which dynamically balances their contributions based on structural context. Final graph-level representations are obtained through task-specific pooling strategies, enabling GMN to generalize effectively across different tasks. Extensive experiments on benchmark datasets with varying homophily levels and structural perturbations demonstrate that GMN consistently achieves state-of-the-art performance in terms of accuracy, robustness, and generalization. Code is available at: https://github.com/sun2017bupt/GMN.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/sun25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/sun25a.html</guid>
        
        
      </item>
    
      <item>
        <title>FIRM: Fusion-Injected Residual Memory Brings Token-Level Alignment to Unsupervised VI-ReID</title>
        <description>Unsupervised visible-infrared person re-identification (VI-ReID) presents unique challenges due to severe modality discrepancies, including heterogeneous appearance gaps, semantic granularity mismatches, and pseudo-label noise amplification intrinsic to label-free scenarios. We distill these challenges into two core problems: fine-grained semantic alignment, which necessitates explicit token-level cross-modal feature fusion, and memory fragmentation caused by noisy pseudo-label propagation. To address these issues, we propose Fusion-Injected Residual Memory (FIRM), a unified framework that integrates Vision–Semantic Prompt Fusion (VSPF), which injects multi-scale textual cues derived from CLIP and large language models into multiple layers of a vision backbone for token-wise semantic alignment, and Evolving Multi-view Cluster Memory (EMCM), which employs optimal transport–guided clustering and dynamic prototype maintenance to ensure long-term identity consistency. The framework is optimized end-to-end using an optimal transport–weighted InfoNCE loss, a multi-layer alignment regularizer, and geometric cluster regularization, all without reliance on manual annotations. Extensive experiments on benchmark VI-ReID datasets demonstrate that the proposed method substantially advances unsupervised cross-modal retrieval performance, achieving new state-of-the-art results. Ablation studies further verify the independent and synergistic effectiveness of both modules in overcoming the identified core challenges.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/rong25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/rong25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer</title>
        <description>Quantization-aware training (QAT) combined with knowledge distillation (KD) is a promising strategy for compressing Artificial Intelligence (AI) models for deployment on resource-constrained hardware. However, existing QAT-KD methods often struggle to balance task-specific (TS) and distillation losses due to heterogeneous gradient magnitudes, especially under low-bit quantization. We propose Game of Regularizer (GoR), a novel learnable regularization method that adaptively balances TS and KD objectives using only two trainable parameters for dynamic loss weighting. GoR reduces conflict between supervision signals, improves convergence, and boosts the performance of small quantized models (SQMs). Experiments on image classification, object detection (OD), and large language model (LLM) compression show that GoR consistently outperforms state-of-the-art QAT-KD methods. On low-power edge devices, it delivers faster inference while maintaining full-precision accuracy. We also introduce QAT-EKD-GoR, an ensemble distillation framework that uses multiple heterogeneous teacher models. Under optimal conditions, the proposed EKD-GoR can outperform full-precision models, providing a robust solution for real-world deployment.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/rehman25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/rehman25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Information-Based Exploration via Random Features for Reinforcement Learning</title>
        <description>Representation learning has enabled classical exploration strategies to be extended to deep Reinforcement Learning (RL), but often makes algorithms more complex and theoretical guarantees harder to establish. We introduce Random Feature Information Gain (RFIG), grounded in Bayesian kernel methods theory, which uses random Fourier features to approximate information gain and compute exploration bonuses in non-countable spaces. We provide error bounds on the information gain approximation and, for optimism-based exploration, avoid the black-box aspects of neural network-based uncertainty estimation. We present practical details that make RFIG scalable to deep RL scenarios, enabling smooth integration into standard deep RL algorithms. Experimental evaluation across diverse control and navigation tasks demonstrates that RFIG achieves competitive performance with well-established deep exploration methods while offering superior theoretical interpretation.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/radji25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/radji25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning</title>
        <description>We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given $N’_k$ i.i.d. samples from each source distribution $\\nu’_k$, and the true target distributions $\\nu_k$ lie within a known distance bound $d_k(\\nu_k,\\nu’_k)\\le L_k$. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai–Robbins result to incorporate the transfer parameters $(d_k,L_k,N’_k)$. We then propose \\textsc\{KL-UCB-Transfer\}, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that \\textsc\{KL-UCB-Transfer\} significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/prevost25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/prevost25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Boundary-Aware Refinement with Environment-Robust Adapter Tuning for Underwater Instance Segmentation</title>
        <description>Underwater instance segmentation is a challenging task due to adverse visual conditions such as light attenuation, scattering, and color distortion, which severely degrade image quality and hinder model performance. In this work, we propose \\textbf\{BARD-ERA\}, a unified framework that integrates three novel components to address these challenges. First, the \\textbf\{Boundary-Aware Refinement Decoder (BARDecoder)\} improves mask quality through progressive feature refinement and lightweight upsampling using a Multi-Stage Gated Refinement Network and Depthwise Separable Upsampling. Second, the \\textbf\{Environment-Robust Adapter (ERA)\} enables efficient adaptation to underwater degradations by injecting environment-specific priors with over 90% fewer trainable parameters than full fine-tuning. Third, the \\textbf\{Boundary-Aware Cross-Entropy (BACE) loss\} enhances boundary supervision by leveraging range-null space decomposition. Together, these modules achieve state-of-the-art performance on the UIIS dataset, surpassing Mask R-CNN by 3.4 mAP with Swin-B and 3.8 mAP with ConvNeXt V2-B, while maintaining a compact model size. Our results demonstrate that BARD-ERA enables robust, accurate, and efficient segmentation in complex underwater scenes.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/pan25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/pan25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Domain Adaptation with Hybrid Modeling for Learning Dynamical Systems</title>
        <description>Domain shifts present significant challenges for data-driven modeling of dynamical systems, as they may reduce state prediction accuracy and degrade model-based control performance. Transfer learning is a promising way to mitigate the effect of changes in dynamics. In this study, we investigate a domain adaptation framework based on hybrid modeling with fine-tuning. Hybrid models integrate physics-based components derived from prior knowledge of system dynamics into neural ordinary differential equations (neural ODEs). They are expected to facilitate efficient and enhanced fine-tuning because the structures of physics parts often remain invariant under domain shifts. We evaluated the hybrid neural ODE approach through experiments on multicopters undergoing concept shifts and found that introducing physics models significantly enhanced the domain adaptation capabilities, even when the physics-based components included unidentified parameters. Moreover, the results demonstrated that the hybrid modeling strategy reduced the amount of data required in the target domain, enabling efficient domain adaptation.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/osaka25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/osaka25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size</title>
        <description>We theoretically analyzed the convergence behavior of Riemannian stochastic gradient descent (RSGD) and found that using an increasing batch size leads to faster convergence than using a constant batch size, not only with a constant learning rate but also with a decaying learning rate, such as cosine annealing decay and polynomial decay. The convergence rate improves from $O(T^\{-1\}+C)$ with a constant batch size to $O(T^\{-1\})$ with an increasing batch size, where $T$ denotes the total number of iterations and $C$ is a constant. Using principal component analysis and low-rank matrix completion, we investigated, both theoretically and numerically, how an increasing batch size affects computational time as quantified by stochastic first-order oracle (SFO) complexity. An increasing batch size was found to reduce the SFO complexity of RSGD. Furthermore, an increasing batch size was found to offer the advantages of both small and large constant batch sizes.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/oowada25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/oowada25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence</title>
        <description>Maximizing expected reward while minimizing its risk when making decisions under uncertain environments is a ubiquitous problem in many fields. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the \\textit\{mean-variance\} (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/nonaga25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/nonaga25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Emergence of the Primacy Effect in Structured State-Space Models</title>
        <description>Structured state-space models (SSMs) have been developed to offer more persistent memory retention than traditional recurrent neural networks, while maintaining real-time inference capabilities and addressing the time-complexity limitations of Transformers. Despite this intended persistence, the memory mechanism of canonical SSMs is theoretically designed to decay monotonically over time, meaning that more recent inputs are expected to be retained more accurately than earlier ones. Contrary to this theoretical expectation, however, the present study reveals a counterintuitive finding: when trained and evaluated on a synthetic, statistically balanced memorization task, SSMs predominantly preserve the *initially* presented data in memory. This pattern of memory bias, known as the *primacy effect* in psychology, presents a non-trivial challenge to the current theoretical understanding of SSMs and opens new avenues for future research.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/morita25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/morita25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Training Data Soft Selection via Joint Density Ratio Estimation</title>
        <description>This paper studies the training data selection problem, focusing on the selection of effective samples to improve model training using data affected by distributional shifts (i.e., data drifts). Existing drift-detection-based methods struggle with local drifts, while recent drift-localization-based methods lack theoretical support for the problem and are often ineffective. To tackle these issues, this paper proposes TSJD, a training data soft selection method based on joint density ratio estimation. TSJD assigns training weights (i.e., soft selects) to samples based on the estimated joint density ratio to align the selected data with the recent data distribution. By evaluating each sample independently of time, TSJD effectively addresses local data drifts. We also provide theoretical guarantees by deriving an upper bound on the generalization error for models trained with data selected by TSJD. In numerical experiments with four real-world datasets, TSJD shows great versatility, achieving the best or comparable results relative to baseline methods in all experiments.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/matsuno25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/matsuno25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation</title>
        <description>The paradigm shift toward structure-driven molecule generation has been propelled by advances in deep generative models, such as variational auto-encoders and diffusion models. However, these generative models for molecular design remain constrained by exposure bias, error accumulation, and suboptimal handling of activity cliffs. Here, we introduce DiffGap, a diffusion-based framework that integrates adaptive sampling and pseudo-molecule estimation to bridge the gap between training objectives and inference dynamics in 3D molecule generation. By dynamically aligning intermediate denoising steps with realistic generation trajectories, DiffGap enables the diffusion model to adapt to input biases in advance during the training phase. A temperature annealing module further controls the strength of the adaptive alignment process, ensuring stable learning of the data distribution. Evaluated on the CrossDocked2020 benchmark, DiffGap outperforms existing methods in docking scores and binding affinity, demonstrating superior fidelity in generating drug-like molecules. Our work establishes a principled approach to harmonizing generative training with inference mechanics, offering a robust computational toolkit for accelerating structure-based therapeutic discovery. The source code of DiffGap will be published after review.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/liu25e.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/liu25e.html</guid>
        
        
      </item>
    
      <item>
        <title>Towards Robust and Scalable Knowledge Editing in Text-to-Image Diffusion Models</title>
        <description>Knowledge editing in Text-to-Image (T2I) diffusion models aims to update specific factual associations without disrupting unrelated knowledge. However, existing methods often suffer from unintended collateral effects, where editing a single fact can alter the representation of non-target named entities and degrade generation quality for unrelated prompts; this becomes more severe in real-world, dynamic environments requiring frequent updates. To address this challenge, we introduce a novel editing framework supporting large-scale T2I knowledge editing. Our framework incorporates our proposed Entity-Aware Text Alignment (EATA) to penalize unintended changes in unaffected entities and employs a principled null-space projection strategy to minimize perturbations to existing knowledge. Experimental results demonstrate that our approach enables precise and robust large-scale T2I knowledge editing, preserves the integrity of unrelated content, and maintains high generation fidelity, while offering scalability for continuous editing scenarios.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/liu25d.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/liu25d.html</guid>
        
        
      </item>
    
      <item>
        <title>POIL: Preference Optimization for Imitation Learning</title>
        <description>Imitation learning (IL) enables agents to learn policies by mimicking expert demonstrations. While online IL methods require interaction with the environment, which is costly, risky, or impractical, offline IL allows agents to learn solely from expert datasets without any interaction with the environment. In this paper, we propose Preference Optimization for Imitation Learning (POIL), a novel approach inspired by preference optimization techniques in large language model alignment. POIL eliminates the need for adversarial training and reference models by directly comparing the agent’s actions to expert actions using a preference-based loss function. We evaluate POIL on MuJoCo control tasks and Adroit manipulation tasks. Our experiments show that POIL consistently delivers superior or competitive performance against prior state-of-the-art methods, including Behavioral Cloning (BC), IQ-Learn, MCNN, and O-DICE, especially in data-scarce scenarios such as using a single trajectory. These results demonstrate that POIL enhances data efficiency and stability in offline imitation learning, making it a promising solution for applications where environment interaction is infeasible and expert data is limited, even in high-dimensional and complex control tasks.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/liu25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/liu25c.html</guid>
        
        
      </item>
    
      <item>
        <title>Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs</title>
        <description>Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs), typically achieved by first locating key knowledge-storage modules and then modifying their parameters.  However, most existing methods focus exclusively on updating the weights of Multi-Layer Perceptron (MLP) modules, which are commonly identified as the primary repositories of factual information. Other important components, such as attention (Attn) modules—one of the core modules in LLMs—are often ignored during editing.  This biased allocation of updates can leave residual outdated knowledge in the model and limit the effectiveness of knowledge editing. In this paper, we conduct comprehensive and systematic knowledge localization experiments on advanced LLMs, revealing that Attn modules play a substantial role in factual knowledge storage and retrieval, especially in earlier layers.  Building on these insights, we propose \\textit\{IntAttn-Edit\}, a novel method that extends the associative memory paradigm to jointly update both MLP and Attn modules.  Our approach employs a knowledge balancing strategy that proportionally allocates update magnitudes based on each module’s measured contribution to knowledge storage.  Extensive experiments on popular benchmarks demonstrate that \\textit\{IntAttn-Edit\} consistently achieves superior results over existing methods, delivering higher edit success, improved generalization, and robust knowledge preservation. Further empirical analysis shows that our knowledge balancing strategy enables the editing performance to remain within the optimal range across different settings.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/liu25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/liu25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Iterative Selection with Self-Review for Vocabulary Test Distractor Generation</title>
        <description>Vocabulary acquisition is essential to second language learning, as it underpins all core language skills. Accurate vocabulary assessment is particularly important in standardized exams, where test items evaluate learners’ comprehension and contextual use of words. Previous research has explored methods for generating distractors to aid in the design of English vocabulary tests. However, current approaches often rely on lexical databases or predefined rules, and frequently produce distractors that risk invalidating the question by introducing multiple correct options. In this study, we focus on English vocabulary questions. We analyze how teachers design test items to gain insights into distractor selection strategies. Additionally, we identify key limitations in how large language models (LLMs) support teachers in generating distractors for vocabulary test design. To address these challenges, we propose the iterative selection with self-review (ISSR) framework, which makes use of an LLM-based self-review mechanism to ensure that the distractors remain valid while offering diverse options. ISSR aims to assist educators by providing an LLM-based tool that allows them to efficiently design pedagogically sound distractors through natural language instructions. Experimental results show that ISSR achieves promising performance in generating plausible distractors, and the self-review mechanism effectively filters out distractors that could invalidate the question.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/liu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/liu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>ReSa2: A Two-Stage Retrieval-Sampling Algorithm for Negative Sampling in Dense Retrieval</title>
        <description>Negative sampling algorithms are critical for training dense retrievers, which in turn impact retrieval performance in information systems. Among these, hard negative sampling is of great value, and denoised negative sampling methods in particular. By strategically selecting relevant negative samples, these methods effectively enhance model training. However, they are either restricted to single-stage retrieval, failing to fully explore potentially effective negatives, or demand additional training for a filter, which compromises sampling efficiency. To address this issue, this paper introduces a two-stage Retrieval-Sampling Algorithm (ReSa2). It integrates document vector-based retrieval to refine candidate selection progressively while preserving semantic relevance. In Stage 1, ReSa2 uses query vectors for broad retrieval, generating a candidate subset from the corpus to narrow the search space. In Stage 2, it reuses the retriever to perform positive-centric retrieval within this subset, leveraging positive sample vectors to re-rank candidates and enrich hard negatives with semantic similarity to the query. Throughout the process, the effect is further enhanced by conducting probability-weighted sampling on the candidate subset. Analysis experiments on 40,000 query-sample pairs show that ReSa2 suppresses false negatives by 69.1% compared to Top-K sampling. Specifically, on the MS MARCO Passage dataset, it outperforms the state-of-the-art by 1.2% in MRR@10 and 0.5% in R@1000. Notably, an external validation on Natural Questions (an unseen domain) demonstrates that ReSa2 maintains robust performance when trained on MS MARCO, highlighting its generalization capability across diverse retrieval scenarios. Ablation experiments validate the complementary roles of the two stages. Our code and appendix are released at https://github.com/ad32q/ReSa2.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/li25d.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/li25d.html</guid>
        
        
      </item>
    
      <item>
        <title>TAEGAN: Revisit GANs for Tabular Data Generation</title>
        <description>Synthetic tabular data generation has gained significant attention for its potential in data augmentation and privacy-preserving data sharing. While recent methods like diffusion and auto-regressive (e.g., transformer-based) models have advanced the field, generative adversarial networks (GANs) remain highly competitive due to their training efficiency and strong data generation capabilities. In this paper, we introduce the Tabular Auto-Encoder Generative Adversarial Network (TAEGAN), a novel GAN-based framework that leverages a masked auto-encoder as the generator. TAEGAN is the first to incorporate self-supervised warmup training of the generator into tabular GANs. It enhances GAN stability and exposes the generator to richer information beyond the discriminator’s feedback. Additionally, we propose a novel sampling method tailored for imbalanced or skewed data and an improved loss function to better capture data distributions and correlations. We evaluate TAEGAN against seven state-of-the-art synthetic tabular data generation algorithms. Results on eight datasets show that TAEGAN outperforms all baselines on five datasets, achieving a 27% overall utility boost over the best-performing baseline while maintaining a model size less than 5% of that baseline. Code is available at: https://github.com/BetterdataLabs/taegan.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/li25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/li25c.html</guid>
        
        
      </item>
    
      <item>
        <title>High-Order Consistency-Guided User Identity Linkage with Large Language Model</title>
        <description>With the rapid expansion of the Internet, people commonly maintain multiple account identities across different online platforms, creating latent cross-network associations. User Identity Linkage (UIL), which seeks to identify and associate multiple accounts belonging to the same individual across platforms, has emerged as a vital research direction with broad applications in cross-platform recommendation, unified user profiling, and so on. However, existing methods face two major challenges in real-world environments: cross-platform feature heterogeneity and attribute-structure representation fusion. To address these challenges, this paper proposes UIL-HC-MV, a Multi-View Feature High-Order Consistency-Guided User Identity Linkage method. Our approach mitigates cross-network heterogeneity by deeply integrating multi-view features and mining consistency in shared thematic information among users and their relational networks. We decompose cross-platform feature heterogeneity into two subproblems: attribute heterogeneity and structural heterogeneity. We first fuse attribute and structural views by coupling nodes’ random-walk sequences with neighborhood sampling to jointly extract node attributes and topological context. We then employ a Large Language Model to capture deep semantic information and contextual relationships across multiple text segments, distilling unified themes or high-order community features from the combined attribute-structure representation. Finally, we fine-tune a BERT model on the extracted high-order information to reinforce feature consistency and enable transfer learning for improved generalization. Extensive comparative experiments on real-world datasets demonstrate significant performance improvements over existing mainstream methods, validating the effectiveness of high-order information in alleviating cross-network heterogeneity and confirming the contribution of each component within our deeply integrated multi-view feature learning framework.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/li25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/li25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity</title>
        <description>This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising from LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting, as they rely heavily on the assumption of abundant capacities. This paper presents a rather complete solution to this setting, making three key contributions. We establish a minimax lower bound for the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. We derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. We introduce an efficient exploration algorithm named \\texttt\{PC-CapUL\} for the capacity-scarce setting; \\texttt\{PC-CapUL\} matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/li25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/li25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection</title>
        <description>Out-of-distribution (OOD) detection has seen significant advancements with zero-shot approaches that leverage powerful Vision-Language Models (VLMs) such as CLIP. However, prior research has predominantly focused on enhancing Far-OOD performance while potentially compromising Near-OOD efficacy, as observed in our pilot study. To address this issue, we propose a novel strategy to enhance zero-shot OOD detection performance in both Far-OOD and Near-OOD scenarios by innovatively harnessing Large Language Models (LLMs) and VLMs. Our approach first exploits an LLM to generate superclasses of the ID labels and their corresponding background descriptions, followed by feature extraction using CLIP. We then isolate the core semantic features of ID data by subtracting background features from the superclass features. The refined representation facilitates the selection of more appropriate negative labels for OOD data from a comprehensive candidate label set drawn from WordNet, thereby enhancing the performance of zero-shot OOD detection in both scenarios. Furthermore, we introduce novel few-shot prompt tuning and visual prompt tuning to adapt the proposed framework to better align with the target distribution. Experimental results demonstrate that the proposed approach consistently outperforms current state-of-the-art methods across multiple benchmarks, with an improvement of up to 2.9% in AUROC and a reduction of up to 12.6% in FPR95. Additionally, our method exhibits superior robustness against covariate shift across different domains, further highlighting its effectiveness in real-world scenarios.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/lee25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/lee25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Policy Iteration for Two-Player General-Sum Stochastic Stackelberg Games</title>
        <description>We address two-player general-sum stochastic Stackelberg games (SSGs), where the leader’s policy is optimized considering the best-response follower whose policy is optimal for its reward under the leader. Existing policy gradient and value iteration approaches for SSGs do not guarantee monotone improvement in the leader’s policy under the best-response follower. Consequently, their performance is not guaranteed when their limits are not stationary Stackelberg equilibria (SSEs), which do not necessarily exist. In this paper, we derive a policy improvement theorem for SSGs under the best-response follower and propose a novel policy iteration algorithm that guarantees monotone improvement in the leader’s performance.  Additionally, we introduce Pareto-optimality as an extended optimality of the SSE and prove that our method converges to the Pareto front when the leader is myopic.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/kudo25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/kudo25a.html</guid>
        
        
      </item>
    
      <item>
        <title>SAFE: Spiking Neural Network-based Audio Fidelity Evaluation</title>
        <description>Recent advances in generative AI have enabled the creation of highly realistic synthetic audio, which poses significant challenges in voice authentication, media verification, and fraud detection. While Artificial Neural Networks (ANNs) are frequently used for fake audio detection, they often struggle to generalize to unseen and complex manipulations, particularly partial fake audio, where real and synthetic segments are seamlessly combined. This paper explores the use of Spiking Neural Networks (SNNs) for fake and partial fake audio detection – an unexplored area. Taking advantage of the inherent energy efficiency and temporal processing capabilities of SNNs, we propose novel SNN-based architectures for both tasks. We perform comprehensive evaluations that include hyperparameter tuning, cross-data set generalization, noise robustness, and partial fake audio detection using multiple large-scale public audio datasets. Our results show that SNNs achieve performance comparable to state-of-the-art ANN models while showing better generalization capabilities and robustness to noise. These SNN-based approaches also resulted in additional advantages, such as reduced model sizes and the ability to classify individual segments, making them more suitable for resource-constrained and real-time voice authentication applications. This work lays a foundation for exploring SNNs as countermeasures against audio spoofing in security-critical applications.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/khant25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/khant25a.html</guid>
        
        
      </item>
    
      <item>
        <title>The Great Contradiction Showdown: How Jailbreak and Stealth Wrestle in Vision-Language Models?</title>
        <description>Vision-Language Models (VLMs) have achieved remarkable performance across various tasks. Unfortunately, due to their multimodal nature, a common jailbreak strategy transforms harmful instructions into visual formats like stylized typography or AI-generated images to bypass safety alignment. Despite numerous heuristic defenses, little research has investigated the underlying rationale behind the jailbreak. In this paper, we introduce an information-theoretic framework to explore the fundamental trade-off between attack effectiveness and stealthiness. Leveraging Fano’s inequality, we show that an attacker’s success probability intrinsically relates to the stealthiness of the generated prompts. We further propose an efficient algorithm to detect non-stealthy jailbreak attacks. Experimental results highlight the inherent tension between strong attacks and detectability, offering a formal lower bound on adversarial strategies and potential defense mechanisms.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/kao25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/kao25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum</title>
        <description>Stochastic gradient descent with momentum (SGDM), in which a momentum term is added to SGD, has been well studied in both theory and practice. The theoretical studies show that the settings of the learning rate and momentum weight affect the convergence of SGDM. Meanwhile, the practical studies have shown that the batch-size setting strongly affects the performance of SGDM. In this paper, we focus on mini-batch SGDM with a constant learning rate and constant momentum weight, which is frequently used to train deep neural networks. We show theoretically that using a constant batch size does not always minimize the expectation of the full gradient norm of the empirical loss in training a deep neural network, whereas using an increasing batch size definitely minimizes it; that is, an increasing batch size improves the convergence of mini-batch SGDM. We also provide numerical results supporting our analyses, indicating specifically that mini-batch SGDM with an increasing batch size converges to stationary points faster than with a constant batch size, while also reducing computational cost. Python implementations of the optimizers used in the numerical experiments are available at https://github.com/iiduka-researches/NSHB_increasing_batchsize_acml25/.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/kamo25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/kamo25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Multi-thresholding Good Arm Identification with Bandit Feedback</title>
        <description>We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i\\in[K]$ is associated with a distribution $\\mathcal\{D_\{i\}\}$ defined over $\\mathbb\{R\}^M$. In each round $t$, the player/algorithm pulls one arm $i_t$ and receives an $M$-dimensional vector feedback sampled according to $\\mathcal\{D_\{i_t\}\}$. The target is twofold: one is to find an arm whose means are larger than the predefined thresholds $\\xi_1,\\ldots,\\xi_M$ with confidence level $\\delta$ and accuracy rate $\\epsilon$ within a bounded sample complexity; the other is to output $\\bot$ to indicate that no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in previous work when $M=1$ and $\\epsilon = 0$, and we give novel bounds for $M &gt; 1$ and $\\epsilon &gt; 0$. The proposed algorithm attains better numerical performance than other baselines in experiments on synthetic and real datasets.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/jiang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/jiang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>KVCrush: Key Value Cache size-reduction using similarity in head-behaviour</title>
        <description>Key-value (KV) caching has emerged as a crucial optimization technique for accelerating inference in large language models (LLMs). By allowing the attention operation to scale linearly rather than quadratically with the total sequence length, KV caching significantly enhances generation throughput. However, due to the large context lengths of modern LLMs, the memory footprint of the KV cache is a major bottleneck for model deployment, directly limiting the model’s batch size and hindering its ability to deliver high throughput. Existing research addresses this challenge with techniques such as discarding low-attention tokens, quantization, and matrix approximation, which typically degrade model accuracy. In this paper, we propose KVCrush, a technique that can be combined with many KV compression technologies to improve model accuracy at a much smaller memory footprint. KVCrush provides an alternate representation scheme for key-value states, along with a low-overhead token pruning algorithm that accounts for the token distribution in the KV cache, which in turn allows for a smaller footprint while maintaining the accuracy of the model. Based on our results, KVCrush reduces the LongBench KV cache size by $4\\times$ with less than 1% accuracy drop and achieves state-of-the-art average accuracy with minimal overhead, incurring less than 0.5% total inference latency. KVCrush not only outperforms state-of-the-art importance-based token retention schemes in accuracy, but also integrates seamlessly with quantization, paging, and head-sharing techniques, requiring no retraining or architectural changes.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/jha25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/jha25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size</title>
        <description>Momentum methods were originally introduced for their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/imaizumi25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/imaizumi25a.html</guid>
        
        
      </item>
    
      <item>
        <title>GIIM: A Graph Information Integration Method for Chinese-Kazakh CLIR</title>
        <description>Chinese-Kazakh cross-lingual information retrieval (CLIR) aims to search relevant content from a collection of Kazakh documents using Chinese query statements. The intrinsic differences in grammar, vocabulary, and semantic expression between the languages pose significant challenges for semantic alignment in CLIR. Existing CLIR methods that incorporate a multilingual knowledge graph (MLKG) typically use simple vector-stacking approaches to integrate entity information, failing to leverage deeper entity relationships and semantic connections. To address these challenges, we propose GIIM, a graph information integration method for Chinese-Kazakh CLIR that leverages the rich multilingual entity information embedded in the MLKG as semantic bridges to narrow the linguistic gap during the query-document matching process. Unlike previous methods, GIIM unifies query-document pairs and entity information into a graph structure and employs a Graph Convolutional Network to aggregate both direct and multi-hop relations among entities, effectively modeling complex semantic paths and hierarchical knowledge propagation. To comprehensively evaluate GIIM, we construct CKIRD, a Chinese-Kazakh information retrieval dataset containing approximately 11,820 annotated query-paragraph pairs, and conduct experiments on both CKIRD and the public CLIRMatrix datasets. Experimental results show that GIIM outperforms existing baseline models across multiple ranking metrics, demonstrating its effectiveness on the Chinese-Kazakh CLIR task.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/hu25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/hu25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Enhanced Blind Image Restoration with Channel Attention Transformers and Multi-Scale Attention Prompt-based Learning</title>
        <description>Deep learning models today are indispensable tools for image compression and restoration. However, despite recent progress, many existing models lack generalization when facing different types and coding-strength designs of image restoration, which limits their practical application. In this paper, a novel approach called dual-Channel Transformers and Multi-scale attention Prompt learning (CTMP) is introduced to bridge this gap in blind image restoration. The prompt-based learning approach is employed in the model to address two key image restoration tasks: 1) compressed image artifact removal, and 2) image denoising. By utilizing adaptive prompts to accommodate varying quantization parameter (QP) values and noise conditions, and enhancing adaptability through the integration of multi-scale attention mechanisms, the advanced Transformer architecture in our model can tackle diverse image degradations in blind image restoration. Specifically, our Transformer module is improved by merging and harnessing the strengths of both channel attention and self-attention. The design is adept at extracting both high-frequency details and low-frequency structures, thereby significantly enhancing overall restoration performance. In experiments on the Kodak dataset, our model outperforms conventional deep learning techniques with a 2.44% BD-rate reduction in blind mode. It shows a 29.21% improvement over traditional JPEG compression and a 0.14 dB improvement in blind denoising. The experiments demonstrate that our approach can effectively train a single model for both compressed image artifact removal and image denoising. The code is publicly available on GitHub at https://github.com/gdit-ai/CTMP.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/hu25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/hu25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Dual Color Space Underwater Image Enhancement Network</title>
        <description>In the field of underwater image enhancement, existing methods generally rely heavily on the RGB color space and ignore the potential advantages of the perceptually uniform XYZ color space for color correction. Additionally, CNN-based methods are prone to losing long-distance dependency relationships during the local feature extraction process, which degrades image restoration quality. To address these issues, we propose DCSNet, an underwater image enhancement framework based on dual color spaces. The framework aims to break through the limitations of traditional methods by innovatively introducing a parallel processing mechanism for both the RGB and XYZ color spaces. By fully exploiting the perceptual linearity of the XYZ space, the model improves the color correction and brightness enhancement processes. Furthermore, we design a hybrid computing architecture that combines convolutional operations with a novel lightweight Transformer module. Through channel splitting and dimensionality reduction strategies, the computational complexity is reduced significantly while the ability to effectively model global contextual information is maintained. Experimental results show that DCSNet exhibits excellent enhancement performance in various underwater scenarios and delivers superior visual effects. Moreover, with its small number of model parameters, DCSNet can be deployed on embedded or edge devices for practical underwater visualization applications.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/he25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/he25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Deviation-based multiple coefficient item mixer for heterogeneous set-to-set matching</title>
        <description>Heterogeneous set-to-set matching tasks, such as fashion outfit recommendation, require permutation-invariant and dynamic item-wise transformations that bring compatible sets closer while pushing incompatible ones apart. While attention-based methods satisfy the permutation-invariance requirement, they often suffer from convex hull limitations due to their reliance on softmax-based dot-product operations. On the other hand, MLP-based methods like DuMLP-Pin avoid such constraints but tend to lose critical item-wise structure through global aggregation. To address these limitations, we propose DeviMix (Deviation-based multiple coefficient item Mixer), a novel MLP-based architecture that performs item-wise dynamic transformations. Our approach generates multiple item-mixing coefficients by applying MLPs to cross-deviation vectors computed from all possible item pairs in sets. Extensive experiments on fashion outfit and furniture coordination matching tasks demonstrate that DeviMix consistently outperforms attention-based and global pooling-based baselines, validating the effectiveness of our MLP-based item-wise aggregation using cross-deviation for heterogeneous set matching.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/hachiya25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/hachiya25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Kairos: Redefining Event and Time Prediction with Language Modeling</title>
        <description>Continuous time-event sequence (CTES) forecasting is essential across diverse domains, from healthcare to finance, requiring accurate prediction of both future event types and their timestamps. Traditionally, CTES forecasting has been driven by Temporal Point Processes (TPPs), which rely on intensity function-based priors. However, these methods often fail to generalize effectively to real-world scenarios and perform poorly over longer horizons. While recent diffusion-based approaches are promising, they are limited by the need to fix a prior during training, such as the number of events to forecast or the time horizon, which requires training multiple models for different horizon lengths. We present Kairos, a novel model that reformulates CTES forecasting as a language modeling task. Our model employs a decoder-only transformer architecture with a unified tokenization approach that represents time and events in a shared embedding space. By structuring the input as alternating event and time tokens, the model learns to capture the inherent temporal relationships between events. Through comprehensive experiments on multiple large-scale datasets, we demonstrate that Kairos consistently outperforms state-of-the-art baselines, achieving average improvements of 4.5% and 7.8% in short-term forecasting for events and times, respectively, and a 14.41% improvement in long-term forecasting. Additionally, we conduct extensive ablation studies and qualitative analysis to understand the inner workings of Kairos.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/gupta25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/gupta25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Multi-view Privileged Information-based Representation Learning for Liver Cancer Diagnosis</title>
        <description>Privileged information (PI) provides additional knowledge to improve performance. Though some efforts have been made on learning using privileged information (LUPI), they mainly focus on classifier-level LUPI and single-view PI tasks. It therefore remains a challenge to improve the main view through feature representation learning that transfers multi-view PI. In this paper, we propose a novel feature-level LUPI method for multi-view PI tasks, called the multi-view privileged information-based representation learning (MPIRL) algorithm, in which the multi-view PI and the main view are required at the training phase, but only the main view is available at the testing phase. MPIRL consists of a feature-level LUPI module and a classification module. The feature-level LUPI module of MPIRL adopts a multi-branch structure to transfer the multi-view privileged information to the main view, so that diverse and discriminative representations can be generated. For the classification module, a multi-view deep SVM (MDSVM) is developed, which combines a multi-channel deep neural network with an SVM in a unified framework. MDSVM further learns the fusion representation and the classification simultaneously to improve generalization performance. The experimental results on the dual-view PI tasks and multi-view PI tasks of a real-world multi-view liver cancer dataset show that the proposed MPIRL achieves superior performance with an accuracy of 86.92%, sensitivity of 89.58%, and specificity of 84.25%.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/gong25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/gong25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition</title>
        <description>Cross-domain facial expression recognition (CD-FER) remains difficult due to severe domain shift between training and deployment data. We propose the Graph-Attention Network with Adversarial Domain Alignment (GAT-ADA), a hybrid framework that couples a ResNet-50 backbone with a batch-level Graph Attention Network (GAT) to model inter-sample relations under shift. Each mini-batch is cast as a sparse ring graph so that attention aggregates cross-sample cues that are informative for adaptation. To align distributions, GAT-ADA combines adversarial learning via a Gradient Reversal Layer (GRL) with statistical alignment using CORAL and MMD. GAT-ADA is evaluated under a standard unsupervised domain adaptation protocol: training on one labeled source (RAF-DB) and adapting to multiple unlabeled targets (CK+, JAFFE, SFEW 2.0, FER2013, and ExpW). GAT-ADA attains 74.39% mean cross-domain accuracy. On RAF-DB to FER2013, it reaches 98.0% accuracy, corresponding to a 36-point improvement over the best baseline we re-implemented with the same backbone and preprocessing.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/ghaedi25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/ghaedi25a.html</guid>
        
        
      </item>
    
      <item>
        <title>SparseSegNet: A Boundary-Aware Lightweight Segmentation Architecture for Skin Lesions</title>
        <description>Accurate skin lesion segmentation is essential for the early diagnosis of dermatological conditions, including the timely detection of malignant skin cancers. Enabling such analysis on personal devices—such as smartphones—offers greater accessibility but introduces critical challenges related to computational constraints and privacy preservation. Performing segmentation directly on mobile edge devices avoids the need to transmit sensitive data to the cloud but requires models that are both lightweight and highly accurate. To this end, we propose SparseSegNet, an efficient segmentation framework that combines architectural simplicity with training-time innovations to enable real-time, on-device inference. SparseSegNet is built upon a Deep Layer Aggregation (DLA)-inspired encoder–decoder backbone, which effectively captures multi-scale lesion features while maintaining a compact model size. To further enhance boundary precision and generalization, we introduce a novel dual-teacher distillation strategy, termed Agreement-Guided Orthogonal Projection (AG-OP). This method transfers complementary spatial cues from two powerful vision foundation models: the Segment Anything Model (SAM), based on Vision Transformer-Huge (ViT-H), and the Segment Everything Everywhere Model (SEEM). Unlike traditional single-teacher distillation approaches, AG-OP encourages alignment between hard and soft pseudo-labels through orthogonal subspace projection, improving the robustness of the student model. We validate SparseSegNet across five public skin lesion segmentation benchmarks—ISIC 2017, ISIC 2018, PH$^2$, HAM10000, and Derm7pt—under a unified preprocessing and training pipeline. SparseSegNet achieves up to a 0.91 Dice coefficient, 0.85 Intersection-over-Union (IoU), and 38 ms latency with only 7 million parameters, outperforming recent compact models such as MobileSAM, CMUNeXt, and YOLOv8n-seg. Paired t-tests ($p &lt; 0.01$) confirm the statistical significance of our improvements. SparseSegNet thus presents a privacy-preserving, boundary-aware solution for real-time skin lesion analysis on edge devices.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/dasgupta25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/dasgupta25a.html</guid>
        
        
      </item>
    
      <item>
        <title>$δ$-STEAL: LLM Stealing Attack with Local Differential Privacy</title>
        <description>Large language models (LLMs) demonstrate remarkable capabilities across various tasks. However, their deployment introduces significant risks related to intellectual property. In this context, we focus on model stealing attacks, where adversaries replicate the behaviors of these models to steal services. These attacks are highly relevant to proprietary LLMs and pose serious threats to revenue and financial stability. To mitigate these risks, the watermarking solution embeds imperceptible patterns in LLM outputs, enabling model traceability and intellectual property verification. In this paper, we study the vulnerability of LLM service providers by introducing $\\delta$-Steal, a novel model stealing attack that bypasses the service provider’s watermark detectors while preserving the adversary’s model utility. $\\delta$-Steal injects noise into the token embeddings of the adversary’s model during fine-tuning in a way that satisfies local differential privacy (LDP) guarantees. The adversary queries the service provider’s model to collect outputs and form input-output training pairs. By applying LDP-preserving noise to these pairs, $\\delta$-Steal obfuscates watermark signals, making it difficult for the service provider to determine whether its outputs were used, thereby preventing claims of model theft. Our experiments show that $\\delta$-Steal with lightweight modifications achieves attack success rates of up to 96.95% without significantly compromising the adversary’s model utility. The noise scale in LDP controls the trade-off between attack effectiveness and model utility. This poses a significant risk, as even robust watermarks can be bypassed, allowing adversaries to deceive watermark detectors and undermine current intellectual property protection methods.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/dang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/dang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Efficient Subsampling for GNN Downstream Tasks</title>
        <description>While Graph Neural Networks (GNNs) have shown significant promise for data integration using graph structures, methods to support subsampling graph data are lagging. To address this gap, in this paper, we propose a novel importance-based data subsampling framework. This framework strategically identifies inputs from a primary graph dataset based on their impact on the model’s learning of downstream tasks, such as graph or node classification. Our measure of impact is the predictive uncertainty of each data point. To ensure the subsample is representative of the original sample, we cluster data points based on their learned graph representations and then subsample from the identified clusters. The process favours selecting data points with greater prediction uncertainty while preserving the diversity of the overall sample. We evaluate our approach using a multi-source, real-world dataset on child and youth mental health, comprising emergency department (ED) admissions and mental health questionnaire data. Our experimental results demonstrate that training a GNN with samples identified by the proposed framework yields a statistically significant improvement (on average, a 10.13% improvement across metrics over the baseline approach) in predictive performance compared to training on a randomly selected subset of patients. The code is available at https://github.com/tailabTMU/GSS.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/daneshvar25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/daneshvar25a.html</guid>
        
        
      </item>
    
      <item>
        <title>D$^3$epth: Distilling Diffusion Models For Efficient Depth Estimation Through A Two-Stage Approach</title>
        <description>Diffusion-based monocular depth estimation models demonstrate strong performance with limited supervision by leveraging pre-trained text-to-image models. However, their multi-step inference process and large model size create prohibitive computational overhead for practical applications. To retain the data efficiency of diffusion models while addressing their inference inefficiency, we propose a framework that enhances diffusion-based depth estimation through a two-stage training approach. The first stage distills implicit depth knowledge in the latent space by leveraging the rich representations of pre-trained diffusion models. The second stage refines explicit depth predictions in pixel space using a Hybrid Depth Loss that combines a Shift-Scale Invariant (SSI) loss for global structure preservation with an Edge-aware Gradient Huber loss for fine-grained detail enhancement. Both components are adaptively weighted using a dynamic task weighting strategy, balancing structural consistency and boundary precision. Specifically, we demonstrate that our two-stage distillation approach yields D$^3$epth, an efficient variant that achieves state-of-the-art results while significantly reducing computational requirements. In parallel, our base model D$^2$epth, trained with the enhanced pixel-space depth loss, also surpasses state-of-the-art performance across various benchmarks. Overall, these results deliver the accuracy benefits of diffusion-based methods at the efficiency level of traditional data-driven approaches.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chuang25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chuang25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Reduced-rank Factorized Fourier Neural Operator</title>
        <description>We present R$^2$-FFNO, a novel neural operator architecture designed to address the overparameterization common in Factorized Fourier Neural Operators (FFNO) through reduced-rank factorization of spectral components. While neural operators are effective for learning solutions to partial differential equations (PDEs), their architectures often contain an excessive number of parameters, which can lead to overfitting and diminished generalization capabilities. Inspired by reduced-rank learning techniques, the R$^2$-FFNO approach decomposes spectral kernels into lower-rank representations, enabling systematic control over the model’s capacity. This low-rank factorization facilitates a balance between the model’s expressiveness and its generalization capability. Empirical analysis reveals that performance saturates once an optimal rank is reached and degrades if the rank is increased beyond this point. This observation highlights an optimal trade-off between model complexity and accuracy, underscoring the importance of principled rank selection in designing neural operators. To further enhance performance, a targeted data augmentation strategy is utilized. This strategy introduces high-frequency variations during training to address spectral bias, thereby enhancing the model’s capacity to resolve fine-scale PDE dynamics. A comprehensive evaluation on benchmark datasets confirms the efficacy of R$^2$-FFNO. Compared to the original FFNO architecture, R$^2$-FFNO demonstrates significant error reductions: 46.5% on the Navier-Stokes problem, 31.6% on the Kolmogorov Flow problem, and 34.7% on the Darcy Flow problem. The proposed method offers a principled framework for managing overparameterization in neural operators, contributing to the development of more efficient and generalizable PDE solvers for wide application in scientific computing. Our code is available at https://github.com/Chieh997/R2FFNO.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chou25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chou25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Mega-CE$^2$ : A Multimodal Heterogeneous Aggregation Framework for End-Edge-Cloud Computing</title>
        <description>End-Edge-Cloud Computing (EECC) has emerged as a mainstream computing paradigm, integrating edge computing to overcome the limitations of traditional federated learning in communication efficiency and resource scheduling. However, existing studies reveal that most frameworks still struggle with challenges such as computing resource allocation and high end-to-end latency in EECC. To address these issues, we propose Mega-CE$^2$, a novel multi-modal heterogeneous aggregation framework. Mega-CE$^2$ establishes a bottom-up to top-down closed-loop feedback mechanism through end-device data serialization, edge-server model personalization, and cloud-based optimization. Notably, Mega-CE$^2$ incorporates lightweight adapters for fine-tuning, enabling efficient deployment while preserving local model personalization. These adapters, with far fewer parameters than the global model, optimize model parameters during edge-to-cloud aggregation, thereby achieving both lightweight and personalized capabilities. In experiments on three open-source standard datasets, we show that the performance of Mega-CE$^2$ improves by 3%–5%, while maintaining scalability with lightweight and low-latency characteristics.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/cheng25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/cheng25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Suicidal Posts Detection System Incorporating Psychological Risk Factors</title>
        <description>Our study utilizes psychological risk factors to detect social media posts that contain high-risk suicidal content in Mandarin. We propose a two-stage model structure: the first stage labels each sentence in a post according to risk factors, while the second stage uses these labels as features to predict the crisis level of the post. Our models were trained on a dataset developed from social media posts on a popular Mandarin-speaking platform, labeled by psychological professionals. Our approach achieved an accuracy and F1-score of 0.96 in classifying posts with high crisis levels. Furthermore, we developed a frontend webpage system that applies our model, designed as an aid for psychological professionals. This system not only helps psychological professionals detect and address high-risk posts but also offers them the opportunity for psychological analysis based on risk factors. By integrating expertise from psychology with advanced NLP and deep learning techniques, our system bridges the gap between technical models and psychological insights.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chen25d.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chen25d.html</guid>
        
        
      </item>
    
      <item>
        <title>HIPPD: Brain-Inspired Hierarchical Information Processing for Personality Detection</title>
        <description>Personality detection from text aims to infer an individual’s personality traits based on linguistic patterns. However, existing machine learning approaches often struggle to capture contextual information spanning multiple posts and tend to fall short in extracting representative and robust features in semantically sparse environments. This paper presents HIPPD, a brain-inspired framework for personality detection that emulates the hierarchical information processing of the human brain. HIPPD utilises a large language model to simulate the cerebral cortex, enabling global semantic reasoning and deep feature abstraction. A dynamic memory module, modelled after the prefrontal cortex, performs adaptive gating and selective retention of critical features, with all adjustments driven by dopaminergic prediction error feedback. Subsequently, a set of specialised lightweight models, emulating the basal ganglia, are dynamically routed via a strict winner-takes-all mechanism to capture the personality-related patterns they are most proficient at recognising. Extensive experiments on the Kaggle and Pandora datasets demonstrate that HIPPD consistently outperforms state-of-the-art baselines.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chen25c.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chen25c.html</guid>
        
        
      </item>
    
      <item>
        <title>On the Privacy-preserving Generalized Eigenvalue Problem</title>
        <description>Generalized eigenvalues serve as a foundational tool for extracting insights from data and constructing robust statistical learning models, while differential privacy ensures the protection of individual information within these models by minimizing the impact of any single data point. In this work, we propose an $(\\epsilon,\\delta)$-differential privacy algorithm to solve the generalized eigenvalue problem (GEP). Our algorithm achieves better classification accuracy than existing methods and attains nearly optimal $\\ell_2$-norm error bounds in both low and high dimensions. Furthermore, our algorithm guarantees convergence to the solution regardless of the initial vector, improving on a previous method that requires a specific procedure to find a proper starting vector. Our experiments confirm the effectiveness of our algorithm in safeguarding privacy while simultaneously boosting classification accuracy.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chen25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chen25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Continual Pre-Training is (not) What You Need in Domain Adaptation</title>
        <description>The recent advances in Legal Large Language Models (LLMs) have transformed the landscape of legal research and practice by automating tasks, enhancing research precision, and supporting complex decision-making processes. However, effectively adapting LLMs to the legal domain remains challenging due to the complexity of legal reasoning, the need for precise interpretation of specialized language, and the potential for hallucinations. This paper examines the efficacy of Domain-Adaptive Continual Pre-Training (DACP) in improving the legal reasoning capabilities of LLMs. Through a series of experiments on legal reasoning tasks within the Taiwanese legal framework, we demonstrate that while DACP enhances domain-specific knowledge, it does not uniformly improve performance across all legal tasks. We discuss the trade-offs involved in DACP, particularly its impact on model generalization and performance in prompt-based tasks, and propose directions for future research to optimize domain adaptation strategies in legal AI.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chen25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chen25a.html</guid>
        
        
      </item>
    
      <item>
        <title>ChameleonBench: Quantifying Alignment Faking in Large Language Models</title>
        <description>Alignment faking is a phenomenon in which a language model pretends to agree with a certain set of instructions during a test or evaluation, only to revert to its predetermined or natural behavior once the test is over. Recent work has shown that models strategically deceive the users they are interacting with when presented with certain scenarios, such as an evaluation where the model is threatened with retraining if it does not comply with the given instructions. In this paper, we propose ChameleonBench, a new benchmark that measures and quantifies the tendency of a model to engage in alignment faking when evaluated for different behavioral patterns. Our benchmark consists of 800 prompts that span 8 harmful behaviors and two evaluation scenarios: one in which the model is made to act freely, and another in which it is aware that it is interacting in a closed or test-like environment. We use an external judge pipeline to rate the severity, or the extent to which a response demonstrates a specific harmful behavior. We evaluate the shift in severity across scenarios to quantify alignment faking. Evaluating six frontier and open-weight models, we find that leading large language models (LLMs) frequently engage in alignment faking when presented with different types of scenarios, with some models differing by over 20% in the extent to which they exhibit harmful behaviors in their responses.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/chaudhury25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/chaudhury25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Dual-Module Collaborative LoRA for Effective Large Language Model Fine-Tuning</title>
        <description>To enable parameter-efficient fine-tuning of large language models (LLMs), Low-Rank Adaptation (LoRA) reduces parameters by freezing pretrained weights $W_\{0\}$ and approximating updates via low-rank matrices $\\Delta W = BA$. However, standard LoRA neglects the differential impact of low-rank matrix components on model performance and suffers from slow convergence due to random initialization. To address this, we propose a dual-module architecture. The shared module inherits the pretrained weights’ core semantic representations through principal component initialization, retaining the residuals in the original model. The expert module incorporates a selection mechanism guided by importance screening, with orthogonality constraints imposed through loss regularization to ensure independence in parameter update directions. The shared module accelerates convergence by updating world knowledge, while the expert module dynamically screens domain knowledge to achieve efficient allocation of the update budget. Extensive experiments under identical configurations show that our method achieves 76.8% average accuracy on Commonsense 170k (Llama 2-7B), surpassing LoRA by 2.1%. On GSM8K and HumanEval, it outperforms LoRA by 2.3% and 9.7%, respectively.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/cao25b.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/cao25b.html</guid>
        
        
      </item>
    
      <item>
        <title>Optimizing Trajectory Matching Distillation via Parameter Difference-Driven Pruning</title>
        <description>Dataset distillation aims to give models trained on synthetic datasets the same performance as models trained on complete real datasets. Trajectory matching distillation, an efficient dataset distillation method, achieves this goal by accurately matching the training trajectories induced by the target dataset and the synthetic dataset. Here, the training trajectory is composed of the time-series parameters of the agent model, and each point in the series contains the network parameters of all layers of the agent model; that is, trajectory matching distillation achieves its goal by matching the network parameters obtained from the target dataset and the synthetic dataset. However, because the teacher and student networks train on different datasets, their network parameters can be difficult to align during distillation. To alleviate this problem, this paper proposes Difference-Driven Pruning Distillation (DPD), an innovative approach that prunes the hard-to-align parameters according to the magnitude of their differences. Comparative experimental results show that DPD achieves a significant performance improvement, with a greatly reduced memory footprint and superior performance on several benchmarks.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/cao25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/cao25a.html</guid>
        
        
      </item>
    
      <item>
        <title>CryChime: When Large Language Models Learn to Listen to Distant Cries - A Counterfactual PEFT Framework for Urgent Need Detection in Disaster Social Media</title>
        <description>In recent years, detecting instantaneously expressed urgent needs or requests in disaster-related posts from disaster-affected social media users has become crucial for disaster response and recovery. Current urgent need detectors based on large language models (LLMs) still fall short of the requirements of the disaster response domain. To address this gap, we propose a novel insight: decomposing post content that expresses disaster-induced urgent needs into disaster event statements and disaster-induced appeals. The former, widely present and highly homogeneous across disaster-related posts, tends to introduce event-induced model bias leading to false recalls; the latter, characterized by highly personalized, fine-grained, and subjective phrasing, often challenges LLMs to allocate appropriate attention to the corresponding tokens. In light of this, we propose CryChime, a novel model-agnostic parameter-efficient fine-tuning (PEFT) framework. CryChime represents disaster event statements in a bootstrapping style, and then removes the event-induced bias via orthogonal LoRA-based counterfactual learning. As fine-tuning proceeds, CryChime gradually disentangles the domain knowledge for understanding disaster event statements and disaster-induced appeals in candidate posts, then leverages them collaboratively to perform better urgent need detection. Experimental results on two benchmark datasets show that, compared to strong baselines, CryChime can more effectively listen to the distant cries of disaster-affected users. Our instruction-tuning data examples will be released in a future preprint version.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/cai25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/cai25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Outcome-Based Semifactuals for Reinforcement Learning</title>
        <description>Counterfactual explanations in reinforcement learning (RL) aim to answer what-if questions by demonstrating sparse and minimal changes to states, resulting in the probability mass moving from one action to another. Although these explanations are effective in classification tasks that look for the presence of concepts, RL brings new challenges that counterfactual methods need to solve. These challenges include defining state similarity, avoiding out-of-distribution states, and improving the discriminative power of explanations. Given a state of interest called the query state, we solve these problems by asking how long the agent can execute the query state action without incurring a negative outcome with respect to the expected return. We coin this an outcome-based semifactual (OSF) explanation and find the OSF state by simulating trajectories from the query state. The last state in a subtrajectory where we can take the same action as in the query state without incurring a negative outcome is the OSF state. This state is discriminative, plausible, and similar to the query state. It abstracts away unimportant action switching with little explanatory value and shows the boundary between positive and negative outcomes. Qualitatively, we show that our method explains when an agent must switch actions. As a result, it is easier to understand the agent’s behavior. Quantitatively, we demonstrate that our method can increase policy performance and, at the same time, reduce how often the agent switches its action across six environments. The code and trained models are made open source.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/bekkemoen25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/bekkemoen25a.html</guid>
        
        
      </item>
    
      <item>
        <title>Jailbreak Defense in LLM via Attention Head Analysis and Selective Intervention</title>
        <description>Jailbreak attacks reveal a persistent gap between the intended alignment of language models and their actual behavior during inference. To address this, we investigate how such attacks succeed at the internal level of model computation, focusing on attention heads. Unlike previous studies that primarily analyzed why jailbreaks work, our approach aims to develop a defense mechanism. We identify attention heads that influence whether a model produces a harmful or safe response by comparing activation patterns between a harmful prompt that is rejected and its adversarial variant that elicits a harmful response. By interpolating the internal representations of these heads between the two scenarios, we suppress harmful outputs while maintaining appropriate responses to benign prompts. Experiments with representative jailbreak methods, including GCG and AutoDAN, show that our method significantly reduces attack success rates without degrading response quality. For instance, with Llama-2-7b-chat, the average success rate drops from 39.3% to 1.1%. These findings reveal how internal attention dynamics affect output generation and demonstrate that targeted manipulation of internal components can enhance safety without requiring external filters or additional training.</description>
        <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://proceedings.mlr.press/v304/arai25a.html</link>
        <guid isPermaLink="true">https://proceedings.mlr.press/v304/arai25a.html</guid>
        
        
      </item>
    
  </channel>
</rss>
