Research Publications
A comprehensive collection of our peer-reviewed papers, articles, and reports.
1. Generative image reconstruction from gradients
IEEE Transactions on Neural Networks and Learning Systems
2. Vision-amplified semantic entropy for hallucination detection in medical visual question answering
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
3. Cross-modal obfuscation for jailbreak attacks on large vision-language models
arXiv preprint arXiv:2506.16760
4. A survey and evaluation of adversarial attacks in object detection
IEEE Transactions on Neural Networks and Learning Systems
5. Intention Analysis Makes LLMs A Good Jailbreak Defender
In Proceedings of the 31st International Conference on Computational Linguistics
6. Revisiting Catastrophic Forgetting in Large Language Model Tuning
In Findings of the Association for Computational Linguistics: EMNLP 2024
7. NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
In Proceedings of the Thirteenth International Conference on Learning Representations
8. Breaking the false sense of security in backdoor defense through re-activation attack
Advances in Neural Information Processing Systems
9. InfoRM: Mitigating reward hacking in RLHF via information-theoretic reward modeling
Advances in Neural Information Processing Systems
10. Learning from models beyond fine-tuning
Nature Machine Intelligence
11. Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks
In Proceedings of the Thirteenth International Conference on Learning Representations
12. Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 9477-9486)
13. ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
International Conference on Machine Learning
14. CopyrightShield: Spatial similarity guided backdoor defense against copyright infringement in diffusion models
IEEE/CVF International Conference on Computer Vision
15. ELBA-Bench: An efficient learning backdoor attacks benchmark for large language models
The 63rd Annual Meeting of the Association for Computational Linguistics
16. CoT-Valve: Length-compressible chain-of-thought tuning
The 63rd Annual Meeting of the Association for Computational Linguistics
17. Open-World Authorship Attribution
In Findings of the Association for Computational Linguistics: ACL 2025 (pp. 17744-17758)
