Lin Tan

lin-tan

AI & ML interests

AI-Software Synergy. LLM4Code (binary and source code). Mary J. Elmore New Frontiers Professor, Purdue University

Organizations

Purdue ASSET Research Group

Posts 3

Post
🚀 Excited to share that our paper, "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", has been accepted to #ICRA2025! 🔗 Preprint: https://arxiv.org/pdf/2409.19471

We introduce SELP (Safe Efficient LLM Planner), a novel approach for generating plans that adhere to user-specified constraints while optimizing for time-efficient execution. By leveraging linear temporal logic (LTL) to interpret natural language commands, SELP effectively handles complex commands and long-horizon tasks. 🤖

💡SELP presents three key insights:
1️⃣ Equivalence Voting: Ensures robust translations from natural language instructions into LTL specifications.
2️⃣ Constrained Decoding: Uses the generated LTL formula to guide the autoregressive inference of plans, ensuring the generated plans conform to the LTL.
3️⃣ Domain-Specific Fine-Tuning: Customizes LLMs for specific robotic tasks, boosting both safety and efficiency.
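
The post does not spell out how Equivalence Voting works, so the following is only a minimal sketch under assumptions: several candidate translations are sampled, grouped into classes of logically equivalent formulas, and the largest class wins. To stay self-contained it uses plain propositional formulas (nested tuples) and brute-force truth tables as a stand-in for real LTL equivalence checking, which would need a dedicated tool. The names `evaluate`, `atoms`, `truth_table`, and `equivalence_vote` are hypothetical and not taken from the SELP paper.

```python
from itertools import product


def evaluate(formula, env):
    """Evaluate a propositional formula given as a string atom or nested tuple."""
    if isinstance(formula, str):
        return env[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], env)
    if op == "and":
        return all(evaluate(a, env) for a in args)
    if op == "or":
        return any(evaluate(a, env) for a in args)
    if op == "implies":
        return (not evaluate(args[0], env)) or evaluate(args[1], env)
    raise ValueError(f"unknown operator: {op}")


def atoms(formula):
    """Collect the atomic proposition names used in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(a) for a in formula[1:]))


def truth_table(formula, variables):
    """Signature of a formula: its value under every assignment of `variables`."""
    return tuple(
        evaluate(formula, dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def equivalence_vote(candidates):
    """Group candidates by logical equivalence and return (winner, votes).

    Assumption: the largest equivalence class is treated as the most
    trustworthy translation; ties go to the class seen first.
    """
    if not candidates:
        raise ValueError("need at least one candidate formula")
    variables = sorted(set().union(*(atoms(c) for c in candidates)))
    classes = {}
    for cand in candidates:
        classes.setdefault(truth_table(cand, variables), []).append(cand)
    winners = max(classes.values(), key=len)
    return winners[0], len(winners)


# Three hypothetical translations of "if a then b"; the first two are equivalent.
candidates = [
    ("implies", "a", "b"),
    ("or", ("not", "a"), "b"),
    ("and", "a", "b"),
]
winner, votes = equivalence_vote(candidates)
print(winner, votes)  # ('implies', 'a', 'b') 2
```

Grouping by semantic equivalence rather than by string match means two differently written but equivalent formulas reinforce each other instead of splitting the vote.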

📊 Experiment: Our experiments demonstrate SELP’s effectiveness and generalizability across diverse tasks. In drone navigation, SELP outperforms state-of-the-art LLM planners by 10.8% in safety rate and by 19.8% in plan efficiency. For robot manipulation, SELP achieves a 20.4% improvement in safety rate.

@yiwu @jiang719

#ICRA2025 #LLM #Robotics #Agent #LLMPlanner
Post
Introducing Nova (ICLR’25), foundation models for binary/assembly code. We have also released fine-tuned models for binary code decompilation. Preprint: arxiv.org/pdf/2311.13721 This is our follow-up work on binary analysis after our CCS'24 distinguished paper (https://www.linkedin.com/posts/lintan_resym-harnessing-llms-to-recover-variable-activity-7231749452154159105-sEgj)

Highlights:
1. Nova is built with hierarchical attention specially designed for binary code, together with contrastive learning.
2. Nova is pre-trained on 3B binary and source code tokens.
3. Models: lt-asset/nova-6.7b and lt-asset/nova-6.7b-bcr
4. Smaller 1.3B models: lt-asset/nova-1.3b and lt-asset/nova-1.3b-bcr

Binaries are a form of code. Do not forget about binaries when you work on #LLM4Code.

Why binaries and binary models? Binary code plays an irreplaceable role in crucial tasks, including vulnerability detection, malware detection, binary recovery, and legacy software maintenance. For example, when identifying attacks and malware, security analysts often have access only to assembly, i.e., the human-readable representation of binary code, which is extremely difficult to understand. Given the increasing sophistication of cybercrime, which poses significant threats worldwide (e.g., cybercrime is predicted to cost the world $10.5 trillion annually by 2025 (Sausalito, 2020)), effective binary analysis techniques are in high demand.

#LLM4Code #LLM #BinaryAnalysis #Security

@jiang719, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Xiangyu Zhang, @pbabkin

models

None public yet

datasets

None public yet