
Lin Tan

lin-tan

AI & ML interests

AI-Software Synergy. LLM4Code (binary and source code). Mary J. Elmore New Frontiers Professor, Purdue University.

Recent Activity

reacted to their post with 🔥 9 days ago

Organizations

Purdue ASSET Research Group

lin-tan's activity

reacted to their post with 🔥 about 21 hours ago
posted an update about 21 hours ago
🚀 Excited to share that our paper, "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", has been accepted to #ICRA2025! 🔗 Preprint: https://arxiv.org/pdf/2409.19471

We introduce SELP (Safe Efficient LLM Planner), a novel approach for generating plans that adhere to user-specified constraints while optimizing for time-efficient execution. By leveraging linear temporal logic (LTL) to interpret natural language commands, SELP effectively handles complex commands and long-horizon tasks. 🤖

💡SELP presents three key insights:
1️⃣ Equivalence Voting: Ensures robust translations from natural language instructions into LTL specifications.
2️⃣ Constrained Decoding: Uses the generated LTL formula to guide the autoregressive decoding of plans, ensuring that the generated plans conform to the LTL specification (see the sketch after this list).
3️⃣ Domain-Specific Fine-Tuning: Customizes LLMs for specific robotic tasks, boosting both safety and efficiency.
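
For intuition, here is a minimal Python sketch of the constrained-decoding idea, assuming a hand-built safety monitor and a stand-in scoring function. SELP's actual system works from the LTL formula and the LLM's token-level decoding; the action names, scores, and monitor below are invented for illustration.

```python
# Toy LTL-constrained plan decoding. The monitor hand-encodes the safety
# property "never enter the danger zone"; a real system would compile the
# LTL formula into a finite automaton over the planner's action alphabet.
MONITOR = {
    ("watching", "enter_danger_zone"): "dead",   # rejecting state
    ("watching", "reach_goal"): "accepting",
}

def step_monitor(state, action):
    # Unlisted (state, action) pairs leave the monitor state unchanged.
    return MONITOR.get((state, action), state)

def action_scores(plan_so_far):
    # Stand-in for the LLM's next-action scores; a real planner queries the model.
    return {"move_north": 0.5, "enter_danger_zone": 0.9, "reach_goal": 0.6}

def constrained_decode(max_steps=10):
    plan, state = [], "watching"
    for _ in range(max_steps):
        scores = action_scores(plan)
        # Mask any action whose monitor transition is rejecting, then pick greedily.
        legal = {a: s for a, s in scores.items()
                 if step_monitor(state, a) != "dead"}
        if not legal:
            break
        action = max(legal, key=legal.get)
        plan.append(action)
        state = step_monitor(state, action)
        if state == "accepting":
            break
    return plan

# Unconstrained greedy decoding would pick "enter_danger_zone" (score 0.9);
# the masked decoder picks the safe "reach_goal" instead.
print(constrained_decode())  # ['reach_goal']
```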

📊 Experiments: SELP is effective and generalizes across diverse tasks. In drone navigation, SELP outperforms state-of-the-art LLM planners by 10.8% in safety rate and by 19.8% in plan efficiency. For robot manipulation, SELP achieves a 20.4% improvement in safety rate.

@yiwu @jiang719

#ICRA2025 #LLM #Robotics #Agent #LLMPlanner
reacted to their post with 🔥 9 days ago
posted an update 9 days ago
Introducing Nova (ICLR’25), foundation models for binary/assembly code. We have also released fine-tuned models for binary code decompilation.
Preprint: arxiv.org/pdf/2311.13721
This is our follow-up work on binary analysis after our CCS'24 distinguished paper (https://www.linkedin.com/posts/lintan_resym-harnessing-llms-to-recover-variable-activity-7231749452154159105-sEgj).

Highlights:
1. Nova is built with hierarchical attention specially designed for binary code, combined with contrastive learning (a toy sketch of the attention idea follows this list).
2. Nova is pre-trained on 3B binary and source code tokens.
3. Models: lt-asset/nova-6.7b lt-asset/nova-6.7b-bcr
4. Smaller 1.3B models: lt-asset/nova-1.3b lt-asset/nova-1.3b-bcr
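
What might hierarchical attention over assembly look like? A toy sketch under my own assumptions (tokens within an instruction attend locally; instruction-end positions act as summaries that all tokens may read); Nova's exact formulation is in the preprint:

```python
import numpy as np

def hierarchical_mask(instr_spans, seq_len):
    """Build a boolean attention mask: mask[i, j] = True lets position i
    attend to position j. Toy assumption: the last token of each instruction
    serves as that instruction's summary position."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    summaries = [end - 1 for _, end in instr_spans]
    for start, end in instr_spans:
        mask[start:end, start:end] = True   # intra-instruction attention
        mask[start:end, summaries] = True   # cross-instruction flow via summaries only
    return mask

# Three assembly instructions tokenized as spans over 7 positions, e.g.
# "mov eax, 1" -> 0..2, "add eax, ebx" -> 3..5, "ret" -> 6.
print(hierarchical_mask([(0, 3), (3, 6), (6, 7)], seq_len=7).astype(int))
```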

Binaries are a form of code. Do not forget about binaries when you work on #LLM4Code.

Why binaries and binary models? Binary code plays an irreplaceable role in crucial tasks, including vulnerability detection, malware detection, binary recovery, and legacy software maintenance. For example, when identifying attacks and malware, security analysts often only have access to assembly, i.e., the human-readable representation of binary code, which is extremely difficult to understand. Given the increasing sophistication of cybercrime, which poses significant threats worldwide (cybercrime is predicted to cost the world $10.5 trillion annually by 2025 (Sausalito, 2020)), effective binary analysis techniques are in high demand.

#LLM4Code #LLM #BinaryAnalysis #Security

@jiang719 Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Xiangyu Zhang, @pbabkin
reacted to their post with 🔥🤗 3 months ago
posted an update 3 months ago
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks (the pass@1 metric is sketched below).
- Leaderboard https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
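
For reference, pass@1 here is presumably the standard unbiased estimator of Chen et al. (2021); a minimal sketch, with toy numbers rather than RepoCod results:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples passes, given c of n generated samples pass."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Toy numbers, not RepoCod results: 10 samples per task, c = passing samples.
per_task_passing = [0, 1, 3, 0, 10]
pass_at_1 = sum(pass_at_k(10, c, 1) for c in per_task_passing) / 5
print(f"pass@1 = {pass_at_1:.3f}")  # 0.280
```
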
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Have 2.6X more tests per task (313.5 vs. SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)
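
Cyclomatic complexity counts independent paths through a function. A rough Python counter, simplified relative to standard tools (e.g., radon) and shown only to make the 9.00 average concrete:

```python
import ast

def cyclomatic_complexity(source):
    """Rough cyclomatic complexity: 1 + number of branching constructs.
    A simplification; production tools handle more node types and edge cases."""
    branch_nodes = (ast.If, ast.IfExp, ast.For, ast.While,
                    ast.ExceptHandler, ast.BoolOp)
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(tree))

src = """
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    return 0
"""
print(cyclomatic_complexity(src))  # 3: base 1 plus the if and the elif
```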

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod, with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have <10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security