Purdue ASSET Research Group



Recent Activity

- shanchao updated a dataset 1 day ago: lt-asset/REPOCOD_Lite_Unified
- shanchao updated a model 23 days ago: lt-asset/Waffle_VLM_WebSight
- jiang719 updated a dataset about 1 month ago: lt-asset/tab2latex

lt-asset's activity

lin-tan posted an update about 24 hours ago
🚀 Excited to share that our paper, "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", has been accepted to #ICRA2025! 🔗 Preprint: https://arxiv.org/pdf/2409.19471

We introduce SELP (Safe Efficient LLM Planner), a novel approach for generating plans that adhere to user-specified constraints while optimizing for time-efficient execution. By leveraging linear temporal logic (LTL) to interpret natural language commands, SELP effectively handles complex commands and long-horizon tasks. 🤖

💡SELP presents three key insights:
1️⃣ Equivalence Voting: Ensures robust translations from natural language instructions into LTL specifications.
2️⃣ Constrained Decoding: Uses the generated LTL formula to guide the autoregressive inference of plans, ensuring the generated plans conform to the LTL.
3️⃣ Domain-Specific Fine-Tuning: Customizes LLMs for specific robotic tasks, boosting both safety and efficiency.
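The equivalence-voting idea can be sketched as follows. This is a toy illustration, not SELP's implementation: checking equivalence of real LTL formulas requires a model checker, so here equivalence is approximated by truth-table comparison of propositional formulas written as Python boolean expressions, and the `equivalent` / `equivalence_vote` helpers are hypothetical names.

```python
from itertools import product

def equivalent(f, g, variables):
    """Toy equivalence check: compare truth tables of propositional
    formulas written as Python boolean expressions. A real SELP-style
    pipeline would check LTL equivalence with a model checker instead."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if eval(f, {}, env) != eval(g, {}, env):
            return False
    return True

def equivalence_vote(candidates, variables):
    """Group sampled translations into equivalence classes and return
    a representative of the largest class (the majority translation)."""
    classes = []  # each entry: a list of mutually equivalent candidates
    for c in candidates:
        for cls in classes:
            if equivalent(c, cls[0], variables):
                cls.append(c)
                break
        else:
            classes.append([c])
    return max(classes, key=len)[0]

# Two samples agree up to commutativity; one is a mistranslation.
samples = ["a and b", "b and a", "a or b"]
print(equivalence_vote(samples, ["a", "b"]))  # prints: a and b
```

The point of voting over equivalence classes rather than exact strings is that an LLM may emit the same specification in syntactically different but logically identical forms.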

📊 Experiment: Our experiments demonstrate SELP’s effectiveness and generalizability across diverse tasks. In drone navigation, SELP outperforms state-of-the-art LLM planners by 10.8% in safety rate and by 19.8% in plan efficiency. For robot manipulation, SELP achieves a 20.4% improvement in safety rate.

@yiwu @jiang719

#ICRA2025 #LLM #Robotics #Agent #LLMPlanner
lin-tan posted an update 9 days ago
Introducing Nova (ICLR’25), a family of foundation models for binary/assembly code. We have also released models fine-tuned for binary code decompilation. Preprint: arxiv.org/pdf/2311.13721. This is our follow-up work on binary analysis after our CCS'24 Distinguished Paper (https://www.linkedin.com/posts/lintan_resym-harnessing-llms-to-recover-variable-activity-7231749452154159105-sEgj).

Highlights:
1. Nova is built with hierarchical attention specially designed for binary code, and is trained with contrastive learning objectives.
2. Nova is pre-trained on 3B binary and source code tokens.
3. Models: lt-asset/nova-6.7b and lt-asset/nova-6.7b-bcr
4. Smaller 1.3B models: lt-asset/nova-1.3b and lt-asset/nova-1.3b-bcr
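As a rough illustration of the hierarchical-attention idea (our simplification for exposition, not Nova's actual architecture; see the preprint for the real design): tokens attend freely within their own assembly instruction, while cross-instruction information flows through one summary position per instruction.

```python
import numpy as np

def hierarchical_mask(instr_lens):
    """Toy boolean attention mask for a flat token sequence split into
    instructions of the given lengths: mask[i, j] is True when token i
    may attend to token j. Intra-instruction attention is unrestricted;
    each instruction's last token acts as a summary readable by all."""
    n = sum(instr_lens)
    mask = np.zeros((n, n), dtype=bool)
    summaries, pos = [], 0
    for length in instr_lens:
        mask[pos:pos + length, pos:pos + length] = True  # within-instruction
        summaries.append(pos + length - 1)               # summary position
        pos += length
    for s in summaries:
        mask[:, s] = True  # every token can read each instruction summary
    return mask

# Two instructions: 3 tokens, then 2 tokens.
print(hierarchical_mask([3, 2]).astype(int))
```

This two-level structure is what lets an attention pattern respect instruction boundaries while still propagating information across the whole binary sequence.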

Binaries are a form of code. Do not forget about binaries when you work on #LLM4Code.

Why binaries and binary models? Binary code plays an irreplaceable role in crucial tasks, including vulnerability detection, malware detection, binary recovery, and legacy software maintenance. For example, when identifying attacks and malware, security analysts often only have access to assembly, i.e., the human-readable representation of binary code, which is extremely difficult to understand. Combined with the increasing sophistication of cybercrime, which poses significant threats worldwide (e.g., cybercrime is predicted to cost the world $10.5 trillion annually by 2025 (Sausalito, 2020)), this puts effective binary analysis techniques in high demand.

#LLM4Code #LLM #BinaryAnalysis #Security

@jiang719 Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Xiangyu Zhang, @pbabkin
lin-tan posted an update 3 months ago
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks
- are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues, and
- come with 2.6X more tests per task (313.5 on average, vs. SWE-Bench's 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite
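The accuracy/pass@1 numbers quoted above can be computed with the standard unbiased pass@k estimator (Chen et al., 2021). A minimal sketch, with hypothetical per-task pass counts (the actual RepoCod harness runs each generation against the repository's test suite):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generated solutions of
    which c pass all tests, is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical results: 10 samples per task, pass counts for 4 tasks.
pass_counts = [0, 1, 3, 0]
score = sum(pass_at_k(10, c, 1) for c in pass_counts) / len(pass_counts)
print(f"pass@1 = {score:.2%}")  # prints: pass@1 = 10.00%
```

With k=1 the estimator reduces to the fraction of passing samples per task, averaged over tasks.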

#LLM4code #LLM #CodeGeneration #Security