chansung park (PRO)

chansung

AI & ML interests

None yet

Recent Activity

updated a Space 1 day ago: adaptsum/demo
liked a Space 3 days ago: adaptsum/demo
published a Space 3 days ago: adaptsum/demo

Organizations

Notebooks-explorers, various keras sd deployment, LLMs, Gradio-Themes-Party, Hugging Face Fellows, Alpaca LoRA, Webhooks Explorers (BETA), Deploy HF TF ViTs, Blog-explorers, Personal Coding Assistant, ZeroGPU Explorers, Social Post Explorers, Top Contributors: Dataset Downloads, llama-duo, klcsp, ExpanLLM, Adaptive Summarization

Posts 17

Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" (from HKU, UC Berkeley, Google DeepMind, and New York University), which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests that SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable: typically some SFT comes first so the model learns the prompt format, followed by RL to improve generalization through trial and error.
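
Not from the paper's code, but a minimal PyTorch sketch of the two objectives behind this recipe, assuming the usual formulation: SFT as next-token cross-entropy on demonstrations, and the RL stage as a REINFORCE-style update driven by a verifiable reward. Function names and the baseline term are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Stage 1 (SFT): next-token cross-entropy on demonstration data."""
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def rl_loss(sampled_logprobs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    """Stage 2 (RL): REINFORCE-style objective.

    Pushes up the log-probability of a sampled answer in proportion to the
    verifiable reward it earned, which is where trial-and-error generalization
    comes from.
    """
    return -(reward - baseline) * sampled_logprobs.sum()
```

In practice the RL stage is usually run with a PPO- or GRPO-style trainer (e.g. in TRL); the sketch only shows the shape of the two objectives.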

The study focused on a single model, Llama-3.2-Vision-11B, using environments such as General Points for arithmetic reasoning and V-IRL for spatial reasoning. The same training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.
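
The paper's actual environments aren't reproduced here, but as a rough toy of what "verifiable reward plus an in-distribution/out-of-distribution split" means for an arithmetic task like General Points (combine the given numbers to hit a target value), here is a sketch; the function names and the choice of rule variant are my own assumptions.

```python
import ast
import operator
import random

# Toy stand-in for a General Points-style verifier: the model must combine the
# given card values with + - * / to hit a target value.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _leaves_and_value(node):
    """Return (integer leaves, numeric value) of a restricted arithmetic AST."""
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        l_leaves, l_val = _leaves_and_value(node.left)
        r_leaves, r_val = _leaves_and_value(node.right)
        return l_leaves + r_leaves, OPS[type(node.op)](l_val, r_val)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return [node.value], node.value
    raise ValueError("disallowed expression")

def reward(expression: str, cards: list[int], target: int) -> float:
    """1.0 if the expression uses exactly the given cards and hits the target."""
    try:
        leaves, value = _leaves_and_value(ast.parse(expression, mode="eval").body)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
    return float(sorted(leaves) == sorted(cards) and abs(value - target) < 1e-6)

def make_episode(ood: bool) -> tuple[list[int], int]:
    """OOD episodes change a surface rule (here, a different target value)."""
    cards = [random.randint(1, 10) for _ in range(4)]
    return cards, (36 if ood else 24)

print(reward("(8 - 4) * (7 - 1)", [8, 4, 7, 1], 24))  # 1.0
print(reward("8 + 4 + 7 + 1", [8, 4, 7, 1], 24))      # 0.0
```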

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
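
To make the "playground" point concrete: the environment itself can be a very small interface, and the real investment goes into the domain-specific verifier that scores the model's answer. A hedged sketch; the class and method names are mine, not any particular library's API.

```python
import random
from dataclasses import dataclass, field

@dataclass
class TextEnv:
    """Minimal single-turn text environment: hand the model a prompt, score its answer."""
    target: int = 24
    cards: list[int] = field(default_factory=list)

    def reset(self) -> str:
        self.cards = [random.randint(1, 10) for _ in range(4)]
        return f"Use the numbers {self.cards} with + - * / to reach {self.target}."

    def step(self, answer: str) -> tuple[float, bool]:
        # Plug a real verifier in here (e.g. the reward() sketch above); this
        # placeholder only checks that the target shows up in the answer.
        score = 1.0 if str(self.target) in answer else 0.0
        return score, True  # reward, episode done

env = TextEnv()
prompt = env.reset()
score, done = env.step("(8 - 4) * (7 - 1) = 24")
```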

https://arxiv.org/abs/2501.17161

Articles 4

dstack to manage clusters of on-prem servers for AI workloads with ease