arxiv:2502.04327

Value-Based Deep RL Scales Predictably

Published on Feb 6
· Submitted by orybkin on Feb 10
Abstract

Scaling data and compute is critical to the success of machine learning. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment. In this paper, we show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. First, we show that the data and compute requirements to attain a given performance level lie on a Pareto frontier controlled by the updates-to-data (UTD) ratio. By estimating this frontier, we can predict the data requirement when given more compute, and the compute requirement when given more data. Second, we determine the optimal allocation of a total resource budget across data and compute for a given performance and use it to select hyperparameters that maximize performance for a given budget. Third, this scaling behavior is enabled by first estimating predictable relationships between hyperparameters, which are used to manage effects of overfitting and plasticity loss unique to RL. We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI Gym, and IsaacGym, when extrapolating to higher levels of data, compute, budget, or performance.
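To make the kind of extrapolation described above concrete, here is a minimal sketch (in Python, assuming NumPy and SciPy) of one plausible workflow: fit the data requirement for a target return as a function of the UTD ratio from small-scale runs, derive the implied compute requirement, and choose the UTD ratio that minimizes total cost under an assumed budget. The functional form, measurements, and cost weights below are illustrative assumptions, not the paper's exact fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical small-scale measurements: for each UTD ratio sigma, the number of
# environment steps (data) needed to reach a fixed target return J.
# These values are illustrative placeholders, not results from the paper.
utd = np.array([1.0, 2.0, 4.0, 8.0])                      # updates-to-data ratios
data_to_reach_J = np.array([4.0e5, 2.6e5, 1.8e5, 1.3e5])  # env steps to reach J

# Assumed power-law-with-offset form for the data requirement along the frontier,
# D_J(sigma) ~ a * sigma^(-b) + c. The paper's exact parameterization may differ.
def data_frontier(sigma, a, b, c):
    return a * sigma ** (-b) + c

params, _ = curve_fit(data_frontier, utd, data_to_reach_J, p0=[3e5, 0.5, 1e5])

# Gradient updates scale as UTD * data, so use that as a proxy for compute.
def compute_frontier(sigma):
    return sigma * data_frontier(sigma, *params)

# Budget allocation: with assumed per-unit costs for data collection and for
# gradient updates, pick the UTD ratio that minimizes the total cost of reaching J.
cost_per_env_step = 1.0   # assumed cost of collecting one environment step
cost_per_update = 0.1     # assumed cost of one gradient update
sigmas = np.linspace(1.0, 16.0, 200)
total_cost = (cost_per_env_step * data_frontier(sigmas, *params)
              + cost_per_update * compute_frontier(sigmas))
best_sigma = sigmas[np.argmin(total_cost)]

print(f"Extrapolated data requirement at UTD=16: {data_frontier(16.0, *params):.3g} steps")
print(f"Cost-minimizing UTD ratio under the assumed budget: {best_sigma:.2f}")
```

In practice this kind of fit would be repeated per performance level and per environment before extrapolating to larger budgets.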

Community

Paper author / submitter

We establish that value-based online RL can be scaled predictably to larger data, larger compute, or a generally larger budget.

Thanks, very interesting.

In my opinion, RL will progress more toward predefined world models, e.g. building more physical laws of the real world into the model. And for RL, the bigger the model, the harder it is to study at small scale.

Paper author

Combining this with pretrained models would definitely be very interesting! One big question there is how much pretraining vs. fine-tuning helps, i.e. how to allocate compute across the two.



