Parallelization support

#5

Added the ability to run with accelerate or torchrun on multiple GPUs by replacing line 495 with `x = self.transformer.adapter(torch.cat([x, input_embeds.to(x.device)], dim=-1))`
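
For reference, a minimal runnable sketch of the pattern the fix relies on. The tensor and module names mirror the comment above; the shapes and the toy `adapter` are illustrative assumptions, not the actual modeling code:

```python
import torch
import torch.nn as nn

hidden = 16
adapter = nn.Linear(2 * hidden, hidden)   # stand-in for self.transformer.adapter

x = torch.randn(1, 4, hidden)             # recurrent state, lives on the core block's device
input_embeds = torch.randn(1, 4, hidden)  # prelude output, may sit on a different device once sharded

# The fix: move the embeddings onto x's device so torch.cat does not raise a
# cross-device error when the layers are spread over multiple GPUs.
x = adapter(torch.cat([x, input_embeds.to(x.device)], dim=-1))
print(x.shape)  # torch.Size([1, 4, 16])
```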

Tom Goldstein's Lab at University of Maryland, College Park org

Yeah, that is valid - although if you are parallelizing by auto-mapping the layers of the core block onto different devices, you are going to have a very bad/slow time. Better to keep the prelude on one GPU, the core block on another, the head on a third, and maybe store the KV cache on a fourth.
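
A hedged sketch of that placement, using the `device_map` dict that `from_pretrained`/accelerate accept. The module prefixes and the checkpoint id below are assumptions for illustration; check the actual modeling code for the exact names:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed module prefixes - verify against the real modeling file before use.
device_map = {
    "transformer.wte": 0,         # token embeddings alongside the prelude
    "transformer.prelude": 0,     # prelude on GPU 0
    "transformer.adapter": 1,     # adapter feeds the core block
    "transformer.core_block": 1,  # keep the whole recurrent core block on GPU 1
    "transformer.coda": 2,        # coda / output head on GPU 2
    "transformer.ln_f": 2,
    "lm_head": 2,
}

model = AutoModelForCausalLM.from_pretrained(
    "<model-id>",                 # placeholder checkpoint id
    device_map=device_map,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

With a pinned map like this, each stage stays on a single device and only activations cross GPU boundaries, instead of every core-block iteration bouncing between devices as it would under a naive `device_map="auto"` split.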

JonasGeiping changed pull request status to merged
