Parallelization support
#5
by yigitbekir · opened
Added the ability to run with accelerate or torchrun on multiple GPUs by replacing line 495 with `x = self.transformer.adapter(torch.cat([x, input_embeds.to(x.device)], dim=-1))`.
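For context, a small self-contained sketch (not part of the PR) of why the `.to(x.device)` call matters once the model is sharded across devices; the tensor shapes and device indices below are arbitrary and purely for illustration.

```python
import torch

# With a sharded model, the recurrent state x can end up on a later GPU
# than the input embeddings produced by the prelude. This requires >= 2 GPUs.
if torch.cuda.device_count() >= 2:
    input_embeds = torch.randn(1, 8, 1024, device="cuda:0")  # produced on GPU 0
    x = torch.randn(1, 8, 1024, device="cuda:1")             # state already moved to GPU 1

    # torch.cat([x, input_embeds], dim=-1) would raise a device-mismatch error;
    # aligning input_embeds to x.device first makes the concatenation valid.
    combined = torch.cat([x, input_embeds.to(x.device)], dim=-1)
    assert combined.device == x.device
```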
Yeah, that is valid. Although if you are parallelizing by auto-mapping the layers of the core block onto different devices, you are going to have a very bad/slow time. Better to keep the prelude on one GPU, the core block on another, the head on a third, and maybe store the KV cache on a fourth.
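One way to realize that layout with accelerate, as a rough sketch: pass an explicit `device_map` dict instead of `device_map="auto"`, so the whole core block stays on a single GPU. The module names and checkpoint id below are assumptions for illustration and should be checked against the output of `print(model)`; KV-cache placement is not handled by `device_map` and would need separate treatment.

```python
import torch
from transformers import AutoModelForCausalLM

# Pin each stage of the model to its own GPU rather than letting accelerate
# auto-map individual core-block layers across devices.
# Module names are assumptions; verify against print(model).
device_map = {
    "transformer.wte": 0,         # embeddings live with the prelude
    "transformer.prelude": 0,     # prelude on GPU 0
    "transformer.adapter": 1,     # adapter feeds the core block
    "transformer.core_block": 1,  # keep the whole recurrent core on GPU 1
    "transformer.coda": 2,        # coda / final norm on GPU 2
    "transformer.ln_f": 2,
    "lm_head": 2,                 # head on GPU 2
}

model = AutoModelForCausalLM.from_pretrained(
    "tomg-group-umd/huginn-0125",  # assumed checkpoint id, for illustration
    device_map=device_map,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```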
JonasGeiping changed pull request status to merged