paper_note/lynx/experements.ipynb
2025-03-14 09:06:19 +00:00

{
"cells": [
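{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hardware Setup\n",
"* Single node with 8 A800 GPUs\n",
"* Two nodes with 16 A800 GPUs in total\n",
"\n",
"# Workload Setup\n",
"* Four models (different parameter counts and architectures)\n",
"  - Llama family: 2 models with different parameter counts\n",
"  - MoE family: 2 models with different parameter counts\n",
"* Input sequence length fixed at 512\n",
"  - Batch size / input size settings\n",
"  - With the input size fixed, use the largest feasible batch size? (open question)\n",
"\n",
"# Baseline Setup\n",
"* XLA: ALL\n",
"* SimpleFSDP: tests a single parallelism strategy (FSDP)\n",
"* Megatron-LM: tests hybrid parallelism strategies (EP, DP, TP; to be confirmed)"
]
},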
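{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Metrics\n",
"* GPU memory usage\n",
"* Throughput (samples/s)\n",
"* Compilation time\n",
"* Iteration time\n",
"* MFU\n",
"* Communication and computation time fractions\n",
"* Communication/computation overlap ratio\n",
"* Communication volume\n",
"\n",
"# Evaluation Goal\n",
"* Show that Lynx discovers more overlap opportunities on the computation graph and can therefore outperform the current SOTA parallelism strategies used for different models\n",
"\n",
"# Evaluation Method\n",
"* Single parallelism strategy, tested on both MoE and non-MoE models\n",
"  - Lynx + SimpleFSDP\n",
"  - XLA + SimpleFSDP\n",
"  - SimpleFSDP\n",
"* Hybrid parallelism strategies: ?TP + ?DP for non-MoE models, ?TP + ?EP + ?DP for MoE models (parallel degrees still to be decided)\n",
"  - Lynx + Megatron-LM\n",
"  - XLA + Megatron-LM\n",
"  - Megatron-LM\n",
"\n",
"\n",
"# Evaluation Results:\n",
"Overall, Lynx discovers more overlap opportunities across different configurations and thereby maximizes overlap efficiency.\n",
"\n",
"* Compilation time: the additional overhead is small\n",
"* Memory overhead: little extra memory cost is introduced\n",
"* Model convergence: the model still converges during training"
]
},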
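{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scalability\n",
"\n",
"# Evaluation Goal\n",
"Evaluate the scalability of Lynx: how performance improves as the number of nodes increases\n",
"\n",
"# Evaluation Method:\n",
"Test Lynx with FSDP while gradually increasing the number of nodes (the batch size is again the largest feasible one)\n",
"\n",
"# Evaluation Results:\n",
"- Iteration time\n",
"- MFU\n",
"- Compilation time"
]
}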
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
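
A minimal sketch of how the throughput (samples/s), MFU, and communication/computation overlap metrics listed in the notebook could be computed from per-iteration measurements. Everything here is an assumption rather than the notebook's own method: the `IterationStats` fields and the function names are hypothetical, model FLOPs use the common 6 x parameters x tokens approximation for transformer training (for MoE models, pass the activated parameter count), the A800 peak is taken as 312 TFLOPS dense BF16 per GPU, and the overlap ratio is defined as the share of communication time hidden behind computation.

```python
from dataclasses import dataclass

# Assumed dense BF16 peak per A800 GPU; adjust for the precision actually used.
A800_PEAK_BF16_FLOPS = 312e12


@dataclass
class IterationStats:
    """Hypothetical container for measurements of one training iteration."""
    global_batch_size: int          # samples processed per iteration
    seq_len: int                    # tokens per sample (512 in the plan)
    iteration_time_s: float         # wall-clock time of one iteration
    active_params: float            # parameters used per token (activated ones for MoE)
    num_gpus: int                   # 8 or 16 in the planned setups
    comm_time_s: float = 0.0        # total communication kernel time
    overlapped_comm_time_s: float = 0.0  # communication time hidden by compute


def throughput_samples_per_s(s: IterationStats) -> float:
    """Samples processed per second."""
    return s.global_batch_size / s.iteration_time_s


def mfu(s: IterationStats, peak_flops_per_gpu: float = A800_PEAK_BF16_FLOPS) -> float:
    """Model FLOPs utilization using the 6 * params * tokens approximation."""
    tokens = s.global_batch_size * s.seq_len
    model_flops = 6.0 * s.active_params * tokens   # forward + backward estimate
    achieved_flops_per_s = model_flops / s.iteration_time_s
    return achieved_flops_per_s / (peak_flops_per_gpu * s.num_gpus)


def overlap_ratio(s: IterationStats) -> float:
    """Fraction of communication time overlapped with computation."""
    if s.comm_time_s == 0:
        return 0.0
    return s.overlapped_comm_time_s / s.comm_time_s


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    stats = IterationStats(
        global_batch_size=64, seq_len=512, iteration_time_s=2.0,
        active_params=7e9, num_gpus=8,
        comm_time_s=0.5, overlapped_comm_time_s=0.4,
    )
    print(f"throughput: {throughput_samples_per_s(stats):.1f} samples/s")
    print(f"MFU: {mfu(stats):.3f}")
    print(f"overlap ratio: {overlap_ratio(stats):.2f}")
```

The same three functions can be applied unchanged to each configuration (Lynx, XLA, and the plain baseline), so the comparison depends only on the measured iteration and communication times.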