[lynx] add

yezhengmao1 2025-03-14 09:06:19 +00:00
parent 758536a5b3
commit 326895f14f
3 changed files with 456 additions and 0 deletions

lynx/experements.ipynb Normal file

@ -0,0 +1,86 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
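"# Device setup\n",
"* Single node with 8 A800 GPUs\n",
"* Two nodes with 16 A800 GPUs\n",
"\n",
"# Workload setup\n",
"* Four models (different parameter counts, different architectures)\n",
"  - Llama family: 2 models with different parameter counts\n",
"  - MoE family: 2 models with different parameter counts\n",
"* Input sequence length fixed at 512\n",
"  - Batch size / input size settings\n",
"  - With the input size fixed, use the largest feasible batch size??\n",
"\n",
"# Baseline setup\n",
"* XLA: ALL\n",
"* SimpleFSDP: tests a single parallelism strategy (FSDP)\n",
"* Megatron-LM: tests hybrid parallelism strategies (EP, DP, TP???)"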
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
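"# Metrics\n",
"* GPU memory usage\n",
"* Throughput (samples/s)\n",
"* Compilation time\n",
"* Iteration time\n",
"* MFU\n",
"* Communication ratio and computation ratio\n",
"* Communication/computation overlap ratio\n",
"* Communication volume\n",
"\n",
"# Test goal\n",
"* Lynx can find more overlap opportunities on the computation graph, and can therefore outperform the current SOTA parallelism strategies used for different models\n",
"\n",
"# Test method\n",
"* Single parallelism strategy, tested on both MoE and non-MoE models\n",
"  - Lynx + SimpleFSDP\n",
"  - XLA + SimpleFSDP\n",
"  - SimpleFSDP\n",
"* Hybrid parallelism strategy: non-MoE models tested with ?TP + ?DP; MoE models tested with ?TP + ?EP + ?DP\n",
"  - Lynx + Megatron-LM\n",
"  - XLA + Megatron-LM\n",
"  - Megatron-LM\n",
"\n",
"\n",
"# Test results\n",
"Overall, Lynx can find more overlap opportunities across different configurations and so achieve the highest overlap efficiency.\n",
"\n",
"* Compilation time: the extra overhead is small\n",
"* Memory overhead: does not introduce much additional memory cost\n",
"* Model convergence: the model converges during training"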
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
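"# Scalability\n",
"\n",
"# Test goal\n",
"Test Lynx's scalability: how performance improves as the number of nodes increases\n",
"\n",
"# Test method\n",
"Test Lynx with FSDP while gradually increasing the number of nodes (batch size again set to the largest feasible batch size)\n",
"\n",
"# Test results\n",
" - Iteration time\n",
" - MFU\n",
" - Compilation time"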
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

lynx/scalability.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long