[lynx] add
parent 758536a5b3
commit 326895f14f
lynx/experements.ipynb (Normal file, +86 lines)
@@ -0,0 +1,86 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hardware setup\n",
    "* Single node with 8 A800 GPUs\n",
    "* Two nodes with 16 A800 GPUs in total\n",
    "\n",
    "# Workload setup\n",
    "* Four models (differing in parameter count and architecture)\n",
    " - Two Llama-family models of different sizes\n",
    " - Two MoE models of different sizes\n",
    "* Fixed input sequence length: 512 (to be confirmed)\n",
    " - Batch size setting / input size setting (open question)\n",
    " - Fix the input size and use the largest feasible batch size (to be confirmed)\n",
    "\n",
    "# Baselines\n",
    "* XLA: all configurations\n",
    "* SimpleFSDP: single parallelism strategy (FSDP)\n",
    "* Megatron-LM: hybrid parallelism strategies (EP, DP, TP; combination to be confirmed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Metrics\n",
    "* GPU memory usage\n",
    "* Throughput (samples/s)\n",
    "* Compilation time\n",
    "* Iteration time\n",
    "* MFU (model FLOPs utilization)\n",
    "* Fraction of time spent in communication and in computation\n",
    "* Communication/computation overlap ratio\n",
    "* Communication volume\n",
    "\n",
    "# Goal\n",
    "* Show that Lynx finds more overlap opportunities in the computation graph and can therefore outperform the current state of the art across the parallelism strategies used for different models\n",
    "\n",
    "# Methodology\n",
    "* Single parallelism strategy, tested on both MoE and non-MoE models\n",
    " - Lynx + SimpleFSDP\n",
    " - XLA + SimpleFSDP\n",
    " - SimpleFSDP\n",
    "* Hybrid parallelism: ?TP + ?DP for non-MoE models, ?TP + ?EP + ?DP for MoE models (parallel degrees to be determined)\n",
    " - Lynx + Megatron-LM\n",
    " - XLA + Megatron-LM\n",
    " - Megatron-LM\n",
    "\n",
    "\n",
    "# Results\n",
    "* Overall: Lynx finds more overlap opportunities across configurations, achieving the highest overlap efficiency.\n",
    "* Compilation time: the additional overhead is small\n",
    "* Memory overhead: little additional memory cost\n",
    "* Convergence: the model still converges during training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scalability\n",
    "\n",
    "# Goal\n",
    "Evaluate how Lynx scales: the performance gain as the number of nodes increases.\n",
    "\n",
    "# Methodology\n",
    "Test Lynx with FSDP while gradually increasing the number of nodes (the batch size is still set to the largest feasible value).\n",
    "\n",
    "# Results\n",
    "* Iteration time\n",
    "* MFU\n",
    "* Compilation time"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
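Several of the metrics listed in `lynx/experements.ipynb` above (throughput, MFU, communication fraction, communication/computation overlap ratio) can be derived from per-iteration measurements. The following is a minimal sketch, not Lynx's own tooling, and rests on several assumptions: training FLOPs use the common ~6·N·tokens approximation for a dense model; the A800 BF16 dense peak is taken to equal the A100's 312 TFLOPS; and communication and computation busy intervals have already been extracted from a profiler trace as `(start, end)` pairs. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

# Assumption: A800 BF16 dense peak equals the A100's 312 TFLOPS per GPU.
A800_BF16_PEAK_FLOPS = 312e12


def throughput(samples_per_iter: int, iter_time_s: float) -> float:
    """Training throughput in samples/s."""
    return samples_per_iter / iter_time_s


def mfu(n_params: float, tokens_per_iter: int, iter_time_s: float,
        n_gpus: int, peak_flops: float = A800_BF16_PEAK_FLOPS) -> float:
    """Model FLOPs utilization with the ~6*N training FLOPs-per-token
    approximation (forward + backward); attention FLOPs are ignored."""
    achieved_flops = 6.0 * n_params * tokens_per_iter / iter_time_s
    return achieved_flops / (n_gpus * peak_flops)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def busy_time(intervals: List[Interval]) -> float:
    """Wall-clock time covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def comm_fraction(comm: List[Interval], iter_time_s: float) -> float:
    """Share of the iteration during which communication is in flight."""
    return busy_time(comm) / iter_time_s


def overlap_ratio(comm: List[Interval], comp: List[Interval]) -> float:
    """Fraction of communication time that coincides with computation."""
    comm_m, comp_m = merge(comm), merge(comp)
    # Both lists are disjoint, so summing pairwise intersections
    # gives the length of (comm ∩ comp) without double counting.
    overlapped = sum(
        max(0.0, min(c_end, p_end) - max(c_start, p_start))
        for c_start, c_end in comm_m
        for p_start, p_end in comp_m
    )
    total_comm = busy_time(comm)
    return overlapped / total_comm if total_comm > 0 else 0.0
```

For the MoE models, `n_params` would normally be the parameters activated per token rather than the total parameter count; otherwise MFU is overstated.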
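The scalability plan in `lynx/experements.ipynb` measures the performance gain as nodes are added while keeping the largest feasible batch size. That setup grows the global batch with the node count, so a natural summary is weak-scaling efficiency relative to the smallest configuration. This is a minimal sketch under that assumption; `scaling_efficiency` and the numbers in the example are hypothetical, not measured results.

```python
from typing import Dict


def scaling_efficiency(throughput_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Weak-scaling efficiency relative to the smallest node count:
    (T_n / T_base) / (n / n_base), where T is throughput in samples/s."""
    base_nodes = min(throughput_by_nodes)
    base_tp = throughput_by_nodes[base_nodes]
    return {
        n: (tp / base_tp) / (n / base_nodes)
        for n, tp in sorted(throughput_by_nodes.items())
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for nodes, eff in scaling_efficiency(measured).items():
        print(f"{nodes} node(s): efficiency {eff:.2f}")
```

An efficiency of 1.0 means throughput grows linearly with the node count; values below 1.0 indicate communication or other overheads that are not hidden as the cluster grows.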
lynx/scalability.ipynb (Normal file, +185 lines)
File diff suppressed because one or more lines are too long
lynx/scalability_compile_time.ipynb (Normal file, +185 lines)
File diff suppressed because one or more lines are too long