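{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hardware Setup\n", "* Single node with 8 A800 GPUs\n", "* Two nodes with 16 A800 GPUs in total\n", "\n", "# Workload Setup\n", "* Four models (different parameter counts and architectures)\n", "  - Two Llama-family models with different parameter counts\n", "  - Two MoE models with different parameter counts\n", "* Fixed input sequence length: 512 (TBD)\n", "  - Batch size / input size settings (TBD)\n", "  - Fix the input size and use the largest feasible batch size (TBD)\n", "\n", "# Baselines\n", "* XLA: all settings (single and hybrid parallelism)\n", "* SimpleFSDP: tests the single parallelism strategy, FSDP\n", "* Megatron-LM: tests hybrid parallelism strategies, EP, DP, TP (TBD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Metrics\n", "* GPU memory usage\n", "* Throughput (samples/s)\n", "* Compilation time\n", "* Iteration time\n", "* MFU (model FLOPs utilization)\n", "* Communication share and computation share of iteration time\n", "* Communication/computation overlap ratio\n", "* Communication volume\n", "\n", "# Evaluation Goal\n", "* Show that Lynx discovers more overlap opportunities in the computation graph and therefore outperforms the current SOTA parallelism strategies used for different models\n", "\n", "# Evaluation Method\n", "* Single parallelism strategy, tested on both MoE and non-MoE models\n", "  - Lynx + SimpleFSDP\n", "  - XLA + SimpleFSDP\n", "  - SimpleFSDP\n", "* Hybrid parallelism: ?TP + ?DP for non-MoE models, ?TP + ?EP + ?DP for MoE models (parallel degrees TBD)\n", "  - Lynx + Megatron-LM\n", "  - XLA + Megatron-LM\n", "  - Megatron-LM\n", "\n", "# Results\n", "Overall, Lynx finds more overlap opportunities across the different configurations and so maximizes overlap efficiency.\n", "\n", "* Compilation time: the extra overhead is small\n", "* Memory overhead: Lynx adds little extra memory cost\n", "* Model convergence: the model still converges during training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scalability\n", "\n", "# Evaluation Goal\n", "Evaluate Lynx's scalability: how performance improves as the number of nodes grows.\n", "\n", "# Evaluation Method\n", "Test Lynx with FSDP while gradually increasing the number of nodes (the batch size is still the largest feasible one).\n", "\n", "# Results\n", "* Iteration time\n", "* MFU\n", "* Compilation time" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }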