{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hardware Setup\n",
    "* Single node with 8 A800 GPUs\n",
    "* Two nodes with 16 A800 GPUs in total\n",
    "\n",
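    "# Workload Setup\n",
    "* Four models with different parameter counts and architectures\n",
    "    - Llama family: two models of different sizes\n",
    "    - MoE family: two models of different sizes\n",
    "* Fixed input sequence length: 512 (TBD)\n",
    "    - Batch size / input size settings (TBD)\n",
    "    - Fix the input size and use the largest feasible batch size (TBD)\n",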
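    "\n",
    "The following is a minimal sketch of how the largest feasible batch size could be found for a fixed sequence length. It assumes a hypothetical probe `fits(batch_size)` that runs one training step and returns `False` on an out-of-memory error. It also assumes that feasibility is monotonic: if a batch size fits, every smaller one does too. The search doubles the batch size until a step fails, then binary-searches between the last success and the first failure.\n",
    "\n",
    "```python\n",
    "from typing import Callable\n",
    "\n",
    "\n",
    "def max_feasible_batch_size(fits: Callable[[int], bool], start: int = 1, limit: int = 4096) -> int:\n",
    "    # Largest b in [start, limit] with fits(b) True, or 0 if even `start` does not fit.\n",
    "    # `fits` is a hypothetical probe supplied by the benchmark harness.\n",
    "    if start > limit or not fits(start):\n",
    "        return 0\n",
    "    lo, hi = start, start * 2  # lo: largest size known to fit\n",
    "    # Phase 1: exponential growth until a probe fails or the limit is passed.\n",
    "    while hi <= limit and fits(hi):\n",
    "        lo, hi = hi, hi * 2\n",
    "    hi = min(hi, limit + 1)  # hi: smallest size known (or treated) as infeasible\n",
    "    # Phase 2: binary search strictly between lo and hi.\n",
    "    while hi - lo > 1:\n",
    "        mid = (lo + hi) // 2\n",
    "        if fits(mid):\n",
    "            lo = mid\n",
    "        else:\n",
    "            hi = mid\n",
    "    return lo\n",
    "\n",
    "\n",
    "# Toy probe: pretend memory allows 100 units and each sample costs 3 units.\n",
    "print(max_feasible_batch_size(lambda b: b * 3 <= 100))  # 33\n",
    "```\n",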
    "\n",
    "# Baseline Setup\n",
    "* XLA         : all settings\n",
    "* SimpleFSDP  : single parallelism strategy (FSDP)\n",
    "* Megatron-LM : hybrid parallelism strategies (EP, DP, TP; TBD)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
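    "# Metrics\n",
    "* Memory usage\n",
    "* Throughput (samples/s)\n",
    "* Compilation time\n",
    "* Iteration time\n",
    "* MFU (Model FLOPs Utilization)\n",
    "* Fraction of time spent in communication and in computation\n",
    "* Communication/computation overlap ratio\n",
    "* Communication volume\n",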
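    "\n",
    "The following is a minimal sketch of how several of these metrics could be computed from raw measurements. This plan does not fix any of the following choices, so each is an assumption:\n",
    "\n",
    "* MFU uses the common approximation of 6 * N * tokens training FLOPs per iteration (forward plus backward, attention FLOPs ignored). N is the parameter count, or the activated parameter count for MoE models.\n",
    "* The default peak of 312 TFLOP/s per GPU assumes that A800 dense BF16 throughput matches A100. Verify this against the datasheet.\n",
    "* Communication and computation kernels are given as (start, end) intervals, for example exported from a profiler trace.\n",
    "* The overlap ratio is defined here as the fraction of communication time that runs concurrently with computation.\n",
    "\n",
    "```python\n",
    "from typing import List, Tuple\n",
    "\n",
    "Interval = Tuple[float, float]  # (start, end) in seconds, e.g. from a profiler trace\n",
    "\n",
    "\n",
    "def throughput(global_batch_size: int, iteration_time_s: float) -> float:\n",
    "    # Samples per second for one training iteration.\n",
    "    return global_batch_size / iteration_time_s\n",
    "\n",
    "\n",
    "def mfu(n_params: float, tokens_per_iter: int, iteration_time_s: float,\n",
    "        num_gpus: int, peak_flops_per_gpu: float = 312e12) -> float:\n",
    "    # Achieved model FLOP/s (6 * N * tokens approximation) over aggregate peak FLOP/s.\n",
    "    # For MoE models pass the activated parameter count; the default peak is an assumption.\n",
    "    achieved = 6 * n_params * tokens_per_iter / iteration_time_s\n",
    "    return achieved / (num_gpus * peak_flops_per_gpu)\n",
    "\n",
    "\n",
    "def _merge(intervals: List[Interval]) -> List[Interval]:\n",
    "    # Union of possibly overlapping intervals, as disjoint intervals sorted by start.\n",
    "    merged: List[Interval] = []\n",
    "    for start, end in sorted(intervals):\n",
    "        if merged and start <= merged[-1][1]:\n",
    "            merged[-1] = (merged[-1][0], max(merged[-1][1], end))\n",
    "        else:\n",
    "            merged.append((start, end))\n",
    "    return merged\n",
    "\n",
    "\n",
    "def _length(intervals: List[Interval]) -> float:\n",
    "    return sum(end - start for start, end in intervals)\n",
    "\n",
    "\n",
    "def time_fraction(intervals: List[Interval], iteration_time_s: float) -> float:\n",
    "    # Fraction of the iteration covered by one kind of kernel (communication or computation).\n",
    "    return _length(_merge(intervals)) / iteration_time_s\n",
    "\n",
    "\n",
    "def overlap_ratio(comm: List[Interval], comp: List[Interval]) -> float:\n",
    "    # Fraction of communication time that runs concurrently with computation.\n",
    "    a, b = _merge(comm), _merge(comp)\n",
    "    i = j = 0\n",
    "    overlapped = 0.0\n",
    "    while i < len(a) and j < len(b):\n",
    "        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])\n",
    "        if hi > lo:\n",
    "            overlapped += hi - lo\n",
    "        # Advance whichever interval ends first (two-pointer sweep).\n",
    "        if a[i][1] < b[j][1]:\n",
    "            i += 1\n",
    "        else:\n",
    "            j += 1\n",
    "    total_comm = _length(a)\n",
    "    return overlapped / total_comm if total_comm > 0 else 0.0\n",
    "\n",
    "\n",
    "# Toy trace: communication 0-10 s, computation 5-20 s, iteration 20 s.\n",
    "comm, comp = [(0.0, 10.0)], [(5.0, 20.0)]\n",
    "print(overlap_ratio(comm, comp))  # 0.5\n",
    "print(time_fraction(comm, 20.0))  # 0.5\n",
    "```\n",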
    "\n",
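    "# Evaluation Goal\n",
    "* Show that Lynx finds more communication/computation overlap opportunities in the computation graph and can therefore outperform the current SOTA parallelization strategies used for different models\n",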
    "\n",
    "# Methodology\n",
    "* Single parallelism strategy, tested on both MoE and non-MoE models\n",
    "  - Lynx + SimpleFSDP\n",
    "  - XLA + SimpleFSDP\n",
    "  - SimpleFSDP\n",
    "* Hybrid parallelism: ?TP + ?DP for non-MoE models, ?TP + ?EP + ?DP for MoE models\n",
    "  - Lynx + Megatron-LM\n",
    "  - XLA + Megatron-LM\n",
    "  - Megatron-LM\n",
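    "\n",
    "To narrow down the ?TP + ?DP and ?TP + ?EP + ?DP settings, the candidate layouts can be enumerated for 8 and 16 GPUs. This minimal sketch relies on the following assumptions:\n",
    "\n",
    "* There is no pipeline parallelism (PP = 1), so TP * DP equals the GPU count.\n",
    "* TP stays within one 8-GPU node.\n",
    "* EP is carved out of the DP ranks, so EP divides DP.\n",
    "* Experts are split evenly, so EP divides the expert count.\n",
    "\n",
    "Check these constraints against the Megatron-LM version actually used. The expert count of 8 in the example is a placeholder.\n",
    "\n",
    "```python\n",
    "from typing import List, Tuple\n",
    "\n",
    "\n",
    "def divisors(n: int) -> List[int]:\n",
    "    return [d for d in range(1, n + 1) if n % d == 0]\n",
    "\n",
    "\n",
    "def dense_configs(world_size: int, gpus_per_node: int = 8) -> List[Tuple[int, int]]:\n",
    "    # (TP, DP) pairs with TP * DP == world_size and TP confined to a single node.\n",
    "    return [(tp, world_size // tp) for tp in divisors(world_size) if tp <= gpus_per_node]\n",
    "\n",
    "\n",
    "def moe_configs(world_size: int, num_experts: int, gpus_per_node: int = 8) -> List[Tuple[int, int, int]]:\n",
    "    # (TP, EP, DP) triples; EP must divide DP and the expert count (assumed constraints).\n",
    "    configs = []\n",
    "    for tp, dp in dense_configs(world_size, gpus_per_node):\n",
    "        for ep in divisors(dp):\n",
    "            if num_experts % ep == 0:\n",
    "                configs.append((tp, ep, dp))\n",
    "    return configs\n",
    "\n",
    "\n",
    "for world_size in (8, 16):  # single node and two nodes\n",
    "    print(world_size, dense_configs(world_size))\n",
    "print(len(moe_configs(8, num_experts=8)))  # 10\n",
    "```\n",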
    "\n",
    "\n",
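    "# Results\n",
    "* Overall, Lynx finds more overlap opportunities across configurations and achieves the highest overlap efficiency.\n",
    "* Compilation time: the extra overhead is small.\n",
    "* Memory overhead: Lynx does not add much memory cost.\n",
    "* Convergence: models converge during training."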
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scalability\n",
    "\n",
    "# Evaluation Goal\n",
    "Evaluate how well Lynx scales, i.e., how its performance improves as the number of nodes increases\n",
    "\n",
    "# Methodology\n",
    "Evaluate Lynx with FSDP while gradually increasing the number of nodes (the batch size is still set to the largest feasible value)\n",
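    "\n",
    "The scaling results could be summarized as a scaling efficiency relative to the smallest run. This is the measured throughput speedup divided by the increase in GPU count, so 1.0 means linear scaling. The following is a minimal sketch. The input format, a map from GPU count to samples/s, is an assumption. Because the batch size is re-tuned at each scale, this compares end-to-end throughput rather than a strict fixed-per-GPU weak-scaling setup.\n",
    "\n",
    "```python\n",
    "from typing import Dict\n",
    "\n",
    "\n",
    "def scaling_efficiency(throughput_by_gpus: Dict[int, float]) -> Dict[int, float]:\n",
    "    # (throughput_N / throughput_base) / (N / N_base), where base is the smallest GPU count.\n",
    "    base_n = min(throughput_by_gpus)\n",
    "    base_tp = throughput_by_gpus[base_n]\n",
    "    return {\n",
    "        n: (tp / base_tp) / (n / base_n)\n",
    "        for n, tp in sorted(throughput_by_gpus.items())\n",
    "    }\n",
    "\n",
    "\n",
    "# Made-up numbers for illustration only, not measured results.\n",
    "print(scaling_efficiency({8: 100.0, 16: 180.0}))  # {8: 1.0, 16: 0.9}\n",
    "```\n",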
    "\n",
    "# Results\n",
    "  - Iteration time\n",
    "  - MFU\n",
    "  - Compilation time"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}