<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>HuckOps</title>
  
  
  <link href="http://www.huckops.xyz/atom.xml" rel="self"/>
  
  <link href="http://www.huckops.xyz/"/>
  <updated>2026-06-06T17:04:29.931Z</updated>
  <id>http://www.huckops.xyz/</id>
  
  <author>
    <name>Huck</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>GitLab Runner: How It Works and How to Operate It</title>
    <link href="http://www.huckops.xyz/2026/03/20/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/Gitlab%20Runner%E5%8E%9F%E7%90%86%E5%8F%8A%E8%BF%90%E7%BB%B4/"/>
    <id>http://www.huckops.xyz/2026/03/20/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/Gitlab%20Runner%E5%8E%9F%E7%90%86%E5%8F%8A%E8%BF%90%E7%BB%B4/</id>
    <published>2026-03-20T15:55:14.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
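    <content type="html"><![CDATA[<p>In production projects, executables and source packages should not be built by hand on developers' local machines, mainly for the following reasons:</p><ol><li>Artifacts built locally may differ from the production environment, which can lead to hard-to-diagnose problems.</li><li>With microservices, a single release may involve several services; building each of them by hand adds operational overhead.</li></ol><p>To solve this, GitLab CI/CD pipelines can automate project builds and other repeatable, process-driven steps.</p><h1 id="Gitlab-CI-原理"><a href="#Gitlab-CI-原理" class="headerlink" title="Gitlab CI 原理"></a>How GitLab CI Works</h1><p>As usual, the best way to understand how a product works is to start with its architecture. Let's first look at the GitLab CI workflow.</p><p><img src="https://s3.huckops.xyz/1780765077465.png" alt="1780765077465.png"></p><h2 id="横向理解"><a href="#横向理解" class="headerlink" title="横向理解"></a>Horizontal View: Components</h2><p>As the workflow diagram shows, the GitLab Runner architecture has three main parts:</p><ol><li>GitLab: the GitLab server itself, which manages the project's code repository, project settings, pipeline triggers, and so on.</li><li>Runner: the GitLab Runner agent, which fetches jobs from GitLab and starts them.</li><li>Executor: the component that actually runs the jobs defined in the GitLab CI script (<code>.gitlab-ci.yml</code>).</li></ol><p>The simplified architecture can be summarized as:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">GitLab ------&gt; Runner (job manager) ------&gt; Executor (job executor)</span><br></pre></td></tr></table></figure><h2 id="纵向理解"><a href="#纵向理解" class="headerlink" title="纵向理解"></a>Vertical View: Lifecycle</h2><h3 id="初始化阶段"><a href="#初始化阶段" class="headerlink" title="初始化阶段"></a>Initialization Phase</h3><p>GitLab Runner registers with GitLab using a token. Once registration succeeds, GitLab returns a runner ID and an authentication token. The runner stores both locally in its configuration file (<code>/etc/gitlab-runner/config.toml</code> in the output below). As the warning in the output notes, the registration-token workflow has been deprecated since GitLab Runner 15.6 and is being replaced by authentication tokens.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 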
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"># gitlab-runner register --url https://gitlab.com/ --registration-token &lt;token&gt;</span><br><span class="line">Runtime platform                                    arch=amd64 os=linux pid=2509 revision=07e534ba version=18.9.0</span><br><span class="line">Running in system-mode.</span><br><span class="line"></span><br><span class="line">Enter the GitLab instance URL (for example, https://gitlab.com/):</span><br><span class="line">[https://gitlab.com/]:</span><br><span class="line">Enter the registration token:</span><br><span class="line">[token]:</span><br><span class="line">Enter a description for the runner:</span><br><span class="line">[debian]:</span><br><span class="line">Enter tags for the runner (comma-separated):</span><br><span class="line"></span><br><span class="line">Enter optional maintenance note for the runner:</span><br><span class="line"></span><br><span class="line">WARNING: Support for registration tokens and runner parameters in the &#x27;register&#x27; command has been deprecated in GitLab Runner 15.6 and will be replaced with support for authentication tokens. For more information, see https://docs.gitlab.com/ci/runners/new_creation_workflow/</span><br><span class="line">Registering runner... succeeded                     correlation_id=9df253558c3d5e55-SJC runner=XyxCKGHg9 runner_name=debian</span><br><span class="line">Enter an executor: shell, ssh, virtualbox, docker, docker-windows, custom, parallels, docker+machine, kubernetes, docker-autoscaler, instance:</span><br><span class="line">docker</span><br><span class="line">Enter the default Docker image (for example, ruby:3.3):</span><br><span class="line">debian:13.0</span><br><span class="line">Runner registered successfully. 
Feel free to start it, but if it&#x27;s running already the config should be automatically reloaded!</span><br><span class="line"></span><br><span class="line">Configuration (with the authentication token) was saved in &quot;/etc/gitlab-runner/config.toml&quot;</span><br></pre></td></tr></table></figure><h3 id="运行阶段"><a href="#运行阶段" class="headerlink" title="运行阶段"></a>Run Phase</h3><p>While running, GitLab Runner repeatedly polls GitLab for new jobs; the runner pulls jobs, and GitLab does not push them to it. The polling interval is controlled by the global <code>check_interval</code> option in <code>config.toml</code>. When the runner picks up a job assigned to it, it has the Executor run the CI script and sends the job log and the final result back to GitLab.</p><h1 id="常见-Runner-模式"><a href="#常见-Runner-模式" class="headerlink" title="常见 Runner 模式"></a>Common Runner Modes</h1><h2 id="docker-模式"><a href="#docker-模式" class="headerlink" title="docker 模式"></a>Docker Mode</h2><p>With the docker executor, each CI job runs its script inside a Docker container started on the runner host. Ordinary builds (such as <code>npm run build</code> or <code>go build</code>) can simply run inside that container. The following is a simple build configuration for docker mode:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">stages:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="string">build</span></span><br><span class="line"></span><br><span class="line"><span class="attr">build:</span></span><br><span class="line">  <span class="attr">tags:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">shared</span></span><br><span 
class="line">  <span class="attr">stage:</span> <span class="string">build</span></span><br><span class="line">  <span class="attr">image:</span> <span class="string">node:16-alpine</span></span><br><span class="line"></span><br><span class="line">  <span class="attr">before_script:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">npm</span> <span class="string">install</span></span><br><span class="line"></span><br><span class="line">  <span class="attr">script:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">npm</span> <span class="string">run</span> <span class="string">build</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">tar</span> <span class="string">-czvf</span> <span class="string">$CI_PROJECT_NAME-$CI_COMMIT_SHORT_SHA.tar.gz</span> <span class="string">dist</span></span><br><span class="line"></span><br><span class="line">  <span class="attr">artifacts:</span></span><br><span class="line">    <span class="attr">paths:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">$CI_PROJECT_NAME-$CI_COMMIT_SHORT_SHA.tar.gz</span></span><br><span class="line">    <span class="attr">when:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">expire_in:</span> <span class="number">1</span> <span class="string">hour</span></span><br></pre></td></tr></table></figure><p>Here the runner starts a <code>node:16-alpine</code> container and runs <code>npm install</code> and <code>npm run build</code> inside it. It then packs the generated <code>dist</code> directory into a tar.gz archive, which is exposed to users as a job artifact for one hour (<code>expire_in: 1 hour</code>). <code>$CI_COMMIT_SHORT_SHA</code> is a predefined GitLab CI variable holding the first eight characters of the commit SHA.</p><p>Now consider a question: if the target project has to be packaged as a Docker image, does this approach still work?</p><p>Let's try a CI build with the following script:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">stages:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="string">build</span></span><br><span class="line"></span><br><span class="line"><span class="attr">variables:</span></span><br><span class="line">  <span class="attr">DOCKER_TLS_CERTDIR:</span> <span class="string">&quot;&quot;</span></span><br><span class="line">  <span class="attr">IMAGE_NAME:</span> <span class="string">my-golang-app</span></span><br><span class="line">  <span class="attr">IMAGE_TAG:</span> <span class="string">$CI_COMMIT_SHORT_SHA</span></span><br><span class="line">  <span class="attr">DOCKERFILE_PATH:</span> <span class="string">./Dockerfile</span></span><br><span class="line"></span><br><span class="line"><span class="attr">build-docker-image:</span></span><br><span class="line">  <span class="attr">tags:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">shared</span></span><br><span class="line">  <span class="attr">stage:</span> <span class="string">build</span></span><br><span class="line">  <span class="attr">image:</span> <span class="string">docker:latest</span></span><br><span class="line">  <span class="attr">script:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">docker</span> <span class="string">info</span></span><br><span class="line">    <span class="bullet">-</span> <span 
class="string">docker</span> <span class="string">build</span> <span class="string">-t</span> <span class="string">$IMAGE_NAME:$IMAGE_TAG</span> <span class="string">-f</span> <span class="string">$DOCKERFILE_PATH</span> <span class="string">.</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">docker</span> <span class="string">images</span> <span class="string">|</span> <span class="string">grep</span> <span class="string">$IMAGE_NAME</span></span><br><span class="line">  <span class="attr">artifacts:</span></span><br><span class="line">    <span class="attr">paths:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">build.log</span></span><br><span class="line">    <span class="attr">when:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">expire_in:</span> <span class="number">1</span> <span class="string">hour</span></span><br></pre></td></tr></table></figure><p>After the build is triggered, the job fails with the following error:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">$ docker info</span><br><span class="line">Client:</span><br><span class="line"> Version:    29.3.0</span><br><span class="line"> Context:    default</span><br><span class="line"> Debug Mode: false</span><br><span class="line"> Plugins:</span><br><span class="line">  buildx: Docker Buildx (Docker Inc.)</span><br><span class="line">    Version:  v0.32.1</span><br><span class="line">    Path:     
/usr/local/libexec/docker/cli-plugins/docker-buildx</span><br><span class="line">  compose: Docker Compose (Docker Inc.)</span><br><span class="line">    Version:  v5.1.0</span><br><span class="line">    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose</span><br><span class="line">Server:</span><br><span class="line">failed to connect to the docker API at tcp://docker:2375: lookup docker on 8.8.8.8:53: no such host</span><br></pre></td></tr></table></figure><p>The container started from the <code>docker</code> image is not a complete Docker installation; it is essentially just a Docker client. Docker consists of two parts: the Docker daemon, which manages containers, and the Docker client, which communicates with the daemon. By default, the client talks to the daemon through the Unix socket <code>/var/run/docker.sock</code>. The job container has no such socket, so, as the error shows, the client instead tried <code>tcp://docker:2375</code>, the conventional address of a <code>docker:dind</code> service. No such host exists here, so the connection failed. The fix is to configure the runner to mount the host's Docker socket into the job container, so that the CI container can drive the host's Docker daemon (see the <code>volumes</code> entry under <code>[runners.docker]</code> below):</p><figure class="highlight toml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="section">[[runners]]</span></span><br><span class="line">  <span class="attr">name</span> = <span class="string">&quot;debian&quot;</span></span><br><span class="line">  <span class="attr">url</span> = <span class="string">&quot;https://gitlab.com/&quot;</span></span><br><span class="line">  <span class="attr">id</span> = <span 
class="number">5</span></span><br><span class="line">  <span class="attr">token</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">  <span class="attr">token_obtained_at</span> = <span class="number">2026</span>-<span class="number">03</span>-<span class="number">20</span>T06:<span class="number">12</span>:<span class="number">02</span>Z</span><br><span class="line">  <span class="attr">token_expires_at</span> = <span class="number">0001</span>-<span class="number">01</span>-<span class="number">01</span>T00:<span class="number">00</span>:<span class="number">00</span>Z</span><br><span class="line">  <span class="attr">executor</span> = <span class="string">&quot;docker&quot;</span></span><br><span class="line">  <span class="section">[runners.cache]</span></span><br><span class="line">    <span class="attr">MaxUploadedArchiveSize</span> = <span class="number">0</span></span><br><span class="line">    <span class="section">[runners.cache.s3]</span></span><br><span class="line">    <span class="section">[runners.cache.gcs]</span></span><br><span class="line">    <span class="section">[runners.cache.azure]</span></span><br><span class="line">  <span class="section">[runners.docker]</span></span><br><span class="line">    <span class="attr">tls_verify</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">image</span> = <span class="string">&quot;debian:13&quot;</span></span><br><span class="line">    <span class="attr">privileged</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">disable_entrypoint_overwrite</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">oom_kill_disable</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">disable_cache</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">volumes</span> = [<span 
class="string">&quot;/cache&quot;</span>, <span class="string">&quot;/var/run/docker.sock:/var/run/docker.sock&quot;</span>]</span><br><span class="line">    <span class="attr">shm_size</span> = <span class="number">0</span></span><br><span class="line">    <span class="attr">network_mtu</span> = <span class="number">0</span></span><br></pre></td></tr></table></figure><p>这就是典型的 Docker in Docker 模式（DinD）。</p><h2 id="kubernetes-模式"><a href="#kubernetes-模式" class="headerlink" title="kubernetes 模式"></a>kubernetes 模式</h2><p>kubernetes 模式和 docker 类似，但是 kubernetes 只需要在任何可以访问到 api server 的节点上运行 Runner，而 Docker 模式是需要在每个节点上运行一个 Runner 的。</p><p>无论是使用 docker 还是 kubernetes 模式，使用的 CI 脚本理论上都是一样的，而且一样需要配置 Docker daemon 的 sock 挂载。</p><figure class="highlight toml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span 
class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="section">[[runners]]</span></span><br><span class="line">  <span class="attr">name</span> = <span class="string">&quot;k8s&quot;</span></span><br><span class="line">  <span class="attr">url</span> = <span class="string">&quot;https://gitlab.com/&quot;</span></span><br><span class="line">  <span class="attr">id</span> = <span class="number">5</span></span><br><span class="line">  <span class="attr">token</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">  <span class="attr">token_obtained_at</span> = <span class="number">2026</span>-<span class="number">03</span>-<span class="number">20</span>T07:<span class="number">04</span>:<span class="number">17</span>Z</span><br><span class="line">  <span class="attr">token_expires_at</span> = <span class="number">0001</span>-<span class="number">01</span>-<span class="number">01</span>T00:<span class="number">00</span>:<span class="number">00</span>Z</span><br><span class="line">  <span class="attr">executor</span> = <span class="string">&quot;kubernetes&quot;</span></span><br><span class="line">  <span class="section">[runners.cache]</span></span><br><span class="line">    <span class="attr">MaxUploadedArchiveSize</span> = <span class="number">0</span></span><br><span class="line">    <span class="section">[runners.cache.s3]</span></span><br><span class="line">    <span class="section">[runners.cache.gcs]</span></span><br><span class="line">    <span class="section">[runners.cache.azure]</span></span><br><span class="line">  <span class="section">[runners.kubernetes]</span></span><br><span class="line">    <span class="attr">host</span> = <span class="string">&quot;https://127.0.0.1:6443&quot;</span></span><br><span class="line">    <span 
class="attr">cert_file</span> = <span class="string">&quot;/root/k8s/admin.crt&quot;</span></span><br><span class="line">    <span class="attr">key_file</span> = <span class="string">&quot;/root/k8s/admin.key&quot;</span></span><br><span class="line">    <span class="attr">ca_file</span> = <span class="string">&quot;/root/k8s/ca.crt&quot;</span></span><br><span class="line">    <span class="attr">bearer_token_overwrite_allowed</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">image</span> = <span class="string">&quot;debian:13&quot;</span></span><br><span class="line">    <span class="attr">namespace</span> = <span class="string">&quot;default&quot;</span></span><br><span class="line">    <span class="attr">namespace_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="attr">namespace_per_job</span> = <span class="literal">false</span></span><br><span class="line">    <span class="attr">node_selector_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="attr">node_tolerations_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="attr">pod_labels_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="attr">service_account_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="attr">pod_annotations_overwrite_allowed</span> = <span class="string">&quot;&quot;</span></span><br><span class="line">    <span class="section">[runners.kubernetes.init_permissions_container_security_context]</span></span><br><span class="line">      <span class="section">[runners.kubernetes.init_permissions_container_security_context.capabilities]</span></span><br><span class="line">    <span 
class="section">[runners.kubernetes.build_container_security_context]</span></span><br><span class="line">      <span class="section">[runners.kubernetes.build_container_security_context.capabilities]</span></span><br><span class="line">    <span class="section">[runners.kubernetes.helper_container_security_context]</span></span><br><span class="line">      <span class="section">[runners.kubernetes.helper_container_security_context.capabilities]</span></span><br><span class="line">    <span class="section">[runners.kubernetes.service_container_security_context]</span></span><br><span class="line">      <span class="section">[runners.kubernetes.service_container_security_context.capabilities]</span></span><br><span class="line">    <span class="section">[runners.kubernetes.volumes]</span></span><br><span class="line">      <span class="section">[[runners.kubernetes.volumes.host_path]]</span></span><br><span class="line">        <span class="attr">name</span> = <span class="string">&quot;docker-sock&quot;</span></span><br><span class="line">        <span class="attr">mount_path</span> = <span class="string">&quot;/var/run/docker.sock&quot;</span></span><br><span class="line">        <span class="attr">host_path</span> = <span class="string">&quot;/var/run/docker.sock&quot;</span></span><br><span class="line">        <span class="attr">read_only</span> = <span class="literal">false</span></span><br><span class="line">    <span class="section">[runners.kubernetes.dns_config]</span></span><br></pre></td></tr></table></figure><p><strong>Note: the runner configuration does not take a kubeconfig file. Here, the connection to the API server is configured with the <code>cert_file</code>, <code>key_file</code>, and <code>ca_file</code> parameters, which point to PEM files. The certificate data stored in a kubeconfig (<code>client-certificate-data</code>, <code>client-key-data</code>, <code>certificate-authority-data</code>) is base64-encoded, so decode it into files first, for example:</strong></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="string">&quot;&lt;client-certificate-data&gt;&quot;</span> | <span class="built_in">base64</span> -d &gt; admin.crt</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;在生产项目中，所有可执行文件和项目源码包都不是由开发人员在本地手动构建的，其主要原因有以下几点：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;本地构建的产物可能和线上环境不一致，可能会出现奇怪的问题。&lt;/li&gt;
&lt;li&gt;针对于微服务来说，一次发版可能会涉及到多个服务，每个服务都需要手动构建，</summary>
      
    
    
    
    <category term="运维技术" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/"/>
    
    
  </entry>
  
  <entry>
    <title>End-to-End Monitoring for High-Availability Services</title>
    <link href="http://www.huckops.xyz/2026/02/19/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/%E9%AB%98%E5%8F%AF%E7%94%A8%E7%9B%91%E6%8E%A7%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/"/>
    <id>http://www.huckops.xyz/2026/02/19/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/%E9%AB%98%E5%8F%AF%E7%94%A8%E7%9B%91%E6%8E%A7%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/</id>
    <published>2026-02-19T15:55:14.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
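    <content type="html"><![CDATA[<h1 id="监控方案介绍"><a href="#监控方案介绍" class="headerlink" title="监控方案介绍"></a>Monitoring Overview</h1><p>In SRE practice, service observability is an important tool for governing service reliability. It includes, but is not limited to:</p><ul><li>Server resource usage monitoring</li><li>Service status and availability metrics monitoring</li><li>End-to-end tracing of services</li><li>Service log monitoring</li><li>Alerting, closed-loop issue management, and event notification</li></ul><p>The sections below plan and describe end-to-end service monitoring, one module at a time.</p><h1 id="服务器资源和服务状态监测工具"><a href="#服务器资源和服务状态监测工具" class="headerlink" title="服务器资源和服务状态监测工具"></a>Tools for Monitoring Server Resources and Service Status</h1><p>For monitoring server resources and service status, common choices include Zabbix, Prometheus, Open-Falcon, and Nightingale. In real production scenarios, however, Prometheus is used far more often, for the following reasons:</p><ul><li><p><strong><em>Zabbix is not well suited to cloud-native monitoring.</em></strong> Zabbix's strength is its solid support for protocols such as SNMP, IPMI, and JMX. In cloud-native environments, however, it is relatively hard to extend and use. In Kubernetes, for example, it collects only node-level metrics out of the box; collecting Pod metrics or business metrics depends entirely on plugins or custom development. As a result, Zabbix today is mostly used to monitor network equipment and other devices that support SNMP and IPMI.</p></li><li><p><strong><em>Falcon and Nightingale are too specialized.</em></strong> Both are cloud-native monitoring tools, and both are open-source editions that Chinese vendors released after customizing them in-house. They may fit certain specific domains very well, but they are not necessarily the best choice for general-purpose scenarios.</p></li><li><p><strong><em>Prometheus is the standard monitoring tool for cloud-native environments.</em></strong> Its strengths are good scalability and ease of use. It can be extended through plugins such as exporters and supports many service-discovery and metric-collection mechanisms. It can therefore cover the monitoring needs of both cloud-native and traditional workloads.</p></li></ul><h2 id="Prometheus-在小型生产环境的应用"><a href="#Prometheus-在小型生产环境的应用" class="headerlink" title="Prometheus 在小型生产环境的应用"></a>Prometheus in Small Production Environments</h2><p>In small production environments, especially where stability requirements are modest, deploying a Prometheus server plus exporters is usually enough. The architecture looks like this:</p><p><img src="https://i.imgs.ovh/2026/02/19/yB81y9.png" alt="yB81y9.png"></p><p>This is the simplest Prometheus architecture. Exporters are deployed on the monitored nodes, and Prometheus scrapes metrics over HTTP from the metrics endpoint each exporter exposes. In a large production environment, a single Prometheus instance clearly cannot handle that many scrape jobs, so a different monitoring architecture is needed.</p><h2 id="Prometheus-主动上报模式"><a href="#Prometheus-主动上报模式" class="headerlink" title="Prometheus 主动上报模式"></a>Prometheus Push Mode</h2><p>In the traditional Prometheus architecture, the entire collection load falls on the Prometheus server, which can become heavily loaded. Can that load be spread across multiple nodes? Certainly.</p><p>Prometheus provides the PushGateway component (see <a href="https://github.com/prometheus/pushgateway">PushGateway</a>). With PushGateway in place, the collection architecture becomes:</p><p><img src="https://i.imgs.ovh/2026/02/19/yBeyug.png" alt="yBeyug.png"></p><p>This approach only suits scenarios where you push your own metrics. Almost no mainstream exporters support pushing metrics to PushGateway. If you mainly rely on open-source exporters to collect metrics, this approach is not for you.</p><p>The mechanism of this architecture has two parts:</p><ol><li>On each node, a monitoring agent actively pushes metrics to PushGateway over HTTP, for example <code>echo &quot;some_metric 3.14&quot; | curl --data-binary @- http://127.0.0.1:9091/metrics/job/some_job</code>. PushGateway caches the pushed metrics and exposes them on its metrics endpoint, as shown below:</li></ol><p><img src="https://i.imgs.ovh/2026/02/19/yBejKb.png" alt="yBejKb.png"></p><ol start="2"><li>As in the traditional architecture, the Prometheus server scrapes metrics over HTTP from the metrics endpoint exposed by PushGateway.</li></ol><h3 id="不妨考虑一个问题"><a href="#不妨考虑一个问题" class="headerlink" title="不妨考虑一个问题"></a>A Question Worth Considering</h3><p>As the architectures above show, both the traditional Prometheus setup and the PushGateway setup rely entirely on HTTP requests. HTTP has the following characteristics:</p><ol><li>It runs over TCP, so delivery is reliable, and HTTPS also protects the data in transit.</li><li>Every request needs an established TCP connection, plus a TLS handshake for HTTPS. HTTP/1.1 keeps connections alive by default, but when many clients report independently, the receiving side still has to hold a large number of concurrent connections.</li></ol><p>In a monitoring scenario, however, these characteristics can become a serious problem. Imagine a very large number of downstream nodes. Even though PushGateway accepts metrics in batches, heavy concurrent pushes still run into bottlenecks in connection count, file descriptors, processing queues, and similar resources. The results are push timeouts, dropped metrics, or even service outages.</p><p>In most scenarios, the completeness of monitoring data is not critical. For a metric sampled every minute, for example, losing 1-2 minutes of data or receiving it slightly late is usually tolerable and does not noticeably reduce end-to-end observability. We can therefore report metrics over a transport that is efficient but less reliable: UDP.</p><p>Consider the following monitoring architecture:</p><p><img src="https://i.imgs.ovh/2026/02/19/yBMU0q.png" alt="yBMU0q.png"></p><p>Nodes report data to an intermediate service over UDP, and a UDP Listener receives the data and writes it into a queue. Consumers read from the queue and write the data into a cache. An exporter then reads from the cache and exposes the data as a metrics endpoint. In addition, a Kubernetes Operator defines a CRD and automatically adds Consumers when the queue lag exceeds a threshold, so that monitoring data is stored without delay. An HPA, meanwhile, scales the UDP Listener in and out dynamically.</p><p>If the completeness of monitoring data does matter, the UDP Listener can of course be replaced with an HTTP Listener similar to PushGateway. Either way, the core idea of this design is to decouple data ingestion from data exposure, which protects the performance and reliability of the proxy nodes.</p><p>No open-source implementation of this design is currently available, so ops teams will need to build one tailored to their own requirements.</p><h2 id="Prometheus-被动模式"><a href="#Prometheus-被动模式" class="headerlink" title="Prometheus 被动模式"></a>Prometheus Pull Mode</h2><p>In practice, production environments more commonly use pull-based collection, that is, the Prometheus plus exporter model. In large environments, however, the classic single-server architecture clearly does not scale. Prometheus federation is typically introduced to spread the scrape load, as in the following architecture:</p><p><img src="https://i.imgs.ovh/2026/02/19/yBMWMO.png" alt="yBMWMO.png"></p><p>Put simply, the infrastructure is divided into Regions. Each Region runs its own Prometheus, which scrapes the exporters on that Region's nodes. A separate core Prometheus then scrapes and aggregates metrics from every Region's Prometheus.</p><p>The advantages of this design are:</p><ol><li><p>Because Prometheus is deployed per Region, a failure of one Region's Prometheus does not affect global monitoring. And if the core Prometheus fails, each Region's monitoring data can still be viewed directly in that Region's Prometheus.</p></li><li><p>The scrape load of the core Prometheus is spread across multiple nodes. This adds some operational cost to the monitoring system, but keeps the core Prometheus performant.</p></li></ol><h2 id="自动发现"><a href="#自动发现" class="headerlink" title="自动发现"></a>Automatic Discovery</h2><p>In the traditional Prometheus architecture, every exporter has to be registered in the <code>prometheus.yml</code> file. Prometheus does support reloading its configuration without a restart, but registering a large number of nodes this way is still tedious. The official Prometheus documentation describes service-discovery integrations that read target groups from a service registry, such as <code>consul_sd_configs</code> for Consul. Let's work out an automated approach along those lines.</p><p>Most mainstream open-source exporters do not register themselves, so each exporter has to be registered before it starts. The following script registers and starts an exporter in one step, using node_exporter as the example:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 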
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/bin/bash</span></span><br><span class="line"><span class="built_in">set</span> -euo pipefail</span><br><span class="line"></span><br><span class="line">CONSUL_ADDR=<span 
class="string">&quot;http://localhost:8500&quot;</span></span><br><span class="line">SERVICE_NAME=<span class="string">&quot;node_exporter&quot;</span></span><br><span class="line">SERVICE_TAGS=<span class="string">&quot;node_exporter,prometheus,auto-deploy&quot;</span></span><br><span class="line">CHECK_INTERVAL=<span class="string">&quot;10s&quot;</span></span><br><span class="line">METRICS_PATH=<span class="string">&quot;/metrics&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">log</span></span>() &#123;</span><br><span class="line">    <span class="built_in">echo</span> <span class="string">&quot;[<span class="subst">$(date +&#x27;%Y-%m-%d %H:%M:%S&#x27;)</span>] <span class="variable">$1</span>&quot;</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">error_exit</span></span>() &#123;</span><br><span class="line">    <span class="built_in">log</span> <span class="string">&quot;ERROR: <span class="variable">$1</span>&quot;</span></span><br><span class="line">    <span class="built_in">exit</span> 1</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">check_command</span></span>() &#123;</span><br><span class="line">    <span class="keyword">if</span> ! 
<span class="built_in">command</span> -v <span class="string">&quot;<span class="variable">$1</span>&quot;</span> &amp;&gt; /dev/null; <span class="keyword">then</span></span><br><span class="line">        error_exit <span class="string">&quot;Command <span class="variable">$1</span> not found, please install it first&quot;</span></span><br><span class="line">    <span class="keyword">fi</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">check_command <span class="string">&quot;wget&quot;</span></span><br><span class="line">check_command <span class="string">&quot;curl&quot;</span></span><br><span class="line">check_command <span class="string">&quot;tar&quot;</span></span><br><span class="line">check_command <span class="string">&quot;awk&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> [ <span class="variable">$#</span> -ne 2 ]; <span class="keyword">then</span></span><br><span class="line">    error_exit <span class="string">&quot;Usage: <span class="variable">$0</span> &lt;version&gt; &lt;port&gt;</span></span><br><span class="line"><span class="string">    Example: <span class="variable">$0</span> 1.8.2 9100&quot;</span></span><br><span class="line"><span class="keyword">fi</span></span><br><span class="line"></span><br><span class="line">VERSION=<span class="string">&quot;<span class="variable">$1</span>&quot;</span></span><br><span class="line">PORT=<span class="string">&quot;<span class="variable">$2</span>&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> ! 
[[ <span class="string">&quot;<span class="variable">$PORT</span>&quot;</span> =~ ^[0-9]+$ ]]; <span class="keyword">then</span></span><br><span class="line">    error_exit <span class="string">&quot;Port must be numeric, got: <span class="variable">$PORT</span>&quot;</span></span><br><span class="line"><span class="keyword">fi</span></span><br><span class="line"></span><br><span class="line">IP=$(hostname -I | awk <span class="string">&#x27;&#123;for(i=1;i&lt;=NF;i++)&#123;if($i!~/^127/ &amp;&amp; $i!~/^172\.17/)&#123;print $i;exit&#125;&#125;&#125;&#x27;</span>)</span><br><span class="line"><span class="keyword">if</span> [ -z <span class="string">&quot;<span class="variable">$IP</span>&quot;</span> ]; <span class="keyword">then</span></span><br><span class="line">    error_exit <span class="string">&quot;Unable to determine a valid local IP address&quot;</span></span><br><span class="line"><span class="keyword">fi</span></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Detected local IP: <span class="variable">$IP</span>&quot;</span></span><br><span class="line"></span><br><span class="line">SERVICE_ID=<span class="string">&quot;<span class="variable">$&#123;SERVICE_NAME&#125;</span>_<span class="variable">$&#123;IP&#125;</span>_<span class="variable">$&#123;PORT&#125;</span>&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Cleaning up old files...&quot;</span></span><br><span class="line"><span class="built_in">rm</span> -rf node_exporter-<span class="variable">$&#123;VERSION&#125;</span>.linux-amd64.tar.gz node_exporter-<span class="variable">$&#123;VERSION&#125;</span>.linux-amd64</span><br><span class="line"></span><br><span class="line">DOWNLOAD_URL=<span class="string">&quot;https://gh-proxy.com/https://github.com/prometheus/node_exporter/releases/download/v<span class="variable">$&#123;VERSION&#125;</span>/node_exporter-<span 
class="variable">$&#123;VERSION&#125;</span>.linux-amd64.tar.gz&quot;</span></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Downloading node_exporter v<span class="variable">$&#123;VERSION&#125;</span>...&quot;</span></span><br><span class="line">wget --quiet --show-progress <span class="string">&quot;<span class="variable">$DOWNLOAD_URL</span>&quot;</span> -O <span class="string">&quot;node_exporter-<span class="variable">$&#123;VERSION&#125;</span>.linux-amd64.tar.gz&quot;</span> || error_exit <span class="string">&quot;Download failed, please check that the version number is correct&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Extracting the package...&quot;</span></span><br><span class="line">tar -zxf <span class="string">&quot;node_exporter-<span class="variable">$&#123;VERSION&#125;</span>.linux-amd64.tar.gz&quot;</span> || error_exit <span class="string">&quot;Extraction failed&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Starting node_exporter on port <span class="variable">$PORT</span>...&quot;</span></span><br><span class="line"><span class="built_in">cd</span> <span class="string">&quot;node_exporter-<span class="variable">$&#123;VERSION&#125;</span>.linux-amd64&quot;</span> || error_exit <span class="string">&quot;Failed to enter the directory&quot;</span></span><br><span class="line"><span class="built_in">nohup</span> ./node_exporter --web.listen-address <span class="string">&quot;:<span class="variable">$&#123;PORT&#125;</span>&quot;</span> &gt; /tmp/node_exporter_<span class="variable">$&#123;PORT&#125;</span>.<span class="built_in">log</span> 2&gt;&amp;1 &amp;</span><br><span class="line"><span class="built_in">sleep</span> 2</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> ! 
curl -sf <span class="string">&quot;http://<span class="variable">$&#123;IP&#125;</span>:<span class="variable">$&#123;PORT&#125;</span><span class="variable">$&#123;METRICS_PATH&#125;</span>&quot;</span> &gt; /dev/null; <span class="keyword">then</span></span><br><span class="line">    error_exit <span class="string">&quot;node_exporter failed to start, check the log: /tmp/node_exporter_<span class="variable">$&#123;PORT&#125;</span>.log&quot;</span></span><br><span class="line"><span class="keyword">fi</span></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;node_exporter started&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">log</span> <span class="string">&quot;Registering service with Consul: <span class="variable">$SERVICE_ID</span>&quot;</span></span><br><span class="line">curl -sf -X PUT <span class="string">&quot;<span class="variable">$&#123;CONSUL_ADDR&#125;</span>/v1/agent/service/register&quot;</span> \</span><br><span class="line">  -H <span class="string">&quot;Content-Type: application/json&quot;</span> \</span><br><span class="line">  -d <span class="string">&#x27;&#123;</span></span><br><span class="line"><span class="string">    &quot;ID&quot;: &quot;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;SERVICE_ID&#125;</span>&quot;</span><span class="string">&#x27;&quot;,</span></span><br><span class="line"><span class="string">    &quot;Name&quot;: &quot;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;SERVICE_NAME&#125;</span>&quot;</span><span class="string">&#x27;&quot;,</span></span><br><span class="line"><span class="string">    &quot;Address&quot;: &quot;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;IP&#125;</span>&quot;</span><span class="string">&#x27;&quot;,</span></span><br><span class="line"><span class="string">    &quot;Port&quot;: &#x27;</span><span class="string">&quot;<span 
class="variable">$&#123;PORT&#125;</span>&quot;</span><span class="string">&#x27;,</span></span><br><span class="line"><span class="string">    &quot;Tags&quot;: [&quot;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;SERVICE_TAGS&#125;</span>&quot;</span><span class="string">&#x27;&quot;],</span></span><br><span class="line"><span class="string">    &quot;Check&quot;: &#123;</span></span><br><span class="line"><span class="string">      &quot;HTTP&quot;: &quot;http://&#x27;</span><span class="string">&quot;<span class="variable">$&#123;IP&#125;</span>&quot;</span><span class="string">&#x27;:&#x27;</span><span class="string">&quot;<span class="variable">$&#123;PORT&#125;</span>&quot;</span><span class="string">&#x27;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;METRICS_PATH&#125;</span>&quot;</span><span class="string">&#x27;&quot;,</span></span><br><span class="line"><span class="string">      &quot;Interval&quot;: &quot;&#x27;</span><span class="string">&quot;<span class="variable">$&#123;CHECK_INTERVAL&#125;</span>&quot;</span><span class="string">&#x27;&quot;,</span></span><br><span class="line"><span class="string">      &quot;Timeout&quot;: &quot;5s&quot;</span></span><br><span class="line"><span class="string">    &#125;</span></span><br><span class="line"><span class="string">  &#125;&#x27;</span> || error_exit <span class="string">&quot;Failed to register with Consul, please check that Consul is running&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> curl -sf <span class="string">&quot;<span class="variable">$&#123;CONSUL_ADDR&#125;</span>/v1/agent/service/<span class="variable">$&#123;SERVICE_ID&#125;</span>&quot;</span> &gt; /dev/null; <span class="keyword">then</span></span><br><span class="line">    <span class="built_in">log</span> <span class="string">&quot;✅ All done!</span></span><br><span class="line"><span class="string">    - node_exporter version: v<span 
class="variable">$&#123;VERSION&#125;</span></span></span><br><span class="line"><span class="string">    - Endpoint: http://<span class="variable">$&#123;IP&#125;</span>:<span class="variable">$&#123;PORT&#125;</span><span class="variable">$&#123;METRICS_PATH&#125;</span></span></span><br><span class="line"><span class="string">    - Consul service ID: <span class="variable">$&#123;SERVICE_ID&#125;</span></span></span><br><span class="line"><span class="string">    - Prometheus can now discover this instance via consul_sd&quot;</span></span><br><span class="line"><span class="keyword">else</span></span><br><span class="line">    error_exit <span class="string">&quot;Consul registration check failed&quot;</span></span><br><span class="line"><span class="keyword">fi</span></span><br></pre></td></tr></table></figure><p>Then configure Prometheus to discover its targets through Consul:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">scrape_configs:</span><br><span class="line">  - job_name: &quot;node_exporter&quot;</span><br><span class="line">    metrics_path: &quot;/metrics&quot;</span><br><span class="line">    consul_sd_configs:</span><br><span class="line">      - server: &quot;127.0.0.1:8500&quot;</span><br><span class="line">        services: [&quot;node_exporter&quot;]</span><br><span class="line">        refresh_interval: 30s</span><br><span class="line">    relabel_configs:</span><br><span class="line">      - source_labels: [__meta_consul_dc]</span><br><span class="line">        target_label: dc</span><br><span class="line">      - source_labels: 
[__meta_consul_address, __meta_consul_service_port]</span><br><span class="line">        separator: &quot;:&quot;</span><br><span class="line">        target_label: instance</span><br></pre></td></tr></table></figure><p>When the script runs, it first downloads the requested exporter version, makes sure the service is up, and then registers it with Consul. Prometheus then fetches the target list from Consul automatically and starts scraping.</p><p>For in-house exporters, we can register from inside the exporter code itself (intrusive registration), so that registration covers the service's entire lifecycle (I got lazy and this code is AI-generated, so please fix any issues you find):</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span 
class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"><span class="keyword">import</span> consul  <span class="comment"># provided by the python-consul package</span></span><br><span class="line"></span><br><span class="line">SERVICE_ID = <span class="string">&quot;&quot;</span></span><br><span class="line">CONSUL_HOST = <span class="string">&quot;localhost&quot;</span></span><br><span class="line">CONSUL_PORT = <span class="number">8500</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">get_local_ip</span>():</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Return the local IP used for outbound traffic (falls back to 127.0.0.1)&quot;&quot;&quot;</span></span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)</span><br><span class="line">        s.connect((<span class="string">&quot;8.8.8.8&quot;</span>, <span class="number">80</span>))</span><br><span class="line">        ip = s.getsockname()[<span class="number">0</span>]</span><br><span class="line">        s.close()</span><br><span class="line">        <span class="keyword">return</span> ip</span><br><span class="line">    <span class="keyword">except</span> Exception:</span><br><span class="line">        <span class="keyword">return</span> <span class="string">&quot;127.0.0.1&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">register_service</span>(<span class="params">service_name, service_port</span>):</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Register this exporter with the local Consul agent&quot;&quot;&quot;</span></span><br><span class="line">    <span class="keyword">global</span> SERVICE_ID</span><br><span class="line">    local_ip = get_local_ip()</span><br><span class="line">    SERVICE_ID = <span class="string">f&quot;<span class="subst">&#123;service_name&#125;</span>_<span class="subst">&#123;local_ip&#125;</span>_<span class="subst">&#123;service_port&#125;</span>&quot;</span></span><br><span class="line"></span><br><span class="line">    
c = consul.Consul(host=CONSUL_HOST, port=CONSUL_PORT)</span><br><span class="line">    check = consul.Check.http(</span><br><span class="line">        url=<span class="string">f&quot;http://<span class="subst">&#123;local_ip&#125;</span>:<span class="subst">&#123;service_port&#125;</span>/metrics&quot;</span>,</span><br><span class="line">        interval=<span class="string">&quot;10s&quot;</span>,</span><br><span class="line">        timeout=<span class="string">&quot;5s&quot;</span></span><br><span class="line">    )</span><br><span class="line">    c.agent.service.register(</span><br><span class="line">        name=service_name,</span><br><span class="line">        service_id=SERVICE_ID,</span><br><span class="line">        address=local_ip,</span><br><span class="line">        port=service_port,</span><br><span class="line">        tags=[<span class="string">&quot;prometheus&quot;</span>, service_name],</span><br><span class="line">        check=check</span><br><span class="line">    )</span><br><span class="line">    <span class="built_in">print</span>(<span class="string">f&quot;✅ Service registered (ID: <span class="subst">&#123;SERVICE_ID&#125;</span>)&quot;</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">deregister_service</span>():</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Deregister the service registered by register_service()&quot;&quot;&quot;</span></span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> SERVICE_ID:</span><br><span class="line">        <span class="built_in">print</span>(<span class="string">&quot;⚠️  No registered service ID, nothing to deregister&quot;</span>)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        c = consul.Consul(host=CONSUL_HOST, port=CONSUL_PORT)</span><br><span class="line">        c.agent.service.deregister(SERVICE_ID)</span><br><span 
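class="line">        <span class="built_in">print</span>(<span class="string">f&quot;✅ Service <span class="subst">&#123;SERVICE_ID&#125;</span> deregistered&quot;</span>)</span><br><span class="line">    <span class="keyword">except</span> Exception <span class="keyword">as</span> e:</span><br><span class="line">        <span class="built_in">print</span>(<span class="string">f&quot;❌ Failed to deregister service: <span class="subst">&#123;e&#125;</span>&quot;</span>)</span><br></pre></td></tr></table></figure><h2 id="高可用"><a href="#高可用" class="headerlink" title="高可用"></a>High Availability</h2><p>Consider this: Prometheus can be forced into a distributed deployment by separating the server from its storage (for example, storing data in a time-series database such as InfluxDB). But actually running Prometheus in a distributed way raises the following problems:</p><ol><li>Multiple Prometheus servers run the same scrape jobs, so the database may end up with duplicate samples. And if the data is stored in middleware such as InfluxDB, that storage becomes one more layer whose availability has to be engineered.</li><li>If the targets are sharded so that each Prometheus scrapes only its own shard, the failure of a single node means that shard's data can no longer be collected at all.</li></ol><p>Given these pain points, we need a clustering solution that supports high availability.</p><p>Thanos is a high-availability clustering solution for Prometheus. At its core, it runs a sidecar alongside each Prometheus instance and uses that sidecar to improve data availability and stability. It works as follows:</p><ol><li>Metric data is decoupled from the Prometheus node: the sidecar uploads it to S3-compatible object storage, so the data is not lost.</li><li>With multiple Prometheus replicas, every replica still scrapes the same data, but Thanos deduplicates it at query time. Storage usage therefore grows as N (number of replicas) × S (metric volume of a single instance).</li></ol><p>If storage cost is a concern, <a href="https://grafana.com/docs/mimir">Grafana Mimir</a> is an alternative.</p><p>Mimir is a remote-storage metrics database for Prometheus. When several Prometheus replicas run side by side, Mimir receives data from all of them but stores only the samples from one elected active replica (its HA tracker tells the replicas apart by their <code>cluster</code> and <code>__replica__</code> labels). If that replica fails and Mimir receives nothing from it within a timeout, it elects another replica as active and accepts that replica's data instead. This resembles a primary/replica cluster, with one difference: in primary/replica only the primary writes, whereas with Mimir every replica writes and Mimir keeps only the active replica's data.</p><h1 id="服务性能全链路观测"><a href="#服务性能全链路观测" class="headerlink" title="服务性能全链路观测"></a>End-to-End Service Performance Tracing</h1><p>When running a business, we often need a way to observe performance along the entire path a request takes through our code, including but not limited to:</p><ul><li>Service-to-service calls</li><li>Database queries</li><li>Cache lookups</li><li>Message queue operations</li><li>File system operations</li><li>Network calls</li></ul><p>The most common approach today is to instrument services with the OpenTelemetry SDK and send traces to a tracing platform such as Jaeger. Platforms like this mainly ingest data over gRPC and HTTP, which can become a serious bottleneck under heavy traffic (for example, when the reporting volume is too high, Jaeger gets blocked and span uploads fail).</p><h2 id="解决并发问题的终极奥义就是解耦"><a href="#解决并发问题的终极奥义就是解耦" class="headerlink" title="解决并发问题的终极奥义就是解耦"></a>The Ultimate Answer to Concurrency Problems Is Decoupling</h2><p>Whether over gRPC or HTTP, uploads are asynchronous from the SDK's point of view and do not block the application, but on the Jaeger side they are still synchronous requests. So the synchronous requests Jaeger receives need to be made asynchronous as well.</p><p>The first tool that comes to mind for decoupling is a queue, and this case is no exception. All spans can be written to a queue, and a consumer then reads them asynchronously and forwards them to the Jaeger Collector. Queue lag and Collector performance can then be monitored, with autoscaling, through mechanisms such as custom CRDs.</p><p>Of course, having the application itself talk directly to a message queue is not very elegant either. Instead, we can wrap the OpenTelemetry SDK in an in-house SDK that writes every span to disk as a log entry, and then let a component such as Filebeat ship those logs to the message queue. This has several advantages:</p><ol><li>Multiple layers of decoupling, which effectively gives Jaeger several levels of buffering and protects the completeness and stability of the data it collects.</li><li>Spans are persisted to disk, so traces can be replayed.</li></ol><p>The overall architecture is shown below:<br><img src="https://i.mji.rip/2026/02/20/f8c506e0d0816db21ff62593dc8c3b94.png" alt="f8c506e0d0816db21ff62593dc8c3b94.png"></p>]]></content>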
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;监控方案介绍&quot;&gt;&lt;a href=&quot;#监控方案介绍&quot; class=&quot;headerlink&quot; title=&quot;监控方案介绍&quot;&gt;&lt;/a&gt;Monitoring Solution Overview&lt;/h1&gt;&lt;p&gt;In SRE practice, service observability is an important means of governing service reliability. It includes, but is not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li</summary>
      
    
    
    
    <category term="运维技术" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/"/>
    
    
  </entry>
  
  <entry>
    <title>Short URL System Design</title>
    <link href="http://www.huckops.xyz/2026/02/06/%E5%BC%80%E6%BA%90%E4%BB%A3%E7%A0%81/%E7%9F%AD%E9%93%BE%E6%8E%A5%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1/"/>
    <id>http://www.huckops.xyz/2026/02/06/%E5%BC%80%E6%BA%90%E4%BB%A3%E7%A0%81/%E7%9F%AD%E9%93%BE%E6%8E%A5%E7%B3%BB%E7%BB%9F%E8%AE%BE%E8%AE%A1/</id>
    <published>2026-02-06T09:53:00.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
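    <content type="html"><![CDATA[<h2 id="系统背景"><a href="#系统背景" class="headerlink" title="系统背景"></a>Background</h2><blockquote><p>This system grew out of a classic developer interview question: design a URL shortener. This article examines it from several angles.</p></blockquote><p>A short-link service is a redirect platform that almost every internet company's marketing team relies on. Its core benefit is shrinking a link as much as possible: in marketing SMS and emails, a short link hides the real destination and is much easier on the eye. Take this short link as an example:</p><p><img src="https://i.imgs.ovh/2026/02/06/y4pY9e.jpeg" alt="y4pY9e.jpeg"></p><p>Looking only at the link in the SMS, the URI seems like gibberish, but on closer inspection it consists of exactly 8 characters. If each character is stored as one 8-bit byte, the whole URI occupies exactly 64 bits, the same size as a 64-bit integer.</p><h2 id="从现象反推原理"><a href="#从现象反推原理" class="headerlink" title="从现象反推原理"></a>Working Backward from the Behavior</h2><p>Visiting the short link above shows that the browser performs a redirect:</p><p><img src="https://i.imgs.ovh/2026/02/06/y4pN0c.png" alt="y4pN0c.png"></p><p>In other words, when a user visits a short link, the backend of the short-link service uses the gibberish-looking string at the end of the URI to look up the original long URL, then returns a 302 status code that redirects the browser to the destination. On the surface this works like any ordinary web project: "query the database → return the result". But is that "gibberish" really generated at random by the backend? And how is the uniqueness of that string guaranteed?</p><h2 id="永不重复的-ID"><a href="#永不重复的-ID" class="headerlink" title="永不重复的 ID"></a>IDs That Never Repeat</h2><p>Traditional databases such as MySQL usually identify each row with a primary-key ID, which has these main advantages:</p><ol><li><p>The ID is usually generated automatically by the database, which guarantees its uniqueness;</p></li><li><p>IDs increase monotonically, so cursor-based (keyset) pagination can jump straight to a given ID and read the rows after it;</p></li><li><p>Rows can be sorted by ID.</p></li></ol><p>In MySQL the largest commonly used integer type is bigint, with a maximum value of 9223372036854775807 and a size of exactly 64 bits, which matches the 64 bits of the short-link URI above. But using a bigint auto-increment ID directly as the short-link code creates several problems:</p><ol><li><p>ID generation depends entirely on the database, which has to absorb the full load in distributed, high-concurrency scenarios;</p></li><li><p>Because IDs grow sequentially, an attacker can start from any known ID and enumerate every short link, which is a security risk.</p></li></ol><h3 id="眼光投向-MongoDB"><a href="#眼光投向-MongoDB" class="headerlink" title="眼光投向 MongoDB"></a>Turning to MongoDB</h3><p>MongoDB is a distributed NoSQL database, and its distributed design gives it stability and scalability under high concurrency. The primary key of a MongoDB document is usually an ObjectId, which the official documentation describes as a 12-byte value with a well-defined structure:</p><ul><li><p>Timestamp (4 bytes): the ObjectId's creation time, in seconds since the Unix epoch;</p></li><li><p>Random value (5 bytes): generated once per process and unique to the machine and process (older versions split these bytes into a 3-byte machine identifier and a 2-byte process ID);</p></li><li><p>Counter (3 bytes): incremented for each new ObjectId generated by the same process, starting from a random value.</p></li></ul><p>In essence, an ObjectId is just these fields concatenated (usually shown as a 24-character hex string). Because of this structure, no central allocator is needed: ObjectIds are normally generated by the client driver rather than by a single database node, so compared with MySQL, MongoDB takes ID generation off any single node and relieves its load. In theory, then, a short-link system could be built entirely on MongoDB.</p><p>In production, however, short-link systems rarely store their data in MongoDB, mainly because:</p><ol><li><p>An ObjectId is long (12 bytes versus 8 for a bigint), so it takes more storage than an encoded bigint and raises cost;</p></li><li><p>MongoDB's main strength is all-round scalability, from flexible schemas to cluster scaling, but a short-link system usually needs only cluster scaling and gains little from the rest.</p></li></ol><p>Based on this analysis, we can borrow the idea behind ObjectId generation and design an ID scheme suited to distributed systems.</p><h3 id="SnowFlake（雪花算法）"><a href="#SnowFlake（雪花算法）" class="headerlink" title="SnowFlake（雪花算法）"></a>SnowFlake (the Snowflake Algorithm)</h3><p>A Snowflake ID has a structure similar to an ObjectId:</p><p><img src="https://s3.huckops.xyz/1780764505764.png" alt="1780764505764.png"></p><p>What makes Snowflake a classic is that its bit layout can be tailored to business needs, and IDs are usually generated by each distributed application node on its own, which keeps ID generation fast under high concurrency.</p><p>Snowflake can be compared to a token bucket. Suppose a system rate-limits a single node with a token bucket: on the first call in each time window, the bucket is filled with a fixed number of tokens, and every request in that window must take a token before it can reach the backend. Once the bucket is empty, further requests either wait for the next window or are rejected outright.</p><p>Snowflake works the same way: each machine, identified by its machine ID, maintains its own bucket; the timestamp (usually in milliseconds) is the time window, and the sequence number inside the ID plays the role of the tokens. Every ID generated within the same window increments the sequence by 1. (This must be thread-safe: the increment needs a lock so that concurrent callers never reuse a sequence number. In languages such as Go or C++, an uncontended lock typically costs well under a microsecond, so it rarely affects baseline performance.) When the sequence overflows within the current window, the request waits for the next window before generating an ID.</p><p>The main advantages of Snowflake are:</p><ol><li><p>IDs are structured, and by design collisions essentially never occur (as long as every machine ID is unique and the clock never moves backward);</p></li><li><p>An ID is just an int64 that increases over time but not consecutively. This makes it much harder to enumerate all short links from a known ID (though not impossible, since the time-based layout is partly predictable), while the database can still sort rows by ID.</p></li></ol><h3 id="那段乱码怎么解释"><a href="#那段乱码怎么解释" class="headerlink" title="那段乱码怎么解释"></a>So What Is That Gibberish?</h3><p>Snowflake produces an int64, and written in decimal such a structured ID is long (up to 19 digits). Short-link systems therefore usually encode it with an alphabet such as base62 (0-9, a-z, A-Z) to get a much shorter string, and that string is the final form of the short-link URI. Note that the string is shorter but not smaller: 8 ASCII characters still take 8 × 8 = 64 bits. Also note that 8 base62 characters can represent only 62<sup>8</sup> ≈ 2.18 × 10<sup>14</sup> distinct values (about 47.6 bits), while a full 63-bit positive int64 needs up to 11 base62 characters, so an 8-character code implies an ID with fewer bits (for example, a shorter timestamp field).</p><h3 id="直接上代码"><a href="#直接上代码" class="headerlink" title="直接上代码"></a>Show Me the Code</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 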
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span 
class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span 
class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">package</span> snowflake</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"></span><br><span class="line">        <span class="string">&quot;errors&quot;</span></span><br><span class="line"></span><br><span 
class="line">        <span class="string">&quot;shoturl/config&quot;</span></span><br><span class="line"></span><br><span class="line">        <span class="string">&quot;sync&quot;</span></span><br><span class="line"></span><br><span class="line">        <span class="string">&quot;time&quot;</span></span><br><span class="line"></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">const</span> (</span><br><span class="line"></span><br><span class="line">        timestampBits = <span class="number">41</span></span><br><span class="line"></span><br><span class="line">        machineIdBits = <span class="number">10</span></span><br><span class="line"></span><br><span class="line">        sequenceBits  = <span class="number">12</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        maxMachineId   = <span class="number">-1</span> ^ (<span class="number">-1</span> &lt;&lt; machineIdBits)</span><br><span class="line"></span><br><span class="line">        maxSequence    = <span class="number">-1</span> ^ (<span class="number">-1</span> &lt;&lt; sequenceBits)</span><br><span class="line"></span><br><span class="line">        timestampShift = sequenceBits + machineIdBits</span><br><span class="line"></span><br><span class="line">        machineIdShift = sequenceBits</span><br><span class="line"></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Snowflake <span class="keyword">struct</span> &#123;</span><br><span class="line"></span><br><span class="line">        mutex         sync.Mutex</span><br><span class="line"></span><br><span class="line">        lastTimestamp <span class="type">int64</span></span><br><span class="line"></span><br><span 
class="line">        machineId     <span class="type">int64</span></span><br><span class="line"></span><br><span class="line">        sequence      <span class="type">int64</span></span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> (</span><br><span class="line"></span><br><span class="line">        ErrInvalidMachineId      = errors.New(<span class="string">&quot;invalid machine id&quot;</span>)</span><br><span class="line"></span><br><span class="line">        ErrorClockMovedBackwards = errors.New(<span class="string">&quot;clock moved backwards&quot;</span>)</span><br><span class="line"></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">New</span><span class="params">()</span></span> (*Snowflake, <span class="type">error</span>) &#123;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> config.Cfg.MachineID &lt; <span class="number">0</span> || config.Cfg.MachineID &gt; maxMachineId &#123;</span><br><span class="line"></span><br><span class="line">                <span class="keyword">return</span> <span class="literal">nil</span>, ErrInvalidMachineId</span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> &amp;Snowflake&#123;</span><br><span class="line"></span><br><span class="line">                lastTimestamp: <span class="number">-1</span>,</span><br><span class="line"></span><br><span class="line">                machineId:     config.Cfg.MachineID,</span><br><span class="line"></span><br><span 
class="line">                sequence:      <span class="number">0</span>,</span><br><span class="line"></span><br><span class="line">        &#125;, <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Snowflake)</span></span> NextId() (<span class="type">int64</span>, <span class="type">error</span>) &#123;</span><br><span class="line"></span><br><span class="line">        s.mutex.Lock()</span><br><span class="line"></span><br><span class="line">        <span class="keyword">defer</span> s.mutex.Unlock()</span><br><span class="line"></span><br><span class="line">        timestamp := time.Now().UnixMilli()</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> timestamp &lt; s.lastTimestamp &#123;</span><br><span class="line"></span><br><span class="line">                <span class="keyword">return</span> <span class="number">0</span>, ErrorClockMovedBackwards</span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> timestamp == s.lastTimestamp &#123;</span><br><span class="line"></span><br><span class="line">                s.sequence = (s.sequence + <span class="number">1</span>) &amp; maxSequence</span><br><span class="line"></span><br><span class="line">                <span class="keyword">if</span> s.sequence == <span class="number">0</span> &#123;</span><br><span class="line"></span><br><span class="line">                        <span class="keyword">for</span> timestamp &lt;= s.lastTimestamp &#123;</span><br><span class="line"></span><br><span class="line">                                timestamp = time.Now().UnixMilli()</span><br><span 
class="line"></span><br><span class="line">                        &#125;</span><br><span class="line"></span><br><span class="line">                &#125;</span><br><span class="line"></span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line"></span><br><span class="line">                s.sequence = <span class="number">0</span></span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        s.lastTimestamp = timestamp</span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> (timestamp &lt;&lt; timestampShift) | (s.machineId &lt;&lt; machineIdShift) | s.sequence, <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">const</span> base62Chars = <span class="string">&quot;0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">const</span> Base62Length = <span class="number">62</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Snowflake)</span></span> NextBase62Id() (<span class="type">string</span>, <span class="type">error</span>) &#123;</span><br><span class="line"></span><br><span class="line">        id, err := s.NextId()</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"></span><br><span class="line">                <span class="keyword">return</span> <span class="string">&quot;&quot;</span>, err</span><br><span 
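class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">var</span> buf []<span class="type">byte</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        num := id</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> num &gt; <span class="number">0</span> &#123;</span><br><span class="line"></span><br><span class="line">                index := num % Base62Length</span><br><span class="line"></span><br><span class="line">                buf = <span class="built_in">append</span>(buf, base62Chars[index])</span><br><span class="line"></span><br><span class="line">                num = num / Base62Length</span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> i, j := <span class="number">0</span>, <span class="built_in">len</span>(buf)<span class="number">-1</span>; i &lt; j; i, j = i+<span class="number">1</span>, j<span class="number">-1</span> &#123;</span><br><span class="line"></span><br><span class="line">                buf[i], buf[j] = buf[j], buf[i]</span><br><span class="line"></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> <span class="type">string</span>(buf), <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><h2 id="大流量访问"><a href="#大流量访问" class="headerlink" 
title="大流量访问"></a>High-Traffic Access</h2><p>In a short-link system, requests that visit short links usually far outnumber requests that create them. Querying the database directly on every visit is therefore inefficient and unreasonable. The obvious idea is to put a cache layer in front of the database. Look at it from another angle, though: is this caching scheme really reliable?</p><p>For example, an attacker probing the short-link system for weaknesses may notice that requests for nonexistent short links respond noticeably more slowly than requests for valid ones. From this it is easy to infer that the system suffers from cache penetration: lookups for nonexistent short links always miss the cache and go straight to the database. An attacker can exploit this by sending large batches of requests for nonexistent short links, and can eventually bring the database down.</p><h3 id="怎么快速拦截不存在的短链接"><a href="#怎么快速拦截不存在的短链接" class="headerlink" title="怎么快速拦截不存在的短链接"></a>How to Quickly Block Nonexistent Short Links</h3><p>When it comes to "quick blocking," the first thing that comes to mind is, once again, a cache. The most basic solution is:</p><blockquote><p>When a short link is generated, store its ID directly in the cache (either a set or a string key works). On each visit, check the cache first. If the ID is not there, return an error immediately and refuse to redirect.</p></blockquote><p>This does solve cache penetration. With a huge user base and an enormous number of short links, however, it consumes a large amount of memory, which is clearly impractical.</p><h3 id="布隆过滤器"><a href="#布隆过滤器" class="headerlink" title="布隆过滤器"></a>Bloom Filter</h3><p>A Bloom filter is built on a bitmap (bit array) combined with several hash functions. The core idea is to hash every existing short link ID into the bitmap and set the corresponding bits. When a user visits a short link, the system first checks whether all of that ID's bits are set. If any bit is unset, the ID definitely does not exist, and an error is returned immediately. If all bits are set, the ID probably exists and the request continues on to the database. False positives are possible, but false negatives are not.</p><p>For the detailed implementation principles of Bloom filters, see <a href="https://cloud.tencent.com/developer/article/1688747">here</a> (no point reinventing the wheel, so I'll take a shortcut).</p><p>It is worth noting that Redis 8.0 and later support Bloom filters natively. Developers can add elements to a Bloom filter with the <code>bf.add</code> command and check whether an element exists with <code>bf.exists</code>, which makes development much more convenient.</p>]]></content>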
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;系统背景&quot;&gt;&lt;a href=&quot;#系统背景&quot; class=&quot;headerlink&quot; title=&quot;系统背景&quot;&gt;&lt;/a&gt;System Background&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;This system was inspired by a classic development interview question: design a short-link system. This article dissects it from multiple dimensions and angles</summary>
      
    
    
    
    <category term="运维技术" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/"/>
    
    
  </entry>
  
  <entry>
    <title>HuckOps's 2025 Year in Review</title>
    <link href="http://www.huckops.xyz/2025/12/31/%E5%B9%B4%E5%BA%A6%E6%80%BB%E7%BB%93/2025%E5%B9%B4%E5%BA%A6%E6%80%BB%E7%BB%93/"/>
    <id>http://www.huckops.xyz/2025/12/31/%E5%B9%B4%E5%BA%A6%E6%80%BB%E7%BB%93/2025%E5%B9%B4%E5%BA%A6%E6%80%BB%E7%BB%93/</id>
    <published>2025-12-31T23:59:59.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
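    <content type="html"><![CDATA[<p>emmm, and just like that, 2025 is over. This is my first year-end review. I feel it should be a bit more formal, but I don't know where to start. I'll just write whatever comes to mind. Nobody reads this anyway, so think of it as my rambling about this past year.</p><h1 id="感情的一波三折"><a href="#感情的一波三折" class="headerlink" title="感情的一波三折"></a>A Relationship Full of Twists and Turns</h1><p>It has been more than two years since I met her, and more than a year since we broke up. In the year after the breakup I was barely holding myself together. Even now I still haven't figured out how I became the way I am. Sometimes I can't stand myself. I keep wanting to change, yet it feels as if my hands and feet are tied and I have no strength at all.</p><p>From the start of this year, the moment I opened my eyes each morning I couldn't stop thinking: can I get her back? Would she take me back? Only when I actually tried to contact her and win her back did I realize that our relationship had changed. It no longer felt like two people who liked each other. It felt more like something built on money or material things. So I kept asking myself what caused this change. If those things hadn't happened, would we never have ended up like this? Or would we still be stuck in this distorted relationship? If those things hadn't happened, would I be married by now?</p><p>People who know me know I'm a textbook introvert who keeps things to myself and turns them over and over in my head. Before I knew it, these thoughts had completely taken over my mind, and eventually they affected my work and my life. I felt I had lost the drive I used to have. Even though I liked my job, I couldn't muster any motivation to take it seriously. Later, I found it harder and harder to concentrate. I started thinking about these things uncontrollably, all night long, lying awake night after night. Later still, I gradually began to harm myself. Only when my body sounded the alarm did I realize how badly I had been worn down.</p><p>Honestly, it was the first time in my life I had been to a psychiatric hospital. I had never imagined I would have anything to do with a place like that. That day I went alone to the Guangzhou Brain Hospital (广州脑科医院), and I think the doctor's name was Li Ying (李英). I had never dealt with a psychiatrist before. I assumed doctors were all the same: they care about the illness, not about how you are doing. But as soon as I walked into the consulting room, the doctor looked at me and, before I could say a word, asked: "Why did you come alone?" I froze. It had been a long time since anyone had asked me something like that. Yes, I seem strong: I carried something this serious on my own for so long, and I could still calmly go to a psychiatric hospital by myself. But I also seem fragile, because all this mess finally dragged me down.</p><p>In the end, it was just someone in the dark instinctively trying to survive.</p><p>As expected, the tests and the diagnosis showed depression: moderate depression, mild anxiety, and mild obsessive-compulsive symptoms. Heh, just as I thought.</p><p><img src="https://s3.huckops.xyz/1780764613387.jpg" alt="1780764613387.jpg"></p><p>Maybe this is karma I brought on myself.</p><h1 id="佛陀说"><a href="#佛陀说" class="headerlink" title="佛陀说"></a>The Buddha Said</h1><p>The moment I walked out of the hospital, I understood that some things had to change.</p><p>From then on, I left work on time almost every day, went back to my tiny rented room, and meditated. With recitations of the Diamond Sutra or the Heart Sutra playing, I silently recited along. The Buddha said: "All that has form is illusory." He also said: "Śāriputra, all dharmas are marked by emptiness: they neither arise nor cease, are neither defiled nor pure, neither increase nor decrease." Everything I went through may be just a karmic obstruction that will eventually return to emptiness. It is nothing more than a gathering and a parting, a phenomenon of conditions arising and passing away. Why let these illusions trap me?</p><p>I remember once scrolling through Douyin and seeing a Rinpoche being asked by a student during a talk: "Did you come into this world by vow, or by karma?" That question made me think for a long time. If I see all these experiences as creating karma, then my karma clearly outweighs my vows. But Ksitigarbha Bodhisattva made the great vow "Until the hells are empty, I will not become a Buddha." Taking the vow as his mind and karma as his practice, he kept on saving all beings. In the same way, I must not let a passing karmic obstruction obscure the power of the vow within me. Only by letting vows guide action and action fulfill vows can I gradually uncover my own tathāgatagarbha and move toward liberation.</p><p>So I began slowly adjusting myself and trying to move in a better direction.</p><h1 id="凡所有相，皆是虚妄"><a href="#凡所有相，皆是虚妄" class="headerlink" 
title="凡所有相，皆是虚妄"></a>All That Has Form Is Illusory</h1><p>One day in August, my manager suddenly asked me to come to the meeting room to "talk about something." When he came to get me, I had already guessed what it was about.</p><p>Sure enough, it was my performance again. At first he said there was a chance to transfer me to another department. After I agreed, an internal interview was quickly arranged, and it went smoothly. But the other department then froze hiring because of a business reorganization, so I was put on the layoff list.</p><p>I had been expecting this day for a long time. What I couldn't accept were the reasons he gave. Here is the performance feedback for the first three quarters of this year:</p><table><thead><tr><th>Quarter</th><th>Rating</th><th>Reason</th></tr></thead><tbody><tr><td>2025Q1</td><td>B</td><td>Missed 2 alert calls; both were routine alerts for non-core projects</td></tr><tr><td>2025Q2</td><td>B</td><td>My manager said: "You're too introverted, and it's affecting your work."</td></tr><tr><td>2025Q3</td><td>B</td><td>My manager said: "Your abilities don't match the role."</td></tr></tbody></table><p>Don't these reasons sound a little ridiculous?</p><p>Here is what I want to say. First, I am indeed introverted. But if it really affected my work, why did every project-side lead I worked with try to keep me, or even try to hire me away, when I announced I was leaving? Why did I get along so well with so many developers on the business side? Second, suppose my abilities really didn't match the role. Then why did the projects I owned on my own in four years at NetEase (网易) never have a single human-caused incident or business complaint? They were not core projects, but they regularly earned recognition and praise from the project side. And why was no task I handled ever delayed or left undelivered?</p><p>I doubt my manager could answer a single one of these questions. I know that for at least the last six months before I left, he never read my weekly reports and had no idea what I was doing on my projects.</p><p>In the end, it comes down to this: the company wants to "boost revenue and cut costs," my strengths couldn't be put to their best use here, and I wasn't one of his own people. If he laid me off and brought in someone of his own, his "rule" would become even more secure. Those reasons were just excuses for the layoff, all of them illusory.</p><p>Still, as I see it, this manager will probably find it hard to climb any higher. He may soon face his own "35-year-old crisis," because his management skills are truly worrying. In my four years there, the team laid off three people. I was effectively the team's main code contributor, and the other two were the owners of core projects. We all had one thing in common: none of us were his own people.</p><p>There's no need to say more. From all of this, you can probably tell what kind of person my manager is.</p><h1 id="处处不留爷，爷去当八路"><a href="#处处不留爷，爷去当八路" class="headerlink" title="处处不留爷，爷去当八路"></a>If No One Wants Me Here, I'll Go My Own Way</h1><p>After two stints at big tech companies, I no longer feel much attachment to the internet industry. Before I left, I taught myself Web3 development at my desk, and after leaving I tried a few small projects. But I found this path genuinely hard in China. I'm based in China and can't work abroad in the short term, and the Web3 field faces many compliance issues, so I had to shelve it for now.</p><p>Later my younger sister told me she planned to start her own K12 tutoring center and wanted me to join as a teacher. I agreed right away. Maybe this is a new way out that can help me leave behind the psychological shadow of the internet industry. So for the last two months of this year, I buried myself in reviewing high school chemistry to prepare for the winter break classes.</p><p>I hope I can keep walking steadily down this path.</p><h1 id="有缘再见"><a href="#有缘再见" class="headerlink" title="有缘再见"></a>Until We Meet Again</h1><p>In four years in Guangzhou, I only got to know these few friends, and I may never see them again.</p><p>This photo is from two years ago. My eyes look lost.<br><img src="https://s3.huckops.xyz/1780764747284.jpg" alt="1780764747284.jpg"></p><p>This photo was taken on the evening of the day I had my exit talk. There's light in my eyes again.<br><img src="https://s3.huckops.xyz/1780764825863.jpg" alt="1780764825863.jpg"></p><p>And here are my favorite NetEase veteran cats. It seems the old white cats have retired, and now their kids have taken over.<br><img src="https://s3.huckops.xyz/1780764851627.png" alt="1780764851627.png"></p><p>And Hong Kong, my favorite city. If I ever get the chance, I will definitely explore it properly.<br><img src="https://s3.huckops.xyz/1780764678265.jpg" alt="1780764678265.jpg"><br><img 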
src="https://s3.huckops.xyz/1780764900166.jpg" alt="1780764900166.jpg"></p>]]></content>
    
    
      
      
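    <summary type="html">&lt;p&gt;emmm, and just like that, 2025 is over. This is my first year-end review. I feel it should be a bit more formal, but I don't know where to start. I'll just write whatever comes to mind. Nobody reads this anyway, so think of it as my rambling about this past year.&lt;/p&gt;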
&lt;h1 id=&quot;感情的一波三折&quot;&gt;&lt;a href=&quot;#感情的一波三折&quot; class=&quot;headerlink&quot; tit</summary>
      
    
    
    
    <category term="年度总结" scheme="http://www.huckops.xyz/categories/%E5%B9%B4%E5%BA%A6%E6%80%BB%E7%BB%93/"/>
    
    
  </entry>
  
  <entry>
    <title>Interpreting Cloudflare's 20251118 Global Outage Report</title>
    <link href="http://www.huckops.xyz/2025/11/20/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/%E8%A7%A3%E8%AF%BBCloudflare%2020251118%E5%85%A8%E7%90%83%E6%95%85%E9%9A%9C%E6%8A%A5%E5%91%8A/"/>
    <id>http://www.huckops.xyz/2025/11/20/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/%E8%A7%A3%E8%AF%BBCloudflare%2020251118%E5%85%A8%E7%90%83%E6%95%85%E9%9A%9C%E6%8A%A5%E5%91%8A/</id>
    <published>2025-11-20T15:53:34.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
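    <content type="html"><![CDATA[<h1 id="Cloudflare-是什么"><a href="#Cloudflare-是什么" class="headerlink" title="Cloudflare 是什么"></a>What Is Cloudflare</h1><p>For ops folks, Cloudflare is hardly an unfamiliar name. We often call it the "cyber living Buddha" (赛博活佛) because it offers free global proxying and similar features. On other clouds you have to buy a global CDN, which is far from cheap. On Cloudflare, you just host your domain on the platform and get global acceleration for free. As the meme that has been circulating online recently shows:</p><p><img src="https://s3.huckops.xyz/1780765022682.png" alt="1780765022682.png"></p>
<p>Cloudflare has become one of the most fundamental cornerstones of the internet; you could say it holds up the whole thing. Outside China, Cloudflare offers many cloud services, such as DNS (authoritative DNS hosting plus the public one.one.one.one resolver), a global CDN, DDoS protection, and SSL/TLS encryption. Most of the big overseas companies we can name use its services, for example Netflix, Spotify, and Twitter, so Cloudflare plays a pivotal role outside China. For compliance reasons, however, Cloudflare's business in China is run through a local company called 科赋锐. The name is so similar that it is presumably a wholly owned subsidiary set up in China for compliance purposes. Once traffic is served through this partner, the advantages and features Cloudflare offers overseas are no longer available. As a result, Cloudflare is far less popular in China than Tencent Cloud or Alibaba Cloud.</p>
<h1 id="Cloudflare-20251118-全球故障"><a href="#Cloudflare-20251118-全球故障" class="headerlink" title="Cloudflare 20251118 全球故障"></a>Cloudflare 20251118 Global Outage</h1><p>At 11:20 UTC on November 18, 2025 (19:20 China time), many domains hosted on Cloudflare around the world started failing, including Coinbase, Twitter, Reddit, and GitHub. All of them returned HTTP 500 errors.</p><p><img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ony9XsTIteX8DNEFJDddJ/7da2edd5abca755e9088002a0f5d1758/BLOG-3079_2.png"></p><p>Every affected site showed an error page like the one above. The traffic paths of all these sites had one thing in common: each passed through a Cloudflare proxy layer and displayed a Cloudflare error. It was therefore clear that Cloudflare itself had gone down. Cloudflare has since published an incident report: <a href="https://blog.cloudflare.com/zh-cn/18-november-2025-outage/">Cloudflare service outage on November 18, 2025</a></p><p>The blast radius of this outage was enormous. Before reading the report, let's analyze it from ops experience. The possible causes boil down to the following:</p>
<h2 id="DDoS"><a href="#DDoS" class="headerlink" title="DDoS"></a>DDoS</h2><p>Cloudflare's proxy can be thought of as the combination of a globally accelerated dynamic CDN and an ALB with high-quality IPs. Since it is a CDN service, it necessarily has edge nodes deployed around the world and a full set of anycast policies. Under a DDoS attack, then, an attack would at most take down a single region rather than affect users worldwide. That holds as long as two things are true: the anycast policy has not failed, and no bug is spreading traffic to every node. Anycast is implemented through routing policy, and when it breaks, packets are simply dropped, so spreading all traffic to every node is very unlikely. Moreover, consider Cloudflare's scale and the WAF layer it already runs in front. Even if the attack traffic got through to customers, I would expect the customers' origin servers to fall over first, not Cloudflare.</p><p>There is, of course, one special possibility: attackers launching a massive attack from many points around the globe at once. This is practically impossible. Given Cloudflare's current scale and its existing WAF layer, such an attack would be extremely difficult and extremely expensive. Anyone who still wants to try must really hate Cloudflare; I can only give them a thumbs-up. So this explanation can basically be ruled out.</p>
<h2 id="代理服务-BUG"><a href="#代理服务-BUG" class="headerlink" title="代理服务 BUG"></a>Proxy Service Bug</h2><p>As we know, Cloudflare's proxy does not just provide basic proxy capabilities. It also extends them with many plugins through an OpenResty-like mechanism to implement rich traffic-processing logic. Its proxy service therefore goes well beyond a proxy in the conventional sense.</p><p>Look back at the symptoms on the day: every hosted site went down, and there were no cases of some requests still succeeding. We can therefore infer that if a plugin bug was the cause, it was most likely in a plugin that must process every request. A failure there would block all requests at that stage, so they were dropped or answered with an error instead of being forwarded to the backend.</p><p>Seen this way, the theory holds. If either of the following conditions is met, it is basically confirmed:</p><ol><li>A system-wide change was made shortly before the outage, including but not limited to plugin upgrades and configuration changes.</li><li>Traffic fluctuated sharply at the time of the outage, exhausting the proxy system's resources.</li></ol>
<h2 id="cloudflare-内部网络故障"><a href="#cloudflare-内部网络故障" class="headerlink" title="cloudflare 内部网络故障"></a>Cloudflare Internal Network Failure</h2><p>This possibility is one that fewer people consider. Since Cloudflare's proxy is a global dynamic CDN, it works roughly like this:</p><ol><li>User requests enter through an anycast entry point (ingress).</li><li>Edge nodes are interconnected through tunnels, dedicated lines, and similar means.</li><li>Traffic follows these routes to reach, as fast as possible, the egress gateway closest to the origin, and is sent out from there.</li><li>Return traffic simply follows the same path in reverse. GCP's global load balancer works on the same underlying principle.</li></ol><p>So if the internal SDN or the dedicated lines failed, the whole proxy system could still collapse.</p><p>This theory also fits this outage.</p>
<h1 id="放下经验，回归报告"><a href="#放下经验，回归报告" class="headerlink" title="放下经验，回归报告"></a>Setting Experience Aside: Back to the Report</h1><p>The report is long, so I'll only pick out the points I consider key and analyze those.</p><h2 id="开门见山"><a href="#开门见山" class="headerlink" title="开门见山"></a>Straight to the Point</h2><blockquote><p>The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to the permissions of one of our database systems, which caused the database to output multiple entries into a "feature file" used by Cloudflare's Bot Management system. As a result, the feature file doubled in size. This larger-than-expected feature file was then propagated to all the machines that make up Cloudflare's network.<br>The explanation was that the file is generated every five minutes by a query running on a ClickHouse database cluster, which was being gradually updated to improve permissions management. Bad data was only generated when the query ran on a part of the cluster that had already been updated. As a result, every five minutes either a good or a bad set of configuration files could be generated and rapidly propagated across Cloudflare's network.</p></blockquote>
<p>The report gets straight to the point and states the root cause: a bug introduced by a change, here a database permission change. Some may ask: didn't QA catch this bug before it went live?</p><p>In fact, a bug like this may go unnoticed even if it passed QA before release and the rollout strictly followed a canary process. The reasons are as follows (drawn from experience, and not necessarily specific to this outage):</p><ol><li><p>For minor version iterations, teams usually do not run load tests, because they take time and are very expensive. Testing also has its own blind spots, which leads to:</p><ul><li>Insufficient test traffic, so the service cannot clearly show the relationship between QPS and resource consumption.</li></ul></li><li><p>Test coverage has limits. Test and production data are inevitably inconsistent, and some regression environments even have messy data. As a result, test cases may cover only normal cases and miss edge cases.</p></li><li><p>A canary release will not necessarily catch the problem in time either; that depends on the release strategy. If the first stage receives too little traffic, you get the same insufficient-traffic problem as above. So my guess is that the new version first went to a small canary slice and no anomaly showed up there. It was then rolled out to everyone, and everything blew up.</p></li></ol><p>I ran into a similar problem when I worked on cloud gaming. A backend release introduced a faulty query that generated an enormous query volume. That dragged down the database and backend servers and made the service unavailable. QA cannot catch this kind of problem. (That incident's root cause was similar to this one, but the mechanism was different; let's keep reading the report.)</p>
<h2 id="问题分析"><a href="#问题分析" class="headerlink" title="问题分析"></a>Problem Analysis</h2><p><img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6qlWXM3gh4SaYYvsGc7mFV/99294b22963bb414435044323aed7706/BLOG-3079_4.png"></p><p>I already described this proxy architecture above, so let's go straight to the core content.</p><blockquote><p>Cloudflare's Bot Management module includes several systems, one of which is a machine learning model that we use to generate bot scores for every request flowing through the Cloudflare network. Our customers use bot scores to control which bots are allowed or not allowed to access their sites.<br>A change in the behavior of the underlying ClickHouse query (explained below) caused a large number of duplicate "feature" rows to appear when the feature file was generated. This changed the size of the previously fixed-size feature configuration file, causing the bots module to trigger an error.</p></blockquote><p>This is a classic case of a middleware failure bringing down the entire system.</p>
<p>The architecture diagram shows that traffic flows from client to origin through a chain, and each submodule within the FL module is a separate component. In theory, a good architecture keeps the submodules of a large module as independent as possible. When one fails, it can then be degraded on its own to keep the main business flow stable. Judging from this outage, however, Cloudflare did not have a good degradation strategy here. (In fairness, this can't entirely be blamed on Cloudflare. Had they degraded the traffic-scrubbing part during the outage, attack traffic would have poured straight through to customers, and customers would have been furious.)</p>
<p>Based on the analysis above, we can summarize several key principles for ops architecture design:</p><ol><li><p><strong>Isolation of hard dependencies</strong>: at the design stage, make sure the subsystems within core modules are highly independent. When one component fails, a degradation mechanism should keep the main business flow running, even at the cost of some non-core features. In essence, the business flow needs the ability to bypass a failed node.</p></li><li><p><strong>Scenario-specific degradation</strong>: the principle above does not apply everywhere. Cloudflare's traffic scrubbing in this incident is one exception. An e-commerce system is another. If its order and finance system fails, forcibly degrading that system to keep transactions flowing could leave order data inconsistent or lost. Outside peak hours, simply pausing the affected business flow for maintenance causes less overall impact and loss than the risk of corrupted data.</p></li></ol>
<p>The details of the data query are long and complicated, so I had AI summarize them:</p><blockquote><p>Permission change causes data anomaly: Cloudflare wanted to improve the security of distributed ClickHouse queries and enable finer-grained permission management. To do so, it changed the database configuration so that users could explicitly access table metadata in the r0 database, which stores the underlying shard data. Previously, only the default database, where the Distributed tables live, was visible.<br>Query lacks a database filter: the Bot Management system generates the feature file with a table-metadata query (for example, against the system.columns table). That query did not filter by database name. After the permission change, it returned columns for same-named tables in both the default and r0 databases, producing a large number of duplicate feature rows.<br>Duplicate data triggers a chain of failures: the duplicate entries doubled the size of the feature file and exceeded the preset size limit in the core proxy software. This crashed the routing software and caused a network-wide outage.</p></blockquote>
<p>I'm not very familiar with ClickHouse either. So let's approach it as someone who doesn't know ClickHouse and see how many pitfalls they stepped into:</p><ol><li><p>Database account permissions must always be defined explicitly, with a deny rule as a fallback. This keeps permissions from leaking through and allowing access to things that should not be touched.</p></li><li><p>Data changes, especially permission changes, can easily break production if you are careless. They should be fully simulated and tested in a regression environment to make sure the change is sound.</p></li><li><p>Database queries should filter explicitly wherever possible, to avoid imprecise results.</p></li><li><p>Code robustness: crashing outright as soon as the query returns bad data is clearly not graceful. I think a cache could be added here. Model data changes rarely anyway, so caching would not affect the business much. A node-local cache such as memcached is preferable, with each node storing only its own data, so that a bad cache entry cannot bring down the whole cluster. When the program clearly detects that the queried data is abnormal, it falls back by default to the last valid data it fetched.</p></li></ol>
<h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h1><p>Looking back at this incident report, the main risk was that a permission change introduced a bug that cascaded into a business-wide collapse. The service also lacked a good fallback strategy. Overall there are two lessons:</p><ul><li>In operations, strictly follow the established SOP. Test every change rigorously in a test or regression environment before applying it, and watch the business closely afterwards. If anything looks abnormal, degrade the service or roll back the change immediately.</li><li>In development, isolate the dependencies of critical services. Put solid disaster-recovery, degradation, and fallback strategies in place for service failures.</li></ul><p>Here's hoping outages this big happen less often in the future.</p>]]></content>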
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;Cloudflare-是什么&quot;&gt;&lt;a href=&quot;#Cloudflare-是什么&quot; class=&quot;headerlink&quot; title=&quot;Cloudflare 是什么&quot;&gt;&lt;/a&gt;What Is Cloudflare&lt;/h1&gt;&lt;p&gt;For ops folks, Cloudflare is not</summary>
      
    
    
    
    <category term="运维技术" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/"/>
    
    
  </entry>
  
  <entry>
    <title>Analyzing How Blockchain Smart Contracts Work Through ERC20 USDT</title>
    <link href="http://www.huckops.xyz/2025/11/15/Web3.0/%E4%BB%8EERC20%20USDT%E5%88%86%E6%9E%90%E5%90%88%E7%BA%A6%E5%8E%9F%E7%90%86/"/>
    <id>http://www.huckops.xyz/2025/11/15/Web3.0/%E4%BB%8EERC20%20USDT%E5%88%86%E6%9E%90%E5%90%88%E7%BA%A6%E5%8E%9F%E7%90%86/</id>
    <published>2025-11-15T10:00:00.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
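    <content type="html"><![CDATA[<h1 id="加密货币和稳定币"><a href="#加密货币和稳定币" class="headerlink" title="加密货币和稳定币"></a>Cryptocurrencies and Stablecoins</h1><p>Cryptocurrency, often also called a crypto asset, refers to digital assets that run on blockchain technology, such as Bitcoin and Ethereum. Other kinds of crypto assets, such as NFTs (non-fungible tokens), have also grown out of them. These currencies have no fixed anchor in the real world, and their value is determined by market supply and demand. Because of factors such as block difficulty adjustments and market liquidity, their prices can fluctuate significantly. Stablecoins emerged to address this problem.</p><p>A stablecoin is a cryptocurrency that keeps its value relatively stable through a specific mechanism, usually by pegging to a fiat currency such as the US dollar or the euro. Common stablecoins on the market include USDT (Tether) and USDC (USD Coin). By issuance mechanism, stablecoins fall into three main types: fiat-collateralized, crypto-collateralized, and algorithmic. USDT, for example, is a fiat-collateralized stablecoin issued by Tether. In theory, for every 1 USDT issued, Tether should hold reserve assets (including cash, bonds, and so on) worth 1 US dollar, which forms a 1:1 peg. Users can therefore, in theory, redeem 1 USDT for 1 US dollar at any time.</p>
<h1 id="多链问题解决"><a href="#多链问题解决" class="headerlink" title="多链问题解决"></a>Solving the Multi-Chain Problem</h1><p>Anyone familiar with blockchains knows there are many mainstream public chains today, and a digital asset can be issued on one chain or on several. USDT (Tether), for example, is issued on multiple chains, including Ethereum, Tron, and Polkadot. Platform tokens such as OKB, by contrast, are typically issued on far fewer chains, mostly within their issuer's own ecosystem (such as OKChain).</p><p>Issuing a token on different chains essentially means deploying a smart contract on each of those chains. For example, USDT's contract address on Tron is <a href="https://tronscan.org/#/token20/TR7NHqjeKQxGTCi8q8ZY4pL8otSzgjLj6t">TR7NHqjeKQxGTCi8q8ZY4pL8otSzgjLj6t</a>, and on Ethereum it is <a href="https://etherscan.io/address/0xdAC17F958D2ee523a2206206994597C13D831ec7">0xdAC17F958D2ee523a2206206994597C13D831ec7</a>. As these addresses show, USDT's contract address differs from chain to chain. Each chain has its own address space (and here even its own address format), so a contract address is only unique within its own chain.</p>
<p>Of course, whichever chain you buy a digital asset on, same-named assets on all chains are in theory equivalent. To move assets between chains, users usually need a cross-chain bridge. A cross-chain bridge is a special contract for transferring assets and interacting across chains, and it charges a cross-chain fee for each transfer.</p><p>Some people may also wonder about deposits on exchanges. When you deposit USDT, you have to pick a chain, yet afterwards the balance does not show which chain the funds belong to. The reason is that exchanges use internal bookkeeping. The user only chooses the chain when depositing, and once the exchange receives the funds, it credits the corresponding amount to the user's account. In essence, crypto deposited to an exchange is held in the exchange's custody, and the actual tokens are not in the user's hands. The balance at that point is just a number in a database. When withdrawing, the user can pick any supported chain. You can think of the exchange as giving the user's assets a painless cross-chain conversion.</p>
<h1 id="智能合约原理分析"><a href="#智能合约原理分析" class="headerlink" title="智能合约原理分析"></a>How Smart Contracts Work</h1><p>From the perspective of how blockchains work, a token such as USDT is essentially a smart contract. A smart contract is a program that runs on a blockchain's virtual machine, which is the EVM on Ethereum and EVM-compatible chains. EVM contracts are usually written in Solidity, while other platforms use languages such as Rust or Go. Here we take the ERC20 USDT contract as an example and analyze how it works.</p>
<h2 id="名词和术语"><a href="#名词和术语" class="headerlink" title="名词和术语"></a>Terms and Concepts</h2><p>A blockchain can be compared to an ordinary business application, except that blockchain business logic is decentralized and nodes reach agreement through a consensus mechanism. Users usually interact with a blockchain by calling contracts through RPC interfaces, much like the API endpoints of an ordinary application. A contract usually exposes several interface functions, and the description of these functions is called the ABI (Application Binary Interface). Each function has a function signature that uniquely identifies it; the first 4 bytes of the signature's Keccak-256 hash form the function selector. A contract's state variables, such as balances and allowances, are stored in the contract's storage. Function parameters and local variables usually live in memory.</p>
<h2 id="合约标准"><a href="#合约标准" class="headerlink" title="合约标准"></a>Contract Standards</h2><p>Just like ordinary business applications, smart contracts follow interface conventions, similar in spirit to RESTful API conventions. Examples include:</p><h3 id="ERC20-标准"><a href="#ERC20-标准" class="headerlink" title="ERC20 标准"></a>ERC20 Standard</h3><p>ERC20 is the most common contract standard on Ethereum. It defines the following functions:</p><ul><li>Account balance (balanceOf())</li><li>Transfer (transfer())</li><li>Transfer on behalf of another account (transferFrom())</li><li>Approval (approve())</li><li>Total token supply (totalSupply())</li><li>Approved transfer allowance (allowance())</li><li>Token information (optional): name (name()), symbol (symbol()), decimals (decimals())</li></ul>
<p>These correspond to an interface, IERC20, defined as follows:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">interface IERC20 &#123;</span><br><span class="line">    function totalSupply() external view returns (uint256);</span><br><span class="line">    function balanceOf(address account) external view returns (uint256);</span><br><span class="line">    function transfer(address recipient, uint256 amount) external returns (bool);</span><br><span class="line">    function allowance(address owner, address spender) external view returns (uint256);</span><br><span class="line">    function approve(address spender, uint256 amount) external returns (bool);</span><br><span class="line">    function transferFrom(address sender, address recipient, uint256 amount) external returns (bool);</span><br><span class="line"></span><br><span class="line">    event Transfer(address indexed from, address indexed to, uint256 value);</span><br><span class="line">    event Approval(address indexed owner, address indexed spender, uint256 value);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<h3 id="ERC721-标准"><a href="#ERC721-标准" class="headerlink" title="ERC721 标准"></a>ERC721 Standard</h3><p>ERC721 is the most common contract standard for non-fungible tokens (NFTs) on Ethereum. It corresponds to an interface, IERC721, defined as follows:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line">interface IERC721 is IERC165 &#123;</span><br><span class="line">    event Transfer(address indexed from, address indexed to, uint256 indexed tokenId);</span><br><span class="line">    event Approval(address indexed owner, address indexed approved, uint256 indexed tokenId);</span><br><span class="line">    event ApprovalForAll(address indexed owner, address indexed operator, bool approved);</span><br><span 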
class="line"></span><br><span class="line">    function balanceOf(address owner) external view returns (uint256 balance);</span><br><span class="line"></span><br><span class="line">    function ownerOf(uint256 tokenId) external view returns (address owner);</span><br><span class="line"></span><br><span class="line">    function safeTransferFrom(</span><br><span class="line">        address from,</span><br><span class="line">        address to,</span><br><span class="line">        uint256 tokenId,</span><br><span class="line">        bytes calldata data</span><br><span class="line">    ) external;</span><br><span class="line"></span><br><span class="line">    function safeTransferFrom(</span><br><span class="line">        address from,</span><br><span class="line">        address to,</span><br><span class="line">        uint256 tokenId</span><br><span class="line">    ) external;</span><br><span class="line"></span><br><span class="line">    function transferFrom(</span><br><span class="line">        address from,</span><br><span class="line">        address to,</span><br><span class="line">        uint256 tokenId</span><br><span class="line">    ) external;</span><br><span class="line"></span><br><span class="line">    function approve(address to, uint256 tokenId) external;</span><br><span class="line"></span><br><span class="line">    function setApprovalForAll(address operator, bool _approved) external;</span><br><span class="line"></span><br><span class="line">    function getApproved(uint256 tokenId) external view returns (address operator);</span><br><span class="line"></span><br><span class="line">    function isApprovedForAll(address owner, address operator) external view returns (bool);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Comparing it with the ERC20 standard, you can see that ERC721 adds a tokenId parameter, which stems from the special nature of NFTs. ERC20 tokens are all fungible, meaning every token is identical. Think of a 1-dollar bill: however many bills you have, their design and value are the same. ERC721 tokens are different, because each token carries its own metadata. Commemorative silver coins are a good analogy. The token contract corresponds to a whole series of coins, and each token the contract mints corresponds to an individual coin, each one unique.</p>
<h2 id="USDT-合约分析"><a href="#USDT-合约分析" class="headerlink" title="USDT 合约分析"></a>USDT Contract Analysis</h2><p>With the basics above in place, we can start reading the USDT contract code.</p><h3 id="代币主合约"><a href="#代币主合约" class="headerlink" title="代币主合约"></a>Main Token Contract</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span 
class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span 
class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span 
class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br><span 
class="line">234</span><br><span class="line">235</span><br><span class="line">236</span><br><span class="line">237</span><br><span class="line">238</span><br><span class="line">239</span><br><span class="line">240</span><br><span class="line">241</span><br><span class="line">242</span><br><span class="line">243</span><br><span class="line">244</span><br><span class="line">245</span><br><span class="line">246</span><br><span class="line">247</span><br><span class="line">248</span><br><span class="line">249</span><br><span class="line">250</span><br><span class="line">251</span><br><span class="line">252</span><br><span class="line">253</span><br><span class="line">254</span><br><span class="line">255</span><br><span class="line">256</span><br><span class="line">257</span><br><span class="line">258</span><br><span class="line">259</span><br><span class="line">260</span><br><span class="line">261</span><br><span class="line">262</span><br><span class="line">263</span><br><span class="line">264</span><br><span class="line">265</span><br><span class="line">266</span><br><span class="line">267</span><br><span class="line">268</span><br><span class="line">269</span><br><span class="line">270</span><br><span class="line">271</span><br><span class="line">272</span><br><span class="line">273</span><br><span class="line">274</span><br><span class="line">275</span><br><span class="line">276</span><br><span class="line">277</span><br><span class="line">278</span><br><span class="line">279</span><br><span class="line">280</span><br><span class="line">281</span><br><span class="line">282</span><br><span class="line">283</span><br><span class="line">284</span><br><span class="line">285</span><br><span class="line">286</span><br><span class="line">287</span><br><span class="line">288</span><br><span class="line">289</span><br><span class="line">290</span><br><span class="line">291</span><br><span class="line">292</span><br><span class="line">293</span><br><span 
class="line">294</span><br><span class="line">295</span><br><span class="line">296</span><br><span class="line">297</span><br><span class="line">298</span><br><span class="line">299</span><br><span class="line">300</span><br><span class="line">301</span><br><span class="line">302</span><br><span class="line">303</span><br><span class="line">304</span><br><span class="line">305</span><br><span class="line">306</span><br><span class="line">307</span><br><span class="line">308</span><br><span class="line">309</span><br><span class="line">310</span><br><span class="line">311</span><br><span class="line">312</span><br><span class="line">313</span><br><span class="line">314</span><br><span class="line">315</span><br><span class="line">316</span><br><span class="line">317</span><br><span class="line">318</span><br><span class="line">319</span><br><span class="line">320</span><br><span class="line">321</span><br><span class="line">322</span><br><span class="line">323</span><br><span class="line">324</span><br><span class="line">325</span><br><span class="line">326</span><br><span class="line">327</span><br><span class="line">328</span><br><span class="line">329</span><br><span class="line">330</span><br><span class="line">331</span><br><span class="line">332</span><br><span class="line">333</span><br><span class="line">334</span><br><span class="line">335</span><br><span class="line">336</span><br><span class="line">337</span><br><span class="line">338</span><br><span class="line">339</span><br><span class="line">340</span><br><span class="line">341</span><br><span class="line">342</span><br><span class="line">343</span><br><span class="line">344</span><br><span class="line">345</span><br><span class="line">346</span><br><span class="line">347</span><br><span class="line">348</span><br><span class="line">349</span><br><span class="line">350</span><br><span class="line">351</span><br><span class="line">352</span><br><span class="line">353</span><br><span 
class="line">354</span><br><span class="line">355</span><br><span class="line">356</span><br><span class="line">357</span><br><span class="line">358</span><br><span class="line">359</span><br><span class="line">360</span><br><span class="line">361</span><br><span class="line">362</span><br><span class="line">363</span><br><span class="line">364</span><br><span class="line">365</span><br><span class="line">366</span><br><span class="line">367</span><br><span class="line">368</span><br><span class="line">369</span><br><span class="line">370</span><br><span class="line">371</span><br><span class="line">372</span><br><span class="line">373</span><br><span class="line">374</span><br><span class="line">375</span><br><span class="line">376</span><br><span class="line">377</span><br><span class="line">378</span><br><span class="line">379</span><br><span class="line">380</span><br><span class="line">381</span><br><span class="line">382</span><br><span class="line">383</span><br><span class="line">384</span><br><span class="line">385</span><br><span class="line">386</span><br><span class="line">387</span><br><span class="line">388</span><br><span class="line">389</span><br><span class="line">390</span><br><span class="line">391</span><br><span class="line">392</span><br><span class="line">393</span><br><span class="line">394</span><br><span class="line">395</span><br><span class="line">396</span><br><span class="line">397</span><br><span class="line">398</span><br><span class="line">399</span><br><span class="line">400</span><br><span class="line">401</span><br><span class="line">402</span><br><span class="line">403</span><br><span class="line">404</span><br><span class="line">405</span><br><span class="line">406</span><br><span class="line">407</span><br><span class="line">408</span><br><span class="line">409</span><br><span class="line">410</span><br><span class="line">411</span><br><span class="line">412</span><br><span class="line">413</span><br><span 
class="line">414</span><br><span class="line">415</span><br><span class="line">416</span><br><span class="line">417</span><br><span class="line">418</span><br><span class="line">419</span><br><span class="line">420</span><br><span class="line">421</span><br><span class="line">422</span><br><span class="line">423</span><br><span class="line">424</span><br><span class="line">425</span><br><span class="line">426</span><br><span class="line">427</span><br><span class="line">428</span><br><span class="line">429</span><br><span class="line">430</span><br><span class="line">431</span><br><span class="line">432</span><br><span class="line">433</span><br><span class="line">434</span><br><span class="line">435</span><br><span class="line">436</span><br><span class="line">437</span><br><span class="line">438</span><br><span class="line">439</span><br><span class="line">440</span><br><span class="line">441</span><br><span class="line">442</span><br><span class="line">443</span><br><span class="line">444</span><br><span class="line">445</span><br><span class="line">446</span><br><span class="line">447</span><br></pre></td><td class="code"><pre><span class="line">pragma solidity ^0.4.17;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title SafeMath</span><br><span class="line"> * @dev Math operations with safety checks that throw on error</span><br><span class="line"> */</span><br><span class="line">library SafeMath &#123;</span><br><span class="line">    function mul(uint256 a, uint256 b) internal pure returns (uint256) &#123;</span><br><span class="line">        if (a == 0) &#123;</span><br><span class="line">            return 0;</span><br><span class="line">        &#125;</span><br><span class="line">        uint256 c = a * b;</span><br><span class="line">        assert(c / a == b);</span><br><span class="line">        return c;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line"> 
   function div(uint256 a, uint256 b) internal pure returns (uint256) &#123;</span><br><span class="line">        // assert(b &gt; 0); // Solidity automatically throws when dividing by 0</span><br><span class="line">        uint256 c = a / b;</span><br><span class="line">        // assert(a == b * c + a % b); // There is no case in which this doesn&#x27;t hold</span><br><span class="line">        return c;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    function sub(uint256 a, uint256 b) internal pure returns (uint256) &#123;</span><br><span class="line">        assert(b &lt;= a);</span><br><span class="line">        return a - b;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    function add(uint256 a, uint256 b) internal pure returns (uint256) &#123;</span><br><span class="line">        uint256 c = a + b;</span><br><span class="line">        assert(c &gt;= a);</span><br><span class="line">        return c;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title Ownable</span><br><span class="line"> * @dev The Ownable contract has an owner address, and provides basic authorization control</span><br><span class="line"> * functions, this simplifies the implementation of &quot;user permissions&quot;.</span><br><span class="line"> */</span><br><span class="line">contract Ownable &#123;</span><br><span class="line">    address public owner;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">      * @dev The Ownable constructor sets the original `owner` of the contract to the sender</span><br><span class="line">      * account.</span><br><span class="line">      */</span><br><span class="line">    function Ownable() public &#123;</span><br><span class="line">        owner = 
msg.sender;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">      * @dev Throws if called by any account other than the owner.</span><br><span class="line">      */</span><br><span class="line">    modifier onlyOwner() &#123;</span><br><span class="line">        require(msg.sender == owner);</span><br><span class="line">        _;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Allows the current owner to transfer control of the contract to a newOwner.</span><br><span class="line">    * @param newOwner The address to transfer ownership to.</span><br><span class="line">    */</span><br><span class="line">    function transferOwnership(address newOwner) public onlyOwner &#123;</span><br><span class="line">        if (newOwner != address(0)) &#123;</span><br><span class="line">            owner = newOwner;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title ERC20Basic</span><br><span class="line"> * @dev Simpler version of ERC20 interface</span><br><span class="line"> * @dev see https://github.com/ethereum/EIPs/issues/20</span><br><span class="line"> */</span><br><span class="line">contract ERC20Basic &#123;</span><br><span class="line">    uint public _totalSupply;</span><br><span class="line">    function totalSupply() public constant returns (uint);</span><br><span class="line">    function balanceOf(address who) public constant returns (uint);</span><br><span class="line">    function transfer(address to, uint value) public;</span><br><span class="line">    event Transfer(address indexed from, address indexed to, uint value);</span><br><span class="line">&#125;</span><br><span 
class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title ERC20 interface</span><br><span class="line"> * @dev see https://github.com/ethereum/EIPs/issues/20</span><br><span class="line"> */</span><br><span class="line">contract ERC20 is ERC20Basic &#123;</span><br><span class="line">    function allowance(address owner, address spender) public constant returns (uint);</span><br><span class="line">    function transferFrom(address from, address to, uint value) public;</span><br><span class="line">    function approve(address spender, uint value) public;</span><br><span class="line">    event Approval(address indexed owner, address indexed spender, uint value);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title Basic token</span><br><span class="line"> * @dev Basic version of StandardToken, with no allowances.</span><br><span class="line"> */</span><br><span class="line">contract BasicToken is Ownable, ERC20Basic &#123;</span><br><span class="line">    using SafeMath for uint;</span><br><span class="line"></span><br><span class="line">    mapping(address =&gt; uint) public balances;</span><br><span class="line"></span><br><span class="line">    // additional variables for use if transaction fees ever became necessary</span><br><span class="line">    uint public basisPointsRate = 0;</span><br><span class="line">    uint public maximumFee = 0;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Fix for the ERC20 short address attack.</span><br><span class="line">    */</span><br><span class="line">    modifier onlyPayloadSize(uint size) &#123;</span><br><span class="line">        require(!(msg.data.length &lt; size + 4));</span><br><span class="line">        _;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * 
@dev transfer token for a specified address</span><br><span class="line">    * @param _to The address to transfer to.</span><br><span class="line">    * @param _value The amount to be transferred.</span><br><span class="line">    */</span><br><span class="line">    function transfer(address _to, uint _value) public onlyPayloadSize(2 * 32) &#123;</span><br><span class="line">        uint fee = (_value.mul(basisPointsRate)).div(10000);</span><br><span class="line">        if (fee &gt; maximumFee) &#123;</span><br><span class="line">            fee = maximumFee;</span><br><span class="line">        &#125;</span><br><span class="line">        uint sendAmount = _value.sub(fee);</span><br><span class="line">        balances[msg.sender] = balances[msg.sender].sub(_value);</span><br><span class="line">        balances[_to] = balances[_to].add(sendAmount);</span><br><span class="line">        if (fee &gt; 0) &#123;</span><br><span class="line">            balances[owner] = balances[owner].add(fee);</span><br><span class="line">            Transfer(msg.sender, owner, fee);</span><br><span class="line">        &#125;</span><br><span class="line">        Transfer(msg.sender, _to, sendAmount);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Gets the balance of the specified address.</span><br><span class="line">    * @param _owner The address to query the the balance of.</span><br><span class="line">    * @return An uint representing the amount owned by the passed address.</span><br><span class="line">    */</span><br><span class="line">    function balanceOf(address _owner) public constant returns (uint balance) &#123;</span><br><span class="line">        return balances[_owner];</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * 
@title Standard ERC20 token</span><br><span class="line"> *</span><br><span class="line"> * @dev Implementation of the basic standard token.</span><br><span class="line"> * @dev https://github.com/ethereum/EIPs/issues/20</span><br><span class="line"> * @dev Based oncode by FirstBlood: https://github.com/Firstbloodio/token/blob/master/smart_contract/FirstBloodToken.sol</span><br><span class="line"> */</span><br><span class="line">contract StandardToken is BasicToken, ERC20 &#123;</span><br><span class="line"></span><br><span class="line">    mapping (address =&gt; mapping (address =&gt; uint)) public allowed;</span><br><span class="line"></span><br><span class="line">    uint public constant MAX_UINT = 2**256 - 1;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Transfer tokens from one address to another</span><br><span class="line">    * @param _from address The address which you want to send tokens from</span><br><span class="line">    * @param _to address The address which you want to transfer to</span><br><span class="line">    * @param _value uint the amount of tokens to be transferred</span><br><span class="line">    */</span><br><span class="line">    function transferFrom(address _from, address _to, uint _value) public onlyPayloadSize(3 * 32) &#123;</span><br><span class="line">        var _allowance = allowed[_from][msg.sender];</span><br><span class="line"></span><br><span class="line">        // Check is not needed because sub(_allowance, _value) will already throw if this condition is not met</span><br><span class="line">        // if (_value &gt; _allowance) throw;</span><br><span class="line"></span><br><span class="line">        uint fee = (_value.mul(basisPointsRate)).div(10000);</span><br><span class="line">        if (fee &gt; maximumFee) &#123;</span><br><span class="line">            fee = maximumFee;</span><br><span class="line">        &#125;</span><br><span class="line">        if 
(_allowance &lt; MAX_UINT) &#123;</span><br><span class="line">            allowed[_from][msg.sender] = _allowance.sub(_value);</span><br><span class="line">        &#125;</span><br><span class="line">        uint sendAmount = _value.sub(fee);</span><br><span class="line">        balances[_from] = balances[_from].sub(_value);</span><br><span class="line">        balances[_to] = balances[_to].add(sendAmount);</span><br><span class="line">        if (fee &gt; 0) &#123;</span><br><span class="line">            balances[owner] = balances[owner].add(fee);</span><br><span class="line">            Transfer(_from, owner, fee);</span><br><span class="line">        &#125;</span><br><span class="line">        Transfer(_from, _to, sendAmount);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Approve the passed address to spend the specified amount of tokens on behalf of msg.sender.</span><br><span class="line">    * @param _spender The address which will spend the funds.</span><br><span class="line">    * @param _value The amount of tokens to be spent.</span><br><span class="line">    */</span><br><span class="line">    function approve(address _spender, uint _value) public onlyPayloadSize(2 * 32) &#123;</span><br><span class="line"></span><br><span class="line">        // To change the approve amount you first have to reduce the addresses`</span><br><span class="line">        //  allowance to zero by calling `approve(_spender, 0)` if it is not</span><br><span class="line">        //  already 0 to mitigate the race condition described here:</span><br><span class="line">        //  https://github.com/ethereum/EIPs/issues/20#issuecomment-263524729</span><br><span class="line">        require(!((_value != 0) &amp;&amp; (allowed[msg.sender][_spender] != 0)));</span><br><span class="line"></span><br><span class="line">        allowed[msg.sender][_spender] = _value;</span><br><span 
class="line">        Approval(msg.sender, _spender, _value);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    /**</span><br><span class="line">    * @dev Function to check the amount of tokens than an owner allowed to a spender.</span><br><span class="line">    * @param _owner address The address which owns the funds.</span><br><span class="line">    * @param _spender address The address which will spend the funds.</span><br><span class="line">    * @return A uint specifying the amount of tokens still available for the spender.</span><br><span class="line">    */</span><br><span class="line">    function allowance(address _owner, address _spender) public constant returns (uint remaining) &#123;</span><br><span class="line">        return allowed[_owner][_spender];</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * @title Pausable</span><br><span class="line"> * @dev Base contract which allows children to implement an emergency stop mechanism.</span><br><span class="line"> */</span><br><span class="line">contract Pausable is Ownable &#123;</span><br><span class="line">  event Pause();</span><br><span class="line">  event Unpause();</span><br><span class="line"></span><br><span class="line">  bool public paused = false;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">  /**</span><br><span class="line">   * @dev Modifier to make a function callable only when the contract is not paused.</span><br><span class="line">   */</span><br><span class="line">  modifier whenNotPaused() &#123;</span><br><span class="line">    require(!paused);</span><br><span class="line">    _;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  /**</span><br><span class="line">  
 * @dev Modifier to make a function callable only when the contract is paused.</span><br><span class="line">   */</span><br><span class="line">  modifier whenPaused() &#123;</span><br><span class="line">    require(paused);</span><br><span class="line">    _;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  /**</span><br><span class="line">   * @dev called by the owner to pause, triggers stopped state</span><br><span class="line">   */</span><br><span class="line">  function pause() onlyOwner whenNotPaused public &#123;</span><br><span class="line">    paused = true;</span><br><span class="line">    Pause();</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  /**</span><br><span class="line">   * @dev called by the owner to unpause, returns to normal state</span><br><span class="line">   */</span><br><span class="line">  function unpause() onlyOwner whenPaused public &#123;</span><br><span class="line">    paused = false;</span><br><span class="line">    Unpause();</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">contract BlackList is Ownable, BasicToken &#123;</span><br><span class="line"></span><br><span class="line">    /////// Getters to allow the same blacklist to be used also by other contracts (including upgraded Tether) ///////</span><br><span class="line">    function getBlackListStatus(address _maker) external constant returns (bool) &#123;</span><br><span class="line">        return isBlackListed[_maker];</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    function getOwner() external constant returns (address) &#123;</span><br><span class="line">        return owner;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    mapping (address =&gt; bool) public 
isBlackListed;</span><br><span class="line"></span><br><span class="line">    function addBlackList (address _evilUser) public onlyOwner &#123;</span><br><span class="line">        isBlackListed[_evilUser] = true;</span><br><span class="line">        AddedBlackList(_evilUser);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    function removeBlackList (address _clearedUser) public onlyOwner &#123;</span><br><span class="line">        isBlackListed[_clearedUser] = false;</span><br><span class="line">        RemovedBlackList(_clearedUser);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    function destroyBlackFunds (address _blackListedUser) public onlyOwner &#123;</span><br><span class="line">        require(isBlackListed[_blackListedUser]);</span><br><span class="line">        uint dirtyFunds = balanceOf(_blackListedUser);</span><br><span class="line">        balances[_blackListedUser] = 0;</span><br><span class="line">        _totalSupply -= dirtyFunds;</span><br><span class="line">        DestroyedBlackFunds(_blackListedUser, dirtyFunds);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    event DestroyedBlackFunds(address _blackListedUser, uint _balance);</span><br><span class="line"></span><br><span class="line">    event AddedBlackList(address _user);</span><br><span class="line"></span><br><span class="line">    event RemovedBlackList(address _user);</span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">contract UpgradedStandardToken is StandardToken&#123;</span><br><span class="line">    // those methods are called by the legacy contract</span><br><span class="line">    // and they must ensure msg.sender to be the contract address</span><br><span class="line">    function transferByLegacy(address from, address to, uint value) 
public;</span><br><span class="line">    function transferFromByLegacy(address sender, address from, address spender, uint value) public;</span><br><span class="line">    function approveByLegacy(address from, address spender, uint value) public;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">contract TetherToken is Pausable, StandardToken, BlackList &#123;</span><br><span class="line"></span><br><span class="line">    string public name;</span><br><span class="line">    string public symbol;</span><br><span class="line">    uint public decimals;</span><br><span class="line">    address public upgradedAddress;</span><br><span class="line">    bool public deprecated;</span><br><span class="line"></span><br><span class="line">    //  The contract can be initialized with a number of tokens</span><br><span class="line">    //  All the tokens are deposited to the owner address</span><br><span class="line">    //</span><br><span class="line">    // @param _balance Initial supply of the contract</span><br><span class="line">    // @param _name Token Name</span><br><span class="line">    // @param _symbol Token symbol</span><br><span class="line">    // @param _decimals Token decimals</span><br><span class="line">    function TetherToken(uint _initialSupply, string _name, string _symbol, uint _decimals) public &#123;</span><br><span class="line">        _totalSupply = _initialSupply;</span><br><span class="line">        name = _name;</span><br><span class="line">        symbol = _symbol;</span><br><span class="line">        decimals = _decimals;</span><br><span class="line">        balances[owner] = _initialSupply;</span><br><span class="line">        deprecated = false;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Forward ERC20 methods to upgraded contract if this one is deprecated</span><br><span class="line">    function transfer(address _to, uint _value) public 
whenNotPaused &#123;</span><br><span class="line">        require(!isBlackListed[msg.sender]);</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return UpgradedStandardToken(upgradedAddress).transferByLegacy(msg.sender, _to, _value);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return super.transfer(_to, _value);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Forward ERC20 methods to upgraded contract if this one is deprecated</span><br><span class="line">    function transferFrom(address _from, address _to, uint _value) public whenNotPaused &#123;</span><br><span class="line">        require(!isBlackListed[_from]);</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return UpgradedStandardToken(upgradedAddress).transferFromByLegacy(msg.sender, _from, _to, _value);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return super.transferFrom(_from, _to, _value);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Forward ERC20 methods to upgraded contract if this one is deprecated</span><br><span class="line">    function balanceOf(address who) public constant returns (uint) &#123;</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return UpgradedStandardToken(upgradedAddress).balanceOf(who);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return super.balanceOf(who);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Forward ERC20 methods to upgraded contract if this one is 
deprecated</span><br><span class="line">    function approve(address _spender, uint _value) public onlyPayloadSize(2 * 32) &#123;</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return UpgradedStandardToken(upgradedAddress).approveByLegacy(msg.sender, _spender, _value);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return super.approve(_spender, _value);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Forward ERC20 methods to upgraded contract if this one is deprecated</span><br><span class="line">    function allowance(address _owner, address _spender) public constant returns (uint remaining) &#123;</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return StandardToken(upgradedAddress).allowance(_owner, _spender);</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return super.allowance(_owner, _spender);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // deprecate current contract in favour of a new one</span><br><span class="line">    function deprecate(address _upgradedAddress) public onlyOwner &#123;</span><br><span class="line">        deprecated = true;</span><br><span class="line">        upgradedAddress = _upgradedAddress;</span><br><span class="line">        Deprecate(_upgradedAddress);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // deprecate current contract if favour of a new one</span><br><span class="line">    function totalSupply() public constant returns (uint) &#123;</span><br><span class="line">        if (deprecated) &#123;</span><br><span class="line">            return 
StandardToken(upgradedAddress).totalSupply();</span><br><span class="line">        &#125; else &#123;</span><br><span class="line">            return _totalSupply;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Issue a new amount of tokens</span><br><span class="line">    // these tokens are deposited into the owner address</span><br><span class="line">    //</span><br><span class="line">    // @param _amount Number of tokens to be issued</span><br><span class="line">    function issue(uint amount) public onlyOwner &#123;</span><br><span class="line">        require(_totalSupply + amount &gt; _totalSupply);</span><br><span class="line">        require(balances[owner] + amount &gt; balances[owner]);</span><br><span class="line"></span><br><span class="line">        balances[owner] += amount;</span><br><span class="line">        _totalSupply += amount;</span><br><span class="line">        Issue(amount);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Redeem tokens.</span><br><span class="line">    // These tokens are withdrawn from the owner address</span><br><span class="line">    // if the balance must be enough to cover the redeem</span><br><span class="line">    // or the call will fail.</span><br><span class="line">    // @param _amount Number of tokens to be issued</span><br><span class="line">    function redeem(uint amount) public onlyOwner &#123;</span><br><span class="line">        require(_totalSupply &gt;= amount);</span><br><span class="line">        require(balances[owner] &gt;= amount);</span><br><span class="line"></span><br><span class="line">        _totalSupply -= amount;</span><br><span class="line">        balances[owner] -= amount;</span><br><span class="line">        Redeem(amount);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    
function setParams(uint newBasisPoints, uint newMaxFee) public onlyOwner &#123;</span><br><span class="line">        // Ensure transparency by hardcoding limit beyond which fees can never be added</span><br><span class="line">        require(newBasisPoints &lt; 20);</span><br><span class="line">        require(newMaxFee &lt; 50);</span><br><span class="line"></span><br><span class="line">        basisPointsRate = newBasisPoints;</span><br><span class="line">        maximumFee = newMaxFee.mul(10**decimals);</span><br><span class="line"></span><br><span class="line">        Params(basisPointsRate, maximumFee);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    // Called when new token are issued</span><br><span class="line">    event Issue(uint amount);</span><br><span class="line"></span><br><span class="line">    // Called when tokens are redeemed</span><br><span class="line">    event Redeem(uint amount);</span><br><span class="line"></span><br><span class="line">    // Called when contract is deprecated</span><br><span class="line">    event Deprecate(address newAddress);</span><br><span class="line"></span><br><span class="line">    // Called if contract ever adds fees</span><br><span class="line">    event Params(uint feeBasisPoints, uint maxFee);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="可升级合约"><a href="#可升级合约" class="headerlink" title="可升级合约"></a>Upgradeable Contract</h3><p>The token's main contract, TetherToken, declares two rather special variables in storage:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">bool public deprecated;</span><br><span class="line">address public upgradedAddress;</span><br></pre></td></tr></table></figure><p>Now look at the transfer function, which branches on <code>deprecated</code>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">if (deprecated) &#123;</span><br><span class="line">    return UpgradedStandardToken(upgradedAddress).transferByLegacy(msg.sender, _to, _value);</span><br><span class="line">&#125; else &#123;</span><br><span class="line">    return super.transfer(_to, _value);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This is a switch-style proxy design. By default the proxy is off, and calling the transfer function goes through <code>super</code> to the transfer function in BasicToken. When needed, the owner can call the ABI function <code>function deprecate(address _upgradedAddress) public onlyOwner</code> to switch the proxy on and set the address of the new underlying contract. From then on, calls are forwarded to that contract.</p><p>Why design it this way? The reason is the immutability of the blockchain. Once a contract is deployed, its code is packaged into a block and cannot be updated, but the data in its storage can still be changed by calling the contract. This approach keeps the entry contract address unchanged while still allowing the token contract's logic to be updated.</p><h3 id="授权转账（区块链特有的转账方式）"><a href="#授权转账（区块链特有的转账方式）" class="headerlink" title="授权转账（区块链特有的转账方式）"></a>Approved Transfers (a Blockchain-Specific Way to Transfer)</h3><p>An approved transfer differs from an ordinary transfer. With an ordinary transfer, a user can only send tokens from their own account to others. With an approved transfer, once a spender has been granted an allowance by another wallet's owner, the spender can move that owner's tokens directly, up to the approved amount.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span 
class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line">mapping (address =&gt; mapping (address =&gt; uint)) public allowed;</span><br><span class="line"></span><br><span class="line">function approve(address _spender, uint _value) public onlyPayloadSize(2 * 32) &#123;</span><br><span class="line">    // To change the approve amount you first have to reduce the addresses`</span><br><span class="line">    //  allowance to zero by calling `approve(_spender, 0)` if it is not</span><br><span class="line">    //  already 0 to mitigate the race condition described here:</span><br><span class="line">    //  https://github.com/ethereum/EIPs/issues/20#issuecomment-263524729</span><br><span class="line">    require(!((_value != 0) &amp;&amp; (allowed[msg.sender][_spender] != 0)));</span><br><span class="line"></span><br><span class="line">    allowed[msg.sender][_spender] = _value;</span><br><span class="line">    Approval(msg.sender, _spender, _value);</span><br><span class="line">&#125;</span><br><span class="line">function allowance(address _owner, address _spender) public constant returns (uint remaining) &#123;</span><br><span class="line">    return allowed[_owner][_spender];</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">function transferFrom(address _from, address _to, uint _value) public onlyPayloadSize(3 * 32) &#123;</span><br><span class="line">    var _allowance = allowed[_from][msg.sender];</span><br><span class="line"></span><br><span class="line">    // Check is not needed because sub(_allowance, _value) will already throw if this condition is not met</span><br><span class="line">    // if (_value &gt; 
_allowance) throw;</span><br><span class="line"></span><br><span class="line">    uint fee = (_value.mul(basisPointsRate)).div(10000);</span><br><span class="line">    if (fee &gt; maximumFee) &#123;</span><br><span class="line">        fee = maximumFee;</span><br><span class="line">    &#125;</span><br><span class="line">    if (_allowance &lt; MAX_UINT) &#123;</span><br><span class="line">        allowed[_from][msg.sender] = _allowance.sub(_value);</span><br><span class="line">    &#125;</span><br><span class="line">    uint sendAmount = _value.sub(fee);</span><br><span class="line">    balances[_from] = balances[_from].sub(_value);</span><br><span class="line">    balances[_to] = balances[_to].add(sendAmount);</span><br><span class="line">    if (fee &gt; 0) &#123;</span><br><span class="line">        balances[owner] = balances[owner].add(fee);</span><br><span class="line">        Transfer(_from, owner, fee);</span><br><span class="line">    &#125;</span><br><span class="line">    Transfer(_from, _to, sendAmount);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Approvals are stored in the nested mapping <code>allowed</code>. The outer key is the address of the owner granting the allowance, the inner key is the address of the authorized spender, and the value is the approved amount, i.e. <code>allowed[owner][spender] = amount</code>. The wallet owner calls <code>function approve(address _spender, uint _value) public onlyPayloadSize(2 * 32)</code> to let the <code>_spender</code> address transfer up to <code>_value</code> of their tokens, and the approval is recorded in <code>allowed</code>.</p><p>In <code>function transferFrom(address _from, address _to, uint _value) public onlyPayloadSize(3 * 32)</code> above, the contract first reads <code>_allowance</code>, which is the amount <code>_from</code> has approved for the caller (<code>msg.sender</code>). Unless that allowance is <code>MAX_UINT</code>, which is treated as unlimited, the contract reduces it with <code>_allowance.sub(_value)</code>. SafeMath's <code>sub</code> throws if the allowance is insufficient, and this reverts the transfer.</p><h3 id="Ownable"><a href="#Ownable" class="headerlink" title="Ownable"></a>Ownable</h3><p>Ownable is an important administrative contract. When the contract is deployed, Ownable stores the deployer's address in the <code>owner</code> variable. Its <code>onlyOwner()</code> modifier works like permission-checking middleware: it runs before certain ABI calls to verify that the caller is allowed to perform the operation.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span 
class="line">modifier onlyOwner() &#123;</span><br><span class="line">    require(msg.sender == owner);</span><br><span class="line">    _;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;加密货币和稳定币&quot;&gt;&lt;a href=&quot;#加密货币和稳定币&quot; class=&quot;headerlink&quot; title=&quot;加密货币和稳定币&quot;&gt;&lt;/a&gt;Cryptocurrencies and Stablecoins&lt;/h1&gt;&lt;p&gt;A cryptocurrency, also often called a crypto asset, is</summary>
      
    
    
    
    <category term="Web3.0" scheme="http://www.huckops.xyz/categories/Web3-0/"/>
    
    
    <category term="Web3.0" scheme="http://www.huckops.xyz/tags/Web3-0/"/>
    
    <category term="Ethereum" scheme="http://www.huckops.xyz/tags/Ethereum/"/>
    
    <category term="ERC20" scheme="http://www.huckops.xyz/tags/ERC20/"/>
    
  </entry>
  
  <entry>
    <title>How Blockchains and Smart Contracts Work</title>
    <link href="http://www.huckops.xyz/2025/08/02/Web3.0/%E5%8C%BA%E5%9D%97%E9%93%BE%E5%92%8C%E6%99%BA%E8%83%BD%E5%90%88%E7%BA%A6%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86/"/>
    <id>http://www.huckops.xyz/2025/08/02/Web3.0/%E5%8C%BA%E5%9D%97%E9%93%BE%E5%92%8C%E6%99%BA%E8%83%BD%E5%90%88%E7%BA%A6%E5%B7%A5%E4%BD%9C%E5%8E%9F%E7%90%86/</id>
    <published>2025-08-02T20:39:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
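    <content type="html"><![CDATA[<p>Recently I have often talked about blockchain with friends, and I've found that many people's understanding of blockchain stops at digital currency. They know little about blockchain as a whole or about what has grown out of it, so I'm writing up the most basic principles of how a blockchain works as a primer.</p><h1 id="区块链？"><a href="#区块链？" class="headerlink" title="区块链？"></a>Blockchain?</h1><p>The concept of blockchain traces back to Satoshi Nakamoto. In 2008, Satoshi Nakamoto proposed a cryptographic digital currency and the cryptographic techniques behind it, and from then on blockchain gradually took root on the internet. In the early days, most people considered Bitcoin a scam, because the currency had neither a physical form nor any real value. (Note the difference between value and price: early on it may have had a price, but it genuinely had no value.) As a result, most of the earliest holders sold their BTC soon after receiving newly minted coins, which is why investors holding thousands of BTC are rare in the crypto world today. Anyone who had held BTC continuously since 2012 would now be up thousands of times over, and BTC's market capitalization alone shows how quickly web3 has grown in recent years.</p><p>This raises a practical question: is blockchain the same thing as cryptocurrency? Is cryptocurrency all there is to blockchain?</p><h1 id="区块链原理（基于Ethereum-主网）"><a href="#区块链原理（基于Ethereum-主网）" class="headerlink" title="区块链原理（基于Ethereum 主网）"></a>How a Blockchain Works (Based on Ethereum Mainnet)</h1><p>Let's start with the word itself. "Blockchain" splits into "block" and "chain". A block is, as the name suggests, a chunk of data, and a chain is a linked sequence, so a blockchain is a chain of data blocks. The data structure becomes easy to grasp once it is written in code:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Block <span class="keyword">struct</span> &#123;</span><br><span class="line">    Header       *Header         <span class="comment">// block header (core metadata)</span></span><br><span class="line">    Transactions []*Transaction  <span class="comment">// transaction list</span></span><br><span class="line">    Uncles       []*Header       <span class="comment">// 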
uncle block headers (Ethereum-specific)</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Header <span class="keyword">struct</span> &#123;</span><br><span class="line">    ParentHash  common.Hash    <span class="comment">// hash of the previous block (32 bytes)</span></span><br><span class="line">    UncleHash   common.Hash    <span class="comment">// hash of the RLP-encoded uncle header list</span></span><br><span class="line">    Coinbase    common.Address <span class="comment">// miner address (fee recipient under PoS)</span></span><br><span class="line">    Root        common.Hash    <span class="comment">// state trie root hash (the key field!)</span></span><br><span class="line">    TxHash      common.Hash    <span class="comment">// Merkle root of the transaction trie</span></span><br><span class="line">    ReceiptHash common.Hash    <span class="comment">// Merkle root of the receipt trie</span></span><br><span class="line">    Bloom       Bloom          <span class="comment">// bloom filter over the block's logs</span></span><br><span class="line">    Difficulty  *big.Int       <span class="comment">// block difficulty (0 under PoS)</span></span><br><span class="line">    Number      *big.Int       <span class="comment">// block height, i.e. the block number</span></span><br><span class="line">    GasLimit    <span class="type">uint64</span>         <span class="comment">// block gas limit</span></span><br><span class="line">    GasUsed     <span class="type">uint64</span>         <span class="comment">// total gas used in this block</span></span><br><span class="line">    Time        <span class="type">uint64</span>         <span class="comment">// block timestamp</span></span><br><span class="line">    Extra       []<span class="type">byte</span>         <span class="comment">// extra data</span></span><br><span class="line">    MixDigest   common.Hash    <span class="comment">// PoW mix hash (holds PREVRANDAO under PoS)</span></span><br><span class="line">    Nonce       BlockNonce     <span class="comment">// proof-of-work nonce (8 bytes)</span></span><br><span 
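class="line">&#125;</span><br></pre></td></tr></table></figure><p>This is a simplified block data structure. Each block contains the hash of the previous block. That unique hash only exists once the previous block's data has been sealed, and it is then recorded in the new block. This backward-linked list has several advantages:</p><ol><li><p>The chain can only be appended to, and no part of it can be altered. If the data in any block changes, that block's hash necessarily changes, so the <code>ParentHash</code> in every later block would have to change too. In effect, all data from that block onward becomes untrustworthy or invalid.</p></li><li><p>Easy integrity verification. When a node syncs blockchain data, it can use the previous-block hash stored in the next block to check whether the previous block's data is valid.</p></li><li><p>Given a block height, you can look up all of the data before that block. This property has a further consequence. If something irreversible goes wrong on the network after a certain block, the community can in principle fall back to the state at an earlier point, because all earlier data and transaction records are preserved. (After the DAO attack in 2016, Ethereum mainnet carried out a hard fork that moved the ether held by the DAO, including the drained funds, into a withdrawal contract so it could be returned to investors. The community split over this decision, which produced today's two chains, Ethereum (ETH) and Ethereum Classic (ETC).)</p></li></ol><p>These properties show that a blockchain is primarily a data store whose entire history cannot be tampered with, and that is in fact its greatest value. Since a blockchain has to store so much data, how is that data stored, and how is it guaranteed to be stored correctly? To answer that, we need to start from the blockchain's network structure.</p><h1 id="区块链结构（Ethereum-主网）"><a href="#区块链结构（Ethereum-主网）" class="headerlink" title="区块链结构（Ethereum 主网）"></a>Blockchain Structure (Ethereum Mainnet)</h1><p>The structure of the Ethereum mainnet network can be roughly summarized as follows:</p><p><img src="https://i.mji.rip/2025/08/02/7a58187e3fb87a6c466b7b49fd8a2f3f.png" alt="7a58187e3fb87a6c466b7b49fd8a2f3f.png"></p><p>This structure contains many kinds of nodes, which makes it hard to follow, so I'll explain how the network works by walking through a single transaction.</p><p><img src="https://i.mji.rip/2025/08/02/e53f0d29620beefd700a5ae610e5b0a1.png" alt="e53f0d29620beefd700a5ae610e5b0a1.png"></p><p>Say my wallet sends a USDT transfer. Under the hood, this is a call to the USDT contract's transfer function. The wallet builds and signs the transaction, then broadcasts it through the RPC node it is connected to, and the transaction enters the network's mempool. A validator picks transactions from the mempool, executes the contract call in the Ethereum Virtual Machine (EVM), computes the resulting state changes, and packs the transaction into a candidate block. Validators then reach consensus through the proof-of-stake (PoS) mechanism to confirm that the block is valid. All full nodes re-execute the block's transactions to verify that they reach the same result. Finally, the block is written to the chain, and the account balance state in the USDT contract is updated.</p><h1 id="共识机制"><a href="#共识机制" class="headerlink" title="共识机制"></a>Consensus Mechanisms</h1><p>Early Ethereum used proof of work (PoW). Miners' hash power determined who got to produce a block: the more hash power a miner had, the more likely it was to mine the next block first. When a mining node received new transactions, it added them to its mempool and began hashing a new block, trying to produce a block header that met the network-wide difficulty target. As soon as a miner found a valid block, it broadcast the block to the whole network for other full nodes to verify. As long as the block's transactions were valid and its hash met the difficulty requirement, the block was accepted and appended to the main chain. The network rewarded the successful miner with the transactions' gas fees plus the protocol's block reward.</p><p>PoW has since been retired, and Ethereum mainnet has fully switched to proof of stake (PoS). Under PoS, validators produce blocks instead of miners. The protocol uses random selection weighted by staked ETH to decide which validator proposes each block. After the block is produced, other validators verify it, and once they reach consensus the block is formally added to the main chain. The proposing validator earns the transaction fees (only the priority fee, since EIP-1559 burns the base fee) and possibly MEV rewards. Compared with PoW, PoS no longer relies on intensive hash computation, so the hardware requirements and energy consumption of validating nodes are much lower, and a validator's chance to produce blocks grows with the amount of ETH it has staked.</p><h1 id="数字货币金融属性的诞生"><a href="#数字货币金融属性的诞生" 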
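class="headerlink" title="数字货币金融属性的诞生"></a>How Digital Currency Acquired Its Financial Role</h1><p>As the principles above show, producing blocks consumes real resources under both PoW and PoS. PoW depends on miners competing with hash power, which takes large amounts of electricity and hardware. PoS is far less compute-intensive but still requires staking assets and keeping nodes online. Both therefore carry real operating costs.</p><p>When users send transactions on-chain, they pay gas fees, which are denominated in ETH on Ethereum. These fees go to the miner or validator that produces the block as an incentive to help keep the network secure. (Since EIP-1559, the base fee is burned and only the priority fee goes to the block producer.) The tokens earned this way are often sold or exchanged for other assets and so flow into the market, which in turn drove the formation of the whole digital currency trading ecosystem.</p><p>From this angle, digital currency is essentially an incentive mechanism designed to keep a blockchain running. It is an economic by-product of the blockchain technology stack, not the whole of blockchain.</p><h1 id="杞人忧天"><a href="#杞人忧天" class="headerlink" title="杞人忧天"></a>Needless Worries?</h1><p>Ethereum has now fully switched to proof of stake (PoS). In this model, the probability that a validator is selected to produce a block is proportional to the ETH it has staked, so the more it stakes, the more often it is chosen as block proposer. This mechanism makes the network far more energy-efficient, but it also raises security concerns. For example, could a validator with an extremely large stake disrupt the network by acting maliciously?</p><p>Under PoS, a validator that behaves maliciously, for example by double signing (signing two conflicting blocks or attestations), is slashed by the network. Part of its staked ETH is burned as a penalty, and the validator is forcibly removed from the validator set. If many validators are slashed around the same time, the additional correlation penalty can take their entire stake. Validators that simply stay offline are not slashed, but they steadily lose ETH through inactivity penalties. Once removed, the node no longer has the right to propose blocks or include transactions, so its influence on mainnet ends.</p>]]></content>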
    
    
      
      
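    <summary type="html">&lt;p&gt;Recently I have often talked about blockchain with friends, and I've found that many people's understanding of blockchain stops at digital currency. They know little about blockchain as a whole or about what has grown out of it, so I'm writing up the most basic principles of how a blockchain works as a primer.&lt;/p&gt;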
&lt;h1 id=&quot;区块链？&quot;&gt;&lt;a href=&quot;#区块链？&quot; class=&quot;headerli</summary>
      
    
    
    
    <category term="Web3.0" scheme="http://www.huckops.xyz/categories/Web3-0/"/>
    
    
    <category term="Web3.0" scheme="http://www.huckops.xyz/tags/Web3-0/"/>
    
    <category term="Ethereum" scheme="http://www.huckops.xyz/tags/Ethereum/"/>
    
    <category term="ERC20" scheme="http://www.huckops.xyz/tags/ERC20/"/>
    
  </entry>
  
  <entry>
    <title>Docker Network Forwarding and Routing Model</title>
    <link href="http://www.huckops.xyz/2025/02/06/container/docker%E7%BD%91%E7%BB%9C%E8%BD%AC%E5%8F%91%E8%B7%AF%E7%94%B1%E6%A8%A1%E5%9E%8B/"/>
    <id>http://www.huckops.xyz/2025/02/06/container/docker%E7%BD%91%E7%BB%9C%E8%BD%AC%E5%8F%91%E8%B7%AF%E7%94%B1%E6%A8%A1%E5%9E%8B/</id>
    <published>2025-02-06T00:42:01.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
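    <content type="html"><![CDATA[<h1 id="前叙"><a href="#前叙" class="headerlink" title="前叙"></a>Preface</h1><p>Docker is currently the most widely used container tool built on top of containerd. The docker toolchain calls into containerd underneath to schedule and manage containers. In day-to-day use, Docker offers four common network modes:</p><p>1. Bridge (br) mode: when dockerd starts, it usually creates a virtual bridge interface called docker0 on the Linux host, with the default address 172.17.0.1 (subnet 172.17.0.0/16). This bridge is not attached to any physical NIC, so it does not join the physical network's collision domain. In bridge mode, containers get IPs in the same subnet as docker0 and use docker0 as their gateway. Outbound traffic passes through iptables and is forwarded to the external network. Inbound traffic arrives at the physical NIC, passes through iptables, is forwarded to docker0, and then reaches its destination. In other words, by default the outbound and inbound paths are roughly symmetric.</p><p>2. Host mode: the container reuses the host's network directly. Containers in this mode all share the host's network namespace (ns), so different containers must use different ports or they will conflict.</p><p>3. None mode: no networking, exactly as the name says. No IP is assigned and there is no network apart from the loopback interface. It is rarely used.</p><p>4. Overlay mode: frequently used for cross-host communication, mainly implemented with VXLAN.</p><p>There is also a special network mode that comes up often:</p><p>5. MACVLAN: think of it as turning the container into a real device. A virtual MAC address is cloned and exposed directly in the collision domain. The container's IP can be in the same subnet as the host's or in a different one, depending on configuration, and the network is bound directly to a particular physical network device.</p><p>In production, the most commonly used mode is bridge mode, which is also Docker's default. The other modes are generally reserved for special scenarios. This article digs into the traffic forwarding path in bridge mode.</p><h1 id="iptables结构"><a href="#iptables结构" class="headerlink" title="iptables结构"></a>iptables Structure</h1><p>As mentioned above, in bridge mode the traffic between docker0 and the physical NIC is forwarded through iptables. Let's first look at how iptables is structured:</p><p><img src="https://s3.huckops.xyz/1780765128412.png" alt="1780765128412.png"></p><p>In the traditional model, network A and network B in the figure are the same network, namely the external network. After external traffic arrives at the NIC, it first enters the PREROUTING chain for pre-routing processing. In the traditional model it is then routed to the INPUT chain, which is the chain most commonly used for inbound ACLs. The kernel then hands the traffic up to user space (the application process in the figure). Traffic sent back by the process goes through the OUTPUT chain, which is the chain most commonly used for outbound ACLs, and then straight through the POSTROUTING chain out to the network. That is the complete forwarding path in the traditional model.</p><p>With Docker's bridge network, you can think of the traffic entry point and the containers as being on different subnets, so traffic going from the host to a container needs NAT. Traffic entering through the NIC is therefore DNATed directly to the target container, and outbound traffic is SNATed to the corresponding physical egress interface. This path is fairly complex, so let's break Docker networking down step by step.</p><h1 id="容器联网链路"><a href="#容器联网链路" class="headerlink" title="容器联网链路"></a>Container Internet Access Path</h1><p>The container internet access path is the outbound path. When Docker containers run with default settings, reaching the internet is about the most basic requirement. (Some services must be blocked from the internet, and how to do that is covered later.) Here a container can be thought of as a virtual machine on the host, and its outbound traffic has to be SNATed. We can see the rule for this in iptables:</p><p><img src="https://s3.huckops.xyz/1780765164169.png" alt="1780765164169.png"></p><p>Only the highlighted POSTROUTING part matters here. This rule applies source address translation (SNAT) to traffic from the container network that leaves through an interface other than the container bridge (i.e. the physical NIC). When the remote end replies, connection tracking reverses the translation as the packets pass through PREROUTING, and they are forwarded back to the container. This is the simplest network path in bridge mode.</p><h1 id="容器端口映射"><a href="#容器端口映射" class="headerlink" 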
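title="容器端口映射"></a>Container Port Mapping</h1><p>By default, a container's ports are bound only to the container's own IP, which means they exist only inside its network namespace. To reach a container's service from outside, you need port mapping, which maps the container port onto the host's physical NIC.</p><p>For example, I start an nginx service and map the container's port 80 to port 80 on the host:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker run -itd -p 80:80 nginx</span><br></pre></td></tr></table></figure><p>As mentioned above, a container can be roughly viewed as a virtual machine on the host, and forwarding traffic from the host to that virtual machine requires DNAT. The <code>nat</code> table contains the following rule:</p><p><img src="https://s3.huckops.xyz/1780765189236.png" alt="1780765189236.png"></p><p>When traffic reaches the nat table in PREROUTING, it is DNATed to the corresponding container port. As the structure diagram above shows, the DNATed traffic then goes through the routing decision:</p><p><img src="https://s3.huckops.xyz/1780765257373.png" alt="1780765257373.png"></p><p>Once a route is matched, the traffic enters the FORWARD chain.</p><p><strong>Note: because the routing decision matches a forwarding route, the traffic enters the FORWARD chain, not the INPUT chain. ACLs for a container's mapped ports therefore do not belong in INPUT.</strong></p><p>The FORWARD chain in the filter table turns out to be unusually complex:</p><p><img src="https://s3.huckops.xyz/1780765296851.png" alt="1780765296851.png"></p><p>Let's go through what each of these rules actually does:</p><p>The first rule jumps to the DOCKER-USER chain, which holds user-defined Docker network policies. By default this chain contains only a single rule that lets everything through (RETURN). It comes into play later when customizing forwarding rules, so I won't go into more detail here.</p><p>The second rule jumps to the DOCKER-ISOLATION-STAGE-1 chain, which in turn references DOCKER-ISOLATION-STAGE-2. Docker uses these chains to isolate networks when multiple bridges exist. For example, after I add a bridge network, the rules look like this:</p><p><img src="https://s3.huckops.xyz/1780765338555.png" alt="1780765338555.png"></p><p>Take the br-c181e75dbe81 interface as an example. When container traffic leaves br-c181e75dbe81 and heads to any interface other than br-c181e75dbe81, it enters DOCKER-ISOLATION-STAGE-2 for second-stage processing, which blocks traffic from crossing from one bridge to another.</p><p>The third rule accepts traffic forwarded out through docker0 toward the containers when that traffic belongs to an already-established or related connection. Generally leave it alone.</p><p>The fourth rule is Docker's forwarding ACL. In the figure above, for example, it allows traffic forwarded out through docker0 to reach the container's port 80. Docker ACLs therefore need to take effect before this chain, and blacklist rules in particular must come before it.</p><p>The fifth and sixth rules are basic Docker network access controls. They allow outbound traffic from the bridge and traffic between containers on the same bridge.</p><h1 id="基于容器的ACL"><a href="#基于容器的ACL" class="headerlink" title="基于容器的ACL"></a>Container-Based ACLs</h1><p>The analysis above shows that ACLs for Docker containers have to be configured in the FORWARD chain of the filter table. As mentioned, Docker already creates a DOCKER-USER chain specifically for users, so you can put your ACLs directly in that chain.</p><p>The walkthrough above also shows that a port published through Docker port mapping is open to every source network, so rules here should be added as a whitelist.</p><p>For example, after creating a container that maps port 80, we create the following rules:</p><figure class="highlight shell"><table><tr><td 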
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">iptables -N T80</span><br><span class="line">iptables -I DOCKER-USER -p tcp --dport 80 -j T80</span><br><span class="line">iptables -I T80 -j DROP</span><br></pre></td></tr></table></figure><p>When we now access port 80 on the host from outside, the connection fails:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">➜ ~ curl 198.19.249.82</span><br><span class="line">curl: (28) Failed to connect to 198.19.249.82 port 80 after 75006 ms: Couldn&#x27;t connect to server</span><br></pre></td></tr></table></figure><p>Next, add a whitelist entry that allows the host's own subnet:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">iptables -I T80 --src 198.19.249.0/24 -j RETURN</span><br></pre></td></tr></table></figure><p>When we access the target address again, clients in 198.19.249.0/24 can now connect, and other subnets are still blocked.</p><p><strong>Note: make sure the rules added above are evaluated first in the FORWARD chain (by default, the jump to DOCKER-USER is the first FORWARD rule). Otherwise the port may simply end up fully open.</strong></p><h1 id="特殊场景下的一些奇怪ACL"><a href="#特殊场景下的一些奇怪ACL" class="headerlink" title="特殊场景下的一些奇怪ACL"></a>Some Unusual ACLs for Special Scenarios</h1><p>I work in cloud gaming operations. A common cloud gaming setup today runs Android in Docker containers that users connect to directly. This creates an internal network security problem, because users could reach the internal network directly from inside a container, which is very dangerous. So we add an ACL that blocks users from reaching the IDC's internal network:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">iptables -I FORWARD -i docker0 --dest 10.0.0.0/8 -j DROP</span><br></pre></td></tr></table></figure><p>Pinging the internal network from inside a container now fails. Container traffic enters the docker0 interface, passes through the FORWARD chain, and then goes to the POSTROUTING chain, so a single FORWARD rule on docker0 is enough to cut the traffic off.</p><p><strong>Sloppy from the start, tears at the end. When running services with Docker, manage port access carefully, or a data leak or an injection attack could be game over.</strong></p>]]></content>
    
    
      
      
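    <summary type="html">&lt;h1 id=&quot;前叙&quot;&gt;&lt;a href=&quot;#前叙&quot; class=&quot;headerlink&quot; title=&quot;前叙&quot;&gt;&lt;/a&gt;Preface&lt;/h1&gt;&lt;p&gt;Docker is currently the most widely used container tool built on top of containerd. The docker toolchain calls into containerd underneath to schedule and manage containers. In day-to-day</summary>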
      
    
    
    
    <category term="云计算" scheme="http://www.huckops.xyz/categories/%E4%BA%91%E8%AE%A1%E7%AE%97/"/>
    
    
    <category term="容器技术" scheme="http://www.huckops.xyz/tags/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
  </entry>
  
  <entry>
    <title>Troubleshooting a supervisor Child Process OOM That Caused supervisord to Exit</title>
    <link href="http://www.huckops.xyz/2024/06/04/%E7%96%91%E9%9A%BE%E6%9D%82%E7%97%87/supervisor%E5%AD%90%E8%BF%9B%E7%A8%8Boom%E5%AF%BC%E8%87%B4supervisor%E9%80%80%E5%87%BA/"/>
    <id>http://www.huckops.xyz/2024/06/04/%E7%96%91%E9%9A%BE%E6%9D%82%E7%97%87/supervisor%E5%AD%90%E8%BF%9B%E7%A8%8Boom%E5%AF%BC%E8%87%B4supervisor%E9%80%80%E5%87%BA/</id>
    <published>2024-06-04T21:53:34.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
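    <content type="html"><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h1><p>This problem comes from a production incident. A uwsgi process ran out of memory (OOM) after a database query returned too much data. At the same time, the supervisord process received an exit signal and shut down gracefully. Because supervisord had exited, nothing restarted the uwsgi process, and the service went down.</p><h1 id="架构说明"><a href="#架构说明" class="headerlink" title="架构说明"></a>Architecture</h1><p><img src="https://s3.huckops.xyz/1780764958822.png" alt="1780764958822.png"></p><p>The incident mainly involved the project's central server, whose stack is python3.11 + uwsgi + mysql + mongo + redis. The central server consists of two processes. uwsgi provides the central server's API and is the main entry point for user traffic. The push service is an asynchronous push service that mainly pushes signaling messages to users and handles some asynchronous commands. Both processes are managed by supervisor, which was installed via apt.</p><h1 id="现场描述"><a href="#现场描述" class="headerlink" title="现场描述"></a>What Happened</h1><p>A new version of the uwsgi service was released with a new comment-screening feature (it was June 4; for historical reasons, I'll say no more). The feature made MongoDB query volume surge and query results enormous. After that, the uwsgi service, the push service, and supervisor all exited abnormally, and an OOM kill was the obvious suspect. nginx kept serving requests, but users got HTTP 500 errors. The error log reported that uwsgi.sock (the socket file nginx reverse-proxies to uwsgi) could not be found, and the service was unavailable.</p><h1 id="故障恢复"><a href="#故障恢复" class="headerlink" title="故障恢复"></a>Recovery</h1><p>We hotfixed the problematic query and released a new version. After we restarted supervisor, both uwsgi and the push service came up normally, and the service recovered.</p><h1 id="故障复盘"><a href="#故障复盘" class="headerlink" title="故障复盘"></a>Postmortem</h1><p>The new uwsgi version made MongoDB query volume surge and the returned results very large, so both the MongoDB shard and uwsgi exited abnormally due to OOM. Based on past experience, though, several things didn't add up:</p><ol><li>When uwsgi hits OOM, supervisor normally should not exit. It should restart the uwsgi process instead. Why did it exit here?</li><li>Once a process hits OOM it will certainly be killed, but what mechanism does the killing, and why would supervisor be killed as well? Is the kernel killing at random? If so, why did supervisor exit on every machine?</li><li>uwsgi is a child process of supervisor. When the child hits OOM, who actually sends the kill signal: the kernel, or some other component? Could the system treat the child and its parent as a group and kill them together?</li></ol><h1 id="故障排查"><a href="#故障排查" class="headerlink" title="故障排查"></a>Investigation</h1><p>For an OOM problem, the most direct approach is to check the kernel log, but the main goal of this investigation was to find out why supervisor exited abnormally as well.<br>Checking the kernel log turned up the following relevant entries:</p><blockquote><p>The logs below are redacted: the first two columns have been removed, and every entry is timestamped Jun  3 20:08:04.</p></blockquote><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 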
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">kernel: [18504827.972952] Call Trace:</span><br><span class="line">kernel: [18504827.972964]  dump_stack+0x6b/0x83</span><br><span class="line">kernel: [18504827.972968]  dump_header+0x4a/0x1f4</span><br><span class="line">kernel: [18504827.972971]  oom_kill_process.cold+0xb/0x10</span><br><span class="line">kernel: [18504827.972978]  out_of_memory+0x1bd/0x4e0</span><br><span class="line">kernel: [18504827.972982]  __alloc_pages_slowpath.constprop.0+0xbcc/0xc90</span><br><span class="line">kernel: [18504827.972985]  __alloc_pages_nodemask+0x2de/0x310</span><br><span class="line">kernel: [18504827.972989]  pagecache_get_page+0x175/0x390</span><br><span class="line">kernel: [18504827.972991]  filemap_fault+0x6a2/0x900</span><br><span class="line">kernel: [18504827.973019]  ext4_filemap_fault+0x2d/0x50 [ext4]</span><br><span class="line">kernel: [18504827.973022]  __do_fault+0x34/0x170</span><br><span class="line">kernel: [18504827.973024]  handle_mm_fault+0x124d/0x1c00</span><br><span class="line">kernel: [18504827.973029]  do_user_addr_fault+0x1b8/0x400</span><br><span class="line">kernel: [18504827.973032]  exc_page_fault+0x78/0x160</span><br><span class="line">kernel: [18504827.973037]  ? 
asm_exc_page_fault+0x8/0x30</span><br><span class="line">kernel: [18504827.973038]  asm_exc_page_fault+0x1e/0x30</span><br><span class="line">kernel: [18504827.973041] RIP: 0033:0x560db0e3f8d1</span><br><span class="line">...</span><br><span class="line">kernel: [18504827.973163] Swap cache stats: add 0, delete 0, find 0/0</span><br><span class="line">kernel: [18504827.973164] Free swap  = 0kB</span><br><span class="line">kernel: [18504827.973165] Total swap = 0kB</span><br><span class="line">kernel: [18504827.973166] 33458335 pages RAM</span><br><span class="line">kernel: [18504827.973167] 0 pages HighMem/MovableOnly</span><br><span class="line">kernel: [18504827.973167] 553944 pages reserved</span><br><span class="line">kernel: [18504827.973168] 0 pages hwpoisoned</span><br><span class="line">...</span><br><span class="line">kernel: [18504827.973685] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/supervisor.service,task=uwsgi,pid=3861657,uid=52160</span><br><span class="line">kernel: [18504827.973711] Out of memory: Killed process 3861657 (uwsgi) total-vm:49128384kB, anon-rss:15311648kB, file-rss:0kB, shmem-rss:58156kB, UID:52160 pgtables:31148kB oom_score_adj:0</span><br></pre></td></tr></table></figure><p>The kernel log shows that the kernel killed uwsgi because of OOM, which is expected. But there is no record of supervisor being killed, which means supervisor was not killed by the kernel, so the kernel log alone can't pin down the cause.</p><p>Next we checked the supervisor log and saw this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">2024-06-03 20:08:06,001 WARN received SIGTERM indicating exit request</span><br><span class="line">...</span><br><span class="line">2024-06-03 20:08:18,536 INFO waiting for cloud-game-push_04, cloud-game-push_05, cloud-game-push_06, cloud-game-push_07, 
cloud-game-push_00, cloud-game-push_01, cloud-game-push_02, cloud-game-push_03, cloud-game-push_08, cloud-game-push_09, cloud-game-logic-worker-2, cloud-game-logic-worker-5, cloud-game-logic-worker-10, cloud-game-logic-worker-16, cloud-game-logic-worker-17, cloud-game-logic-worker-14, cloud-game-logic-worker-15 to die</span><br><span class="line">...</span><br><span class="line">2024-06-03 20:08:38,745 WARN stopped: cloud-game-push_00 (terminated by SIGINT)</span><br><span class="line">...</span><br></pre></td></tr></table></figure><p>The log shows that two seconds after uwsgi was OOM-killed, supervisor received a SIGTERM and began shutting down.</p><p>Strange. If the kill came from OOM, the signal should in theory be sent by the kernel, and the kernel would have logged it. Yet the kernel log shows no such signal being sent, and the investigation hit a dead end.</p><h1 id="故障复现"><a href="#故障复现" class="headerlink" title="故障复现"></a>Reproducing the Failure</h1><p>I wrote a simple memory bomb to simulate a process running out of memory:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">a = []</span><br><span class="line"><span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">    a.append(<span class="string">&quot;test&quot;</span>)</span><br></pre></td></tr></table></figure><p>Then I ran it under supervisor:</p><figure class="highlight ini"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="section">[program:test]</span></span><br><span class="line"><span class="attr">command</span>=python3 /root/test.py</span><br><span class="line"><span class="attr">process_name</span>=%(program_name)s</span><br><span class="line"><span class="attr">numprocs</span>=<span class="number">1</span></span><br></pre></td></tr></table></figure><p>When memory ran out, the same problem showed up:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"># supervisorctl stop all</span><br><span class="line">unix:///var/run/supervisor.sock no such file</span><br><span class="line"># dmesg -T</span><br><span class="line">[Thu Jun  6 00:53:53 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/supervisor.service,task=python3,pid=1823,uid=0</span><br><span class="line">[Thu Jun  6 00:53:53 2024] Out of memory: Killed process 1823 (python3) total-vm:2665068kB, anon-rss:1624352kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:5104kB oom_score_adj:0</span><br><span class="line"># cat supervisor/supervisord.log</span><br><span class="line">2024-06-06 00:59:31,736 WARN exited: cat (terminated by SIGKILL; not expected)</span><br><span class="line">2024-06-06 00:59:31,839 INFO spawned: &#x27;cat&#x27; with pid 1916</span><br><span class="line">2024-06-06 00:59:32,842 INFO success: cat entered RUNNING state, process has stayed up for &gt; than 1 seconds (startsecs)</span><br><span class="line">2024-06-06 01:00:09,418 WARN exited: cat (terminated by SIGKILL; not expected)</span><br><span class="line">2024-06-06 01:00:09,486 INFO spawned: &#x27;cat&#x27; with pid 1934</span><br><span class="line">2024-06-06 01:00:10,170 INFO waiting for cat to die</span><br><span class="line">2024-06-06 01:00:11,187 WARN stopped: cat (terminated by 
SIGTERM)</span><br></pre></td></tr></table></figure><p>Same overflow, same logs, same supervisor, same signal.</p><p>Now one thing is certain: the OS environment has nothing to do with it. The test environment runs Debian 12 and production runs Debian 10.</p><p>Next I suspected the supervisor version. Production runs 4.2.5 and the test environment also runs 4.2.5, so I installed a different version of supervisor via pip and started it by hand:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">supervisord -c /etc/supervisor/supervisord.conf</span><br></pre></td></tr></table></figure><p>And here is where it gets weird: after the test process died, supervisor kept running normally:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">root@debian:/var/log# supervisorctl status</span><br><span class="line">cat                              RUNNING   pid 1983, uptime 0:00:41</span><br><span class="line">root@debian:/var/log# supervisorctl status</span><br><span class="line">cat                              STARTING</span><br></pre></td></tr></table></figure><p>So it's fairly safe to conclude that supervisor's abnormal exit has to do with how it is started and managed. supervisor installed via apt runs as a systemd service, so could systemd be the culprit?</p><h1 id="从supervisor的service文件入手"><a href="#从supervisor的service文件入手" class="headerlink" title="从supervisor的service文件入手"></a>Starting from supervisor's service file</h1><p>Here is supervisor's systemd unit file:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span 
class="line">[Unit]</span><br><span class="line">Description=Supervisor process control system for UNIX</span><br><span class="line">Documentation=http://supervisord.org</span><br><span class="line">After=network.target</span><br><span class="line"></span><br><span class="line">[Service]</span><br><span class="line">ExecStart=/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf</span><br><span class="line">ExecStop=/usr/bin/supervisorctl $OPTIONS shutdown</span><br><span class="line">ExecReload=/usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf $OPTIONS reload</span><br><span class="line">KillMode=process</span><br><span class="line">Restart=on-failure</span><br><span class="line">RestartSec=50s</span><br><span class="line"></span><br><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br></pre></td></tr></table></figure><p>Nothing looks unusual here. Could some default setting be causing the trouble? Going through the systemd documentation, I found a few interesting parameters:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">✓ ManagedOOMSwap=</span><br><span class="line">✓ ManagedOOMMemoryPressure=</span><br><span class="line">✓ ManagedOOMMemoryPressureLimit=</span><br><span class="line">✓ ManagedOOMPreference=</span><br><span class="line">✓ OOMPolicy=</span><br></pre></td></tr></table></figure><p>These parameters all control how systemd handles OOM. The one to focus on here is OOMPolicy. The description of its global default, DefaultOOMPolicy=, is quite interesting:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br></pre></td><td class="code"><pre><span class="line">&lt;varlistentry&gt;</span><br><span class="line">  &lt;term&gt;&lt;varname&gt;DefaultOOMPolicy=&lt;/varname&gt;&lt;/term&gt;</span><br><span class="line"></span><br><span class="line">  &lt;listitem&gt;&lt;para&gt;Configure the default policy for reacting to processes being killed by the Linux</span><br><span class="line">  Out-Of-Memory (OOM) killer or &lt;command&gt;systemd-oomd&lt;/command&gt;. This may be used to pick a global default for the per-unit</span><br><span class="line">  &lt;varname&gt;OOMPolicy=&lt;/varname&gt; setting. See</span><br><span class="line">  &lt;citerefentry&gt;&lt;refentrytitle&gt;systemd.service&lt;/refentrytitle&gt;&lt;manvolnum&gt;5&lt;/manvolnum&gt;&lt;/citerefentry&gt;</span><br><span class="line">  for details. Note that this default is not used for services that have &lt;varname&gt;Delegate=&lt;/varname&gt;</span><br><span class="line">  turned on.&lt;/para&gt;</span><br><span class="line"></span><br><span class="line">  &lt;xi:include href=&quot;version-info.xml&quot; xpointer=&quot;v243&quot;/&gt;&lt;/listitem&gt;</span><br><span class="line">&lt;/varlistentry&gt;</span><br></pre></td></tr></table></figure><p>In other words, supervisor's unit doesn't set OOMPolicy, so it must be falling back to the default. Checking systemd's default configuration shows that default (the commented-out line shows the built-in value):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"># cat /etc/systemd/system.conf  | grep OOM</span><br><span class="line">#DefaultOOMPolicy=stop</span><br></pre></td></tr></table></figure><p>The truth was in sight. I dove straight into the systemd source code and found a function and an enum:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">unit_defaults_init</span><span class="params">(UnitDefaults *defaults, RuntimeScope scope)</span> &#123;</span><br><span class="line">        assert(defaults);</span><br><span class="line">        assert(scope &gt;= <span class="number">0</span>);</span><br><span class="line">        assert(scope &lt; _RUNTIME_SCOPE_MAX);</span><br><span class="line"></span><br><span class="line">        *defaults = (UnitDefaults) &#123;</span><br><span class="line">                .std_output = EXEC_OUTPUT_JOURNAL,</span><br><span class="line">                .std_error = EXEC_OUTPUT_INHERIT,</span><br><span class="line">                .restart_usec = DEFAULT_RESTART_USEC,</span><br><span class="line">                .timeout_start_usec = 
manager_default_timeout(scope),</span><br><span class="line">                .timeout_stop_usec = manager_default_timeout(scope),</span><br><span class="line">                .timeout_abort_usec = manager_default_timeout(scope),</span><br><span class="line">                .timeout_abort_set = <span class="literal">false</span>,</span><br><span class="line">                .device_timeout_usec = manager_default_timeout(scope),</span><br><span class="line">                .start_limit_interval = DEFAULT_START_LIMIT_INTERVAL,</span><br><span class="line">                .start_limit_burst = DEFAULT_START_LIMIT_BURST,</span><br><span class="line"></span><br><span class="line">                <span class="comment">/* On 4.15+ with unified hierarchy, CPU accounting is essentially free as it doesn&#x27;t require the CPU</span></span><br><span class="line"><span class="comment">                 * controller to be enabled, so the default is to enable it unless we got told otherwise. */</span></span><br><span class="line">                .cpu_accounting = cpu_accounting_is_cheap(),</span><br><span class="line">                .memory_accounting = MEMORY_ACCOUNTING_DEFAULT,</span><br><span class="line">                .io_accounting = <span class="literal">false</span>,</span><br><span class="line">                .blockio_accounting = <span class="literal">false</span>,</span><br><span class="line">                .tasks_accounting = <span class="literal">true</span>,</span><br><span class="line">                .ip_accounting = <span class="literal">false</span>,</span><br><span class="line"></span><br><span class="line">                .tasks_max = DEFAULT_TASKS_MAX,</span><br><span class="line">                .timer_accuracy_usec = <span class="number">1</span> * USEC_PER_MINUTE,</span><br><span class="line"></span><br><span class="line">                .memory_pressure_watch = CGROUP_PRESSURE_WATCH_AUTO,</span><br><span class="line">                
.memory_pressure_threshold_usec = MEMORY_PRESSURE_DEFAULT_THRESHOLD_USEC,</span><br><span class="line"></span><br><span class="line">                .oom_policy = OOM_STOP,</span><br><span class="line">                .oom_score_adjust_set = <span class="literal">false</span>,</span><br><span class="line">        &#125;;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">enum</span> <span class="title">OOMPolicy</span> &#123;</span></span><br><span class="line">        OOM_CONTINUE,          <span class="comment">/* The kernel or systemd-oomd kills the process it wants to kill, and that&#x27;s it */</span></span><br><span class="line">        OOM_STOP,              <span class="comment">/* The kernel or systemd-oomd kills the process it wants to kill, and we stop the unit */</span></span><br><span class="line">        OOM_KILL,              <span class="comment">/* The kernel or systemd-oomd kills the process it wants to kill, and all others in the unit, and we stop the unit */</span></span><br><span class="line">        _OOM_POLICY_MAX,</span><br><span class="line">        _OOM_POLICY_INVALID = -EINVAL,</span><br><span class="line">&#125; OOMPolicy;</span><br></pre></td></tr></table></figure><p>In other words, the default OOM policy is stop. When the OOM killer kills any process that belongs to a unit, systemd stops the whole unit. The kernel log above already hinted at this: task_memcg=/system.slice/supervisor.service shows that uwsgi lived in supervisor.service's cgroup. The kernel killed uwsgi, and then systemd itself (PID 1, which is the "we" in the enum comments; systemd-oomd was not involved because this was a kernel OOM kill) noticed the OOM kill in that unit. Following OOMPolicy=stop, it stopped the supervisor unit and sent supervisord the SIGTERM. Because that signal came from systemd in user space, the kernel log had no record of it. supervisor waits for its child processes to exit before it quits, which is why it exited a few seconds after the business process hit OOM. This explains all the timestamps and signals in the logs above.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h1&gt;&lt;p&gt;这个问题出自一个线上故障。uwsgi进程查询数据库内容过大导致进程oom，同时supervisord进程接收到一个退出信号后进行优雅退出。因</summary>
      
    
    
    
    <category term="运维技术" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E6%8A%80%E6%9C%AF/"/>
    
    
  </entry>
  
  <entry>
    <title>Didi Outage: A Speculative Analysis</title>
    <link href="http://www.huckops.xyz/2023/12/04/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E6%BB%B4%E6%BB%B4%E6%95%85%E9%9A%9C%E7%8C%9C%E6%B5%8B%E5%88%86%E6%9E%90/"/>
    <id>http://www.huckops.xyz/2023/12/04/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E6%BB%B4%E6%BB%B4%E6%95%85%E9%9A%9C%E7%8C%9C%E6%B5%8B%E5%88%86%E6%9E%90/</id>
    <published>2023-12-04T22:19:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
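    <content type="html"><![CDATA[<h1 id="事件背景"><a href="#事件背景" class="headerlink" title="事件背景"></a>Background</h1><p>On November 27, Didi's apps and mini-programs failed across multiple platforms. As of today Didi has not published the specific cause, and opinions online vary widely. Today I'll discuss the problem and speculate about what might have happened.</p><p><img src="https://i.miji.bid/2023/12/04/6708aa0a27ccf28f8383a8638b17bcca.webp" alt="6708aa0a27ccf28f8383a8638b17bcca.webp"></p><h1 id="故障表现"><a href="#故障表现" class="headerlink" title="故障表现"></a>Symptoms</h1><p>Didi was unavailable on all clients (rider app, driver app and Qingju), which showed a network error. There are no HTTP/TCP packet captures from the outage, so it can't be replayed. Rumor has it that Didi's internal network sync also broke, but I have no way to verify that.</p><h1 id="故障猜想"><a href="#故障猜想" class="headerlink" title="故障猜想"></a>Hypotheses</h1><ol><li>Network attack (rumored)</li><li>k8s upgrade failure (rumored)<blockquote><p>Everything below is my own speculation</p></blockquote></li><li>A sudden, widespread bug in middleware, databases or at the OS kernel level</li><li>Whitelist failure</li><li>A failure in a critical middle-platform component breaking the call chain</li><li>A bug in the business engine</li></ol><h1 id="故障猜想论证"><a href="#故障猜想论证" class="headerlink" title="故障猜想论证"></a>Evaluating the Hypotheses</h1><h2 id="网络攻击"><a href="#网络攻击" class="headerlink" title="网络攻击"></a>Network attack</h2><p>Network attacks are unpredictable: the source, the traffic volume and the timing are all unknown. This outage lasted a long time, which fits some characteristics of an attack, but after being attacked a company would typically do the following:</p><ol><li>Route traffic through a scrubbing service. However, when the target under attack is very large, traffic scrubbing or a WAF can't cope, and it brings significant onboarding and operating costs.</li><li>Switch the entry point directly. The main risks are whether spare IPs are available, whether the links are redundant, and whether the DNS TTL is too long, but recovery is fast. That said, this approach is basically only used when a single service is affected.</li></ol><p>With either of these measures the outage could be contained fairly quickly, so an outage lasting 12 hours clearly doesn't fit. Moreover, attacks usually don't target every service at once; knocking out a few services is already a very large attack. Add the fact that the internal network failed at the same time, and this doesn't match the profile of an attack at all.</p><h2 id="k8s-集群升级引入"><a href="#k8s-集群升级引入" class="headerlink" title="k8s 集群升级引入"></a>Caused by a k8s cluster upgrade</h2><p>Let's first look at a common multi-master k8s architecture:</p><p><img src="https://i.miji.bid/2023/12/04/0eca920639a134895f8aa5c26a4e5be0.webp" alt="0eca920639a134895f8aa5c26a4e5be0.webp"></p><p>Rumor says the outage was caused by an APIServer update that crashed the cluster's master nodes, which in turn made pod scheduling fail. On reflection, this theory has a few weak points:</p><ol><li>An update would certainly be rolled out gradually, not all at once. If a single master failed during a rolling update, you could simply cut that master out, and recovery would take no more than an hour. Even with a single-cluster deployment, recovery would be quick.</li><li>Suppose a master did fail and couldn't be recovered. Why would the blast radius be so wide? Were all services deployed in the same cluster, with no multi-cluster setup for AZ-level or region-level high availability?</li></ol><p><strong>First, suppose they used one shared cluster for all services.</strong> If a master really failed, they should have isolated the fault immediately, and in theory, once the faulty node is cut out, recreating the containers would basically restore service. Still, the following problems could have caused a cascading failure:</p><ol><li>The cluster was packed so full that the rolling update strategy couldn't run at all: the quota was exhausted and no new containers could be created.</li><li>Many containers died at once, and the failed containers didn't exit cleanly but kept their connections to the database and other middleware. The recreated containers then maxed out those components' connection limits, creating a vicious cycle.</li></ol><p>But these situations are unlikely to arise easily. This is Didi, after all; the capacity of its core backend components can't be that small.</p>
<p><strong>From another angle: suppose all services share one multi-cluster setup.</strong></p><p>This is where Karmada, developed by Huawei, comes in. Let's look at a multi-cluster architecture in its simplest form, leaving aside multi-cluster traffic management with Istio:</p><p><img src="https://i.miji.bid/2023/12/04/3b5ecb3acd8fe9f0e719790a304f8990.png" alt="3b5ecb3acd8fe9f0e719790a304f8990.png"></p><p>Karmada's advantage is that a single entry point manages both clusters, and services can be deployed to specific clusters directly through OverridePolicy/PropagationPolicy (op/pp). Karmada also has strong disaster-recovery capabilities: it can even migrate all containers to another cluster when one cluster fails, with recovery time on the order of minutes. Even without switching DNS, at least some users would still be able to access the service.</p><p><strong>Suppose they didn't use this approach.</strong> Then when one cluster blew up, the first step should be to degrade services as quickly as possible, cut out the failed cluster and redirect traffic to another cluster. Judging by the recovery time, this doesn't fit either.</p><p><strong>Note that companies today deploy multiple sets of services across multiple clusters for disaster recovery, and core services like these usually have AZ-level high availability.</strong> Had they split services across clusters at all, an outage this severe wouldn't have happened. Besides, a business this complex very likely uses an Istio-style traffic management tool at its gateway, and in a multi-cluster setup traffic management can also provide a degree of disaster recovery. So the k8s-upgrade theory falls apart.</p><h2 id="中间件及-db，系统内核层面等程序爆发式-bug"><a href="#中间件及-db，系统内核层面等程序爆发式-bug" class="headerlink" title="中间件及 db，系统内核层面等程序爆发式 bug"></a>A sudden, widespread bug in middleware, databases or at the OS kernel level</h2><p>This is very rare, but I happened to run into it once. With no warning at all, more than a thousand machines with Huawei Kunpeng CPUs went down together. The investigation found that a driver fault broke communication between the NIC and the kernel, making the NICs flap up and down continuously. The same kind of situation could extend to databases or other middleware. However, the following factors argue against this possibility:</p><ol><li>Different services use different kinds and versions of middleware and databases, so it's unlikely they would all fail at the same moment.</li><li>A sudden kernel-level bug depends on the kernel version, OS version and machine model, so again everything is unlikely to fail simultaneously.</li></ol><p>So only if every middleware, database and OS version were exactly the same, on identical hardware, could an outage this large happen.</p><h2 id="白名单异常"><a href="#白名单异常" class="headerlink" title="白名单异常"></a>Whitelist failure</h2><p>I think this one is fairly likely. Compare the Alibaba Cloud outage on November 12, when almost all of Alibaba Cloud's products were unavailable.</p><p>In large companies, firewalls are usually managed by a dedicated platform that serves as a foundational component or part of the ops middle platform. If the whitelist service has a bug or simply goes down and generates an incomplete or empty whitelist, it could easily make every service behind it return 403 (layer-7 access) or refuse connections (layer-4 access). And given that the internal network also went down, if the internal network relied on the same whitelist service, it would collapse too. So this possibility has a fairly solid theoretical basis.</p><p>I happened to run into this once before: the firewall system generated an empty whitelist, and every service behind it (including the CMDB, monitoring, logging and even internal authentication) broke. From the outside, users saw every service that went through the access layer affected.</p><h2 id="中台重要环节故障导致调用链断裂-基础依赖问题"><a href="#中台重要环节故障导致调用链断裂-基础依赖问题" class="headerlink" title="中台重要环节故障导致调用链断裂/基础依赖问题"></a>A failure in a critical middle-platform component breaking the call chain / foundational dependency problems</h2><p>This is another strong possibility. Picture a scenario: after a user logs in via auth, nearly every business action (hailing a ride, scanning a bike code, anything other than just viewing) has to check with the authorization service whether the user may perform it, or with billing whether the user has unpaid orders, and so on. A failure in any one of these services could break user actions outright. The user-facing symptoms of this outage largely fit, but this still doesn't explain why the internal network went down, because companies generally don't share one auth system between the internal and external networks. Moreover, data protection regulations make it basically impossible for office networks and services to be co-located with production services in a way that lets them affect each other.</p>
<p>The recovery time could also point to a dependency problem, such as a circular dependency where service A needs B to start, B needs C, and C needs A. If any one of them fails, all three can fail in a loop, and recovery is extremely hard. A Didi driver I talked to said the service was usable for a while in the morning and then became unavailable again. If this was the cause, they may have degraded services in the morning and removed one link in the dependency chain to bring services up temporarily, then restored the business, after which the dependency came back and everything went down again, and they moved straight to a hotfix until the problem was solved. This line of reasoning holds together.</p><p>It's also possible that a foundational dependency, such as DNS, failed directly. If the company runs its own anycast DNS and doesn't isolate the office network from the production network, it really could take down every service. Or consider Meituan's earlier Shanghai data center outage: the main services were all deployed in one data center, the other data centers hosted only peripheral services without an equivalent standby compute pool, and when a data-center-level failure hit there wasn't enough spare capacity. Given the 12-hour duration, this theory only works one way: the main data center was down for 12 hours and some internal services were also deployed there (as most companies do: one very large data center, a few large ones, and the rest small data centers or PoP nodes; at least that's how Xiaomi does it). During the outage some services were migrated to other data centers and brought up temporarily, but were overwhelmed due to insufficient resources, and services only gradually recovered once the main data center came back. I've seen this before: an air-conditioning failure in the main data center shut down machines en masse, services without AZ-level high availability all went down, and those with it managed to hang on.</p><h2 id="业务引擎-bug"><a href="#业务引擎-bug" class="headerlink" title="业务引擎 bug"></a>Business engine bug</h2><p>Some people may not think of this one, but a giant business like this has certainly been split into parts. The engine here can be thought of as the core business logic serving all clients. If the business engine blows up, the client services derived from it naturally go down with it, all at once, and most likely they recover together as well. But this still doesn't explain the internal network failure.</p><h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Conclusion</h1><p>From the analysis above, the key to pinning down the cause is knowing for sure whether Didi's internal network actually went down. If it did, the outage was almost certainly caused by infrastructure or a whitelist problem. If it didn't, there are many possible causes. The above is just my speculation and reasoning about the causes I consider plausible; there are probably many I haven't thought of. I hope my fellow ops engineers at Didi still get their year-end bonuses this year.</p>]]></content>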
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;事件背景&quot;&gt;&lt;a href=&quot;#事件背景&quot; class=&quot;headerlink&quot; title=&quot;事件背景&quot;&gt;&lt;/a&gt;事件背景&lt;/h1&gt;&lt;p&gt;11 月 27 日，滴滴出现多端 app/小程序服务异常，截至今天滴滴并未公布具体故障原因，网上说法众说纷纭，今天对这个问题</summary>
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>Dynamic Configuration Loading</title>
    <link href="http://www.huckops.xyz/2023/05/28/%E5%BC%80%E6%BA%90%E4%BB%A3%E7%A0%81/go-admin/%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E5%8A%A8%E6%80%81%E8%8E%B7%E5%8F%96/"/>
    <id>http://www.huckops.xyz/2023/05/28/%E5%BC%80%E6%BA%90%E4%BB%A3%E7%A0%81/go-admin/%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E5%8A%A8%E6%80%81%E8%8E%B7%E5%8F%96/</id>
    <published>2023-05-28T15:53:34.000Z</published>
    <updated>2026-06-06T17:04:29.931Z</updated>
    
    <content type="html"><![CDATA[<h1 id="优点"><a href="#优点" class="headerlink" title="优点"></a>Advantages</h1><p>The service can be updated dynamically: configuration changes require no downtime, and the service doesn't need to be restarted after the file changes.</p><h1 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h1><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">config.Setup(</span><br><span class="line">    file.NewSource(file.WithPath(configYml)),</span><br><span class="line">    database.Setup,</span><br><span class="line">    storage.Setup,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><h2 id="Options思想"><a href="#Options思想" class="headerlink" title="Options思想"></a>The Options Pattern</h2><p>In Python, parameters can be made optional by giving them default values, but Go has no optional or default parameters: a function's parameter list is fixed. So we introduce the Options approach (commonly called functional options) to turn a fixed parameter list into a flexible one, as in the following code:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> <span class="string">&quot;fmt&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Options <span class="keyword">struct</span> &#123;</span><br><span class="line">    Test1 <span class="type">string</span></span><br><span class="line">    Test2 <span class="type">string</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Option <span class="function"><span class="keyword">func</span><span class="params">(*Options)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WithTest1</span><span class="params">(t1 <span class="type">string</span>)</span></span> Option &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(o *Options)</span></span> &#123;</span><br><span class="line">        o.Test1 = t1</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewOptions</span><span class="params">(opts ...Option)</span></span> Options &#123;</span><br><span class="line">    options := Options&#123;&#125;</span><br><span class="line">    <span class="keyword">for</span> _, opt := <span class="keyword">range</span> opts &#123;</span><br><span class="line">        opt(&amp;options)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> options</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> 
&#123;</span><br><span class="line">    opts := NewOptions(WithTest1(<span class="string">&quot;test1&quot;</span>))</span><br><span class="line">    fmt.Println(opts)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Options effectively acts as the parameter list of the struct's constructor, a bit like passing an options interface in TypeScript. In Python, optional parameters are declared with default values and can be assigned directly at the call site, whereas in Go you need the Option type and With functions.</p><p>Each With function simply returns an Option, that is, a closure that modifies Options. When creating an object, the caller first calls the With functions to build these closures, and the constructor then iterates over them and applies each one. This gives the struct's constructor a variable set of parameters.</p><h2 id="源文件封装"><a href="#源文件封装" class="headerlink" title="源文件封装"></a>Wrapping the Source File</h2><p>This is where the program starts parsing the configuration file, using Options for the file source object:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Options <span class="keyword">struct</span> &#123;</span><br><span class="line">    <span class="comment">// Encoder</span></span><br><span class="line">    Encoder encoder.Encoder</span><br><span class="line"></span><br><span class="line">    <span class="comment">// for alternative data</span></span><br><span class="line">    Context context.Context</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">type</span> Option <span class="function"><span class="keyword">func</span><span class="params">(o *Options)</span></span></span><br></pre></td></tr></table></figure><p>Options holds an encoder and a context (context.Context). An Option hook is then defined to write data into it:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="comment">// WithPath sets the path to file</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WithPath</span><span class="params">(p <span class="type">string</span>)</span></span> source.Option &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(o *source.Options)</span></span> &#123;</span><br><span class="line">        <span class="keyword">if</span> o.Context == <span class="literal">nil</span> &#123;</span><br><span class="line">            o.Context = context.Background()</span><br><span class="line">        &#125;</span><br><span class="line">        o.Context = context.WithValue(o.Context, filePathKey&#123;&#125;, p)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The logic here stores the file path in the context. NewSource then creates a file source that holds the Options and the file path:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> file <span class="keyword">struct</span> 
&#123;</span><br><span class="line">path <span class="type">string</span></span><br><span class="line">opts source.Options</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewSource</span><span class="params">(opts ...source.Option)</span></span> source.Source &#123;</span><br><span class="line">options := source.NewOptions(opts...)</span><br><span class="line">path := DefaultPath</span><br><span class="line">    <span class="comment">// check whether the file path was successfully stored in the context</span></span><br><span class="line">f, ok := options.Context.Value(filePathKey&#123;&#125;).(<span class="type">string</span>)</span><br><span class="line"><span class="keyword">if</span> ok &#123;</span><br><span class="line">path = f</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> &amp;file&#123;opts: options, path: path&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewOptions</span><span class="params">(opts ...Option)</span></span> Options &#123;</span><br><span class="line">options := Options&#123;</span><br><span class="line">Encoder: json.NewEncoder(),</span><br><span class="line">Context: context.Background(),</span><br><span class="line">&#125;</span><br><span class="line">    <span class="comment">// apply the Options above, which puts the config path into the context</span></span><br><span class="line"><span class="keyword">for</span> _, o := <span class="keyword">range</span> opts &#123;</span><br><span class="line">o(&amp;options)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> options</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Note that the code has several fallbacks for locating the config file:</p><ol><li>The config path is read from the command line, with a default value as the first fallback. Once obtained, it is written in through the WithPath hook. Strictly speaking, this alone would be enough.</li></ol><figure class="highlight go"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">StartCmd.PersistentFlags().StringVarP(&amp;configYml, <span class="string">&quot;config&quot;</span>, <span class="string">&quot;c&quot;</span>, <span class="string">&quot;config/settings.yml&quot;</span>, <span class="string">&quot;Start server with provided configuration file&quot;</span>)</span><br></pre></td></tr></table></figure><ol start="2"><li>The code checks whether a config path was placed in the context. If not, it falls back to the default path defined in the code (DefaultPath). The main reason for this step is probably to guard against the path failing to reach the context.</li></ol><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">f, ok := options.Context.Value(filePathKey&#123;&#125;).(<span class="type">string</span>)</span><br><span class="line"><span class="keyword">if</span> ok &#123;</span><br><span class="line">    path = f</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="配置封装"><a href="#配置封装" class="headerlink" title="配置封装"></a>Wrapping the config</h2><p>We now have the encoder and the config file location, so config parsing can begin:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Options holds the configuration parameters</span></span><br><span class="line"><span class="keyword">type</span> Options <span class="keyword">struct</span> &#123;</span><br><span class="line">Loader loader.Loader</span><br><span class="line">Reader reader.Reader</span><br><span class="line">    <span class="comment">// config sources</span></span><br><span class="line">Source []source.Source</span><br><span class="line"></span><br><span class="line"><span class="comment">// for alternative data</span></span><br><span class="line">Context context.Context</span><br><span class="line"></span><br><span class="line">Entity Entity</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">Setup</span><span class="params">(s source.Source,</span></span></span><br><span class="line"><span class="params"><span class="function">fs ...<span class="keyword">func</span>()</span></span>) &#123;</span><br><span class="line">_cfg = &amp;Settings&#123;</span><br><span class="line">Settings: Config&#123;</span><br><span class="line">            ...</span><br><span class="line">&#125;,</span><br><span class="line">callbacks: fs,</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">config.DefaultConfig, err = config.NewConfig(</span><br><span class="line">config.WithSource(s),</span><br><span class="line">config.WithEntity(_cfg),</span><br><span class="line">)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Fatal(fmt.Sprintf(<span class="string">&quot;New config object fail: 
%s&quot;</span>, err.Error()))</span><br><span class="line">&#125;</span><br><span class="line">_cfg.Init()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>First, a new Options type is defined here. Think of it as the holder for the config options. The code also introduces new Option functions:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WithSource</span><span class="params">(s source.Source)</span></span> Option &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(o *Options)</span></span> &#123;</span><br><span class="line">o.Source = <span class="built_in">append</span>(o.Source, s)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// WithEntity sets the config Entity</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WithEntity</span><span class="params">(e Entity)</span></span> Option &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(o *Options)</span></span> &#123;</span><br><span class="line">o.Entity = e</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The source is first added to the config manager, which then parses it:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// NewConfig returns new config</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewConfig</span><span class="params">(opts ...Option)</span></span> (Config, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">return</span> 
newConfig(opts...)</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">newConfig</span><span class="params">(opts ...Option)</span></span> (Config, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">var</span> c config</span><br><span class="line"></span><br><span class="line">err := c.Init(opts...)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">go</span> c.run()</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> &amp;c, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *config)</span></span> Init(opts ...Option) <span class="type">error</span> &#123;</span><br><span class="line">c.opts = Options&#123;</span><br><span class="line">Reader: json.NewReader(),</span><br><span class="line">&#125;</span><br><span class="line">c.exit = <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="type">bool</span>)</span><br><span class="line"><span class="keyword">for</span> _, o := <span class="keyword">range</span> opts &#123;</span><br><span class="line">o(&amp;c.opts)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// At startup no data has been loaded yet, so the auto-update flow cannot fire; this covers that case. Auto-update runs in a goroutine and follows roughly the same flow.</span></span><br><span class="line"><span class="comment">// read file        -&gt; Loader -&gt; set snapshot -&gt; manual parse</span></span><br><span class="line"><span class="comment">// watcher goroutine |                         |-&gt; auto parse</span></span><br><span class="line"><span class="comment">// default loader uses 
the configured reader</span></span><br><span class="line"><span class="keyword">if</span> c.opts.Loader == <span class="literal">nil</span> &#123;</span><br><span class="line">c.opts.Loader = memory.NewLoader(memory.WithReader(c.opts.Reader))</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">err := c.opts.Loader.Load(c.opts.Source...)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">c.snap, err = c.opts.Loader.Snapshot()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">c.vals, err = c.opts.Reader.Values(c.snap.ChangeSet)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> c.opts.Entity != <span class="literal">nil</span> &#123;</span><br><span class="line">_ = c.vals.Scan(c.opts.Entity)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Start with this code. The call chain shows that the first thing the code does is initialize the config manager, mainly by setting up its Reader and Loader.</p><p>When the Loader is initialized, a goroutine is started to watch the file:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// initialization; normally nothing is watched here</span></span><br><span class="line"><span class="keyword">for</span> i, s := <span class="keyword">range</span> options.Source &#123;</span><br><span class="line">m.sets[i] = &amp;source.ChangeSet&#123;Source: s.String()&#125;</span><br><span class="line"><span class="keyword">go</span> m.watch(i, s)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// load the config files and start the watch tasks</span></span><br><span class="line"><span class="keyword">for</span> _, source := <span class="keyword">range</span> sources &#123;</span><br><span class="line">set, err := source.Read()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">gerrors = <span class="built_in">append</span>(gerrors,</span><br><span class="line">fmt.Sprintf(<span class="string">&quot;error loading source %s: %v&quot;</span>,</span><br><span class="line">source,</span><br><span class="line">err))</span><br><span class="line"><span class="comment">// continue processing</span></span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line">m.Lock()</span><br><span class="line">m.sources = <span class="built_in">append</span>(m.sources, source)</span><br><span 
class="line">m.sets = <span class="built_in">append</span>(m.sets, set)</span><br><span class="line">idx := <span class="built_in">len</span>(m.sets) - <span class="number">1</span></span><br><span class="line">m.Unlock()</span><br><span class="line"><span class="keyword">go</span> m.watch(idx, source)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When a file change is detected, a read is triggered. The file's metadata and values are updated, and a snapshot is created.</p><p>Then all sources are re-encoded. Each one is decoded with the codec registered for its format, falling back to JSON if none matches:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> err := m.reload(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">gerrors = <span class="built_in">append</span>(gerrors, err.Error())</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">codec, ok := j.opts.Encoding[m.Format]</span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="comment">// fallback</span></span><br><span class="line">codec = j.json</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Once everything is in JSON, the JSON decoder can unmarshal the config straight into the struct:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(j *jsonValues)</span></span> Scan(v <span 
class="keyword">interface</span>&#123;&#125;) <span class="type">error</span> &#123;</span><br><span class="line">b, err := j.sj.MarshalJSON()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> json.Unmarshal(b, v)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="动态更新"><a href="#动态更新" class="headerlink" title="动态更新"></a>Dynamic updates</h2><p>A file-watching goroutine was started earlier:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *memory)</span></span> watch(idx <span class="type">int</span>, s source.Source) &#123;</span><br><span class="line"><span class="comment">// watches a source for changes</span></span><br><span class="line">watch := <span class="function"><span class="keyword">func</span><span class="params">(idx <span class="type">int</span>, s source.Watcher)</span></span> <span class="type">error</span> &#123;</span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line"><span class="comment">// get changeset</span></span><br><span class="line">cs, err := s.Next()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">m.Lock()</span><br><span 
class="line"></span><br><span class="line"><span class="comment">// save</span></span><br><span class="line">m.sets[idx] = cs</span><br><span class="line"></span><br><span class="line"><span class="comment">// merge sets</span></span><br><span class="line">set, err := m.opts.Reader.Merge(m.sets...)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">m.Unlock()</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// set values</span></span><br><span class="line">m.vals, _ = m.opts.Reader.Values(set)</span><br><span class="line">m.snap = &amp;loader.Snapshot&#123;</span><br><span class="line">ChangeSet: set,</span><br><span class="line">Version:   genVer(),</span><br><span class="line">&#125;</span><br><span class="line">m.Unlock()</span><br><span class="line"></span><br><span class="line"><span class="comment">// send watch updates</span></span><br><span class="line">m.update()</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line"><span class="comment">// watch the source</span></span><br><span class="line">w, err := s.Watch()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">time.Sleep(time.Second)</span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">done := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="type">bool</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">// the stop watch func</span></span><br><span class="line"><span 
class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="keyword">select</span> &#123;</span><br><span class="line"><span class="keyword">case</span> &lt;-done:</span><br><span class="line"><span class="keyword">case</span> &lt;-m.exit:</span><br><span class="line">&#125;</span><br><span class="line">_ = w.Stop()</span><br><span class="line">&#125;()</span><br><span class="line"></span><br><span class="line"><span class="comment">// block watch</span></span><br><span class="line"><span class="keyword">if</span> err := watch(idx, w); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// do something better</span></span><br><span class="line">time.Sleep(time.Second)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// close done chan</span></span><br><span class="line"><span class="built_in">close</span>(done)</span><br><span class="line"></span><br><span class="line"><span class="comment">// if the config is closed exit</span></span><br><span class="line"><span class="keyword">select</span> &#123;</span><br><span class="line"><span class="keyword">case</span> &lt;-m.exit:</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line"><span class="keyword">default</span>:</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When the file changes, Next returns the newly read file data. That data is stored in sets, and Merge runs again to re-encode it. Note that this re-encoding covers all sources, not just the file that changed. Once sets is updated, Values parses the config data.</p><p>Looking back at the code above, two goroutines are started in total. Besides the watcher, newConfig starts one more:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="keyword">func</span> <span class="title">newConfig</span><span class="params">(opts ...Option)</span></span> (Config, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">go</span> c.run()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This goroutine handles config parsing and callbacks. So there are now two goroutines, and they communicate through shared memory:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">watcher -&gt; source -&gt; config.run</span><br></pre></td></tr></table></figure><p>Watching the struct relies on the snapshot stored in config earlier. The code repeatedly compares the current config with the snapshot. If they differ, it re-parses the config and reruns the init callbacks, then updates the snapshot and waits for the next change.</p><p>The design is complex but clever. This walkthrough is fairly rough. For the detailed workflow, see the go-micro/config source code. The code here is only lightly modified from it, and the basic flow is the same.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;优点&quot;&gt;&lt;a href=&quot;#优点&quot; class=&quot;headerlink&quot; title=&quot;优点&quot;&gt;&lt;/a&gt;优点&lt;/h1&gt;&lt;p&gt;服务可以做到动态更新，配置更新时不需要停服，且文件更新后不需要重启服务。&lt;/p&gt;
&lt;h1 id=&quot;源码分析&quot;&gt;&lt;a href=&quot;#源码分析&quot;</summary>
      
    
    
    
    <category term="开源软件" scheme="http://www.huckops.xyz/categories/%E5%BC%80%E6%BA%90%E8%BD%AF%E4%BB%B6/"/>
    
    
    <category term="go-admin" scheme="http://www.huckops.xyz/tags/go-admin/"/>
    
  </entry>
  
  <entry>
    <title>Release Engineering</title>
    <link href="http://www.huckops.xyz/2022/03/28/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E5%8F%91%E5%B8%83%E5%B7%A5%E7%A8%8B/"/>
    <id>http://www.huckops.xyz/2022/03/28/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E5%8F%91%E5%B8%83%E5%B7%A5%E7%A8%8B/</id>
    <published>2022-03-28T22:17:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
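    <content type="html"><![CDATA[<h1 id="发布"><a href="#发布" class="headerlink" title="发布"></a>Releases</h1><p>In large production environments and complex business systems, releasing an application or a change is not as simple as running a binary or a script on a server. Doing it that way makes services hard to manage, and their stability cannot be guaranteed either. This is why release engineering exists as a discipline.</p><h1 id="发布工程要素"><a href="#发布工程要素" class="headerlink" title="发布工程要素"></a>Elements of release engineering</h1><h2 id="自给自足的发布"><a href="#自给自足的发布" class="headerlink" title="自给自足的发布"></a>Self-service releases</h2><p>Every development team needs a corresponding release engineer. The release engineer sets the release strategy, writes the release tooling, and defines the release process.</p><h2 id="追求速度"><a href="#追求速度" class="headerlink" title="追求速度"></a>High velocity</h2><p>When releases are frequent, each version contains fewer changes. This reduces the cost of testing and debugging after a new version goes live. A project can be built frequently, and one of those builds is then chosen for release, provided it is a build that passed all tests.</p><h2 id="密闭性"><a href="#密闭性" class="headerlink" title="密闭性"></a>Hermetic builds</h2><p>Build output must never differ because of differences between build environments. All dependencies must be self-contained within the build.</p><p>When a build shows a bug in production, take the source revision it was built from, apply the new change, and produce a new build (that is, push a fix commit to the repo). The build toolchain should be included in the repo. Since that is often impossible, the practical approach is to record environment details such as the compiler version in the repo. This ensures every build runs in the same environment.</p><h1 id="发布流程"><a href="#发布流程" class="headerlink" title="发布流程"></a>Release process</h1><p>Releasing a change generally follows this process:</p><ol><li>Push the code to the repo, open an MR, and get it reviewed</li><li>Define the actions in the workflow</li><li>Create a new release</li><li>Approve the integration, i.e. merge the MR</li><li>Deploy the release</li><li>Update the project configuration files</li></ol><h1 id="持续集成"><a href="#持续集成" class="headerlink" title="持续集成"></a>Continuous integration</h1><h2 id="构建"><a href="#构建" class="headerlink" title="构建"></a>Building</h2><p>Build targets should be defined in configuration files, such as the common configure file, which can specify build parameters and properties.</p><h2 id="分支"><a href="#分支" class="headerlink" title="分支"></a>Branching</h2><p>Common branching practice differs somewhat from Google's. Changes are usually pushed to a development branch. After review and audit, they are merged into a release branch and released from there, which makes the changes in each merge easy to see.</p><h2 id="测试"><a href="#测试" class="headerlink" title="测试"></a>Testing</h2><p>Unit tests run as soon as changes are committed to the development branch. The branch's build and test results determine which version is released. Usually this is the last version that built successfully and passed its tests.</p><p>The full unit test suite must also run during the release. This catches untested problems caused by release-branch code that the development branch does not contain. Again, the last test run must pass.</p><h2 id="打包"><a href="#打包" class="headerlink" title="打包"></a>Packaging</h2><p>Package the build output and label it with the build hash and build tag. Add a checksum or signature to verify package integrity. An MD5 checksum catches accidental corruption, but detecting tampering requires a cryptographic signature.</p><h2 id="部署"><a href="#部署" class="headerlink" title="部署"></a>Deployment</h2><p>A release is a logical unit of work made up of one or more jobs (think of them as microservices). An automated release system can deploy automatically using the build tag from build time, which pins the release version. For large projects, rollout usually ramps up exponentially, as in a canary (gray) release.</p><h1 id="配置管理"><a href="#配置管理" class="headerlink" 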
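title="配置管理"></a>Configuration management</h1><p>Project configuration must be consistent, so config files can be kept in the project repo and code-reviewed like any other change. Packaging the config files with the build makes deployment simpler, but it has drawbacks, such as less flexible configuration. Alternatively, config files can be kept in external storage and read at each release, which makes configuration more flexible.</p>]]></content>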
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;发布&quot;&gt;&lt;a href=&quot;#发布&quot; class=&quot;headerlink&quot; title=&quot;发布&quot;&gt;&lt;/a&gt;发布&lt;/h1&gt;&lt;p&gt;对于大型生产环境和复杂的业务系统来说，发布应用或者变更并不是简单的在服务器上运行二进制包或者脚本就可以了，这样很不便于服务的管理，且服务的稳</summary>
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>Automated Operations</title>
    <link href="http://www.huckops.xyz/2022/03/27/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E8%87%AA%E5%8A%A8%E5%8C%96%E8%BF%90%E7%BB%B4/"/>
    <id>http://www.huckops.xyz/2022/03/27/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E8%87%AA%E5%8A%A8%E5%8C%96%E8%BF%90%E7%BB%B4/</id>
    <published>2022-03-27T22:02:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
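    <content type="html"><![CDATA[<h1 id="自动化运维的价值"><a href="#自动化运维的价值" class="headerlink" title="自动化运维的价值"></a>The value of automated operations</h1><p>Large compute clusters and online (or test) environments run thousands of servers. SREs working by hand, making changes and running operations manually, cannot come close to supporting day-to-day work. This is where automated operations proves its value:</p><h2 id="一致性"><a href="#一致性" class="headerlink" title="一致性"></a>Consistency</h2><p>Large clusters can hardly be uniform at the hardware level, but deliberate effort can make them consistent at the OS, software, and service levels. Consistency goes a long way toward standardizing the production environment and reduces the labor and other costs caused by environmental differences. For example, production machines can be managed in bulk as groups. All machines in a group use the same environment configuration, and that configuration is managed centrally in a repo. This raises the unit of management from a single machine to a group. It also introduces a hidden risk: if a configuration change to a group fails, the blast radius grows from one machine to the whole batch.</p><p>Consistency is harder to achieve at the hardware level, mainly because equipment comes from different vendors and each vendor's devices are managed differently. Automation should therefore either support as many hardware platforms as possible or specify a CFI. Choose one of the two; the latter probably fits current practice better. Designs should also favor vendor-neutral management interfaces such as the IPMI protocol.</p><h2 id="平台化"><a href="#平台化" class="headerlink" title="平台化"></a>Platformization</h2><p>Changes and operations are expressed as actions on objects in a platform. An object can be a single machine, a cluster, or a custom group. The platform is the upper layer of automated operations. It automates operations on the underlying managed resources, which cuts both the time and the number of operations required. It also absorbs much of the routine, highly repetitive work.</p><p>A mature environment is rarely supported by a single platform, so platforms are usually designed with ample room for horizontal extension (for example, APIs or generic templates).</p><h2 id="修复速度快，行动敏捷，时间和人力成本有效节省"><a href="#修复速度快，行动敏捷，时间和人力成本有效节省" class="headerlink" title="修复速度快，行动敏捷，时间和人力成本有效节省"></a>Faster repairs, greater agility, and savings in time and labor</h2><p>At its core, automated operations solves in code many problems that would otherwise need manual work. Code always executes faster than people, so some tasks become markedly more efficient. Automation can also repair common failures automatically, or roll back quickly after a bad change or even a system crash. Both lower MTTR.</p><h1 id="自动化运维软件的定义"><a href="#自动化运维软件的定义" class="headerlink" title="自动化运维软件的定义"></a>Defining automated operations software</h1><p>Automated operations software can be defined as meta-software. It typically drives other software to automate an entire workflow, and its core idea is to turn people's mechanical actions into software. That is why automation software often uses the shell to call other scripts or commands, such as the familiar useradd.</p><blockquote><p>When I worked at Xiaomi, my main job was developing hardware-layer automation software. We bought servers from many vendors: Dell servers use idracadm as their out-of-band management tool, while Inspur uses its own in-house tools. The programs therefore could not be very generic. In practice the code mostly called the vendors' management tools from the command line, because some data collection and operations simply cannot be done through IPMI. Reviewing this work against the three values above shows clearly what automation means in real production.</p></blockquote><h1 id="自动化分级"><a href="#自动化分级" class="headerlink" title="自动化分级"></a>Levels of automation</h1><p>Automation is usually graded on a five-level scale from L1 to L5. L1 means no automation: every task requires human intervention. L5 is the highest level: no task needs human intervention, and the operations support system is fully automated.</p><blockquote><p>Artificial intelligence has three stages: automation, intelligence, and wisdom. So far AI has only achieved automation and is now moving toward intelligence. 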
–沃兹基硕德</p></blockquote><p>运维也是一样的。运维目前整个行业都在逐渐构建自动化运维体系，且国内没有一个规范的建设标准，都是以google为标杆建设自己的自动化运维系统。近年有公司的自动化已经趋于完善，开始实现AIOps（智能运维），但是始终离L5的运维支持系统还有很大差距。</p><h1 id="集群运维自动化"><a href="#集群运维自动化" class="headerlink" title="集群运维自动化"></a>集群运维自动化</h1><p>集群运维自动化，通常是要依赖于强大的基础设施的，比如：</p><ol><li>稳定安全的数据中心以及网络中心（排除非人为和不可抗外力）</li><li>强大的数据托管平台（比如CMDB）及其他的支持数据和资源</li><li>稳定良好的基础服务环境（比如无污染的DNS）</li><li>等等……</li></ol><blockquote><p>林子大了什么鸟都有</p></blockquote><p>难免自动化在集群中应用时会出现部分机器故障或者自动化失败，这个在上线以后是很致命的。所以可以引入一个Prodtest的概念，在上线之前进行测试，验证：</p><ol><li>服务器依赖是不是都可用，配置是不是想要的</li><li>一致性怎么样</li><li>是否能确定“例外”都是合理的</li></ol><p>比如对于一个基础设施服务的Prodtest就可以以如下方式进行：</p><ol><li>机器基本环境检查，装包检查，配置一致性检查</li><li>双线检查 a. 服务线：服务是否可用，稳定 b. 监控线：监控是否健康，稳定，数据是否准确（我确实见过监控数据和人工查数据不一致的情况的） </li></ol><p>注意，以上测试都是链式的，任何一个节点出问题都要停下来解决完成再向后推进。</p>]]></content>
    
    
      
      
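    <summary type="html">&lt;h1 id=&quot;自动化运维的价值&quot;&gt;&lt;a href=&quot;#自动化运维的价值&quot; class=&quot;headerlink&quot; title=&quot;自动化运维的价值&quot;&gt;&lt;/a&gt;The Value of Automated Operations&lt;/h1&gt;&lt;p&gt;Large compute clusters and production (or test) environments run thousands, even tens of thousands, of servers. Manual changes and maintenance by SREs alone</summary>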
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>Password Management Center</title>
    <link href="http://www.huckops.xyz/2021/11/20/cmdb/%E5%AF%86%E7%A0%81%E7%AE%A1%E7%90%86%E4%B8%AD%E5%BF%83(%E6%9C%8D%E5%8A%A1%E7%AB%AF)/"/>
    <id>http://www.huckops.xyz/2021/11/20/cmdb/%E5%AF%86%E7%A0%81%E7%AE%A1%E7%90%86%E4%B8%AD%E5%BF%83(%E6%9C%8D%E5%8A%A1%E7%AB%AF)/</id>
    <published>2021-11-20T20:57:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
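    <content type="html"><![CDATA[<p>Passwords on production servers must never be static. For security, passwords should be dynamic, mainly to prevent malicious logins and to support server permission management.</p><p>The main job of the password management center is to change server passwords dynamically, store the generated passwords, and show them to the designated business lines and system administrators.</p><h1 id="JWT认证-Tag认证"><a href="#JWT认证-Tag认证" class="headerlink" title="JWT认证/Tag认证"></a>JWT Authentication / Tag Authentication</h1><h2 id="JWT认证"><a href="#JWT认证" class="headerlink" title="JWT认证"></a>JWT Authentication</h2><p>JWT handles login authentication. When users log in, they send their account name and password to the API; the server verifies them and, if verification succeeds, returns a signed JWT token to the front end as the user's authentication credential.</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line"><span class="attr">&quot;code&quot;</span><span class="punctuation">:</span> <span class="number">100</span><span class="punctuation">,</span></span><br><span class="line"><span class="attr">&quot;tag&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line"><span class="string">&quot;owt.sa&quot;</span><span class="punctuation">,</span></span><br><span class="line"><span class="string">&quot;owt.mi_img&quot;</span></span><br><span class="line"><span class="punctuation">]</span><span class="punctuation">,</span></span><br><span class="line"><span class="attr">&quot;token&quot;</span><span class="punctuation">:</span> <span class="string">&quot;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6ImFkbWluIiwiZXhwIjoxNjM3NDIzNjU2LCJpc3MiOiJwYXNzd29yZCJ9.2jEUJyowdtzdz13zkgS6q_iS54ES0wk37uEKj0NN3-Y&quot;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>Every login returns the token together with the user's tags. The front end can use the returned tags to render certain pages selectively, which provides front-end permission control.</p><h2 id="Tag认证"><a href="#Tag认证" class="headerlink" title="Tag认证"></a>Tag Authentication</h2><p>Tag authentication is different from JWT authentication: it governs user permissions. A company has many departments, each responsible for its own business line, and permissions between business lines should be independent; an SRE in one business line must not be able to query server information from another. Tags solve this: hosts are bound to business-line tags and users are bound to business-line tag groups, so when users query passwords or server information they see only the hosts of their own business line. This eliminates arbitrary password browsing.</p><p>Every company has an SA group whose main job is managing and repairing servers. This group should hold the highest privilege and be able to query passwords for every business line, so it gets a dedicated business-line tag, <code>owt.sa</code>. A user with this tag can query server information for all business lines.</p><p>The <code>owt.sa</code> business-line tag marks the system's top-level administrators, so it also carries the right to add tags and user accounts. Business-line SREs cannot add users to their own business line; they can only ask SA to do so. (I am considering giving each business line a Super SRE as its maintainer, who could add business-line SRE users.)</p><h1 id="主机管理"><a href="#主机管理" class="headerlink" title="主机管理"></a>Host Management</h1><h2 id="主机发现-添加"><a href="#主机发现-添加" class="headerlink" title="主机发现/添加"></a>Host Discovery / Addition</h2><h3 id="主机发现（待后期完善）"><a href="#主机发现（待后期完善）" class="headerlink" title="主机发现（待后期完善）"></a>Host Discovery (to be completed later)</h3><p>A large production cluster can hold a huge number of servers, tens of thousands or even more than a hundred thousand, which makes maintaining the host inventory by hand very tedious. PasswordCenterClient can therefore work with PasswordCenterServer to provide automatic discovery: once a host finishes OS installation, it sends its information to the server; the server checks its table and, if the host is not present, tells the host to report its device information and adds it to the database automatically. For a small IDC, however, this is entirely unnecessary, because automatic discovery can cause problems such as dirty data or failed discovery, which quietly drives up costs for a small machine room. That is why manual entry is offered as well.</p><h3 id="手动录入"><a href="#手动录入" class="headerlink" title="手动录入"></a>Manual Entry</h3><p>Manual entry means entering server information into the system by hand. Every time PasswordCenterClient reports a password, it first checks whether its host has been registered in the system; if not, it does not update the password. This prevents an automatic password change from losing the password and locking everyone out of the host (a cardinal sin for production servers). For small production environments this method costs the least: it adds only a little work, produces almost no dirty data, and creates no trouble for operations.</p><h2 id="主机密码管理"><a href="#主机密码管理" class="headerlink" title="主机密码管理"></a>Host Password Management</h2><h3 id="主机密码推送"><a href="#主机密码推送" class="headerlink" title="主机密码推送"></a>Host Password Push</h3><p>Server passwords are top secret: a leaked password can lead to the server being compromised, so passwords must be encrypted in transit. This project uses AES encryption, and the server also stores passwords as AES-encrypted strings, so that a maliciously captured packet does not leak the password.</p><h3 id="主机查询和主机密码查询"><a href="#主机查询和主机密码查询" class="headerlink" title="主机查询和主机密码查询"></a>Host Queries and Host Password Queries</h3><p>Users can query hosts through the front end, but a query returns only the servers under the user's own tags (except for <code>owt.sa</code>). Passwords sent to the front end are also AES-encrypted, which stops passwords from leaking in transit at the source.</p><p><strong>Other features are being planned; I will keep updating this analysis of the project.</strong></p>]]></content>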
    
    
      
      
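    <summary type="html">&lt;p&gt;Passwords on production servers must never be static. For security, passwords should be dynamic, mainly to prevent malicious logins and to support server permission management.&lt;/p&gt;
&lt;p&gt;The main job of the password management center is to change server passwords dynamically, store the generated passwords, and show them to the designated business lines and system administrators.&lt;/p&gt;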
&lt;h1 id=</summary>
      
    
    
    
    <category term="运维平台" scheme="http://www.huckops.xyz/categories/%E8%BF%90%E7%BB%B4%E5%B9%B3%E5%8F%B0/"/>
    
    
    <category term="CMDB平台" scheme="http://www.huckops.xyz/tags/CMDB%E5%B9%B3%E5%8F%B0/"/>
    
  </entry>
  
  <entry>
    <title>Production Risk Management</title>
    <link href="http://www.huckops.xyz/2021/10/05/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86/"/>
    <id>http://www.huckops.xyz/2021/10/05/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86/</id>
    <published>2021-10-05T16:26:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
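    <content type="html"><![CDATA[<h1 id="e-x-效应"><a href="#e-x-效应" class="headerlink" title="$e^x$效应"></a>The $e^x$ Effect</h1><p>In production, no service can be 100% available. Most companies try to get as close to 100% as possible, but as availability approaches 100%, cost rises exponentially: each additional level of reliability costs more than the last, much like the function $e^x$.</p><h1 id="风险管理"><a href="#风险管理" class="headerlink" title="风险管理"></a>Risk Management</h1><p>The cost of avoiding risk falls into two main parts:</p><ol><li><p>Compute resource cost: server cost plus operations cost</p></li><li><p>Opportunity cost: borne by the organization when it spends resources on building risk-reduction systems and monitoring instead of other work.</p></li></ol><h1 id="风险度量"><a href="#风险度量" class="headerlink" title="风险度量"></a>Measuring Risk</h1><p>Risk is measured mainly through service availability, which is usually defined in one of two ways:</p><h2 id="基于时间"><a href="#基于时间" class="headerlink" title="基于时间"></a>Time-Based</h2><p>Time-based availability measures the fraction of total service time during which the service is available: $\frac {T_e} {T_a}$, where $T_e$ is the time the service was available and $T_a$ is the total time.</p><h2 id="基于请求"><a href="#基于请求" class="headerlink" title="基于请求"></a>Request-Based</h2><p>Request-based availability measures the fraction of all requests that succeed: $\frac {R_e} {R_a}$, where $R_e$ is the number of successful requests and $R_a$ is the total number of requests.</p><h1 id="服务风险容忍"><a href="#服务风险容忍" class="headerlink" title="服务风险容忍"></a>Service Risk Tolerance</h1><h2 id="可用性目标及可用性容忍"><a href="#可用性目标及可用性容忍" class="headerlink" title="可用性目标及可用性容忍"></a>Availability Targets and Availability Tolerance</h2><p>The availability target and availability tolerance usually define a service's minimum acceptable availability. Below it, users may start to view the service negatively, and revenue and the assessments of upstream clients may suffer as well. This part should therefore focus on the consumer.</p><h2 id="故障类型"><a href="#故障类型" class="headerlink" title="故障类型"></a>Types of Failures</h2><p>The most common failure is an outage, but other kinds of failure must not be overlooked. Suppose user data is returned mixed up, so that user A can see user B's personal or even private information without authentication: such a service fails one of the basic properties of a service, security. From the user's point of view, a security failure weighs more than an outage, because it can badly damage users' trust in the service. So when security is at stake, other quality goals should give way.</p><h2 id="成本"><a href="#成本" class="headerlink" title="成本"></a>Cost</h2><p>As mentioned earlier, availability and cost are not linearly related; cost grows exponentially. For an ordinary service, 99.95% availability may already be enough. Raising it to 99.99% might cost 1 million, and pushing it further to 99.999% might cost 10 million or more. Weighing cost against availability leads to one conclusion: within what users will tolerate, balance cost against reliability.</p><h2 id="其他风险容忍"><a href="#其他风险容忍" class="headerlink" title="其他风险容忍"></a>Other Risk Tolerances</h2><p>Other risks are mostly about latency, such as slow page loads. For an ordinary web page, long response times lower users' opinion of the page. For an ad service, by contrast, long response times may not change users' opinion much, but they can change advertisers' opinion of the service significantly.</p><h1 id="基础设施风险容忍"><a href="#基础设施风险容忍" class="headerlink" title="基础设施风险容忍"></a>Infrastructure Risk Tolerance</h1><h2 id="可用性目标及可用性容忍-1"><a href="#可用性目标及可用性容忍-1" class="headerlink" title="可用性目标及可用性容忍"></a>Availability Targets and Availability Tolerance</h2><p>Defined the same way as for service risk tolerance.</p><h2 id="故障类型-1"><a href="#故障类型-1" class="headerlink" title="故障类型"></a>Types of Failures</h2><p>Here, what counts as a failure can differ from one consumer to another. For low-latency request serving, users want the queue to be empty, meaning requests are processed fast enough. For high-throughput (IO-heavy) workloads, by contrast, users want the queue never to be empty, which minimizes the inefficiency caused by idle CPUs. So for infrastructure services, whether something counts as a failure depends entirely on the kind of infrastructure.</p><h2 id="成本容忍"><a href="#成本容忍" class="headerlink" title="成本容忍"></a>Cost Tolerance</h2><p>Weighing the low-latency and high-throughput kinds of infrastructure leads to a compromise: allocate resources according to compute demand and build two compute clusters, each running the jobs that match its performance profile. This improves both resource allocation and the quality of computation.</p>]]></content>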
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;e-x-效应&quot;&gt;&lt;a href=&quot;#e-x-效应&quot; class=&quot;headerlink&quot; title=&quot;$e^x$效应&quot;&gt;&lt;/a&gt;The $e^x$ Effect&lt;/h1&gt;&lt;p&gt;In production, no service can be 100% available. Most companies try to get as close to 100% as possible, but as availability approaches 100%</summary>
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>The Production Environment</title>
    <link href="http://www.huckops.xyz/2021/10/04/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E7%94%9F%E4%BA%A7%E5%8A%A8%E5%8A%9B%E7%8E%AF%E5%A2%83/"/>
    <id>http://www.huckops.xyz/2021/10/04/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/%E7%94%9F%E4%BA%A7%E5%8A%A8%E5%8A%9B%E7%8E%AF%E5%A2%83/</id>
    <published>2021-10-04T23:58:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
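    <content type="html"><![CDATA[<h1 id="硬件层"><a href="#硬件层" class="headerlink" title="硬件层"></a>Hardware Layer</h1><p>Today, servers are mostly either on-premises physical servers or cloud servers. On-premises servers are usually x86, while Huawei Kunpeng is developing ARM-based servers. Cloud servers are usually virtual machines, most of them based on KVM (QEMU).</p><p>Few companies use a single cloud today; most run a hybrid cloud, placing classified or compute-heavy workloads on physical servers in their own machine rooms and public-facing services on public cloud servers.</p><p>The advantages of a hybrid cloud and multi-site, multi-center machine rooms are:</p><ol><li><p>Multi-site disaster recovery: if one compute center fails, the others provide redundancy</p></li><li><p>Service isolation: classified services run on premises, keeping their data secure</p></li></ol><h1 id="软件层"><a href="#软件层" class="headerlink" title="软件层"></a>Software Layer</h1><h2 id="管理"><a href="#管理" class="headerlink" title="管理"></a>Management</h2><h3 id="计算资源管理"><a href="#计算资源管理" class="headerlink" title="计算资源管理"></a>Compute Resource Management</h3><p>Build a CMDB to manage production and test compute resources centrally, tracking server and network resource information in fine detail down to individual compute instances. This substantially improves resource management efficiency.</p><p>For batch operations tasks, use automation tools such as Ansible.</p><h3 id="计算任务管理"><a href="#计算任务管理" class="headerlink" title="计算任务管理"></a>Compute Job Management</h3><p>Turn compute jobs into batches and submit them with batch deployment tools, deploying many compute instances across many machines to manage compute jobs more efficiently.</p><p>Systems such as Google's Borg, or Kubernetes, handle compute job management. After jobs are submitted in batch, each compute entity is addressed by a hostname-like name. In Borg this is called BNS, the Borg Name Service, which resolves Borg tasks to specific IPs.</p><h3 id="存储服务"><a href="#存储服务" class="headerlink" title="存储服务"></a>Storage Services</h3><p>Compute nodes should have local disks, but these disks are not used for storage; they hold only the temporary files produced by compute jobs. Persistent files can be written to a distributed file system such as HDFS.</p><h1 id="网络层"><a href="#网络层" class="headerlink" title="网络层"></a>Network Layer</h1><p>Production environments commonly use SDN for networking. SDN provides load balancing and allocates bandwidth sensibly among compute resources, preventing any one compute job from monopolizing the network and improving the cost efficiency of the network.</p><p>Services with heavy file storage or many large file transfers usually rely on a CDN. Google's production environment, however, uses GSLB (Global Software Load Balancer). In this architecture, compute nodes (centers) are deployed in many locations worldwide, and when a user sends a request, smart DNS returns the node that is nearest to the user and least loaded, which greatly improves the user's access experience.</p><h1 id="其他层"><a href="#其他层" class="headerlink" title="其他层"></a>Other Layers</h1><h2 id="监控服务"><a href="#监控服务" class="headerlink" title="监控服务"></a>Monitoring</h2><p>Build a fast, effective monitoring system and collect as much useful data as possible (more is not always better). Large Internet companies with the technical and financial means can go further and analyze the collected monitoring data to diagnose failures and head them off in advance.</p><h1 id="研发管理"><a href="#研发管理" class="headerlink" title="研发管理"></a>Development Management</h1><p>Software development needs two essentials: teamwork and documentation.</p><h2 id="团队合作"><a href="#团队合作" class="headerlink" title="团队合作"></a>Teamwork</h2><p>Use version control such as SVN or Git on a shared code hosting platform wherever possible; this keeps code and requirements aligned across the team. Code must go through review: developers must never push directly to the main branch. Every change must be submitted as a PR and audited by a third person, with particular attention to what the PR changes.</p><h2 id="文档输出"><a href="#文档输出" class="headerlink" title="文档输出"></a>Documentation</h2><p>Software development produces many documents, such as coding standards and API references. Every team member must be able to view and edit them at any time; a wiki works well for hosting them.</p>]]></content>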
    
    
      
      
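    <summary type="html">&lt;h1 id=&quot;硬件层&quot;&gt;&lt;a href=&quot;#硬件层&quot; class=&quot;headerlink&quot; title=&quot;硬件层&quot;&gt;&lt;/a&gt;Hardware Layer&lt;/h1&gt;&lt;p&gt;Today, servers are mostly either on-premises physical servers or cloud servers. On-premises servers are usually x86, while Huawei Kunpeng is developing ARM-based servers. Cloud</summary>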
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>SRE Architecture</title>
    <link href="http://www.huckops.xyz/2021/10/03/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/SRE%E6%9E%B6%E6%9E%84/"/>
    <id>http://www.huckops.xyz/2021/10/03/SRE%E6%96%B9%E6%B3%95%E8%AE%BA/SRE%E6%9E%B6%E6%9E%84/</id>
    <published>2021-10-03T12:36:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
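    <content type="html"><![CDATA[<h1 id="SA和Dev-Ops的挑战"><a href="#SA和Dev-Ops的挑战" class="headerlink" title="SA和Dev/Ops的挑战"></a>Challenges of the SA and Dev/Ops Split</h1><p>Companies typically hire many SA engineers (system administrators), who operate the components of the production environment and step in manually whenever an operation requires human intervention.</p><p>In China, companies commonly separate SRE from SA: SREs manage and maintain the business side, while SA engineers maintain the company's base infrastructure, including servers and the base environment. Dev and Ops are also separate departments, which brings two kinds of cost.</p><ol><li><p>Direct cost: as business lines expand, Dev and Ops headcount grows together.</p></li><li><p>Indirect cost: communication and synchronization problems can leave departments out of sync, which in turn leads to misaligned team goals.</p></li></ol><p>Dev and Ops disagree most often around new releases and changes, and sometimes over release speed.</p><ul><li><p>Releases: Dev wants to release at any time, while Ops wants production to stay unchanged and to avoid unnecessary changes and releases (new releases, changes, and facility cutovers can all crash the system)</p></li><li><p>Release speed: Dev wants releases to take zero seconds and the quality team wants zero service jitter, but that is generally impossible, even with smooth, gradual changes. A service may be briefly unavailable during a change or release, so the pace of releases and changes may fall short of what Dev and the quality team demand.</p></li></ul><h1 id="大型互联网企业SRE解决方案"><a href="#大型互联网企业SRE解决方案" class="headerlink" title="大型互联网企业SRE解决方案"></a>The SRE Approach at Large Internet Companies</h1><p>Large Internet companies build DevOps teams. Most engineers on these teams come from software development; the rest may not be software engineers by title, but all of them have a software engineer's basic skills. They can solve SRE problems with a software engineering mindset and can easily stay in sync with the requirements and information of other development teams.</p><p>In a typical Internet architecture, most SRE work is mechanical, and this is exactly where software engineers pay off: they build automation tools for the mechanical operations and raise operational efficiency.</p><p>Large companies in China are gradually eliminating Ops engineer positions and replacing them with SRE or DevOps engineers. Xiaomi, for example, has essentially stopped hiring operations engineers and now hires mainly DevOps engineers with strong software development skills. Most of their work is building the company's own operations platforms to solve the operations problems it currently faces.</p><h1 id="SRE方法论"><a href="#SRE方法论" class="headerlink" title="SRE方法论"></a>SRE Methodology</h1><p>The basic systems SRE builds cover:</p><ul><li>Performance, latency, availability, and efficiency optimization</li><li>Change management</li><li>Monitoring</li><li>Emergency response</li><li>Capacity planning</li></ul><h2 id="研发工作跟进"><a href="#研发工作跟进" class="headerlink" title="研发工作跟进"></a>Supporting Development Work</h2><p>SRE teams should set up an on-call rotation in which the on-call engineer handles business-side requests one-on-one during working hours. No single engineer should stay on call too long, because long shifts reduce effectiveness.</p><p>After finishing the day's work, the on-call engineer should write up the failures and emergencies handled that day as input to the department's or company's wiki.</p><h2 id="保证SLO下提升迭代速度"><a href="#保证SLO下提升迭代速度" class="headerlink" title="保证SLO下提升迭代速度"></a>Increasing Iteration Speed While Meeting the SLO</h2><p>Fast iteration and stable service usually pull in opposite directions, so an important part of SRE work is to increase iteration speed while still meeting the SLO.</p><p>No service is 100% available. Even if a service truly reached 100%, the path across the Internet would not: a request to a 99.99% available service still crosses many layers of switches and routers, none of which is 100% available, and end-to-end availability is the product of the availabilities along the path. For example, a 99.99% service reached over a network path that is 99.9% available gives users at most about 99.89%.</p><p><a href="https://imgtu.com/i/4qTrt0"><img src="https://z3.ax1x.com/2021/10/03/4qTrt0.png" alt="4qTrt0.png"></a></p><p>So when setting the SLO, SRE needs to consider the following three questions:</p><ol><li>What is the minimum availability, below which users' experience suffers?</li><li>Can alternative solutions raise availability?</li><li>Would lower availability change how users use the service?</li></ol><h2 id="监控"><a href="#监控" class="headerlink" title="监控"></a>Monitoring</h2><p>Monitoring is essential to any operations system. Monitoring is not a piece of software but a methodology: it detects software and hardware failures and raises alerts. A monitoring system usually produces output through three channels:</p><ol><li>Monitoring and alerting</li><li>A ticketing system</li><li>A logging system</li></ol><h2 id="应急情况"><a href="#应急情况" class="headerlink" title="应急情况"></a>Emergency Response</h2><p>Reliability is governed by MTTF (mean time to failure) and MTTR (mean time to recovery), and emergency response works mainly on MTTR. For emergencies caused by changes in particular, set a strict recovery deadline, that is, the maximum time within which service must be restored after a failure, and keep actual recovery times below that MTTR target.</p><h2 id="变更管理"><a href="#变更管理" class="headerlink" title="变更管理"></a>Change Management</h2><p>Every routine release and change must come with backups and disaster recovery: changes must be traceable, and a faulty change must be reversible at any time. Large companies usually build a configuration distribution platform that pushes changed configuration out centrally and backs up the original configuration. Every change must map to an owner, and service must be restored within the recovery deadline; if the deadline passes, roll back.</p><h2 id="需求和容量"><a href="#需求和容量" class="headerlink" title="需求和容量"></a>Demand and Capacity</h2><p>Before a release goes live, evaluate server capacity. The evaluation must be grounded in existing operations and business data and must include some redundancy. Servers should also be load-tested regularly so that resources can be mapped to capacity.</p>]]></content>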
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;SA和Dev-Ops的挑战&quot;&gt;&lt;a href=&quot;#SA和Dev-Ops的挑战&quot; class=&quot;headerlink&quot; title=&quot;SA和Dev/Ops的挑战&quot;&gt;&lt;/a&gt;Challenges of the SA and Dev/Ops Split&lt;/h1&gt;&lt;p&gt;Companies typically hire many SA engineers (system administrators), who</summary>
      
    
    
    
    <category term="架构" scheme="http://www.huckops.xyz/categories/%E6%9E%B6%E6%9E%84/"/>
    
    
    <category term="SRE" scheme="http://www.huckops.xyz/tags/SRE/"/>
    
  </entry>
  
  <entry>
    <title>K8S Code Generators</title>
    <link href="http://www.huckops.xyz/2021/09/12/container/K8S%E4%BB%A3%E7%A0%81%E7%94%9F%E6%88%90%E5%99%A8/"/>
    <id>http://www.huckops.xyz/2021/09/12/container/K8S%E4%BB%A3%E7%A0%81%E7%94%9F%E6%88%90%E5%99%A8/</id>
    <published>2021-09-12T10:42:01.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
    <content type="html"><![CDATA[<figure class="highlight makefile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta"><span class="keyword">.PHONY</span>: generated_files</span></span><br><span class="line"><span class="section">generated_files: gen_prerelease_lifecycle gen_deepcopy gen_defaulter gen_conversion gen_openapi</span></span><br></pre></td></tr></table></figure><p>The <code>generated_files</code> target defines the code generators; the ones listed above are Kubernetes' default code generators.</p><h1 id="Tags"><a href="#Tags" class="headerlink" title="Tags"></a>Tags</h1><p>Code generators use tags to identify which code to generate and how to generate it.</p><h2 id="全局tags"><a href="#全局tags" class="headerlink" title="全局tags"></a>Global tags</h2><p>Global tags are defined in <code>doc.go</code> and generate code for every type in the package.</p><p><strong>Placeholder; to be filled in later.</strong></p>]]></content>
    
    
      
      
    <summary type="html">&lt;figure class=&quot;highlight makefile&quot;&gt;&lt;table&gt;&lt;tr&gt;&lt;td class=&quot;gutter&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;1&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;2&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/</summary>
      
    
    
    
    <category term="云计算" scheme="http://www.huckops.xyz/categories/%E4%BA%91%E8%AE%A1%E7%AE%97/"/>
    
    
    <category term="容器技术" scheme="http://www.huckops.xyz/tags/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
  </entry>
  
  <entry>
    <title>Building K8S from Source</title>
    <link href="http://www.huckops.xyz/2021/09/12/container/K8S%E6%BA%90%E7%A0%81%E6%9E%84%E5%BB%BA/"/>
    <id>http://www.huckops.xyz/2021/09/12/container/K8S%E6%BA%90%E7%A0%81%E6%9E%84%E5%BB%BA/</id>
    <published>2021-09-12T09:36:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
    <content type="html"><![CDATA[<p>K8S is written in Go, so its Go code must be compiled into binaries before it can run. K8S can be built in three ways: a local build, a build in a container environment, and a build with Bazel.</p><h1 id="本地构建"><a href="#本地构建" class="headerlink" title="本地构建"></a>Local Builds</h1><p>As with C++ projects, a large project cannot be built by running <code>go build</code> by hand for each package, so the project is built with Makefiles.</p><p>The Kubernetes project has two Makefiles:</p><ul><li><p>Makefile: describes the project's build order, build rules, and outputs.</p></li><li><p>Makefile.generated_files: describes the code generation logic.</p></li></ul><h2 id="Makefile文件解析"><a href="#Makefile文件解析" class="headerlink" title="Makefile文件解析"></a>Makefile Walkthrough</h2><figure class="highlight makefile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta"><span class="keyword">.PHONY</span>: all</span></span><br><span class="line"><span class="keyword">ifeq</span> (<span class="variable">$(PRINT_HELP)</span>,y)</span><br><span class="line"><span class="section">all:</span></span><br><span class="line">@echo <span class="string">&quot;$$ALL_HELP_INFO&quot;</span></span><br><span class="line"><span class="keyword">else</span></span><br><span class="line"><span class="section">all: generated_files</span></span><br><span class="line">hack/make-rules/build.sh <span class="variable">$(WHAT)</span></span><br><span class="line"><span class="keyword">endif</span></span><br></pre></td></tr></table></figure><p>This is the first step of <code>make all</code>. It checks whether help output was requested; if not, it runs the <code>generated_files</code> target to generate code and then calls <code>hack/make-rules/build.sh</code> to build. The argument <code>$(WHAT)</code> is the list of components to build.</p><p>Tracing into <code>hack/make-rules/build.sh</code>, the first function it calls is <code>kube::golang::build_binaries &quot;$@&quot;</code>, which builds the binaries; its argument is the <code>$(WHAT)</code> described above.</p><p>Call chain:</p><p>kube::golang::build_binaries -&gt; kube::golang::host_platform (gets the platform type) -&gt; kube::golang::get_physmem (checks whether available memory meets the threshold) -&gt; kube::golang::build_binaries_for_platform (builds binaries for the given platform) -&gt; kube::golang::build_some_binaries (builds the binaries) -&gt; go install</p><h1 id="容器构建"><a href="#容器构建" class="headerlink" title="容器构建"></a>Container Builds</h1><p>The build rules for <code>make release</code>:</p><figure class="highlight makefile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta"><span class="keyword">.PHONY</span>: release release-in-a-container</span></span><br><span class="line"><span class="keyword">ifeq</span> (<span class="variable">$(PRINT_HELP)</span>,y)</span><br><span class="line">release release-in-a-container:</span><br><span class="line">@echo <span class="string">&quot;$$RELEASE_HELP_INFO&quot;</span></span><br><span class="line"><span class="keyword">else</span></span><br><span class="line">release release-in-a-container: KUBE_BUILD_CONFORMANCE = y</span><br><span class="line"><span class="section">release:</span></span><br><span class="line">build/release.sh</span><br><span class="line"><span class="section">release-in-a-container:</span></span><br><span class="line">build/release-in-a-container.sh</span><br><span class="line"><span class="keyword">endif</span></span><br></pre></td></tr></table></figure><p>The build rules for <code>make quick-release</code>:</p><figure class="highlight makefile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta"><span class="keyword">.PHONY</span>: release-skip-tests quick-release</span></span><br><span class="line"><span class="keyword">ifeq</span> (<span class="variable">$(PRINT_HELP)</span>,y)</span><br><span class="line">release-skip-tests quick-release:</span><br><span class="line">@echo <span class="string">&quot;$$RELEASE_SKIP_TESTS_HELP_INFO&quot;</span></span><br><span class="line"><span class="keyword">else</span></span><br><span class="line">release-skip-tests quick-release: KUBE_RELEASE_RUN_TESTS = n</span><br><span class="line">release-skip-tests quick-release: KUBE_FASTBUILD = true</span><br><span class="line">release-skip-tests quick-release:</span><br><span class="line">build/release.sh</span><br><span class="line"><span class="keyword">endif</span></span><br></pre></td></tr></table></figure><p>Both targets run the same script, <code>build/release.sh</code>. The difference is the variables they set: <code>release</code> sets <code>KUBE_BUILD_CONFORMANCE</code>, while <code>quick-release</code> sets <code>KUBE_RELEASE_RUN_TESTS = n</code>, which skips the test step shown below, and <code>KUBE_FASTBUILD = true</code>.</p><p>Tracing into <code>build/release.sh</code>:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">kube::build::verify_prereqs   # verify build prerequisites</span><br><span class="line">kube::build::build_image    # build the build image</span><br><span class="line">kube::build::run_build_command make cross   # build (cross-compile)</span><br><span class="line"></span><br><span class="line">if [[ $KUBE_RELEASE_RUN_TESTS =~ ^[yY]$ ]]; then  # run tests only if enabled</span><br><span class="line">  kube::build::run_build_command make test</span><br><span class="line">  kube::build::run_build_command make test-integration</span><br><span class="line">fi</span><br><span class="line"></span><br><span class="line">kube::build::copy_output    # copy build output out of the container</span><br><span class="line"></span><br><span class="line">kube::release::package_tarballs # package release tarballs</span><br></pre></td></tr></table></figure><p>The build uses three containers:</p><p>build: the container that does the build work</p><p>data: the data storage container</p><p>sync: the synchronization container</p>]]></content>
    
    
      
      
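    <summary type="html">&lt;p&gt;K8S is written in Go, so its Go code must be compiled into binaries before it can run. K8S can be built in three ways: a local build, a build in a container environment, and a build with Bazel.&lt;/p&gt;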
&lt;h1 id=&quot;本地构建&quot;&gt;&lt;a href=&quot;#本地构建&quot; class=&quot;headerlink&quot; tit</summary>
      
    
    
    
    <category term="云计算" scheme="http://www.huckops.xyz/categories/%E4%BA%91%E8%AE%A1%E7%AE%97/"/>
    
    
    <category term="容器技术" scheme="http://www.huckops.xyz/tags/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
  </entry>
  
  <entry>
    <title>LNMP Architecture on K8s (HPA)</title>
    <link href="http://www.huckops.xyz/2021/09/04/container/%E5%9F%BA%E4%BA%8EK8S%E7%9A%84LNMP%E6%9E%B6%E6%9E%84%EF%BC%88HPA%EF%BC%89/"/>
    <id>http://www.huckops.xyz/2021/09/04/container/%E5%9F%BA%E4%BA%8EK8S%E7%9A%84LNMP%E6%9E%B6%E6%9E%84%EF%BC%88HPA%EF%BC%89/</id>
    <published>2021-09-04T23:58:34.000Z</published>
    <updated>2026-06-06T17:04:29.927Z</updated>
    
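    <content type="html"><![CDATA[<h1 id="LNMP"><a href="#LNMP" class="headerlink" title="LNMP"></a>LNMP</h1><p>LNMP (Linux, Nginx, MySQL, PHP) is a common full-stack web architecture that many websites are built on. In a traditional deployment, reliability is limited: a failure in any one component, whether Nginx, PHP, or MySQL, can bring down the production website. And when a PHP site faces high concurrency, a single node in the traditional architecture may lack the capacity to handle the load. Migrating LNMP to Kubernetes addresses these problems.</p><h1 id="架构设计"><a href="#架构设计" class="headerlink" title="架构设计"></a>Architecture Design</h1><h2 id="MySQL"><a href="#MySQL" class="headerlink" title="MySQL"></a>MySQL</h2><p>MySQL is a stateful service. After an abnormal exit, MySQL can in some cases fail to start again, so databases are generally not run in containers; they usually run on a dedicated physical node.</p><p>This example migrates the whole LNMP stack to Kubernetes, so the database also runs in a Kubernetes container. The database holds critical business data, which must never be lost under any circumstances. Files inside a Kubernetes pod are ephemeral: when a pod fails and is recreated, the files in the original pod are not kept. In theory a hostPath volume could provide local storage, but Kubernetes clusters usually span multiple hosts and the database replica can be rescheduled to another node, where the original data would not exist. So this example uses an external-storage StorageClass to provision a PVC that is mounted dynamically, letting MySQL read its data files even after it is rescheduled to another node.</p><p>As described above, a database failure can still affect the business. But setting up MySQL primary/replica replication and read/write splitting on Kubernetes is fairly involved, so the Deployment should not run multiple replicas. This example runs a single replica for testing.</p><h2 id="PHP和Nginx"><a href="#PHP和Nginx" class="headerlink" title="PHP和Nginx"></a>PHP and Nginx</h2><p>In the LNMP architecture, Nginx is the front end and reverse proxy. When Nginx receives a request for a static file, it responds directly; when the request is for PHP, Nginx forwards it to the PHP backend service for processing.</p><p>It follows that LNMP request handling has two stages, and not every request passes through both, so the two stages can see different loads and request volumes. On Kubernetes, Nginx and PHP can therefore be split apart, each running multiple replicas scheduled dynamically, so that each stage scales with its own load.</p><h1 id="配置文件"><a href="#配置文件" class="headerlink" title="配置文件"></a>Configuration Files</h1><h2 id="PVC声明"><a href="#PVC声明" class="headerlink" title="PVC声明"></a>PVC Declarations</h2><p>A CephFS StorageClass has already been created for this example, so it is not covered here; the PVCs below request storage directly from that existing StorageClass.</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 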
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">PersistentVolumeClaim</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span>  <span class="string">mysql</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">resources:</span></span><br><span class="line">    <span class="attr">requests:</span></span><br><span class="line">      <span class="attr">storage:</span> <span class="string">5Gi</span></span><br><span class="line">  <span class="attr">accessModes:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">ReadWriteMany</span></span><br><span class="line">  <span class="attr">storageClassName:</span> <span class="string">ceph</span></span><br><span class="line"></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">PersistentVolumeClaim</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span 
class="attr">name:</span>  <span class="string">dz</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">resources:</span></span><br><span class="line">    <span class="attr">requests:</span></span><br><span class="line">      <span class="attr">storage:</span> <span class="string">5Gi</span></span><br><span class="line">  <span class="attr">accessModes:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="string">ReadWriteMany</span></span><br><span class="line">  <span class="attr">storageClassName:</span> <span class="string">ceph</span></span><br></pre></td></tr></table></figure><p>The mysql PVC is MySQL's storage volume and holds the files the database generates, including its data files. Besides the reasons given earlier for not running databases in containers, another major one is network storage: both network latency and storage-network outages can hurt the database's stability. The access mode here could instead be ReadWriteOnce, which allows the volume to be mounted read-write by a single node only; that is enough for a single MySQL replica.</p><p>The dz PVC holds the PHP application. PHP applications usually do not separate front end and back end, and much of a PHP application consists of static assets, so this PVC must be ReadWriteMany: it is mounted into multiple containers so that Nginx and PHP share the same files.</p><h2 id="MySQL资源声明"><a href="#MySQL资源声明" class="headerlink" title="MySQL资源声明"></a>MySQL Resource Declarations</h2><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">mysql</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span class="line">      <span class="attr">app:</span> <span class="string">mysql</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">    <span class="attr">metadata:</span></span><br><span class="line">      <span class="attr">labels:</span></span><br><span class="line">        <span 
class="attr">app:</span> <span class="string">mysql</span></span><br><span class="line">    <span class="attr">spec:</span></span><br><span class="line">      <span class="attr">containers:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">mysql</span></span><br><span class="line">        <span class="attr">image:</span> <span class="string">mysql:5.6</span></span><br><span class="line">        <span class="attr">resources:</span></span><br><span class="line">          <span class="attr">limits:</span></span><br><span class="line">            <span class="attr">memory:</span> <span class="string">&quot;512Mi&quot;</span></span><br><span class="line">            <span class="attr">cpu:</span> <span class="string">&quot;500m&quot;</span></span><br><span class="line">        <span class="attr">ports:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">containerPort:</span> <span class="number">3306</span></span><br><span class="line">        <span class="attr">livenessProbe:</span></span><br><span class="line">          <span class="attr">tcpSocket:</span></span><br><span class="line">            <span class="attr">port:</span> <span class="number">3306</span></span><br><span class="line">          <span class="attr">initialDelaySeconds:</span> <span class="number">90</span></span><br><span class="line">          <span class="attr">periodSeconds:</span> <span class="number">10</span></span><br><span class="line">        <span class="attr">env:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">MYSQL_ROOT_PASSWORD</span></span><br><span class="line">            <span class="attr">value:</span> <span class="string">sjh080815</span></span><br><span class="line">        <span class="attr">volumeMounts:</span></span><br><span class="line">          <span class="bullet">-</span> 
<span class="attr">mountPath:</span> <span class="string">/var/lib/mysql</span></span><br><span class="line">            <span class="attr">name:</span> <span class="string">mysql-pvc</span></span><br><span class="line">      <span class="attr">volumes:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">mysql-pvc</span></span><br><span class="line">          <span class="attr">persistentVolumeClaim:</span></span><br><span class="line">              <span class="attr">claimName:</span> <span class="string">mysql</span></span><br><span class="line">              </span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Service</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">mysql</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">app:</span> <span class="string">mysql</span></span><br><span class="line">  <span class="attr">ports:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">mysql</span></span><br><span class="line">    <span class="attr">port:</span> <span class="number">3306</span></span><br><span class="line">    <span class="attr">targetPort:</span> <span class="number">3306</span> </span><br></pre></td></tr></table></figure>
<p>Create a single-replica MySQL Deployment, mount the PVC created earlier into the pod at <code>/var/lib/mysql</code>, and expose container port 3306. Database initialization is fairly slow, so the liveness probe uses a long initialDelaySeconds: the kubelet waits 90 seconds before the first TCP check on port 3306. After that it probes every 10 seconds. If the check fails failureThreshold times in a row (3 by default), the kubelet kills and restarts the MySQL container; the pod itself is not deleted.</p><p>Create a Service whose selector adds the MySQL pod to its endpoints and maps port 3306 on the Service's cluster IP to port 3306 in the container. Other pods in the dz namespace can then reach the database at <code>mysql:3306</code>.</p><h2 id="PHP资源声明"><a href="#PHP资源声明" class="headerlink" title="PHP资源声明"></a>PHP Resource Manifests</h2><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span 
class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">php</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span class="line">      <span class="attr">app:</span> <span class="string">php</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">    <span class="attr">metadata:</span></span><br><span class="line">      <span class="attr">labels:</span></span><br><span class="line">        <span class="attr">app:</span> <span class="string">php</span></span><br><span class="line"></span><br><span class="line">    <span class="attr">spec:</span></span><br><span class="line">      <span class="attr">containers:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span 
class="string">php</span></span><br><span class="line">        <span class="attr">image:</span> <span class="string">registry.cn-hangzhou.aliyuncs.com/sjh080815/php-mysqli</span></span><br><span class="line">        <span class="attr">resources:</span></span><br><span class="line">          <span class="attr">limits:</span></span><br><span class="line">            <span class="attr">memory:</span> <span class="string">&quot;128Mi&quot;</span></span><br><span class="line">            <span class="attr">cpu:</span> <span class="string">&quot;500m&quot;</span></span><br><span class="line">        <span class="attr">ports:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">containerPort:</span> <span class="number">9000</span></span><br><span class="line">        <span class="attr">livenessProbe:</span></span><br><span class="line">          <span class="attr">tcpSocket:</span></span><br><span class="line">            <span class="attr">port:</span> <span class="number">9000</span></span><br><span class="line">          <span class="attr">initialDelaySeconds:</span> <span class="number">20</span></span><br><span class="line">          <span class="attr">periodSeconds:</span> <span class="number">10</span></span><br><span class="line">        <span class="attr">volumeMounts:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">php-static</span></span><br><span class="line">            <span class="attr">mountPath:</span> <span class="string">/usr/share/nginx/html</span></span><br><span class="line">      <span class="attr">volumes:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">php-static</span></span><br><span class="line">          <span class="attr">persistentVolumeClaim:</span></span><br><span class="line">              <span class="attr">claimName:</span> <span 
class="string">dz</span></span><br><span class="line"></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">autoscaling/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">HorizontalPodAutoscaler</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">php</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">maxReplicas:</span> <span class="number">5</span></span><br><span class="line">  <span class="attr">minReplicas:</span> <span class="number">2</span></span><br><span class="line">  <span class="attr">scaleTargetRef:</span></span><br><span class="line">    <span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line">    <span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line">    <span class="attr">name:</span> <span class="string">php</span></span><br><span class="line">  <span class="attr">targetCPUUtilizationPercentage:</span> <span class="number">85</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Service</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">php</span></span><br><span 
class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">app:</span> <span class="string">php</span></span><br><span class="line">  <span class="attr">ports:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">php</span></span><br><span class="line">    <span class="attr">port:</span> <span class="number">9000</span></span><br><span class="line">    <span class="attr">targetPort:</span> <span class="number">9000</span></span><br></pre></td></tr></table></figure>
<p>Create the php Deployment, mount the dz PVC into the container at <code>/usr/share/nginx/html</code> (the same path nginx uses), and add a liveness probe on port 9000 so that the kubelet restarts the container if PHP-FPM stops accepting connections.</p><p>Create an HPA to handle high-concurrency traffic. It keeps between 2 and 5 php replicas and scales on CPU: when the average CPU utilization of the php pods exceeds 85%, the HPA controller (not the scheduler) raises the Deployment's replica count so that new pods are created, and lowers it again when the load drops. Utilization is measured against the container's CPU request; because only limits are set here, the request defaults to the 500m limit. The HPA also needs a metrics source such as metrics-server to read pod CPU usage.</p><p>Create a Service to expose the php pods inside the cluster. Creating the Service also creates its Endpoints: whenever a new pod with the <code>app: php</code> label starts, for example one added by the HPA, the selector picks it up and adds it to the endpoints once it is ready, so nginx reaches it through <code>php:9000</code> without any configuration change.</p><h2 id="Nginx资源声明"><a href="#Nginx资源声明" class="headerlink" title="Nginx资源声明"></a>Nginx Resource Manifests</h2><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span 
class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ConfigMap</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx-config</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">data:</span></span><br><span class="line">  <span class="attr">default.conf:</span> <span class="string">|-</span></span><br><span class="line"><span class="string">    server &#123;</span></span><br><span class="line"><span class="string">        listen 80;</span></span><br><span class="line"><span class="string">        server_name localhost;</span></span><br><span class="line"><span class="string">        root /usr/share/nginx/html;</span></span><br><span class="line"><span class="string">        index index.html index.php;</span></span><br><span class="line"><span class="string"></span> </span><br><span class="line">        <span class="string">location</span> <span class="string">~</span> <span class="string">\.php$</span> &#123;</span><br><span class="line">            <span class="string">root</span> <span class="string">/usr/share/nginx/html;</span></span><br><span class="line">            <span class="string">fastcgi_pass</span> <span class="string">php:9000;</span></span><br><span class="line">            <span class="string">fastcgi_param</span> <span 
class="string">SCRIPT_FILENAME</span> <span class="string">/usr/share/nginx/html$fastcgi_script_name;</span></span><br><span class="line">            <span class="string">include</span> <span class="string">fastcgi_params;</span></span><br><span class="line">            <span class="string">fastcgi_connect_timeout</span> <span class="string">60s;</span></span><br><span class="line">            <span class="string">fastcgi_read_timeout</span> <span class="string">300s;</span></span><br><span class="line">            <span class="string">fastcgi_send_timeout</span> <span class="string">300s;</span></span><br><span class="line">        &#125;</span><br><span class="line">    <span class="string">&#125;</span></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span class="line">      <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">    <span class="attr">metadata:</span></span><br><span class="line">      <span class="attr">labels:</span></span><br><span class="line">        <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line">    <span class="attr">spec:</span></span><br><span class="line">    
  <span class="attr">containers:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">        <span class="attr">image:</span> <span class="string">nginx</span></span><br><span class="line">        <span class="attr">resources:</span></span><br><span class="line">          <span class="attr">limits:</span></span><br><span class="line">            <span class="attr">memory:</span> <span class="string">&quot;128Mi&quot;</span></span><br><span class="line">            <span class="attr">cpu:</span> <span class="string">&quot;500m&quot;</span></span><br><span class="line">        <span class="attr">ports:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">containerPort:</span> <span class="number">80</span></span><br><span class="line">        <span class="attr">livenessProbe:</span></span><br><span class="line">            <span class="attr">httpGet:</span></span><br><span class="line">                <span class="attr">path:</span> <span class="string">/</span></span><br><span class="line">                <span class="attr">port:</span> <span class="number">80</span></span><br><span class="line">            <span class="attr">initialDelaySeconds:</span> <span class="number">20</span></span><br><span class="line">            <span class="attr">periodSeconds:</span> <span class="number">10</span></span><br><span class="line">        <span class="attr">volumeMounts:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">php-static</span></span><br><span class="line">            <span class="attr">mountPath:</span> <span class="string">/usr/share/nginx/html</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">config</span></span><br><span class="line">            <span 
class="attr">mountPath:</span> <span class="string">/etc/nginx/conf.d/default.conf</span></span><br><span class="line">            <span class="attr">subPath:</span> <span class="string">default.conf</span></span><br><span class="line">      <span class="attr">volumes:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">php-static</span></span><br><span class="line">          <span class="attr">persistentVolumeClaim:</span></span><br><span class="line">              <span class="attr">claimName:</span> <span class="string">dz</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">config</span></span><br><span class="line">          <span class="attr">configMap:</span></span><br><span class="line">              <span class="attr">name:</span> <span class="string">nginx-config</span></span><br><span class="line"></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">autoscaling/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">HorizontalPodAutoscaler</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">maxReplicas:</span> <span class="number">5</span></span><br><span class="line">  <span class="attr">minReplicas:</span> <span class="number">2</span></span><br><span class="line">  <span class="attr">scaleTargetRef:</span></span><br><span class="line">    <span 
class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line">    <span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line">    <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">targetCPUUtilizationPercentage:</span> <span class="number">85</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Service</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">ports:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">    <span class="attr">port:</span> <span class="number">80</span></span><br><span class="line">    <span class="attr">targetPort:</span> <span class="number">80</span> </span><br></pre></td></tr></table></figure>
<p>Create a ConfigMap holding the nginx configuration. In the nginx Deployment, mount its <code>default.conf</code> key as a single file at <code>/etc/nginx/conf.d/default.conf</code> via <code>subPath</code>. Files mounted with <code>subPath</code> are not updated when the ConfigMap changes, so the nginx pods must be restarted after the configuration is edited, for example with <code>kubectl rollout restart deployment/nginx -n dz</code>. The livenessProbe sends an HTTP GET for <code>/</code> on port 80. If the site root has no index.html, <code>/</code> is served by index.php, so this probe also fails while PHP is unavailable.</p><p>The HPA and Service for nginx work the same way as those for php above.</p><h2 id="Ingress配置"><a href="#Ingress配置" class="headerlink" title="Ingress配置"></a>Ingress Configuration</h2><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">networking.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Ingress</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">dz</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">dz</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">      <span class="attr">name:</span> <span class="string">dz</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">ingressClassName:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">rules:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">host:</span> <span class="string">www.test.com</span></span><br><span class="line">    <span class="attr">http:</span></span><br><span class="line">      <span class="attr">paths:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">pathType:</span> <span class="string">ImplementationSpecific</span></span><br><span class="line">        <span class="attr">path:</span> <span class="string">/</span></span><br><span class="line">        <span class="attr">backend:</span></span><br><span class="line">          <span class="attr">service:</span></span><br><span class="line">            <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">            <span class="attr">port:</span></span><br><span class="line">              <span class="attr">number:</span> <span class="number">80</span></span><br></pre></td></tr></table></figure>
<p>Create an Ingress that routes requests for <code>www.test.com</code> to the nginx Service; the domain must resolve to the Ingress controller's address. The Ingress controller load-balances requests across the nginx pods, so the failure of a single pod has minimal impact on the service. The manifest uses the <code>networking.k8s.io/v1</code> Ingress API, where the backend is written as <code>service.name</code> and <code>service.port.number</code>. The older <code>extensions/v1beta1</code> Ingress, with its <code>serviceName</code> and <code>servicePort</code> fields, was removed in Kubernetes 1.22.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;LNMP&quot;&gt;&lt;a href=&quot;#LNMP&quot; class=&quot;headerlink&quot; title=&quot;LNMP&quot;&gt;&lt;/a&gt;LNMP&lt;/h1&gt;&lt;p&gt;LNMP is a common full-stack web architecture, and many websites are built on it today. In a typical traditional deployment, server reliability is not very high, and Ngin</summary>
      
    
    
    
    <category term="云计算" scheme="http://www.huckops.xyz/categories/%E4%BA%91%E8%AE%A1%E7%AE%97/"/>
    
    
    <category term="容器技术" scheme="http://www.huckops.xyz/tags/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
  </entry>
  
</feed>
