<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title>VLLM - Tag - Simi Studio</title>
        <link>/tags/vllm/</link>
        <description>VLLM - Tag - Simi Studio</description>
        <generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>simi@simi.studio (Simi)</managingEditor>
            <webMaster>simi@simi.studio (Simi)</webMaster><lastBuildDate>Sat, 15 Jun 2024 10:00:00 &#43;0800</lastBuildDate><atom:link href="/tags/vllm/" rel="self" type="application/rss+xml" /><item>
    <title>vLLM in Practice: Efficiently Serving Open-Source LLMs on a GPU Server</title>
    <link>/posts/vllm-local-llm-serving/</link>
    <pubDate>Sat, 15 Jun 2024 10:00:00 &#43;0800</pubDate>
    <author>simi@simi.studio (Simi)</author>
    <guid>/posts/vllm-local-llm-serving/</guid>
    <description><![CDATA[vLLM is currently the most popular open-source LLM inference engine. Its PagedAttention technique delivers up to 24x higher throughput on the same hardware. This post explains what vLLM is, how to deploy it, and the practical caveats you should know.]]></description>
</item>
</channel>
</rss>
