publish_date : 25.08.27

Serverless 2.0: Streaming

#Serverless #Streaming #Cost #Optimization #Function #Lambda #Architecture #2.0


Serverless Evolves: How Function Streaming Is Redefining Serverless

Serverless used to feel almost magical for developers:
"Don’t worry about servers—just deploy your code, we handle the infrastructure."

AWS Lambda, Azure Functions, and Google Cloud Functions let startups run services like tech giants.

But over time, limitations surfaced:

- Cold start delays

- State management headaches

- Execution time limits

Enter 2025 and Serverless 2.0, featuring Function Streaming.

What Is Function Streaming?

Traditional serverless functions were designed for short, fast executions: image resizing, simple API responses.

But modern workloads, such as LLM calls, real-time data processing, and streaming responses, require functions that run longer and emit partial results as they execute.

Function Streaming enables:

- Continuous result streaming while the function runs

- Real-time UX (e.g., ChatGPT streaming answers, live video conversion)
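In code, the difference is simply "return once" versus "yield as you go". A minimal sketch in plain Python, with no cloud SDK; the function name and token list are illustrative:

```python
import time
from typing import Iterator

def stream_summary(tokens: list[str], delay: float = 0.0) -> Iterator[str]:
    """Simulate a streaming function: emit partial results while still
    running, instead of one final payload after everything finishes."""
    for token in tokens:
        if delay:
            time.sleep(delay)  # stand-in for per-token model latency
        yield token  # each chunk can be flushed to the client immediately

# A 1.0-style function would instead return "".join(tokens) only after
# the loop completes, leaving the client staring at a blank screen.
chunks = list(stream_summary(["Server", "less ", "2.0"]))
```

The caller decides how to forward each chunk (HTTP chunked transfer, SSE, WebSocket); the function itself only needs to yield early.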

Why Now?

- AI API era: OpenAI, Anthropic, and Google APIs all support streaming natively

- Real-time pipelines: IoT, gaming, trading, and monitoring need instant reactions

- User experience: streaming prevents “frozen” responses, delivering interactive experiences

Function Streaming Support (2025)

| Platform | Streaming Support | Features | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| AWS Lambda | Limited (via Kinesis/Bedrock) | Event-driven | Tight AWS ecosystem | Not optimized for streaming UX |
| Azure Functions | Yes (Durable Functions) | Long-running, stateful | Strong state management | Steeper learning curve |
| Google Cloud Run | Full (HTTP streaming) | Serverless containers, Vertex AI | Optimized for AI/data streaming | Higher runtime cost |
| Vercel Edge Functions | Basic | Next.js integration | Excellent developer experience | Limited for large enterprise workloads |
| Render / Fly.io | Improving | Simple streaming, startup-friendly | Cheap & fast deployment | Global scale limited |
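Cloud Run's "full HTTP streaming" support boils down to this: the app yields response chunks and the platform forwards each one as it arrives instead of buffering the whole body. A minimal WSGI-style sketch, assuming any HTTP/1.1 front end that supports chunked transfer; the handler name and messages are illustrative:

```python
from typing import Callable, Iterator

def streaming_app(environ: dict, start_response: Callable) -> Iterator[bytes]:
    """WSGI app that streams its body. Because the body is a generator,
    each yielded chunk can be flushed to the client immediately rather
    than waiting for the final result."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    for part in ("step 1 done\n", "step 2 done\n", "final result\n"):
        yield part.encode("utf-8")  # sent incrementally, not buffered
```

Any WSGI server (gunicorn, the stdlib `wsgiref`) can host this; the same generator pattern carries over to ASGI and to framework-level streaming responses.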

Serverless 1.0 vs 2.0

| Feature | Serverless 1.0 | Serverless 2.0 (Function Streaming) |
| --- | --- | --- |
| Execution Time | Short (seconds–minutes) | Long-running, continuous stream |
| Response | Single result | Streaming results continuously |
| Use Cases | Image resize, API, event triggers | LLM calls, interactive apps, real-time pipelines |
| State | External storage required | Built-in state support (Durable Functions) |
| Billing | Per invocation | Per stream execution / runtime |
| UX | Batch-focused, delayed responses | Real-time interaction, improved user experience |

Cost Considerations

1. Serverless 1.0

   - Invocation-based billing: execution time × memory

   - Pros: no cost when idle

   - Cons: sudden traffic spikes → unpredictable costs

2. Serverless 2.0

   - Streaming-based billing: the function consumes resources for its entire duration

   - Real-time workflows → longer single executions → higher cost

   - LLM calls consume GPU/memory → more expensive

3. Cold Start vs. Cost

   - Scaling to zero saves idle cost but adds cold-start latency

   - Solution: provisioned concurrency / always-on instances

     - AWS Lambda: Provisioned Concurrency

     - GCP Cloud Run: always-on minimum instances

     - Azure Functions: Always Ready instances
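A back-of-envelope calculation makes the billing shift concrete. The sketch below uses AWS Lambda's published x86 rate of $0.0000166667 per GB-second purely as an illustrative number; the workload figures are made up:

```python
RATE_PER_GB_S = 0.0000166667  # illustrative per-GB-second rate

def invocation_cost(calls: int, avg_seconds: float, gb_memory: float) -> float:
    """Serverless 1.0: billed per invocation as duration x memory."""
    return calls * avg_seconds * gb_memory * RATE_PER_GB_S

def streaming_cost(streams: int, stream_seconds: float, gb_memory: float) -> float:
    """Serverless 2.0: one stream holds resources for its whole duration,
    so a long-lived stream costs as much as many short invocations."""
    return streams * stream_seconds * gb_memory * RATE_PER_GB_S

# Same request volume, but each execution runs 30 s instead of 0.2 s:
short = invocation_cost(calls=10_000, avg_seconds=0.2, gb_memory=0.5)
long_ = streaming_cost(streams=10_000, stream_seconds=30, gb_memory=0.5)
# The ratio (~150x) is driven entirely by execution duration, which is
# why the optimization strategies below focus on shortening streams.
```

The actual rate varies by platform and instance class; the point is that duration, not call count, dominates the bill once functions stream.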

Cost Optimization Strategies

- Architecture level

  - Hybrid separation: streaming → Cloud Run/Edge; non-streaming → Lambda

  - 3-stage pipeline: Fast Gate → Worker → Post Sink

  - Edge-first: handle the initial response at the edge and minimize main-model calls

- Platform settings

  - Minimize pre-warming: keep instances always-on only during peak hours

  - Tune concurrency and timeouts

  - Co-locate functions, databases, and models in the same region

- Application level

  - Limit token counts / response length for LLM calls

  - Cache repeated queries

  - Tiered quality: Free → lightweight model, Pro → advanced model

  - Protocol choice: SSE for one-way streaming, WebSocket for bidirectional

Key Takeaways

Serverless 2.0 isn’t just a feature upgrade; it’s serverless reborn for AI and real-time applications:

  • Serverless 1.0: short, fast execution, zero cost when idle

  • Serverless 2.0: real-time streaming, interactive UX, but careful cost management required

Future apps will increasingly run on Serverless 2.0, making cold start reduction and execution cost optimization essential considerations for developers and planners alike.