What are Zendesk API rate limits by plan?

Zendesk rate limits vary by plan tier. On Enterprise plans, the Incremental Export API ceiling is approximately 10 requests per minute per endpoint, with a separate rolling bucket for the standard REST API (typically 400–700 requests per minute depending on plan). Suite Growth and Professional plans have lower ceilings. Zendesk's developer documentation lists per-plan limits, but add-ons, negotiated limits, and endpoint-specific limits mean the ceiling your account actually gets is the one the X-Rate-Limit header reports on your first authenticated request. The headers X-Rate-Limit, X-Rate-Limit-Remaining, and Retry-After are your authoritative source regardless of plan.

How do I check my remaining Zendesk API quota?

Every Zendesk API response includes X-Rate-Limit (the size of your current bucket) and X-Rate-Limit-Remaining (how many requests remain before exhaustion). Read these headers on every response. When X-Rate-Limit-Remaining drops below your safety threshold (typically 10-25 depending on your parallelism), insert a computed delay before the next batch of requests. Do not wait for a 429 to tell you that you are over — by that point, your workers have already been blocked.
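A minimal sketch of that computed delay, assuming the bucket refills over a rolling 60-second window and that your workers send requests in fixed-size batches. `delay_before_next_batch`, `threshold`, and `batch_size` are illustrative names, not part of any Zendesk SDK.

```python
def delay_before_next_batch(remaining, threshold, batch_size, window_seconds=60.0):
    """Seconds to wait before sending the next batch of parallel requests.

    Assumes a rolling `window_seconds` refill window; `threshold` and
    `batch_size` are tuning knobs, not Zendesk-defined values.
    """
    if remaining >= threshold:
        return 0.0  # plenty of headroom: send immediately
    # How many more full batches the current bucket can absorb (at least one,
    # so the job keeps making progress instead of stalling forever).
    batches_left = max(remaining // batch_size, 1)
    # Spread those batches evenly across the window.
    return window_seconds / batches_left
```

With five workers and a threshold of 25, a response showing `X-Rate-Limit-Remaining: 20` leaves room for four batches, so the sketch waits 15 seconds between them instead of draining the bucket in one burst.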

What is the difference between Zendesk's per-minute and daily API rate limits?

Zendesk enforces two distinct rate limit mechanisms: a rolling per-minute window (the one tracked by the X-Rate-Limit headers) and a separate daily API quota that resets at midnight UTC. The per-minute limit is what causes 429 errors during active extraction. The daily limit is larger and only matters for very long-running, high-volume extractions, so most teams only ever hit the per-minute limit. If you are still being rate-limited after adding predictive pacing, check whether you are approaching the daily quota by counting the requests you have made since midnight UTC. Do not compare that count against X-Rate-Limit: that header describes the per-minute bucket, not the daily ceiling.
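Because no per-minute header reports daily usage, the count has to come from your own bookkeeping. A sketch, assuming the midnight-UTC reset described above and a `daily_quota` value you obtain for your account yourself (the class name and the quota are hypothetical):

```python
from datetime import datetime, timezone


class DailyRequestCounter:
    """Counts requests per UTC calendar day against a quota you supply.

    `daily_quota` is a hypothetical, account-specific number; it is not
    read from X-Rate-Limit, which describes the per-minute bucket.
    """

    def __init__(self, daily_quota):
        self.daily_quota = daily_quota
        self.day = None
        self.count = 0

    def record(self, now=None):
        # `now` must be timezone-aware; it is normalized to UTC.
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        if now.date() != self.day:
            # First request of a new UTC day: the quota has reset.
            self.day = now.date()
            self.count = 0
        self.count += 1

    def fraction_used(self):
        return self.count / self.daily_quota
```

Call `record()` once per request and alert when `fraction_used()` crosses a level you choose, such as 0.8.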

Engineering | April 25, 2026 | 9 min read

Navigating the Zendesk Incremental API: Pacing, 429s, and Ghost Loops

Writing a script to "just pull everything" from Zendesk is a rite of passage for many developers. It usually ends with a 429 Too Many Requests error, a corrupted local file, and a Slack message to the team lead asking for help.

At Evicta, we spent months hardening our engine against the specific quirks of the Zendesk Incremental Export API. This is a field guide to the failure modes you'll hit if you write the extraction yourself, and to what a correct implementation looks like.

Related: if you're hitting these rate limits because you're trying to export Sunshine Custom Objects before the July 2026 sunset, the deadline compounds every issue in this guide.

Zendesk API Rate Limits by Plan

Before diving into failure modes, here is the baseline. Zendesk's developer documentation lists per-plan limits, but the ceiling your account actually gets is whatever the response headers report. These are the ceilings we have observed across plan tiers:

Plan	Standard API (req/min)	Incremental Export (req/min)	Notes
Suite Team / Growth	~400 req/min	~5 req/min	Lower ceiling, hits fast under parallel load
Suite Professional	~700 req/min	~10 req/min	Standard for mid-market accounts
Suite Enterprise	~700–2,500 req/min	~10–20 req/min	Custom limits negotiable; confirm with headers

These numbers are approximate. Zendesk can throttle accounts dynamically based on global load, and limits sometimes vary per endpoint. Always read X-Rate-Limit from your own response headers — it is the only authoritative number for your account at the time of the request.

1. The Rate Ceiling

Zendesk enforces hard rate limits on the Incremental Export API. The exact ceiling depends on your plan: for most Enterprise plans the limit on incremental endpoints sits around 10 requests per minute, with separate limits on the standard API surfaces. The headers Zendesk returns tell you where you stand. A normal response from a standard endpoint might carry:

X-Rate-Limit: 700
X-Rate-Limit-Remaining: 142

A throttled 429 response adds:

Retry-After: 38

Those three values answer different questions. X-Rate-Limit is the size of the current bucket. X-Rate-Limit-Remaining is how many requests remain before the bucket is exhausted. Retry-After appears when you have already crossed the line and tells you how long to wait before retrying. Treat Retry-After as authoritative. If Zendesk tells you to wait 38 seconds, retrying after 20 seconds is not faster; it usually extends the failure window.
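The example above uses whole seconds, but HTTP also allows Retry-After to carry an HTTP-date. A small parser that accepts both forms and falls back to a conservative wait when the header is missing or malformed; the function name and the 60-second default are illustrative assumptions:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=60.0):
    """Convert a Retry-After header value into seconds to wait.

    HTTP allows either delta-seconds ("38") or an HTTP-date
    ("Sat, 25 Apr 2026 12:00:38 GMT").
    """
    if value is None:
        return default  # header missing: fall back to a conservative wait
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable: same conservative fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    now = now or datetime.now(timezone.utc)
    # A date in the past means the wait is already over.
    return max((when - now).total_seconds(), 0.0)
```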

Most developers reach for exponential backoff as the default fix. In a one-off script, that works. In a high-volume ETL run pulling millions of records, backoff compounds the problem: workers idle, TCP connections time out, and your effective throughput collapses to near-zero. A 4-hour extraction stretches to 18 hours.

A correct implementation tracks X-Rate-Limit-Remaining proactively and slows down before hitting 429, not after. The goal is to run at the maximum sustained throughput without ever tripping the ceiling. Reactive backoff is a smell; predictive pacing is the fix.

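import time

THRESHOLD = 25  # low watermark; tune to your worker count

# Naive: reactive, fails at scale
def naive_get(session, url):
    while True:
        response = session.get(url)
        if response.status_code != 429:
            return response
        time.sleep(int(response.headers["Retry-After"]))

# Correct: predictive, sustains throughput
def paced_get(session, url, window_seconds=60):
    response = session.get(url)
    remaining = response.headers.get("X-Rate-Limit-Remaining")
    if remaining is not None and int(remaining) < THRESHOLD:
        # Spread the remaining budget across the rest of the window.
        time.sleep(window_seconds / max(int(remaining), 1))
    return response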
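Predictive pacing can also start from the other end: read X-Rate-Limit once and derive a steady per-worker interval that stays under it. A sketch, where `steady_interval` and the 90% `headroom` factor are illustrative assumptions rather than Zendesk guidance:

```python
def steady_interval(limit_per_minute, workers, headroom=0.9):
    """Seconds each worker should wait between its own requests.

    Targets `headroom` (e.g. 90%) of the per-minute limit, split evenly
    across `workers`, so the fleet never sprints into the ceiling.
    """
    if limit_per_minute <= 0 or workers <= 0:
        raise ValueError("limit_per_minute and workers must be positive")
    # Fleet budget is limit * headroom requests per 60 s; each worker
    # gets an equal share of it.
    return 60.0 * workers / (limit_per_minute * headroom)
```

With `X-Rate-Limit: 700` and five workers, each worker waits roughly 0.48 seconds between requests; on an incremental endpoint capped near 10 per minute, a single worker waits about 6.7 seconds.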

A more complete pacing wrapper centralizes the decision so parallel workers do not each invent their own delay. If five workers all see X-Rate-Limit-Remaining: 5 and continue independently, the next five requests can drain the bucket before any worker sees the updated headers.

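import threading
import time


class ZendeskRateGate:
    def __init__(self, low_watermark=25):
        self.low_watermark = low_watermark
        self.next_request_at = 0.0
        # One lock shared by every worker that uses this gate.
        self._lock = threading.Lock()

    def before_request(self):
        with self._lock:
            wait = self.next_request_at - time.time()
        if wait > 0:
            time.sleep(wait)  # sleep outside the lock

    def after_response(self, response):
        with self._lock:
            now = time.time()
            if response.status_code == 429:
                # Retry-After is authoritative; assume a full minute if absent.
                retry_after = int(response.headers.get("Retry-After", "60"))
                self._push_deadline(now + retry_after)
                return

            remaining = response.headers.get("X-Rate-Limit-Remaining")
            if remaining is not None and int(remaining) < self.low_watermark:
                # Spread the remaining calls over the rest of the minute.
                delay = max(1, 60 / max(int(remaining), 1))
                self._push_deadline(now + delay)

    def _push_deadline(self, deadline):
        # Never shorten a wait another worker already scheduled (e.g. a 429).
        self.next_request_at = max(self.next_request_at, deadline)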
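The gate above still lets every waiting worker wake at the same deadline and fire together. One way to stagger them, assuming a fixed minimum spacing between requests (for example, 60 seconds divided by the per-minute limit), is to hand out reserved time slots. `SlotScheduler` is a hypothetical helper, not part of any Zendesk client:

```python
import threading
import time


class SlotScheduler:
    """Hands out evenly spaced send times to concurrent workers.

    Each reserve() call claims the next free slot, so workers released at
    the same moment are staggered instead of firing together.
    """

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds between any two requests
        self._next_slot = 0.0
        self._lock = threading.Lock()

    def reserve(self, now=None):
        with self._lock:
            now = time.monotonic() if now is None else now
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
            return slot

    def wait_for_slot(self):
        # Block the calling worker until its reserved slot arrives.
        delay = self.reserve() - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Each worker calls `wait_for_slot()` before every request; the low-watermark logic can still widen `min_interval` when headers show the bucket running low.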

Evicta's engine implements predictive pacing across all endpoint families, keeping extraction running at safe throughput without manual intervention.

2. The "Ghost Page" Loop

This is the most dangerous quirk in Zendesk. Occasionally, the Incremental Export returns an empty tickets array but still provides a next_page link. A naive extraction loop that only checks for the link will spin indefinitely — burning your entire API quota in seconds while retrieving zero records.

The minimal response looks like this:

{
  "tickets": [],
  "next_page": "https://your-subdomain.zendesk.com/api/v2/incremental/tickets.json?start_time=1746384912",
  "count": 0,
  "end_time": 1746384912,
  "end_of_stream": false
}

Note end_of_stream: false. The API is telling you there's more data, just not in this page. The fix isn't to ignore next_page — sometimes the next page legitimately has records. The fix is to track repeated empty-page responses and break out after N consecutive empties:

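import logging

log = logging.getLogger(__name__)
MAX_CONSECUTIVE_EMPTIES = 3  # illustrative; tune for your account


def iter_tickets(fetch, next_page):
    # fetch: any callable that GETs a URL and returns the parsed JSON body.
    empty_page_count = 0
    while next_page:
        response = fetch(next_page)
        tickets = response.get("tickets", [])
        yield from tickets

        # end_of_stream is the real stop signal; next_page alone is not.
        if response.get("end_of_stream"):
            break

        if tickets:
            empty_page_count = 0
        else:
            empty_page_count += 1
            if empty_page_count >= MAX_CONSECUTIVE_EMPTIES:
                log.warning("ghost page loop detected, exiting")
                break

        next_page = response.get("next_page")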

This is the kind of edge case that doesn't surface in testing — it only appears on large historical extractions where the start_time cursor lands on a window with no ticket activity.

A production loop should also verify that the cursor is moving. Empty pages are one symptom; repeated cursors are another. Store the last next_page, end_time, and record count. If all three repeat, the loop is no longer making progress.

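class ProgressGuard:
    """Call check(response) once per page; raises if pagination stalls."""

    def __init__(self):
        self.last_seen = None

    def check(self, response):
        state = (
            response.get("next_page"),
            response.get("end_time"),
            response.get("count"),
        )
        if state == self.last_seen:
            raise RuntimeError("Zendesk pagination stopped advancing")
        self.last_seen = state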

3. Dual Pagination Strategies

Zendesk uses two incompatible pagination strategies across its API surfaces, and they don't compose cleanly:

Time-based incremental pagination for Tickets, Users, Organizations, and Ticket Events (using start_time and end_time).
Cursor-based pagination for Organization Memberships, Group Memberships, Ticket Audits, and Comments (using opaque cursor tokens in meta.after_cursor).

The cursor from a tickets endpoint does not work with a memberships endpoint, and vice versa. You need to implement and maintain two separate pagination state machines. They interact in ways that aren't documented — for example, a cursor can become invalid mid-extraction if the underlying record set changes, and the API may return no explicit error, just a different result shape.
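Kept apart, the two loops are short. A sketch in which `fetch` is any callable that GETs a URL and returns parsed JSON; the stop conditions (`end_of_stream` for time-based exports, `meta.has_more` with `links.next` for cursor pagination) follow Zendesk's documented response fields, while the function names are illustrative:

```python
def iter_time_based(fetch, first_url, key):
    """Time-based incremental export: follow next_page until end_of_stream."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get(key, [])
        if page.get("end_of_stream"):
            break
        url = page.get("next_page")


def iter_cursor_based(fetch, first_url, key):
    """Cursor pagination: follow links.next while meta.has_more is true."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get(key, [])
        if not page.get("meta", {}).get("has_more"):
            break
        url = page.get("links", {}).get("next")
```

Neither function knows about the other's state, which is the point: each endpoint family gets its own loop and its own checkpoint.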

A time-based checkpoint stores the last safe end_time only after the page has been written. A cursor-based checkpoint stores the opaque cursor only after the batch is durable. Advancing either checkpoint before the write completes creates a data-loss bug: the next run resumes after records that never landed in storage.

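import json
import os


def write_checkpoint(path, checkpoint):
    # Write to a temp file, then atomically rename so a crash never
    # leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(checkpoint, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def save_incremental_checkpoint(path, endpoint, response):
    # Call only after the page's records are durably written.
    write_checkpoint(path, {
        "endpoint": endpoint,
        "mode": "time",
        "end_time": response["end_time"],
        "next_page": response.get("next_page"),
    })


def save_cursor_checkpoint(path, endpoint, response):
    # Call only after the batch is durably written.
    meta = response["meta"]
    write_checkpoint(path, {
        "endpoint": endpoint,
        "mode": "cursor",
        "after_cursor": meta.get("after_cursor"),
        "has_more": meta.get("has_more"),
    })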

A production extraction layer maintains separate checkpoint state for each pagination strategy and serializes it to disk so an interrupted job can resume without re-reading data.
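Resuming is the other half: the saved checkpoint must be enough to rebuild the next request. A sketch that assumes checkpoints shaped like the dictionaries above, Zendesk's `start_time` parameter for time-based exports, and `page[size]`/`page[after]` for cursor pagination; `resume_url` and the page size of 100 are illustrative:

```python
from urllib.parse import urlencode


def resume_url(checkpoint, base_url, page_size=100):
    """Rebuild the first request URL from a saved checkpoint dict."""
    if checkpoint["mode"] == "time":
        # Prefer the saved next_page; otherwise restart from end_time.
        if checkpoint.get("next_page"):
            return checkpoint["next_page"]
        return f"{base_url}?{urlencode({'start_time': checkpoint['end_time']})}"
    if checkpoint["mode"] == "cursor":
        if not checkpoint.get("has_more"):
            return None  # the previous run already reached the end
        params = {"page[size]": page_size, "page[after]": checkpoint["after_cursor"]}
        return f"{base_url}?{urlencode(params)}"
    raise ValueError(f"unknown checkpoint mode: {checkpoint['mode']!r}")
```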

4. Memory Bloat at Scale

When pulling hundreds of thousands of records, storing results in-process is not an option — the process crashes before the job finishes. A 2-million-ticket Zendesk instance with comments expanded can easily exceed 30 GB in raw JSON.

The fix is streaming: write each record to durable storage as it arrives, never holding the full result set in memory. Encrypted S3, GCS, or R2 are the standard destinations. The structure of a correct streaming pipeline looks like:

fetch → decompress → parse → validate → encrypt → write

with bounded queues between stages so a slow downstream consumer, such as the validation step, backpressures the upstream fetcher rather than letting in-memory buffers grow unbounded.

async def stream_records(pages, writer):
    async for page in pages:
        for ticket in page.get("tickets", []):
            await writer.write_jsonl(ticket)
        await writer.flush()

That simple pattern is doing important work: memory usage depends on the current page and queue depth, not on the total Zendesk account size. If each ticket averages 15 KB with comments and metadata attached, buffering 2 million tickets implies roughly 30 GB before Python object overhead. Streaming keeps the process stable even when the export takes hours.
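The single-stage loop above does not show the bounded queue from the pipeline diagram. A minimal two-stage sketch using `asyncio.Queue(maxsize=...)`, where `pages`, `write`, and the queue size of 8 are placeholders for your fetcher, storage writer, and tuning:

```python
import asyncio


async def run_pipeline(pages, write, queue_size=8):
    """Fetch and persist concurrently with a bounded queue in between.

    `pages` is any async iterator of parsed export pages and `write` is an
    async callable that persists one page; both are placeholders.
    """
    queue = asyncio.Queue(maxsize=queue_size)
    done = object()  # sentinel marking the end of the stream

    async def producer():
        async for page in pages:
            # put() waits while the queue is full, so fetching pauses
            # until the writer catches up: that wait is the backpressure.
            await queue.put(page)
        await queue.put(done)

    async def consumer():
        while True:
            page = await queue.get()
            if page is done:
                return
            await write(page)

    tasks = {asyncio.create_task(producer()), asyncio.create_task(consumer())}
    finished, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    # If either stage failed, stop the other instead of leaving it blocked.
    if pending:
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    for task in finished:
        task.result()  # re-raise a stage's exception, if any
```

At most `queue_size` pages sit in memory between the stages, regardless of how far ahead the fetcher could otherwise run.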

Evicta streams data directly to encrypted storage as it arrives, keeping memory usage flat at any extraction volume. The result is a clean Postgres schema ready for queries, without any manual transformation.

5. Cursor Invalidation on Schema Changes

The most painful failure mode surfaces only on multi-day extractions: if a Zendesk admin modifies a Custom Field schema or a Custom Object definition while your extraction is running, in-flight cursors can become invalid. The API doesn't always return a clean error — it can return records with a different shape, and you discover the inconsistency hours later during validation.

Defensive extraction implementations checkpoint the schema snapshot at job start and reject mid-run schema drift, either by restarting from the last checkpoint or by failing loudly so a human can decide.

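import hashlib
import json


def schema_fingerprint(fields):
    # Hash only the attributes that change how records parse or are named.
    normalized = sorted(
        (field["id"], field.get("key"), field.get("title"), field.get("type"))
        for field in fields
    )
    payload = json.dumps(normalized, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


def assert_schema_unchanged(job_start_fingerprint, current_fields):
    # Re-check periodically (e.g. before each checkpoint) against a fresh
    # field list, and fail loudly so the restart decision is explicit.
    if schema_fingerprint(current_fields) != job_start_fingerprint:
        raise RuntimeError("Zendesk schema changed during extraction")

# At job start (fetch_ticket_fields is a placeholder for your API call):
# job_start_fingerprint = schema_fingerprint(fetch_ticket_fields())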

This is not paranoia. A field type change can alter downstream parsing. A renamed custom field can change generated column names. A deleted field can make old records and new records look incompatible even though both came from the same Zendesk instance. The safest behavior is to detect drift early and make the restart explicit.
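When the fingerprint check fails, the person deciding whether to restart needs to know what changed. A sketch that compares two field snapshots by ID; it assumes each field dict carries `id`, `title`, and `type` (plus `key` where that field type has one), and `diff_fields` is an illustrative name:

```python
def diff_fields(old_fields, new_fields):
    """Report field IDs that were added, removed, or changed between snapshots."""
    def index(fields):
        # Map each field ID to the attributes that affect parsing and naming.
        return {
            f["id"]: (f.get("key"), f.get("title"), f.get("type"))
            for f in fields
        }

    old, new = index(old_fields), index(new_fields)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            fid for fid in old.keys() & new.keys() if old[fid] != new[fid]
        ),
    }
```

Logging this report next to the failed run turns "the schema changed" into a concrete list a human can act on.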

Built So You Don't Have To

We absorbed these failure modes so you don't have to spend your weekend debugging rate limits, ghost cursors, and dual pagination. For the broader migration failure modes beyond just the API layer, see our deep dive on the helpdesk migration trap. For AI/data team workflows, the same extraction infrastructure produces clean JSONL for RAG pipelines.

Frequently Asked Questions

What causes Zendesk API 429 errors during data export?

Zendesk enforces a strict rate ceiling on incremental endpoints. Standard exponential backoff strategies fail here — they cause idle workers and timed-out connections instead of sustained throughput. Purpose-built extraction tools handle this through predictive throttling, keeping extractions running without manual intervention.

What is the Zendesk Ghost Page loop?

The Ghost Page loop is a dangerous Zendesk API quirk where the API returns an empty results page but still provides a next_page link. Naive ETL implementations enter an infinite loop that burns your entire API quota in seconds. A correctly implemented extraction engine detects and exits this condition automatically.

How do you avoid memory bloat when extracting millions of Zendesk records?

Buffering large Zendesk extractions in memory will crash any standard process before the job completes. Purpose-built extraction engines stream data directly to encrypted storage without holding result sets in RAM, keeping memory usage flat regardless of extraction volume.

WHY WE BUILT EVICTA

We solved every one of these failure modes once. So you don't have to.

Evicta is a flat-fee Zendesk data extraction tool. We handle the rate-limit pacing and 429s, the ghost page loop, and the Incremental Export edge cases: every quirk this post describes. The output is clean Postgres or JSONL. Free schema preview against your real data: create an account and connect Zendesk inside the dashboard.
