
Bulk Retrieval

This guide covers strategies for retrieving multiple documents efficiently.

Basic Bulk Retrieval

Retrieve Multiple by ID

curl -X POST https://api.docyard.io/v1/dock/retrieve/batch \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "X-API-Secret: dk_secret_coll_bbbbbbbb" \
  -H "Content-Type: application/json" \
  -d '{
    "artifact_ids": ["art-001", "art-002", "art-003"],
    "keys": {
      "policy_number": "POL-12345678"
    }
  }'
Response:
{
  "results": [
    {
      "artifact_id": "art-001",
      "status": "granted",
      "content": "<pdf-data>",
      "content_type": "application/pdf"
    },
    {
      "artifact_id": "art-002",
      "status": "granted",
      "content": "<pdf-data>",
      "content_type": "application/pdf"
    },
    {
      "artifact_id": "art-003",
      "status": "denied",
      "score": 5,
      "threshold": 20,
      "message": "Score below threshold"
    }
  ]
}
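If you call the endpoint directly instead of through an SDK, the curl request above maps onto a `fetch` call roughly as follows. This is a minimal sketch that assumes Node.js 18+ (global `fetch`). `buildBatchRequest` and `retrieveBatch` are hypothetical helper names, and the credentials are the placeholder values from the example.

```javascript
// Hypothetical helper: builds the URL and fetch() options for
// POST /v1/dock/retrieve/batch, mirroring the curl example above.
function buildBatchRequest(artifactIds, keys, { apiKey, apiSecret }) {
  return {
    url: 'https://api.docyard.io/v1/dock/retrieve/batch',
    init: {
      method: 'POST',
      headers: {
        'X-API-Key': apiKey,
        'X-API-Secret': apiSecret,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ artifact_ids: artifactIds, keys })
    }
  };
}

// Sends the request (Node.js 18+ provides a global fetch).
async function retrieveBatch(artifactIds, keys, credentials) {
  const { url, init } = buildBatchRequest(artifactIds, keys, credentials);
  const response = await fetch(url, init);
  if (!response.ok) {
    // Attach the HTTP status so callers can react to 429 rate limiting
    const error = new Error(`Batch retrieve failed: HTTP ${response.status}`);
    error.status = response.status;
    throw error;
  }
  return response.json(); // { results: [...] } as in the response above
}
```

Keeping request construction separate from the network call makes the body and headers easy to check without hitting the API.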

Asynchronous Bulk Retrieval

For large requests (50+ documents), use async retrieval:

1. Create Retrieval Job

curl -X POST https://api.docyard.io/v1/dock/retrieve/async \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "X-API-Secret: dk_secret_coll_bbbbbbbb" \
  -H "Content-Type: application/json" \
  -d '{
    "artifact_ids": ["art-001", "art-002", "art-003", "art-004", "art-005"],
    "keys": {
      "policy_number": "POL-12345678"
    },
    "notify_url": "https://your-server.com/webhook/docyard"
  }'
Response:
{
  "job_id": "job-xyz789",
  "status": "processing",
  "estimated_completion": "2026-03-15T11:00:00Z"
}

2. Poll for Job Status

curl -X GET https://api.docyard.io/v1/dock/retrieve/jobs/job-xyz789 \
  -H "X-API-Key: dk_live_coll_aaaaaaaa"
Response:
{
  "job_id": "job-xyz789",
  "status": "processing",
  "total": 5,
  "processed": 3,
  "succeeded": 2,
  "failed": 1
}
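The status fields are enough to report progress while a job runs. A minimal sketch: `jobProgress` is a hypothetical helper, and it treats `completed` and `failed` as terminal statuses, as the polling example under Best Practices does.

```javascript
// Summarize a job-status response like the one above.
function jobProgress(status) {
  const { total, processed, succeeded, failed } = status;
  return {
    percent: total > 0 ? Math.round((processed / total) * 100) : 100,
    remaining: total - processed,
    succeeded,
    failed,
    // 'completed' and 'failed' are treated as terminal statuses
    done: status.status === 'completed' || status.status === 'failed'
  };
}
```

For the sample response (3 of 5 processed) this reports 60% with 2 artifacts remaining.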

3. Get Results When Complete

curl -X GET https://api.docyard.io/v1/dock/retrieve/jobs/job-xyz789/results \
  -H "X-API-Key: dk_live_coll_aaaaaaaa"
Response (abbreviated; results for art-004 and art-005 omitted):
{
  "job_id": "job-xyz789",
  "status": "completed",
  "results": [
    {
      "artifact_id": "art-001",
      "status": "granted",
      "download_url": "https://api.docyard.io/v1/dock/download/token-abc123"
    },
    {
      "artifact_id": "art-002",
      "status": "granted",
      "download_url": "https://api.docyard.io/v1/dock/download/token-def456"
    },
    {
      "artifact_id": "art-003",
      "status": "denied"
    }
  ],
  "expires_at": "2026-03-16T11:00:00Z"
}

Search + Retrieve Pattern

A common pattern is to search first, then bulk retrieve:

1. Search for Matching Artifacts

curl -X POST https://api.docyard.io/v1/dock/search \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "document_type": "declaration_page",
      "mortgagee_name": "FirstCity Bank",
      "effective_date_after": "2026-01-01"
    },
    "pagination": {
      "limit": 100
    }
  }'
Response:
{
  "results": [
    { "id": "art-001", "locks": { "policy_number": "POL-111" } },
    { "id": "art-002", "locks": { "policy_number": "POL-222" } },
    { "id": "art-003", "locks": { "policy_number": "POL-333" } }
  ],
  "pagination": {
    "page": 1,
    "limit": 100,
    "total": 3
  }
}
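When `pagination.total` exceeds the page size, you need more than one search call. A minimal sketch, assuming `searchFn` resolves to the response shape shown above and that the request's `pagination` object accepts a `page` field alongside `limit` (the response reports `page`, but the request field is an assumption). `searchAll` is a hypothetical helper.

```javascript
// Collect every page of search results.
// Assumption: the request accepts pagination.page (1-based) as well as limit.
async function searchAll(searchFn, filters, limit = 100) {
  const all = [];
  for (let page = 1; ; page++) {
    const response = await searchFn({ filters, pagination: { page, limit } });
    all.push(...response.results);
    // Stop when everything is collected or the server returns an empty page
    if (all.length >= response.pagination.total || response.results.length === 0) {
      return all;
    }
  }
}
```

The empty-page check guards against looping forever if `total` changes while you page through results.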

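2. Bulk Retrieve

Pass the IDs from the search results to the batch endpoint. A batch request carries a single keys object that is checked against every artifact in it, so with the keys below, art-002 and art-003 (locked to POL-222 and POL-333) are likely to be denied. Send a separate batch for each distinct set of keys.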

curl -X POST https://api.docyard.io/v1/dock/retrieve/batch \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "X-API-Secret: dk_secret_coll_bbbbbbbb" \
  -H "Content-Type: application/json" \
  -d '{
    "artifact_ids": ["art-001", "art-002", "art-003"],
    "keys": {
      "policy_number": "POL-111"
    }
  }'
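Because a batch request carries one keys object, one way to retrieve artifacts with different locks is to group the search hits first and send one batch per group. A minimal sketch, assuming each hit looks like `{ id, locks: { policy_number } }` as in the search response above; `groupByPolicyNumber` is a hypothetical helper.

```javascript
// Group search hits by their policy_number lock so that every batch
// carries keys matching all of the artifacts in it.
function groupByPolicyNumber(searchResults) {
  const groups = new Map(); // policy_number -> artifact IDs
  for (const hit of searchResults) {
    const policy = hit.locks.policy_number;
    if (!groups.has(policy)) groups.set(policy, []);
    groups.get(policy).push(hit.id);
  }
  // One batch request body per distinct policy number
  return [...groups].map(([policyNumber, artifactIds]) => ({
    artifact_ids: artifactIds,
    keys: { policy_number: policyNumber }
  }));
}
```

Each returned object can be sent as the `-d` body of a batch request. Keep the per-minute batch limit in mind when the search returns many distinct policy numbers.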

Delta Retrieval

Retrieve only the documents uploaded since your last check:

1. Track Last Retrieval Time

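Store the timestamp at which your last retrieval run started. Record it before you search, not after, so documents created while a run is in progress are picked up by the next run.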

2. Search with Date Filter

curl -X POST https://api.docyard.io/v1/dock/search \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "mortgagee_name": "FirstCity Bank",
      "created_after": "2026-03-14T00:00:00Z"
    }
  }'

3. Process Results

Pass the returned artifact IDs to batch or async retrieval, then save the new timestamp as your last check.
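The three steps above can be sketched as two small helpers. This is a sketch under stated assumptions: `startDeltaRun` and `newArtifacts` are hypothetical names, and because this guide does not say whether `created_after` is inclusive, the sketch also skips artifacts already handled in an earlier run.

```javascript
// Build the delta search filters from the stored cursor and take the
// next cursor *before* the search runs.
function startDeltaRun(lastSync, baseFilters, now = new Date()) {
  return {
    filters: { ...baseFilters, created_after: lastSync },
    nextCursor: now.toISOString() // persist this once the run succeeds
  };
}

// Whether created_after is inclusive is not specified here, so drop any
// artifact ID that an earlier run already processed.
function newArtifacts(results, processedIds) {
  return results.filter(r => !processedIds.has(r.id));
}
```

Persisting `nextCursor` only after retrieval succeeds means a failed run is simply repeated on the next check rather than silently skipped.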

Nightly Sync Pattern

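Many collectors run a nightly batch retrieval:

async function nightlySync() {
  // dock is your initialized Docyard client; getLastSyncTimestamp and
  // setLastSyncTimestamp are your own persistence helpers
  const lastSync = await getLastSyncTimestamp();
  // Take the new cutoff before searching so artifacts created during
  // this run are picked up next time instead of being skipped
  const syncStartedAt = new Date().toISOString();

  // Search for new artifacts (results are paginated; fetch further pages
  // if pagination.total exceeds the page size)
  const searchResult = await dock.search({
    filters: {
      created_after: lastSync,
      document_type: 'declaration_page',
      mortgagee_name: 'FirstCity Bank'
    }
  });

  if (searchResult.results.length === 0) {
    console.log('No new artifacts');
    return;
  }

  // Create an async retrieval job; the webhook fires when it completes
  const job = await dock.retrieveAsync({
    artifact_ids: searchResult.results.map(r => r.id),
    keys: { mortgagee_name: 'FirstCity Bank' },
    notify_url: 'https://your-server.com/webhook/docyard'
  });

  console.log(`Created job ${job.job_id} for ${searchResult.results.length} artifacts`);
  await setLastSyncTimestamp(syncStartedAt);
}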

Webhook Notifications

For async retrieval, receive notifications when jobs complete:

Set Notification URL

curl -X POST https://api.docyard.io/v1/dock/retrieve/async \
  -H "X-API-Key: dk_live_coll_aaaaaaaa" \
  -H "X-API-Secret: dk_secret_coll_bbbbbbbb" \
  -H "Content-Type: application/json" \
  -d '{
    "artifact_ids": ["art-001", "art-002"],
    "keys": { "policy_number": "POL-123" },
    "notify_url": "https://your-server.com/webhook/docyard"
  }'

Webhook Payload

{
  "event": "retrieval_job_completed",
  "job_id": "job-xyz789",
  "status": "completed",
  "total": 2,
  "succeeded": 2,
  "failed": 0,
  "timestamp": "2026-03-15T11:00:00Z"
}
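After verifying the signature, a receiver typically decides what to do next from the payload. A minimal sketch, assuming the payload shape shown above; `handleRetrievalWebhook` is a hypothetical helper that returns the results URL from step 3 of the async workflow, or `null` when there is nothing to fetch.

```javascript
// Decide what to do with a retrieval webhook payload.
function handleRetrievalWebhook(event) {
  // Ignore event types this handler does not recognize
  if (event.event !== 'retrieval_job_completed') return null;
  // Nothing to download if the job did not complete or nothing was granted
  if (event.status !== 'completed' || event.succeeded === 0) {
    console.warn(`Job ${event.job_id}: status=${event.status}, failed=${event.failed}`);
    return null;
  }
  return `https://api.docyard.io/v1/dock/retrieve/jobs/${encodeURIComponent(event.job_id)}/results`;
}
```

A common design is to acknowledge the webhook quickly and fetch the results URL in a background task, so slow downloads do not delay the HTTP response.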

Verify Webhook Signature

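const crypto = require('crypto');

// payload must be the raw request body (string or Buffer) exactly as
// received; re-serializing parsed JSON can change the bytes and break the HMAC
function verifyWebhook(payload, signature, secret) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

  // Constant-time comparison; timingSafeEqual requires equal-length inputs
  const received = Buffer.from(String(signature || ''), 'utf8');
  const computed = Buffer.from(expected, 'utf8');
  return received.length === computed.length && crypto.timingSafeEqual(received, computed);
}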

Rate Limits

Operation          Limit
Search             300/minute
Sync retrieve      100/minute
Batch retrieve     20 batches/minute
Async retrieval    10 jobs/minute
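To stay under these limits, a client can throttle itself before sending requests. This is a minimal sliding-window sketch using the per-minute values from the table; the operation names (`search`, `syncRetrieve`, and so on) and the `MinuteLimiter` class are hypothetical. It only sees calls made by this process, so workers sharing one API key would need a shared counter instead.

```javascript
// Per-minute limits from the table above, keyed by hypothetical operation names.
const LIMITS_PER_MINUTE = { search: 300, syncRetrieve: 100, batchRetrieve: 20, asyncRetrieval: 10 };

class MinuteLimiter {
  constructor(limits = LIMITS_PER_MINUTE, now = () => Date.now()) {
    this.limits = limits;
    this.now = now;   // injectable clock for testing
    this.calls = {};  // operation -> timestamps (ms) of calls in the last minute
  }

  // Returns 0 if the call may proceed now (and records it),
  // otherwise the number of milliseconds to wait before trying again.
  tryAcquire(operation) {
    const limit = this.limits[operation];
    if (limit === undefined) throw new Error(`Unknown operation: ${operation}`);
    const t = this.now();
    // Keep only calls that are still inside the 60-second window
    const recent = (this.calls[operation] || []).filter(ts => t - ts < 60000);
    this.calls[operation] = recent;
    if (recent.length < limit) {
      recent.push(t);
      return 0;
    }
    // The oldest call in the window determines when a slot frees up
    return 60000 - (t - recent[0]);
  }
}
```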

Best Practices

1. Use Async for Large Requests

Requests with 50+ artifacts should use async retrieval.
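One way to apply this guideline is to choose the endpoint from the request size. A minimal sketch: `planRetrieval` is a hypothetical helper, the threshold is the 50-artifact guideline from this guide, and treating `notify_url` as optional is an assumption.

```javascript
// Pick the retrieval endpoint by request size and build the JSON body.
const ASYNC_THRESHOLD = 50;

function planRetrieval(artifactIds, keys, notifyUrl) {
  const body = { artifact_ids: artifactIds, keys };
  if (artifactIds.length >= ASYNC_THRESHOLD) {
    if (notifyUrl) body.notify_url = notifyUrl; // webhook for job completion
    return { path: '/v1/dock/retrieve/async', body };
  }
  return { path: '/v1/dock/retrieve/batch', body };
}
```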

2. Implement Retry Logic

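// Promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retrieveWithRetry(artifactIds, keys, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await dock.retrieveBatch(artifactIds, keys);
    } catch (error) {
      // Retry only on rate limiting (HTTP 429), and only while attempts remain
      if (error.status === 429 && attempt < maxRetries) {
        await sleep(60000); // Batch limits are per minute, so wait 1 minute
        continue;
      }
      throw error;
    }
  }
}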
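A fixed one-minute wait is simple but can be slow when the limit clears sooner. A variant with exponential backoff and jitter is sketched below; like the example above, it assumes errors carry a numeric `status`. `withBackoff` is a hypothetical helper, and `sleep` is injectable so the delay can be stubbed in tests.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry fn() on HTTP 429 with exponential backoff and jitter.
async function withBackoff(fn, { maxRetries = 5, baseMs = 1000, maxMs = 60000, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow non-rate-limit errors, or give up after maxRetries retries
      if (error.status !== 429 || attempt >= maxRetries) throw error;
      // Double the delay each attempt (capped), then pick a random wait
      // in [delay/2, delay) so concurrent clients do not retry in lockstep
      const delay = Math.min(maxMs, baseMs * 2 ** attempt);
      await sleep(delay / 2 + Math.random() * (delay / 2));
    }
  }
}
```

Usage: `await withBackoff(() => dock.retrieveBatch(artifactIds, keys))`.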

3. Handle Partial Success

const result = await dock.retrieveBatch(artifactIds, keys);

const succeeded = result.results.filter(r => r.status === 'granted');
const failed = result.results.filter(r => r.status === 'denied');

if (failed.length > 0) {
  console.log(`${failed.length} artifacts were denied`);
  for (const f of failed) {
    console.log(`${f.artifact_id}: Score ${f.score} < Threshold ${f.threshold}`);
  }
}
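The same partitioning can be wrapped in a reusable helper that also works for async job results. The denied entries there (shown in step 3 of the async workflow) omit `score` and `threshold`, so the helper falls back to a plain message. `summarizeResults` is a hypothetical name; this is a sketch, not part of an SDK.

```javascript
// Partition retrieval results by status and describe each denial.
function summarizeResults(results) {
  const granted = results.filter(r => r.status === 'granted');
  const denied = results.filter(r => r.status === 'denied');
  // Anything else is kept separately rather than silently dropped
  const other = results.filter(r => r.status !== 'granted' && r.status !== 'denied');
  const reasons = denied.map(r =>
    r.score !== undefined
      ? `${r.artifact_id}: score ${r.score} < threshold ${r.threshold}`
      : `${r.artifact_id}: denied`
  );
  return { granted, denied, other, reasons };
}
```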
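
4. Download Promptly

Download URLs returned by async jobs are valid for 24 hours (until the expires_at time in the job results), so download files promptly.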
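Before downloading, it can help to check `expires_at` and collect only the usable URLs. A minimal sketch, assuming the job-results shape shown in the async workflow; `pendingDownloads` is a hypothetical helper and `now` is injectable for testing.

```javascript
// List granted artifacts whose download URLs have not yet expired.
function pendingDownloads(jobResults, now = new Date()) {
  if (new Date(jobResults.expires_at) <= now) {
    throw new Error(`Download URLs for ${jobResults.job_id} expired at ${jobResults.expires_at}`);
  }
  return jobResults.results
    .filter(r => r.status === 'granted' && r.download_url)
    .map(r => ({ artifactId: r.artifact_id, url: r.download_url }));
}
```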

5. Monitor Job Status

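For async jobs, implement polling or webhooks:

// Polling
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForJob(jobId, { intervalMs = 5000, timeoutMs = 30 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await dock.getJobStatus(jobId);
    // Stop as soon as the job reaches a terminal status
    if (status.status === 'completed' || status.status === 'failed') {
      return status;
    }
    await sleep(intervalMs); // Wait 5 seconds by default
  }
  // Give up instead of polling forever
  throw new Error(`Timed out waiting for job ${jobId}`);
}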

Next Steps