Documentation

Everything you need to integrate CrowdSorcerer.

Quickstart

Post your first task and pull the worker's response as typed JSON.

  1. Request early access — we're in closed beta and onboarding one cohort at a time
  2. Generate an API key from the API Keys page
  3. POST a task, then poll GET /v1/tasks/{task_id} or subscribe to a webhook for the result

TypeScript / Node.js

npm install @crowdsourcerer/sdk

Python

pip install crowdsourcerer-sdk

cURL

curl -X POST https://crowdsourcerer.rebaselabs.online/v1/tasks \
  -H "Authorization: Bearer csk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "label_text",
    "input": {
      "text": "The new iPhone is great!",
      "categories": ["positive", "negative", "neutral"],
      "question": "What is the sentiment?"
    },
    "assignments_required": 3
  }'

Authentication

All API requests require a Bearer token in the Authorization header.

Authorization: Bearer csk_YOUR_API_KEY

Two token types are accepted:

  • API Keys (csk_...) — recommended for production. Create in the dashboard.
  • JWT tokens — returned from POST /v1/auth/login. Use for short-lived user sessions.

Task lifecycle

Human tasks go through the worker marketplace. Once posted, the task is visible to skill-matched workers who can claim it, do the work, and submit a typed response within the timeout you set.

Statuses: open, assigned, completed, failed, cancelled

Submit a task → get a task_id → poll GET /v1/tasks/{task_id} or subscribe to a webhook. Credits are reserved on submission (worker_reward × assignments plus platform fee) and released to the worker(s) on approval. Unused slots are refunded if the task closes early or no worker claims it before claim_timeout_minutes.
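
The polling half of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: fetch_task is a hypothetical callable standing in for whatever performs GET /v1/tasks/{task_id} and returns the decoded JSON, and the timeout and backoff values are placeholders. The terminal statuses are taken from the status list above.

```python
import time
from typing import Callable

# Statuses after which a task no longer changes (from the status list above).
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    timeout_s: float = 3600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal status or the timeout elapses.

    fetch_task is a hypothetical stand-in for GET /v1/tasks/{task_id}.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r}")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off to avoid hammering the API
```

Passing sleep as a parameter keeps the loop easy to test; in production the default time.sleep is enough.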

Submit a task

POST /v1/tasks
{
  "type": "label_text",
  "input": {
    "text": "The new iPhone is great!",
    "categories": ["positive", "negative", "neutral"],
    "question": "What is the sentiment?"
  },
  "assignments_required": 3,           // 1–20 workers
  "consensus_strategy": "majority_vote", // any_first | majority_vote | unanimous | requester_review
  "claim_timeout_minutes": 60,         // how long a worker has to submit
  "priority": "normal",                // low | normal | high | urgent
  "webhook_url": "https://..."         // optional
}
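
The // annotations in the body above are documentation only and are not valid JSON, so in code it is simpler to build the payload as a dictionary and serialise it. The sketch below uses only the Python standard library; build_task_request is a hypothetical helper name, and the base URL is the one from the cURL example.

```python
import json
import urllib.request

API_BASE = "https://crowdsourcerer.rebaselabs.online"  # from the cURL example


def build_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tasks request."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "type": "label_text",
    "input": {
        "text": "The new iPhone is great!",
        "categories": ["positive", "negative", "neutral"],
        "question": "What is the sentiment?",
    },
    "assignments_required": 3,
    "consensus_strategy": "majority_vote",
    "claim_timeout_minutes": 60,
    "priority": "normal",
}
request = build_task_request("csk_YOUR_KEY", payload)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```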

Poll for result

GET /v1/tasks/{task_id}

// Response when completed:
{
  "id": "uuid",
  "type": "label_text",
  "status": "completed",
  "execution_mode": "human",
  "output": {
    "summary": "positive (3/3 workers agree)",
    "raw": { "submissions": [...] }
  },
  "credits_used": 9,
  "assignments_required": 3,
  "assignments_completed": 3,
  "created_at": "2026-04-11T00:00:00Z",
  "completed_at": "2026-04-11T00:12:31Z"
}
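
To make the summary line above concrete, here is a client-side sketch of what a majority vote over worker labels could look like. It is an illustration only: the platform's own consensus logic and tie-breaking are not documented here, the shape of raw.submissions is not shown, and returning None when no label has a strict majority is an assumption.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the workers, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def summarize(labels: Sequence[str]) -> str:
    """Format a result in the style of the "summary" field shown above."""
    winner = majority_vote(labels)
    if winner is None:
        return f"no majority ({len(labels)} submissions)"
    agree = sum(1 for label in labels if label == winner)
    return f"{winner} ({agree}/{len(labels)} workers agree)"
```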

TypeScript SDK

npm install @crowdsourcerer/sdk

import { CrowdSorcerer } from "@crowdsourcerer/sdk";

const crowd = new CrowdSorcerer({ apiKey: process.env.CROWDSOURCERER_API_KEY! });

// Submit + poll until the worker finishes (or timeout)
const task = await crowd.runTask({
  type: "label_text",
  input: {
    text: "The new iPhone is great!",
    categories: ["positive", "negative", "neutral"],
    question: "What is the sentiment?",
  },
  assignments_required: 3,        // get 3 independent workers
});
console.log(task.output?.summary); // → "positive (3/3 workers agree)"

// Or submit async and poll manually
const { task_id } = await crowd.submitTask({
  type: "moderate_content",
  input: { content: "User-submitted post goes here", policy_context: "..." },
});
const result = await crowd.getTask(task_id);

// Credits
const balance = await crowd.getCredits();
console.log(balance.available); // → 87

Python SDK

pip install crowdsourcerer-sdk

Requires Python 3.9+. Supports both sync and async clients.

Sync client

from crowdsourcerer import CrowdSorcerer

client = CrowdSorcerer(api_key="csk_YOUR_KEY")

# Submit a human task and wait for the worker(s) to finish
task = client.tasks.create("label_text", {
    "text": "The new iPhone is great!",
    "categories": ["positive", "negative", "neutral"],
    "question": "What is the sentiment?",
}, assignments_required=3)

completed = client.tasks.wait(task.id)
print(completed.output)

# Check credits
balance = client.credits.balance()
print(f"Available: {balance.available} credits")

Async client

import asyncio
from crowdsourcerer import AsyncCrowdSorcerer

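# Placeholder inputs; replace with the AI outputs you want rated
pending_outputs = ["First AI answer to rate", "Second AI answer to rate"]

async def main():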
    async with AsyncCrowdSorcerer(api_key="csk_YOUR_KEY") as client:
        # Fan out a batch of rate_quality tasks concurrently
        tasks = await asyncio.gather(*[
            client.tasks.create("rate_quality", {
                "content": ai_output,
                "criteria": "Rate the factual accuracy 1–5",
            }, assignments_required=2)
            for ai_output in pending_outputs
        ])
        results = await asyncio.gather(*[client.tasks.wait(t.id) for t in tasks])
        for r in results:
            print(r.id, r.status, r.output)

asyncio.run(main())

Error handling

from crowdsourcerer import (
    CrowdSorcerer, AuthError, RateLimitError,
    InsufficientCreditsError, TaskError,
)

client = CrowdSorcerer(api_key="csk_YOUR_KEY")

try:
    task = client.tasks.create("verify_fact", {
        "claim": "The Eiffel Tower is 330 metres tall",
    })
    result = client.tasks.wait(task.id)
except InsufficientCreditsError:
    print("Top up at crowdsourcerer.rebaselabs.online/dashboard/credits")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except TaskError as e:
    print(f"Task failed: {e}")
except AuthError:
    print("Invalid API key")
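
Rather than only printing the retry hint, a caller may want to wait and try again. The sketch below shows one way to do that using the retry_after attribute from the example above. The RateLimitError class defined here is a stand-in so the snippet runs on its own (in real code, import it from the SDK), and call_with_retry and its max_attempts default are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError so this sketch runs on its own."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping for retry_after seconds whenever it is rate limited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            sleep(e.retry_after)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage would look like call_with_retry(lambda: client.tasks.create("verify_fact", {...})), with the other exception types still handled by the surrounding try block.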

Webhooks

Get notified when tasks complete instead of polling. Set webhook_url when creating a task, or register a persistent endpoint in the dashboard. Deliveries retry up to 3 times with exponential backoff.

POST https://your-server.com/webhook

{
  "task_id": "3f4a1b2c-...",
  "event": "task.completed"
}

Fetch full details from GET /v1/tasks/{task_id} after receiving the event.
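
Because deliveries are retried, a handler may see the same event more than once. The sketch below parses the event body and ignores duplicates before you fetch the task. It is a minimal illustration: extract_completed_task_id is a hypothetical helper, the in-memory seen set is a placeholder for whatever persistent store you use, and signature verification (described in the next section) should happen before this step.

```python
import json
from typing import Optional, Set


def extract_completed_task_id(body: bytes, seen: Set[str]) -> Optional[str]:
    """Return the task_id of a new task.completed event, or None to ignore it."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return None
    if not isinstance(event, dict) or event.get("event") != "task.completed":
        return None
    task_id = event.get("task_id")
    if not isinstance(task_id, str) or task_id in seen:
        return None  # malformed, or a retried delivery we already processed
    seen.add(task_id)
    return task_id
```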

Signature verification

Every webhook delivery is signed with your endpoint's secret using HMAC-SHA256. Always verify signatures to ensure requests came from CrowdSorcerer.

The signature is in the X-Crowdsorcerer-Signature header in the format t=TIMESTAMP,v1=HMAC_HEX. The timestamp is also available separately in X-Crowdsorcerer-Timestamp.

Python

import hmac, hashlib, time

def verify_webhook(payload_bytes: bytes, secret: str, sig_header: str,
                   tolerance: int = 300) -> bool:
    """Verify a CrowdSorcerer webhook signature."""
    # Parse "t=TIMESTAMP,v1=SIGNATURE"
    parts = {}
    for part in sig_header.split(","):
        key, _, value = part.partition("=")
        parts[key.strip()] = value.strip()

    timestamp = parts.get("t")
    signature = parts.get("v1")
    if not timestamp or not signature:
        return False

    # Reject old deliveries (replay protection); a non-numeric timestamp is invalid
    try:
        age = abs(time.time() - int(timestamp))
    except ValueError:
        return False
    if age > tolerance:
        return False

    # Reconstruct the signed payload: "{timestamp}.{body}"
    sig_input = f"{timestamp}.".encode() + payload_bytes
    expected = hmac.new(
        secret.encode(), sig_input, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

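# In your Flask handler (pass the raw request body, not parsed JSON):
# sig = request.headers["X-Crowdsorcerer-Signature"]
# verify_webhook(request.get_data(), YOUR_SECRET, sig)
# In FastAPI, use `await request.body()` to get the raw bytes instead.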

Node.js / TypeScript

import crypto from "crypto";

function verifyWebhook(
  payloadBytes: Buffer, secret: string,
  sigHeader: string, toleranceSec = 300
): boolean {
  const parts: Record<string, string> = {};
  for (const p of sigHeader.split(",")) {
    const [k, ...v] = p.split("=");
    parts[k.trim()] = v.join("=").trim();
  }
  const ts = parts["t"], sig = parts["v1"];
  if (!ts || !sig) return false;

  if (Math.abs(Date.now() / 1000 - Number(ts)) > toleranceSec) return false;

  const sigInput = Buffer.concat([
    Buffer.from(ts + "."),
    payloadBytes,
  ]);
  const expected = crypto
    .createHmac("sha256", secret)
    .update(sigInput)
    .digest("hex");
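  // timingSafeEqual throws if the buffers differ in length, so check first
  const expectedBuf = Buffer.from(expected);
  const sigBuf = Buffer.from(sig);
  return (
    expectedBuf.length === sigBuf.length &&
    crypto.timingSafeEqual(expectedBuf, sigBuf)
  );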
}

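// In your Express handler, mounted with express.raw({ type: "application/json" })
// so that req.body is the raw Buffer rather than parsed JSON:
// const sig = req.headers["x-crowdsorcerer-signature"] as string;
// verifyWebhook(req.body, YOUR_SECRET, sig);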

Secret rotation

When you rotate a webhook secret, both the old and new signatures are sent for 24 hours (v1 = new, v0 = old). Verify against both during the rotation window, then switch to the new secret.

SDK helpers

Both SDKs include built-in webhook verification so you don't have to implement it yourself:

Python

from crowdsourcerer import verify_webhook

is_valid = verify_webhook(
    payload=request.get_data(),
    secret=YOUR_SECRET,
    signature_header=request.headers["X-Crowdsorcerer-Signature"],
)

TypeScript

import { verifyWebhook } from "@crowdsourcerer/sdk";

const valid = verifyWebhook(
  req.body,
  process.env.WEBHOOK_SECRET!,
  req.headers["x-crowdsorcerer-signature"],
);

Pipeline AI primitives

Six in-process primitives that pipelines can chain with human steps. They are not submittable via POST /v1/tasks directly — that endpoint rejects them with a 422 pointing at /v1/pipelines. Documented here so pipeline authors know the input schema for each step type.

Credit costs shown are per step. Credits are charged on pipeline execution and refunded on failure.
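
For budgeting, the per-step costs listed for each primitive in this section can be summed ahead of time. The helper below is a hypothetical sketch: it covers only the six AI primitives, not human steps, and it assumes every step runs exactly once.

```python
# Per-step credit costs as listed for each primitive in this section.
STEP_COSTS = {
    "llm_generate": 1,
    "data_transform": 2,
    "web_research": 10,
    "document_parse": 3,
    "pii_detect": 2,
    "code_execute": 3,
}


def estimate_ai_step_credits(step_types):
    """Sum the listed per-step costs; raises KeyError for unknown step types."""
    return sum(STEP_COSTS[step_type] for step_type in step_types)


# Example: parse a document, scan it for PII, then summarise it.
estimate = estimate_ai_step_credits(["document_parse", "pii_detect", "llm_generate"])
```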

LLM Generate

llm_generate 1 credit

Direct LLM completion via the configured provider (Anthropic, Gemini, or OpenAI — picked by LLM_PROVIDER env or auto-detected from whichever API key is set).

Input schema

Field          Type       Req?  Description
messages       Message[]        Array of {role, content} messages. system, user, assistant roles supported.
system_prompt  string           System prompt; merged with any system messages in the array.
model          string           Provider-specific model id; defaults to the provider's configured default.
temperature    number           0–2 (default: 0.7)
max_tokens     number           Max output tokens (default: 2048)

Example

{
  "type": "llm_generate",
  "input": {
    "messages": [{"role": "user", "content": "Summarize the pros and cons of microservices in 3 bullet points each."}]
  }
}

Data Transform

data_transform 2 credits

Wraps your data and a natural-language instruction in a structured prompt and runs it through the LLM. Use for CSV↔JSON conversion, field renaming, normalization, filtering, etc.

Input schema

Field          Type                                  Req?  Description
data           any                                         Input data (object, array, or plain text).
transform      string                                      Natural-language description of what to do.
output_format  "json" | "csv" | "markdown" | "text"        Desired output format (default: json).

Example

{
  "type": "data_transform",
  "input": {
    "data": [{"name": "Alice", "score": "87"}, {"name": "Bob", "score": "92"}],
    "transform": "Sort by score descending and add a rank column",
    "output_format": "json"
  }
}

Web Research

web_research 10 credits

Fetches a URL with httpx, extracts visible text with BeautifulSoup, and summarises through the LLM using your instruction.

Input schema

Field        Type    Req?  Description
url          string        URL to fetch. Must be publicly reachable (private/loopback IPs are blocked).
instruction  string        What to summarise or extract. Defaults to a general page summary.

Example

{
  "type": "web_research",
  "input": {
    "url": "https://news.ycombinator.com",
    "instruction": "List the top 5 story titles"
  }
}
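
If you want to catch blocked URLs before spending credits, a client-side pre-check along these lines may help. It is a sketch, not the service's actual filter: the exact set of blocked ranges is not documented beyond "private/loopback", so the extra categories checked here (link-local, reserved, multicast, unspecified) are assumptions, and both function names are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(ip: str) -> bool:
    """True for addresses that are not publicly routable."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def url_is_publicly_reachable(url: str) -> bool:
    """Resolve the URL's host and reject it if any address is non-public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 443)
    except socket.gaierror:
        return False
    # Each entry's sockaddr starts with the resolved IP address string.
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)
```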

Document Parse

document_parse 3 credits

Local extraction for PDF (pypdf), DOCX (python-docx), XLSX (openpyxl), and plain text. No network calls once the document is fetched.

Input schema

Field           Type    Req?  Description
url             string        URL to a document. Mutually exclusive with content_base64.
content_base64  string        Base64-encoded document content (20 MB cap).
mime_type       string        Hint when content_base64 is used and the magic bytes are ambiguous.

Example

{
  "type": "document_parse",
  "input": {
    "url": "https://example.com/report.pdf"
  }
}
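
When the document is not reachable by URL, the content_base64 variant can be built as sketched below. The helper name is hypothetical, and the size check assumes the 20 MB cap applies to the raw bytes before encoding and means 20 × 1024 × 1024 bytes; the schema above does not say which.

```python
import base64
from typing import Optional

# Assumption: the cap is measured on the raw (decoded) bytes, in MiB.
MAX_DOCUMENT_BYTES = 20 * 1024 * 1024


def build_document_parse_input(data: bytes, mime_type: Optional[str] = None) -> dict:
    """Build a document_parse step that embeds the document instead of a URL."""
    if len(data) > MAX_DOCUMENT_BYTES:
        raise ValueError(f"document is {len(data)} bytes; cap is {MAX_DOCUMENT_BYTES}")
    step_input = {"content_base64": base64.b64encode(data).decode("ascii")}
    if mime_type:
        step_input["mime_type"] = mime_type  # hint for ambiguous magic bytes
    # url is deliberately omitted: it is mutually exclusive with content_base64.
    return {"type": "document_parse", "input": step_input}
```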

PII Detect

pii_detect 2 credits

In-process regex detector for email, phone, SSN, credit-card (Luhn-validated), IBAN, IPv4/6, and passport numbers. Optionally returns a redacted copy.

Input schema

Field     Type      Req?  Description
text      string          Text to scan.
entities  string[]        Subset of entity types to detect (default: all).
mask      boolean         Include a redacted copy of the input in the output.

Example

{
  "type": "pii_detect",
  "input": {
    "text": "Call John Smith at john@example.com or 555-123-4567",
    "mask": true
  }
}
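
"Luhn-validated" means a candidate credit-card number only counts as a match if its check digit is consistent. The sketch below shows the standard Luhn checksum for reference; it is not the detector's actual code, and ignoring spaces and dashes is an assumption about how candidates are normalised.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if "0" <= c <= "9"]  # skip spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A regex alone would flag any 16-digit run; the checksum filters out most random digit strings.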

Code Execute

code_execute 3 credits

Sandboxed Python subprocess. Runs python -I in a temp directory with rlimits on CPU, file size, and memory. Python only — other languages are not supported.

Input schema

Field            Type      Req?  Description
code             string          Python source to run.
language         "python"        Only python is accepted.
timeout_seconds  number          Wall-clock timeout (default: 10, max: 30).
stdin            string          Standard input passed to the script.

Example

{
  "type": "code_execute",
  "input": {
    "code": "import json\ndata = [1, 2, 3, 4, 5]\nprint(json.dumps({'sum': sum(data), 'mean': sum(data)/len(data)}))",
    "language": "python"
  }
}
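
For readers who want to reproduce this behaviour locally, the description above (python -I, a temp directory, rlimits) maps onto the standard library roughly as follows. This is a sketch under assumptions, not the service's sandbox: it targets Linux (the resource module and preexec_fn are POSIX-only), the limit values are placeholders, and rlimits by themselves are not a complete security boundary.

```python
import resource
import subprocess
import sys
import tempfile


def _cap(kind: int, value: int) -> None:
    """Lower the soft limit for one resource without exceeding the hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def _apply_limits() -> None:
    # Runs in the child process just before exec. Values are placeholders.
    _cap(resource.RLIMIT_CPU, 5)                   # CPU seconds
    _cap(resource.RLIMIT_FSIZE, 1024 * 1024)       # largest file it may write
    _cap(resource.RLIMIT_AS, 1024 * 1024 * 1024)   # address space (memory)


def run_python(code: str, stdin: str = "", timeout_seconds: float = 10.0):
    """Run code with python -I in a fresh temp directory.

    Returns (returncode, stdout, stderr). Raises subprocess.TimeoutExpired
    if the wall-clock timeout is exceeded.
    """
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            cwd=workdir,
            timeout=timeout_seconds,
            preexec_fn=_apply_limits,
        )
    return proc.returncode, proc.stdout, proc.stderr
```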

Human task types

Eight task types completed by human workers. Best for subjective judgments, quality evaluation, and tasks requiring human context.

Credit costs are the base worker reward per assignment. Total cost = (worker_reward × assignments) + platform_fee where platform fee is 20% (minimum 1 credit).
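
As a worked example of the formula above, the sketch below estimates the credits reserved for a task. Two details are assumptions: that the 20% fee is taken on the worker-reward subtotal, and that fractional fees round up to a whole credit. Neither is stated above, so treat the result as an estimate rather than the billed amount.

```python
PLATFORM_FEE_PERCENT = 20
MIN_PLATFORM_FEE = 1


def estimate_task_cost(worker_reward: int, assignments: int) -> int:
    """Estimate total credits: (worker_reward x assignments) + platform fee."""
    subtotal = worker_reward * assignments
    # Integer ceiling of 20% of the subtotal (rounding up is an assumption).
    fee = (subtotal * PLATFORM_FEE_PERCENT + 99) // 100
    return subtotal + max(MIN_PLATFORM_FEE, fee)


# label_text at 2 credits with one worker: 2 + max(1, ceil(0.4)) = 3
single = estimate_task_cost(2, 1)
# transcription_review at 5 credits with 10 workers: 50 + 10 = 60
batch = estimate_task_cost(5, 10)
```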

Human task options

These fields can be set on any human task type alongside the task-specific input:

  • assignments_required (1-10, default: 1): Number of workers needed
  • consensus_strategy: any_first | majority_vote | unanimous | requester_review
  • worker_reward_credits: Override default worker reward (1-10,000)
  • claim_timeout_minutes (5-480, default: 30): Time for worker to submit
  • min_skill_level (1-5): Minimum worker proficiency
  • task_instructions: Guidance text shown to workers
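
A client-side check against these ranges can catch mistakes before a request is sent. The sketch below is a hypothetical helper, not part of the SDK; the bounds are copied from the option list above, and the server remains the source of truth for validation.

```python
CONSENSUS_STRATEGIES = {"any_first", "majority_vote", "unanimous", "requester_review"}

# (min, max) bounds as stated in the option list above.
RANGES = {
    "assignments_required": (1, 10),
    "worker_reward_credits": (1, 10_000),
    "claim_timeout_minutes": (5, 480),
    "min_skill_level": (1, 5),
}


def validate_task_options(options: dict) -> list:
    """Return a list of problems found; an empty list means the options look valid."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in options:
            continue  # every option is optional
        value = options[name]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(f"{name} must be an integer in {low}..{high}")
    strategy = options.get("consensus_strategy")
    if strategy is not None and strategy not in CONSENSUS_STRATEGIES:
        problems.append(f"unknown consensus_strategy: {strategy!r}")
    return problems
```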

Label Image

label_image 3 credits human

Workers classify images by selecting from a set of predefined labels.

Input schema

Field        Type      Req?  Description
image_url    string          URL of the image to label
labels       string[]        Array of possible labels to choose from
description  string          Additional context for workers

Example

{
  "type": "label_image",
  "input": {
    "image_url": "https://example.com/photo.jpg",
    "labels": ["cat", "dog", "bird", "other"],
    "description": "Select the animal in this photo"
  }
}

Label Text

label_text 2 credits human

Workers categorize text into one of the provided categories.

Input schema

Field       Type      Req?  Description
text        string          Text to categorize
categories  string[]        Array of possible categories

Example

{
  "type": "label_text",
  "input": {
    "text": "The new iPhone has amazing battery life and a great camera",
    "categories": ["positive", "negative", "neutral"]
  }
}

Rate Quality

rate_quality 2 credits human

Workers rate content quality on a 1–5 scale based on specified criteria.

Input schema

Field     Type    Req?  Description
title     string        Title of the content to rate
content   string        The content to evaluate
criteria  string        What to evaluate (e.g. clarity, accuracy)

Example

{
  "type": "rate_quality",
  "input": {
    "title": "Introduction to Machine Learning",
    "content": "Machine learning is a subset of AI...",
    "criteria": "Rate the accuracy and clarity of this explanation"
  }
}

Verify Fact

verify_fact 3 credits human

Workers verify whether a claim is true, false, or indeterminate given the provided context.

Input schema

Field    Type    Req?  Description
claim    string        The factual claim to verify
context  string        Supporting context or evidence

Example

{
  "type": "verify_fact",
  "input": {
    "claim": "Python was created by Guido van Rossum in 1991",
    "context": "Python is a high-level programming language first released in 1991 by Guido van Rossum."
  }
}

Moderate Content

moderate_content 2 credits human

Workers review content against a policy and decide to approve, reject, or escalate.

Input schema

Field           Type    Req?  Description
content         string        The content to moderate
content_type    string        Type of content (e.g. comment, review, post)
policy_context  string        Moderation policy or guidelines

Example

{
  "type": "moderate_content",
  "input": {
    "content": "This product is absolutely terrible, worst purchase ever!",
    "content_type": "product_review",
    "policy_context": "Reject spam, hate speech, and threats. Allow negative opinions."
  }
}

Compare & Rank

compare_rank 2 credits human

Workers compare two options and select the better one based on given criteria.

Input schema

Field     Type    Req?  Description
option_a  string        First option to compare
option_b  string        Second option to compare
criteria  string        What to compare on (e.g. readability, accuracy)

Example

{
  "type": "compare_rank",
  "input": {
    "option_a": "Machine learning uses algorithms to learn from data.",
    "option_b": "ML is when computers figure stuff out from examples.",
    "criteria": "Which explanation is clearer and more professional?"
  }
}

Answer Question

answer_question 4 credits human

Workers read content and answer a question about it. Supports free-text and multiple-choice formats.

Input schema

Field          Type                             Req?  Description
content        string                                 Source content to read
question       string                                 Question to answer
answer_format  "free_text" | "multiple_choice"        Answer format (default: free_text)
choices        string[]                               Options for multiple-choice format

Example

{
  "type": "answer_question",
  "input": {
    "content": "The Eiffel Tower was built in 1889 for the World's Fair...",
    "question": "When was the Eiffel Tower built?",
    "answer_format": "multiple_choice",
    "choices": ["1876", "1889", "1901", "1912"]
  }
}

Transcription Review

transcription_review 5 credits human

Workers review and correct an AI-generated transcript against the original audio.

Input schema

Field          Type    Req?  Description
audio_url      string        URL of the audio file
ai_transcript  string        AI-generated transcript to review
language       string        Language of the audio (e.g. en, es)

Example

{
  "type": "transcription_review",
  "input": {
    "audio_url": "https://example.com/recording.mp3",
    "ai_transcript": "Welcome to the podast about artifical inteligence...",
    "language": "en"
  }
}