Documentation
Everything you need to integrate CrowdSorcerer.
Quickstart
Post your first task and pull the worker's response as typed JSON.
- Request early access — we're in closed beta and onboarding one cohort at a time
- Generate an API key from the API Keys page
- POST a task, then poll /v1/tasks/{id} or subscribe to a webhook for the result
TypeScript / Node.js
npm install @crowdsourcerer/sdk
Python
pip install crowdsourcerer-sdk
cURL
curl -X POST https://crowdsourcerer.rebaselabs.online/v1/tasks \
-H "Authorization: Bearer csk_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"type": "label_text",
"input": {
"text": "The new iPhone is great!",
"categories": ["positive", "negative", "neutral"],
"question": "What is the sentiment?"
},
"assignments_required": 3
}'
Authentication
All API requests require a Bearer token in the Authorization header.
Authorization: Bearer csk_YOUR_API_KEY
Two token types are accepted:
- API Keys (csk_...): recommended for production. Create in the dashboard.
- JWT tokens: returned from POST /v1/auth/login. Use for short-lived user sessions.
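As a minimal sketch of attaching the Bearer token without the SDK, the snippet below builds an authenticated request with Python's standard library. The base URL is taken from the cURL example above; the helper name build_task_request is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Base URL as shown in the cURL quickstart example.
BASE_URL = "https://crowdsourcerer.rebaselabs.online"


def build_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /v1/tasks request.

    Hypothetical helper for illustration; not part of the SDK.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_task_request("csk_YOUR_KEY", {
    "type": "label_text",
    "input": {
        "text": "The new iPhone is great!",
        "categories": ["positive", "negative", "neutral"],
    },
})
# To actually send it: urllib.request.urlopen(req)
```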
Task lifecycle
Human tasks go through the worker marketplace. Once posted, the task is visible to skill-matched workers who can claim it, do the work, and submit a typed response within the timeout you set.
Submit a task → get a task_id → poll GET /v1/tasks/{task_id} or subscribe to a webhook.
Credits are reserved on submission (worker_reward × assignments, plus the platform fee) and released to the worker(s) on approval. Unused slots are refunded if the task closes early or no worker claims it before claim_timeout_minutes.
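The submit-then-poll flow described above can be sketched as a small loop. This is an illustration under assumptions, not the SDK's implementation: fetch_task stands for any callable that performs GET /v1/tasks/{task_id} and returns the decoded JSON, only the "completed" status is documented here, and the "pending" value in the stand-in responses is hypothetical.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reports status "completed" or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") == "completed":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout_s}s")
        sleep(interval_s)


# Stand-in fetcher that completes on the third poll ("pending" is a made-up status).
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "completed", "id": "demo"},
])
result = wait_for_task(lambda _id: next(_responses), "demo", sleep=lambda _s: None)
```

Injecting the fetcher and the sleep function keeps the loop testable without network access; a webhook avoids the loop entirely.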
Submit a task
POST /v1/tasks
{
"type": "label_text",
"input": {
"text": "The new iPhone is great!",
"categories": ["positive", "negative", "neutral"],
"question": "What is the sentiment?"
},
"assignments_required": 3, // 1–20 workers
"consensus_strategy": "majority_vote", // any_first | majority_vote | unanimous | requester_review
"claim_timeout_minutes": 60, // how long a worker has to submit
"priority": "normal", // low | normal | high | urgent
"webhook_url": "https://..." // optional
}
Poll for result
GET /v1/tasks/{task_id}
// Response when completed:
{
"id": "uuid",
"type": "label_text",
"status": "completed",
"execution_mode": "human",
"output": {
"summary": "positive (3/3 workers agree)",
"raw": { "submissions": [...] }
},
"credits_used": 9,
"assignments_required": 3,
"assignments_completed": 3,
"created_at": "2026-04-11T00:00:00Z",
"completed_at": "2026-04-11T00:12:31Z"
}
TypeScript SDK
npm install @crowdsourcerer/sdk
import { CrowdSorcerer } from "@crowdsourcerer/sdk";
const crowd = new CrowdSorcerer({ apiKey: process.env.CROWDSOURCERER_API_KEY! });
// Submit + poll until the worker finishes (or timeout)
const task = await crowd.runTask({
type: "label_text",
input: {
text: "The new iPhone is great!",
categories: ["positive", "negative", "neutral"],
question: "What is the sentiment?",
},
assignments_required: 3, // get 3 independent workers
});
console.log(task.output?.summary); // → "positive (3/3 workers agree)"
// Or submit async and poll manually
const { task_id } = await crowd.submitTask({
type: "moderate_content",
input: { content: "User-submitted post goes here", policy_context: "..." },
});
const result = await crowd.getTask(task_id);
// Credits
const balance = await crowd.getCredits();
console.log(balance.available); // → 87
Python SDK
pip install crowdsourcerer-sdk
Requires Python 3.9+. Supports both sync and async clients.
Sync client
from crowdsourcerer import CrowdSorcerer
client = CrowdSorcerer(api_key="csk_YOUR_KEY")
# Submit a human task and wait for the worker(s) to finish
task = client.tasks.create("label_text", {
"text": "The new iPhone is great!",
"categories": ["positive", "negative", "neutral"],
"question": "What is the sentiment?",
}, assignments_required=3)
completed = client.tasks.wait(task.id)
print(completed.output)
# Check credits
balance = client.credits.balance()
print(f"Available: {balance.available} credits")
Async client
import asyncio
from crowdsourcerer import AsyncCrowdSorcerer

# pending_outputs is your own list of AI outputs (strings) to be rated
async def main():
    async with AsyncCrowdSorcerer(api_key="csk_YOUR_KEY") as client:
        # Fan out a batch of rate_quality tasks concurrently
        tasks = await asyncio.gather(*[
            client.tasks.create("rate_quality", {
                "content": ai_output,
                "criteria": "Rate the factual accuracy 1–5",
            }, assignments_required=2)
            for ai_output in pending_outputs
        ])
        # Wait for all of them to finish
        results = await asyncio.gather(*[client.tasks.wait(t.id) for t in tasks])
        for r in results:
            print(r.id, r.status, r.output)

asyncio.run(main())
Error handling
from crowdsourcerer import (
    CrowdSorcerer, AuthError, RateLimitError,
    InsufficientCreditsError, TaskError,
)

client = CrowdSorcerer(api_key="csk_YOUR_KEY")
try:
    task = client.tasks.create("verify_fact", {
        "claim": "The Eiffel Tower is 330 metres tall",
    })
    result = client.tasks.wait(task.id)
except InsufficientCreditsError:
    print("Top up at crowdsourcerer.rebaselabs.online/dashboard/credits")
except RateLimitError as e:
    print(f"Rate limited; retry after {e.retry_after}s")
except TaskError as e:
    print(f"Task failed: {e}")
except AuthError:
    print("Invalid API key")
Webhooks
Get notified when tasks complete instead of polling.
Set webhook_url when creating a task, or register a persistent endpoint in the dashboard.
Deliveries retry up to 3 times with exponential backoff.
POST https://your-server.com/webhook
{
"task_id": "3f4a1b2c-...",
"event": "task.completed"
}
Fetch full details from GET /v1/tasks/{task_id} after receiving the event.
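Because the delivery body carries only task_id and event, a handler mostly needs to pull out the id and then fetch the task. The sketch below shows that first step; the function name completed_task_id is hypothetical, and signature verification (covered in the next section) should happen before the body is trusted.

```python
import json
from typing import Optional


def completed_task_id(body: bytes) -> Optional[str]:
    """Return the task_id if the delivery is a task.completed event, else None."""
    try:
        event = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return None
    if not isinstance(event, dict) or event.get("event") != "task.completed":
        return None
    task_id = event.get("task_id")
    return task_id if isinstance(task_id, str) else None


sample = b'{"task_id": "abc", "event": "task.completed"}'
task_id = completed_task_id(sample)
# Next step: GET /v1/tasks/{task_id} to load the full result.
```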
Signature verification
Every webhook delivery is signed with your endpoint's secret using HMAC-SHA256. Always verify signatures to ensure requests came from CrowdSorcerer.
The signature is in the X-Crowdsorcerer-Signature header in the format t=TIMESTAMP,v1=HMAC_HEX. The timestamp is also available separately in X-Crowdsorcerer-Timestamp.
Python
import hmac, hashlib, time

def verify_webhook(payload_bytes: bytes, secret: str, sig_header: str,
                   tolerance: int = 300) -> bool:
    """Verify a CrowdSorcerer webhook signature."""
    # Parse "t=TIMESTAMP,v1=SIGNATURE"
    parts = {}
    for part in sig_header.split(","):
        key, _, value = part.partition("=")
        parts[key.strip()] = value.strip()
    timestamp = parts.get("t")
    signature = parts.get("v1")
    if not timestamp or not signature:
        return False
    # Reject malformed timestamps and old deliveries (replay protection)
    try:
        age = abs(time.time() - int(timestamp))
    except ValueError:
        return False
    if age > tolerance:
        return False
    # Reconstruct the signed payload: "{timestamp}.{body}"
    sig_input = f"{timestamp}.".encode() + payload_bytes
    expected = hmac.new(
        secret.encode(), sig_input, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison on bytes (str inputs must be ASCII-only)
    return hmac.compare_digest(expected.encode(), signature.encode())

# In your Flask handler (in FastAPI, read the body with `await request.body()`):
# sig = request.headers["X-Crowdsorcerer-Signature"]
# verify_webhook(request.get_data(), YOUR_SECRET, sig)
Node.js / TypeScript
import crypto from "crypto";

function verifyWebhook(
  payloadBytes: Buffer, secret: string,
  sigHeader: string, toleranceSec = 300
): boolean {
  const parts: Record<string, string> = {};
  for (const p of sigHeader.split(",")) {
    const [k, ...v] = p.split("=");
    parts[k.trim()] = v.join("=").trim();
  }
  const ts = parts["t"], sig = parts["v1"];
  if (!ts || !sig) return false;
  // Reject malformed timestamps and old deliveries (replay protection)
  const tsNum = Number(ts);
  if (!Number.isFinite(tsNum)) return false;
  if (Math.abs(Date.now() / 1000 - tsNum) > toleranceSec) return false;
  const sigInput = Buffer.concat([
    Buffer.from(ts + "."),
    payloadBytes,
  ]);
  const expected = crypto
    .createHmac("sha256", secret)
    .update(sigInput)
    .digest("hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  const expectedBuf = Buffer.from(expected);
  const sigBuf = Buffer.from(sig);
  return expectedBuf.length === sigBuf.length &&
    crypto.timingSafeEqual(expectedBuf, sigBuf);
}

// In your Express handler (req.body must be the raw Buffer, e.g. via express.raw()):
// const sig = req.headers["x-crowdsorcerer-signature"];
// verifyWebhook(req.body, YOUR_SECRET, sig);
Secret rotation
When you rotate a webhook secret, both the old and new signatures are sent for 24 hours (v1 = new, v0 = old). Verify against both during the rotation window, then switch to the new secret.
SDK helpers
Both SDKs include built-in webhook verification so you don't have to implement it yourself:
Python
from crowdsourcerer import verify_webhook
is_valid = verify_webhook(
payload=request.get_data(),
secret=YOUR_SECRET,
signature_header=request.headers[
"X-Crowdsorcerer-Signature"
],
)
TypeScript
import { verifyWebhook } from
"@crowdsourcerer/sdk";
const valid = verifyWebhook(
req.body,
process.env.WEBHOOK_SECRET!,
req.headers[
"x-crowdsorcerer-signature"
],
);
Pipeline AI primitives
Six in-process primitives that pipelines can chain with human steps. They cannot be submitted via POST /v1/tasks directly: that endpoint rejects them with a 422 pointing at /v1/pipelines. They are documented here so pipeline authors know the input schema for each step type.
Credit costs shown are per step. Credits are charged on pipeline execution and refunded on failure.
LLM Generate
llm_generate (1 credit)
Direct LLM completion via the configured provider (Anthropic, Gemini, or OpenAI, chosen by the LLM_PROVIDER environment variable or auto-detected from whichever API key is set).
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| messages | Message[] | ✓ | Array of {role, content} messages. system, user, assistant roles supported. |
| system_prompt | string | | System prompt; merged with any system messages in the array. |
| model | string | | Provider-specific model id; defaults to the provider's configured default. |
| temperature | number | | 0–2 (default: 0.7) |
| max_tokens | number | | Max output tokens (default: 2048) |
Example
{
"type": "llm_generate",
"input": {
"messages": [{"role": "user", "content": "Summarize the pros and cons of microservices in 3 bullet points each."}]
}
}
Data Transform
data_transform (2 credits)
Wraps your data and a natural-language instruction in a structured prompt and runs it through the LLM. Use for CSV↔JSON conversion, field renaming, normalization, filtering, etc.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| data | any | ✓ | Input data (object, array, or plain text). |
| transform | string | ✓ | Natural-language description of what to do. |
| output_format | "json" \| "csv" \| "markdown" \| "text" | | Desired output format (default: json). |
Example
{
"type": "data_transform",
"input": {
"data": [{"name": "Alice", "score": "87"}, {"name": "Bob", "score": "92"}],
"transform": "Sort by score descending and add a rank column",
"output_format": "json"
}
}
Web Research
web_research (10 credits)
Fetches a URL with httpx, extracts visible text with BeautifulSoup, and summarises it through the LLM using your instruction.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| url | string | ✓ | URL to fetch. Must be publicly reachable (private/loopback IPs are blocked). |
| instruction | string | | What to summarise or extract. Defaults to a general page summary. |
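To check in advance whether a target would be rejected by the private/loopback rule, a test along these lines can help. It is only a sketch using the standard ipaddress module; how the service itself performs the check is not documented here, and the function name is hypothetical. A full pre-check would also resolve the hostname and test every resolved address.

```python
import ipaddress


def is_publicly_routable(ip_text: str) -> bool:
    """True for addresses outside private, loopback and other special ranges."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Not a literal IPv4/IPv6 address.
        return False
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```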
Example
{
"type": "web_research",
"input": {
"url": "https://news.ycombinator.com",
"instruction": "List the top 5 story titles"
}
}
Document Parse
document_parse (3 credits)
Local extraction for PDF (pypdf), DOCX (python-docx), XLSX (openpyxl), and plain text. No network calls once the document is fetched.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| url | string | | URL to a document. Mutually exclusive with content_base64. |
| content_base64 | string | | Base64-encoded document content (20 MB cap). |
| mime_type | string | | Hint when content_base64 is used and the magic bytes are ambiguous. |
Example
{
"type": "document_parse",
"input": {
"url": "https://example.com/report.pdf"
}
}
PII Detect
pii_detect (2 credits)
In-process regex detector for email, phone, SSN, credit-card (Luhn-validated), IBAN, IPv4/6, and passport numbers. Optionally returns a redacted copy.
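For readers unfamiliar with the Luhn check mentioned above, here is a minimal sketch of the standard algorithm. It illustrates the checksum only and is not the detector's actual code; the function name is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Validating the checksum after a regex match filters out most random 16-digit strings, which is why card detection is usually paired with it.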
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| text | string | ✓ | Text to scan. |
| entities | string[] | | Subset of entity types to detect (default: all). |
| mask | boolean | | Include a redacted copy of the input in the output. |
Example
{
"type": "pii_detect",
"input": {
"text": "Call John Smith at john@example.com or 555-123-4567",
"mask": true
}
}
Code Execute
code_execute (3 credits)
Sandboxed Python subprocess. Runs python -I in a temp directory with rlimits on CPU, file size, and memory. Python only; other languages are not supported.
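The general technique described above (isolated interpreter, temp working directory, rlimits) can be sketched as follows. This is an illustration, not the service's implementation: the limit values and function names are assumptions, it is POSIX-only because it relies on the resource module, and rlimits alone are not a complete security boundary.

```python
import resource
import subprocess
import sys
import tempfile


def _cap(kind: int, value: int) -> None:
    """Lower a resource limit, never exceeding the existing hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits() -> None:
    # Runs in the child process just before the interpreter starts.
    _cap(resource.RLIMIT_CPU, 5)            # CPU seconds (assumed value)
    _cap(resource.RLIMIT_FSIZE, 1_000_000)  # max bytes per written file (assumed)
    _cap(resource.RLIMIT_AS, 1 << 30)       # address space in bytes (assumed)


def run_sandboxed(code: str, stdin: str = "",
                  timeout_seconds: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` with python -I in a throwaway directory under rlimits."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            cwd=workdir,
            timeout=timeout_seconds,  # wall-clock limit; raises TimeoutExpired
            preexec_fn=_apply_limits,
        )


result = run_sandboxed("print(sum([1, 2, 3]))")
```

The -I flag puts the interpreter in isolated mode, so user site-packages and PYTHON* environment variables are ignored.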
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| code | string | ✓ | Python source to run. |
| language | "python" | | Only python is accepted. |
| timeout_seconds | number | | Wall-clock timeout (default: 10, max: 30). |
| stdin | string | | Standard input passed to the script. |
Example
{
"type": "code_execute",
"input": {
"code": "import json\ndata = [1, 2, 3, 4, 5]\nprint(json.dumps({'sum': sum(data), 'mean': sum(data)/len(data)}))",
"language": "python"
}
}
Human task types
8 task types completed by human workers. Best for subjective judgments, quality evaluation, and tasks requiring human context.
Credit costs are the base worker reward per assignment. Total cost = (worker_reward × assignments) + platform_fee, where the platform fee is 20% (minimum 1 credit).
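The cost formula can be turned into a quick estimator. Two details are assumptions, since this page does not state them: that the 20% fee is taken on the worker-reward subtotal, and that fractional fees round up to a whole credit. Treat the result as an estimate and check credits_used on the completed task for the real figure.

```python
PLATFORM_FEE_PERCENT = 20  # "platform fee is 20%"
MIN_PLATFORM_FEE = 1       # "minimum 1 credit"


def estimate_task_cost(worker_reward: int, assignments: int) -> int:
    """Estimate total credits for a human task.

    Assumes the fee applies to the reward subtotal and rounds up (not documented).
    """
    subtotal = worker_reward * assignments
    # Integer ceiling of subtotal * 20 / 100, avoiding float rounding issues.
    fee = (subtotal * PLATFORM_FEE_PERCENT + 99) // 100
    return subtotal + max(MIN_PLATFORM_FEE, fee)


estimate = estimate_task_cost(worker_reward=5, assignments=4)
```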
Human task options
These fields can be set on any human task type alongside the task-specific input:
- assignments_required (1-10, default: 1)
- consensus_strategy
- worker_reward_credits
- claim_timeout_minutes (5-480, default: 30)
- min_skill_level (1-5)
- task_instructions
Label Image
label_image (3 credits, human)
Workers classify images by selecting from a set of predefined labels.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| image_url | string | ✓ | URL of the image to label |
| labels | string[] | ✓ | Array of possible labels to choose from |
| description | string | | Additional context for workers |
Example
{
"type": "label_image",
"input": {
"image_url": "https://example.com/photo.jpg",
"labels": ["cat", "dog", "bird", "other"],
"description": "Select the animal in this photo"
}
}
Label Text
label_text (2 credits, human)
Workers categorize text into one of the provided categories.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| text | string | ✓ | Text to categorize |
| categories | string[] | ✓ | Array of possible categories |
Example
{
"type": "label_text",
"input": {
"text": "The new iPhone has amazing battery life and a great camera",
"categories": ["positive", "negative", "neutral"]
}
}
Rate Quality
rate_quality (2 credits, human)
Workers rate content quality on a 1–5 scale based on specified criteria.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| title | string | ✓ | Title of the content to rate |
| content | string | ✓ | The content to evaluate |
| criteria | string | | What to evaluate (e.g. clarity, accuracy) |
Example
{
"type": "rate_quality",
"input": {
"title": "Introduction to Machine Learning",
"content": "Machine learning is a subset of AI...",
"criteria": "Rate the accuracy and clarity of this explanation"
}
}
Verify Fact
verify_fact (3 credits, human)
Workers verify whether a claim is true, false, or indeterminate given the provided context.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| claim | string | ✓ | The factual claim to verify |
| context | string | ✓ | Supporting context or evidence |
Example
{
"type": "verify_fact",
"input": {
"claim": "Python was created by Guido van Rossum in 1991",
"context": "Python is a high-level programming language first released in 1991 by Guido van Rossum."
}
}
Moderate Content
moderate_content (2 credits, human)
Workers review content against a policy and decide to approve, reject, or escalate.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| content | string | ✓ | The content to moderate |
| content_type | string | | Type of content (e.g. comment, review, post) |
| policy_context | string | | Moderation policy or guidelines |
Example
{
"type": "moderate_content",
"input": {
"content": "This product is absolutely terrible, worst purchase ever!",
"content_type": "product_review",
"policy_context": "Reject spam, hate speech, and threats. Allow negative opinions."
}
}
Compare & Rank
compare_rank (2 credits, human)
Workers compare two options and select the better one based on given criteria.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| option_a | string | ✓ | First option to compare |
| option_b | string | ✓ | Second option to compare |
| criteria | string | | What to compare on (e.g. readability, accuracy) |
Example
{
"type": "compare_rank",
"input": {
"option_a": "Machine learning uses algorithms to learn from data.",
"option_b": "ML is when computers figure stuff out from examples.",
"criteria": "Which explanation is clearer and more professional?"
}
}
Answer Question
answer_question (4 credits, human)
Workers read content and answer a question about it. Supports free-text and multiple-choice formats.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| content | string | ✓ | Source content to read |
| question | string | ✓ | Question to answer |
| answer_format | "free_text" \| "multiple_choice" | | Answer format (default: free_text) |
| choices | string[] | | Options for multiple-choice format |
Example
{
"type": "answer_question",
"input": {
"content": "The Eiffel Tower was built in 1889 for the World's Fair...",
"question": "When was the Eiffel Tower built?",
"answer_format": "multiple_choice",
"choices": ["1876", "1889", "1901", "1912"]
}
}
Transcription Review
transcription_review (5 credits, human)
Workers review and correct an AI-generated transcript against the original audio.
Input schema
| Field | Type | Req? | Description |
|---|---|---|---|
| audio_url | string | ✓ | URL of the audio file |
| ai_transcript | string | ✓ | AI-generated transcript to review |
| language | string | | Language of the audio (e.g. en, es) |
Example
{
"type": "transcription_review",
"input": {
"audio_url": "https://example.com/recording.mp3",
"ai_transcript": "Welcome to the podast about artifical inteligence...",
"language": "en"
}
}