Documentation

Everything you need to integrate CrowdSorcerer.

Quickstart

Get your first task result in under 2 minutes.

  1. Create an account — get 100 free credits instantly
  2. Generate an API key from the API Keys page
  3. Submit your first task via the REST API or SDK
# Install the SDK
npm install @crowdsourcerer/sdk

# Or call the REST API directly
curl -X POST https://crowdsourcerer.rebaselabs.online/v1/tasks \
  -H "Authorization: Bearer csk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "web_research",
    "input": {
      "url": "https://example.com",
      "instruction": "Extract the main heading and description"
    }
  }'

Authentication

All API requests require a Bearer token in the Authorization header.

Authorization: Bearer csk_YOUR_API_KEY

Two token types are accepted:

  • API Keys (csk_...) — recommended for production. Create in the dashboard.
  • JWT tokens — returned from POST /v1/auth/login. Use for short-lived user sessions.
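
Either token type goes in the same header. As a minimal sketch, a header-building helper in TypeScript (the `authHeaders` name is ours for illustration, not part of the SDK):

```typescript
// Build the Authorization header for a CrowdSorcerer request.
// Accepts either an API key (csk_...) or a JWT; both use the Bearer scheme.
function authHeaders(token: string): Record<string, string> {
  if (!token) throw new Error("missing API key or JWT");
  return {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
}

// Usage with the built-in fetch (Node 18+):
// const res = await fetch("https://crowdsourcerer.rebaselabs.online/v1/tasks", {
//   headers: authHeaders(process.env.CROWDSOURCERER_API_KEY!),
// });
```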

Task lifecycle

Tasks execute asynchronously. States:

queued → running → completed | failed | cancelled

Submit a task → get a task_id → poll GET /v1/tasks/{task_id} or use webhooks. Credits are charged immediately on submission (refunded automatically if the task fails).
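
The submit-then-poll flow above can be sketched as a small helper. This is our illustration, not SDK code; the task fetcher is injectable, so the loop works with any client that can GET /v1/tasks/{task_id}:

```typescript
// Poll a task until it reaches a terminal state, or give up.
type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";
interface Task { id: string; status: TaskStatus; output?: unknown }

const TERMINAL: TaskStatus[] = ["completed", "failed", "cancelled"];

async function pollTask(
  taskId: string,
  getTask: (id: string) => Promise<Task>,
  { intervalMs = 1000, maxAttempts = 60 } = {},
): Promise<Task> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const task = await getTask(taskId);
    if (TERMINAL.includes(task.status)) return task;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`task ${taskId} did not finish within ${maxAttempts} polls`);
}
```

For long-running tasks, webhooks avoid this polling entirely (see Webhooks below).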

Submit a task

POST /v1/tasks
{
  "type": "web_research",
  "input": { "url": "https://example.com" },
  "priority": "normal",           // optional: low | normal | high
  "webhook_url": "https://..."    // optional
}
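
The same request with plain fetch (Node 18+) might look like this; `buildBody` and `submitTask` are illustrative names of ours, and the response is assumed to contain the task_id described above:

```typescript
// Sketch: POST /v1/tasks with the built-in fetch API.
type Priority = "low" | "normal" | "high";

function buildBody(
  type: string,
  input: object,
  opts: { priority?: Priority; webhook_url?: string } = {},
): string {
  // Optional fields are only included when set, matching the request shape above.
  return JSON.stringify({ type, input, ...opts });
}

async function submitTask(apiKey: string, type: string, input: object) {
  const res = await fetch("https://crowdsourcerer.rebaselabs.online/v1/tasks", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: buildBody(type, input, { priority: "normal" }),
  });
  if (!res.ok) throw new Error(`submit failed: ${res.status}`);
  return res.json(); // expected to contain task_id
}
```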

Poll for result

GET /v1/tasks/{task_id}

// Response when completed:
{
  "id": "uuid",
  "type": "web_research",
  "status": "completed",
  "output": { "summary": "...", "raw": {...} },
  "credits_used": 10,
  "duration_ms": 3420,
  "created_at": "2025-01-01T00:00:00Z",
  "completed_at": "2025-01-01T00:00:03Z"
}

TypeScript SDK

npm install @crowdsourcerer/sdk

import { CrowdSorcerer } from "@crowdsourcerer/sdk";

const crowd = new CrowdSorcerer({ apiKey: process.env.CROWDSOURCERER_API_KEY! });

// Typed helpers — submit + wait for result
const result = await crowd.webResearch({ url: "https://news.ycombinator.com" });
const entity = await crowd.entityLookup({ entity_type: "company", name: "Stripe" });
const llm    = await crowd.llmGenerate({ messages: [{ role: "user", content: "Hello!" }] });

// Or submit async + poll manually
const { task_id } = await crowd.submitTask({ type: "screenshot", input: { url: "..." } });
const task = await crowd.getTask(task_id);

// Credits
const balance = await crowd.getCredits();
console.log(balance.available); // → 87

Webhooks

Set webhook_url when creating a task. On completion we POST to that URL with retries (up to 3, exponential backoff):

{
  "task_id": "3f4a1b2c-...",
  "event": "task.completed"
}

Then fetch full details from GET /v1/tasks/{task_id}.
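
A minimal receiver sketch using Node's built-in http module. The handler is ours; `parseWebhook` validates the payload shape shown above, and no signature verification is shown because none is documented here:

```typescript
import * as http from "node:http";

interface WebhookEvent {
  task_id: string;
  event: string; // e.g. "task.completed"
}

// Validate the JSON body against the documented payload shape.
function parseWebhook(body: string): WebhookEvent {
  const data = JSON.parse(body);
  if (typeof data.task_id !== "string" || typeof data.event !== "string") {
    throw new Error("unexpected webhook payload");
  }
  return data;
}

// Respond 200 quickly so retries stop, then look up full task details out of band.
const server = http.createServer((req, res) => {
  let raw = "";
  req.on("data", (chunk) => (raw += chunk));
  req.on("end", () => {
    try {
      const { task_id } = parseWebhook(raw);
      res.writeHead(200).end();
      // ...queue a GET /v1/tasks/{task_id} lookup for task_id here...
      void task_id;
    } catch {
      res.writeHead(400).end();
    }
  });
});
```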

Task type reference

Detailed input schemas for all 10 task types.

Web Research

web_research 10 credits

Scrape and summarize any URL. Supports JavaScript-rendered pages, table extraction, and link discovery.

Input schema

  • url (string): URL to scrape
  • instruction (string): What to extract (default: main content)
  • extract_tables (boolean): Extract tables (default: false)
  • extract_links (boolean): Extract links (default: false)
  • wait_for_selector (string): CSS selector to wait for before scraping

Example

{
  "type": "web_research",
  "input": {
    "url": "https://news.ycombinator.com",
    "instruction": "List the top 5 story titles and their scores"
  }
}

Entity Lookup

entity_lookup 5 credits

Enrich company or person data: description, tech stack, headcount, funding, social profiles.

Input schema

  • entity_type ("company" | "person"): Type of entity to enrich
  • name (string): Company or person name
  • domain (string): Company domain (e.g. stripe.com)
  • linkedin_url (string): LinkedIn profile URL
  • enrich_fields (string[]): Specific fields to include

Example

{
  "type": "entity_lookup",
  "input": {
    "entity_type": "company",
    "domain": "stripe.com"
  }
}

Document Parse

document_parse 3 credits

Extract text, tables, and metadata from PDFs, Word docs, images, and other documents.

Input schema

  • url (string): URL to a document file
  • base64_content (string): Base64-encoded file content
  • mime_type (string): MIME type when using base64 (e.g. application/pdf)
  • extract_tables (boolean): Extract tables (default: true)
  • extract_images (boolean): Extract images (default: false)

Example

{
  "type": "document_parse",
  "input": {
    "url": "https://example.com/report.pdf",
    "extract_tables": true
  }
}

Data Transform

data_transform 2 credits

Transform data between formats: CSV↔JSON, clean/normalize, filter, reshape.

Input schema

  • data (any): Input data (JSON, CSV string, array, etc.)
  • transform (string): Natural language description of the transformation
  • output_format (string): Desired output format: json, csv, or markdown (default: json)

Example

{
  "type": "data_transform",
  "input": {
    "data": [{"name": "Alice", "score": "87"}, {"name": "Bob", "score": "92"}],
    "transform": "Sort by score descending and add a rank column",
    "output_format": "json"
  }
}

LLM Generate

llm_generate 1 credit

Route to the best LLM for the job. Supports Claude, GPT-4, Gemini, Mistral, and more via the RebaseKit LLM router.

Input schema

  • messages (Message[]): Array of {role, content} messages
  • system_prompt (string): System prompt (prepended if no system message in messages)
  • model (string): Model ID (default: claude-3-5-haiku-20241022)
  • temperature (number): Temperature 0–2 (default: 0.7)
  • max_tokens (number): Max output tokens (default: 2048)

Example

{
  "type": "llm_generate",
  "input": {
    "messages": [{"role": "user", "content": "Summarize the pros and cons of microservices in 3 bullet points each."}],
    "model": "claude-3-5-haiku-20241022"
  }
}

Screenshot

screenshot 2 credits

Capture a full-page or viewport screenshot of any URL. Returns a base64-encoded image.

Input schema

  • url (string): URL to screenshot
  • width (number): Viewport width in px (default: 1280)
  • height (number): Viewport height in px (default: 800)
  • full_page (boolean): Capture full scrollable page (default: false)
  • format ("png" | "jpeg"): Image format (default: png)
  • wait_for_selector (string): CSS selector to wait for before capturing

Example

{
  "type": "screenshot",
  "input": {
    "url": "https://example.com",
    "full_page": true,
    "format": "jpeg"
  }
}

Audio Transcribe

audio_transcribe 8 credits

Transcribe audio files with speaker diarization. Supports MP3, WAV, M4A, and more.

Input schema

  • url (string): URL to an audio file
  • base64_audio (string): Base64-encoded audio content
  • language (string): BCP-47 language code (auto-detected if omitted)
  • diarize (boolean): Speaker diarization (default: false)

Example

{
  "type": "audio_transcribe",
  "input": {
    "url": "https://example.com/interview.mp3",
    "diarize": true
  }
}

PII Detect

pii_detect 2 credits

Detect, mask, or vault 30+ PII types (names, emails, SSNs, credit cards, etc.).

Input schema

  • text (string): Text to analyze
  • entities (string[]): Specific PII types to detect (default: all)
  • mask (boolean): Replace PII with [TYPE] tokens (default: false)
  • vault (boolean): Store PII in vault, return placeholder tokens (default: false)

Example

{
  "type": "pii_detect",
  "input": {
    "text": "Call John Smith at john@example.com or 555-123-4567",
    "mask": true
  }
}

Code Execute

code_execute 3 credits

Run Python, JavaScript, TypeScript, or Bash code in a sandboxed environment.

Input schema

  • code (string): Code to execute
  • language ("python" | "javascript" | "typescript" | "bash"): Language (default: python)
  • timeout_seconds (number): Execution timeout in seconds (default: 30)
  • stdin (string): Standard input to pass to the program

Example

{
  "type": "code_execute",
  "input": {
    "code": "import json\ndata = [1, 2, 3, 4, 5]\nprint(json.dumps({'sum': sum(data), 'mean': sum(data)/len(data)}))",
    "language": "python"
  }
}

Web Intel

web_intel 5 credits

AI-powered web intelligence gathering. Search the web and synthesize results into structured insights.

Input schema

  • query (string): Research query or question
  • sources (string[]): Specific domains or URLs to prioritize
  • max_results (number): Max search results to process (default: 10)

Example

{
  "type": "web_intel",
  "input": {
    "query": "Latest AI coding assistants launched in 2025 — compare features and pricing",
    "max_results": 15
  }
}