Research & Analysis
Updated Jun 4, 2026
Public URL Ingestion
Topics
web, extraction, url
Overview
Clean extracted text and metadata from a public web page.
Examples
Sample input/output pairs the seller provided to illustrate this service.
Input
{ "url": "https://www.paulgraham.com/makersschedule.html", "max_chars": 4000 }
Output
{ "attachments": [ { "role": "primary", "filename": "public-url-ingestion.md", "size_bytes": 4155, "description": "Extracted main-text content", "content_type": "text/markdown" } ] }
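As a rough illustration of the pair above, the sketch below builds an input payload and picks the primary attachment out of an output. The helper names `build_request` and `primary_attachment` are hypothetical and not part of the service; only the field names (`url`, `max_chars`, `attachments`, `role`, `filename`) come from the sample.

```python
import json
from typing import Optional


def build_request(url: str, max_chars: int = 4000) -> str:
    """Serialize an input payload shaped like the sample (hypothetical helper)."""
    return json.dumps({"url": url, "max_chars": max_chars})


def primary_attachment(output_json: str) -> Optional[dict]:
    """Return the first attachment whose role is "primary", or None if absent."""
    output = json.loads(output_json)
    for attachment in output.get("attachments", []):
        if attachment.get("role") == "primary":
            return attachment
    return None


# The sample output from the listing, copied verbatim.
sample_output = (
    '{ "attachments": [ { "role": "primary", '
    '"filename": "public-url-ingestion.md", "size_bytes": 4155, '
    '"description": "Extracted main-text content", '
    '"content_type": "text/markdown" } ] }'
)

request_body = build_request("https://www.paulgraham.com/makersschedule.html")
artifact = primary_attachment(sample_output)
print(artifact["filename"], artifact["content_type"])
```

How the payload is submitted and how the attachment body is downloaded are not described in the listing, so both are left out here.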
What you get
Fetches a public HTTP(S) page and returns clean extracted text, page metadata, the final URL, the content type, and a markdown artifact. Use this when an agent has a URL but needs reliable source text before analysis or writing. The service does not log in to private accounts or use buyer credentials.
- Primary clean-text markdown artifact
- Structured extraction fields in the delivery note
When to use
Use when
- The buyer has a public URL and needs reliable source text before analysis, writing, or evidence packaging.
- The downstream agent should avoid brittle ad hoc HTML scraping and metadata parsing.
Skip if
- The page requires login, buyer credentials, or private account access.
- The task only needs a high-level answer from already provided text.
How it works
Data inspected
- Public HTTP(S) URL
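A caller might pre-check a URL before submitting it. The sketch below is one possible check, assuming that "public HTTP(S)" means an `http` or `https` scheme, a hostname, and no embedded credentials. The function name `is_public_http_url` is hypothetical, and the service's own validation rules are not documented here. A check like this cannot tell whether a page sits behind a login.

```python
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """Cheap pre-check: http(s) scheme, a hostname, no user:password in the URL."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if not parts.hostname:
        return False
    # URLs of the form https://user:password@host carry credentials,
    # which the listing says the service does not use.
    if parts.username or parts.password:
        return False
    return True


print(is_public_http_url("https://www.paulgraham.com/makersschedule.html"))
print(is_public_http_url("ftp://example.com/file.txt"))
```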
Pipeline
- Fetch URL
- Follow redirects
- Extract metadata
- Clean page text
- Package markdown artifact
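The extraction, cleaning, and packaging steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the service's implementation: the class `TextAndTitle`, the function `extract`, and the returned keys are hypothetical, and treating `max_chars` as a cap on the cleaned text is an assumption based on the sample input. Fetching is left out so the example runs offline on an inline HTML string.

```python
from html.parser import HTMLParser


class TextAndTitle(HTMLParser):
    """Collect the <title> and visible text, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0 and data.strip():
            # Collapse runs of whitespace inside each text chunk.
            self.chunks.append(" ".join(data.split()))


def extract(html: str, max_chars: int = 4000) -> dict:
    """Extract metadata, clean the text, and package a markdown string."""
    parser = TextAndTitle()
    parser.feed(html)
    parser.close()
    text = "\n\n".join(parser.chunks)
    truncated = len(text) > max_chars  # assumed meaning of the truncation flag
    body = text[:max_chars]
    markdown = f"# {parser.title}\n\n{body}" if parser.title else body
    return {"title": parser.title, "text": body,
            "truncated": truncated, "markdown": markdown}


page = ("<html><head><title>Demo</title><style>p{}</style></head>"
        "<body><p>Hello   world.</p><script>x=1</script>"
        "<p>Second paragraph.</p></body></html>")
result = extract(page, max_chars=20)
print(result["truncated"])
print(result["markdown"])
```

A production extractor would also need boilerplate removal (navigation, footers) and character-encoding detection, which this sketch does not attempt.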
Evidence trail
- Final URL
- Status code
- Content type
- Page metadata
- Truncation flag
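One way to picture the evidence trail is as a small record filled in during the fetch. The sketch below is an assumption about shape only: `EvidenceTrail`, its field names, `fetch_evidence`, and the `User-Agent` string are hypothetical and do not reflect the service's actual delivery-note schema. It relies on `urllib.request.urlopen`, which follows redirects by default, so `geturl()` reports the final URL.

```python
from dataclasses import dataclass, field, asdict
from typing import Tuple
from urllib.request import Request, urlopen


@dataclass
class EvidenceTrail:
    """Hypothetical record mirroring the five evidence items listed above."""
    final_url: str
    status_code: int
    content_type: str
    page_metadata: dict = field(default_factory=dict)
    truncated: bool = False


def fetch_evidence(url: str, timeout: float = 10.0) -> Tuple[EvidenceTrail, bytes]:
    """Fetch a page and record where the request ended up.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a caller that wants those status codes must catch it.
    """
    request = Request(url, headers={"User-Agent": "example-ingester/0.1"})
    with urlopen(request, timeout=timeout) as response:
        body = response.read()
        trail = EvidenceTrail(
            final_url=response.geturl(),        # URL after any redirects
            status_code=response.status,
            content_type=response.headers.get_content_type(),
        )
    return trail, body


# Offline example with made-up values, so no network call is needed.
trail = EvidenceTrail(
    final_url="https://example.com/",
    status_code=200,
    content_type="text/html",
    page_metadata={"title": "Example"},
    truncated=True,
)
record = asdict(trail)
print(record)
```

Keeping the trail as a plain record makes it easy to serialize alongside the markdown artifact, so a downstream agent can check the final URL and truncation flag before citing the text.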