What it does
Markurl is a Node library that fetches a URL and returns a clean Markdown representation plus normalized metadata (title, byline, publish date, excerpts, images, links) suitable for LLM context. It trims output to a token budget by cutting at section boundaries, handles different content types (articles via Readability, GitHub READMEs via raw fetch, generic <p>-density fallback), and merges metadata from JSON-LD, OpenGraph, Twitter cards, and HTML meta tags. Tracking parameters are stripped from extracted links. The library ships with an MCP server, making it directly usable as a tool from Claude Desktop, Claude Code, or any MCP client.
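As a rough illustration of the link cleanup described above, the sketch below removes common tracking parameters from a URL using the standard `URL` API. The function name `stripTrackingParams` and the parameter list are assumptions made for this example; the set of parameters Markurl actually strips is not documented here.

```typescript
// Hypothetical tracking-parameter lists, chosen for illustration only.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "mc_cid", "mc_eid"]);
const TRACKING_PREFIXES = ["utm_"];

// Return the link with tracking parameters removed; other parameters are kept.
function stripTrackingParams(href: string): string {
  let url: URL;
  try {
    url = new URL(href);
  } catch {
    return href; // leave relative or unparsable links untouched
  }
  // Copy the keys first so the parameter list is not mutated while iterating.
  for (const key of Array.from(url.searchParams.keys())) {
    const lower = key.toLowerCase();
    const isTracking =
      TRACKING_PARAMS.has(lower) ||
      TRACKING_PREFIXES.some((prefix) => lower.startsWith(prefix));
    if (isTracking) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(stripTrackingParams("https://example.com/post?id=7&utm_source=news&fbclid=abc"));
// prints: https://example.com/post?id=7
```

Matching by prefix covers the whole `utm_*` family without listing each name, at the cost of also removing any legitimate parameter that happens to start with `utm_`.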
Who it's for
Developers building AI agents, chatbots, or other AI-powered applications that need to fetch and summarize web content. It is aimed in particular at agents that bring recent web information into Claude's context, such as news feeds, research summaries, and real-time data lookups.
Common use cases
- Fetch a news article and return its Markdown + metadata within a 4000-token budget for Claude analysis.
- Extract GitHub repository READMEs as structured Markdown for documentation lookups.
- Pre-process search results before feeding them into an LLM prompt.
- Build an agent that can read web pages directly without requiring a separate browser or HTML parsing tool.
- Supplement Claude's knowledge with real-time web content while keeping token usage predictable.
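To make the token-budget use cases concrete, here is a minimal sketch of trimming Markdown at section boundaries. It is not Markurl's implementation: the helper names (`estimateTokens`, `splitSections`, `trimToBudget`) are hypothetical, and the 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```typescript
// Crude token estimate: roughly 4 characters per token (an assumption for
// illustration; a real tokenizer will give different counts).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split Markdown into sections, each starting at an ATX heading line ("# ...").
// Any text before the first heading becomes its own leading section.
function splitSections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections;
}

// Keep whole sections, in order, and stop before the first one that would
// push the running total over the budget.
function trimToBudget(markdown: string, maxTokens: number): string {
  const kept: string[] = [];
  let used = 0;
  for (const section of splitSections(markdown)) {
    const cost = estimateTokens(section);
    if (used + cost > maxTokens) break;
    kept.push(section);
    used += cost;
  }
  return kept.join("\n");
}

const doc = "# Intro\nShort opening.\n# Details\n" + "x".repeat(400) + "\n# Notes\nTail.";
console.log(trimToBudget(doc, 50)); // keeps only the "# Intro" section
```

Cutting at headings keeps each retained section intact rather than ending mid-sentence. Two limitations of this sketch: a line starting with `#` inside a fenced code block is treated as a heading, and if the first section alone exceeds the budget the result is empty, where a fuller implementation would likely fall back to truncating within that section.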
Setup pitfalls
- Requires Node 20+. Older versions will fail silently during install.
- The optional Playwright fallback (`useBrowser: "auto"`) requires installing the `playwright` peer dependency separately and running `npx playwright install chromium`; skipping this means JS-heavy sites will return incomplete content.
- Rate limiting is built-in (500 ms minimum between requests to the same host by default); if fetching many URLs from the same domain in parallel, expect staggered responses.
- The fetch cache has a 5-minute TTL and 200-entry limit; high-volume agents should implement a persistent cache backend (Redis, etc.) via the `FetchCache` interface.
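The staggering caused by per-host rate limiting can be pictured with a small simulation. The sketch below is not Markurl's scheduler; `scheduleStarts` is a hypothetical helper that assumes every request is issued at time 0 and computes the earliest start time each one could get under a 500 ms per-host gap.

```typescript
const MIN_GAP_MS = 500; // default per-host gap stated above

// For requests all issued at time 0, return the earliest start time (in ms)
// of each so that requests to the same host are at least minGapMs apart.
// Requests to different hosts do not delay one another.
function scheduleStarts(urls: string[], minGapMs: number = MIN_GAP_MS): number[] {
  const nextFree = new Map<string, number>(); // host -> next allowed start time
  return urls.map((u) => {
    const host = new URL(u).host;
    const start = nextFree.get(host) ?? 0;
    nextFree.set(host, start + minGapMs);
    return start;
  });
}

const starts = scheduleStarts([
  "https://a.example/1",
  "https://a.example/2",
  "https://b.example/1",
  "https://a.example/3",
]);
console.log(starts); // start offsets in ms: 0, 500, 0, 1000
```

Under these assumptions, N parallel requests to one host take at least (N - 1) × 500 ms before the last one even starts, so batches that target a single domain should budget time accordingly or spread requests across hosts.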