Add Websites

Train your chatbot with content from web pages.

How It Works

When you add a website URL, Leezy:

  1. Fetches the web page content
  2. Extracts text (removes navigation, ads, etc.)
  3. Processes and embeds the content
  4. Makes it available for chatbot responses

Adding a Website

  1. Go to your chatbot's Sources page
  2. Click Add Source
  3. Select Website
  4. Enter the full URL (including https://)
  5. Click Add

Example URLs:

  • https://example.com/help/getting-started
  • https://docs.example.com/faq
  • https://example.com/products/feature-guide

What Gets Crawled

Single Page Mode

By default, only the specified URL is crawled. This is useful for:

  • Individual help articles
  • Specific FAQ pages
  • Product pages

Content Extraction

The crawler extracts:

  • Main content text
  • Headings and paragraphs
  • Lists and tables
  • Code blocks

The crawler ignores:

  • Navigation menus
  • Sidebars
  • Footer content
  • Advertisements
  • Scripts and styles
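
As a rough illustration of the extract/ignore split above, here is a minimal sketch built on Python's standard html.parser module. It is not Leezy's actual crawler. The names MainTextExtractor and extract_main_text, and the exact set of skipped tags, are assumptions chosen to mirror the two lists above.

```python
from html.parser import HTMLParser

# Tags whose contents are skipped. This set is an assumption that stands in
# for "navigation menus, sidebars, footer content, scripts and styles".
SKIPPED_TAGS = {"nav", "aside", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside any skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the kept text, one line per text fragment."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Running a page through a sketch like this before adding it gives a quick sense of how much text survives once menus and footers are stripped.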

Best Practices

Choose Good URLs

Ideal pages for training:

  • FAQ pages
  • Help center articles
  • Documentation pages
  • Product descriptions
  • Blog posts with useful information

Avoid These URLs

  • Login-required pages (authentication blocks crawling)
  • Dynamic single-page apps (content rendered by JavaScript may not load)
  • Pages with mostly images/videos (little text to extract)
  • Very long pages (may hit content limits)

Keep Content Fresh

Website content changes over time. To update:

  1. Delete the old website source
  2. Re-add the URL

Regular Updates

If your website content changes frequently, refresh it on a regular basis by deleting and re-adding the affected URLs.
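
One way to decide when a re-add is worthwhile is to keep a fingerprint of the page text from the last time you added it and compare it with the current text. The sketch below is a hypothetical helper you would run yourself, not a Leezy feature. The function names fingerprint and needs_refresh are assumptions.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash the text after collapsing whitespace, so layout-only changes are ignored."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_refresh(stored_fingerprint: str, current_text: str) -> bool:
    """True when the current page text differs from the stored fingerprint."""
    return fingerprint(current_text) != stored_fingerprint
```

Store the fingerprint when you add the URL, then delete and re-add the source only when needs_refresh returns True.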

URL Requirements

  • Must include protocol (https:// or http://)
  • Must be publicly accessible
  • Cannot require authentication
  • Must return HTML content
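
The protocol requirement can be checked before you submit a URL. The following is a minimal sketch using Python's urllib.parse. The function name check_url and its messages are assumptions for illustration. The other three requirements (public access, no authentication, HTML content) can only be confirmed by actually requesting the page.

```python
from urllib.parse import urlparse


def check_url(url: str) -> list[str]:
    """Return a list of problems found; an empty list means the URL looks usable."""
    problems = []
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with https:// or http://")
    if not parts.netloc:
        problems.append("URL must include a host name")
    return problems
```

For example, check_url("example.com/faq") reports both problems because the missing protocol also prevents the host name from being recognized.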

Processing Time

Web pages typically process faster than documents:

  • Simple pages: Under 1 minute
  • Content-heavy pages: 1-2 minutes

Troubleshooting

Crawl Failed

Common causes:

  • URL is incorrect or page doesn't exist
  • Page requires login
  • Website blocks crawlers
  • Server timeout

Solutions:

  • Verify the URL opens in your browser
  • Check if the page requires authentication
  • Try a different page from the same site
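
If opening the URL in a browser is not conclusive, a small script can narrow down the cause. The sketch below maps an HTTP status and content type to the causes listed above. It is an assumption about how these failures typically surface, not a description of Leezy's own error handling. The names diagnose and probe are hypothetical.

```python
import urllib.error
import urllib.request


def diagnose(status: int, content_type: str) -> str:
    """Map an HTTP status and Content-Type header to a likely cause."""
    if status in (401, 403):
        return "page requires login or the site blocks crawlers"
    if status == 404:
        return "URL is incorrect or the page does not exist"
    if status >= 500:
        return "server error; try again later"
    if status >= 400:
        return f"server rejected the request (HTTP {status})"
    if "text/html" not in content_type.lower():
        return "page does not return HTML content"
    return "no obvious problem"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL once and describe the likely problem, if any."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return diagnose(response.status, response.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code for 4xx and 5xx responses.
        return diagnose(err.code, "")
    except OSError:
        # Covers URLError (bad host, refused connection) and socket timeouts.
        return "could not reach the server (bad host name or timeout)"
```

A site may treat a script differently from a browser, so a result from probe is a hint rather than proof of what the crawler saw.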

Little or No Content Extracted

Common causes:

  • Page content loads via JavaScript
  • Content is mostly images
  • Anti-bot protection

Solutions:

  • Try pages with static HTML content
  • Use documentation pages instead of marketing pages
  • Consider uploading content as a document instead

Content Is Outdated

Website sources capture content at the time of crawling. To refresh:

  1. Delete the existing website source
  2. Add the URL again

Limits

Plan        Websites
Free        5
Starter     20
Pro         100
Business    Unlimited
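
If you track sources in your own tooling, the limits above can be encoded as a simple lookup. This is a sketch only. The names WEBSITE_LIMITS and remaining_websites are assumptions, and the source does not say whether the limit applies per chatbot or per account.

```python
from typing import Optional

# Website source limits per plan, copied from the table above.
# None stands for "Unlimited".
WEBSITE_LIMITS = {"Free": 5, "Starter": 20, "Pro": 100, "Business": None}


def remaining_websites(plan: str, used: int) -> Optional[int]:
    """Return how many more website sources fit, or None if the plan is unlimited."""
    limit = WEBSITE_LIMITS[plan]
    if limit is None:
        return None
    return max(limit - used, 0)
```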
