Add Websites

Train your chatbot with content from web pages.

How It Works

When you add a website URL, Leezy:

Fetches the web page content
Extracts text (removes navigation, ads, etc.)
Processes and embeds the content
Makes it available for chatbot responses
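Leezy does not document how it processes or embeds a page. For orientation only: many retrieval systems split the extracted text into overlapping chunks and embed each chunk separately. The sketch below shows just that splitting step, assuming word-based windows. `chunk_text` and its window sizes are hypothetical, not Leezy's settings, and the embedding call itself is left out.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split extracted page text into overlapping word windows.

    Hypothetical helper: max_words and overlap are illustrative values,
    not Leezy's actual chunking parameters.
    """
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary available in both neighboring chunks, so a question about it can still match one of them. Each chunk would then be passed to an embedding model.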

Adding a Website

Go to your chatbot's Sources page
Click Add Source
Select Website
Enter the full URL (including https://)
Click Add

Example URLs:

https://example.com/help/getting-started
https://docs.example.com/faq
https://example.com/products/feature-guide

What Gets Crawled

Single Page Mode

By default, only the specified URL is crawled. This is useful for:

Individual help articles
Specific FAQ pages
Product pages

Content Extraction

The crawler extracts:

Main content text
Headings and paragraphs
Lists and tables
Code blocks

The crawler ignores:

Navigation menus
Sidebars
Footer content
Advertisements
Scripts and styles
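To make the keep/ignore lists above concrete, here is a minimal sketch of that kind of extraction using Python's standard `html.parser`. It is not Leezy's crawler. The assumptions are that page chrome is marked up with `nav`, `aside`, and `footer` elements and that headings, paragraphs, list items, and table rows each become one line. Advertisements usually need site-specific rules and are not handled here. `MainTextExtractor` and `extract_main_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed rules modeled on the lists above, not Leezy's configuration.
SKIP_TAGS = {"nav", "aside", "footer", "script", "style", "noscript"}  # chrome and non-text
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "pre", "tr", "div"}  # new line
CELL_TAGS = {"td", "th"}  # table cells are separated by a space


class MainTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0           # > 0 while inside a skipped element
        self.lines: list[str] = []    # finished lines of text
        self.current: list[str] = []  # fragments of the line being built

    def _flush(self) -> None:
        # Collapse whitespace. This also flattens <pre> code blocks,
        # which a production extractor would want to preserve.
        line = " ".join("".join(self.current).split())
        if line:
            self.lines.append(line)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in BLOCK_TAGS:
                self._flush()
            elif tag in CELL_TAGS:
                self.current.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def get_text(self) -> str:
        self._flush()
        return "\n".join(self.lines)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

For example, a page with a `<nav>` link, an `<h1>`, a `<p>`, and a `<footer>` yields only the heading and paragraph text, one per line.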

Best Practices

Choose Good URLs

Ideal pages for training:

FAQ pages
Help center articles
Documentation pages
Product descriptions
Blog posts with useful information

Avoid These URLs

Login-required pages (authentication blocks crawling)
Single-page apps that render content with JavaScript (the content may not be captured)
Pages with mostly images/videos
Very long pages (may hit content limits)

Keep Content Fresh

Website content changes over time. To update:

Delete the old website source
Re-add the URL

:::tip Regular Updates
If your website content changes frequently, re-crawl it regularly by deleting and re-adding the affected URLs.
:::
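Refreshing a source means deleting it and adding it again. To avoid doing that for pages that have not changed, you can keep your own fingerprint of each page's text and compare it before re-adding. This is a minimal sketch under that assumption. Leezy is not documented to expose stored page content or hashes, so `content_fingerprint` and `needs_refresh` are hypothetical helpers that run entirely on your side.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """SHA-256 of the page text with whitespace normalized.

    Normalizing whitespace means that reflowed markup alone does not
    count as a content change.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_refresh(stored_fingerprint: str, current_text: str) -> bool:
    """True if the page text differs from the fingerprint saved last time."""
    return content_fingerprint(current_text) != stored_fingerprint
```

Save the fingerprint when you add the URL. On your next check, re-add the source only if `needs_refresh` returns `True`.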

URL Requirements

Must include the protocol (https:// or http://)
Must be publicly accessible
Cannot require authentication
Must return HTML content
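You can catch the most common format mistake, a missing protocol, before submitting a URL. The sketch below covers only that format rule and a missing host name, using the standard `urllib.parse.urlparse`. Public accessibility, authentication, and an HTML response can only be confirmed by actually fetching the page. `check_source_url` is a hypothetical helper, not part of Leezy.

```python
from urllib.parse import urlparse


def check_source_url(url: str) -> list[str]:
    """Return format problems found in a URL (empty list = format looks OK).

    Hypothetical pre-check covering only the protocol and host; it does not
    test whether the page is public or returns HTML.
    """
    problems = []
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        problems.append("missing or unsupported protocol (use https:// or http://)")
    if not parsed.netloc:
        problems.append("missing host name")
    return problems
```

For example, `check_source_url("example.com/help")` reports both problems, because without `https://` the parser treats the whole string as a path.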

Processing Time

Web pages typically process faster than documents:

Simple pages: Under 1 minute
Content-heavy pages: 1-2 minutes

Troubleshooting

Crawl Failed

Common causes:

URL is incorrect or page doesn't exist
Page requires login
Website blocks crawlers
Server timeout

Solutions:

Verify the URL opens in your browser
Check if the page requires authentication
Try a different page from the same site
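Beyond opening the URL in a browser, you can fetch it from a script to see the raw HTTP status and `Content-Type`. A browser can succeed where an automated client is refused, and that difference points toward crawler blocking. This is a rough diagnostic sketch using only the standard library. The mapping from status codes to causes is a heuristic based on the common causes above, not Leezy's error reporting. `diagnose_fetch`, `fetch_and_diagnose`, and the `crawl-check/0.1` user agent are hypothetical. Leezy's crawler may be treated differently by the site than this script is.

```python
import urllib.error
import urllib.request


def diagnose_fetch(status: int, content_type: str) -> str:
    """Map an HTTP status and Content-Type header to a likely crawl problem."""
    if status == 401:
        return "page requires login"
    if status == 403:
        return "access denied: the page requires login or the site blocks crawlers"
    if status in (404, 410):
        return "URL is incorrect or page doesn't exist"
    if status == 429:
        return "rate limited: the site may be blocking crawlers"
    if status in (408, 504):
        return "server timeout"
    if status >= 500:
        return "server error: try again later"
    if not 200 <= status < 300:
        return f"unexpected HTTP status {status}"
    if "text/html" not in content_type.lower():
        return f"page is not HTML (Content-Type: {content_type or 'missing'})"
    return "ok"


def fetch_and_diagnose(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL once and diagnose the result (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "crawl-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return diagnose_fetch(response.status, response.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry a status code.
        return diagnose_fetch(err.code, err.headers.get("Content-Type", "") if err.headers else "")
    except OSError as err:
        # URLError (DNS failure, refused connection) and timeouts end up here.
        return f"could not connect: {err}"
```

Run `print(fetch_and_diagnose("https://example.com/help/getting-started"))` from a machine with internet access. Anything other than `ok` suggests which of the causes above applies.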

Little or No Content Extracted

Common causes:

Page content loads via JavaScript
Content is mostly images
Anti-bot protection

Solutions:

Try pages with static HTML content
Use documentation pages instead of marketing pages
Consider uploading content as a document instead
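One quick way to tell which of these causes applies is to count the words of visible text in the page's raw HTML, the way a crawler that does not run JavaScript would see it. The sketch below is a heuristic under two assumptions: the 50-word threshold is arbitrary, and a near-empty page that includes scripts is probably rendered client-side. It cannot detect anti-bot challenge pages. `diagnose_thin_page` and `VisibleTextCounter` are hypothetical names, and this is not how Leezy evaluates pages.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = {"script", "style", "noscript", "template"}  # text here is not visible


class VisibleTextCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.words = 0
        self.scripts = 0
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        elif tag == "img":
            self.images += 1
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.words += len(data.split())


def diagnose_thin_page(html: str, min_words: int = 50) -> tuple[bool, str]:
    """Return (is_thin, reason) for the static HTML of a page."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    if counter.words >= min_words:
        return False, f"{counter.words} words of static text"
    if counter.scripts:
        return True, f"only {counter.words} visible word(s); content is probably loaded via JavaScript"
    if counter.images:
        return True, f"only {counter.words} visible word(s); content is probably mostly images"
    return True, f"only {counter.words} visible word(s)"
```

If a page is flagged as thin, the solutions above apply: pick a static page, or upload the content as a document.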

Content Is Outdated

Website sources capture content at the time of crawling. To refresh:

Delete the existing website source
Add the URL again

Limits

Plan	Websites
Free	5
Starter	20
Pro	100
Business	Unlimited

Next Steps

Create Q&A Pairs - Add specific answers
Upload Documents - Add file-based content
Test Your Chatbot - Verify training
