Add Websites
Train your chatbot with content from web pages.
How It Works
When you add a website URL, Leezy:
- Fetches the web page content
- Extracts the main text (removing navigation, ads, and similar page elements)
- Processes and embeds the content
- Makes it available for chatbot responses
Adding a Website
- Go to your chatbot's Sources page
- Click Add Source
- Select Website
- Enter the full URL (including https://)
- Click Add
Example URLs:
- https://example.com/help/getting-started
- https://docs.example.com/faq
- https://example.com/products/feature-guide
What Gets Crawled
Single Page Mode
By default, only the specified URL is crawled. This is useful for:
- Individual help articles
- Specific FAQ pages
- Product pages
Content Extraction
The crawler extracts:
- Main content text
- Headings and paragraphs
- Lists and tables
- Code blocks
The crawler ignores:
- Navigation menus
- Sidebars
- Footer content
- Advertisements
- Scripts and styles
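To make the extract/ignore split above concrete, here is a minimal sketch using Python's standard `html.parser`. It is not Leezy's actual crawler: the `SKIPPED_TAGS` set, the `MainTextExtractor` class, and the `extract_main_text` function are hypothetical names chosen for illustration, and a real extractor would likely use more signals than tag names alone.

```python
from html.parser import HTMLParser

# Tags whose contents are skipped entirely. This set is an assumption that
# mirrors the "ignores" list above; it is not Leezy's documented rule set.
SKIPPED_TAGS = {"nav", "aside", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the kept text chunks joined by newlines."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Running `extract_main_text` on a page before adding it gives a rough preview of which text is likely to survive extraction. Advertisements are not handled here, since they have no dedicated HTML tag.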
Best Practices
Choose Good URLs
Ideal pages for training:
- FAQ pages
- Help center articles
- Documentation pages
- Product descriptions
- Blog posts with useful information
Avoid These URLs
- Login-required pages (authentication blocks crawling)
- Dynamic single-page apps (content rendered by JavaScript may not load)
- Pages with mostly images/videos
- Very long pages (may hit content limits)
Keep Content Fresh
Website content changes over time. To update:
- Delete the old website source
- Re-add the URL
Regular Updates
If your website content changes frequently, refresh it on a regular schedule by removing and re-adding the affected URLs.
URL Requirements
- Must include protocol (https:// or http://)
- Must be publicly accessible
- Cannot require authentication
- Must return HTML content
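The format-related requirements above can be checked before you submit a URL. The sketch below covers only the protocol and domain checks. It cannot tell whether a page is public, requires authentication, or returns HTML. The function name `check_url_format` and its messages are hypothetical, not part of Leezy.

```python
from urllib.parse import urlparse


def check_url_format(url):
    """Return a list of problems found; an empty list means the format looks fine."""
    problems = []
    parsed = urlparse(url.strip())
    # The protocol must be present and must be http or https.
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with https:// or http://")
    # Without a protocol, urlparse treats the whole string as a path,
    # so the domain part (netloc) comes back empty.
    if not parsed.netloc:
        problems.append("URL must include a domain, such as example.com")
    return problems
```

For example, `check_url_format("example.com/help")` reports both problems because the missing protocol also prevents the domain from being recognized.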
Processing Time
Web pages typically process faster than documents:
- Simple pages: Under 1 minute
- Content-heavy pages: 1-2 minutes
Troubleshooting
Crawl Failed
Common causes:
- URL is incorrect or page doesn't exist
- Page requires login
- Website blocks crawlers
- Server timeout
Solutions:
- Verify the URL opens in your browser
- Check if the page requires authentication
- Try a different page from the same site
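If opening the URL in a browser is not conclusive, the HTTP status code the server returns often points to one of the causes listed above. The following sketch is one possible way to check. The mapping from status codes to causes is an assumption based on common HTTP conventions, not on how Leezy reports failures, and the names `fetch_status` and `likely_cause` are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request to url.

    Network failures such as DNS errors or timeouts raise an exception.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "crawl-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions


def likely_cause(status_code):
    """Map a status code to the most likely cause from the list above."""
    if status_code in (404, 410):
        return "URL is incorrect or page doesn't exist"
    if status_code == 401:
        return "Page requires login"
    if status_code in (403, 429):
        return "Website may be blocking crawlers"
    if status_code in (408, 504):
        return "Server timeout"
    if 200 <= status_code < 300:
        return "Page is reachable"
    return "Unrecognized status; try opening the URL in a browser"
```

A typical use is `likely_cause(fetch_status("https://example.com/help/getting-started"))`. Some sites answer scripts differently than browsers, so treat the result as a hint rather than a diagnosis.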
Little or No Content Extracted
Common causes:
- Page content loads via JavaScript
- Content is mostly images
- Anti-bot protection
Solutions:
- Try pages with static HTML content
- Use documentation pages instead of marketing pages
- Consider uploading content as a document instead
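One way to judge whether a page has enough static HTML content is to count the visible words in its raw source, before any JavaScript runs. The sketch below does that with Python's standard `html.parser`. The 50-word threshold and the name `looks_javascript_rendered` are arbitrary assumptions for illustration, not values Leezy uses.

```python
from html.parser import HTMLParser


class VisibleWordCounter(HTMLParser):
    """Counts words in the raw HTML, excluding script and style contents."""

    def __init__(self):
        super().__init__()
        self._code_depth = 0  # greater than 0 while inside <script> or <style>
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._code_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._code_depth > 0:
            self._code_depth -= 1

    def handle_data(self, data):
        if self._code_depth == 0:
            self.words += len(data.split())


def looks_javascript_rendered(html, min_words=50):
    """Return True when the raw HTML has fewer visible words than min_words."""
    counter = VisibleWordCounter()
    counter.feed(html)
    counter.close()
    return counter.words < min_words
```

If this returns True for a page's source, the content probably loads via JavaScript or consists mostly of images, and uploading it as a document is likely the better option.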
Content Is Outdated
Website sources capture content at the time of crawling. To refresh:
- Delete the existing website source
- Add the URL again
Limits
| Plan | Websites |
|---|---|
| Free | 5 |
| Starter | 20 |
| Pro | 100 |
| Business | Unlimited |
Next Steps
- Create Q&A Pairs - Add specific answers
- Upload Documents - Add file-based content
- Test Your Chatbot - Verify training