❎ anti-AI measures
144
README.md
@@ -24,6 +24,13 @@ A Flask-based webcomic website with server-side rendering using Jinja2 templates
- [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
- [SEO Checklist for Launch](#seo-checklist-for-launch)
- [Common SEO Questions](#common-seo-questions)
- [Content Protection & AI Scraping Prevention](#content-protection--ai-scraping-prevention)
- [Protection Features](#protection-features)
- [Optional: Additional Protection Measures](#optional-additional-protection-measures)
- [Important Limitations](#important-limitations)
- [Customizing Your Terms](#customizing-your-terms)
- [Testing Your Protection](#testing-your-protection)
- [Reporting Violations](#reporting-violations)
- [Project Structure](#project-structure)
- [Setup](#setup)
- [Environment Variables](#environment-variables)
@@ -457,6 +464,143 @@ A: Hashtags don't directly affect search engine SEO, but they help social media
**Q: Should I create a blog for my comic?**
A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.

## Content Protection & AI Scraping Prevention

Sunday Comics includes built-in measures to discourage AI web scrapers from using your creative work for training machine learning models without permission.

### Protection Features

#### robots.txt Blocking

The dynamically generated `robots.txt` file blocks known AI crawlers while still allowing legitimate search engines:

**Blocked AI bots:**
- **GPTBot** & **ChatGPT-User** (OpenAI)
- **CCBot** (Common Crawl - used by many AI companies)
- **anthropic-ai** & **Claude-Web** (Anthropic)
- **Google-Extended** (Google's AI training crawler, separate from Googlebot)
- **PerplexityBot** (Perplexity AI)
- **Omgilibot**, **Diffbot**, **Bytespider**, **FacebookBot**, **ImagesiftBot**, **cohere-ai**

**Note:** Regular search engine crawlers (Googlebot, Bingbot, etc.) are still allowed so your comic can be discovered through search.

The robots.txt also includes a reference to your Terms of Service for transparency.
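
The project's actual `robots.txt` route is not shown in this README, so the following is only a minimal sketch of how such a file could be assembled. The helper name `build_robots_txt`, the `AI_USER_AGENTS` list variable, and the comment line carrying the terms link are assumptions made for illustration; only the bot names come from the list above.

```python
# Hypothetical helper, not the project's real implementation.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "PerplexityBot", "Omgilibot", "Diffbot",
    "Bytespider", "FacebookBot", "ImagesiftBot", "cohere-ai",
]


def build_robots_txt(terms_url, blocked_agents=AI_USER_AGENTS):
    """Return robots.txt text that disallows the listed agents and allows all others."""
    # Assumption: the terms reference is carried as a plain comment line.
    lines = [f"# Terms of Service: {terms_url}", ""]
    for agent in blocked_agents:
        # One group per AI crawler, disallowing the whole site
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all group: every other crawler (Googlebot, Bingbot, ...) stays allowed
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("https://yourcomic.com/terms"))
```

Keeping each blocked crawler in its own group, with the catch-all group last, makes it easy to add or remove a bot without touching the rule for ordinary search engines.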

#### HTML Meta Tags

Every page includes meta tags that ask AI scrapers not to use the content:

```html
<meta name="robots" content="noai, noimageai">
<meta name="googlebot" content="noai, noimageai">
```

- `noai` - Asks crawlers not to use text content for AI training
- `noimageai` - Asks crawlers not to use images (your comics) for AI training

Both directives are non-standard, so they only affect crawlers that choose to honor them.

#### Terms of Service

A comprehensive Terms of Service page at `/terms` legally prohibits:
- Using content for AI training or machine learning
- Scraping or harvesting content for datasets
- Creating derivative works using AI trained on your content
- Text and Data Mining (TDM) without permission

The Terms page is automatically linked in your footer and includes:
- Copyright protection assertions
- DMCA enforcement information
- TDM rights reservation (EU Directive 2019/790 Article 4)
- Clear permitted use guidelines

### Optional: Additional Protection Measures

#### HTTP Headers (Advanced)

To send the same signal with every response, including images and other non-HTML files, you can add an HTTP header. Add this to `app.py` after the Flask `app` object is created:

```python
@app.after_request
def add_ai_blocking_headers(response):
    """Add headers to discourage AI scraping."""
    # X-Robots-Tag carries the same directives as the robots meta tag
    response.headers['X-Robots-Tag'] = 'noai, noimageai'
    return response
```

#### TDM Reservation File (Advanced)

Create a `/tdmrep.json` endpoint to formally reserve Text and Data Mining rights:

```python
from flask import jsonify  # can be merged into the existing Flask import in app.py


@app.route('/tdmrep.json')
def tdm_reservation():
    """TDM (Text and Data Mining) reservation"""
    # SITE_URL is expected to hold your site's base URL without a trailing slash
    return jsonify({
        "tdm": {
            "reservation": 1,  # 1 = TDM rights are reserved
            "policy": f"{SITE_URL}/terms"
        }
    })
```
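
Crawlers that follow the W3C Community Group's TDM Reservation Protocol (TDMRep) look for the file at `/.well-known/tdmrep.json`, formatted as a JSON array of rules keyed by `location`, `tdm-reservation`, and `tdm-policy`. The sketch below builds a payload in that shape. The helper name `build_tdmrep_rules` and the default `location` pattern are assumptions for illustration; check the specification's path-matching rules, and whether an HTML terms page is an acceptable policy target, before relying on it.

```python
import json


def build_tdmrep_rules(policy_url, location="/"):
    """Return a TDMRep-style rule list (hypothetical helper, not part of the project)."""
    return [
        {
            "location": location,      # assumed path pattern covering the whole site
            "tdm-reservation": 1,      # 1 = TDM rights are reserved
            "tdm-policy": policy_url,  # URL describing the reservation terms
        }
    ]


if __name__ == "__main__":
    # Serialize the rules as they would be served from /.well-known/tdmrep.json
    print(json.dumps(build_tdmrep_rules("https://yourcomic.com/terms"), indent=2))
```

In a Flask app this list could be returned with `jsonify` from a route registered at `/.well-known/tdmrep.json`, alongside or instead of the endpoint above.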

### Important Limitations

**These measures are voluntary** - they only work if AI companies respect them:

✅ **What this does:**
- Signals your intent to protect your content
- Provides legal grounding for DMCA takedowns
- Blocks responsible AI companies that honor robots.txt
- Makes your copyright stance clear to users and crawlers

❌ **What this doesn't do:**
- Physically prevent determined bad actors from scraping
- Remove already-scraped data from existing datasets
- Guarantee that all AI companies will honor these signals

**Companies that claim to honor robots.txt:**
- OpenAI (GPTBot blocking)
- Anthropic (anthropic-ai blocking)
- Google (Google-Extended blocking, separate from search)

### Customizing Your Terms

Edit `content/terms.md` (relative to the project root) to customize:

1. **Jurisdiction** - Add your country/state for legal clarity
2. **Permitted use** - Adjust what you allow (fan art, sharing, etc.)
3. **Contact info** - Automatically populated from `comics_data.py`

The Terms page uses Jinja2 template variables that pull from your configuration:
- `{{ copyright_name }}` - From `COPYRIGHT_NAME` in `comics_data.py`
- `{{ social_email }}` - From `SOCIAL_EMAIL` in `comics_data.py`

### Testing Your Protection

**Verify robots.txt:**
```bash
curl https://yourcomic.com/robots.txt
```

You should see AI bot blocks and a link to your terms.
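
Reading the file by eye works, but the rules can also be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocked_agents` and the sample text are assumptions for illustration, and in practice you would pass in the text returned by the `curl` command above.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, path="/"):
    """Return the agents from `agents` that robots_txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Illustrative sample only; substitute the real contents of your robots.txt
SAMPLE = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(SAMPLE, ["GPTBot", "Googlebot"]))  # prints ['GPTBot']
```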

**Check meta tags:**
View page source and look for:
```html
<meta name="robots" content="noai, noimageai">
```
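
The same check can be scripted against saved page source. This is a minimal sketch using the standard `html.parser` module; the class name `RobotsMetaParser` and the function `has_ai_opt_out` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated; normalize case and whitespace
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_ai_opt_out(html):
    """Return True if the page's robots meta tag carries both noai and noimageai."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {"noai", "noimageai"} <= parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noai, noimageai"></head>'
    print(has_ai_opt_out(page))
```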

**Validate Terms page:**
Visit `https://yourcomic.com/terms` to ensure it renders correctly.

### Reporting Violations

If you discover your work in an AI training dataset or being used without permission:

1. **Document the violation** - Screenshots, URLs, timestamps
2. **Review their TOS** - Many AI services have content dispute processes
3. **Send a DMCA takedown notice** - Cite your Terms of Service to support the claim
4. **Contact the platform** - Write from the `SOCIAL_EMAIL` address listed on your Terms page

Resources:
- [US Copyright Office DMCA](https://www.copyright.gov/dmca/)
- [EU Copyright Directive](https://digital-strategy.ec.europa.eu/en/policies/copyright-legislation)

## Project Structure

```