❎ moar anti-AI measures

❎ anti-AI measures
2025-11-15 15:49:34 +10:00 · 2025-11-15 15:43:32 +10:00
5 changed files with 365 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -24,6 +24,13 @@ A Flask-based webcomic website with server-side rendering using Jinja2 templates
  - [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
  - [SEO Checklist for Launch](#seo-checklist-for-launch)
  - [Common SEO Questions](#common-seo-questions)
+- [Content Protection & AI Scraping Prevention](#content-protection--ai-scraping-prevention)
+  - [Protection Features](#protection-features)
+  - [Advanced: Image-Level Protection Tools](#advanced-image-level-protection-tools)
+  - [Important Limitations](#important-limitations)
+  - [Customizing Your Terms](#customizing-your-terms)
+  - [Testing Your Protection](#testing-your-protection)
+  - [Reporting Violations](#reporting-violations)
 - [Project Structure](#project-structure)
 - [Setup](#setup)
 - [Environment Variables](#environment-variables)
@@ -457,6 +464,168 @@ A: Hashtags don't directly affect search engine SEO, but they help social media
 **Q: Should I create a blog for my comic?**
 A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.

+## Content Protection & AI Scraping Prevention
+
+Sunday Comics includes built-in measures to discourage AI web scrapers from using your creative work for training machine learning models without permission.
+
+### Protection Features
+
+#### robots.txt Blocking
+The dynamically generated `robots.txt` file blocks known AI crawlers while still allowing legitimate search engines:
+
+**Blocked AI bots:**
+- **GPTBot** & **ChatGPT-User** (OpenAI)
+- **CCBot** (Common Crawl - used by many AI companies)
+- **anthropic-ai** & **Claude-Web** (Anthropic)
+- **Google-Extended** (Google's control token for AI training use; it is not a separate crawler, and Googlebot search indexing is unaffected)
+- **PerplexityBot** (Perplexity AI)
+- **Omgilibot**, **Diffbot**, **Bytespider**, **FacebookBot**, **ImagesiftBot**, **cohere-ai**
+
+**Note:** Regular search engine crawlers (Googlebot, Bingbot, etc.) are still allowed so your comic can be discovered through search.
+
+The robots.txt also includes a reference to your Terms of Service for transparency.
+
+#### HTML Meta Tags
+Every page includes meta tags that signal to AI scrapers not to use the content:
+
+```html
+<meta name="robots" content="noai, noimageai">
+<meta name="googlebot" content="noai, noimageai">
+```
+
+- `noai` - Non-standard directive asking crawlers not to use text content for AI training
+- `noimageai` - Non-standard directive asking crawlers not to use images (your comics) for AI training
+
+#### Terms of Service
+A comprehensive Terms of Service page at `/terms` legally prohibits:
+- Using content for AI training or machine learning
+- Scraping or harvesting content for datasets
+- Creating derivative works using AI trained on your content
+- Text and Data Mining (TDM) without permission
+
+The Terms page is automatically linked in your footer and includes:
+- Copyright protection assertions
+- DMCA enforcement information
+- TDM rights reservation (EU Directive 2019/790 Article 4)
+- Clear permitted use guidelines
+
+#### HTTP Headers
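+Sunday Comics automatically adds an `X-Robots-Tag: noai, noimageai` header to every response served by Flask, repeating the same opt-out signal at the HTTP level. This also covers non-HTML responses, such as images, that cannot carry meta tags.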
+
+#### TDM Reservation File
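+The `/tdmrep.json` endpoint declares a Text and Data Mining rights reservation under EU Directive 2019/790 and points to your Terms of Service. Note that the W3C TDM Reservation Protocol looks for this file at `/.well-known/tdmrep.json`, as a list of rules with `location`, `tdm-reservation`, and `tdm-policy` keys, so crawlers that implement the protocol may not read the endpoint in its current form.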
+
+### Advanced: Image-Level Protection Tools
+
+For artists who want to protect their work at the image level, consider these specialized tools:
+
+#### Glaze (Style Protection)
+**What it does:** Adds subtle changes to images that are designed to keep AI models from accurately learning your artistic style.
+
+**Best for:**
+- Protecting your unique art style from being copied by AI
+- Making AI-generated imitations look wrong or distorted
+- Artists concerned about style mimicry (e.g., "draw like [artist name]" prompts)
+
+**How to use:**
+1. Download from [glaze.cs.uchicago.edu](https://glaze.cs.uchicago.edu)
+2. Process your comic images before uploading to your site
+3. The changes are designed to be hard for humans to notice (they can be more visible on flat-colored art) while confusing AI models
+
+**Trade-offs:**
+- Processing time: Can take several minutes per image
+- Slight file size increase
+- Requires reprocessing all comics
+
+#### Nightshade (Data Poisoning)
+**What it does:** Makes images appear as something completely different to AI models while looking normal to humans.
+
+**Best for:**
+- Active defense against unauthorized AI training
+- Making scraped data actively harmful to AI models
+- Artists who want to fight back against scraping
+
+**How to use:**
+1. Download from [nightshade.cs.uchicago.edu](https://nightshade.cs.uchicago.edu)
+2. Process images before uploading (can combine with Glaze)
+3. AI models trained on these images will produce incorrect results
+
+**Trade-offs:**
+- More aggressive than Glaze (may violate some ToS)
+- Processing time similar to Glaze
+- Ongoing research tool, effectiveness may vary
+
+#### Recommendations
+- **Use Glaze if:** You want passive protection for your art style
+- **Use Nightshade if:** You want active defense and accept the risks
+- **Use both if:** Maximum protection is your priority
+- **Combine with Sunday Comics protections:** These tools complement the web-based protections (robots.txt, meta tags, etc.)
+
+**Note:** Both tools are free projects from the University of Chicago's SAND Lab, specifically designed to help artists protect their work from AI exploitation.
+
+### Important Limitations
+
+**These measures are voluntary** - they only work if AI companies respect them:
+
+✅ **What this does:**
+- Signals your intent to protect your content
+- Provides legal grounding for DMCA takedowns
+- Blocks responsible AI companies that honor robots.txt
+- Makes your copyright stance clear to users and crawlers
+
+❌ **What this doesn't do:**
+- Cannot physically prevent determined bad actors from scraping
+- Cannot remove already-scraped historical data from existing datasets
+- No guarantee all AI companies will honor these signals
+
+**Companies that claim to honor robots.txt:**
+- OpenAI (GPTBot blocking)
+- Anthropic (anthropic-ai blocking)
+- Google (Google-Extended blocking, separate from search)
+
+### Customizing Your Terms
+
+Edit `content/terms.md` (relative to the project root) to customize:
+
+1. **Jurisdiction** - Add your country/state for legal clarity
+2. **Permitted use** - Adjust what you allow (fan art, sharing, etc.)
+3. **Contact info** - Automatically populated from `comics_data.py`
+
+The Terms page uses Jinja2 template variables that pull from your configuration:
+- `{{ copyright_name }}` - From `COPYRIGHT_NAME` in `comics_data.py`
+- `{{ social_email }}` - From `SOCIAL_EMAIL` in `comics_data.py`
+
+### Testing Your Protection
+
+**Verify robots.txt:**
+```bash
+curl https://yourcomic.com/robots.txt
+```
+
+You should see AI bot blocks and a link to your terms.
+
+**Check meta tags:**
+View page source and look for:
+```html
+<meta name="robots" content="noai, noimageai">
+```
+
+**Validate Terms page:**
+Visit `https://yourcomic.com/terms` to ensure it renders correctly.
+
+### Reporting Violations
+
+If you discover your work in an AI training dataset or being used without permission:
+
+1. **Document the violation** - Screenshots, URLs, timestamps
+2. **Review their TOS** - Many AI services have content dispute processes
+3. **Send a DMCA takedown notice** - Your copyright ownership is the legal basis; your Terms of Service document the restrictions you stated
+4. **Contact the platform** - Use your `SOCIAL_EMAIL` from the Terms page
+
+Resources:
+- [US Copyright Office DMCA](https://www.copyright.gov/dmca/)
+- [EU Copyright Directive](https://digital-strategy.ec.europa.eu/en/policies/copyright-legislation)
+
 ## Project Structure

 ```
--- a/app.py
+++ b/app.py
@@ -19,6 +19,13 @@ app = Flask(__name__)
 app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'your-secret-key')


+@app.after_request
+def add_ai_blocking_headers(response):
+    """Add headers to discourage AI scraping"""
+    response.headers['X-Robots-Tag'] = 'noai, noimageai'
+    return response
+
+
@app.context_processor
 def inject_global_settings():
    """Make global settings available to all templates"""
@@ -217,6 +224,28 @@ def about():
    return render_template('page.html', title='About', content=html_content)


+@app.route('/terms')
+def terms():
+    """Terms of Service page"""
+    from jinja2 import Template
+    # Read and render the markdown file with template variables
+    terms_path = os.path.join(os.path.dirname(__file__), 'content', 'terms.md')
+    try:
+        with open(terms_path, 'r', encoding='utf-8') as f:
+            content = f.read()
+        # First render as Jinja template to substitute variables
+        template = Template(content)
+        rendered_content = template.render(
+            copyright_name=COPYRIGHT_NAME,
+            social_email=SOCIAL_EMAIL if SOCIAL_EMAIL else '[Contact Email]'
+        )
+        # Then convert markdown to HTML
+        html_content = markdown.markdown(rendered_content)
+    except FileNotFoundError:
+        html_content = '<p>Terms of Service content not found.</p>'
+    return render_template('page.html', title='Terms of Service', content=html_content)
+
+
@app.route('/api/comics')
 def api_comics():
    """API endpoint - returns all comics as JSON"""
@@ -244,6 +273,9 @@ def robots():
    """Generate robots.txt dynamically with correct SITE_URL"""
    from flask import Response
    robots_txt = f"""# Sunday Comics - Robots.txt
+# Content protected by copyright. AI training prohibited.
+# See terms: {SITE_URL}/terms
+
 User-agent: *
 Allow: /

@@ -252,10 +284,61 @@ Sitemap: {SITE_URL}/sitemap.xml

 # Disallow API endpoints from indexing
 Disallow: /api/
+
+# Block AI crawlers and scrapers
+User-agent: GPTBot
+Disallow: /
+
+User-agent: ChatGPT-User
+Disallow: /
+
+User-agent: CCBot
+Disallow: /
+
+User-agent: anthropic-ai
+Disallow: /
+
+User-agent: Claude-Web
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: PerplexityBot
+Disallow: /
+
+User-agent: Omgilibot
+Disallow: /
+
+User-agent: Diffbot
+Disallow: /
+
+User-agent: Bytespider
+Disallow: /
+
+User-agent: FacebookBot
+Disallow: /
+
+User-agent: ImagesiftBot
+Disallow: /
+
+User-agent: cohere-ai
+Disallow: /
 """
    return Response(robots_txt, mimetype='text/plain')


+@app.route('/tdmrep.json')
+def tdm_reservation():
+    """TDM (Text and Data Mining) reservation - signals AI training prohibition"""
+    return jsonify({
+        "tdm": {
+            "reservation": 1,
+            "policy": f"{SITE_URL}/terms"
+        }
+    })
+
+
@app.errorhandler(404)
 def page_not_found(e):
    """404 error handler"""
--- a/content/terms.md
+++ b/content/terms.md
@@ -0,0 +1,93 @@
+# Terms of Service
+
+**Last Updated:** January 2025
+
+By accessing and using this website, you agree to be bound by these Terms of Service. If you do not agree to these terms, please do not use this site.
+
+## Copyright and Ownership
+
+All comics, artwork, text, graphics, and other content on this website are protected by copyright and owned by {{ copyright_name }}. All rights reserved.
+
+## Permitted Use
+
+**Personal Use:** You may:
+- Read and enjoy the comics for personal, non-commercial purposes
+- Share links to individual comic pages on social media
+- Embed comics on personal websites with proper attribution and a link back to the original
+
+**Attribution Required:** When sharing or embedding, you must:
+- Provide clear credit to {{ copyright_name }}
+- Include a link back to this website
+- Not alter, crop, or modify the comic images
+
+## Prohibited Use
+
+You are **expressly prohibited** from:
+
+### AI Training and Machine Learning
+- Using any content from this site for training artificial intelligence models
+- Scraping, crawling, or harvesting content for machine learning purposes
+- Including any images, text, or data in AI training datasets
+- Using content to develop, train, or improve generative AI systems
+- Creating derivative works using AI trained on this content
+
+### Commercial Use
+- Reproducing, distributing, or selling comics without explicit written permission
+- Using comics or artwork for commercial purposes without a license
+- Printing comics on merchandise (t-shirts, mugs, etc.) without authorization
+
+### Modification and Redistribution
+- Altering, editing, or creating derivative works from the comics
+- Removing watermarks, signatures, or attribution
+- Rehosting images on other servers or websites
+- Claiming comics as your own work
+
+## Data Mining and Web Scraping
+
+**Automated Access Prohibition:** Automated scraping, crawling, or systematic downloading of content is strictly prohibited without prior written consent. This includes but is not limited to:
+- Web scrapers and bots (except authorized search engines)
+- Automated downloads of images or data
+- RSS feed abuse or bulk downloading
+- Any form of data harvesting for commercial purposes
+
+**Text and Data Mining (TDM) Reservation:** We formally reserve all rights under applicable copyright law regarding text and data mining, including but not limited to EU Directive 2019/790 Article 4. No TDM exceptions apply to this content.
+
+## DMCA and Copyright Enforcement
+
+Unauthorized use of copyrighted material from this site may violate copyright law and be subject to legal action under the Digital Millennium Copyright Act (DMCA) and other applicable laws.
+
+If you discover unauthorized use of content from this site, please report it to {{ social_email }}.
+
+## Fair Use
+
+Limited use for purposes of commentary, criticism, news reporting, teaching, or research may qualify as fair use. If you believe your use qualifies as fair use, please contact us first.
+
+## License Requests
+
+If you wish to use content in ways not permitted by these terms, please contact us to discuss licensing arrangements.
+
+## Privacy
+
+We respect your privacy. This site may use cookies for basic functionality and analytics. We do not sell personal information to third parties.
+
+## External Links
+
+This site may contain links to external websites. We are not responsible for the content or practices of third-party sites.
+
+## Modifications to Terms
+
+We reserve the right to modify these Terms of Service at any time. Changes will be posted on this page with an updated "Last Updated" date.
+
+## Contact
+
+For questions about these terms, licensing requests, or to report copyright violations:
+
+{{ social_email }}
+
+## Governing Law
+
+These Terms of Service are governed by applicable copyright law and the laws of [Your Jurisdiction].
+
+---
+
+**Summary:** You can read and share links to comics, but you cannot use them for AI training, scrape the site, use them commercially, or create modified versions without permission.
--- a/static/css/style.css
+++ b/static/css/style.css
@@ -754,7 +754,8 @@ main {
        gap: var(--space-sm);
    }

-    .footer-bottom p {
+    .footer-bottom p,
+    .footer-terms {
        flex-basis: 100%;
        text-align: center;
    }
@@ -963,6 +964,18 @@ footer {
    text-decoration: underline;
 }

+.footer-terms {
+    color: var(--color-text);
+    text-decoration: none;
+    font-size: var(--font-size-md);
+    transition: opacity 0.2s ease;
+}
+
+.footer-terms:hover {
+    text-decoration: underline;
+    opacity: 0.8;
+}
+
 /* Compact Footer Mode */
 footer.compact-footer {
    border-top: none;
--- a/templates/base.html
+++ b/templates/base.html
@@ -9,6 +9,10 @@
    <meta name="description" content="{% block meta_description %}A webcomic about life, the universe, and everything{% endblock %}">
    <link rel="canonical" href="{% block canonical %}{{ site_url }}{{ request.path }}{% endblock %}">

+    <!-- AI Scraping Prevention -->
+    <meta name="robots" content="noai, noimageai">
+    <meta name="googlebot" content="noai, noimageai">
+
    <!-- Open Graph / Facebook -->
    <meta property="og:type" content="website">
    <meta property="og:url" content="{% block meta_url %}{{ site_url }}{{ request.path }}{% endblock %}">
@@ -164,6 +168,8 @@
            <div class="footer-bottom">
                <p>&copy; {{ current_year }} {{ copyright_name }}. All rights reserved.</p>
                <span class="footer-divider" aria-hidden="true">|</span>
+                <a href="{{ url_for('terms') }}" class="footer-terms">Terms of Service</a>
+                <span class="footer-divider" aria-hidden="true">|</span>
                <div class="site-credit">
                    <a href="https://git.puercito.net/mi/sunday" target="_blank" rel="noopener noreferrer" aria-label="Sunday Comics - Webcomic platform">
                        <img src="{{ url_for('static', filename='images/sunday.jpg') }}" alt="Sunday Comics" class="credit-image">
Author	SHA1	Message	Date
mi	418ba6e4ba	❎ moar anti-AI measures	2025-11-15 15:49:34 +10:00
mi	14415dfcd2	❎ anti-AI measures	2025-11-15 15:43:32 +10:00
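
The robots.txt groups added in `app.py` can be sanity-checked offline. The sketch below is not part of the commit: it rebuilds the rule groups from the blocked user-agent list shown in the diff and evaluates them with Python's standard `urllib.robotparser`. The helper names (`build_robots_txt`, `is_allowed`) and the sample URL `https://yourcomic.com/comic/1` are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# User agents that the robots.txt in this commit disallows.
BLOCKED_AGENTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "PerplexityBot", "Omgilibot", "Diffbot",
    "Bytespider", "FacebookBot", "ImagesiftBot", "cohere-ai",
]


def build_robots_txt(blocked_agents):
    """Rebuild the rule groups (hypothetical helper, not in app.py)."""
    lines = ["User-agent: *", "Allow: /", ""]
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    return "\n".join(lines)


def is_allowed(robots_txt, agent, url="https://yourcomic.com/comic/1"):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(BLOCKED_AGENTS)
    for agent in BLOCKED_AGENTS + ["Googlebot", "Bingbot"]:
        status = "allowed" if is_allowed(robots_txt, agent) else "blocked"
        print(f"{agent}: {status}")
```

This only confirms how a compliant parser reads the rules; it says nothing about whether a given crawler actually honors them.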