From 14415dfcd2b6f16f79c70dcf5733c839bbae77ba Mon Sep 17 00:00:00 2001
From: mi <hola@puercito.net>
Date: Sat, 15 Nov 2025 15:43:32 +1000
Subject: [PATCH] :negative_squared_cross_mark: anti-AI measures

---
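Note for reviewers (this sits below the `---` marker, so it is not part of the commit message): the README's "Testing Your Protection" section verifies robots.txt by eye with curl. As a hedged complement, here is a minimal sketch of an automated check using Python's standard `urllib.robotparser`. The `SAMPLE_ROBOTS_TXT` excerpt and the `blocked_agents` helper are hypothetical names used for illustration only and are not part of this patch; the real file is generated by `robots()` in `app.py`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt for illustration; the real file comes from robots() in app.py.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that robots_txt forbids from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Googlebot falls through to the "*" entry and stays allowed.
    print(blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"]))
    # prints ['GPTBot', 'CCBot']
```

To check a deployed site, fetch the body of `/robots.txt` and pass it in place of the sample. This only confirms what the file asks for; it says nothing about whether a given crawler complies.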
 README.md            | 144 +++++++++++++++++++++++++++++++++++++++++++
 app.py               |  65 +++++++++++++++++++
 content/terms.md     |  93 ++++++++++++++++++++++++++++
 static/css/style.css |  15 ++++-
 templates/base.html  |   6 ++
 5 files changed, 322 insertions(+), 1 deletion(-)
 create mode 100644 content/terms.md
diff --git a/README.md b/README.md
index 980b459..1eb43e6 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,13 @@ A Flask-based webcomic website with server-side rendering using Jinja2 templates
   - [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
   - [SEO Checklist for Launch](#seo-checklist-for-launch)
   - [Common SEO Questions](#common-seo-questions)
+- [Content Protection & AI Scraping Prevention](#content-protection--ai-scraping-prevention)
+  - [Protection Features](#protection-features)
+  - [Optional: Additional Protection Measures](#optional-additional-protection-measures)
+  - [Important Limitations](#important-limitations)
+  - [Customizing Your Terms](#customizing-your-terms)
+  - [Testing Your Protection](#testing-your-protection)
+  - [Reporting Violations](#reporting-violations)
 - [Project Structure](#project-structure)
 - [Setup](#setup)
 - [Environment Variables](#environment-variables)
@@ -457,6 +464,143 @@ A: Hashtags don't directly affect search engine SEO, but they help social media
 **Q: Should I create a blog for my comic?**
 A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.
 
+## Content Protection & AI Scraping Prevention
+
+Sunday Comics includes built-in measures to discourage AI web scrapers from using your creative work for training machine learning models without permission.
+
+### Protection Features
+
+#### robots.txt Blocking
+The dynamically generated `robots.txt` file blocks known AI crawlers while still allowing legitimate search engines:
+
+**Blocked AI bots:**
+- **GPTBot** & **ChatGPT-User** (OpenAI)
+- **CCBot** (Common Crawl - used by many AI companies)
+- **anthropic-ai** & **Claude-Web** (Anthropic)
+- **Google-Extended** (Google's AI training crawler, separate from Googlebot)
+- **PerplexityBot** (Perplexity AI)
+- **Omgilibot**, **Diffbot**, **Bytespider**, **FacebookBot**, **ImagesiftBot**, **cohere-ai**
+
+**Note:** Regular search engine crawlers (Googlebot, Bingbot, etc.) are still allowed so your comic can be discovered through search.
+
+The robots.txt also includes a reference to your Terms of Service for transparency.
+
+#### HTML Meta Tags
+Every page includes meta tags that signal to AI scrapers not to use the content:
+
+```html
+<meta name="robots" content="noai, noimageai">
+<meta name="googlebot" content="noai, noimageai">
+```
+
+- `noai` - Asks crawlers not to use text content for AI training
+- `noimageai` - Asks crawlers not to use images (your comics) for AI training
+
+#### Terms of Service
+A Terms of Service page at `/terms` explicitly prohibits:
+- Using content for AI training or machine learning
+- Scraping or harvesting content for datasets
+- Creating derivative works using AI trained on your content
+- Text and Data Mining (TDM) without permission
+
+The Terms page is automatically linked in your footer and includes:
+- Copyright protection assertions
+- DMCA enforcement information
+- TDM rights reservation (EU Directive 2019/790 Article 4)
+- Clear permitted use guidelines
+
+### Optional: Additional Protection Measures
+
+#### HTTP Headers (Advanced)
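+To send the same signal on every response Flask serves, including non-HTML files such as images, you can add an HTTP header. Like the meta tags, it is advisory only. Add this to `app.py` after the `app` object is created: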
+
+```python
+@app.after_request
+def add_ai_blocking_headers(response):
+    """Add headers to discourage AI scraping"""
+    response.headers['X-Robots-Tag'] = 'noai, noimageai'
+    return response
+```
+
+#### TDM Reservation File (Advanced)
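+Create a `/.well-known/tdmrep.json` endpoint (the location and key names used by the W3C TDM Reservation Protocol) to formally reserve Text and Data Mining rights:
+
+```python
+@app.route('/.well-known/tdmrep.json')
+def tdm_reservation():
+    """TDM (Text and Data Mining) reservation"""
+    from flask import jsonify
+    # TDMRep expects a JSON array of rules, one object per location pattern
+    return jsonify([{
+        "location": "/*",
+        "tdm-reservation": 1,
+        "tdm-policy": f"{SITE_URL}/terms"
+    }])
+```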
+
+### Important Limitations
+
+**These measures are voluntary** - they only work if AI companies respect them:
+
+✅ **What this does:**
+- Signals your intent to protect your content
+- Documents your terms in writing, which can support a DMCA takedown notice or other complaint
+- Blocks responsible AI companies that honor robots.txt
+- Makes your copyright stance clear to users and crawlers
+
+❌ **What this doesn't do:**
+- Cannot physically prevent determined bad actors from scraping
+- Cannot remove already-scraped historical data from existing datasets
+- No guarantee all AI companies will honor these signals
+
+**Companies that claim to honor robots.txt:**
+- OpenAI (GPTBot blocking)
+- Anthropic (anthropic-ai blocking)
+- Google (Google-Extended blocking, separate from search)
+
+### Customizing Your Terms
+
+Edit `content/terms.md` (relative to the project root) to customize:
+
+1. **Jurisdiction** - Add your country/state for legal clarity
+2. **Permitted use** - Adjust what you allow (fan art, sharing, etc.)
+3. **Contact info** - Automatically populated from `comics_data.py`
+
+The Terms page uses Jinja2 template variables that pull from your configuration:
+- `{{ copyright_name }}` - From `COPYRIGHT_NAME` in `comics_data.py`
+- `{{ social_email }}` - From `SOCIAL_EMAIL` in `comics_data.py`
+
+### Testing Your Protection
+
+**Verify robots.txt:**
+```bash
+curl https://yourcomic.com/robots.txt
+```
+
+You should see AI bot blocks and a link to your terms.
+
+**Check meta tags:**
+View page source and look for:
+```html
+<meta name="robots" content="noai, noimageai">
+```
+
+**Validate Terms page:**
+Visit `https://yourcomic.com/terms` to ensure it renders correctly.
+
+### Reporting Violations
+
+If you discover your work in an AI training dataset or being used without permission:
+
+1. **Document the violation** - Screenshots, URLs, timestamps
+2. **Review their TOS** - Many AI services have content dispute processes
+3. **Send a DMCA takedown notice** - Cite your copyright and your published Terms of Service
+4. **Contact the platform** - Write from the contact address shown on your Terms page (`SOCIAL_EMAIL`)
+
+Resources:
+- [US Copyright Office DMCA](https://www.copyright.gov/dmca/)
+- [EU Copyright Directive](https://digital-strategy.ec.europa.eu/en/policies/copyright-legislation)
+
 ## Project Structure
 
 ```
diff --git a/app.py b/app.py
index cb58073..060108a 100644
--- a/app.py
+++ b/app.py
@@ -217,6 +217,28 @@ def about():
     return render_template('page.html', title='About', content=html_content)
 
 
+@app.route('/terms')
+def terms():
+    """Terms of Service page"""
+    from jinja2 import Template
+    # Read and render the markdown file with template variables
+    terms_path = os.path.join(os.path.dirname(__file__), 'content', 'terms.md')
+    try:
+        with open(terms_path, 'r', encoding='utf-8') as f:
+            content = f.read()
+        # First render as Jinja template to substitute variables
+        template = Template(content)
+        rendered_content = template.render(
+            copyright_name=COPYRIGHT_NAME,
+            social_email=SOCIAL_EMAIL or '[Contact Email]'
+        )
+        # Then convert markdown to HTML
+        html_content = markdown.markdown(rendered_content)
+    except FileNotFoundError:
+        html_content = '<p>Terms of Service content not found.</p>'
+    return render_template('page.html', title='Terms of Service', content=html_content)
+
+
 @app.route('/api/comics')
 def api_comics():
     """API endpoint - returns all comics as JSON"""
@@ -244,6 +266,9 @@ def robots():
     """Generate robots.txt dynamically with correct SITE_URL"""
     from flask import Response
     robots_txt = f"""# Sunday Comics - Robots.txt
+# Content protected by copyright. AI training prohibited.
+# See terms: {SITE_URL}/terms
+
 User-agent: *
 Allow: /
 
@@ -252,6 +277,46 @@ Sitemap: {SITE_URL}/sitemap.xml
 
 # Disallow API endpoints from indexing
 Disallow: /api/
+
+# Block AI crawlers and scrapers
+User-agent: GPTBot
+Disallow: /
+
+User-agent: ChatGPT-User
+Disallow: /
+
+User-agent: CCBot
+Disallow: /
+
+User-agent: anthropic-ai
+Disallow: /
+
+User-agent: Claude-Web
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: PerplexityBot
+Disallow: /
+
+User-agent: Omgilibot
+Disallow: /
+
+User-agent: Diffbot
+Disallow: /
+
+User-agent: Bytespider
+Disallow: /
+
+User-agent: FacebookBot
+Disallow: /
+
+User-agent: ImagesiftBot
+Disallow: /
+
+User-agent: cohere-ai
+Disallow: /
 """
     return Response(robots_txt, mimetype='text/plain')
 
diff --git a/content/terms.md b/content/terms.md
new file mode 100644
index 0000000..959e417
--- /dev/null
+++ b/content/terms.md
@@ -0,0 +1,93 @@
+# Terms of Service
+
+**Last Updated:** November 2025
+
+By accessing and using this website, you agree to be bound by these Terms of Service. If you do not agree to these terms, please do not use this site.
+
+## Copyright and Ownership
+
+All comics, artwork, text, graphics, and other content on this website are protected by copyright and owned by {{ copyright_name }}. All rights reserved.
+
+## Permitted Use
+
+**Personal Use:** You may:
+- Read and enjoy the comics for personal, non-commercial purposes
+- Share links to individual comic pages on social media
+- Embed comics on personal websites with proper attribution and a link back to the original
+
+**Attribution Required:** When sharing or embedding, you must:
+- Provide clear credit to {{ copyright_name }}
+- Include a link back to this website
+- Not alter, crop, or modify the comic images
+
+## Prohibited Use
+
+You are **expressly prohibited** from:
+
+### AI Training and Machine Learning
+- Using any content from this site for training artificial intelligence models
+- Scraping, crawling, or harvesting content for machine learning purposes
+- Including any images, text, or data in AI training datasets
+- Using content to develop, train, or improve generative AI systems
+- Creating derivative works using AI trained on this content
+
+### Commercial Use
+- Reproducing, distributing, or selling comics without explicit written permission
+- Using comics or artwork for commercial purposes without a license
+- Printing comics on merchandise (t-shirts, mugs, etc.) without authorization
+
+### Modification and Redistribution
+- Altering, editing, or creating derivative works from the comics
+- Removing watermarks, signatures, or attribution
+- Rehosting images on other servers or websites
+- Claiming comics as your own work
+
+## Data Mining and Web Scraping
+
+**Automated Access Prohibition:** Automated scraping, crawling, or systematic downloading of content is strictly prohibited without prior written consent. This includes but is not limited to:
+- Web scrapers and bots (except authorized search engines)
+- Automated downloads of images or data
+- RSS feed abuse or bulk downloading
+- Any form of data harvesting for commercial purposes
+
+**Text and Data Mining (TDM) Reservation:** We formally reserve all rights under applicable copyright law regarding text and data mining, including but not limited to EU Directive 2019/790 Article 4. No TDM exceptions apply to this content.
+
+## DMCA and Copyright Enforcement
+
+Unauthorized use of copyrighted material from this site may violate copyright law and be subject to legal action under the Digital Millennium Copyright Act (DMCA) and other applicable laws.
+
+If you discover unauthorized use of content from this site, please report it to {{ social_email }}.
+
+## Fair Use
+
+Limited use for purposes of commentary, criticism, news reporting, teaching, or research may qualify as fair use. If you believe your use qualifies as fair use, please contact us first.
+
+## License Requests
+
+If you wish to use content in ways not permitted by these terms, please contact us to discuss licensing arrangements.
+
+## Privacy
+
+We respect your privacy. This site may use cookies for basic functionality and analytics. We do not sell personal information to third parties.
+
+## External Links
+
+This site may contain links to external websites. We are not responsible for the content or practices of third-party sites.
+
+## Modifications to Terms
+
+We reserve the right to modify these Terms of Service at any time. Changes will be posted on this page with an updated "Last Updated" date.
+
+## Contact
+
+For questions about these terms, licensing requests, or to report copyright violations:
+
+{{ social_email }}
+
+## Governing Law
+
+These Terms of Service are governed by applicable copyright law and the laws of [Your Jurisdiction].
+
+---
+
+**Summary:** You can read and share links to comics, but you cannot use them for AI training, scrape the site, use them commercially, or create modified versions without permission.
diff --git a/static/css/style.css b/static/css/style.css
index 3b3e9f4..a7dd522 100644
--- a/static/css/style.css
+++ b/static/css/style.css
@@ -754,7 +754,8 @@ main {
         gap: var(--space-sm);
     }
 
-    .footer-bottom p {
+    .footer-bottom p,
+    .footer-terms {
         flex-basis: 100%;
         text-align: center;
     }
@@ -963,6 +964,18 @@ footer {
     text-decoration: underline;
 }
 
+.footer-terms {
+    color: var(--color-text);
+    text-decoration: none;
+    font-size: var(--font-size-md);
+    transition: opacity 0.2s ease;
+}
+
+.footer-terms:hover {
+    text-decoration: underline;
+    opacity: 0.8;
+}
+
 /* Compact Footer Mode */
 footer.compact-footer {
     border-top: none;
diff --git a/templates/base.html b/templates/base.html
index 5ca7854..4079964 100644
--- a/templates/base.html
+++ b/templates/base.html
@@ -9,6 +9,10 @@
     <meta name="description" content="{% block meta_description %}A webcomic about life, the universe, and everything{% endblock %}">
     <link rel="canonical" href="{% block canonical %}{{ site_url }}{{ request.path }}{% endblock %}">
 
+    <!-- AI Scraping Prevention -->
+    <meta name="robots" content="noai, noimageai">
+    <meta name="googlebot" content="noai, noimageai">
+
     <!-- Open Graph / Facebook -->
     <meta property="og:type" content="website">
     <meta property="og:url" content="{% block meta_url %}{{ site_url }}{{ request.path }}{% endblock %}">
@@ -164,6 +168,8 @@
             <div class="footer-bottom">
                 <p>&copy; {{ current_year }} {{ copyright_name }}. All rights reserved.</p>
                 <span class="footer-divider" aria-hidden="true">|</span>
+                <a href="{{ url_for('terms') }}" class="footer-terms">Terms of Service</a>
+                <span class="footer-divider" aria-hidden="true">|</span>
                 <div class="site-credit">
                     <a href="https://git.puercito.net/mi/sunday" target="_blank" rel="noopener noreferrer" aria-label="Sunday Comics - Webcomic platform">
                         <img src="{{ url_for('static', filename='images/sunday.jpg') }}" alt="Sunday Comics" class="credit-image">