❎ anti-AI measures
144
README.md
@@ -24,6 +24,13 @@ A Flask-based webcomic website with server-side rendering using Jinja2 templates
|
|||||||
- [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
|
- [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
|
||||||
- [SEO Checklist for Launch](#seo-checklist-for-launch)
|
- [SEO Checklist for Launch](#seo-checklist-for-launch)
|
||||||
- [Common SEO Questions](#common-seo-questions)
|
- [Common SEO Questions](#common-seo-questions)
|
||||||
|
- [Content Protection & AI Scraping Prevention](#content-protection--ai-scraping-prevention)
|
||||||
|
- [Protection Features](#protection-features)
|
||||||
|
- [Optional: Additional Protection Measures](#optional-additional-protection-measures)
|
||||||
|
- [Important Limitations](#important-limitations)
|
||||||
|
- [Customizing Your Terms](#customizing-your-terms)
|
||||||
|
- [Testing Your Protection](#testing-your-protection)
|
||||||
|
- [Reporting Violations](#reporting-violations)
|
||||||
- [Project Structure](#project-structure)
|
- [Project Structure](#project-structure)
|
||||||
- [Setup](#setup)
|
- [Setup](#setup)
|
||||||
- [Environment Variables](#environment-variables)
|
- [Environment Variables](#environment-variables)
|
||||||
@@ -457,6 +464,143 @@ A: Hashtags don't directly affect search engine SEO, but they help social media
|
|||||||
**Q: Should I create a blog for my comic?**
|
**Q: Should I create a blog for my comic?**
|
||||||
A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.
|
A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.
|
||||||
|
|
||||||
|
## Content Protection & AI Scraping Prevention
|
||||||
|
|
||||||
|
Sunday Comics includes built-in measures to discourage AI web scrapers from using your creative work for training machine learning models without permission.
|
||||||
|
|
||||||
|
### Protection Features
|
||||||
|
|
||||||
|
#### robots.txt Blocking
|
||||||
|
The dynamically generated `robots.txt` file blocks known AI crawlers while still allowing legitimate search engines:
|
||||||
|
|
||||||
|
**Blocked AI bots:**
|
||||||
|
- **GPTBot** & **ChatGPT-User** (OpenAI)
|
||||||
|
- **CCBot** (Common Crawl - used by many AI companies)
|
||||||
|
- **anthropic-ai** & **Claude-Web** (Anthropic)
|
||||||
|
- **Google-Extended** (Google's AI training crawler, separate from Googlebot)
|
||||||
|
- **PerplexityBot** (Perplexity AI)
|
||||||
|
- **Omgilibot**, **Diffbot**, **Bytespider**, **FacebookBot**, **ImagesiftBot**, **cohere-ai**
|
||||||
|
|
||||||
|
**Note:** Regular search engine crawlers (Googlebot, Bingbot, etc.) are still allowed so your comic can be discovered through search.
|
||||||
|
|
||||||
|
The robots.txt also includes a reference to your Terms of Service for transparency.
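The per-bot blocks can also be generated from a list instead of being written out by hand. This is a minimal sketch, not the code Sunday Comics ships: `AI_BOTS` and `ai_block_rules` are hypothetical names, and the list simply repeats the user agents named above.

```python
# Hypothetical helper: build the "block AI crawlers" part of robots.txt
# from a list of user-agent names. Not part of Sunday Comics itself.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "PerplexityBot", "Omgilibot", "Diffbot",
    "Bytespider", "FacebookBot", "ImagesiftBot", "cohere-ai",
]


def ai_block_rules(bots):
    """Return one 'User-agent / Disallow: /' group per bot, blank-line separated."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(ai_block_rules(AI_BOTS))
```

Keeping the names in one list makes it easier to add a crawler later without editing the template string.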
|
||||||
|
|
||||||
|
#### HTML Meta Tags
|
||||||
|
Every page includes meta tags that signal to AI scrapers not to use the content:
|
||||||
|
|
||||||
|
```html
|
||||||
|
<meta name="robots" content="noai, noimageai">
|
||||||
|
<meta name="googlebot" content="noai, noimageai">
|
||||||
|
```
|
||||||
|
|
||||||
|
- `noai` - Asks crawlers not to use text content for AI training
- `noimageai` - Asks crawlers not to use images (your comics) for AI training
|
||||||
|
|
||||||
|
#### Terms of Service
|
||||||
|
A comprehensive Terms of Service page at `/terms` legally prohibits:
|
||||||
|
- Using content for AI training or machine learning
|
||||||
|
- Scraping or harvesting content for datasets
|
||||||
|
- Creating derivative works using AI trained on your content
|
||||||
|
- Text and Data Mining (TDM) without permission
|
||||||
|
|
||||||
|
The Terms page is automatically linked in your footer and includes:
|
||||||
|
- Copyright protection assertions
|
||||||
|
- DMCA enforcement information
|
||||||
|
- TDM rights reservation (EU Directive 2019/790 Article 4)
|
||||||
|
- Clear permitted use guidelines
|
||||||
|
|
||||||
|
### Optional: Additional Protection Measures
|
||||||
|
|
||||||
|
#### HTTP Headers (Advanced)
|
||||||
|
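The same directives can also be sent as an HTTP header, which covers non-HTML responses such as image files. Like the meta tags, the header is advisory. Add this to `app.py` after the `app` object is created: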
|
||||||
|
|
||||||
|
```python
@app.after_request
def add_ai_blocking_headers(response):
    """Add headers to discourage AI scraping"""
    response.headers['X-Robots-Tag'] = 'noai, noimageai'
    return response
```
|
||||||
|
|
||||||
|
#### TDM Reservation File (Advanced)
|
||||||
|
Create a `/tdmrep.json` endpoint to formally reserve Text and Data Mining rights:
|
||||||
|
|
||||||
|
```python
@app.route('/tdmrep.json')
def tdm_reservation():
    """TDM (Text and Data Mining) reservation"""
    from flask import jsonify
    return jsonify({
        "tdm": {
            "reservation": 1,
            "policy": f"{SITE_URL}/terms"
        }
    })
```
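For comparison, the W3C community group's TDM Reservation Protocol (TDMRep) describes a different layout: a JSON array served from `/.well-known/tdmrep.json`, with `location`, `tdm-reservation` and `tdm-policy` keys. The sketch below builds that payload under that assumption; check it against the current specification before relying on it. `build_tdmrep` is a hypothetical helper, and the `/terms` policy URL mirrors the route above.

```python
import json


def build_tdmrep(site_url, location="/"):
    """Build a TDMRep-style payload (assumed layout: a list of rule objects)."""
    base = site_url.rstrip("/")
    return [
        {
            "location": location,          # path pattern the rule applies to
            "tdm-reservation": 1,          # 1 = TDM rights are reserved
            "tdm-policy": f"{base}/terms"  # human-readable policy page
        }
    ]


if __name__ == "__main__":
    # In a Flask view, this list could be passed to jsonify() instead.
    print(json.dumps(build_tdmrep("https://yourcomic.com"), indent=2))
```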
|
||||||
|
|
||||||
|
### Important Limitations
|
||||||
|
|
||||||
|
**These measures are voluntary** - they only work if AI companies respect them:
|
||||||
|
|
||||||
|
✅ **What this does:**
|
||||||
|
- Signals your intent to protect your content
|
||||||
|
- Provides legal grounding for DMCA takedowns
|
||||||
|
- Blocks responsible AI companies that honor robots.txt
|
||||||
|
- Makes your copyright stance clear to users and crawlers
|
||||||
|
|
||||||
|
❌ **What this doesn't do:**
|
||||||
|
- Cannot physically prevent determined bad actors from scraping
|
||||||
|
- Cannot remove already-scraped historical data from existing datasets
|
||||||
|
- No guarantee all AI companies will honor these signals
|
||||||
|
|
||||||
|
**Companies that claim to honor robots.txt:**
|
||||||
|
- OpenAI (GPTBot blocking)
|
||||||
|
- Anthropic (anthropic-ai blocking)
|
||||||
|
- Google (Google-Extended blocking, separate from search)
|
||||||
|
|
||||||
|
### Customizing Your Terms
|
||||||
|
|
||||||
|
Edit `content/terms.md` to customize:
|
||||||
|
|
||||||
|
1. **Jurisdiction** - Add your country/state for legal clarity
|
||||||
|
2. **Permitted use** - Adjust what you allow (fan art, sharing, etc.)
|
||||||
|
3. **Contact info** - Automatically populated from `comics_data.py`
|
||||||
|
|
||||||
|
The Terms page uses Jinja2 template variables that pull from your configuration:
|
||||||
|
- `{{ copyright_name }}` - From `COPYRIGHT_NAME` in `comics_data.py`
|
||||||
|
- `{{ social_email }}` - From `SOCIAL_EMAIL` in `comics_data.py`
|
||||||
|
|
||||||
|
### Testing Your Protection
|
||||||
|
|
||||||
|
**Verify robots.txt:**
|
||||||
|
```bash
|
||||||
|
curl https://yourcomic.com/robots.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
You should see AI bot blocks and a link to your terms.
|
||||||
|
|
||||||
|
**Check meta tags:**
|
||||||
|
View page source and look for:
|
||||||
|
```html
|
||||||
|
<meta name="robots" content="noai, noimageai">
|
||||||
|
```
|
||||||
|
|
||||||
|
**Validate Terms page:**
|
||||||
|
Visit `https://yourcomic.com/terms` to ensure it renders correctly.
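The robots.txt check can also be scripted with Python's standard `urllib.robotparser`. This is a rough sketch under stated assumptions: `SAMPLE_ROBOTS_TXT` is a shortened stand-in for the generated file, `blocked_agents` is a hypothetical helper, and `https://yourcomic.com/` is the placeholder domain used above. For a live check, fetch the real file and pass its text in instead of the sample.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the generated robots.txt (assumption, not the full file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://yourcomic.com/"):
    """Return the user agents from `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Search crawlers such as Googlebot should not appear in the result.
    print(blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"]))
```

Python's parser is only one interpretation of robots.txt; individual crawlers may read edge cases differently.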
|
||||||
|
|
||||||
|
### Reporting Violations
|
||||||
|
|
||||||
|
If you discover your work in an AI training dataset or being used without permission:
|
||||||
|
|
||||||
|
1. **Document the violation** - Screenshots, URLs, timestamps
|
||||||
|
2. **Review their TOS** - Many AI services have content dispute processes
|
||||||
|
3. **Send DMCA takedown** - Your Terms of Service provides legal standing
|
||||||
|
4. **Contact the platform** - Use your `SOCIAL_EMAIL` from the Terms page
|
||||||
|
|
||||||
|
Resources:
|
||||||
|
- [US Copyright Office DMCA](https://www.copyright.gov/dmca/)
|
||||||
|
- [EU Copyright Directive](https://digital-strategy.ec.europa.eu/en/policies/copyright-legislation)
|
||||||
|
|
||||||
## Project Structure
|
## Project Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|||||||
65
app.py
@@ -217,6 +217,28 @@ def about():
|
|||||||
return render_template('page.html', title='About', content=html_content)
|
return render_template('page.html', title='About', content=html_content)
|
||||||
|
|
||||||
|
|
||||||
|
@app.route('/terms')
def terms():
    """Terms of Service page"""
    from jinja2 import Template
    # Read and render the markdown file with template variables
    terms_path = os.path.join(os.path.dirname(__file__), 'content', 'terms.md')
    try:
        with open(terms_path, 'r', encoding='utf-8') as f:
            content = f.read()
        # First render as Jinja template to substitute variables
        template = Template(content)
        rendered_content = template.render(
            copyright_name=COPYRIGHT_NAME,
            social_email=SOCIAL_EMAIL if SOCIAL_EMAIL else '[Contact Email]'
        )
        # Then convert markdown to HTML
        html_content = markdown.markdown(rendered_content)
    except FileNotFoundError:
        html_content = '<p>Terms of Service content not found.</p>'
    return render_template('page.html', title='Terms of Service', content=html_content)
|
||||||
|
|
||||||
|
|
||||||
@app.route('/api/comics')
|
@app.route('/api/comics')
|
||||||
def api_comics():
|
def api_comics():
|
||||||
"""API endpoint - returns all comics as JSON"""
|
"""API endpoint - returns all comics as JSON"""
|
||||||
@@ -244,6 +266,9 @@ def robots():
|
|||||||
"""Generate robots.txt dynamically with correct SITE_URL"""
|
"""Generate robots.txt dynamically with correct SITE_URL"""
|
||||||
from flask import Response
|
from flask import Response
|
||||||
robots_txt = f"""# Sunday Comics - Robots.txt
|
robots_txt = f"""# Sunday Comics - Robots.txt
|
||||||
|
# Content protected by copyright. AI training prohibited.
|
||||||
|
# See terms: {SITE_URL}/terms
|
||||||
|
|
||||||
User-agent: *
|
User-agent: *
|
||||||
Allow: /
|
Allow: /
|
||||||
|
|
||||||
@@ -252,6 +277,46 @@ Sitemap: {SITE_URL}/sitemap.xml
|
|||||||
|
|
||||||
# Disallow API endpoints from indexing
|
# Disallow API endpoints from indexing
|
||||||
Disallow: /api/
|
Disallow: /api/
|
||||||
|
|
||||||
|
# Block AI crawlers and scrapers
|
||||||
|
User-agent: GPTBot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: ChatGPT-User
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: CCBot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: anthropic-ai
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: Claude-Web
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: Google-Extended
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: PerplexityBot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: Omgilibot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: Diffbot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: Bytespider
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: FacebookBot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: ImagesiftBot
|
||||||
|
Disallow: /
|
||||||
|
|
||||||
|
User-agent: cohere-ai
|
||||||
|
Disallow: /
|
||||||
"""
|
"""
|
||||||
return Response(robots_txt, mimetype='text/plain')
|
return Response(robots_txt, mimetype='text/plain')
|
||||||
|
|
||||||
|
|||||||
93
content/terms.md
Normal file
@@ -0,0 +1,93 @@
|
|||||||
|
# Terms of Service
|
||||||
|
|
||||||
|
**Last Updated:** January 2025
|
||||||
|
|
||||||
|
By accessing and using this website, you agree to be bound by these Terms of Service. If you do not agree to these terms, please do not use this site.
|
||||||
|
|
||||||
|
## Copyright and Ownership
|
||||||
|
|
||||||
|
All comics, artwork, text, graphics, and other content on this website are protected by copyright and owned by {{ copyright_name }}. All rights reserved.
|
||||||
|
|
||||||
|
## Permitted Use
|
||||||
|
|
||||||
|
**Personal Use:** You may:
|
||||||
|
- Read and enjoy the comics for personal, non-commercial purposes
|
||||||
|
- Share links to individual comic pages on social media
|
||||||
|
- Embed comics on personal websites with proper attribution and a link back to the original
|
||||||
|
|
||||||
|
**Attribution Required:** When sharing or embedding, you must:
|
||||||
|
- Provide clear credit to {{ copyright_name }}
|
||||||
|
- Include a link back to this website
|
||||||
|
- Not alter, crop, or modify the comic images
|
||||||
|
|
||||||
|
## Prohibited Use
|
||||||
|
|
||||||
|
You are **expressly prohibited** from:
|
||||||
|
|
||||||
|
### AI Training and Machine Learning
|
||||||
|
- Using any content from this site for training artificial intelligence models
|
||||||
|
- Scraping, crawling, or harvesting content for machine learning purposes
|
||||||
|
- Including any images, text, or data in AI training datasets
|
||||||
|
- Using content to develop, train, or improve generative AI systems
|
||||||
|
- Creating derivative works using AI trained on this content
|
||||||
|
|
||||||
|
### Commercial Use
|
||||||
|
- Reproducing, distributing, or selling comics without explicit written permission
|
||||||
|
- Using comics or artwork for commercial purposes without a license
|
||||||
|
- Printing comics on merchandise (t-shirts, mugs, etc.) without authorization
|
||||||
|
|
||||||
|
### Modification and Redistribution
|
||||||
|
- Altering, editing, or creating derivative works from the comics
|
||||||
|
- Removing watermarks, signatures, or attribution
|
||||||
|
- Rehosting images on other servers or websites
|
||||||
|
- Claiming comics as your own work
|
||||||
|
|
||||||
|
## Data Mining and Web Scraping
|
||||||
|
|
||||||
|
**Automated Access Prohibition:** Automated scraping, crawling, or systematic downloading of content is strictly prohibited without prior written consent. This includes but is not limited to:
|
||||||
|
- Web scrapers and bots (except authorized search engines)
|
||||||
|
- Automated downloads of images or data
|
||||||
|
- RSS feed abuse or bulk downloading
|
||||||
|
- Any form of data harvesting for commercial purposes
|
||||||
|
|
||||||
|
**Text and Data Mining (TDM) Reservation:** We formally reserve all rights under applicable copyright law regarding text and data mining, including but not limited to EU Directive 2019/790 Article 4. No TDM exceptions apply to this content.
|
||||||
|
|
||||||
|
## DMCA and Copyright Enforcement
|
||||||
|
|
||||||
|
Unauthorized use of copyrighted material from this site may violate copyright law and be subject to legal action under the Digital Millennium Copyright Act (DMCA) and other applicable laws.
|
||||||
|
|
||||||
|
If you discover unauthorized use of content from this site, please report it to {{ social_email }}.
|
||||||
|
|
||||||
|
## Fair Use
|
||||||
|
|
||||||
|
Limited use for purposes of commentary, criticism, news reporting, teaching, or research may qualify as fair use. If you believe your use qualifies as fair use, please contact us first.
|
||||||
|
|
||||||
|
## License Requests
|
||||||
|
|
||||||
|
If you wish to use content in ways not permitted by these terms, please contact us to discuss licensing arrangements.
|
||||||
|
|
||||||
|
## Privacy
|
||||||
|
|
||||||
|
We respect your privacy. This site may use cookies for basic functionality and analytics. We do not sell personal information to third parties.
|
||||||
|
|
||||||
|
## External Links
|
||||||
|
|
||||||
|
This site may contain links to external websites. We are not responsible for the content or practices of third-party sites.
|
||||||
|
|
||||||
|
## Modifications to Terms
|
||||||
|
|
||||||
|
We reserve the right to modify these Terms of Service at any time. Changes will be posted on this page with an updated "Last Updated" date.
|
||||||
|
|
||||||
|
## Contact
|
||||||
|
|
||||||
|
For questions about these terms, licensing requests, or to report copyright violations:
|
||||||
|
|
||||||
|
{{ social_email }}
|
||||||
|
|
||||||
|
## Governing Law
|
||||||
|
|
||||||
|
These Terms of Service are governed by applicable copyright law and the laws of [Your Jurisdiction].
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Summary:** You can read and share links to comics, but you cannot use them for AI training, scrape the site, use them commercially, or create modified versions without permission.
|
||||||
@@ -754,7 +754,8 @@ main {
|
|||||||
gap: var(--space-sm);
|
gap: var(--space-sm);
|
||||||
}
|
}
|
||||||
|
|
||||||
.footer-bottom p {
|
.footer-bottom p,
|
||||||
|
.footer-terms {
|
||||||
flex-basis: 100%;
|
flex-basis: 100%;
|
||||||
text-align: center;
|
text-align: center;
|
||||||
}
|
}
|
||||||
@@ -963,6 +964,18 @@ footer {
|
|||||||
text-decoration: underline;
|
text-decoration: underline;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
.footer-terms {
|
||||||
|
color: var(--color-text);
|
||||||
|
text-decoration: none;
|
||||||
|
font-size: var(--font-size-md);
|
||||||
|
transition: opacity 0.2s ease;
|
||||||
|
}
|
||||||
|
|
||||||
|
.footer-terms:hover {
|
||||||
|
text-decoration: underline;
|
||||||
|
opacity: 0.8;
|
||||||
|
}
|
||||||
|
|
||||||
/* Compact Footer Mode */
|
/* Compact Footer Mode */
|
||||||
footer.compact-footer {
|
footer.compact-footer {
|
||||||
border-top: none;
|
border-top: none;
|
||||||
|
|||||||
@@ -9,6 +9,10 @@
|
|||||||
<meta name="description" content="{% block meta_description %}A webcomic about life, the universe, and everything{% endblock %}">
|
<meta name="description" content="{% block meta_description %}A webcomic about life, the universe, and everything{% endblock %}">
|
||||||
<link rel="canonical" href="{% block canonical %}{{ site_url }}{{ request.path }}{% endblock %}">
|
<link rel="canonical" href="{% block canonical %}{{ site_url }}{{ request.path }}{% endblock %}">
|
||||||
|
|
||||||
|
<!-- AI Scraping Prevention -->
|
||||||
|
<meta name="robots" content="noai, noimageai">
|
||||||
|
<meta name="googlebot" content="noai, noimageai">
|
||||||
|
|
||||||
<!-- Open Graph / Facebook -->
|
<!-- Open Graph / Facebook -->
|
||||||
<meta property="og:type" content="website">
|
<meta property="og:type" content="website">
|
||||||
<meta property="og:url" content="{% block meta_url %}{{ site_url }}{{ request.path }}{% endblock %}">
|
<meta property="og:url" content="{% block meta_url %}{{ site_url }}{{ request.path }}{% endblock %}">
|
||||||
@@ -164,6 +168,8 @@
|
|||||||
<div class="footer-bottom">
|
<div class="footer-bottom">
|
||||||
<p>© {{ current_year }} {{ copyright_name }}. All rights reserved.</p>
|
<p>© {{ current_year }} {{ copyright_name }}. All rights reserved.</p>
|
||||||
<span class="footer-divider" aria-hidden="true">|</span>
|
<span class="footer-divider" aria-hidden="true">|</span>
|
||||||
|
<a href="{{ url_for('terms') }}" class="footer-terms">Terms of Service</a>
|
||||||
|
<span class="footer-divider" aria-hidden="true">|</span>
|
||||||
<div class="site-credit">
|
<div class="site-credit">
|
||||||
<a href="https://git.puercito.net/mi/sunday" target="_blank" rel="noopener noreferrer" aria-label="Sunday Comics - Webcomic platform">
|
<a href="https://git.puercito.net/mi/sunday" target="_blank" rel="noopener noreferrer" aria-label="Sunday Comics - Webcomic platform">
|
||||||
<img src="{{ url_for('static', filename='images/sunday.jpg') }}" alt="Sunday Comics" class="credit-image">
|
<img src="{{ url_for('static', filename='images/sunday.jpg') }}" alt="Sunday Comics" class="credit-image">
|
||||||
|
|||||||