Compare commits
2 Commits
1dac042d25...418ba6e4ba

| SHA1 |
|---|
| 418ba6e4ba |
| 14415dfcd2 |

169
README.md
@@ -24,6 +24,13 @@ A Flask-based webcomic website with server-side rendering using Jinja2 templates
- [SEO Best Practices for Webcomics](#seo-best-practices-for-webcomics)
- [SEO Checklist for Launch](#seo-checklist-for-launch)
- [Common SEO Questions](#common-seo-questions)
- [Content Protection & AI Scraping Prevention](#content-protection--ai-scraping-prevention)
- [Protection Features](#protection-features)
- [Advanced: Image-Level Protection Tools](#advanced-image-level-protection-tools)
- [Important Limitations](#important-limitations)
- [Customizing Your Terms](#customizing-your-terms)
- [Testing Your Protection](#testing-your-protection)
- [Reporting Violations](#reporting-violations)
- [Project Structure](#project-structure)
- [Setup](#setup)
- [Environment Variables](#environment-variables)
@@ -457,6 +464,168 @@ A: Hashtags don't directly affect search engine SEO, but they help social media
**Q: Should I create a blog for my comic?**
A: Optional, but regular blog content about your comic's development can improve SEO through fresh content and more keywords.

## Content Protection & AI Scraping Prevention

Sunday Comics includes built-in measures to discourage AI web scrapers from using your creative work for training machine learning models without permission.

### Protection Features

#### robots.txt Blocking
The dynamically generated `robots.txt` file blocks known AI crawlers while still allowing legitimate search engines:

**Blocked AI bots:**
- **GPTBot** & **ChatGPT-User** (OpenAI)
- **CCBot** (Common Crawl, whose web archive is widely used to build AI training datasets)
- **anthropic-ai** & **Claude-Web** (Anthropic)
- **Google-Extended** (Google's robots.txt token for opting out of AI training; it does not affect Googlebot or Google Search)
- **PerplexityBot** (Perplexity AI)
- **Omgilibot**, **Diffbot**, **Bytespider**, **FacebookBot**, **ImagesiftBot**, **cohere-ai**

**Note:** Regular search engine crawlers (Googlebot, Bingbot, etc.) are still allowed so your comic can be discovered through search.

The robots.txt also includes a reference to your Terms of Service for transparency.
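
The per-bot groups in the generated file differ only in the user-agent name, so they could be built from a list instead of being written out by hand. A minimal sketch, assuming a hypothetical `AI_BOTS` list and `build_robots_txt()` helper (neither exists in `app.py` as committed). It also places `Disallow: /api/` directly under `User-agent: *`, ahead of `Allow: /`, so that parsers which apply the first matching rule (Python's `urllib.robotparser` is one) keep crawlers out of `/api/` too:

```python
# Hypothetical constant mirroring the user agents blocked in app.py.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "PerplexityBot", "Omgilibot", "Diffbot",
    "Bytespider", "FacebookBot", "ImagesiftBot", "cohere-ai",
]


def build_robots_txt(site_url: str, ai_bots: list[str]) -> str:
    """Build robots.txt text: search engines allowed, each AI bot fully disallowed."""
    lines = [
        "# Content protected by copyright. AI training prohibited.",
        f"# See terms: {site_url}/terms",
        "",
        "User-agent: *",
        "Disallow: /api/",  # listed first so first-match parsers honor it
        "Allow: /",
        "",
        f"Sitemap: {site_url}/sitemap.xml",
    ]
    for bot in ai_bots:
        # A blank line separates groups; "Disallow: /" covers the whole site.
        lines += ["", f"User-agent: {bot}", "Disallow: /"]
    return "\n".join(lines) + "\n"
```

Inside the `robots()` view this would become `Response(build_robots_txt(SITE_URL, AI_BOTS), mimetype='text/plain')`, so blocking a new crawler means appending one string to the list.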

#### HTML Meta Tags
Every page includes meta tags that signal to AI scrapers not to use the content:

```html
<meta name="robots" content="noai, noimageai">
<meta name="googlebot" content="noai, noimageai">
```

- `noai` - Asks crawlers not to use the page's text for AI training
- `noimageai` - Asks crawlers not to use the page's images (your comics) for AI training

These are non-standard directives rather than part of the robots meta tag rules Google documents, so they only affect crawlers that choose to recognize them.

#### Terms of Service
A Terms of Service page at `/terms` explicitly prohibits:
- Using content for AI training or machine learning
- Scraping or harvesting content for datasets
- Creating derivative works using AI trained on your content
- Text and Data Mining (TDM) without permission

The Terms page is automatically linked in your footer and includes:
- Copyright protection assertions
- DMCA enforcement information
- TDM rights reservation (EU Directive 2019/790 Article 4)
- Clear permitted use guidelines

#### HTTP Headers
Sunday Comics adds an `X-Robots-Tag: noai, noimageai` header to every response through an `after_request` hook in `app.py`. This carries the same opt-out signal as the meta tags to responses that have no HTML `<head>`, such as images served through Flask.
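
To confirm the header actually reaches clients (for example, that a proxy in front of Flask does not drop it), a small standard-library check can request a page and read the header back. A sketch; `x_robots_directives()` and `fetch_x_robots_tag()` are illustrative names, and the second one needs network access:

```python
import urllib.request


def x_robots_directives(header_values: list[str]) -> set[str]:
    """Flatten one or more X-Robots-Tag header values into lowercase directives."""
    directives = set()
    for value in header_values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_x_robots_tag(url: str) -> set[str]:
    """Send a HEAD request and return the X-Robots-Tag directives (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # get_all() returns None when the header is absent, hence the fallback
        return x_robots_directives(response.headers.get_all("X-Robots-Tag") or [])


# Example (placeholder domain):
# print(fetch_x_robots_tag("https://yourcomic.com/"))
```

Running it against both a page URL and an image URL shows whether static files carry the header as well.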

#### TDM Reservation File
The `/tdmrep.json` endpoint publishes a machine-readable reservation of Text and Data Mining rights under EU Directive 2019/790 (Article 4), pointing to your Terms of Service.
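
The TDM Reservation Protocol (TDMRep), a W3C Community Group specification, describes a file at `/.well-known/tdmrep.json` holding a JSON array of rules with `location`, `tdm-reservation` and optional `tdm-policy` keys. That differs in both path and shape from the `{"tdm": {...}}` object this endpoint returns, so crawlers implementing TDMRep may not pick it up. A sketch of a payload in the TDMRep shape, assuming that format (check the field details against the specification); `build_tdmrep()` and `tdmrep_json()` are illustrative names, and since the spec describes `tdm-policy` as a link to a machine-readable policy, reusing the `/terms` URL only mirrors the current route:

```python
import json


def build_tdmrep(site_url: str) -> list[dict]:
    """Return TDMRep-style rules reserving TDM rights for the whole site (assumed format)."""
    return [
        {
            "location": "/",  # path pattern; check the spec for its exact matching rules
            "tdm-reservation": 1,  # 1 = text and data mining rights are reserved
            "tdm-policy": f"{site_url}/terms",  # reuses the existing terms URL
        }
    ]


def tdmrep_json(site_url: str) -> str:
    """Serialize the rules for serving at /.well-known/tdmrep.json."""
    return json.dumps(build_tdmrep(site_url), indent=2)
```

In `app.py` this could back a hypothetical `@app.route('/.well-known/tdmrep.json')` view returning `jsonify(build_tdmrep(SITE_URL))`; Flask's `jsonify` accepts a list as well as a dict.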

### Advanced: Image-Level Protection Tools

For artists who want to protect their work at the image level, consider these specialized tools:

#### Glaze (Style Protection)
**What it does:** Adds subtle, hard-to-see changes to images that are designed to keep AI models from accurately learning your artistic style.

**Best for:**
- Protecting your unique art style from being copied by AI
- Making AI-generated imitations look wrong or distorted
- Artists concerned about style mimicry (e.g., "draw like [artist name]" prompts)

**How to use:**
1. Download from [glaze.cs.uchicago.edu](https://glaze.cs.uchicago.edu)
2. Process your comic images before uploading them to your site
3. The changes are meant to be barely noticeable to readers while disrupting AI style mimicry

**Trade-offs:**
- Processing time: can take several minutes per image
- Slight file size increase
- Existing comics must be reprocessed and re-uploaded

#### Nightshade (Data Poisoning)
**What it does:** Alters images so that AI models trained on them perceive something different from what a human sees, while the images still look normal to readers.

**Best for:**
- Active defense against unauthorized AI training
- Making scraped data actively harmful to AI models
- Artists who want to fight back against scraping

**How to use:**
1. Download from [nightshade.cs.uchicago.edu](https://nightshade.cs.uchicago.edu)
2. Process images before uploading (can be combined with Glaze)
3. AI models trained on these images may learn incorrect associations and produce wrong results

**Trade-offs:**
- More aggressive than Glaze (may conflict with some platforms' terms of service)
- Processing time similar to Glaze
- Ongoing research tool; effectiveness may vary

#### Recommendations
- **Use Glaze if:** You want passive protection for your art style
- **Use Nightshade if:** You want active defense and accept the risks
- **Use both if:** Maximum protection is your priority
- **Combine with Sunday Comics protections:** These tools complement the web-based protections (robots.txt, meta tags, etc.)

**Note:** Both tools are free projects from the University of Chicago's SAND Lab, designed specifically to help artists protect their work from AI exploitation.

### Important Limitations

**These measures are voluntary** - they only work if AI companies respect them:

✅ **What this does:**
- Signals your intent to protect your content
- Documents your stated restrictions, which supports DMCA takedowns (the takedown itself rests on your copyright)
- Keeps out crawlers from AI companies that honor robots.txt
- Makes your copyright stance clear to users and crawlers

❌ **What this doesn't do:**
- Cannot technically prevent determined bad actors from scraping
- Cannot remove already-scraped historical data from existing datasets
- No guarantee all AI companies will honor these signals

**Companies that state they honor robots.txt:**
- OpenAI (GPTBot blocking)
- Anthropic (anthropic-ai blocking)
- Google (Google-Extended blocking, separate from search)

### Customizing Your Terms

Edit `content/terms.md` to customize:

1. **Jurisdiction** - Replace the `[Your Jurisdiction]` placeholder under "Governing Law" with your country/state for legal clarity
2. **Permitted use** - Adjust what you allow (fan art, sharing, etc.)
3. **Contact info** - Filled in automatically from `comics_data.py`

The Terms page uses Jinja2 template variables that pull from your configuration:
- `{{ copyright_name }}` - From `COPYRIGHT_NAME` in `comics_data.py`
- `{{ social_email }}` - From `SOCIAL_EMAIL` in `comics_data.py`
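
For reference, the two settings might look like this in `comics_data.py`; the values below are placeholders. If `SOCIAL_EMAIL` is left empty, the `/terms` route substitutes `[Contact Email]` instead.

```python
# comics_data.py (excerpt; placeholder values)
COPYRIGHT_NAME = "Your Name"          # rendered into {{ copyright_name }} on /terms
SOCIAL_EMAIL = "hello@yourcomic.com"  # rendered into {{ social_email }}; empty -> "[Contact Email]"
```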

### Testing Your Protection

**Verify robots.txt:**
```bash
curl https://yourcomic.com/robots.txt
```

You should see the AI bot blocks and a link to your terms.
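
Beyond reading the output, Python's standard `urllib.robotparser` can report which user agents the file actually shuts out. A sketch; `blocked_agents()` and `fetch_robots_txt()` are illustrative helpers. Note that this parser applies the first matching rule in a group, whereas Google uses the most specific one, so results can differ when `Allow` and `Disallow` lines overlap:

```python
import urllib.request
import urllib.robotparser


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the user agents that robots_txt forbids from fetching url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


def fetch_robots_txt(site_url: str) -> str:
    """Download the live robots.txt (needs network access)."""
    with urllib.request.urlopen(f"{site_url}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8")


# Example (placeholder domain):
# robots = fetch_robots_txt("https://yourcomic.com")
# print(blocked_agents(robots, ["GPTBot", "CCBot", "Googlebot", "Bingbot"], "https://yourcomic.com/"))
```

AI crawlers should appear in the returned list, while Googlebot and Bingbot should not.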

**Check meta tags:**
View the page source and look for:
```html
<meta name="robots" content="noai, noimageai">
```
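
To run the same check from a script instead of the browser, the standard library's `html.parser` can extract the robots-style meta directives from a page. A sketch with illustrative names (`RobotsMetaParser`, `robots_meta_directives()`); feed it the HTML from a saved page or from an HTTP request:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            # "noai, noimageai" -> {"noai", "noimageai"}
            self.directives[name] = {
                part.strip().lower() for part in content.split(",") if part.strip()
            }


def robots_meta_directives(html: str) -> dict[str, set[str]]:
    """Return {meta name: set of directives} for the robots-style meta tags in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```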

**Validate Terms page:**
Visit `https://yourcomic.com/terms` to ensure it renders correctly.

### Reporting Violations

If you discover your work in an AI training dataset or being used without permission:

1. **Document the violation** - Screenshots, URLs, timestamps
2. **Review their TOS** - Many AI services have content dispute processes
3. **Send a DMCA takedown notice** - Your copyright in the work is the legal basis; your Terms of Service documents the uses you prohibited
4. **Contact the platform** - Write from the `SOCIAL_EMAIL` address listed on your Terms page

Resources:
- [US Copyright Office DMCA](https://www.copyright.gov/dmca/)
- [EU Copyright Directive](https://digital-strategy.ec.europa.eu/en/policies/copyright-legislation)

## Project Structure

```
83
app.py
@@ -19,6 +19,13 @@ app = Flask(__name__)
app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'your-secret-key')


@app.after_request
def add_ai_blocking_headers(response):
    """Add headers to discourage AI scraping"""
    response.headers['X-Robots-Tag'] = 'noai, noimageai'
    return response


@app.context_processor
def inject_global_settings():
    """Make global settings available to all templates"""
@@ -217,6 +224,28 @@ def about():
    return render_template('page.html', title='About', content=html_content)


@app.route('/terms')
def terms():
    """Terms of Service page"""
    from jinja2 import Template
    # Read and render the markdown file with template variables
    terms_path = os.path.join(os.path.dirname(__file__), 'content', 'terms.md')
    try:
        with open(terms_path, 'r', encoding='utf-8') as f:
            content = f.read()
        # First render as Jinja template to substitute variables
        template = Template(content)
        rendered_content = template.render(
            copyright_name=COPYRIGHT_NAME,
            social_email=SOCIAL_EMAIL if SOCIAL_EMAIL else '[Contact Email]'
        )
        # Then convert markdown to HTML
        html_content = markdown.markdown(rendered_content)
    except FileNotFoundError:
        html_content = '<p>Terms of Service content not found.</p>'
    return render_template('page.html', title='Terms of Service', content=html_content)


@app.route('/api/comics')
def api_comics():
    """API endpoint - returns all comics as JSON"""
@@ -244,6 +273,9 @@ def robots():
    """Generate robots.txt dynamically with correct SITE_URL"""
    from flask import Response
    robots_txt = f"""# Sunday Comics - Robots.txt
# Content protected by copyright. AI training prohibited.
# See terms: {SITE_URL}/terms

User-agent: *
Allow: /

@@ -252,10 +284,61 @@ Sitemap: {SITE_URL}/sitemap.xml

# Disallow API endpoints from indexing
Disallow: /api/

# Block AI crawlers and scrapers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: cohere-ai
Disallow: /
"""
    return Response(robots_txt, mimetype='text/plain')


@app.route('/tdmrep.json')
def tdm_reservation():
    """TDM (Text and Data Mining) reservation - signals AI training prohibition"""
    return jsonify({
        "tdm": {
            "reservation": 1,
            "policy": f"{SITE_URL}/terms"
        }
    })


@app.errorhandler(404)
def page_not_found(e):
    """404 error handler"""
93
content/terms.md
@@ -0,0 +1,93 @@
# Terms of Service

**Last Updated:** January 2025

By accessing and using this website, you agree to be bound by these Terms of Service. If you do not agree to these terms, please do not use this site.

## Copyright and Ownership

All comics, artwork, text, graphics, and other content on this website are protected by copyright and owned by {{ copyright_name }}. All rights reserved.

## Permitted Use

**Personal Use:** You may:
- Read and enjoy the comics for personal, non-commercial purposes
- Share links to individual comic pages on social media
- Embed comics on personal websites with proper attribution and a link back to the original

**Attribution Required:** When sharing or embedding, you must:
- Provide clear credit to {{ copyright_name }}
- Include a link back to this website
- Not alter, crop, or modify the comic images

## Prohibited Use

You are **expressly prohibited** from:

### AI Training and Machine Learning
- Using any content from this site for training artificial intelligence models
- Scraping, crawling, or harvesting content for machine learning purposes
- Including any images, text, or data in AI training datasets
- Using content to develop, train, or improve generative AI systems
- Creating derivative works using AI trained on this content

### Commercial Use
- Reproducing, distributing, or selling comics without explicit written permission
- Using comics or artwork for commercial purposes without a license
- Printing comics on merchandise (t-shirts, mugs, etc.) without authorization

### Modification and Redistribution
- Altering, editing, or creating derivative works from the comics
- Removing watermarks, signatures, or attribution
- Rehosting images on other servers or websites
- Claiming comics as your own work

## Data Mining and Web Scraping

**Automated Access Prohibition:** Automated scraping, crawling, or systematic downloading of content is strictly prohibited without prior written consent. This includes but is not limited to:
- Web scrapers and bots (except authorized search engines)
- Automated downloads of images or data
- RSS feed abuse or bulk downloading
- Any form of data harvesting for commercial purposes

**Text and Data Mining (TDM) Reservation:** We formally reserve all rights under applicable copyright law regarding text and data mining, including but not limited to EU Directive 2019/790 Article 4. No TDM exceptions apply to this content.

## DMCA and Copyright Enforcement

Unauthorized use of copyrighted material from this site may violate copyright law and be subject to legal action under the Digital Millennium Copyright Act (DMCA) and other applicable laws.

If you discover unauthorized use of content from this site, please report it to {{ social_email }}.

## Fair Use

Limited use for purposes of commentary, criticism, news reporting, teaching, or research may qualify as fair use. If you believe your use qualifies as fair use, please contact us first.

## License Requests

If you wish to use content in ways not permitted by these terms, please contact us to discuss licensing arrangements.

## Privacy

We respect your privacy. This site may use cookies for basic functionality and analytics. We do not sell personal information to third parties.

## External Links

This site may contain links to external websites. We are not responsible for the content or practices of third-party sites.

## Modifications to Terms

We reserve the right to modify these Terms of Service at any time. Changes will be posted on this page with an updated "Last Updated" date.

## Contact

For questions about these terms, licensing requests, or to report copyright violations:

{{ social_email }}

## Governing Law

These Terms of Service are governed by applicable copyright law and the laws of [Your Jurisdiction].

---

**Summary:** You can read and share links to comics, but you cannot use them for AI training, scrape the site, use them commercially, or create modified versions without permission.
@@ -754,7 +754,8 @@ main {
    gap: var(--space-sm);
}

.footer-bottom p,
.footer-terms {
    flex-basis: 100%;
    text-align: center;
}
@@ -963,6 +964,18 @@ footer {
    text-decoration: underline;
}

.footer-terms {
    color: var(--color-text);
    text-decoration: none;
    font-size: var(--font-size-md);
    transition: opacity 0.2s ease;
}

.footer-terms:hover {
    text-decoration: underline;
    opacity: 0.8;
}

/* Compact Footer Mode */
footer.compact-footer {
    border-top: none;
@@ -9,6 +9,10 @@
    <meta name="description" content="{% block meta_description %}A webcomic about life, the universe, and everything{% endblock %}">
    <link rel="canonical" href="{% block canonical %}{{ site_url }}{{ request.path }}{% endblock %}">

    <!-- AI Scraping Prevention -->
    <meta name="robots" content="noai, noimageai">
    <meta name="googlebot" content="noai, noimageai">

    <!-- Open Graph / Facebook -->
    <meta property="og:type" content="website">
    <meta property="og:url" content="{% block meta_url %}{{ site_url }}{{ request.path }}{% endblock %}">
@@ -164,6 +168,8 @@
<div class="footer-bottom">
    <p>© {{ current_year }} {{ copyright_name }}. All rights reserved.</p>
    <span class="footer-divider" aria-hidden="true">|</span>
    <a href="{{ url_for('terms') }}" class="footer-terms">Terms of Service</a>
    <span class="footer-divider" aria-hidden="true">|</span>
    <div class="site-credit">
        <a href="https://git.puercito.net/mi/sunday" target="_blank" rel="noopener noreferrer" aria-label="Sunday Comics - Webcomic platform">
            <img src="{{ url_for('static', filename='images/sunday.jpg') }}" alt="Sunday Comics" class="credit-image">