Seeds
This document was automatically generated from the README in the seeds folder.
Seeds represent the contract phase of the NewsFork pipeline.
Philosophy
“Seed Engine defines HOW to fetch content.”
Seeds are production-ready contracts that specify exactly how to retrieve content from a source, including:
- Fetch method (HTML, RSS, API)
- Content selectors
- Update frequency
- Validation rules
Structure

    seeds/
    ├── drafts/                    # Pending review
    │   └── country=sg/
    │       └── domain=mom.gov.sg/
    │           └── content=news/
    │               └── v1.json
    ├── active/                    # Production contracts
    │   └── country=sg/
    │       └── ...
    ├── archived/                  # Historical
    │   └── ...
    └── README.md

Storage
Seed contracts are stored in GitHub for version control and audit trail:
- Primary Storage: GitHub repository
- Path Format: seeds/{status}/country={code}/domain={domain}/content={type}/v{version}.json
- Backup: R2 (via metadata sync, future)
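The path format above can be sketched as a small builder. This is a minimal illustration, not the project's own createSeedPath (whose signature is not shown here): the names buildSeedPath, SeedLocation, and STATUS_DIR are hypothetical, and the mapping from status values to directory names is an assumption inferred from the directory tree and the Status Management section.

```typescript
// Hypothetical helper for illustration only.
type SeedStatus = "draft" | "active" | "suspended" | "archived";

// Assumption: the "draft" status lives in drafts/, and suspended seeds
// stay under active/ (as described in Status Management).
const STATUS_DIR: Record<SeedStatus, string> = {
  draft: "drafts",
  active: "active",
  suspended: "active",
  archived: "archived",
};

interface SeedLocation {
  status: SeedStatus;
  country: string; // e.g. "SG"; lowercased in the path
  domain: string;  // e.g. "mom.gov.sg"
  content: string; // e.g. "news"
  version: number; // positive integer
}

function buildSeedPath(loc: SeedLocation): string {
  // Reject NaN, fractions, zero, and negatives.
  if (!(loc.version >= 1) || Math.floor(loc.version) !== loc.version) {
    throw new Error(`version must be a positive integer, got ${loc.version}`);
  }
  return [
    "seeds",
    STATUS_DIR[loc.status],
    `country=${loc.country.toLowerCase()}`,
    `domain=${loc.domain}`,
    `content=${loc.content}`,
    `v${loc.version}.json`,
  ].join("/");
}
```

Keeping the status-to-directory mapping in one table makes the "suspended stays in active/" rule explicit instead of scattering it across callers.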
Lifecycle

    ┌─────────┐   review    ┌─────────┐   deprecate    ┌──────────┐
    │  draft  │ ──────────▶ │ active  │ ─────────────▶ │ archived │
    └─────────┘             └─────────┘                └──────────┘
                                 │
                                 │ suspend
                                 ▼
                           ┌───────────┐
                           │ suspended │
                           └───────────┘

Seed Contract Schema

    {
      "seed_id": "sg-mom-001",
      "source": {
        "domain": "mom.gov.sg",
        "type": "government",
        "name": "Ministry of Manpower",
        "country": "SG"
      },
      "discovery": {
        "alive": true,
        "checked_at": "2026-01-23T04:10:00Z"
      },
      "contents": [
        {
          "nature": "news",
          "source_url": "https://www.mom.gov.sg/newsroom",
          "fetch_type": "html",
          "medium": "web",
          "confidence": 0.92
        }
      ],
      "status": "draft",
      "version": 1
    }

Key Concepts
Source Type vs Content Category
Source Type (WHO produces):
- government - Official government entities
- organization - Non-profit organizations
- media - News outlets
- company - Private companies
Content Category (WHAT the content is):
- news - Time-sensitive announcements
- policy - Laws, regulations
- guide - How-to content
- faq - Q&A structured info
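The two axes above are independent: who publishes something says nothing about what kind of content it is. A minimal sketch of how they might be modelled as separate union types follows. The type names, the guard functions, and the SourceProfile shape are hypothetical and only illustrate the distinction.

```typescript
// WHO produces the content.
type SourceType = "government" | "organization" | "media" | "company";
// WHAT the content is.
type ContentCategory = "news" | "policy" | "guide" | "faq";

const SOURCE_TYPES: readonly SourceType[] = ["government", "organization", "media", "company"];
const CONTENT_CATEGORIES: readonly ContentCategory[] = ["news", "policy", "guide", "faq"];

// Runtime guards, useful when the values arrive as untyped JSON.
function isSourceType(value: string): value is SourceType {
  return (SOURCE_TYPES as readonly string[]).indexOf(value) !== -1;
}

function isContentCategory(value: string): value is ContentCategory {
  return (CONTENT_CATEGORIES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical shape: one source can publish several content categories.
interface SourceProfile {
  type: SourceType;
  categories: ContentCategory[];
}

const ministry: SourceProfile = {
  type: "government",
  categories: ["news", "policy", "faq"],
};
```

Because the unions are distinct, passing "news" where a source type is expected fails at compile time rather than producing a silently mislabelled seed.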
Versioning
Seeds are versioned to track changes:
- v1.json, v2.json, etc.
- Version bumps on content/structure changes
- Discovery updates don’t bump version
Promotion Flow
- Research discovers URLs (R2 datasets)
- Seed Engine creates draft (via API)
- Human reviews draft
- PR merged → promoted to active (via API)
- Scraper trusts active seeds only
API Endpoints
Seed Operations
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/seeds | List seeds with filters |
| GET | /api/v1/seeds/:id | Get seed by ID |
| POST | /api/v1/seeds | Create draft seed |
| PATCH | /api/v1/seeds/:id | Update seed |
| POST | /api/v1/seeds/:id/promote | Promote to active |
| POST | /api/v1/seeds/:id/archive | Archive seed |
Query Parameters
List seeds with filtering:

    GET /api/v1/seeds?country=SG&status=active&source_type=government

Supported filters:
- country: Country code (SG, GB, US, etc.)
- status: draft, active, archived, suspended
- source_type: government, organization, media, company
- limit: Number of results (default: 50, max: 100)
- offset: Pagination offset
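A client might assemble the list URL from these filters as in the sketch below. The function and interface names are hypothetical, and clamping limit and offset on the client side is an assumption; the API may instead reject out-of-range values.

```typescript
interface SeedListFilters {
  country?: string;
  status?: "draft" | "active" | "archived" | "suspended";
  source_type?: "government" | "organization" | "media" | "company";
  limit?: number;
  offset?: number;
}

const DEFAULT_LIMIT = 50; // documented default
const MAX_LIMIT = 100;    // documented maximum

function buildSeedListQuery(filters: SeedListFilters): string {
  const params = new URLSearchParams();
  if (filters.country) params.set("country", filters.country);
  if (filters.status) params.set("status", filters.status);
  if (filters.source_type) params.set("source_type", filters.source_type);

  // Assumption: clamp to the documented range rather than sending a bad value.
  const limit = Math.min(Math.max(filters.limit ?? DEFAULT_LIMIT, 1), MAX_LIMIT);
  const offset = Math.max(filters.offset ?? 0, 0);
  params.set("limit", String(limit));
  params.set("offset", String(offset));

  return `/api/v1/seeds?${params.toString()}`;
}
```

URLSearchParams handles escaping and keeps parameters in insertion order, so the output is stable and safe to log or cache.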
Service Layer
The Seed Service (src/services/seed.service.ts) orchestrates:
- Domain Functions: Pure business logic (no Cloudflare dependencies)
- Infra Adapters: GitHub storage for contracts
- Validation: Zod schema validation
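To make the validation step concrete, here is a dependency-free sketch of the kind of checks a contract might go through. The project itself validates with Zod; this hand-written version only stands in for it. The names checkSeedContract and CheckResult are hypothetical, and which fields are required, as well as the 0 to 1 range for confidence, are assumptions drawn from the example contract.

```typescript
interface CheckResult {
  valid: boolean;
  errors: string[];
}

const SOURCE_TYPES = ["government", "organization", "media", "company"];
const STATUSES = ["draft", "active", "suspended", "archived"];

function isNonEmptyString(value: unknown): boolean {
  return typeof value === "string" && value.length > 0;
}

function isHttpUrl(value: unknown): boolean {
  if (typeof value !== "string") return false;
  try {
    const protocol = new URL(value).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as a URL
  }
}

function checkSeedContract(input: unknown): CheckResult {
  const errors: string[] = [];
  const seed: any = typeof input === "object" && input !== null ? input : {};
  const source: any = seed.source ?? {};

  if (!isNonEmptyString(seed.seed_id)) errors.push("seed_id must be a non-empty string");
  if (!isNonEmptyString(source.domain)) errors.push("source.domain must be a non-empty string");
  if (SOURCE_TYPES.indexOf(source.type) === -1) errors.push("source.type is not a known source type");

  if (!Array.isArray(seed.contents) || seed.contents.length === 0) {
    errors.push("contents must be a non-empty array");
  } else {
    seed.contents.forEach((content: any, i: number) => {
      if (!isHttpUrl(content?.source_url)) {
        errors.push(`contents[${i}].source_url must be an http(s) URL`);
      }
      const confidence = content?.confidence;
      const inRange = typeof confidence === "number" && confidence >= 0 && confidence <= 1;
      if (confidence !== undefined && !inRange) {
        errors.push(`contents[${i}].confidence must be a number between 0 and 1`);
      }
    });
  }

  if (STATUSES.indexOf(seed.status) === -1) errors.push("status is not a known status");
  if (typeof seed.version !== "number" || seed.version < 1 || Math.floor(seed.version) !== seed.version) {
    errors.push("version must be a positive integer");
  }
  return { valid: errors.length === 0, errors };
}
```

Collecting every error instead of stopping at the first one gives a reviewer the full picture in a single pass, which suits a human-review step.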
Key Functions

    // Domain functions (pure, testable)
    createSeedContract(input: SeedCreateRequest): SeedContract
    validateSeedContract(contract: SeedContract): ValidationResult
    promoteSeedToActive(contract: SeedContract): SeedContract
    createSeedPath(...): string

    // Service layer (orchestrates domain + infra)
    SeedService.list(params): Promise<SeedListResult>
    SeedService.get(seedId): Promise<SeedContract>
    SeedService.create(request): Promise<SeedContract>
    SeedService.update(seedId, request): Promise<SeedContract>
    SeedService.promote(seedId): Promise<SeedContract>

Contract Creation Flow

    1. Research Dataset (R2)
       ↓
    2. Seed Candidate Analysis
       ↓
    3. Draft Seed Creation
       POST /api/v1/seeds
       {
         "source": { ... },
         "contents": [ ... ]
       }
       ↓
    4. Human Review (GitHub PR)
       ↓
    5. Promotion
       POST /api/v1/seeds/:id/promote
       ↓
    6. Active Seed (GitHub)
       seeds/active/country=sg/domain=.../v1.json

Duplicate Detection
The system automatically checks for duplicate seeds:
- Domain-based: Same domain + country combination
- Content-based: Same source URL + content type
- KV Cache: Fast lookup via Cloudflare KV
Duplicate seeds are rejected with appropriate error messages.
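The two checks above can be sketched as key lookups. In this illustration an in-memory class stands in for Cloudflare KV, and everything named here (InMemorySeedIndex, the key layout, findDuplicate, registerSeed) is hypothetical. It also assumes a candidate counts as a duplicate when either key already exists.

```typescript
interface SeedCandidate {
  country: string;
  domain: string;
  sourceUrl: string;
  contentType: string;
}

interface DuplicateMatch {
  kind: "domain" | "content";
  seedId: string; // the existing seed that the candidate collides with
}

// In-memory stand-in for a key-value cache (key -> seed_id).
class InMemorySeedIndex {
  private entries: Record<string, string> = {};
  get(key: string): string | undefined {
    return this.entries[key];
  }
  put(key: string, seedId: string): void {
    this.entries[key] = seedId;
  }
}

// Domain-based key: same domain + country combination.
function domainKey(c: SeedCandidate): string {
  return `seed:domain:${c.country.toLowerCase()}:${c.domain.toLowerCase()}`;
}

// Content-based key: same source URL + content type.
function contentKey(c: SeedCandidate): string {
  return `seed:content:${c.contentType}:${c.sourceUrl}`;
}

function findDuplicate(index: InMemorySeedIndex, c: SeedCandidate): DuplicateMatch | null {
  const byDomain = index.get(domainKey(c));
  if (byDomain !== undefined) return { kind: "domain", seedId: byDomain };
  const byContent = index.get(contentKey(c));
  if (byContent !== undefined) return { kind: "content", seedId: byContent };
  return null;
}

function registerSeed(index: InMemorySeedIndex, seedId: string, c: SeedCandidate): void {
  index.put(domainKey(c), seedId);
  index.put(contentKey(c), seedId);
}
```

Returning the kind of match along with the existing seed ID gives the caller enough detail to build a specific rejection message.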
Path Format
Seeds use Hive-style partitioning:

    seeds/drafts/country=sg/domain=mom.gov.sg/content=news/v1.json
    seeds/active/country=sg/domain=mom.gov.sg/content=news/v1.json
    seeds/archived/country=sg/domain=mom.gov.sg/content=news/v1.json

This format:
- Enables efficient querying by country/domain
- Supports versioning (v1, v2, etc.)
- Compatible with data lake tools
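Because each partition is a key=value path segment, a path can be decoded without any lookup table. The parser below is a minimal sketch of that idea; parseSeedPath and ParsedSeedPath are hypothetical names, not part of the project.

```typescript
interface ParsedSeedPath {
  directory: string;                  // "drafts", "active", or "archived"
  partitions: Record<string, string>; // e.g. { country: "sg", domain: "mom.gov.sg", content: "news" }
  version: number;
}

function parseSeedPath(path: string): ParsedSeedPath | null {
  const segments = path.split("/");
  if (segments.length < 3 || segments[0] !== "seeds") return null;

  // The last segment must look like v<digits>.json.
  const file = segments[segments.length - 1];
  const versionMatch = /^v(\d+)\.json$/.exec(file);
  if (!versionMatch) return null;

  // Everything between the directory and the file is a key=value partition.
  const partitions: Record<string, string> = {};
  for (const segment of segments.slice(2, -1)) {
    const eq = segment.indexOf("=");
    if (eq <= 0) return null; // missing key or missing "="
    partitions[segment.slice(0, eq)] = segment.slice(eq + 1);
  }

  return { directory: segments[1], partitions, version: Number(versionMatch[1]) };
}
```

This is the same property that lets data lake tools prune by country or domain: the filter values are readable straight from the path.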
Status Management
섹션 제목: “Status Management”Draft
- Initial state for all new seeds
- Stored in seeds/drafts/
- Requires human review before promotion
Active
- Production-ready contracts
- Stored in seeds/active/
- Trusted by scrapers
- Can be suspended or archived
Suspended
- Temporarily disabled
- Still in seeds/active/ but marked as suspended
- Can be reactivated
Archived
- Deprecated or replaced
- Stored in seeds/archived/
- Historical record only
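The status rules above amount to a small state machine. The sketch below encodes only the transitions this document states (review, suspend, reactivate, deprecate). Whether a draft or a suspended seed can be archived directly is not stated, so those moves are disallowed here as an assumption. All names are hypothetical.

```typescript
type SeedStatus = "draft" | "active" | "suspended" | "archived";

// Allowed moves, keyed by the current status.
const TRANSITIONS: Record<SeedStatus, readonly SeedStatus[]> = {
  draft: ["active"],                 // review
  active: ["suspended", "archived"], // suspend, deprecate
  suspended: ["active"],             // reactivate
  archived: [],                      // historical record only
};

function canTransition(from: SeedStatus, to: SeedStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns a copy with the new status; the input object is left untouched.
function transition<T extends { status: SeedStatus }>(seed: T, to: SeedStatus): T {
  if (!canTransition(seed.status, to)) {
    throw new Error(`cannot move seed from ${seed.status} to ${to}`);
  }
  return { ...seed, status: to };
}

// Scrapers trust active seeds only.
function isTrustedByScraper(seed: { status: SeedStatus }): boolean {
  return seed.status === "active";
}
```

Keeping the table as data means a new rule, such as allowing suspended seeds to be archived, is a one-line change.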
Versioning Strategy
Seeds are versioned when:
- Content structure changes
- Fetch method changes
- Selectors are updated
- Validation rules change
Versioning does NOT occur for:
- Discovery updates (liveness checks)
- Metadata updates
- Status changes
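The two lists above can be expressed as a lookup from the kind of change to whether it bumps the version. The ChangeKind labels and the nextVersion function are hypothetical names chosen for this sketch; only the bump/no-bump split comes from the lists.

```typescript
type ChangeKind =
  | "content_structure"
  | "fetch_method"
  | "selectors"
  | "validation_rules"
  | "discovery"
  | "metadata"
  | "status";

const BUMPS_VERSION: Record<ChangeKind, boolean> = {
  content_structure: true,
  fetch_method: true,
  selectors: true,
  validation_rules: true,
  discovery: false, // liveness checks
  metadata: false,
  status: false,
};

// A set of changes bumps the version once if any single change requires it.
function nextVersion(current: number, changes: ChangeKind[]): number {
  const bump = changes.some((change) => BUMPS_VERSION[change]);
  return bump ? current + 1 : current;
}
```

For example, a liveness check combined with a status change leaves v1 as v1, while a selector update in the same batch moves it to v2.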
Integration with Research
Seeds are created from Research datasets:
- Research Discovery: URLs discovered and stored in R2
- Seed Analysis: Analyze research datasets for seed candidates
- Contract Generation: Create seed contracts from candidates
- Review Process: Human review via GitHub PRs
- Promotion: Promote approved seeds to active
Data Access
섹션 제목: “Data Access”Via API

    # List active seeds (quote the URL so the shell does not interpret ? and &)
    curl "https://api.example.com/api/v1/seeds?status=active&country=SG"

    # Get specific seed
    curl https://api.example.com/api/v1/seeds/sg-mom-001

    # Create draft seed
    curl -X POST https://api.example.com/api/v1/seeds \
      -H "Content-Type: application/json" \
      -d '{
        "source": {
          "domain": "mom.gov.sg",
          "type": "government",
          "country": "SG"
        },
        "contents": [{
          "nature": "news",
          "source_url": "https://www.mom.gov.sg/newsroom",
          "fetch_type": "html"
        }]
      }'

    # Promote to active
    curl -X POST https://api.example.com/api/v1/seeds/sg-mom-001/promote

Via GitHub Direct Access
Seed contracts can be accessed directly from GitHub:

    // Using GitHub Storage Service
    const github = createStorageService({
      owner: "owner",
      repo: "repo",
      token: "token"
    });
    const seed = await github.readSeedContract(status, country, domain, content, version);

Environment-Specific Storage
Seed contracts are stored in GitHub, which is shared across environments. However, the API can filter by environment-specific metadata if needed.
Related Documentation
- Project README
- Research guide (/ko/v1/guides/research/)
- Environment Guide
- Architecture Guidelines