Active projects: 3 (1 paused)
Notebooks: 12 (across all stages)
URLs tracked: 0 (via Web URLs tab)
Tasks open: 4 (2 blocked)
Project progress
| Project | Stages | Progress | Status |
|---|---|---|---|
| Samoylova Network | 01 ✓ · 02 ✓ · 03 → · 04 | | active |
| Immortal Regiment Spain | 01 ✓ · 02 → · 03 · 04 | | active |
| Propaganda Narratives 2026 | 01 ✓ · 02 · 03 · 04 | | paused |
Database status
| File | Size | Schema | Last backup | Integrity |
|---|---|---|---|---|
| argos_deep.sqlite | 1.2 GB | v3 | 2026-06-21 | ok |
| argos_tracking.sqlite | — | v1 | — | pending |
Samoylova Network · active
01 ingest ✓
02 process ✓
03 organize →
04 analyze
| Notebook | Version | Last run | Status |
|---|---|---|---|
| ARGOS_SamoyloveNetwork_01_TelegramIngest_v3 | v3 | 2026-06-20 | production |
| ARGOS_SamoyloveNetwork_01_WebScraping_v1 | v1 | 2026-06-18 | draft |
| ARGOS_SamoyloveNetwork_02_Transcription_v2 | v2 | 2026-06-20 | production |
| ARGOS_SamoyloveNetwork_03_NormalizationToDB_v2 | v2 | 2026-06-21 | running |
Immortal Regiment Spain · active
01 ingest ✓
02 process →
03 organize
04 analyze
| Notebook | Version | Last run | Status |
|---|---|---|---|
| ARGOS_ImmortalRegiment_01_DocumentIngest_v1 | v1 | 2026-06-19 | production |
| ARGOS_ImmortalRegiment_02_PDFExtraction_v1 | v1 | 2026-06-19 | draft |
Propaganda Narratives 2026 · paused
01 ingest ✓
02 process
03 organize
04 analyze
| Notebook | Version | Last run | Status |
|---|---|---|---|
| ARGOS_PropagandaNarratives_01_WebIngest_v1 | v1 | 2026-06-10 | paused |
Database files
Master database
argos_deep.sqlite
Underscore only. No version in filename. Single source of truth.
Working copy (Colab local)
argos_deep_working.sqlite
Ephemeral — copied from master at session start, deleted when session ends.
Versioned backup
argos_deep_backup_YYYYMMDD_vN.sqlite
e.g. argos_deep_backup_20260621_v3.sqlite — increment N on schema change.
Nightly backup
argos_deep_backup_nightly_YYYYMMDD.sqlite
Auto-created at session end. Rolling 7-day window. Oldest deleted automatically.
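✗ argos-deep.sqlite (hyphen = wrong: breaks the underscore-only rule. SQLite itself accepts hyphenated file paths, but hyphens are not valid in Python module names or unquoted SQL identifiers, so names stay underscore-only everywhere)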
✗ argos_deep_v3.sqlite (version number in master name = wrong)
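The naming rules above can be checked mechanically. The following is a minimal sketch, assuming the four patterns are exactly as written in this section; the function name `classify_db_filename` and the constant names are illustrative and not part of any existing ARGOS utility.

```python
import re

# Patterns transcribed from the naming rules above (names are illustrative).
MASTER_RE = re.compile(r"^argos_deep\.sqlite$")
WORKING_RE = re.compile(r"^argos_deep_working\.sqlite$")
VERSIONED_RE = re.compile(r"^argos_deep_backup_(\d{8})_v(\d+)\.sqlite$")
NIGHTLY_RE = re.compile(r"^argos_deep_backup_nightly_(\d{8})\.sqlite$")


def classify_db_filename(name: str) -> str:
    """Return 'master', 'working', 'versioned', 'nightly', or 'invalid'."""
    if MASTER_RE.match(name):
        return "master"
    if WORKING_RE.match(name):
        return "working"
    if VERSIONED_RE.match(name):
        return "versioned"
    if NIGHTLY_RE.match(name):
        return "nightly"
    return "invalid"


# The two names marked with ✗ above both come back as 'invalid'.
for candidate in (
    "argos_deep.sqlite",
    "argos_deep_backup_20260621_v3.sqlite",
    "argos_deep_backup_nightly_20260621.sqlite",
    "argos-deep.sqlite",
    "argos_deep_v3.sqlite",
):
    print(f"{candidate}: {classify_db_filename(candidate)}")
```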
Colab notebooks
Pattern
ARGOS_ProjectName_NN_WorkflowName_vX.ipynb
PascalCase project · two-digit stage number · PascalCase task · version number
Stage 01 — ingest
ARGOS_SamoyloveNetwork_01_TelegramIngest_v3.ipynb
Stage 02 — process
ARGOS_SamoyloveNetwork_02_Transcription_v2.ipynb
Stage 03 — organize
ARGOS_SamoyloveNetwork_03_NormalizationToDB_v2.ipynb
Stage 04 — analyze
ARGOS_SamoyloveNetwork_04_NetworkAnalysis_v1.ipynb
Increment version (v1 → v2) only when logic changes significantly. Formatting and comment edits don't count. Keep old versions — never delete them.
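A notebook name that follows the pattern can be split back into its parts. This is a sketch only: the regular expression is one reading of "PascalCase project · two-digit stage number · PascalCase task · version number", and `NotebookName` and `parse_notebook_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# One interpretation of ARGOS_ProjectName_NN_WorkflowName_vX(.ipynb).
NOTEBOOK_RE = re.compile(
    r"^ARGOS_(?P<project>[A-Z][A-Za-z0-9]*)_(?P<stage>\d{2})_"
    r"(?P<workflow>[A-Z][A-Za-z0-9]*)_v(?P<version>\d+)(?:\.ipynb)?$"
)


class NotebookName(NamedTuple):
    project: str
    stage: int
    workflow: str
    version: int


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Return the parsed parts, or None if the name breaks the pattern."""
    match = NOTEBOOK_RE.match(filename)
    if match is None:
        return None
    return NotebookName(
        project=match["project"],
        stage=int(match["stage"]),
        workflow=match["workflow"],
        version=int(match["version"]),
    )


print(parse_notebook_name("ARGOS_SamoyloveNetwork_03_NormalizationToDB_v2.ipynb"))
print(parse_notebook_name("argos_samoylova_3_normalize.ipynb"))
```

The extension is optional in the pattern because the dashboard tables list notebooks without `.ipynb`.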
Scripts & folders
Python scripts
NN_descriptive_name.py
Leading two-digit number sets execution order. e.g. 01_mount_gdrive.py
GDrive system folders
_database _backups _scripts _docs
Underscore prefix = system/infrastructure. Never used as project names.
Project folders
samoylova_network
immortal_regiment_spain
snake_case. Lowercase only. No spaces, no hyphens.
Staging session folders
2026-06-21/session_id_abc123/
ISO date + session UUID prefix. Auto-created by backup_manager.py
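The script and folder rules above can be expressed as two small checks. A minimal sketch follows, assuming script names are lowercase snake_case after the two-digit prefix (inferred from the `01_mount_gdrive.py` example); all function names here are hypothetical.

```python
import re

# snake_case, lowercase, no spaces or hyphens, no leading underscore
# (the underscore prefix is reserved for system folders).
PROJECT_FOLDER_RE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")
# NN_descriptive_name.py; lowercase snake_case is an assumption.
SCRIPT_RE = re.compile(r"^\d{2}_[a-z0-9]+(?:_[a-z0-9]+)*\.py$")


def to_project_folder(display_name: str) -> str:
    """Convert a display name such as 'Samoylova Network' to a folder name."""
    return re.sub(r"[^a-z0-9]+", "_", display_name.lower()).strip("_")


def is_project_folder(name: str) -> bool:
    return PROJECT_FOLDER_RE.match(name) is not None


def is_script_name(name: str) -> bool:
    return SCRIPT_RE.match(name) is not None


print(to_project_folder("Immortal Regiment Spain"))
print(is_project_folder("_database"), is_script_name("01_mount_gdrive.py"))
```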
argos_deep.sqlite · production data · schema v3
| Table | Rows | Key columns | Notes |
|---|---|---|---|
| DIGITAL_CONTENT | 130,526 | id · source · content_type · text · actor_id · language | 15% actor_id NULL ⚠ |
| ACTORS | 796 | id · name · aliases · affiliation · confidence | 89% complete |
| NARRATIVES | 40 | id · theme · description · first_seen | |
| TECHNIQUES | 35 | id · disarm_id · name · category | DISARM framework |
| TRANSCRIPTS | 4,410 | id · content_id · full_text · language · method | 340 missing full_text ⚠ |
| TIMELINE_ITEMS | 658 | id · event_date · actor_id · narrative_id · description | |
| CLAIMS | 8,934 | id · content_id · claim_text · ptcof_score · verified | Added in v3 |
| RELATIONSHIPS | 2,104 | id · actor_a · actor_b · rel_type · confidence | Added in v3 |
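The two warnings in the Notes column (NULL `actor_id`, missing `full_text`) can be recomputed with a single aggregate query. The sketch below runs against an in-memory table with assumed column types, and treats "missing" as SQL NULL only; `null_share` is a hypothetical helper, not an existing ARGOS function.

```python
import sqlite3

# Table and column names cannot be bound as SQL parameters,
# so only these known pairs are accepted.
CHECKED_COLUMNS = {("DIGITAL_CONTENT", "actor_id"), ("TRANSCRIPTS", "full_text")}


def null_share(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Fraction of rows where the column IS NULL (0.0 for an empty table)."""
    if (table, column) not in CHECKED_COLUMNS:
        raise ValueError(f"Unexpected column: {table}.{column}")
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM({column} IS NULL) FROM {table}"
    ).fetchone()
    return (nulls or 0) / total if total else 0.0


# Demonstration data only; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIGITAL_CONTENT (id INTEGER PRIMARY KEY, text TEXT, actor_id TEXT)"
)
conn.executemany(
    "INSERT INTO DIGITAL_CONTENT (text, actor_id) VALUES (?, ?)",
    [("a", "x"), ("b", None), ("c", "y"), ("d", "z")],
)
share = null_share(conn, "DIGITAL_CONTENT", "actor_id")
print(f"actor_id NULL share: {share:.0%}")
```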
argos_tracking.sqlite · progress & metadata · schema v1
| Table | Purpose | Key columns |
|---|---|---|
| projects | Project registry | id · name · status · start_date · owner · objectives |
| notebooks | Notebook tracking | id · project_id · filename · stage · version · last_run · run_count |
| tasks | Task management | id · project_id · task_name · status · priority · due_date |
| progress_snapshots | Daily progress log | id · date · project_id · rows_ingested · completed_tasks · notes |
| weekly_reports | Auto-generated reports | id · week_of · project_id · completed_tasks · blockers · achievements |
| tracked_urls | Web URL queue | id · url · title · category · project_id · priority · status · notes |
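As an illustration of how the `notebooks` table might be kept current, the sketch below records a run by updating `last_run` and incrementing `run_count`, inserting the row on first sight. The column types, the UNIQUE constraint on `filename`, and the `record_run` helper are all assumptions; only the column names come from the table above.

```python
import sqlite3


def record_run(conn, project_id, filename, stage, version, run_date):
    """Update last_run and bump run_count, or insert the notebook if new."""
    cursor = conn.execute(
        "UPDATE notebooks SET last_run = ?, run_count = run_count + 1 "
        "WHERE filename = ?",
        (run_date, filename),
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO notebooks "
            "(project_id, filename, stage, version, last_run, run_count) "
            "VALUES (?, ?, ?, ?, ?, 1)",
            (project_id, filename, stage, version, run_date),
        )
    conn.commit()


# In-memory stand-in for argos_tracking.sqlite with assumed types.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE notebooks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER,
        filename TEXT UNIQUE,
        stage TEXT,
        version TEXT,
        last_run TEXT,
        run_count INTEGER NOT NULL DEFAULT 0
    )"""
)
name = "ARGOS_SamoyloveNetwork_03_NormalizationToDB_v2.ipynb"
record_run(conn, 1, name, "03", "v2", "2026-06-20")
record_run(conn, 1, name, "03", "v2", "2026-06-21")
row = conn.execute("SELECT last_run, run_count FROM notebooks").fetchone()
print(row)
```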
Schema changelog
| Version | Date | Changes |
|---|---|---|
| v3 | 2026-06-21 | Full consolidation · added CLAIMS + RELATIONSHIPS · merged argos_deep_working.sqlite · removed 271 duplicates |
| v2 | 2026-06-20 | Added CLAIMS table · added RELATIONSHIPS table · ACTORS.id migrated string → UUID |
| v1 | 2026-06-10 | Initial normalized schema from BigQuery argos_v7_1 extraction |
GDrive structure · /My Drive/ARGOS/
| Path | Contains | Access |
|---|---|---|
| _database/ | argos_deep.sqlite · README_DATABASE.txt · argos_tracking.sqlite | read only |
| _backups/ | Versioned + nightly backups · BACKUP_MANIFEST.txt | immutable |
| _scripts/00_utilities/ | Shared Python modules (colab_startup, database_helpers, etc.) | writable |
| _scripts/projects/ProjectName/01_ingest/ | Ingest notebooks + config.yaml | writable |
| _scripts/projects/ProjectName/02_process/ | Transcription, PDF extraction, cleaning notebooks | writable |
| _scripts/projects/ProjectName/03_organize/ | Normalization, entity extraction, deduplication | writable |
| _scripts/projects/ProjectName/04_analyze/ | Network analysis, briefing generation, STIX | writable |
| _scripts/templates/ | TEMPLATE_01..04.ipynb — copy for new work | writable |
| _docs/ | SCHEMA_v3.md · SCHEMA_CHANGELOG.md · DATA_QUALITY_ISSUES.md | writable |
| staging/ProjectName/YYYY-MM-DD/session_id/ | Session outputs, metadata.json, session_log.txt | writable |
| _sources/ | BigQuery exports · filesystem scans — reference only | reference |
Colab session paths
| Path | Purpose | Persists? |
|---|---|---|
| /content/drive/MyDrive/ARGOS/ | GDrive mount point | yes — GDrive |
| /content/argos_deep_working.sqlite | Working DB copy — all session work happens here | session only |
| /content/staging_output/ | Session output files before upload | session only |
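The working-copy lifecycle (copy from master at session start, delete at session end) might look like the sketch below. The default paths come from the tables above; `start_session` and `end_session` are hypothetical names, and the demonstration uses a temporary directory rather than a real GDrive mount.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Default locations taken from the path tables above.
MASTER_DB = Path("/content/drive/MyDrive/ARGOS/_database/argos_deep.sqlite")
WORKING_DB = Path("/content/argos_deep_working.sqlite")


def start_session(master: Path = MASTER_DB, working: Path = WORKING_DB) -> Path:
    """Copy the master database to the ephemeral working location."""
    if not master.is_file():
        raise FileNotFoundError(f"Master database not found: {master}")
    working.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(master, working)
    return working


def end_session(working: Path = WORKING_DB) -> None:
    """Delete the working copy; a missing file is not an error."""
    if working.exists():
        working.unlink()


# Demonstration against a temporary directory instead of a Colab mount.
with tempfile.TemporaryDirectory() as tmp:
    demo_master = Path(tmp) / "argos_deep.sqlite"
    demo_working = Path(tmp) / "session" / "argos_deep_working.sqlite"
    conn = sqlite3.connect(str(demo_master))
    conn.execute("CREATE TABLE ACTORS (id TEXT PRIMARY KEY, name TEXT)")
    conn.commit()
    conn.close()

    copied = start_session(demo_master, demo_working)
    copy_exists_during_session = copied.is_file()
    end_session(demo_working)
    copy_exists_after_session = demo_working.exists()

print(copy_exists_during_session, copy_exists_after_session)
```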
Active versions: 3 (v1 · v2 · v3 always kept)
Nightly window: 7 rolling days
Current schema: v3 (2026-06-21)
Total size: 3.1 GB (all backups combined)
Active versioned backups (keep last 3)
argos_deep_backup_20260621_v3.sqlite
Full consolidation · merged 3 databases · 1.2 GB · 2026-06-21
argos_deep_backup_20260620_v2.sqlite
Schema v2 — added CLAIMS + RELATIONSHIPS · 1.1 GB · 2026-06-20
argos_deep_backup_20260610_v1.sqlite
Initial normalized schema from BigQuery · 0.9 GB · 2026-06-10
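The two retention rules (keep the last 3 versioned backups, keep a rolling 7-day window of nightly backups) can be sketched as a pure function that only decides which files to delete. It assumes "rolling 7 days" means files aged 7 days or more are removed, and that versioned backups are ranked by version number and then by date; `backups_to_delete` is a hypothetical name.

```python
import re
from datetime import date, datetime

VERSIONED_RE = re.compile(r"^argos_deep_backup_(\d{8})_v(\d+)\.sqlite$")
NIGHTLY_RE = re.compile(r"^argos_deep_backup_nightly_(\d{8})\.sqlite$")


def backups_to_delete(filenames, today, keep_versions=3, nightly_days=7):
    """Return the backup filenames that fall outside the retention rules."""
    versioned = []
    expired_nightly = []
    for name in filenames:
        match = VERSIONED_RE.match(name)
        if match:
            # Sort key: schema version first, then the YYYYMMDD stamp.
            versioned.append((int(match.group(2)), match.group(1), name))
            continue
        match = NIGHTLY_RE.match(name)
        if match:
            created = datetime.strptime(match.group(1), "%Y%m%d").date()
            if (today - created).days >= nightly_days:
                expired_nightly.append(name)
    versioned.sort()
    surplus = max(len(versioned) - keep_versions, 0)
    old_versions = [name for _, _, name in versioned[:surplus]]
    return old_versions + sorted(expired_nightly)


files = [
    "argos_deep_backup_20260610_v1.sqlite",
    "argos_deep_backup_20260620_v2.sqlite",
    "argos_deep_backup_20260621_v3.sqlite",
    "argos_deep_backup_20260622_v4.sqlite",  # hypothetical future backup
    "argos_deep_backup_nightly_20260614.sqlite",
    "argos_deep_backup_nightly_20260620.sqlite",
    "BACKUP_MANIFEST.txt",
]
doomed = backups_to_delete(files, today=date(2026, 6, 22))
print(doomed)
```

Returning a list instead of deleting directly keeps the rule easy to review before anything is removed from `_backups/`.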
Backup trigger rules
| Trigger | Type | Action | Retention |
|---|---|---|---|
| Schema change | Versioned | Create _vN.sqlite before migration | Keep last 3 |
| Bulk ingestion (>10k rows) | Versioned | Create _vN.sqlite before ingest | Keep last 3 |
| Session end (data modified) | Nightly | Auto-create _nightly_YYYYMMDD.sqlite | Rolling 7 days |
| Any risky change | Versioned | Run PRAGMA integrity_check first, then backup | Keep last 3 |
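The "integrity check first, then backup" rule in the last row can be sketched as follows. `PRAGMA integrity_check` returns a single row containing `ok` when no problems are found; the `checked_backup` helper and the demonstration paths are assumptions for illustration.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def checked_backup(source: Path, dest: Path) -> Path:
    """Copy source to dest only if PRAGMA integrity_check reports 'ok'."""
    if not source.is_file():
        raise FileNotFoundError(f"Database not found: {source}")
    conn = sqlite3.connect(str(source))
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    if rows != [("ok",)]:
        raise RuntimeError(f"integrity_check failed: {rows[:5]}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, dest)
    return dest


# Demonstration in a temporary directory with a tiny database.
with tempfile.TemporaryDirectory() as tmp:
    working = Path(tmp) / "argos_deep_working.sqlite"
    conn = sqlite3.connect(str(working))
    conn.execute("CREATE TABLE NARRATIVES (id INTEGER PRIMARY KEY, theme TEXT)")
    conn.commit()
    conn.close()

    target = Path(tmp) / "_backups" / "argos_deep_backup_20260621_v3.sqlite"
    backup_created = checked_backup(working, target).is_file()

print(backup_created)
```

A plain file copy is only safe when no other connection is writing to the database; for a database that may be in use, the standard library's `sqlite3.Connection.backup` method is an alternative worth considering.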
Quick backup snippet (Colab)
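import datetime
import shutil

# Schema version of the database being backed up (current schema is v3)
schema_version = 3
# The naming convention requires YYYYMMDD with no hyphens
stamp = datetime.date.today().strftime("%Y%m%d")
backup_name = f"argos_deep_backup_{stamp}_v{schema_version}.sqlite"
shutil.copy2(
    "/content/argos_deep_working.sqlite",
    f"/content/drive/MyDrive/ARGOS/_backups/{backup_name}",
)
print(f"✓ Backup: {backup_name}")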
Stage 01: Ingest (data collection)
Collect raw data from external sources. Input arrives from Telegram API, PDFs, web pages, YouTube, or local files. Output goes to staging/ProjectName/YYYY-MM-DD/session_id/.
| Source | Notebook pattern | Output format |
|---|---|---|
| Telegram API | ARGOS_Project_01_TelegramIngest_vX | CSV / JSON |
| PDF documents | ARGOS_Project_01_DocumentUpload_vX | Raw text + metadata |
| Web scraping | ARGOS_Project_01_WebScraping_vX | HTML / markdown |
| YouTube | ARGOS_Project_01_YoutubeMetadata_vX | JSON + subtitles |
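Every ingest notebook writes to a staging session folder of the form `staging/ProjectName/YYYY-MM-DD/session_id/`. The sketch below builds that folder; the 8-character UUID prefix used as a default session id and the `new_session_dir` helper are assumptions, since the exact id format is not specified here.

```python
import tempfile
import uuid
from datetime import date
from pathlib import Path


def new_session_dir(root, project, day=None, session_id=None):
    """Create and return root/staging/<project>/<YYYY-MM-DD>/<session_id>/."""
    day = day or date.today()
    # Assumed id format: first 8 hex characters of a random UUID.
    session_id = session_id or uuid.uuid4().hex[:8]
    path = Path(root) / "staging" / project / day.isoformat() / session_id
    # exist_ok=False surfaces accidental session id collisions.
    path.mkdir(parents=True, exist_ok=False)
    return path


# Demonstration under a temporary root instead of the ARGOS GDrive folder.
with tempfile.TemporaryDirectory() as tmp:
    session_dir = new_session_dir(
        tmp, "samoylova_network", day=date(2026, 6, 21), session_id="abc123"
    )
    relative = session_dir.relative_to(tmp).as_posix()
    created = session_dir.is_dir()

print(relative)
```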
Stage 02: Process (transformation)
Transform raw data into a standardized, clean format. This stage reads from the staging session folders and writes processed files back to the same session folder with a _processed suffix.
| Operation | Notebook pattern | Utility used |
|---|---|---|
| Transcription (audio/video → text) | ARGOS_Project_02_Transcription_vX | transcription_helpers.py |
| PDF text extraction | ARGOS_Project_02_PDFExtraction_vX | pdf_extraction_helpers.py |
| Text cleaning / normalization | ARGOS_Project_02_TextCleaning_vX | — |
| Language detection / NER | ARGOS_Project_02_LanguageProcessing_vX | entity_extraction.py |
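The `_processed` suffix rule can be sketched with `pathlib`. Placing the suffix before the file extension is an assumption, and `processed_path` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def processed_path(raw: PurePosixPath) -> PurePosixPath:
    """Return the sibling path with _processed inserted before the extension."""
    if raw.stem.endswith("_processed"):
        return raw  # already a processed file; do not stack suffixes
    return raw.with_name(f"{raw.stem}_processed{raw.suffix}")


raw_file = PurePosixPath("staging/samoylova_network/2026-06-21/abc123/messages.json")
print(processed_path(raw_file))
```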
Stage 03: Organize (normalization)
Normalize processed data and load into argos_deep.sqlite. Deduplication, schema mapping, entity linking, referential integrity checks.
| Operation | Notebook pattern |
|---|---|
| Load into DB (DIGITAL_CONTENT, etc.) | ARGOS_Project_03_NormalizationToDB_vX |
| Entity extraction to ACTORS table | ARGOS_Project_03_EntityExtraction_vX |
| Remove duplicate rows | ARGOS_Project_03_Deduplication_vX |
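A load step with deduplication and a referential integrity check might be sketched as below. The duplicate key (exact match on `source` and `text`), the column types, and the foreign key from `actor_id` to `ACTORS.id` are assumptions made for the example, not the documented behavior of the NormalizationToDB notebooks.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO DIGITAL_CONTENT (source, content_type, text, actor_id, language) "
    "VALUES (:source, :content_type, :text, :actor_id, :language)"
)


def load_content(conn, rows):
    """Insert rows, skipping exact (source, text) duplicates; return the count."""
    inserted = 0
    for row in rows:
        duplicate = conn.execute(
            "SELECT 1 FROM DIGITAL_CONTENT WHERE source = ? AND text = ? LIMIT 1",
            (row["source"], row["text"]),
        ).fetchone()
        if duplicate:
            continue
        conn.execute(INSERT_SQL, row)
        inserted += 1
    conn.commit()
    return inserted


# In-memory stand-in for argos_deep.sqlite with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # reject actor_id values missing from ACTORS
conn.execute("CREATE TABLE ACTORS (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE DIGITAL_CONTENT (
        id INTEGER PRIMARY KEY,
        source TEXT,
        content_type TEXT,
        text TEXT,
        actor_id TEXT REFERENCES ACTORS(id),
        language TEXT
    )"""
)
conn.execute("INSERT INTO ACTORS (id, name) VALUES ('a1', 'Example actor')")

batch = [
    {"source": "telegram", "content_type": "post", "text": "hello",
     "actor_id": "a1", "language": "en"},
    {"source": "telegram", "content_type": "post", "text": "hello",
     "actor_id": "a1", "language": "en"},
    {"source": "web", "content_type": "article", "text": "hello",
     "actor_id": None, "language": "en"},
]
first_pass = load_content(conn, batch)
second_pass = load_content(conn, batch)
print(first_pass, second_pass)
```

With foreign keys enabled, a row whose `actor_id` is not present in `ACTORS` raises `sqlite3.IntegrityError` instead of being stored silently.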
Stage 04: Analyze (intelligence production)
Generate intelligence products from the database. Reads from argos_deep.sqlite. Outputs HTML briefings, STIX bundles, visualizations, network graphs.
| Operation | Notebook pattern | Output |
|---|---|---|
| Network analysis | ARGOS_Project_04_NetworkAnalysis_vX | Graph · PDF |
| Narrative mapping | ARGOS_Project_04_NarrativeMapping_vX | Report · JSON |
| HTML briefing | ARGOS_Project_04_BriefingGeneration_vX | HTML |
| STIX 2.1 bundle | ARGOS_Project_04_STIXGeneration_vX | JSON bundle |
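As a small taste of the network analysis step, the sketch below counts connections per actor from the RELATIONSHIPS table. It treats relationships as undirected and `confidence` as a number between 0 and 1, both of which are assumptions; the `rel_type` values and the `actor_degrees` helper are illustrative only.

```python
import sqlite3
from collections import Counter


def actor_degrees(conn, min_confidence=0.0):
    """Count relationships per actor, ignoring edges below the threshold."""
    degrees = Counter()
    query = "SELECT actor_a, actor_b FROM RELATIONSHIPS WHERE confidence >= ?"
    for actor_a, actor_b in conn.execute(query, (min_confidence,)):
        degrees[actor_a] += 1
        degrees[actor_b] += 1
    return degrees


# In-memory stand-in with assumed column types and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE RELATIONSHIPS (
        id INTEGER PRIMARY KEY,
        actor_a TEXT,
        actor_b TEXT,
        rel_type TEXT,
        confidence REAL
    )"""
)
conn.executemany(
    "INSERT INTO RELATIONSHIPS (actor_a, actor_b, rel_type, confidence) "
    "VALUES (?, ?, ?, ?)",
    [
        ("a1", "a2", "amplifies", 0.9),
        ("a1", "a3", "amplifies", 0.8),
        ("a2", "a3", "cites", 0.3),
    ],
)
degrees = actor_degrees(conn, min_confidence=0.5)
print(degrees.most_common())
```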
URL queue
0 URLs
Filters: All · 🔍 Important · 📥 To scrape · ⏳ Pending · 📌 Reference · 🗑️ Skip
Task board
Filters: All · Pending · In progress · Blocked · Completed
| Task | Project | Priority | Status |
|---|---|---|---|
Recent sessions
| Date | Project | Stage | Rows | Status |
|---|---|---|---|---|
| 2026-06-21 | Samoylova Network | 03_organize | 1,250 | success |
| 2026-06-20 | Samoylova Network | 02_process | 980 | success |
| 2026-06-19 | Immortal Regiment | 01_ingest | 342 | success |