Architecture Overview¶
Studiorum is built on a service-oriented architecture designed for extensibility, type safety, and performance. This document provides a comprehensive overview of the system's design.
High-Level Architecture¶
Core Components¶
Service Container Architecture¶
Studiorum uses a service container for dependency injection and lifecycle management:
# Core service container pattern
@dataclass
class ServiceDescriptor[T]:
protocol: type[T]
factory: ServiceFactory[T] | AsyncServiceFactory[T]
lifecycle: ServiceLifecycle
dependencies: tuple[type[Any], ...]
class ServiceContainer:
"""
Manages service registration, instantiation, and lifecycle.
Supports both sync and async services with protocol-based typing.
"""
async def get_service[T](self, protocol: type[T]) -> T:
"""Type-safe service resolution with protocol validation."""
def register_service[T](
self,
protocol: type[T],
factory: ServiceFactory[T],
lifecycle: ServiceLifecycle,
) -> None:
"""Register services with proper typing and dependency tracking."""
Key Features (see the registration sketch after this list):
- Protocol-based typing: All services implement protocols for loose coupling
- Lifecycle management: Singleton, transient, and scoped service lifetimes
- Dependency injection: Automatic resolution of service dependencies
- Hot reloading: Support for configuration changes without restart
- Cleanup ordering: Proper shutdown sequence with priority-based cleanup
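The sketch below shows how these pieces fit together: it assumes the ServiceContainer, ServiceLifecycle, register_service, and get_service APIs shown above, while CacheProtocol and InMemoryCache are hypothetical examples introduced only for illustration.

from typing import Protocol, runtime_checkable

@runtime_checkable
class CacheProtocol(Protocol):
    """Hypothetical service protocol used only for this example."""
    def get(self, key: str) -> str | None: ...

class InMemoryCache:
    """Trivial implementation used only for illustration."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

async def bootstrap() -> None:
    container = ServiceContainer()
    # Register the concrete implementation against its protocol with a
    # singleton lifetime; the class itself serves as the factory callable.
    container.register_service(CacheProtocol, InMemoryCache, ServiceLifecycle.SINGLETON)
    async with container:  # __aenter__ initializes singletons, __aexit__ cleans up
        cache = await container.get_service(CacheProtocol)
        assert isinstance(cache, InMemoryCache)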
Content Processing Pipeline¶
1. Content Ingestion¶
class ContentMerger:
"""
Merges content from multiple sources with conflict resolution.
Implements LRU caching and TTL-based invalidation.
"""
def merge_sources(
self,
sources: list[ContentSource],
strategy: MergeStrategy = MergeStrategy.PRIORITY_BASED
) -> MergedContent:
"""Intelligent content merging with conflict resolution."""
2. Content Indexing¶
class Omnidexer:
"""
High-performance content indexing system.
Provides fast lookup and search across all content types.
"""
def index_content[T: DeepIndexable](self, content: T) -> IndexResult:
"""Deep indexing using the DeepIndexable protocol."""
def search[T](
self,
query: SearchQuery,
content_type: type[T]
) -> SearchResult[T]:
"""Type-safe content search with filtering and ranking."""
3. Content Resolution¶
class ContentResolver:
"""
Resolves content references and builds complete content graphs.
Handles cross-references, dependencies, and validation.
"""
def resolve_adventure(self, adventure_id: str) -> Result[Adventure, ResolveError]:
"""Resolve adventure with all dependencies and cross-references."""
Processing Architecture¶
Tag Processing System¶
Studiorum uses an AST-based tag processing system for 5e content:
class TagParser:
"""
Lark-based parser for 5etools tag syntax.
Builds abstract syntax trees for complex tag structures.
"""
def parse(self, content: str) -> TagAST:
"""Parse tagged content into structured AST."""
class TagResolver:
"""
Resolves tag references and builds complete content structures.
Integrates with ContentTracker for cross-reference management.
"""
def resolve_tag(
self,
tag: TagNode,
context: RenderingContext
) -> Result[ResolvedContent, TagError]:
"""Resolve individual tags with context awareness."""
Rendering Pipeline¶
class RenderingContext:
"""
Immutable context object carrying rendering state and configuration.
Provides type-safe access to services and metadata.
"""
output_format: str
omnidexer: OmnidexerProtocol
content_tracker: ContentTracker
metadata: dict[str, Any]
class UnifiedTagRenderer:
"""
Dispatches tag rendering to appropriate handlers.
Supports multiple output formats and extensible handler system.
"""
def render_tag(
self,
tag: ResolvedTag,
context: RenderingContext
) -> str:
"""Render tag using appropriate format-specific handler."""
Data Flow Architecture¶
Request Processing Flow¶
Service Lifecycle Management¶
@enum.unique
class ServiceLifecycle(enum.Enum):
"""Service instance lifecycle management strategies."""
SINGLETON = "singleton" # One instance per container
TRANSIENT = "transient" # New instance per request
SCOPED = "scoped" # One instance per scope (e.g., request)
class ServiceContainer:
async def __aenter__(self) -> ServiceContainer:
"""Initialize all singleton services on container startup."""
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
"""Cleanup services in reverse dependency order."""
Type Safety Architecture¶
Protocol-Based Design¶
Studiorum uses Python protocols extensively for type-safe interfaces:
# Core service protocols
@runtime_checkable
class OmnidexerProtocol(Protocol):
"""Protocol for content indexing and search services."""
def is_loaded(self) -> bool: ...
def get_content[T](self, content_id: str, content_type: type[T]) -> T | None: ...
async def search_async[T](self, query: SearchQuery) -> SearchResult[T]: ...
@runtime_checkable
class ContentResolverProtocol(Protocol):
"""Protocol for content resolution services."""
def resolve_adventure(self, adventure_id: str) -> Result[Adventure, ResolveError]: ...
def resolve_creature(self, creature_id: str) -> Result[Creature, ResolveError]: ...
Result Type Pattern¶
All operations that can fail use Result types for explicit error handling:
from typing import Any

class Success[T, E]:
    def __init__(self, value: T) -> None:
        self.value = value

    def unwrap(self) -> T:
        return self.value

class Error[T, E]:
    def __init__(self, error: E) -> None:
        self.error = error

    def with_context(self, message: str, **context: Any) -> "Error[T, dict[str, Any]]":
        """Chain error context for better debugging."""
        return Error({
            "message": message,
            "underlying": self.error,
            "context": context,
        })

type Result[T, E] = Success[T, E] | Error[T, E]
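A short usage sketch of the pattern, assuming the ContentResolver.resolve_adventure signature shown earlier; the load_adventure helper and the module-level logger are illustrative only.

import logging

logger = logging.getLogger(__name__)

def load_adventure(resolver: ContentResolver, adventure_id: str) -> Adventure | None:
    result = resolver.resolve_adventure(adventure_id)
    match result:
        case Success(value=adventure):
            return adventure
        case Error(error=err):
            # Callers handle failure explicitly instead of catching exceptions.
            logger.warning("failed to resolve %s: %s", adventure_id, err)
            return None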
Content Model Architecture¶
All content follows structured Pydantic models with validation:
class BaseContent(BaseModel):
"""Base class for all 5e content with common fields."""
name: str
source: Source
page: int | None = None
class Config:
validate_assignment = True
extra = "forbid"
class Creature(BaseContent):
"""Structured creature data with type validation."""
size: CreatureSize
creature_type: CreatureType
challenge_rating: ChallengeRating
armor_class: ArmorClass
hit_points: HitPoints
speeds: list[Speed]
ability_scores: AbilityScores
skills: list[Skill] = []
damage_resistances: list[DamageType] = []
# ... additional fields with full typing
Async/Sync Architecture¶
Hybrid Design Pattern¶
Studiorum intentionally uses a hybrid async/sync architecture:
# CLI Context - Synchronous by design
class CLIContext:
"""Synchronous context for CLI operations."""
def get_omnidexer(self) -> Omnidexer:
return get_global_container().get_omnidexer_sync()
# MCP Context - Asynchronous by design
class AsyncRequestContext:
"""Asynchronous context for MCP server operations."""
async def get_service[T](self, protocol: type[T]) -> T:
return await self.container.get_service(protocol)
Design Rationale:
- CLI: Sequential operations, simple debugging, direct user feedback
- MCP: Concurrent requests, isolation, scalable AI agent integration
- Testing: Easier to test sync code, async when needed for concurrency
Service Access Patterns¶
# Sync access pattern (CLI)
def convert_adventure_sync(adventure_name: str) -> Result[str, ConversionError]:
omnidexer = get_omnidexer() # Sync singleton access
resolver = ContentResolver(omnidexer)
return resolver.resolve_and_convert(adventure_name)
# Async access pattern (MCP)
async def convert_adventure_async(
ctx: AsyncRequestContext,
adventure_name: str
) -> Result[str, ConversionError]:
omnidexer = await ctx.get_service(OmnidexerProtocol)
resolver = await ctx.get_service(ContentResolverProtocol)
return await resolver.resolve_and_convert_async(adventure_name)
Observability Architecture¶
Integrated Observability¶
Studiorum includes comprehensive observability through Logfire integration:
class ObservableImageService:
"""Wraps services with automatic observability."""
def __init__(self, wrapped_service: Any, observer: ImageProcessingObserver):
self._wrapped = wrapped_service
self._observer = observer
def __getattr__(self, name: str) -> Any:
"""Automatically instrument all method calls."""
attr = getattr(self._wrapped, name)
if callable(attr):
return self._create_observable_method(name, attr)
return attr
def _create_observable_method(self, method_name: str, method: Callable) -> Callable:
"""Create instrumented version of method with metrics collection."""
def instrumented(*args, **kwargs):
operation_id = self._observer.start_operation(
stage=self._infer_stage(method_name),
content_type=self._infer_content_type(args)
)
try:
result = method(*args, **kwargs)
self._observer.complete_operation(operation_id, result)
return result
except Exception as e:
self._observer.fail_operation(operation_id, e)
raise
return instrumented
Performance Monitoring¶
@dataclass
class ProcessingStatistics:
"""Real-time processing statistics."""
total_operations: int = 0
successful_operations: int = 0
failed_operations: int = 0
average_duration_ms: float = 0.0
operations_by_stage: dict[ImageProcessingStage, int] = field(default_factory=dict)
operations_by_content_type: dict[ContentType, int] = field(default_factory=dict)
def calculate_success_rate(self) -> float:
"""Calculate operation success rate."""
if self.total_operations == 0:
return 1.0
return self.successful_operations / self.total_operations
Extension Architecture¶
Plugin System¶
Studiorum supports extensible plugins through well-defined interfaces:
@runtime_checkable
class RendererPlugin(Protocol):
"""Protocol for custom renderer plugins."""
def get_supported_formats(self) -> list[str]: ...
def render_content(self, content: Any, context: RenderingContext) -> str: ...
def get_plugin_info(self) -> PluginInfo: ...
class PluginManager:
"""Manages plugin registration and lifecycle."""
def register_plugin(self, plugin: RendererPlugin) -> None:
"""Register a new renderer plugin."""
def get_renderer(self, format_type: str) -> RendererPlugin | None:
"""Get appropriate renderer for format."""
Custom Tag Handlers¶
@runtime_checkable
class TagHandler(Protocol):
"""Protocol for custom tag handlers."""
def get_supported_tags(self) -> list[str]: ...
def handle_tag(self, tag: TagNode, context: RenderingContext) -> str: ...
# Custom handler example
class CustomCreatureHandler:
def get_supported_tags(self) -> list[str]:
return ["@custom_creature"]
def handle_tag(self, tag: TagNode, context: RenderingContext) -> str:
# Custom rendering logic
return self.render_custom_creature(tag.attributes, context)
Performance Architecture¶
Caching Strategy¶
Multi-level caching for optimal performance:
class CacheManager:
"""Manages multiple cache levels with intelligent eviction."""
def __init__(self):
self.l1_cache = LRUCache(maxsize=1000) # In-memory, fast
self.l2_cache = DiskCache(maxsize=10000) # Disk-based, persistent
self.l3_cache = RedisCache() # Distributed, scalable
async def get[T](self, key: str, factory: Callable[[], Awaitable[T]]) -> T:
"""Multi-level cache with write-through strategy."""
# L1 - Memory cache
if value := self.l1_cache.get(key):
return value
# L2 - Disk cache
if value := await self.l2_cache.get(key):
self.l1_cache[key] = value
return value
# L3 - Distributed cache
if value := await self.l3_cache.get(key):
self.l1_cache[key] = value
await self.l2_cache.set(key, value)
return value
# Generate and cache
value = await factory()
await self._cache_at_all_levels(key, value)
return value
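A short usage sketch, assuming the get signature above takes an async factory; get_creature_cached and load_creature_from_disk are hypothetical.

async def get_creature_cached(cache: CacheManager, creature_id: str) -> Creature:
    async def load() -> Creature:
        return await load_creature_from_disk(creature_id)  # hypothetical loader
    # A miss at all three levels falls through to the factory and writes back.
    return await cache.get(f"creature:{creature_id}", load)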
Parallel Processing¶
class ParallelProcessor:
"""Manages parallel processing with resource limits."""
def __init__(self, max_workers: int | None = None):
self.max_workers = max_workers or os.cpu_count() or 1
self.semaphore = asyncio.Semaphore(self.max_workers)
async def process_batch[T, R](
    self,
    items: list[T],
    processor: Callable[[T], Awaitable[R]],
) -> list[R]:
"""Process items in parallel with controlled concurrency."""
async def process_with_semaphore(item: T) -> R:
async with self.semaphore:
return await processor(item)
tasks = [process_with_semaphore(item) for item in items]
return await asyncio.gather(*tasks)
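Usage is a one-liner once an async worker exists; render_all and render_creature_async below are hypothetical names used only to illustrate the call.

async def render_all(creature_ids: list[str]) -> list[str]:
    processor = ParallelProcessor(max_workers=4)
    # At most four renders run concurrently thanks to the semaphore.
    return await processor.process_batch(creature_ids, render_creature_async)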
Security Architecture¶
Input Validation¶
All external input goes through strict validation:
class ContentValidator:
"""Validates all content input against schemas."""
def validate_content_id(self, content_id: str) -> Result[str, ValidationError]:
"""Validate content identifiers against allowed patterns."""
def validate_search_query(self, query: SearchQuery) -> Result[SearchQuery, ValidationError]:
"""Validate and sanitize search queries."""
def validate_output_path(self, path: str) -> Result[Path, ValidationError]:
"""Validate output paths to prevent directory traversal."""
Sandboxed Execution¶
LaTeX compilation runs in a controlled environment:
class SandboxedLaTeXEngine:
"""LaTeX compiler with security restrictions."""
def __init__(self):
self.allowed_packages = self._load_allowed_packages()
self.restricted_commands = self._load_restricted_commands()
async def compile_latex(
self,
content: str,
output_path: Path,
compiler: LaTeXCompiler = LaTeXCompiler.PDFLATEX
) -> Result[Path, CompilationError]:
"""Compile LaTeX with security restrictions and resource limits."""
# Validate content for dangerous patterns
validation_result = self._validate_latex_content(content)
if isinstance(validation_result, Error):
return validation_result
# Run in restricted environment
return await self._run_sandboxed_compilation(content, output_path, compiler)
Testing Architecture¶
Test Categories¶
Studiorum uses a layered testing approach:
# Unit tests - Fast, isolated
class TestContentResolver:
def setup_method(self):
from studiorum.core.container import reset_global_container
reset_global_container() # Ensure test isolation
def test_resolve_creature_success(self):
# Test with mocked dependencies
pass
# Integration tests - Real data, no LaTeX
@pytest.mark.requires_data
class TestFullPipeline:
def test_adventure_conversion_integration(self):
# Test with real 5etools data
pass
# LaTeX tests - Full compilation
@pytest.mark.latex_integration
class TestLaTeXOutput:
def test_pdf_generation_complete(self):
# Full LaTeX compilation test
pass
Service Container Testing¶
class TestServiceContainer:
"""Test service container functionality in isolation."""
async def test_service_registration_and_resolution(self):
container = ServiceContainer()
# Register test service
container.register_service(
TestProtocol,
lambda: TestService(),
ServiceLifecycle.SINGLETON
)
# Resolve and verify
service = await container.get_service(TestProtocol)
assert isinstance(service, TestService)
# Verify singleton behavior
service2 = await container.get_service(TestProtocol)
assert service is service2
Configuration Architecture¶
Unified Configuration System¶
class ApplicationConfig(BaseModel):
"""Complete application configuration with validation."""
# Service configuration
services: ServicesConfig = ServicesConfig()
# Content sources
sources: SourcesConfig = SourcesConfig()
# Rendering settings
rendering: RenderingConfig = RenderingConfig()
# LaTeX engine
latex: LaTeXConfig = LaTeXConfig()
# MCP server
mcp: MCPConfig = MCPConfig()
# Observability
observability: ObservabilityConfig = ObservabilityConfig()
@classmethod
def from_file(cls, config_path: Path) -> ApplicationConfig:
"""Load and validate configuration from file."""
def merge_with_env(self) -> ApplicationConfig:
"""Override config values with environment variables."""
Hot Reloading¶
class ConfigurationManager:
"""Manages configuration with hot reloading support."""
def __init__(self, config_path: Path):
self.config_path = config_path
self.current_config = self._load_config()
self.observers: list[Callable[[ApplicationConfig], None]] = []
def watch_for_changes(self) -> None:
"""Watch configuration file and reload on changes."""
def register_observer(self, observer: Callable[[ApplicationConfig], None]) -> None:
"""Register callback for configuration changes."""
def _reload_config(self) -> None:
"""Reload configuration and notify observers."""
new_config = self._load_config()
if new_config != self.current_config:
self.current_config = new_config
self._notify_observers(new_config)
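watch_for_changes could be backed by a file-system watcher; the sketch below uses the watchdog library and is an assumption about the approach, not the project's actual code.

from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class _ConfigFileHandler(FileSystemEventHandler):
    """Forward modification events for the config file to the manager."""

    def __init__(self, manager: ConfigurationManager) -> None:
        self._manager = manager

    def on_modified(self, event) -> None:
        if Path(str(event.src_path)) == self._manager.config_path:
            self._manager._reload_config()

def start_watching(manager: ConfigurationManager) -> Observer:
    """One possible body for watch_for_changes: event-driven reloads via watchdog."""
    observer = Observer()
    observer.schedule(_ConfigFileHandler(manager), str(manager.config_path.parent))
    observer.start()
    return observer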
This architecture provides a solid foundation for building on Studiorum while maintaining type safety, performance, and extensibility.