2.3. Memory System
The Memory System is the core component of the memories-dev framework, responsible for storing, organizing, and retrieving data in a way that preserves temporal and spatial relationships. This page explains how the memory system works and how to use it effectively.
2.3.1. Overview
The Memory System in memories-dev is designed to mimic aspects of human memory, particularly the ability to:
Store and retrieve information across different time periods
Organize information spatially
Establish relationships between different pieces of information
Provide context for understanding data
The system consists of four main components:
Temporal Memory: Manages data across time
Spatial Memory: Organizes data geographically
Context Memory: Maintains contextual information
Relationship Memory: Tracks connections between data elements
2.3.2. Memory Tiers Architecture
The memories-dev framework implements a sophisticated multi-tiered memory architecture inspired by modern computing memory hierarchies and human memory systems. This design optimizes for both performance and cost-efficiency.
Each memory tier serves a specific purpose:
| Tier | Implementation | Access Speed | Purpose |
|---|---|---|---|
| Hot Memory | GPU-accelerated memory | Microseconds | Immediate processing of active data, optimized for parallel computation and neural network operations |
| Warm Memory | CPU memory & Redis | Milliseconds | Fast access to recently used data, supports complex queries and intermediate results |
| Cold Memory | DuckDB | Milliseconds to seconds | Efficient on-device storage for structured data with SQL query capabilities |
| Glacier Memory | Parquet files | Seconds to minutes | Long-term compressed storage for historical data, optimized for space efficiency |
The memory system automatically manages data migration between tiers based on access patterns, importance, and age of data. This approach ensures optimal performance while minimizing resource usage.
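As an illustration of how age could drive tier migration, the sketch below scores data with an exponentially decaying importance, the same form as the temporal decay function given under Mathematical Foundations, and maps the score to a tier. The threshold values, the default decay rate, and the function names are hypothetical. This page does not say how memories-dev combines access patterns, importance, and age.

```python
import math

# Hypothetical cutoffs: data stays in a tier while its importance is at or above the cutoff.
TIER_THRESHOLDS = [("hot", 0.75), ("warm", 0.40), ("cold", 0.10)]


def importance(age_seconds: float, alpha: float = 1.0, decay_rate: float = 1e-5) -> float:
    """Exponentially decaying importance: alpha * exp(-decay_rate * age)."""
    return alpha * math.exp(-decay_rate * age_seconds)


def choose_tier(age_seconds: float, alpha: float = 1.0, decay_rate: float = 1e-5) -> str:
    """Return the first tier whose cutoff the current importance still meets."""
    score = importance(age_seconds, alpha, decay_rate)
    for tier, cutoff in TIER_THRESHOLDS:
        if score >= cutoff:
            return tier
    return "glacier"


ONE_DAY = 86_400
for days in (0, 1, 2, 7):
    print(days, "days old ->", choose_tier(days * ONE_DAY))
```

With these illustrative settings, fresh data stays hot, day-old data is warm, two-day-old data is cold, and week-old data falls through to glacier. A real policy would also weigh access frequency.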
2.3.3. Mathematical Foundations
The memory system's design is based on several mathematical principles:
2.3.3.1. Vector Embeddings and Similarity
Data retrieval in the memory system relies on vector embeddings and similarity metrics. The primary similarity measure used is cosine similarity:
\[
\text{similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{||A|| \cdot ||B||} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}}
\]
Where:
- \(A\) and \(B\) are vector embeddings
- \(\theta\) is the angle between vectors
- \(||A||\) and \(||B||\) are the magnitudes of the vectors
For efficient nearest-neighbor search, the system uses FAISS (Facebook AI Similarity Search) with an L2 distance metric:
\[
L2(A, B) = ||A - B||_2 = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
\]
Temporal Decay Function
The memory system implements a temporal decay function to model the importance of data over time:
\[
\text{importance}(t) = \alpha \cdot e^{-\lambda (t_{now} - t)}
\]
Where:
- \(t\) is the timestamp of the data
- \(t_{now}\) is the current time
- \(\alpha\) is the initial importance
- \(\lambda\) is the decay rate parameter
This function helps determine when data should be migrated between memory tiers.
Spatial Indexing
For efficient spatial queries, the system uses geospatial indexing techniques. The primary approach is based on geohash encoding, which maps 2D coordinates to a 1D string:
\[
\text{geohash}(lat, lon, precision) = \text{base32\_encode}(\text{interleave\_bits}(lat, lon))
\]
This enables efficient range queries and proximity searches in the spatial domain.
Implementation Details
The memory system is implemented through several key classes:
MemoryManager
The MemoryManager class coordinates all memory operations across the different tiers:
from pathlib import Path


class MemoryManager:
    """Memory manager that handles different memory tiers:
    - Hot Memory: GPU-accelerated memory for immediate processing
    - Warm Memory: CPU and Redis for fast in-memory access
    - Cold Memory: DuckDB for efficient on-device storage
    - Glacier Memory: Parquet files for off-device compressed storage
    """

    def __init__(
        self,
        storage_path: Path,
        redis_url: str = "redis://localhost:6379",
        redis_db: int = 0,
        hot_memory_size: int = 1000,
        warm_memory_size: int = 10000,
        cold_memory_size: int = 100000,
        glacier_memory_size: int = 1000000
    ):
        # Initialize memory tiers
        self.hot = HotMemory(storage_path=storage_path / "hot", max_size=hot_memory_size)
        self.warm = WarmMemory(redis_url=redis_url, redis_db=redis_db, max_size=warm_memory_size)
        self.cold = ColdMemory(storage_path=storage_path / "cold", max_size=cold_memory_size)
        self.glacier = GlacierMemory(storage_path=storage_path / "glacier", max_size=glacier_memory_size)
The manager provides unified methods for storing, retrieving, and managing data across all tiers:
# Store data in memory system
memory_manager.store(data)
# Retrieve data from specific tier
result = memory_manager.retrieve(query, tier="hot")
# Retrieve all data from a tier
all_data = memory_manager.retrieve_all(tier="warm")
# Clear specific tier or all tiers
memory_manager.clear(tier="cold")
2.3.3.2. Memory Encoding
The MemoryEncoder class handles the conversion of various data types into vector embeddings:
from typing import Any, Dict, Tuple

import torch


class MemoryEncoder:
    """Encodes different types of data into vector embeddings"""

    def __init__(self, embedding_dim: int = 128):
        self.embedding_dim = embedding_dim
        # Initialize encoders for different data types

    def encode(self, data: Dict[str, Any]) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        """Encode data into vector embeddings"""
        # Determine data type and use appropriate encoder
        if "image" in data:
            return self._encode_image(data["image"])
        elif "text" in data:
            return self._encode_text(data["text"])
        elif "vector" in data:
            return self._encode_vector(data["vector"])
        elif "coordinates" in data:
            return self._encode_coordinates(data["coordinates"])
        else:
            raise ValueError("Unsupported data type")
2.3.3.3. FAISS Integration
The system uses FAISS for efficient similarity search:
def _init_index(self):
    """Initialize FAISS index"""
    index_file = self.index_path / "memory.index"
    if index_file.exists():
        self.index = faiss.read_index(str(index_file))
        with open(self.index_path / "metadata.pkl", "rb") as f:
            self.metadata = pickle.load(f)
    else:
        # Create new index
        self.index = faiss.IndexFlatL2(512)  # 512-dimensional embeddings
        self.metadata = {}
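The index above ranks by L2 distance, while retrieval is described in terms of cosine similarity. The two agree when embeddings are scaled to unit length, because \(||A - B||^2 = 2 - 2\cos(\theta)\) for unit vectors. The NumPy sketch below checks that identity on random data. Whether memories-dev normalizes embeddings before indexing is not stated on this page, so treat the normalization step as an assumption.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms


rng = np.random.default_rng(0)
database = normalize(rng.normal(size=(100, 16)))  # 100 stored embeddings
query = normalize(rng.normal(size=(1, 16)))       # one query embedding

cosine = database @ query[0]                            # cosine similarity of unit vectors
squared_l2 = np.sum((database - query) ** 2, axis=1)    # squared L2 distance to the query

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(theta)
identity_holds = bool(np.allclose(squared_l2, 2.0 - 2.0 * cosine))

# The nearest neighbor by L2 is therefore the most similar by cosine
nearest_by_l2 = int(np.argmin(squared_l2))
nearest_by_cosine = int(np.argmax(cosine))
print(identity_holds, nearest_by_l2 == nearest_by_cosine)
```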
2.3.4. Temporal Memory
Temporal Memory manages data across time, enabling efficient retrieval of historical states and temporal patterns.
2.3.4.1. Key Features
Time Series Storage: Efficient storage of time-series data with various temporal resolutions
Temporal Indexing: Fast retrieval of data for specific time points or ranges
Versioning: Tracking changes to data over time
Temporal Patterns: Identification of patterns, trends, and anomalies across time
Interpolation: Filling gaps in temporal data through interpolation
2.3.4.2. Basic Usage
from memories.memory import TemporalMemory

# Initialize temporal memory
temporal_memory = TemporalMemory()

# Store data with temporal information
temporal_memory.store(
    data=satellite_imagery,
    time_field="acquisition_date",
    location_field="coordinates",
    metadata={"source": "sentinel-2", "processing_level": "L2A"}
)

# Retrieve data for a specific time point
image_2020 = temporal_memory.get_at(
    location=(37.7749, -122.4194),
    time="2020-01-01"
)

# Retrieve data for a time range
images_2018_2022 = temporal_memory.get_range(
    location=(37.7749, -122.4194),
    start_time="2018-01-01",
    end_time="2022-12-31",
    interval="monthly"  # Options: daily, weekly, monthly, yearly, etc.
)

# Get temporal statistics
stats = temporal_memory.get_statistics(
    location=(37.7749, -122.4194),
    time_range=("2018-01-01", "2022-12-31"),
    metrics=["mean", "min", "max", "trend"]
)
2.3.4.3. Advanced Features
Temporal Memory supports several advanced features:
2.3.4.3.1. Temporal Aggregation
Aggregate data across different time periods:
# Aggregate monthly data to yearly
yearly_data = temporal_memory.aggregate(
    data=monthly_data,
    aggregation="yearly",
    aggregation_method="mean"  # Options: mean, sum, min, max, etc.
)
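To make the monthly-to-yearly "mean" aggregation concrete, here is a standard-library sketch of the underlying computation. The `aggregate_yearly` helper and the sample values are hypothetical and are not part of the memories-dev API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def aggregate_yearly(monthly: Dict[str, float]) -> Dict[str, float]:
    """Average values keyed by 'YYYY-MM' into one value per 'YYYY'."""
    by_year: Dict[str, List[float]] = defaultdict(list)
    for month, value in monthly.items():
        by_year[month[:4]].append(value)  # the first four characters are the year
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Illustrative monthly observations (made-up numbers)
monthly_values = {"2020-01": 0.2, "2020-07": 0.6, "2021-01": 0.3, "2021-07": 0.5}
yearly_values = aggregate_yearly(monthly_values)
print(yearly_values)
```

Swapping `mean` for `sum`, `min`, or `max` gives the other aggregation methods listed in the options comment.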
2.3.4.3.2. Temporal Interpolation
Fill gaps in temporal data:
# Interpolate missing data points
complete_series = temporal_memory.interpolate(
    data=sparse_data,
    method="linear",  # Options: linear, cubic, nearest, etc.
    target_resolution="daily"
)
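For intuition about what linear interpolation to a daily resolution computes, the following sketch fills a sparse series with `numpy.interp`. The observation days and values are made up, and this is not necessarily how `interpolate` is implemented internally.

```python
import numpy as np

# Days (since the start of the series) on which an observation exists
observed_days = np.array([0, 4, 10])
observed_values = np.array([1.0, 3.0, 0.0])

# Target daily grid covering the observed span
daily_days = np.arange(observed_days[0], observed_days[-1] + 1)

# np.interp connects consecutive observations with straight lines
daily_values = np.interp(daily_days, observed_days, observed_values)
print(daily_values)
```

Day 2 lies halfway between the observations at days 0 and 4, so it receives the midpoint value 2.0.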
2.3.4.3.3. Change Detection
Detect changes between different time points:
# Detect changes between two time points
changes = temporal_memory.detect_changes(
    location=(37.7749, -122.4194),
    time1="2018-01-01",
    time2="2022-01-01",
    threshold=0.2,  # Significance threshold
    change_metrics=["area", "intensity"]
)
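One plausible reading of a threshold-based change detector is a per-cell comparison of two rasters. The sketch below assumes that "area" means the fraction of cells that changed and "intensity" means the mean absolute change over those cells. Both definitions, and the `detect_changes` helper itself, are assumptions made for illustration.

```python
import numpy as np


def detect_changes(before: np.ndarray, after: np.ndarray, threshold: float = 0.2) -> dict:
    """Flag cells whose absolute difference exceeds the threshold."""
    difference = after - before
    changed = np.abs(difference) > threshold
    intensity = float(np.abs(difference[changed]).mean()) if changed.any() else 0.0
    return {
        "area": float(changed.mean()),  # fraction of cells flagged as changed
        "intensity": intensity,         # mean absolute change over flagged cells
        "mask": changed,
    }


# Made-up 2x2 rasters for two acquisition dates
before = np.array([[0.1, 0.5], [0.8, 0.3]])
after = np.array([[0.1, 0.9], [0.4, 0.35]])
result = detect_changes(before, after, threshold=0.2)
print(result["area"], result["intensity"])
```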
2.3.5. Spatial Memory
Spatial Memory organizes data geographically, supporting spatial queries and geographic relationships.
2.3.5.1. Key Features
Spatial Indexing: Efficient indexing of data by location using techniques like quadtrees or geohashes
Spatial Queries: Support for various spatial queries (point, radius, polygon, etc.)
Spatial Relationships: Identification of spatial relationships between features
Multi-resolution Storage: Storage of data at different spatial resolutions
Coordinate System Management: Handling of different coordinate systems and projections
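As a concrete example of the geohash indexing mentioned in the list above, here is a minimal encoder following the standard geohash scheme: longitude and latitude are bisected alternately, the resulting bits are interleaved, and every five bits become one base32 character. The function name is illustrative and this is not the framework's own implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bit interleaving starts with longitude
    while len(chars) < precision:
        bounds, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (bounds[0] + bounds[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half: emit 1 and raise the lower bound
            bounds[0] = mid
        else:
            bits <<= 1              # lower half: emit 0 and lower the upper bound
            bounds[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Nearby points share a prefix in most cases, which lets a proximity search start as a string-prefix range query. Points on opposite sides of a cell boundary can differ early in the string, so neighboring cells usually need to be checked too.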
2.3.5.2. Basic Usage
from memories.memory import SpatialMemory

# Initialize spatial memory
spatial_memory = SpatialMemory()

# Store data with spatial information
spatial_memory.store(
    data=buildings,
    geometry_field="geometry",
    metadata={"source": "openstreetmap", "feature_type": "building"}
)

# Retrieve data at a specific point
point_data = spatial_memory.get_at(
    location=(37.7749, -122.4194)
)

# Retrieve data within a radius
radius_data = spatial_memory.get_radius(
    center=(37.7749, -122.4194),
    radius_km=2,
    feature_types=["building", "road", "landuse"]
)

# Retrieve data within a polygon
polygon_data = spatial_memory.get_polygon(
    polygon=city_boundary,
    feature_types=["building"]
)
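A radius query like the one above reduces to a great-circle distance test. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. The feature dictionaries and the `within_radius` helper are hypothetical, and a real implementation would normally use a spatial index to pre-filter candidates instead of scanning every feature.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(features, center, radius_km):
    """Keep the features whose location lies within radius_km of center."""
    return [f for f in features if haversine_km(center, f["location"]) <= radius_km]


center = (37.7749, -122.4194)
features = [
    {"name": "near", "location": (37.7839, -122.4194)},  # about 1 km north of center
    {"name": "far", "location": (37.8044, -122.2712)},   # more than 10 km away
]
nearby = within_radius(features, center, radius_km=2)
print([f["name"] for f in nearby])
```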
2.3.5.3. Advanced Features
Spatial Memory supports several advanced features:
2.3.5.3.1. Spatial Analysis
Perform spatial analysis operations:
# Calculate density of features
density = spatial_memory.calculate_density(
    feature_type="building",
    area=neighborhood_boundary,
    resolution="100m"  # Grid cell size
)

# Find nearest features
nearest = spatial_memory.find_nearest(
    location=(37.7749, -122.4194),
    feature_type="park",
    max_distance_km=5,
    limit=5
)
2.3.5.3.2. Spatial Clustering
Identify clusters of features:
# Cluster features
clusters = spatial_memory.cluster(
    feature_type="building",
    area=city_boundary,
    method="dbscan",  # Options: dbscan, kmeans, hierarchical, etc.
    parameters={"eps": 0.1, "min_samples": 5}
)
2.3.5.3.3. Spatial Joins
Join datasets based on spatial relationships:
# Join buildings with land use data
joined_data = spatial_memory.spatial_join(
    left=buildings,
    right=landuse,
    how="inner",  # Options: inner, left, right
    predicate="intersects"  # Options: intersects, contains, within, etc.
)
2.3.6. Performance Optimization
The memory system includes several optimizations to ensure efficient operation:
2.3.6.1. Caching Strategies
The system caches results to minimize redundant operations. The caching strategy includes:
Time-based Expiration: Cache entries expire after a configurable time period
LRU Eviction: Least Recently Used entries are evicted when cache size limits are reached
Selective Caching: Only cache results that are expensive to compute or frequently accessed
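The first two strategies above can be combined in a small cache, sketched here with `collections.OrderedDict`. The class name, defaults, and injectable clock are illustrative choices and do not reflect the framework's actual cache. Selective caching is left to the caller, who decides which results are worth a `put`.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLLRUCache:
    """Cache with time-based expiration and least-recently-used eviction."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()    # key -> (value, stored_at), oldest first

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]       # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry


cache = TTLLRUCache(max_size=2, ttl_seconds=30.0)
cache.put("tile:1", "data-1")
cache.put("tile:2", "data-2")
cache.get("tile:1")              # touch tile:1 so tile:2 becomes least recently used
cache.put("tile:3", "data-3")    # exceeds max_size, so tile:2 is evicted
print(cache.get("tile:2"), cache.get("tile:1"))
```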
2.3.6.2. Parallel Processing
The memory system leverages parallel processing for improved performance:
import asyncio


async def process_batch(self, items):
    """Process a batch of items in parallel"""
    tasks = [self._process_item(item) for item in items]
    return await asyncio.gather(*tasks)


async def _process_item(self, item):
    """Process a single item"""
    # Implementation details...
Because the items are awaited concurrently rather than one at a time, this approach improves throughput for batch operations, particularly I/O-bound ones.
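A self-contained version of the same pattern is shown below, with a short `asyncio.sleep` standing in for real work such as a database or network call. The function bodies are placeholders. Note that `asyncio.gather` runs coroutines concurrently on a single event loop, so the gain comes from overlapping waits, not from using multiple CPU cores.

```python
import asyncio


async def process_item(item: int) -> int:
    """Placeholder for per-item work; the sleep simulates waiting on I/O."""
    await asyncio.sleep(0.01)
    return item * 2


async def process_batch(items):
    """Start all items at once and wait for every result."""
    tasks = [process_item(item) for item in items]
    return await asyncio.gather(*tasks)  # results keep the order of `items`


results = asyncio.run(process_batch([1, 2, 3]))
print(results)  # [2, 4, 6]
```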
2.3.7. Monitoring and Metrics
The memory system provides comprehensive monitoring capabilities:
# Get memory system statistics
stats = memory_manager.get_stats()
# Example output:
# {
# "hot_memory": {"size": 256, "capacity": 1000, "utilization": 25.6},
# "warm_memory": {"size": 1024, "capacity": 10000, "utilization": 10.2},
# "cold_memory": {"size": 5120, "capacity": 100000, "utilization": 5.1},
# "glacier_memory": {"size": 10240, "capacity": 1000000, "utilization": 1.0},
# "operations": {"reads": 1500, "writes": 500, "cache_hits": 1200, "cache_misses": 300}
# }
These metrics can be used to monitor system performance and optimize memory usage.
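As an example of putting these numbers to use, the sketch below derives a cache hit ratio and lists tiers above a utilization threshold. It takes the example output above as its input. The helper names and the threshold are hypothetical.

```python
# The example output shown above, used here as sample input
stats = {
    "hot_memory": {"size": 256, "capacity": 1000, "utilization": 25.6},
    "warm_memory": {"size": 1024, "capacity": 10000, "utilization": 10.2},
    "cold_memory": {"size": 5120, "capacity": 100000, "utilization": 5.1},
    "glacier_memory": {"size": 10240, "capacity": 1000000, "utilization": 1.0},
    "operations": {"reads": 1500, "writes": 500, "cache_hits": 1200, "cache_misses": 300},
}


def cache_hit_ratio(operations: dict) -> float:
    """Fraction of cache lookups that were served from the cache."""
    lookups = operations["cache_hits"] + operations["cache_misses"]
    return operations["cache_hits"] / lookups if lookups else 0.0


def tiers_above(stats: dict, utilization_percent: float) -> list:
    """Names of the tiers whose utilization exceeds the given percentage."""
    return [
        name for name, tier in stats.items()
        if "utilization" in tier and tier["utilization"] > utilization_percent
    ]


print(cache_hit_ratio(stats["operations"]))  # 0.8
print(tiers_above(stats, 10.0))              # ['hot_memory', 'warm_memory']
```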
2.3.8. Conclusion
The Memory System is a core component of the memories-dev framework, providing efficient storage, retrieval, and organization of data across temporal and spatial dimensions. By leveraging a multi-tiered architecture and sophisticated indexing techniques, it enables high-performance operations on large-scale geospatial datasets.
For more information on how to use the Memory System in your applications, see the API Reference and Examples sections.