39.3. LoadModel

39.3.1. Model Loading System

class memories.models.load_model.LoadModel[source]

Bases: object

__init__(use_gpu=True, model_provider=None, deployment_type=None, model_name=None, api_key=None, endpoint=None, device=None)[source]

Initialize model loader with configuration.

Parameters:

use_gpu (bool) – Whether to use GPU if available
model_provider (str) – The model provider (e.g., “deepseek”, “azure-ai”, “mistral”)
deployment_type (str) – Either “local” or “api”
model_name (str) – Short name of the model from BaseModel.MODEL_MAPPINGS
api_key (str) – API key for the model provider (required for API deployment type)
endpoint (str) – Endpoint URL for the model provider (optional)
device (str) – Specific GPU device to use (e.g., “cuda:0”, “cuda:1”)

get_response(prompt, **kwargs)[source]

Generate a response using either local model or API.

Parameters:

prompt (str) – The input prompt
**kwargs – Additional generation parameters including: max_length: Maximum length of generated response temperature: Sampling temperature (0.0 to 1.0) top_p: Nucleus sampling parameter top_k: Top-k sampling parameter num_beams: Number of beams for beam search

Returns:

Response dictionary containing:: text: The generated response text metadata: Generation metadata (tokens, time, etc) error: Error message if generation failed

Return type:

Dict[str, Any]

cleanup()[source]: Clean up model resources.

get_response_with_context(prompt, context_data, **kwargs)[source]

Generate a response using context-aware prompting.

Parameters:

prompt (str) – The input prompt
context_data (Dict[str, Any]) – Dictionary containing contextual information including: - location_info: Location details - raw_data_summary: Summary of raw data from different sources - analysis_results: Results of various analyses - scenario_projections: Future scenario projections - historical_trends: Historical trend analysis
**kwargs – Additional generation parameters including: max_length: Maximum length of generated response temperature: Sampling temperature (0.0 to 1.0) top_p: Nucleus sampling parameter top_k: Top-k sampling parameter num_beams: Number of beams for beam search

Returns:

Response dictionary containing:: text: The generated response text metadata: Generation metadata including context usage error: Error message if generation failed

Return type:

Dict[str, Any]

chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion response using either local model or API.

Parameters:

messages (List[Dict[str, str]]) – List of message dictionaries with ‘role’ and ‘content’ keys. Roles can be ‘user’, ‘assistant’, ‘system’, or ‘function’.
tools (List[Dict[str, Any]] | None) – Optional list of tool/function definitions that the model can use. Each tool should have a ‘type’, ‘function’ with ‘name’, ‘description’, ‘parameters’.
tool_choice (str) – How to handle tool selection. Options: - “auto”: Let the model decide if it should call a function - “none”: Don’t call any functions - Dict with specific function to call
**kwargs – Additional parameters including: temperature: Sampling temperature (0.0 to 1.0) max_tokens: Maximum tokens in the response top_p: Nucleus sampling parameter frequency_penalty: Frequency penalty parameter presence_penalty: Presence penalty parameter

Returns:

Response dictionary containing:: message: The assistant’s message tool_calls: List of tool calls if any metadata: Generation metadata error: Error message if generation failed

Return type:

Dict[str, Any]

39.3.2. LoadModel Class

class memories.models.load_model.LoadModel[source]

Bases: object

__init__(use_gpu=True, model_provider=None, deployment_type=None, model_name=None, api_key=None, endpoint=None, device=None)[source]

Initialize model loader with configuration.

Parameters:

use_gpu (bool) – Whether to use GPU if available
model_provider (str) – The model provider (e.g., “deepseek”, “azure-ai”, “mistral”)
deployment_type (str) – Either “local” or “api”
model_name (str) – Short name of the model from BaseModel.MODEL_MAPPINGS
api_key (str) – API key for the model provider (required for API deployment type)
endpoint (str) – Endpoint URL for the model provider (optional)
device (str) – Specific GPU device to use (e.g., “cuda:0”, “cuda:1”)

get_response(prompt, **kwargs)[source]

Generate a response using either local model or API.

Parameters:

prompt (str) – The input prompt
**kwargs – Additional generation parameters including: max_length: Maximum length of generated response temperature: Sampling temperature (0.0 to 1.0) top_p: Nucleus sampling parameter top_k: Top-k sampling parameter num_beams: Number of beams for beam search

Returns:

Response dictionary containing:: text: The generated response text metadata: Generation metadata (tokens, time, etc) error: Error message if generation failed

Return type:

Dict[str, Any]

cleanup()[source]: Clean up model resources.

get_response_with_context(prompt, context_data, **kwargs)[source]

Generate a response using context-aware prompting.

Parameters:

prompt (str) – The input prompt
context_data (Dict[str, Any]) – Dictionary containing contextual information including: - location_info: Location details - raw_data_summary: Summary of raw data from different sources - analysis_results: Results of various analyses - scenario_projections: Future scenario projections - historical_trends: Historical trend analysis
**kwargs – Additional generation parameters including: max_length: Maximum length of generated response temperature: Sampling temperature (0.0 to 1.0) top_p: Nucleus sampling parameter top_k: Top-k sampling parameter num_beams: Number of beams for beam search

Returns:

Response dictionary containing:: text: The generated response text metadata: Generation metadata including context usage error: Error message if generation failed

Return type:

Dict[str, Any]

chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion response using either local model or API.

Parameters:

messages (List[Dict[str, str]]) – List of message dictionaries with ‘role’ and ‘content’ keys. Roles can be ‘user’, ‘assistant’, ‘system’, or ‘function’.
tools (List[Dict[str, Any]] | None) – Optional list of tool/function definitions that the model can use. Each tool should have a ‘type’, ‘function’ with ‘name’, ‘description’, ‘parameters’.
tool_choice (str) – How to handle tool selection. Options: - “auto”: Let the model decide if it should call a function - “none”: Don’t call any functions - Dict with specific function to call
**kwargs – Additional parameters including: temperature: Sampling temperature (0.0 to 1.0) max_tokens: Maximum tokens in the response top_p: Nucleus sampling parameter frequency_penalty: Frequency penalty parameter presence_penalty: Presence penalty parameter

Returns:

Response dictionary containing:: message: The assistant’s message tool_calls: List of tool calls if any metadata: Generation metadata error: Error message if generation failed

Return type:

Dict[str, Any]

39.3.3. Model Types

39.3.3.1. Base Model

class memories.models.base_model.BaseModel[source]

Bases: object

Base model class that can be shared across modules

__init__()[source]: Initialize the base model.

classmethod get_instance()[source]: Get singleton instance of BaseModel.

get_model_config(model_name)[source]

Get configuration for a specific model.

Parameters:: model_name (str) –
Return type:: Dict[str, Any]

initialize_model(model, use_gpu=True, device=None)[source]

Initialize a model with the specified configuration.

Parameters:

model (str) – Model identifier from config
use_gpu (bool) – Whether to use GPU if available
device (str) – Specific GPU device to use (e.g., “cuda:0”, “cuda:1”)

Returns:

True if initialization successful, False otherwise

Return type:

bool

generate(prompt, **kwargs)[source]

Generate text using the model with configured parameters.

Parameters:

prompt (str) – Input prompt
**kwargs – Override default generation parameters

Returns:

Generated text

Return type:

str

cleanup()[source]: Clean up model resources.

classmethod get_model_path(provider, model_key)[source]

Get the full model path/identifier for a given provider and model key

Parameters:

provider (str) –
model_key (str) –

Return type:

str

classmethod list_providers()[source]

List all available providers

Return type:: List[str]

classmethod list_models(provider=None)[source]

List all available models, optionally filtered by provider

Parameters:: provider (str) –
Return type:: List[str]

39.3.3.2. API Connectors

class memories.models.api_connector.APIConnector[source]

Bases: ABC

Base class for API connectors.

__init__(api_key=None)[source]

Initialize the API connector.

Parameters:: api_key (str | None) –

abstract generate(prompt, **kwargs)[source]

Generate text using the API.

Parameters:: prompt (str) –
Return type:: str

abstract chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion from messages and optional tools.

Parameters:

messages (List[Dict[str, str]]) –
tools (List[Dict[str, Any]] | None) –
tool_choice (str) –

Return type:

Dict[str, Any]

class memories.models.api_connector.OpenAIConnector[source]

Bases: APIConnector

Connector for OpenAI API.

__init__(api_key=None)[source]

Initialize the API connector.

Parameters:: api_key (str) –

generate(prompt, model=None, **kwargs)[source]

Generate text using the API.

Parameters:

prompt (str) –
model (str) –

Return type:

str

chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion using the OpenAI API.

Parameters:

messages (List[Dict[str, str]]) – List of message dictionaries with ‘role’ and ‘content’ keys
tools (List[Dict[str, Any]] | None) – Optional list of tools/functions the model can use
tool_choice (str) – How to handle tool selection (“auto”, “none”, or specific)
**kwargs – Additional parameters for the API call

Returns:

Dict containing the response message, tool calls, and metadata

Return type:

Dict[str, Any]

class memories.models.api_connector.AnthropicConnector[source]

Bases: APIConnector

Connector for Anthropic API.

__init__(api_key=None)[source]

Initialize the API connector.

Parameters:: api_key (str) –

generate(prompt, model=None, **kwargs)[source]

Generate text using the API.

Parameters:

prompt (str) –
model (str) –

Return type:

str

chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion using the Anthropic API. Note: Tool usage may not be supported by all Anthropic models.

Parameters:

messages (List[Dict[str, str]]) –
tools (List[Dict[str, Any]] | None) –
tool_choice (str) –

Return type:

Dict[str, Any]

class memories.models.api_connector.DeepseekConnector[source]

Bases: APIConnector

Connector for Deepseek API.

__init__(api_key=None)[source]

Initialize the API connector.

Parameters:: api_key (str) –

generate(prompt, model=None, **kwargs)[source]

Generate text using the API.

Parameters:

prompt (str) –
model (str) –

Return type:

str

chat_completion(messages, tools=None, tool_choice='auto', **kwargs)[source]

Generate a chat completion using the Deepseek API. Note: Tool usage may not be supported by all Deepseek models.

Parameters:

messages (List[Dict[str, str]]) –
tools (List[Dict[str, Any]] | None) –
tool_choice (str) –

Return type:

Dict[str, Any]

39.3.4. Example Usage

from memories.models.load_model import LoadModel

# Initialize model with local deployment
local_model = LoadModel(
    use_gpu=True,
    model_provider="deepseek-ai",
    deployment_type="local",
    model_name="deepseek-coder-small"
)

# Generate text with the local model
response = local_model.get_response("Write a function to calculate factorial")
print(response["text"])

# Initialize model with API deployment
api_model = LoadModel(
    model_provider="openai",
    deployment_type="api",
    model_name="gpt-4",
    api_key="your-api-key"  # Or set OPENAI_API_KEY environment variable
)

# Generate text with the API model
response = api_model.get_response(
    "Explain quantum computing",
    temperature=0.7,
    max_tokens=500
)
print(response["text"])

# Clean up resources when done
local_model.cleanup()