The Agentic Edge: A Comprehensive Technical Analysis of Function Gemma and the Paradigm of On-Device Function Calling

1. Introduction: The Strategic Divergence in Generative AI

The trajectory of artificial intelligence development in the post-transformer era has largely been defined by a race for scale. The prevailing orthodoxy posits that larger parameter counts—scaling from billions to trillions—inevitably yield superior reasoning, broader world knowledge, and emergent capabilities. However, a countervailing trend has emerged, driven by the pragmatic necessities of deployment latency, privacy, and computational efficiency. This trend prioritizes “Small Language Models” (SLMs) that are not generalist philosophers, but specialized agents capable of interacting with the digital world.

The release of Function Gemma, a specialized derivative of Google’s Gemma 3 architecture, represents a watershed moment in this divergence. With a parameter count of merely 270 million, Function Gemma challenges the assumption that useful agentic behavior requires massive compute resources.1 Instead, it embodies a philosophy of architectural specialization, where the model is fine-tuned explicitly for “function calling”—the translation of natural language user intent into structured, executable API calls.

This report offers an exhaustive analysis of Function Gemma. We dissect its architectural lineage, rooted in the research behind Gemini and Gemma 3, and explore the technical mechanisms that allow such a compact model to perform as an “intelligent traffic controller” for complex systems.1 We further investigate the operational realities of deploying this model on the “edge”—specifically on mobile devices and embedded systems—enabled by the “Mobile Actions” dataset and specific quantization techniques.2 Through a rigorous examination of performance metrics from the Berkeley Function Calling Leaderboard (BFCL) and comparative analysis against larger peers like Llama 3.2 and Qwen 2.5, this document serves as a definitive technical reference for operationalizing Function Gemma in enterprise and consumer environments.

1.1 The Operational Necessity of the “Traffic Controller”

The central thesis supporting the deployment of Function Gemma is the “Traffic Controller” paradigm. In contemporary compound AI systems, routing every user interaction through a frontier model (such as Gemini 1.5 Pro or GPT-4) is economically and computationally inefficient. A query as trivial as “turn on the living room lights” does not require the reasoning capacity of a trillion-parameter model.

Function Gemma 270M is engineered to inhabit this routing layer. It resides locally on the device, parsing continuous streams of user input. Its primary function is decision-making: it determines whether a query can be satisfied by a local tool (e.g., an alarm clock API), or if it requires escalation to a cloud-based server.1 This hierarchical architecture optimizes the “Total Cost of Inference” (TCI) and minimizes latency for common tasks, reserving heavy cloud compute for queries requiring deep semantic understanding or broad world knowledge. The analysis suggests that this bifurcation of duties—local execution for action, cloud execution for reasoning—is the future reference architecture for Operating Systems integrating AI.
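To make the routing decision concrete, the following is a minimal sketch of such a traffic-controller layer. The inference call (run_local_model), the escalation path (escalate_to_cloud), and the LOCAL_TOOLS registry are hypothetical placeholders, not part of any official API.

```python
import json

# Hypothetical registry of tools the device can execute without the cloud.
LOCAL_TOOLS = {"set_alarm", "toggle_light", "create_contact"}

def run_local_model(query: str) -> str:
    """Stand-in for on-device FunctionGemma inference (e.g., via LiteRT or Ollama)."""
    return '{"name": "toggle_light", "parameters": {"room": "living room", "state": "on"}}'

def escalate_to_cloud(query: str) -> str:
    """Stand-in for a call to a frontier model hosted in the cloud."""
    return "(cloud model handles the open-ended request)"

def route(query: str) -> str:
    reply = run_local_model(query)
    try:
        call = json.loads(reply)          # the model answered with a structured call
    except json.JSONDecodeError:
        return escalate_to_cloud(query)   # free-form text: needs deeper reasoning
    if call.get("name") in LOCAL_TOOLS:
        return f"executing locally: {call}"   # cheap, private, low latency
    return escalate_to_cloud(query)           # unknown tool: defer to the cloud

print(route("turn on the living room lights"))
```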

2. Architectural Foundations and Technical Specifications

To understand the capabilities and limitations of Function Gemma, one must examine the underlying architecture of the Gemma 3 family upon which it is built. Unlike its predecessors, Gemma 3 and its derivatives utilize a multimodal, multilingual framework, though Function Gemma isolates the text-to-action modality.

2.1 The Gemma 3 Lineage and Decoder-Only Architecture

Function Gemma is not a bespoke architecture built from scratch; it is a specialized checkpoint of the Gemma 3 270M model.1 This lineage is critical because it means the model inherits the pre-training benefits of the broader Gemma family, which are trained on up to 6 trillion tokens of text, code, and mathematics.3

The architecture follows a standard Transformer Decoder design, which has become the industry standard for causal language modeling.4 However, the choice of 270 million parameters places this model in a unique “micro-model” category.

  • Parameter Efficiency: At 270M parameters, the model requires roughly 0.5 GB of memory to run in full precision (FP16), and significantly less when quantized (approx. 300MB at Q8_0).5 This permits residence in the high-speed SRAM or minimal DRAM allocations of mobile NPUs (Neural Processing Units). A back-of-the-envelope estimate follows this list.
  • Depth vs. Width: Comparative analysis suggests that Gemma 3 generally favors a “thinner and deeper” architecture compared to the “wider and shallower” design of competitors like Llama 3.2.6 This architectural decision has implications for inference acceleration; deeper models can sometimes be harder to parallelize effectively on wide GPU buses but may capture more complex hierarchical relationships in language for their size.
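The memory figures cited above can be reproduced with simple arithmetic; the numbers below count weights only and ignore the KV cache, activations, and runtime overhead, which is why real-world footprints run slightly higher.

```python
# Approximate weight footprint of a 270M-parameter model at different precisions.
params = 270e6
for name, bytes_per_param in [("FP16", 2.0), ("Q8_0", 1.0), ("Q4_0", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e6:.0f} MB")
# FP16: ~540 MB   Q8_0: ~270 MB   Q4_0: ~135 MB (quantization block scales add a little)
```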

2.2 Vocabulary Size and Token Efficiency

A distinguishing feature of the Gemma architecture is its massive vocabulary size of 256,000 tokens.1 This is double or quadruple the vocabulary size of many comparable models (Llama 3 typically uses ~128k).

In the context of function calling, this large vocabulary is a strategic asset. Function calling involves generating rigid, repetitive code structures (JSON keys, brackets, specific API method names). A larger vocabulary increases the probability that common coding terms, whitespace patterns, and JSON syntax elements are represented as single tokens rather than fragmented sub-tokens.

  • Impact on Latency: Fewer generated tokens mean faster completion times. For a JSON object, a vocabulary that can emit “parameters”: as a single token is significantly more efficient than one that must assemble it from fragments like “par”, “am”, “eters”, plus separate tokens for the quotation mark and colon. A tokenizer sketch follows this list.
  • Impact on Multilingualism: The 256k vocabulary also supports the model’s multilingual capabilities, allowing it to process function calls embedded in non-English user queries without excessive tokenization bloat.1
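The effect is easy to inspect directly. A minimal sketch, assuming access to the gated google/functiongemma-270m-it tokenizer on Hugging Face; the exact segmentation it prints is whatever the tokenizer produces, not a guaranteed token count.

```python
from transformers import AutoTokenizer

# Requires accepting the Gemma license and authenticating with Hugging Face.
tok = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

snippet = '{"name": "get_weather", "parameters": {"city": "Tokyo"}}'
ids = tok.encode(snippet, add_special_tokens=False)
print(len(ids))                          # total tokens for the JSON fragment
print(tok.convert_ids_to_tokens(ids))    # inspect how keys and punctuation are segmented
```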

2.3 The Context Window: 32K Tokens

Function Gemma supports a context window of 32,768 tokens.7 This is a substantial capacity for a model of this size, addressing a primary pain point in agentic workflows: the “Context Clutter” of tool definitions.

Developers typically inject the schemas of all available tools into the system prompt. If a system has 50 available tools, the JSON schema description alone can consume thousands of tokens. A small context window (e.g., 4096 tokens) would force developers to aggressively truncate descriptions or use RAG to retrieve only relevant tools, increasing complexity. The 32K window allows Function Gemma to maintain a comprehensive “registry” of local tools in its working memory, enabling it to select from a broad array of capabilities without context fragmentation.

2.4 Training Data Composition

The efficacy of Function Gemma is not solely due to its architecture but its training data curriculum.

  • Pre-training: The base model was exposed to 6 trillion tokens of diverse web text, code, and math.3 The ratio of training tokens to parameters (6T / 270M) is roughly 22,000:1. This is orders of magnitude higher than the “Chinchilla optimal” ratio (which suggests ~20 tokens per parameter), indicating the model is significantly over-trained. This strategy is common for inference-optimized models, where training compute is traded for better performance at a smaller inference size.
  • Knowledge Cutoff: The model’s internal knowledge base extends up to August 2024 7, ensuring familiarity with relatively modern API standards and libraries.
  • Fine-Tuning Dataset: The operational specialization comes from the post-training phase. The model was fine-tuned on “Public Tool Definitions” (common APIs found on the web) and “Tool Use Interactions”.7 This dataset includes (an illustrative record follows this list):
    • Prompts: User queries in various languages.
    • Function Calls: Correctly formatted JSON outputs.
    • Function Responses: The simulated output from the tool (e.g., specific weather data or database returns).
    • Natural Language Summaries: The model’s final response to the user interpreting the tool output.7
    • Clarification Requests: Scenarios where the model must ask for missing parameters (e.g., User: “Book a meeting.” Model: “With whom and at what time?”).7
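To make this composition concrete, a single training record might look like the following. This is an illustrative reconstruction; the field names and structure of the actual dataset may differ.

```python
# Hypothetical record shape for a tool-use interaction (illustrative only).
example_record = {
    "prompt": "Book a meeting with Alice tomorrow at 10am.",
    "function_call": {
        "name": "create_calendar_event",
        "parameters": {"title": "Meeting with Alice", "datetime": "2025-06-12T10:00:00"},
    },
    "function_response": {"status": "created", "event_id": "evt_0042"},
    "summary": "Done. Your meeting with Alice is booked for tomorrow at 10:00.",
}
# Clarification-request scenarios ("With whom and at what time?") would appear
# as separate records where the model output is a question rather than a call.
```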

3. The Mechanics of Function Calling: Control Tokens and Syntax

Function calling is often misunderstood as simply “prompt engineering” to produce JSON. In reality, reliable function calling requires the model to understand the state of the conversation and switch modes between “interlocutor” and “operator.” Function Gemma implements this via a rigorous system of Control Tokens.

3.1 The Role of Special Tokens

Unlike general-purpose models that rely on probabilistic heuristics to output JSON, Function Gemma is trained to recognize and generate specific non-printing or special tokens that demarcate the boundaries of a function call. This mitigates “Prompt Injection” attacks and reduces confusion where the model might output the text of a function call rather than the execution command for one.

Research documentation highlights the use of tokens such as <start_function_declaration> and <end_function_declaration>.9 These tokens act as brackets for the tool definition block. Inside these blocks, the model expects a formal schema.

Furthermore, the use of <escape> tags is noted for delimiting string values within descriptions.10 This is a defensive programming measure ingrained in the model’s tokenizer. By wrapping descriptions in <escape>, the model learns to treat the content within as distinct from the structural instructions, reducing the likelihood that a tool description (e.g., “This tool deletes all files”) is misinterpreted as a command to be executed immediately.
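The sketch below shows how these tokens might bracket a tool definition. Only the three special tokens named above come from the documentation; the surrounding structure is simplified and the authoritative format should be taken from the official formatting guide.

```python
# Illustrative only: real FunctionGemma prompts follow the official formatting guide.
declaration = (
    "<start_function_declaration>"
    '{"name": "delete_files", '
    '"description": <escape>This tool deletes all files in a folder.<escape>, '
    '"parameters": {"folder": {"type": "string"}}}'
    "<end_function_declaration>"
)
# The <escape> wrapper marks the description as data, not as an instruction,
# so "deletes all files" is read as documentation rather than a command.
print(declaration)
```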

3.2 The Turn-Based Interaction Lifecycle

The interaction model for Function Gemma is strictly defined as a state machine. The documentation outlines a standard multi-turn flow that developers must implement 10:

Table 1: The Function Gemma Interaction Cycle

| Turn | Actor | Action | Technical Detail |
| --- | --- | --- | --- |
| 1 | Developer | Tool Definition | The system injects the available tools using <start_function_declaration> and JSON schemas. This establishes the “action space” for the model. |
| 2 | User | Prompt | The user provides natural language input (e.g., “What’s the weather in Tokyo?”). |
| 3 | Model | Function Call | The model parses the intent. If a tool matches, it outputs a structured object (e.g., {"name": "get_weather", "parameters": {"city": "Tokyo"}}). It does not generate conversational filler here. |
| 4 | System | Execution | The application layer (Python/Java/Swift) intercepts the JSON, executes the actual API call, and captures the return value. |
| 5 | Developer | Function Response | The system feeds the tool’s output back to the model, often wrapped in specific role markers (e.g., role: tool). |
| 6 | Model | Final Answer | The model uses the tool output to generate a natural language response to the user (e.g., “It is currently raining in Tokyo.”). |
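A minimal sketch of the six-turn cycle in Table 1 follows. The generate function is a stand-in that returns canned outputs so the control flow can be run end to end; a real deployment would replace it with on-device inference.

```python
import json

def get_weather(city: str) -> str:
    """Turn 4: the application layer performs the real API call."""
    return json.dumps({"city": city, "condition": "rain", "temp_c": 14})

def generate(messages: list) -> str:
    """Stand-in for FunctionGemma inference (llama.cpp, LiteRT, Ollama, ...)."""
    if messages[-1]["role"] == "tool":                        # Turn 6: summarize tool output
        return "It is currently raining in Tokyo (14 degrees C)."
    return '{"name": "get_weather", "parameters": {"city": "Tokyo"}}'  # Turn 3

TOOLS = {"get_weather": get_weather}

messages = [
    {"role": "system", "content": "<tool declarations injected here>"},   # Turn 1
    {"role": "user", "content": "What's the weather in Tokyo?"},          # Turn 2
]

call = json.loads(generate(messages))                         # Turn 3: structured call
result = TOOLS[call["name"]](**call["parameters"])            # Turn 4: execution
messages.append({"role": "tool", "content": result})          # Turn 5: function response
print(generate(messages))                                     # Turn 6: final answer
```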

3.3 Handling Complex and Non-Standard Schemas

A significant capability of Function Gemma is its adaptability to “Non-Standard Schemas”.9 Standard public datasets (like those from OpenAI or Anthropic) often converge on a specific style of JSON schema (OpenAPI spec). However, legacy enterprise systems or embedded devices often use proprietary, terse, or highly nested formats.

Function Gemma’s training recipe allows for “Baking” these definitions into the weights.9 By fine-tuning the model on a specific, non-standard API (e.g., a binary-packed custom protocol represented as JSON), the model can learn to predict the specific idiosyncrasies of that format without needing extensive few-shot examples in the prompt. This “Optimizing Context Usage” 9 frees up the context window for actual conversation history rather than repetitive schema definitions.

4. Performance Analysis: The Berkeley Function Calling Leaderboard (BFCL)

To objectively assess Function Gemma’s capabilities, we turn to the Berkeley Function Calling Leaderboard (BFCL). This benchmark has emerged as the de facto standard for evaluating tool-use, moving beyond simple text similarity to rigorous executable evaluation.

4.1 BFCL Methodology: AST vs. Exec

The BFCL employs a sophisticated evaluation methodology that distinguishes it from generic LLM benchmarks.

  • AST Evaluation: For many categories, the benchmark uses Abstract Syntax Tree (AST) analysis.11 This parses the code generated by the model to ensure it is syntactically valid and structurally correct. This is superior to string matching (BLEU/ROUGE) because it allows for variations in whitespace or argument order that do not affect execution (a minimal illustration follows this list).
  • Execution (Exec): In some categories, the function is actually executed against a sandbox or a mock API to verify the result.12
  • Relevance Detection: This metric measures the model’s ability to abstain. If a user asks “What is the meaning of life?” and the only tools available are get_weather and set_alarm, the correct action is not to call a function. High relevance scores indicate reduced hallucination risks.
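To illustrate the AST-based check, the sketch below uses Python’s ast module to accept any syntactically valid call that names the right function with the right keyword arguments, regardless of whitespace or argument order. It is a simplification of the principle, not a reimplementation of the BFCL matcher.

```python
import ast

def ast_match(call_str: str, expected_name: str, expected_kwargs: dict) -> bool:
    """True if call_str parses to a call of expected_name with exactly expected_kwargs."""
    try:
        node = ast.parse(call_str, mode="eval").body
    except SyntaxError:
        return False
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return False
    if node.func.id != expected_name or node.args:
        return False
    got = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return got == expected_kwargs

# Argument order and spacing differ from the reference, yet the first call passes:
print(ast_match('get_weather(unit="c",city="Tokyo")', "get_weather",
                {"city": "Tokyo", "unit": "c"}))   # True
print(ast_match('get_weather("Tokyo")', "get_weather",
                {"city": "Tokyo", "unit": "c"}))   # False (positional arg, missing unit)
```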

4.2 Detailed Performance Metrics

Function Gemma 270M demonstrates a distinct performance profile that reflects its specialization.

Table 2: Function Gemma 270M Performance on BFCL (0-Shot) 13

| Metric Category | Score (%) | Analysis & Implications |
| --- | --- | --- |
| BFCL Relevance | 61.1 | The model is fairly reliable at identifying when a tool is needed. This is critical for an “always-on” listener to avoid interrupting users with unwanted actions. |
| BFCL Irrelevance | 70.6 | High Value: The model excels at ignoring queries that don’t match its tools. This “negative constraint” capability is often harder to train than positive action. |
| BFCL Simple | 61.6 | Competent performance on single-tool, single-parameter tasks (e.g., “Turn on the light”). |
| BFCL Parallel | 63.5 | Surprisingly strong capability to handle requests like “Turn on the kitchen light AND the fan.” The score slightly exceeds the Simple score, suggesting robust independent processing. |
| BFCL Multiple | 39.0 | Critical Weakness: The model struggles when presented with a large list of tools to choose from. This indicates a limitation in its “selection attention”; it gets confused by distractors. |
| BFCL Live Parallel Multiple | 20.8 | Performance degrades significantly in complex, real-world scenarios requiring both selection from many tools and parallel execution. |

Insight: The disparity between the “Parallel” (63.5%) and “Multiple” (39%) scores is revealing. “Parallel” implies generating multiple calls for known tools. “Multiple” implies selecting the right tool from a large list. This suggests Function Gemma 270M is best deployed in environments where the active toolset is small and well-defined (e.g., a specific app context), rather than as a general-purpose agent browsing a library of thousands of APIs.

4.3 Evaluation on the “Mobile Actions” Domain

Beyond general benchmarks, Google evaluated the model on the “Mobile Actions” dataset—a proxy for smartphone OS control.

  • Base Model Accuracy: 58%.13
  • Fine-Tuned Accuracy: 85%.13
    This massive delta (+27 percentage points) underscores the necessity of fine-tuning for this class of model. While large models (70B+) might generalize zero-shot to mobile commands, the 270M model relies on domain adaptation to achieve production-grade reliability.

5. Comparative Landscape Analysis

Function Gemma does not exist in a vacuum. It competes with other “Small Language Models” (SLMs) from Meta, Alibaba, and specialized research groups.

5.1 Function Gemma vs. Llama 3.2 1B

Meta’s Llama 3.2 1B is the primary competitor in the “mobile-class” weight division.

  • Size: Llama 3.2 1B is approximately 4.5x larger than Function Gemma 270M (1.23B parameters vs 0.27B).
  • Context: Llama 3.2 boasts a 128K context window 14, significantly larger than Gemma’s 32K.
  • Performance: Reports indicate Llama 3.2 1B scores relatively low on BFCL V2 (approx. 25.7% accuracy in some setups) 15, though it has higher general knowledge.
  • Cost/Efficiency: Gemma 2 27B is cited as being significantly more expensive per token than Llama 3.2 1B 16, but this comparison doesn’t scale linearly down to the 270M model. In terms of memory, Function Gemma 270M is vastly more efficient, capable of running on devices with <1GB of available RAM, whereas Llama 3.2 1B requires ~2.5GB (FP16) or ~1GB (Quantized).
  • Conclusion: Llama 3.2 1B is a better “Chat” model that can do function calling. Function Gemma 270M is a “Function Calling” model that is not for chat. If the goal is pure tool execution with minimal battery drain, Gemma wins. If the goal is a conversational assistant that occasionally calls tools, Llama wins.

5.2 Function Gemma vs. Qwen 2.5 1.5B

Alibaba’s Qwen 2.5 series is widely regarded as the current state-of-the-art for coding and logic in SLMs.

  • Coding Proficiency: Qwen 2.5 1.5B (Instruct) typically scores very high on coding benchmarks (HumanEval, MBPP) and BFCL (often >70%).17
  • Architecture: Qwen uses a standard dense architecture.
  • Trade-off: Qwen is a “smarter” model in terms of raw logic and coding. However, it is nearly 6x larger than Function Gemma. For an always-on “wake word” style agent or a background process, Qwen is likely too heavy. Function Gemma acts as a filter; Qwen acts as a solver.

5.3 Function Gemma vs. Gorilla OpenFunctions

Gorilla (University of California, Berkeley) pioneered the fine-tuning of Llama models specifically for function calling.

  • Performance: Gorilla models (often 7B) set the gold standard on BFCL.
  • Comparison: Function Gemma can be viewed as an attempt to distill “Gorilla-level” capability into a sub-1B parameter container. While it may not match Gorilla 7B in handling thousands of tools, its ability to approach competitive scores in “Simple” and “Parallel” categories at 1/25th the size is the key innovation.

6. The “Mobile Actions” Dataset and Edge Strategy

A critical component of the Function Gemma release is the Mobile Actions dataset.2 This dataset is not merely a benchmark; it is a blueprint for how Google envisions AI integrating with Android and other edge OSs.

6.1 Dataset Schema and Composition

The dataset (available on Hugging Face as google/mobile-actions) consists of approximately 9,650 rows of data.18 It covers seven core tools that represent the fundamental interactions a user has with a smartphone:

  1. turn_on_flashlight
  2. turn_off_flashlight
  3. create_contact
  4. send_email
  5. show_map
  6. open_wifi_settings
  7. create_calendar_event

Schema Analysis:

The JSON structure in the dataset reveals specific design choices. For example, the create_calendar_event tool requires a datetime parameter in YYYY-MM-DDTHH:MM:SS format.18

  • Implication: The model must understand time. In practice, the system prompt must inject the current date and time so that relative expressions (e.g., “tomorrow at 10am”) can be resolved into this format.
  • Parameter Complexity: The dataset includes null handling (e.g., last_name, first_name, body in emails can be null). The model is trained to explicitly handle optional parameters, a common failure mode in generic LLMs which often hallucinate values for optional fields.
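The dataset can be inspected directly with the Hugging Face datasets library. The split name and the exact column layout below are assumptions to be verified against the dataset card.

```python
from datasets import load_dataset  # pip install datasets

ds = load_dataset("google/mobile-actions", split="train")  # split name is an assumption
print(ds)        # row count and column names
print(ds[0])     # one user query and its target tool call, e.g. a create_calendar_event
                 # with a datetime in YYYY-MM-DDTHH:MM:SS format and nullable fields
```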

6.2 Privacy and the “Local First” Mandate

The strategic value of this dataset and model combination is Privacy.

  • Data Sovereignty: Processing a request like “Add John to my contacts” involves PII (Personally Identifiable Information). Sending this to a cloud API (OpenAI/Anthropic) creates a privacy risk and a compliance burden (GDPR/CCPA).
  • Local Execution: By running Function Gemma 270M locally, the PII never leaves the device. The model parses the name “John” into the JSON payload within the phone’s secure enclave, and the OS executes it. The cloud is never aware the transaction occurred.

6.3 Hardware Acceleration: Samsung S25 Ultra Case Study

Google’s research explicitly cites performance testing on the Samsung S25 Ultra.13

  • Framework: LiteRT (formerly TensorFlow Lite).
  • Delegate: XNNPACK. This is a highly optimized library for floating-point neural network inference on ARM CPUs.
  • Configuration: 4 CPU threads were used.
  • Latency Profile: While specific millisecond figures are not published, the description of “interactive” use implies latency well under 1 second for the decode phase.
  • Memory: The 270M model fits easily within the S25’s RAM, even alongside the OS and other apps, likely occupying <400MB of RAM in a quantized state.

7. Operationalizing Function Gemma: The Integration Ecosystem

For developers, the raw model weights are useless without an integration framework. Function Gemma is supported by a robust ecosystem of tools.

7.1 LangChain Integration

LangChain provides the “glue” to connect Function Gemma to application logic.

  • Prompt Templates: Developers use PromptTemplate to construct the specific string formats required. Documentation shows that specific variable names and structures (such as {{variable_name}} with double curly braces) are standard.19
  • Vertex AI Integration: For those not running locally, LangChain interacts with Gemma via langchain-google-vertexai.20 This abstracts the API calls but allows the developer to utilize the same prompt structures designed for the open model.
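A minimal sketch combining the two points above, using LangChain’s PromptTemplate to assemble a tool-aware prompt. The scaffold text is illustrative; the real delimiters should come from the FunctionGemma formatting guide, and the same prompt could then be sent through langchain-google-vertexai instead of a local runtime.

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "You have access to the following tools:\n{tool_schemas}\n\nUser request: {user_query}"
)
prompt = template.format(
    tool_schemas='{"name": "get_weather", "parameters": {"city": {"type": "string"}}}',
    user_query="What's the weather in Tokyo?",
)
print(prompt)
# For a hosted deployment, pass the same prompt to
# langchain_google_vertexai.ChatVertexAI rather than a local model.
```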

7.2 LlamaIndex and RAG

LlamaIndex (formerly GPT Index) is crucial when the “tools” involve data retrieval.

  • FunctionAgent: LlamaIndex allows Function Gemma to be wrapped as a FunctionAgent.21 In this pattern, a “Retrieval Tool” (e.g., searching a PDF) is just another function.
  • Workflow:
  1. User asks: “Summarize the vacation policy.”
  2. Function Gemma calls: search_docs(query=”vacation policy”).
  3. LlamaIndex executes the vector search.
  4. Function Gemma summarizes the retrieved chunks.
  • State Management: LlamaIndex provides Context objects to maintain state across turns, compensating for the model’s stateless nature.21
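A minimal sketch of the FunctionAgent pattern described above, assuming a local Ollama server hosting the model and a stubbed retrieval tool standing in for a real vector search; whether the functiongemma template exposes tool calling through this exact path is an assumption to verify.

```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama  # assumes `ollama pull functiongemma`

def search_docs(query: str) -> str:
    """Search HR documents (stub standing in for a vector-search tool)."""
    return "Employees accrue 20 vacation days per year, usable after 3 months."

agent = FunctionAgent(
    tools=[FunctionTool.from_defaults(fn=search_docs)],
    llm=Ollama(model="functiongemma"),
    system_prompt="Answer questions by calling search_docs when documents are needed.",
)

async def main() -> None:
    response = await agent.run("Summarize the vacation policy.")
    print(response)

asyncio.run(main())
```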

7.3 Ollama and Local Development

Ollama has become the standard for local LLM inference on Linux/macOS.

  • Support: Function Gemma is available in the Ollama library.5
  • Command: ollama run functiongemma.
  • Modelfiles: Users can create custom Modelfile configurations to set the system prompt or temperature (default top_p 0.95, top_k 64).5
  • Limitations: Community reports note that Ollama support for MCP (Model Context Protocol)—a new standard for connecting AI models to data—is not yet native for Gemma in the same way it might be for Claude, requiring “front-end controllers” like OpenWebUI to bridge the gap.22
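Setting the MCP caveat aside, tool calling through the Ollama Python client looks roughly like the following; the response shape and whether the library’s functiongemma template surfaces tool_calls should be confirmed against the Ollama documentation.

```python
import ollama  # pip install ollama; assumes `ollama pull functiongemma` was run

tools = [{
    "type": "function",
    "function": {
        "name": "set_alarm",
        "description": "Set an alarm for a given time of day.",
        "parameters": {
            "type": "object",
            "properties": {"time": {"type": "string", "description": "HH:MM, 24-hour clock"}},
            "required": ["time"],
        },
    },
}]

response = ollama.chat(
    model="functiongemma",
    messages=[{"role": "user", "content": "Wake me up at 6:30 tomorrow."}],
    tools=tools,
)
print(response.message.tool_calls)  # expected: one set_alarm call with time="06:30"
```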

7.4 LangGraph for Multi-Agent Flows

LangGraph allows for the creation of cyclic, stateful graphs.23

  • ReAct Implementation: Function Gemma is ideal for a ReAct (Reasoning + Acting) loop implemented in LangGraph. The graph defines the loop: Model -> Tool Call -> Execute -> Observation -> Model.
  • State Persistence: LangGraph manages the State (conversation history), allowing the 270M model to focus solely on the immediate “next step” decision without needing to manage the entire application memory.23
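A minimal sketch of that loop using LangGraph’s prebuilt ReAct agent. It assumes the model is served through Ollama and that its chat template supports tool binding via langchain-ollama, which should be verified for functiongemma.

```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent

def get_weather(city: str) -> str:
    """Return the current weather for a city (stubbed for this sketch)."""
    return f"It is currently raining in {city}."

# create_react_agent wires the Model -> Tool Call -> Execute -> Observation loop
# and keeps conversation state in the graph, not in the 270M model itself.
agent = create_react_agent(ChatOllama(model="functiongemma"), tools=[get_weather])

result = agent.invoke({"messages": [("user", "What's the weather in Tokyo?")]})
print(result["messages"][-1].content)
```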

8. The Fine-Tuning Cookbook: Customizing the Agent

While the pre-trained Function Gemma is capable, the “Mobile Actions” case study proves that fine-tuning is the path to production quality.

8.1 The Data Recipe

The “Cookbook” provided by Google outlines the process.24

  1. Format: Data must be converted into the specific “Function Gemma Chat Format.” This involves wrapping system instructions, tool definitions, and user turns in specific delimiters (see the sketch after this list).
  2. Distillation: A key strategy is Model Distillation.9 Developers can use a large model (e.g., Gemini 1.5 Pro) to generate thousands of synthetic examples of users interacting with their specific tools. This synthetic data is then used to train Function Gemma.
    • Insight: This allows a developer to “clone” the intelligence of a frontier model into the 270M model for a specific set of tasks.
  3. Quantization Aware Training (QAT): To ensure the model performs well on mobile (TFLite), training often considers quantization effects, though post-training quantization is more common for simple deployments.
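A sketch of step 1, rendering a user turn plus a tool schema into training text via the tokenizer’s chat template. Recent transformers releases accept a tools argument for this; whether the rendered string matches the exact FunctionGemma chat format should be checked against the official cookbook.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
text = tok.apply_chat_template(
    messages, tools=[weather_tool], tokenize=False, add_generation_prompt=True
)
print(text)  # inspect the delimiters the model was trained on
```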

8.2 Overcoming “Catastrophic Forgetting”

A risk in fine-tuning is that the model learns the new tools but forgets how to speak English or handle basic logic. The use of LoRA (Low-Rank Adaptation) is recommended.25 LoRA freezes the main model weights and trains only a small set of adapter layers. This preserves the general capabilities of the 270M base model while injecting the specific schema knowledge into the adapter.
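A minimal LoRA setup with the PEFT library might look like the following; the rank, alpha, and target-module list are illustrative defaults, not values published for FunctionGemma.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/functiongemma-270m-it")

lora_cfg = LoraConfig(
    r=16,                      # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the 270M base weights
```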

9. Conclusion: The Commoditization of Agency

Function Gemma 270M signals a shift in the economics of AI agency. By proving that function calling—the core mechanic of agentic behavior—can be decoupled from massive reasoning models and executed on-device, Google has opened the door to “Ubiquitous Agency.”

The future implied by this technology is one where Operating Systems (Android, ChromeOS, Windows) are no longer static collections of menus, but dynamic agents. In this future, Function Gemma acts as the Cerebellum of the OS—handling coordination, motor control (API calls), and immediate reflexes—while the massive cloud models act as the Cerebrum, handling deep thought and creativity.

For the developer, the message is clear: do not use a cannon to kill a mosquito. For defined, structural tasks, a fine-tuned, specialized Small Language Model like Function Gemma is faster, cheaper, more private, and—as the benchmarks show—often sufficiently accurate to get the job done.

 


    Works cited

    1. FunctionGemma: Bringing bespoke function calling to the edge – Google Blog, accessed December 19, 2025, https://blog.google/technology/developers/functiongemma/
    2. Fine-tune FunctionGemma 270M for Mobile Actions | Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/mobile-actions
    3. Gemma: Open Models Based on Gemini Research and Technology – arXiv, accessed December 19, 2025, https://arxiv.org/html/2403.08295v1
    4. google/gemma-7b – Hugging Face, accessed December 19, 2025, https://huggingface.co/google/gemma-7b
    5. FunctionGemma is a specialized version of Google’s Gemma 3 270M model fine-tuned explicitly for function calling. – Ollama, accessed December 19, 2025, https://ollama.com/library/functiongemma
    6. Battle of the SLMs: Gemma vs LLama – Embedl, accessed December 19, 2025, https://www.embedl.com/knowledge/battle-of-the-slms-gemma-vs-llama
    7. google/functiongemma-270m-it – Hugging Face, accessed December 19, 2025, https://huggingface.co/google/functiongemma-270m-it
    8. functiongemma:270m – Ollama, accessed December 19, 2025, https://ollama.com/library/functiongemma:270m
    9. Fine-tuning with FunctionGemma – Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma
    10. FunctionGemma formatting and best practices – Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices
    11. The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models – ICML 2026, accessed December 19, 2025, https://icml.cc/virtual/2025/poster/46593
    12. Berkeley Function Calling Leaderboard – Gorilla LLM, accessed December 19, 2025, https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
    13. FunctionGemma model card | Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/functiongemma/model_card
    14. Gemma 3 1B vs Llama 3.2 11B Instruct, accessed December 19, 2025, https://llm-stats.com/models/compare/gemma-3-1b-it-vs-llama-3.2-11b-instruct
    15. meta-llama/Llama-3.2-1B – Hugging Face, accessed December 19, 2025, https://huggingface.co/meta-llama/Llama-3.2-1B
    16. Gemma 2 27B vs Llama 3.2 1B Instruct (Comparative Analysis) – Galaxy.ai Blog, accessed December 19, 2025, https://blog.galaxy.ai/compare/gemma-2-27b-it-vs-llama-3-2-1b-instruct
    17. Qwen2.5-Coder 32B Instruct vs Qwen3 30B A3B – LLM Stats, accessed December 19, 2025, https://llm-stats.com/models/compare/qwen-2.5-coder-32b-instruct-vs-qwen3-30b-a3b
    18. google/mobile-actions · Datasets at Hugging Face, accessed December 19, 2025, https://huggingface.co/datasets/google/mobile-actions
    19. Create a prompt – Docs by LangChain, accessed December 19, 2025, https://docs.langchain.com/langsmith/create-a-prompt
    20. Get started with Gemma and LangChain | Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/integrations/langchain
    21. Function Calling Google Gemini Agent | LlamaIndex Python Documentation, accessed December 19, 2025, https://developers.llamaindex.ai/python/examples/agent/gemini_agent/
    22. Dynamic Multi-Function Calling Locally with Gemma 3 + Ollama – Full Demo Walkthrough – Reddit, accessed December 19, 2025, https://www.reddit.com/r/ollama/comments/1kadwr3/dynamic_multifunction_calling_locally_with_gemma/
    23. ReAct agent from scratch with Gemini 2.5 and LangGraph – Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemini-api/docs/langgraph-example
    24. [FunctionGemma]Finetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb – GitHub, accessed December 19, 2025, https://github.com/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb
    25. Introducing Gemma models in Keras – Google Developers Blog, accessed December 19, 2025, https://developers.googleblog.com/introducing-gemma-models-in-keras/
    26. Function calling with Hugging Face Transformers | Gemma | Google AI for Developers, accessed December 19, 2025, https://ai.google.dev/gemma/docs/functiongemma/function-calling-with-hf