Google's 270M Model Does Function Calling on Your Phone. Developers Are Split.


Google just dropped a 270-million-parameter model that runs in half a gigabyte of RAM. It does function calling. On your phone. Without internet.
The reaction on Reddit was split down the middle. One person said "not worth downloading". Another called it a breakthrough for local AI agents. Both used the same model. Both were right in their own way.
FunctionGemma came out three days ago. The timing matters because it answers something people have been asking about since small models got good. Can a tiny model reliably call functions? Can it act, not just chat?
Turns out it can. But there's a catch. There's always a catch.
You need to fine-tune it. Out of the box, it hits 58% accuracy on mobile actions. After fine-tuning on specific tasks, it jumps to 85%. That gap is everything. It's the difference between a toy and a tool.
Why this matters now
The AI industry spent years building bigger models. GPT-4. Claude. Gemini. All massive. All expensive to run.
But agentic AI doesn't need that much power for most tasks. It needs speed. It needs reliability. It needs to run locally without burning through API credits or waiting for network calls.
A 270M model that does function calling changes the economics.
Small models were already good at following instructions. Gemma 3 proved that months ago. What they couldn't do was reliably parse function schemas and emit valid JSON. That's what made them useless for agents.
Google fixed this by training specifically for function calling. They didn't make the model smarter. They made it specialized. And specialization is what small models do best.
What developers are actually saying
The Reddit threads are messy. Someone on r/ollama tried FunctionGemma and hated it. Called it an "awful experience." No details, just frustration.
But another developer got it working with dynamic multi-function calling. Real-time search. Translation. Weather updates. All running locally through Ollama with Gemma 3 1B. They posted a full demo.
The difference? Fine-tuning and prompt engineering.
I spent an hour reading through these threads. Here's what I learned. Most people expect small models to work like big ones. They don't. Small models need hand-holding. They need examples. They need their prompts written like you're explaining something to a very literal friend who follows instructions perfectly but has zero creativity.
One person on r/LocalLLaMA figured out how to give Gemma 3 "native" tool calling by writing a custom template. It worked most of the time. Sometimes it called the wrong function. The 4B model struggled. The 12B model was much better.
That's the pattern. Bigger small models beat smaller small models at reliability.
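Here's roughly what that hand-holding looks like. This isn't the Reddit poster's actual template, just a sketch of the idea: spell out the convention, show one worked example, leave zero room for creativity.

```python
# A sketch of the "custom template" trick: a system prompt that defines a
# tool-calling convention and demonstrates it once. The schema and wording
# here are illustrative, not any official Gemma template.
SYSTEM = """You have one job: turn the user's request into a function call.
Reply with exactly one JSON object and no other text.

Available functions:
- get_weather(city: string)
- translate(text: string, target_lang: string)

Example:
User: what's the weather in Oslo?
You: {"name": "get_weather", "arguments": {"city": "Oslo"}}
"""

def prompt_for(user_message: str) -> str:
    return f"{SYSTEM}\nUser: {user_message}\nYou:"
```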
How it actually works
FunctionGemma doesn't use special tokens for tool calling like OpenAI's models. It generates everything as text. You pass function definitions in your prompt. The model outputs JSON in its response.
This sounds worse than it is. Yes, you lose clean separation between chat and function calls. Yes, parsing gets annoying. But it works.
And it works offline. That's the whole point.
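In practice that means scraping JSON out of raw text. A minimal sketch of the parsing step (the one-object-per-reply assumption is mine):

```python
import json
import re

def parse_call(reply: str) -> dict | None:
    # No special tokens mark a tool call, so we go fishing for JSON.
    # Greedy match assumes the reply contains at most one JSON object.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None  # the model chatted instead of calling a function
    try:
        call = json.loads(match.group())
    except json.JSONDecodeError:
        return None  # malformed JSON: the annoying part of text-only calling
    return call if "name" in call else None
```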
Google built it to be a traffic controller. Simple commands run on device. Complex ones get routed to bigger models like Gemma 3 27B. You get instant responses for common tasks. You get smarter responses for hard tasks.
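The routing logic is simple enough to sketch. This assumes the `ollama` Python client and reuses the `prompt_for` and `parse_call` helpers above; the model tags are placeholders, so substitute whatever you've actually pulled locally.

```python
import ollama  # assumes the `ollama` Python client is installed

def handle(command: str) -> dict | None:
    # Try the tiny on-device model first.
    reply = ollama.generate(model="functiongemma", prompt=prompt_for(command))["response"]
    call = parse_call(reply)
    if call is not None:
        return call  # fast path: local, instant, free
    # Route anything the small model fumbles to a bigger sibling.
    reply = ollama.generate(model="gemma3:27b", prompt=prompt_for(command))["response"]
    return parse_call(reply)
```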
The mobile actions demo is wild. Voice commands like "create a calendar event for lunch tomorrow" or "turn on the flashlight" just work. The model parses natural language, figures out which OS function to call, and executes it. On your phone. No server.
There's also a game called TinyGarden where you give voice commands to plant crops. "Plant sunflowers in the top row and water them." The model breaks that into multiple function calls with grid coordinates. Multi-turn logic from a 270M model.
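The demo's actual schema isn't published in those threads, but the reply presumably parses into a list of calls you dispatch one by one. The function names and grid layout below are my guesses, not TinyGarden's real API:

```python
def plant_crop(crop: str, row: int, col: int) -> None:
    print(f"planting {crop} at ({row}, {col})")

def water_tile(row: int, col: int) -> None:
    print(f"watering ({row}, {col})")

HANDLERS = {"plant_crop": plant_crop, "water_tile": water_tile}

# Roughly what "plant sunflowers in the top row and water them" might
# fan out into. Hypothetical names, hypothetical grid.
for call in [
    {"name": "plant_crop", "arguments": {"crop": "sunflower", "row": 0, "col": 0}},
    {"name": "plant_crop", "arguments": {"crop": "sunflower", "row": 0, "col": 1}},
    {"name": "water_tile", "arguments": {"row": 0, "col": 0}},
    {"name": "water_tile", "arguments": {"row": 0, "col": 1}},
]:
    HANDLERS[call["name"]](**call["arguments"])
```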
The naming problem
Nobody talks about this enough, but naming your AI models after gemstones is getting out of hand. We have Gemma. Gemini. There's probably a Geode somewhere.
I get it. Branding matters. But when you're debugging at 2am and trying to remember if the function calling model is Gemma or Gemini or GemmaFunction or FunctionGemma, you start wishing they'd just called it "model-270m-tools" or something boring and searchable.
My coworker keeps calling it "Gemini mini" by accident. It's not Gemini. It's Gemma. Different model family. Different use case. Same first three letters.
This is how we end up with twenty open tabs and a migraine.
Real talk
Most people don't need this. If you're building a chatbot, use a chat model. If you need function calling and you're okay with API costs, use GPT-4 or Claude. They're better. They're more reliable. They handle edge cases you won't even think about until they break your app.
FunctionGemma makes sense in very specific situations. You're building for mobile. You need offline capability. You have a defined set of functions. You're willing to fine-tune.
If any of those don't apply, this is overkill. Or underkill. Depends how you look at it.
The fine-tuning requirement is the real barrier. That 58% to 85% jump isn't free. You need data. You need compute. You need time to iterate. For prototypes and side projects, that's too much friction.
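For what it's worth, the data itself isn't exotic. Think (utterance, expected call) pairs in JSONL, with field names matched to whatever fine-tuning recipe you use. The ones below are illustrative, not a documented format:

```python
import json

# Sketch of task-specific training data: each line pairs a user utterance
# with the exact call the model should emit. Field names are illustrative.
examples = [
    {
        "input": "turn on the flashlight",
        "output": {"name": "set_flashlight", "arguments": {"enabled": True}},
    },
    {
        "input": "lunch with Sam tomorrow at noon",
        "output": {"name": "create_event",
                   "arguments": {"title": "Lunch with Sam",
                                 "start_time": "tomorrow 12:00"}},
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```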
But for production apps where latency and privacy matter? This is the only option that makes sense. You can't send every user command to a cloud API. Too slow. Too expensive. Too many privacy concerns.
Small models won't replace big ones. They'll handle the boring stuff so big models don't have to.
Where this goes next
I keep thinking about that Reddit post with the custom template. Some developer spent their evening figuring out how to make Gemma 3 do tool calling without official support. They succeeded. They shared it.
That's how this tech actually moves forward. Not from Google's blog posts. From people breaking things and fixing them and posting solutions in random Reddit threads at midnight.
The shift from conversational AI to agentic AI isn't about making models smarter. It's about making them more useful in constrained environments. Your phone. Your laptop. Your toaster, probably, in five years.
FunctionGemma is small enough to fit in places big models can't go. And sometimes that's all that matters.