This post discusses financial planning and startup economics, not tax or legal advice.

If you’re building an AI company in San Francisco right now, your burn rate looks nothing like the SaaS playbook your investors grew up on. The benchmarks are wrong, the cost structure is inverted, and the runway math that worked for your last company will get you killed at this one.

We work with AI startups from pre-seed through Series B, and the pattern is consistent: founders who model their burn like a traditional SaaS company run out of money three to six months earlier than they expected. The reason is almost always compute.

The Cost Structure Is Inverted

A traditional SaaS startup spends most of its money on people. Engineering salaries, sales team, maybe some marketing. Hosting costs are a rounding error, a few thousand a month on AWS, scaling linearly and predictably with customers.

AI startups flip that equation. Your biggest variable cost isn’t headcount. It’s compute. GPU hours for training, inference costs for serving users, and the vector databases, embedding pipelines, and API calls that connect everything. According to data from Kruze Consulting, AI startups spend on average twice what a traditional SaaS company spends on infrastructure, and that gap is widening. Over the past year, compute costs at AI startups grew at a roughly 300% annualized rate, compared to 53% at non-AI SaaS companies. For many AI companies, compute has gone from 24% of revenue to 50% of revenue in a single year.

That changes everything about how you should model your business.

GPU Costs: The Numbers You Actually Need

If you’re budgeting for GPU compute in 2026, here’s what the market looks like. An NVIDIA H100, the workhorse card for most serious AI workloads, runs $2.00 to $4.15 per hour on cloud providers, depending on whether you’re using a hyperscaler like AWS (closer to $3.90/hour) or a specialized GPU cloud like Lambda, Vast.ai, or CoreWeave (closer to $2.00–$2.50/hour). A100s, the previous generation, run $1.29 to $2.29 per hour.

Those hourly numbers sound small until you do the monthly math. A single H100 running 24/7 costs roughly $1,500 to $3,000 per month. A modest training cluster of eight H100s runs $12,000 to $24,000 per month. And that’s before inference, the cost of actually serving your product to users.
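To check the monthly math yourself, here is a minimal Python sketch. It assumes 730 billable hours per month (8,760 hours a year divided by 12) and uses the low and high H100 rates quoted above; the function name and the utilization parameter are illustrative, not part of any provider's pricing API.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_gpu_cost(hourly_rate: float, gpu_count: int = 1,
                     utilization: float = 1.0) -> float:
    """Monthly cost in dollars for gpu_count GPUs billed at hourly_rate.

    utilization is the fraction of the month the GPUs are actually
    billed (1.0 means running 24/7).
    """
    return hourly_rate * HOURS_PER_MONTH * gpu_count * utilization


# Low and high H100 hourly rates from the paragraph above.
for label, rate in [("low", 2.00), ("high", 4.15)]:
    single = monthly_gpu_cost(rate)
    cluster = monthly_gpu_cost(rate, gpu_count=8)
    print(f"{label}: one H100 ${single:,.0f}/mo, eight H100s ${cluster:,.0f}/mo")
```

The results land close to the rounded ranges above. Swap in your own quoted rate and a realistic utilization figure before you put the number in a budget.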

Here’s the part that catches founders off guard: training is the cost you can see coming. Inference is the cost that scales with success. By the time your product has real traction, inference will be 80 to 90 percent of your total compute spend. Every user query, every API call, every feature that touches your model costs money in a way that a traditional SaaS feature simply does not.

The Gross Margin Problem Investors Are Watching

Traditional SaaS investors expect gross margins north of 75%. That’s the baseline for being considered a “software company” rather than a services business. AI companies are not hitting that number, and showing up to a Series B with 55% gross margins triggers uncomfortable questions about your unit economics.

The current reality: AI startup gross margins are averaging around 52%, up from 41% in 2024 according to ICONIQ’s data. Bessemer’s research breaks AI companies into two categories. “Supernovas,” early-stage companies with unoptimized infrastructure and experimental pricing, run at roughly 25% margins. “Shooting Stars,” companies that have invested in custom models and refined their pricing, hit closer to 60%.

The structural issue is that inference costs eat roughly 23% of revenue at scaling-stage AI B2B companies. For every million dollars in AI product revenue, approximately $230,000 goes to inference before you pay a single engineer or sales rep. And 84% of AI companies report that AI infrastructure costs have eroded their gross margins by more than six percentage points.

Investors are adapting. Some are now applying gross-margin-adjusted versions of the Rule of 40 for AI companies, recognizing that 30% growth with 50% gross margins is not the same as 30% growth with 80% gross margins. If your margins are structurally lower, you need proportionally higher growth to compensate.

Five Formulas Every AI Founder Should Know

Most of the financial planning tools and templates out there were built for SaaS. They assume your marginal cost of serving a new customer is close to zero. Yours isn’t. Here are the formulas that actually matter when compute is your biggest cost center.

1. True monthly burn rate. Stop using a single number. AI burn has three layers:

Monthly Burn = Fixed Costs + Semi-Variable Costs + (Cost Per Query × Monthly Query Volume)

Fixed costs are salaries, rent, software subscriptions, the stuff that doesn’t change whether you have ten users or ten thousand. Semi-variable costs are your baseline infrastructure: dev environments, storage, minimum compute reservations. The third term is the one that kills you: fully variable inference costs that scale directly with usage. A traditional SaaS company is roughly 80% fixed costs. An AI startup can easily be 40 to 50% variable. That means your burn rate changes every month, and a flat projection is a fiction.
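The three-layer burn formula is easy to put in code. The sketch below is a direct translation; every dollar figure in the example is hypothetical and only there to show the mechanics.

```python
def monthly_burn(fixed: float, semi_variable: float,
                 cost_per_query: float, monthly_queries: float) -> float:
    """Monthly Burn = Fixed + Semi-Variable + (Cost Per Query x Query Volume)."""
    return fixed + semi_variable + cost_per_query * monthly_queries


def variable_share(fixed: float, semi_variable: float,
                   cost_per_query: float, monthly_queries: float) -> float:
    """Fraction of total burn that comes from fully variable inference cost."""
    total = monthly_burn(fixed, semi_variable, cost_per_query, monthly_queries)
    return (cost_per_query * monthly_queries) / total


# Hypothetical company: $120k fixed, $20k baseline infrastructure,
# $0.02 per query, 3 million queries this month.
burn = monthly_burn(120_000, 20_000, 0.02, 3_000_000)
share = variable_share(120_000, 20_000, 0.02, 3_000_000)
print(f"burn ${burn:,.0f}, variable share {share:.0%}")
```

Re-run it with next month’s projected query volume and you can see why a flat burn projection breaks down.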

2. Cost per query (CPQ). This is the single most important unit economic in your business:

CPQ = Total Monthly Inference Spend / Total Monthly Queries Served

Track this weekly. If CPQ is rising, you’re shipping more capable features without optimizing the underlying model, or your token consumption per task is increasing. If CPQ is falling, your engineering team is doing something right. Either way, you need to see this number moving before your investors ask about it. For context, early-stage AI startups we work with typically see CPQ anywhere from $0.002 for lightweight completions to $0.15+ for complex multi-step agent workflows.
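A weekly CPQ tracker can be a few lines of Python. This is a minimal sketch: the weekly spend and query counts are made up, and in practice you would pull them from your billing export and request logs.

```python
def cost_per_query(inference_spend: float, queries_served: int) -> float:
    """CPQ = total inference spend / total queries served for the period."""
    if queries_served <= 0:
        raise ValueError("queries_served must be positive")
    return inference_spend / queries_served


# Hypothetical weekly data: (inference spend in dollars, queries served).
weekly = [(4_200.0, 300_000), (4_800.0, 320_000), (5_600.0, 350_000)]

cpq_series = [cost_per_query(spend, queries) for spend, queries in weekly]

# Week-over-week change; positive values mean CPQ is rising.
week_over_week = [(b - a) / a for a, b in zip(cpq_series, cpq_series[1:])]

for week, cpq in enumerate(cpq_series, start=1):
    print(f"week {week}: CPQ ${cpq:.4f}")
print("week-over-week change:", [f"{c:+.1%}" for c in week_over_week])
```

In this made-up series, spend and volume both grow, but spend grows faster, so CPQ creeps up each week. That is the pattern to catch early.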

3. AI gross margin. The standard SaaS gross margin formula understates your cost of delivery. Use this instead:

AI Gross Margin = (Revenue − Inference Costs − API Fees − GPU/Cloud Hosting − ML Ops Tooling) / Revenue

Most founders only subtract their AWS bill. But inference API fees (if you’re calling OpenAI, Anthropic, or similar), vector database costs, embedding pipeline compute, and ML operations tooling like Weights & Biases or MLflow are all cost of revenue. If you’re reporting 65% gross margins but only counting your cloud hosting line, your real margins are probably closer to 45%. Getting this right starts with a chart of accounts that puts inference where it belongs, which we cover in accounting for AI startups.
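To see how much the hosting-only shortcut flatters your margins, compare both calculations side by side. The sketch below assumes the four cost lines are tracked separately, so nothing is counted twice. The dollar amounts are hypothetical, chosen to reproduce the 65% versus 45% gap described above.

```python
from dataclasses import dataclass


@dataclass
class CostOfRevenue:
    inference: float          # self-run inference compute
    api_fees: float           # third-party model API fees
    gpu_cloud_hosting: float  # GPU/cloud hosting line
    ml_ops_tooling: float     # ML operations tooling

    def total(self) -> float:
        return (self.inference + self.api_fees
                + self.gpu_cloud_hosting + self.ml_ops_tooling)


def ai_gross_margin(revenue: float, costs: CostOfRevenue) -> float:
    """Margin with every AI cost-of-revenue line subtracted."""
    return (revenue - costs.total()) / revenue


def hosting_only_margin(revenue: float, costs: CostOfRevenue) -> float:
    """The flattering version: only the cloud hosting bill is subtracted."""
    return (revenue - costs.gpu_cloud_hosting) / revenue


# Hypothetical month: $100k revenue.
costs = CostOfRevenue(inference=12_000, api_fees=6_000,
                      gpu_cloud_hosting=35_000, ml_ops_tooling=2_000)
print(f"hosting-only margin: {hosting_only_margin(100_000, costs):.0%}")
print(f"AI gross margin:     {ai_gross_margin(100_000, costs):.0%}")
```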

4. Runway with variable compute. The standard runway formula (cash divided by monthly burn) doesn’t work when your burn rate changes with growth. Use scenario-based runway instead:

Runway (months) = Cash Balance / (Fixed Burn + (Variable Cost Per User × Projected Users at Month N))

Run this at three growth scenarios: current user count held flat, 2x users in six months, and 5x users in twelve months. If your runway in the 5x scenario is less than six months shorter than in your flat scenario, your variable costs are well controlled. If it’s twelve months shorter, you have a structural problem that more fundraising won’t solve.

For a quick baseline before you model the variable-compute scenarios, run your numbers through our free runway calculator, then map the lumpy months (a big GPU reservation, a tax payment) in the 13-week cash flow forecast.

5. The API-to-self-hosted breakeven. If you’re using a third-party inference API, there’s a crossover point where hosting your own fine-tuned model saves money:

Breakeven Monthly Spend ≈ (Self-Hosted GPU Cost + Engineering Time to Fine-Tune and Maintain) / 0.5

The 0.5 is the conservative end of the typical 50 to 70% cost reduction from switching to a self-hosted fine-tuned model: if self-hosting halves your inference bill, it pays for itself once half of your API spend covers the GPU and engineering cost. In practice, this breakeven tends to land around $50,000 to $100,000 per month in API spend. Below that, the API is cheaper because you’re not paying an engineer to manage model infrastructure. Above that, you’re leaving significant margin on the table.
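Here is the breakeven formula as a small function, with the 0.5 exposed as a parameter so you can test the 50 to 70% range. The GPU and engineering costs in the example are hypothetical monthly figures, and the parameter name savings_fraction is ours.

```python
def breakeven_api_spend(self_hosted_gpu_cost: float, engineering_cost: float,
                        savings_fraction: float = 0.5) -> float:
    """Monthly API spend above which self-hosting is expected to be cheaper.

    savings_fraction is the assumed cost reduction from self-hosting
    (0.5 in the formula above; the text cites a 50 to 70% range).
    """
    if not 0 < savings_fraction <= 1:
        raise ValueError("savings_fraction must be in (0, 1]")
    return (self_hosted_gpu_cost + engineering_cost) / savings_fraction


# Hypothetical: $15k/month in GPUs plus $20k/month of engineering time.
for fraction in (0.5, 0.6, 0.7):
    spend = breakeven_api_spend(15_000, 20_000, savings_fraction=fraction)
    print(f"savings {fraction:.0%}: breakeven at ${spend:,.0f}/month in API spend")
```

The more optimistic the savings assumption, the lower the breakeven, so run the conservative case first.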

Seven Things to Do This Month

Formulas are only useful if you’re capturing the right data. Here’s the operational checklist: the stuff that takes a day to set up and saves you months of painful reclassification later.

Tag every cloud resource by function from day one. Create three cost allocation tags in your cloud provider: training, inference, and dev/experimentation. Don’t wait until your Series A data room request to figure out which percentage of your AWS bill is cost of revenue versus R&D. Reclassifying twelve months of blended cloud bills is miserable and expensive. Tagging from the start takes an afternoon.

Set up weekly compute cost alerts. Configure alerts at 80% of your budgeted monthly inference spend. If you’re consistently hitting the alert by week three, your usage is outpacing your model, and you need to either optimize or reprice before the month ends. Every major cloud provider and most GPU platforms support this natively.
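If you want the same check in a spreadsheet or script alongside your provider’s native alerts, the logic is short. This sketch assumes spend accrues at a steady daily pace, which is a simplification, and the budget figures are hypothetical.

```python
def budget_alert(spend_to_date: float, monthly_budget: float,
                 threshold: float = 0.8) -> bool:
    """True once spend reaches the alert threshold (80% of budget by default)."""
    return spend_to_date >= threshold * monthly_budget


def projected_month_end_spend(spend_to_date: float, day_of_month: int,
                              days_in_month: int = 30) -> float:
    """Naive straight-line projection: assumes the daily pace so far continues."""
    if day_of_month <= 0:
        raise ValueError("day_of_month must be positive")
    return spend_to_date / day_of_month * days_in_month


# Hypothetical: $40k inference budget, $33k spent by day 21 (end of week three).
budget, spent, day = 40_000, 33_000, 21
print("alert fired:", budget_alert(spent, budget))
print(f"projected month-end spend: ${projected_month_end_spend(spent, day):,.0f}")
```

An alert that fires by week three, paired with a projection above budget, is the signal to optimize or reprice.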

Run a monthly cost-per-customer-cohort analysis. Your power users may be underwater. Pull your inference costs by customer or customer segment and divide by the revenue each segment generates. If your top 10% of users by usage are generating 15% of revenue but consuming 50% of your inference budget, you don’t have a growth problem: you have a pricing problem.
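A cohort analysis like this needs only revenue and inference cost per segment. The sketch below uses two hypothetical cohorts sized to match the example above (15% of revenue, 50% of inference cost). The dictionary layout is an illustration, not an export format from any billing tool.

```python
def cohort_report(cohorts: dict) -> dict:
    """Revenue share, inference cost share, and margin for each cohort."""
    total_revenue = sum(c["revenue"] for c in cohorts.values())
    total_cost = sum(c["inference_cost"] for c in cohorts.values())
    report = {}
    for name, c in cohorts.items():
        report[name] = {
            "revenue_share": c["revenue"] / total_revenue,
            "cost_share": c["inference_cost"] / total_cost,
            # Margin after inference cost only; negative means underwater.
            "margin": (c["revenue"] - c["inference_cost"]) / c["revenue"],
        }
    return report


# Hypothetical monthly figures in dollars.
cohorts = {
    "top_10_pct_by_usage": {"revenue": 15_000, "inference_cost": 25_000},
    "everyone_else": {"revenue": 85_000, "inference_cost": 25_000},
}
report = cohort_report(cohorts)
for name, row in report.items():
    print(f"{name}: {row['revenue_share']:.0%} of revenue, "
          f"{row['cost_share']:.0%} of inference cost, margin {row['margin']:.0%}")
```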

Negotiate reserved GPU instances after three months of stable baseline usage. On-demand GPU pricing is a convenience tax. Once you have three or more months of predictable baseline compute, reserved instances or committed-use contracts can save 30 to 50% versus on-demand rates. That’s a direct gross margin improvement with zero product changes.

Use tiered or usage-based pricing from launch. Flat-rate SaaS pricing with variable COGS is a margin death trap. A 2025 industry report found 92% of AI software companies now use mixed pricing models (subscriptions combined with usage-based fees) precisely because flat pricing doesn’t work when your heaviest users cost you 10x what your lightest users cost. If you’re charging everyone $99/month and some users are costing you $3 and others $45, you don’t have a pricing model: you have a subsidy program. Usage-based pricing fixes the margin problem but changes how you recognize revenue; we walk through that in usage-based billing and revenue recognition for AI companies.

Track your compute-to-revenue ratio monthly. This is the simplest leading indicator of whether your business is getting healthier or sicker: Total Compute Costs / Total Revenue. If this ratio is increasing month over month, your margins are compressing regardless of what your top-line growth looks like. At many AI startups, this ratio went from 0.24 to 0.50 in a single year. Plot it. Watch it. Make it go down.
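Plotting the ratio starts with computing it. A minimal sketch with three months of hypothetical numbers, where revenue grows every month and the ratio still gets worse:

```python
def compute_to_revenue(compute_costs: list, revenue: list) -> list:
    """Total Compute Costs / Total Revenue, month by month."""
    return [cost / rev for cost, rev in zip(compute_costs, revenue)]


def is_compressing(ratios: list) -> bool:
    """True if the ratio rose in every month-over-month step."""
    return all(later > earlier for earlier, later in zip(ratios, ratios[1:]))


# Hypothetical monthly figures in dollars.
compute_costs = [24_000, 30_000, 38_000]
revenue = [100_000, 110_000, 120_000]

ratios = compute_to_revenue(compute_costs, revenue)
print("ratios:", [f"{r:.2f}" for r in ratios])
print("margins compressing:", is_compressing(ratios))
```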

Build your financial model with a GPU price scenario. GPU prices have been falling: H100 on-demand dropped from roughly $7.50/hour in late 2024 to $3.44/hour on average in early 2026. But frontier model inference is getting more expensive as models grow, and token consumption per task has increased 10x to 100x since late 2023. Model three scenarios: prices continue falling 10% per quarter, prices hold flat, and per-unit inference cost rises because you’re shipping more capable features. If your business only works in the optimistic scenario, that’s worth knowing now.
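The three scenarios can be sketched as a compounding quarterly change in per-query cost. The 10% quarterly decline comes from the paragraph above. The 15% quarterly rise, the $0.02 starting cost, and the $0.05 price per query are hypothetical placeholders, so replace them with your own numbers.

```python
def project_unit_cost(start_cost: float, quarterly_change: float,
                      quarters: int = 4) -> list:
    """Per-query cost for each quarter, compounding quarterly_change."""
    return [start_cost * (1 + quarterly_change) ** q for q in range(quarters + 1)]


def gross_margin(price_per_query: float, cost_per_query: float) -> float:
    """Margin on a single query at a fixed price."""
    return (price_per_query - cost_per_query) / price_per_query


scenarios = {
    "prices fall 10% per quarter": -0.10,
    "prices hold flat": 0.0,
    "unit cost rises 15% per quarter": 0.15,  # hypothetical rate
}

START_COST = 0.02  # hypothetical cost per query today
PRICE = 0.05       # hypothetical revenue per query, held constant

results = {}
for name, change in scenarios.items():
    costs = project_unit_cost(START_COST, change)
    results[name] = gross_margin(PRICE, costs[-1])
    print(f"{name}: cost after 4 quarters ${costs[-1]:.4f}, "
          f"margin {results[name]:.0%}")
```

If the margin only clears your target in the first scenario, the business depends on the optimistic case.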

What This Means for Your Next Fundraise

VCs writing checks into AI companies in 2026 are looking at different metrics than they were two years ago. They want to see that you understand your unit economics, that your gross margins have a credible path to 60%+ even if they’re at 40% today, and that your financial model accounts for the variable cost dynamics that make AI companies fundamentally different from SaaS.

A few specific things to have ready: your cost per query and its trend over the last six months, your AI gross margin calculated with all inference and API costs included, a compute cost model showing what happens to your margins at 5x your current customer base, and evidence that your pricing captures enough value to cover the actual cost of delivery.

Founders who walk into a Series A with “we’ll figure out the margins later” are losing to founders who walk in with a spreadsheet showing cost per inference, margin improvement from model optimization, and a pricing structure that scales.

If you want to talk through how this applies to your company, book a call.

Anelya Grant is the founder of AG Accounting (AG Grant, Inc.), an accounting firm serving tech startups and healthcare organizations. She is also co-founder of JustPaid.ai, an AI-powered billing and contract-to-cash platform for growing companies.

Your AI Startup Burns Differently Than SaaS. Here's the Math.