Description:
The BigQuery vector store backend (bigquery_vector.py) constructs a SQL DELETE statement using Python f-string interpolation with user-supplied input, without any parameterization or sanitization. An unauthenticated attacker (under the default NoAuth configuration) can send a crafted id value via POST /api/v0/remove_training_data to inject arbitrary SQL, resulting in mass deletion of all training data or other unauthorized database operations against the BigQuery dataset.
This finding is based on a static code audit of Vanna v2.0.2 (latest, commit 365d061).
Vulnerability type: Classic SQL Injection via string interpolation (CWE-89).
The remove_training_data method in BigQuery_VectorStore directly interpolates the id parameter into a raw SQL DELETE statement using an f-string. The id originates from the HTTP request body and passes through the Flask API layer without any validation, sanitization, or parameterization.
```python
# src/vanna/legacy/google/bigquery_vector.py, lines 273-282
def remove_training_data(self, id: str, **kwargs) -> bool:
    query = f"DELETE FROM `{self.table_id}` WHERE id = '{id}'"  # SQL INJECTION
    try:
        self.conn.query(query).result()
        return True
    except Exception as e:
        print(f"Failed to remove training data: {e}")
        return False
```

The id parameter is wrapped in single quotes inside the f-string. An attacker sending `' OR '1'='1` as the id value would produce:

```sql
DELETE FROM `project.dataset.training_data` WHERE id = '' OR '1'='1'
```

The `OR '1'='1'` condition is always true, so this deletes every row in the training data table.
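The effect of the interpolation can be reproduced without a BigQuery project. The following is a minimal sketch that uses an in-memory SQLite table as a stand-in (an assumption made purely for illustration; BigQuery's dialect and client differ, and the table layout and helper names here are hypothetical). It builds the statement the same way the vulnerable method does and counts the rows that survive.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    # Hypothetical three-row stand-in for the training data table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE training_data (id TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO training_data VALUES (?, ?)",
        [("a", "SELECT 1"), ("b", "SELECT 2"), ("c", "SELECT 3")],
    )
    return conn


def build_delete(table: str, id: str) -> str:
    # Same construction as the vulnerable method: id is pasted into the SQL text.
    return f"DELETE FROM {table} WHERE id = '{id}'"


def rows_left_after_delete(id: str) -> int:
    # Run the interpolated DELETE against a fresh table and count what remains.
    conn = make_table()
    conn.execute(build_delete("training_data", id))
    return conn.execute("SELECT COUNT(*) FROM training_data").fetchone()[0]
```

With a benign id such as `a`, one row is removed. With the payload `' OR '1'='1`, the WHERE clause matches every row and the table is emptied.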
```python
# src/vanna/legacy/flask/__init__.py, lines 783-815
@self.flask_app.route("/api/v0/remove_training_data", methods=["POST"])
@self.requires_auth
def remove_training_data(user: any):
    id = flask.request.json.get("id")  # user-controlled, no validation
    if id is None:
        return jsonify({"type": "error", "error": "No id provided"})
    if vn.remove_training_data(id=id):  # passes directly to vulnerable sink
        return jsonify({"success": True})
```

The id value is extracted from the JSON request body with no validation: no type check, no format check (e.g., UUID regex), and no escaping.
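A format check at the API layer could serve as defense in depth, though it would not replace parameterization at the sink. The sketch below assumes training-data ids are UUIDs with an optional `-sql`, `-ddl`, or `-doc` suffix. That format is an assumption and should be confirmed against the ids the backend actually generates. The function name `is_valid_training_id` is hypothetical.

```python
import re

# Assumption: ids are UUIDs, optionally followed by a type suffix.
_ID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(-(sql|ddl|doc))?"
)


def is_valid_training_id(value: object) -> bool:
    # Reject non-strings first (JSON bodies can carry numbers, lists, objects),
    # then require the whole string to match the expected shape.
    return isinstance(value, str) and _ID_PATTERN.fullmatch(value) is not None
```

A route handler could call this before invoking `vn.remove_training_data` and return an error response when it fails.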
```python
# src/vanna/legacy/flask/auth.py, lines 36-41
class NoAuth(AuthInterface):
    def get_user(self, flask_request) -> any:
        return {}

    def is_logged_in(self, user: any) -> bool:
        return True  # every request is "authenticated"
```

```python
# src/vanna/legacy/flask/__init__.py, lines 145-149
class VannaFlaskAPI:
    def __init__(self, vn: VannaBase, ..., auth: AuthInterface = NoAuth(), ...):
```

The default authentication backend unconditionally returns True, meaning any network-reachable attacker can hit the vulnerable endpoint without credentials.
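For contrast, a backend that actually gates requests might look like the sketch below. It is a self-contained stand-in that mirrors only the two methods quoted above. It does not subclass the real `AuthInterface`, which may require further methods not shown in this advisory. The class name `StaticTokenAuth`, the bearer-token scheme, and `FakeRequest` are all hypothetical.

```python
import hmac


class StaticTokenAuth:
    """Hypothetical stand-in exposing get_user / is_logged_in like NoAuth does."""

    def __init__(self, expected_token: str):
        self._expected = expected_token.encode()

    def get_user(self, flask_request) -> dict:
        # Expect "Authorization: Bearer <token>"; anything else yields no user.
        header = flask_request.headers.get("Authorization", "")
        prefix = "Bearer "
        if not header.startswith(prefix):
            return {}
        token = header[len(prefix):].encode()
        # Constant-time comparison avoids leaking the token via timing.
        if hmac.compare_digest(token, self._expected):
            return {"authenticated": True}
        return {}

    def is_logged_in(self, user) -> bool:
        # Unlike NoAuth, this depends on what get_user returned.
        return isinstance(user, dict) and user.get("authenticated") is True


class FakeRequest:
    """Minimal request double for trying the sketch without Flask."""

    def __init__(self, headers: dict):
        self.headers = headers
```

The point of the sketch is only that `is_logged_in` should depend on the request. Any real deployment would pick an authentication scheme appropriate to its environment.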
The following traces the user-controlled input from HTTP entry point to SQL execution, proving reachability through 5 hops:
[Source] HTTP POST /api/v0/remove_training_data Body: {"id": "<PAYLOAD>"}
│
▼ Hop 1: Flask route registration
flask/__init__.py:783 @self.flask_app.route("/api/v0/remove_training_data", methods=["POST"])
│
▼ Hop 2: User input extracted from request body — no sanitization
flask/__init__.py:805 id = flask.request.json.get("id")
│ # Only check: "if id is None" (line 807) — no format validation
│
▼ Hop 3: Passed directly to the VannaBase polymorphic method
flask/__init__.py:810 vn.remove_training_data(id=id)
│
│ The `vn` variable is the VannaBase instance passed to VannaFlaskAPI.__init__() at line 147.
│ It is stored as self.vn (line 176) and captured in the route closure.
│ When vn is a BigQuery_VectorStore instance (or subclass), Python MRO resolves to:
│
▼ Hop 4: BigQuery_VectorStore.remove_training_data — f-string SQL construction
bigquery_vector.py:274 query = f"DELETE FROM `{self.table_id}` WHERE id = '{id}'"
│
│ The {id} is interpolated directly into the SQL string with no escaping.
│ A payload like: ' OR '1'='1
│ Produces: DELETE FROM `...` WHERE id = '' OR '1'='1'
│
▼ Hop 5: Injected SQL executed against BigQuery
bigquery_vector.py:277 self.conn.query(query).result()
│
[Sink] BigQuery executes the injected SQL → all rows deleted
VannaBase.remove_training_data is an abstract method:
```python
# src/vanna/legacy/base/base.py, lines 500-516
class VannaBase(ABC):
    @abstractmethod
    def remove_training_data(self, id: str, **kwargs) -> bool:
        pass
```

BigQuery_VectorStore inherits from VannaBase and provides the concrete implementation:

```python
# src/vanna/legacy/google/bigquery_vector.py, line 13
class BigQuery_VectorStore(VannaBase):
```

The standard Vanna deployment pattern uses multiple inheritance:

```python
class MyVanna(BigQuery_VectorStore, OpenAI_Chat):
    ...

vn = MyVanna(config={...})
app = VannaFlaskApp(vn)  # vn.remove_training_data → BigQuery_VectorStore.remove_training_data
app.run()
```

Since no intermediate class overrides remove_training_data, Python's MRO guarantees vn.remove_training_data() resolves to BigQuery_VectorStore.remove_training_data, the vulnerable implementation.
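The resolution claim can be checked with stand-in classes. The sketch below uses hypothetical names (`Base`, `VectorStore`, `Chat`, `MyApp`) that mirror the shape of the Vanna hierarchy without importing it.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def remove_training_data(self, id: str) -> str:
        ...


class VectorStore(Base):
    # Plays the role of BigQuery_VectorStore: the only concrete implementation.
    def remove_training_data(self, id: str) -> str:
        return "VectorStore.remove_training_data"


class Chat(Base):
    # Plays the role of the LLM mixin: does not touch remove_training_data.
    def submit_prompt(self, prompt: str) -> str:
        return "Chat.submit_prompt"


class MyApp(VectorStore, Chat):
    pass


# Method lookup walks the MRO left to right and stops at the first definition.
mro_names = [cls.__name__ for cls in MyApp.__mro__]
resolved = MyApp().remove_training_data("x")
```

Here `mro_names` begins with `MyApp`, `VectorStore`, `Chat`, `Base`, so the call lands on the vector store's implementation.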
The bundled Svelte frontend (flask/assets.py) calls this endpoint when a user clicks delete on the Training Data page:
```javascript
// Deobfuscated from flask/assets.py
function Un(E) {
  gt.set(null);
  Pe("remove_training_data", "POST", {id: E}).then(e => {
    Pe("get_training_data", "GET", []).then(pt)
  })
}
```

This confirms POST /api/v0/remove_training_data is a designed, production code path, not a theoretical or dead endpoint.
| Backend | File | remove_training_data pattern | Vulnerable? |
|---|---|---|---|
| BigQuery | bigquery_vector.py:274 | `f"DELETE ... WHERE id = '{id}'"` | YES: f-string |
| pgvector | pgvector.py:227 | `execute(delete_statement, {"id": id})` | No: parameterized |
| Oracle | oracle_vector.py:309 | `cursor.execute(..., [id])` | No: parameterized |
| ChromaDB | chromadb_vector.py:167 | `collection.delete(ids=[id])` | No: API-based |
| OpenSearch | opensearch_vector.py:340 | `client.delete(index=..., id=id)` | No: API-based |
All other backends use parameterized queries or safe APIs. Only BigQuery uses raw string interpolation.
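A parameterized variant for BigQuery could follow the same approach as the other backends. The sketch below is one possible shape, not the project's patch. It uses `QueryJobConfig` and `ScalarQueryParameter` from the `google-cloud-bigquery` client. The function names and the free-standing signature (taking `client` and `table_id` as arguments) are assumptions made for illustration.

```python
def build_delete_query(table_id: str) -> str:
    # table_id comes from deployment config, not from the request. BigQuery
    # cannot bind identifiers as parameters, so it stays interpolated.
    return f"DELETE FROM `{table_id}` WHERE id = @id"


def remove_training_data_safely(client, table_id: str, id: str) -> bool:
    # Imported lazily so the sketch loads where google-cloud-bigquery is absent.
    from google.cloud import bigquery

    # The id is sent as a typed STRING parameter and is never parsed as SQL.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("id", "STRING", id)]
    )
    try:
        client.query(build_delete_query(table_id), job_config=job_config).result()
        return True
    except Exception as e:
        print(f"Failed to remove training data: {e}")
        return False
```

With this shape, a payload such as `' OR '1'='1` is compared literally against the `id` column and matches no rows.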
This is a SQL Injection vulnerability. An attacker with network access to the Vanna Flask API can:
- Delete all training data: a single request wipes the entire training dataset, causing denial of service for the AI assistant. The organization loses all curated SQL examples, DDL schemas, and documentation that were used to train the model.
- Delete selective data: using payloads like `' OR training_data_type='sql`, an attacker can surgically remove specific categories of training data to degrade the AI's output quality without triggering obvious alarms.
- Potential data exfiltration: depending on BigQuery's error handling and query capabilities, error-based or blind SQL injection techniques could be used to extract data from the training dataset or other tables in the same BigQuery dataset.
- Cross-table impact: since the BigQuery client connection (`self.conn`) has access to the entire dataset, sub-queries could potentially read or modify other tables beyond `training_data`.
All Vanna deployments using the BigQuery vector store backend with the default NoAuth configuration are affected.
- Ecosystem: pip
- Package name: vanna
- Affected versions: <= 2.0.2 (all versions containing `BigQuery_VectorStore`)
- Patched versions:
- Severity: High
- Vector string: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:H/A:H
- CWE: CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
| Permalink | Description |
|---|---|
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/google/bigquery_vector.py#L273-L282 | The vulnerable `remove_training_data` method: constructs SQL DELETE via f-string interpolation with unsanitized user input `id`. |
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/flask/__init__.py#L805 | `id = flask.request.json.get("id")`: user-controlled input extracted from HTTP request body with no validation. |
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/flask/__init__.py#L810 | `vn.remove_training_data(id=id)`: tainted `id` passed directly to the polymorphic method without sanitization. |
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/flask/auth.py#L36-L41 | `NoAuth` class: default auth backend where `is_logged_in()` unconditionally returns True, allowing unauthenticated access. |
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/flask/__init__.py#L147 | `VannaFlaskAPI.__init__` accepts `vn: VannaBase`. `BigQuery_VectorStore` is a concrete subclass, and its `remove_training_data` is resolved via Python MRO. |
| https://github.com/vanna-ai/vanna/blob/365d0617c1a4567ffee1b19b40c27feb4206bfcf/src/vanna/legacy/base/base.py#L500-L516 | `VannaBase.remove_training_data`: declared as `@abstractmethod`, confirming subclass implementations (like BigQuery) are the actual execution targets. |