What is the Python SDK for TiDB AI?
The TiDB Python AI SDK (pytidb) provides a unified data platform that empowers developers to build next-generation AI applications. It supports vector, full-text, and hybrid search, along with automatic embedding, multi-modal storage, and advanced filtering. It also supports transactions to keep data consistent.
Introduction
Python SDK for TiDB AI: A unified data platform empowering developers to build next-generation AI applications.
🎭 Auto‑Embedding & Multi‑Modal Storage: Support for text, images, and more
🖼️ Image Search Support: Text‑to‑image and image‑to‑image retrieval capabilities
🎯 Advanced Filtering & Reranking: Flexible filters with optional reranker models to fine-tune result relevance
💱 Transaction Support: Full transaction management including commit/rollback to ensure consistency
Installation
[!NOTE]
This Python package is under rapid development and its API may change. We recommend pinning a fixed version when installing, e.g., pytidb==0.0.9.
pip install pytidb
# To use built-in embedding functions and rerankers:
pip install "pytidb[models]"
# To convert query results to pandas DataFrame:
pip install pandas
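Connect to TiDB:
import os
from pytidb import TiDBClient

# Read connection settings from environment variables; 4000 is TiDB's default port.
db = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    port=int(os.getenv("TIDB_PORT", "4000")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
    ensure_db=True,  # create the database if it does not exist
)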
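The connection call above reads its settings from environment variables, and a missing variable only shows up later as a confusing connection error. Below is a minimal sketch of a hypothetical helper, load_tidb_config (not part of pytidb), that gathers the same settings and fails early. It assumes host, username, and database are required, that an empty password is allowed (as for a local TiDB root user), and that the port defaults to 4000, TiDB's default SQL port.

```python
import os
from typing import Any, Mapping

# Hypothetical helper, not a pytidb API: which variables are required is an assumption.
REQUIRED_VARS = ("TIDB_HOST", "TIDB_USERNAME", "TIDB_DATABASE")


def load_tidb_config(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments matching the TiDBClient.connect call above."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env["TIDB_HOST"],
        # Fall back to TiDB's default SQL port when TIDB_PORT is unset.
        "port": int(env.get("TIDB_PORT", "4000")),
        "username": env["TIDB_USERNAME"],
        # An empty password is allowed, e.g. for a local development cluster.
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env["TIDB_DATABASE"],
    }
```

Because the keys match the keyword arguments used above, the result can be unpacked directly: TiDBClient.connect(**load_tidb_config(), ensure_db=True).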
Highlights
🤖 Automatic Embedding
PyTiDB automatically embeds text fields (e.g., text) and stores the vector embedding in a vector field (e.g., text_vec).
Create a table with an embedding function:
from pytidb.schema import TableModel, Field, FullTextField
from pytidb.embeddings import EmbeddingFunction

text_embed = EmbeddingFunction("openai/text-embedding-3-small")

class Chunk(TableModel):
    __tablename__ = "chunks"
    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = text_embed.VectorField(
        source_field="text"
    )  # 👈 Defines the vector field.
    user_id: int = Field()

table = db.create_table(schema=Chunk, if_exists="skip")
Bulk insert data:
table.bulk_insert([
    Chunk(id=2, text="bar", user_id=2),  # 👈 The text field is embedded and saved to text_vec automatically.
    Chunk(id=3, text="baz", user_id=3),
    Chunk(id=4, text="qux", user_id=4),
])
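The automatic embedding step can be pictured with the minimal sketch below. It is not pytidb's implementation: toy_embed is a hypothetical hashing-based stand-in for a real model such as openai/text-embedding-3-small, and ToyChunk and ToyTable are hypothetical names that only mimic the described behavior of filling the vector field (text_vec) from its source field (text) at insert time.

```python
import hashlib
import math
from dataclasses import dataclass, field


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash each token into one
    of `dims` buckets, count, then L2-normalize the counts."""
    vec = [0.0] * dims
    for token in text.lower().split():
        # sha256 is stable across runs, unlike Python's salted built-in hash().
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


@dataclass
class ToyChunk:
    id: int
    text: str
    user_id: int
    text_vec: list[float] = field(default_factory=list)


class ToyTable:
    """Mimics auto-embedding: on insert, the source field is embedded and the
    result is written to the vector field unless a vector was supplied."""

    def __init__(self, source_field: str = "text", vector_field: str = "text_vec"):
        self.source_field = source_field
        self.vector_field = vector_field
        self.rows: list[ToyChunk] = []

    def bulk_insert(self, rows: list[ToyChunk]) -> None:
        for row in rows:
            if not getattr(row, self.vector_field):
                source_text = getattr(row, self.source_field)
                setattr(row, self.vector_field, toy_embed(source_text))
            self.rows.append(row)
```

In pytidb the real model call happens inside the SDK; the sketch only shows why the caller never has to compute text_vec by hand.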
🔍 Search
Vector Search
Vector search finds the most relevant records based on semantic similarity, so you don't need to include all keywords explicitly in your query.
results = (
    table.search("<query>")  # 👈 The query is embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_list()
)
# Output: a list of dicts.
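Conceptually, the search chain above embeds the query, applies the filter, ranks rows by vector distance, and keeps the top results. The sketch below reproduces those semantics in memory by brute force. It assumes cosine distance and exact-match filters; pytidb runs the search inside TiDB, where the metric and any index acceleration depend on the table definition, so brute_force_vector_search is a hypothetical illustration only.

```python
import math
from typing import Any, Optional


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat zero vectors as maximally distant rather than dividing by zero.
    return 1.0 - dot / norm if norm else 1.0


def brute_force_vector_search(
    rows: list[dict[str, Any]],
    query_vec: list[float],
    filters: Optional[dict[str, Any]] = None,
    limit: int = 10,
    vector_field: str = "text_vec",
) -> list[dict[str, Any]]:
    """Keep rows matching every filter value, rank them by cosine distance to
    query_vec, and return the `limit` closest rows with a "distance" key."""
    filters = filters or {}
    candidates = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    scored = [
        {**r, "distance": cosine_distance(r[vector_field], query_vec)}
        for r in candidates
    ]
    scored.sort(key=lambda r: r["distance"])
    return scored[:limit]
```

Scanning every row is fine for a handful of records; at scale, a database-side vector index avoids comparing the query against every vector.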
Hybrid Search
Hybrid search combines exact matching from full-text search with semantic understanding from vector search, delivering more relevant and reliable results.
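Full-text search and vector search each produce their own ranking, so a hybrid search has to merge them into one list. Reciprocal rank fusion (RRF) is one widely used way to do that, and the sketch below shows the idea. Which fusion method pytidb actually applies is not stated here, so treat RRF and the constant k=60 (a commonly used value) as assumptions.

```python
def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[str]:
    """Merge ranked ID lists. Each ID scores sum(1 / (k + rank)) over every list
    it appears in (rank starts at 1); higher total scores come first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result IDs from a full-text search and a vector search.
full_text_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion(full_text_hits, vector_hits)
```

A document ranked well by both searches ("a" here) rises to the top, while one found by only a single search still appears, which is what makes the fused list more robust than either ranking alone.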
Advanced Filtering
PyTiDB supports the following operators for flexible filtering:
| Operator | Description | Example |
|---|---|---|
| $eq | Equal to | {"field": {"$eq": "hello"}} |
| $gt | Greater than | {"field": {"$gt": 1}} |
| $gte | Greater than or equal | {"field": {"$gte": 1}} |
| $lt | Less than | {"field": {"$lt": 1}} |
| $lte | Less than or equal | {"field": {"$lte": 1}} |
| $in | In array | {"field": {"$in": [1, 2, 3]}} |
| $nin | Not in array | {"field": {"$nin": [1, 2, 3]}} |
| $and | Logical AND | {"$and": [{"field1": 1}, {"field2": 2}]} |
| $or | Logical OR | {"$or": [{"field1": 1}, {"field2": 2}]} |
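To make the semantics of these operators concrete, here is a hypothetical in-memory evaluator, matches, for the same filter syntax. It is a sketch, not how pytidb applies filters (those run in TiDB). It assumes that a bare value such as {"field1": 1} means equality (as in the $and/$or examples and .filter({"user_id": 2}) above), that several operators on one field are combined with AND, and that a missing field never matches, similar to SQL NULL.

```python
from typing import Any

# Hypothetical operator table mirroring the operators listed above.
_COMPARATORS = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(record: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `record` satisfies every condition in `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(record, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(record, sub) for sub in cond):
                return False
        elif key not in record:
            return False  # a missing field never matches
        elif isinstance(cond, dict):
            # e.g. {"field": {"$gte": 1, "$lt": 5}}; unknown operators raise KeyError
            if not all(_COMPARATORS[op](record[key], arg) for op, arg in cond.items()):
                return False
        elif record[key] != cond:
            return False  # shorthand {"field": 1} means equality
    return True
```

For example, [r for r in rows if matches(r, {"user_id": {"$in": [2, 3]}})] keeps only the rows owned by users 2 and 3.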
⛓ Join Structured and Unstructured Data
from pytidb import Session
from pytidb.sql import select

# Create a table to store user data:
class User(TableModel):
    __tablename__ = "users"
    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)

users = db.create_table(schema=User, if_exists="skip")

# Join chunks with their owners and keep only Alice's chunks.
# db.db_engine is assumed to be the client's underlying SQLAlchemy engine.
with Session(db.db_engine) as session:
    query = (
        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == "Alice")
    )
    chunks = session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]
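The join above is an ordinary relational join between the chunks and users tables. To see its semantics without a TiDB instance, the sketch below runs roughly equivalent SQL on Python's built-in sqlite3. SQLite is only a stand-in for TiDB, the chunk rows reuse the bulk-insert example, and the user names are hypothetical sample data.

```python
import sqlite3

# Illustration only: SQLite stands in for TiDB; table and column names follow
# the Chunk and User models, and the user rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER);
    INSERT INTO users VALUES (2, 'Alice'), (3, 'Bob'), (4, 'Alice');
    INSERT INTO chunks VALUES (2, 'bar', 2), (3, 'baz', 3), (4, 'qux', 4);
    """
)

# Roughly what select(Chunk).join(User, ...).where(User.name == "Alice") expresses:
# keep chunks whose owner is named Alice.
rows = conn.execute(
    """
    SELECT chunks.id, chunks.text, chunks.user_id
    FROM chunks JOIN users ON chunks.user_id = users.id
    WHERE users.name = ?
    ORDER BY chunks.id
    """,
    ("Alice",),
).fetchall()
print(rows)  # [(2, 'bar', 2), (4, 'qux', 4)]
```

The parameterized WHERE clause (the ? placeholder) plays the same role as the Python expression User.name == "Alice": the value is bound separately rather than spliced into the SQL string.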
💱 Transaction Support
PyTiDB supports transaction management, helping you avoid race conditions and ensure data consistency.
# Assumes a players table with id and balance columns already exists.
with db.session() as session:
    initial_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2
    db.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    db.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()  # or session.rollback() to discard both updates

final_total_balance = db.query("SELECT SUM(balance) FROM players").scalar()
assert final_total_balance == initial_total_balance
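The value of a transaction is that both updates take effect together or not at all. The sketch below demonstrates that property with Python's built-in sqlite3 as a stand-in for TiDB; the players rows and the CHECK constraint are hypothetical. The transfer credits the receiver first, so when the debit would overdraw the sender, the rollback visibly undoes the credit and the total balance stays unchanged.

```python
import sqlite3

# Illustration only: SQLite stands in for TiDB; the data and constraint are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO players VALUES (?, ?)", [(1, 15), (2, 5)])
conn.commit()


def total_balance() -> int:
    return conn.execute("SELECT SUM(balance) FROM players").fetchone()[0]


def transfer(src: int, dst: int, amount: int) -> bool:
    """Move `amount` coins atomically; return False if the transfer was rolled back."""
    try:
        # The connection context manager commits on success and rolls back
        # (then re-raises) if an exception occurs inside the block.
        with conn:
            conn.execute("UPDATE players SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE players SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False
```

Calling transfer(1, 2, 10) twice succeeds the first time and fails the second, since player 1 would drop below zero; in both cases total_balance() reports the same value as before.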