@ShawnRong
Created July 8, 2023 02:52
Create a Milvus vector database with Docker
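```typescript
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { Milvus } from "langchain/vectorstores/milvus";

// Folder containing the raw PDF documents
const docsFolder = "./law-docs";

export const dataIngestion = async () => {
  // Load every PDF in the raw data directory (PDFLoader needs the pdf-parse package)
  const directoryLoader = new DirectoryLoader(docsFolder, {
    ".pdf": (path) => new PDFLoader(path),
  });
  const rawDocs = await directoryLoader.load();

  // Split into chunks of up to 1000 characters, with 200 characters of overlap
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const docs = await textSplitter.splitDocuments(rawDocs);
  console.log(`Split ${rawDocs.length} loaded documents into ${docs.length} chunks`);

  // Embed the chunks (OpenAIEmbeddings reads OPENAI_API_KEY from the environment)
  // and store them in the "lawdocs" collection of the Milvus instance from the Compose file
  const embeddings = new OpenAIEmbeddings({ verbose: true });
  await Milvus.fromDocuments(docs, embeddings, {
    collectionName: "lawdocs",
    url: process.env.MILVUS_URL ?? "localhost:19530",
  });
};

// Report success only if ingestion actually finished; otherwise exit non-zero
(async () => {
  try {
    await dataIngestion();
    console.log("Data ingestion completed");
  } catch (error) {
    console.error("Data ingestion failed:", error);
    process.exitCode = 1;
  }
})();
```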
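The splitter settings (`chunkSize: 1000`, `chunkOverlap: 200`) mean each chunk holds at most about 1,000 characters, and neighbouring chunks can share up to 200 characters. Text near a chunk boundary therefore appears in both chunks and is less likely to lose its context. LangChain's `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph breaks, newlines and spaces, so its real chunks vary in size. The minimal sketch below ignores separators and shows only the fixed-window arithmetic of size and overlap. `chunkText` is a hypothetical helper, not part of LangChain.

```typescript
// Hypothetical fixed-window splitter illustrating chunkSize/chunkOverlap.
// Unlike RecursiveCharacterTextSplitter, it ignores separators entirely.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  // Each window starts (chunkSize - chunkOverlap) characters after the previous one,
  // so consecutive windows share chunkOverlap characters.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end, so no tail chunk is emitted
    // that is entirely contained in the previous one.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With the script's settings, a 2,600-character text yields windows
// starting at offsets 0, 800 and 1600.
const sample = "x".repeat(2600);
console.log(chunkText(sample, 1000, 200).map((c) => c.length)); // [ 1000, 1000, 1000 ]
```

With these numbers the window advances by 800 characters at a time. Raising the overlap gives each chunk more shared context but produces more chunks, and therefore more embedding calls.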
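`Milvus.fromDocuments` embeds every chunk with `OpenAIEmbeddings` and writes each chunk's text and vector into the `lawdocs` collection. At query time, the question is embedded the same way, and Milvus returns the stored vectors closest to it. The sketch below does that lookup by brute force over an in-memory array. It uses squared Euclidean (L2) distance, one of the metrics Milvus supports. Milvus itself uses an index so it does not have to compare against every vector, and the metric the LangChain integration configures may differ between versions. `VectorRecord`, `l2Squared` and `searchTopK` are hypothetical names, and the tiny 2-D vectors stand in for real embeddings.

```typescript
// Hypothetical in-memory stand-in for a row in a vector collection.
interface VectorRecord {
  text: string;
  vector: number[];
}

// Squared Euclidean (L2) distance: smaller means more similar.
// Skipping the square root does not change the ranking.
function l2Squared(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Exact k-nearest-neighbour search by scanning every record.
function searchTopK(records: VectorRecord[], query: number[], k: number): VectorRecord[] {
  return records
    .map((record) => ({ record, distance: l2Squared(record.vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map(({ record }) => record);
}

const collection: VectorRecord[] = [
  { text: "chunk A", vector: [0, 0] },
  { text: "chunk B", vector: [1, 1] },
  { text: "chunk C", vector: [5, 5] },
];
console.log(searchTopK(collection, [0.9, 0.9], 2).map((r) => r.text)); // [ 'chunk B', 'chunk A' ]
```

The linear scan costs one distance computation per stored chunk for every query. An approximate index is what lets a database like Milvus stay fast as the collection grows.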
```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - milvus-etcd-storage:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - milvus-minio-storage:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.2.10
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - milvus-storage:/var/lib/milvus
    ports:
      - "19530:19530"   # gRPC endpoint used by SDKs such as the LangChain integration
      - "9091:9091"     # HTTP endpoint for health checks and metrics
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: milvus-attu
    image: zilliz/attu:v2.2.3
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"     # Attu web UI, reachable at http://localhost:8000
    depends_on:
      - "standalone"

volumes:
  milvus-etcd-storage:
  milvus-minio-storage:
  milvus-storage:

networks:
  default:
    name: milvus
```
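In this short form, `depends_on` only orders container startup. It does not wait for Milvus to be ready, so running the ingestion script immediately after `docker compose up -d` can fail with a connection error. The `9091:9091` mapping exposes Milvus's HTTP port. The sketch below assumes Milvus answers `200 OK` on `/healthz` there once it is ready, which is the path the official Milvus Compose file probes. `waitForHealthy` and `FetchLike` are hypothetical names. The fetch function is passed in as a parameter so the retry logic can be exercised without a running server.

```typescript
// Minimal shape of the fetch function the helper needs.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Hypothetical helper: poll `url` until it answers with a 2xx status,
// retrying up to `attempts` times with `delayMs` between tries.
async function waitForHealthy(
  url: string,
  fetchImpl: FetchLike,
  attempts = 30,
  delayMs = 2000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return true;
    } catch {
      // Connection refused while the container is still starting; retry.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On Node 18 or later, which provides a global `fetch`, the ingestion script could call `await waitForHealthy("http://localhost:9091/healthz", fetch)` and abort if it returns `false` before calling `dataIngestion()`. An alternative is a `healthcheck` on the `standalone` service combined with the long `depends_on` syntax, but that only helps services inside the same Compose project.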