An open, inspectable AI data assistant for working with Agent Skills, semantic knowledge graphs, and structured domain data.

Most production-grade data assistants today are not open source and expose only a narrow text box on top of a proprietary stack. This creates three concrete problems:
**1. Opaque, closed implementations**
Commercial text-to-SQL and AI assistants are usually proprietary. Their prompts, tools, and safeguards are not transparent, which makes it difficult to understand how queries are produced or why they might fail in a given infrastructure.
**2. Text-to-SQL without explicit semantics**
LLMs can generate SQL, but real domains rely on rich semantics such as business concepts, evolving schemas, complex joins, and domain rules. In many systems this knowledge remains implicit in dashboards, code, or internal knowledge, forcing the model to infer it from few examples, which is fragile and hard to govern.
**3. Hidden knowledge engineering and an invisible semantic layer**
Much of the knowledge engineering work, including defining entities, relationships, and constraints, happens behind the scenes. Domain experts rarely interact with the semantic layer itself and only see final outputs like charts or answers. This makes debugging, improving, and aligning the assistant with the real domain more difficult.
Alfred addresses these issues by providing an open, inspectable reference implementation:
It uses natural language understanding, multi-source data querying, and reasoning tools to help users explore and analyze structured domain data transparently. While Alfred currently connects to Neo4j, Databricks, and Azure OpenAI, it remains backend agnostic and can integrate other databases, knowledge graphs, or AI engines without changing the core interaction patterns.
Alfred also includes a semantic knowledge store / graph explorer for navigating the domain model and relationships:

Alfred helps teams adopt data assistants by making domain knowledge explicit in a semantic graph. Users can:

- work with Agent Skills in the `alfred-app/mnt/skills` directory
- inspect how chat threads and messages are persisted in `lib/db.ts`

To install the dependencies:

```bash
cd alfred-app
npm install
```
A `.env` file is required to create the Neo4j graph from Databricks. While alfred-app is running, these credentials can also be added on the settings page.
```env
# Azure
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_BASE_URL=https://.../openai/
AZURE_API_VERSION=yyyy-MM-dd
AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-3-large
AZURE_OPENAI_DEPLOYMENT=gpt-5.1

# Databricks
DATABRICKS_HOST=your_workspace_url
DATABRICKS_HTTP_PATH=your_http_path
DATABRICKS_TOKEN=your_personal_access_token
DATABRICKS_CATALOG=your_databricks_catalog
DATABRICKS_SCHEMA=your_databricks_schema

# Neo4j
NEO4J_BOLT_URL=bolt://...:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
```

## Encryption of Alfred user settings (API keys, tokens, passwords)

Alfred stores per-user configuration (chat models, Databricks, Neo4j, etc.) in a local SQLite database under `alfred-app/data/alfred.sqlite`. To avoid storing secrets like API keys, tokens, and passwords in plain text, Alfred supports transparent encryption at rest.

- Sensitive settings are stored in the `user_settings` table and can be encrypted with **AES-256-GCM**.
- Encryption is controlled via a single environment variable in your `alfred-app/.env.local`:

```env
ALFRED_ENCRYPTION_KEY=...
```
Requirements and behavior:
- The key should be a 32-byte hex value (e.g. generated with `openssl rand -hex 32`).
- When the key is set, Alfred encrypts and decrypts the secrets in `user_settings` transparently.

## Getting started from scratch

If you want to try Alfred and you do not have a concrete Databricks dataset and a generated knowledge graph, you can start with the [Databricks Free Edition](https://www.databricks.com/learn/free-edition). From Databricks you get the credentials for your `.env`:

```bash
DATABRICKS_HOST=....databricks.com
DATABRICKS_TOKEN=your_personal_access_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id
DATABRICKS_CATALOG=your_databricks_catalog
DATABRICKS_SCHEMA=your_databricks_schema
```
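The transparent encryption described above can be sketched with Node's built-in `crypto` module. This is a minimal illustration of AES-256-GCM at rest; the function names and key handling here are simplified assumptions, not Alfred's actual implementation:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// ALFRED_ENCRYPTION_KEY would be a 32-byte hex key; fixed placeholder for illustration
const key = Buffer.from('00'.repeat(32), 'hex');

function encrypt(plaintext: string): string {
  const iv = randomBytes(12); // 96-bit IV, recommended for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Store iv + auth tag + ciphertext together, hex-encoded
  return [iv, tag, ciphertext].map((b) => b.toString('hex')).join(':');
}

function decrypt(stored: string): string {
  const [iv, tag, ciphertext] = stored.split(':').map((h) => Buffer.from(h, 'hex'));
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag); // GCM rejects tampered ciphertext at final()
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

The authenticated tag is what distinguishes GCM from plain AES-CTR: a modified ciphertext fails to decrypt rather than silently producing garbage.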
Then add the credentials for the Neo4j knowledge graph to your `.env` (the Neo4j graph will be built in the next step):
```env
NEO4J_BOLT_URL=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
```
Next, we build and run both Alfred and Neo4j with Docker:

```bash
cd alfred-app
docker compose build

# Start the services
docker compose up
```
If you run into package or native module issues (like better-sqlite3), rebuild without cache:
```bash
docker compose build --no-cache
```
Access the applications
Afterwards, there are a couple of helper notebooks in the `scripts/` folder to integrate Databricks data into your Free Edition and to build the knowledge graph:

- `scripts/create_databricks_schema.ipynb`
- `scripts/create_graph_from_databricks.ipynb`
In a typical flow you would:
1. Open `create_databricks_schema.ipynb` in Databricks and run it to set up the sample schema and data.
2. Open `create_graph_from_databricks.ipynb` on your local machine and run it to materialize the knowledge graph in Neo4j.

These notebooks are intentionally simple and are meant as a starting point you can fork and adapt to your own schemas, business concepts, and graph modeling conventions.
The core domain concepts used in the example knowledge graph live in scripts/data/concepts.yaml (with a big acknowledgement to Kenneth Leungh for the original concept definitions). If you want to bring your own domain, you can start by tweaking this file – adding, renaming, or removing concepts – and then re-running the graph creation notebook to see how your changes show up in Neo4j and in Alfred's UI.
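To make the concept-to-graph step concrete, here is a small sketch that turns concept entries into idempotent Cypher `MERGE` statements. The `Concept` shape and the `RELATED_TO` relationship are illustrative assumptions; the actual fields in `concepts.yaml` and the notebook's graph model may differ:

```typescript
// Illustrative concept shape; the real concepts.yaml fields may differ
interface Concept {
  name: string;
  description: string;
  related: string[]; // names of related concepts (hypothetical field)
}

// Build idempotent Cypher statements for one concept:
// MERGE the node, then MERGE a relationship per related concept
function conceptToCypher(c: Concept): string[] {
  const stmts = [
    `MERGE (c:Concept {name: ${JSON.stringify(c.name)}}) ` +
      `SET c.description = ${JSON.stringify(c.description)}`,
  ];
  for (const other of c.related) {
    stmts.push(
      `MATCH (a:Concept {name: ${JSON.stringify(c.name)}}), ` +
        `(b:Concept {name: ${JSON.stringify(other)}}) ` +
        `MERGE (a)-[:RELATED_TO]->(b)`
    );
  }
  return stmts;
}
```

Each statement could then be executed through `neo4j-driver` against the `NEO4J_BOLT_URL` configured above; using `MERGE` keeps re-runs of the notebook from duplicating nodes.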
```bash
cd alfred-app
npm run dev
```
Alfred will be available at http://localhost:3000.
For a production build:

```bash
cd alfred-app
npm run build
npm start
```
- `app/` - Next.js application and API routes
- `components/` - React components for the UI and assistant-ui interface
- `mnt/skill/` - Agent Skills for Alfred
- `lib/tools/` - Alfred's tools: view files, data query tools, and utilities for Databricks, SQL, and Neo4j
- `lib/prompts/` - System prompt(s) for the application

Alfred exposes its main data access paths as tools under `lib/tools/`. These tools are wired into the assistant runtime via the Vercel AI SDK and Assistant UI so the model can call them directly.
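Conceptually, each tool pairs a description and an input schema with an `execute` function the model can invoke. Below is a dependency-free sketch of that shape; the real tools use the Vercel AI SDK's `tool` helper, and `executeDatabricksSQL` is stubbed here purely for illustration:

```typescript
// Simplified stand-in for the AI SDK's tool shape (illustration only)
interface ToolDef<In, Out> {
  description: string;
  execute: (input: In) => Promise<Out>;
}

// Stub for the helper in lib/tools/utils_tools.ts; the real one calls Databricks
async function executeDatabricksSQL(sql: string): Promise<Record<string, unknown>[]> {
  return [{ echoed: sql }];
}

// Shape of a data-access tool like tool_sql_db_query.ts
const toolSqlDbQuery: ToolDef<{ query: string }, Record<string, unknown>[]> = {
  description: 'Run a SQL query against the configured Databricks catalog/schema.',
  execute: async ({ query }) => executeDatabricksSQL(query),
};
```

The assistant runtime sees only the description and schema; the `execute` body is where backend-specific logic (Databricks, Neo4j, or anything else) stays swappable.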
### `lib/tools/tool_sql_db_query.ts`

Environment variables:

- `DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_TOKEN` (connection details)
- `DATABRICKS_CATALOG`, `DATABRICKS_SCHEMA` (default catalog/schema used for qualifying tables)

The tool:
- qualifies table names with `DATABRICKS_CATALOG` and `DATABRICKS_SCHEMA`.
- delegates query execution to `executeDatabricksSQL` in `lib/tools/utils_tools.ts`.

To adapt it to your environment:
- Update `.env.local` to point to your workspace, token, and catalog/schema.
- If you use a different SQL engine, adjust `executeDatabricksSQL` accordingly.
- Keep the default qualification in `tool_sql_db_query.ts`, or remove it and fully qualify tables in the prompt and/or tool description.

### `lib/tools/tool_neo4j_query.ts`

Environment variables: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`

The tool:
- uses the `neo4j-driver` package.
- exposes `tool_neo4j_query`, which runs arbitrary Cypher and returns records as plain JavaScript objects.

To adapt it:
- Point `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` to your database.
- Adjust `getSession(database)` to pick the appropriate Neo4j database, or expose multiple tools with different defaults.
- For further adaptations, follow the same pattern as `tool_sql_db_query.ts`.

Chat threads and messages are persisted through a single server-side abstraction in `lib/db.ts`. To swap the default SQLite database for your own (e.g. Postgres, MySQL, or a cloud database):
- Keep the exported API of `lib/db.ts` (`ThreadRecord`, `MessageRecord`, `getThreads`, `createThread`, `updateThread`, `deleteThread`, `getMessages`, `appendMessage`, `deleteMessagesByThreadId`).
- Replace the `better-sqlite3` setup and SQL statements, and reimplement these functions using your preferred database client (e.g. Prisma, Drizzle, pg, Sequelize) and schema.
- Make sure `lib/db.ts` is only imported from server-side code (the API routes under `app/api/threads`), and configure your own connection options via environment variables as needed.

No changes are required in the Assistant UI integration (`components/alfred/runtime-provider.tsx`); once `lib/db.ts` is wired to your database, chat history will automatically use your own backend.
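To make the contract concrete, here is a minimal in-memory stand-in for a subset of the `lib/db.ts` functions. The record shapes are simplified assumptions; the real module exports more functions and is backed by `better-sqlite3`:

```typescript
// Simplified record shapes; the real ones are defined in lib/db.ts
interface ThreadRecord { id: string; title: string }
interface MessageRecord { id: string; threadId: string; content: string }

// In-memory stand-in for the SQLite-backed storage
const threads = new Map<string, ThreadRecord>();
const messages: MessageRecord[] = [];

function createThread(id: string, title: string): ThreadRecord {
  const thread = { id, title };
  threads.set(id, thread);
  return thread;
}

function getThreads(): ThreadRecord[] {
  return [...threads.values()];
}

function appendMessage(message: MessageRecord): void {
  messages.push(message);
}

function getMessages(threadId: string): MessageRecord[] {
  return messages.filter((m) => m.threadId === threadId);
}
```

Swapping the backend means reimplementing these bodies against Postgres or another client while keeping the signatures stable, so the API routes and UI never notice the change.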
The default schema treats all threads as belonging to a single logical user, which is sufficient for local development. For real multi-user deployments:
- Add a user id column to the `threads` table (and optionally `messages`) and backfill existing rows with a stable local id (e.g. `"local-dev"`).
- Extend the functions in `lib/db.ts` to accept a `userId` argument and scope all queries by that id (e.g. `getThreads(userId)`, `createThread(userId, ...)`).
- Derive the `userId` from auth in your API routes (e.g. from a session/JWT) in production, or use a fixed `"local-dev"` value during development.

We encourage researchers and practitioners to extend Alfred with their own innovations.
We welcome pull requests, suggestions, and discussions about how Alfred can better serve your research or practice needs.
Alfred grew out of ongoing research on AI-based data assistants. The code in this repository is a personal evening side project. Alfred builds on the work of the open-source community, including Next.js, React, Vercel AI SDK, Neo4j, Databricks, Radix UI, and others.