The primary entrypoint to programs leveraging Graph-sitter is the Codebase class.
Local Codebases
Construct a Codebase by passing in a path to a local git
repository or any subfolder within it. The path must be inside a git repository: the path itself or one of its parent directories must contain a .git
folder.
from graph_sitter import Codebase
# Parse from a git repository root
codebase = Codebase("path/to/repository")
# Parse from a subfolder within a git repository
codebase = Codebase("path/to/repository/src/subfolder")
# Parse from current directory (must be within a git repo)
codebase = Codebase("./")
# Specify programming language (instead of inferring from file extensions)
codebase = Codebase("./", language="typescript")
By default, Graph-sitter infers the programming language of the codebase and
parses all files in it. You can override the inference by passing the language
parameter with a value from the ProgrammingLanguage enum.
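To make the idea of extension-based inference concrete, here is a minimal sketch that picks the language whose file extensions occur most often under a directory. It is not Graph-sitter's actual implementation: the function name guess_language and the EXTENSION_TO_LANGUAGE table are hypothetical and exist only for illustration.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-language table, for illustration only.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def guess_language(root: str) -> Optional[str]:
    """Return the language with the most matching files under root, or None."""
    counts = Counter(
        EXTENSION_TO_LANGUAGE[path.suffix]
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        # No recognised source files were found.
        return None
    return counts.most_common(1)[0][0]
```

In a mixed repository a heuristic like this can pick the wrong language, which is the case where passing the language parameter explicitly helps.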
The initial parse may take a few minutes for large codebases. This
pre-computation enables constant-time operations afterward. Learn more
here.
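Because the path must sit inside a git repository, it can be useful to check this before constructing a Codebase. The sketch below walks up from a path looking for a .git entry. The helper find_git_root is hypothetical and not part of Graph-sitter; it only mirrors the requirement described above.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above path that contains .git, else None."""
    start = Path(path).resolve()
    for candidate in (start, *start.parents):
        # exists() also accepts a .git file, which git uses for worktrees and submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

If find_git_root returns None, you can raise a clear error of your own rather than relying on whatever the parser reports.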
Remote Repositories
To fetch and parse a repository directly from GitHub, use the from_repo function.
from graph_sitter import Codebase
# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
codebase = Codebase.from_repo('fastapi/fastapi')
# Customize temp directory, clone depth, specific commit, or programming language
codebase = Codebase.from_repo(
    'fastapi/fastapi',
    tmp_dir='/custom/temp/dir',  # Optional: custom temp directory
    commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',  # Optional: specific commit
    shallow=False,  # Optional: full clone instead of shallow
    language="python"  # Optional: override language detection
)
Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by
default. The clone is shallow by default for better performance.
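If you need to locate a clone on disk afterward, the default location can be computed from the repository name. This is a sketch under two assumptions that the text above does not confirm: that {repo_name} is the part after the owner in "owner/repo", and that a custom tmp_dir is used as the parent directory in the same way. The helper default_clone_dir is hypothetical.

```python
import posixpath


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> str:
    """Build the expected clone directory for an "owner/repo" name."""
    # Assumption: {repo_name} is the last path segment of "owner/repo".
    repo_name = repo_full_name.rstrip("/").split("/")[-1]
    return posixpath.join(tmp_dir, repo_name)
```

For example, default_clone_dir("fastapi/fastapi") gives "/tmp/codegen/fastapi" under these assumptions.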
Configuration Options
You can customize the behavior of your Codebase instance by passing a CodebaseConfig
object to toggle specific features and a SecretsConfig object to supply secrets (like API keys):
from graph_sitter import Codebase
from codegen.configs.models.codebase import CodebaseConfig
from codegen.configs.models.secrets import SecretsConfig
codebase = Codebase(
    "path/to/repository",
    config=CodebaseConfig(debug=True),
    secrets=SecretsConfig(openai_api_key="your-openai-key")  # For AI-powered features
)
CodebaseConfig and SecretsConfig allow you to configure:
- config: Toggle specific features like language engines, dependency management, and graph synchronization
- secrets: API keys and other sensitive information needed by the codebase
For a complete list of available feature flags and configuration options, see the source code on GitHub.
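Rather than hard-coding a key as in the snippet above, you may prefer to read it from the environment. The helper below is a minimal sketch using only the standard library. The name require_env and the OPENAI_API_KEY variable name are assumptions for illustration, not something Graph-sitter defines.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage with the snippet above (needs the library installed, so left as a comment):
# secrets = SecretsConfig(openai_api_key=require_env("OPENAI_API_KEY"))
```

Failing early with a named variable makes a missing key easier to diagnose than an authentication error later on.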
Advanced Initialization
For more complex scenarios, Graph-sitter supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:
- Repository configuration
- Base path and subdirectory filtering
- Multiple project configurations
Here’s an example:
from graph_sitter import Codebase
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
from codegen.git.schemas.repo_config import BaseRepoConfig
from codegen.sdk.codebase.config import ProjectConfig
codebase = Codebase(
    projects=[
        ProjectConfig(
            repo_operator=LocalRepoOperator(
                repo_path="/tmp/codegen-sdk",
                repo_config=BaseRepoConfig(),
                bot_commit=True
            ),
            language="typescript",
            base_path="src/codegen/sdk/typescript",
            subdirectories=["src/codegen/sdk/typescript"]
        )
    ]
)
For more details on advanced configuration options, see the source code on GitHub.
Supported Languages
Graph-sitter currently supports: