The Codebase class is the primary entry point for programs that use Graph-sitter.

Local Codebases

Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be inside a git repository: either the path itself or one of its parent directories must contain a .git folder.

from graph_sitter import Codebase

# Parse from a git repository root
codebase = Codebase("path/to/repository")

# Parse from a subfolder within a git repository
codebase = Codebase("path/to/repository/src/subfolder")

# Parse from current directory (must be within a git repo)
codebase = Codebase("./")

# Specify programming language (instead of inferring from file extensions)
codebase = Codebase("./", language="typescript")

By default, Graph-sitter infers the programming language of the codebase and parses every file in it. To override the inferred language, pass the language parameter with a value from the ProgrammingLanguage enum.
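
To make the idea of inferring a language from file extensions concrete, here is a minimal sketch that picks the language with the most matching files. It is an illustration only, not Graph-sitter's actual detection logic; the guess_language helper and the EXTENSION_TO_LANGUAGE table are hypothetical names introduced for this example.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical extension table for this sketch; not taken from Graph-sitter.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def guess_language(root: str) -> Optional[str]:
    """Return the language with the most matching files under root, or None."""
    counts = Counter(
        EXTENSION_TO_LANGUAGE[path.suffix]
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in EXTENSION_TO_LANGUAGE
    )
    if not counts:
        # No recognized source files were found.
        return None
    return counts.most_common(1)[0][0]
```

A majority vote like this can guess wrong in mixed-language repositories, which is one reason to pass the language parameter explicitly.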

The initial parse may take a few minutes for large codebases. This pre-computation enables constant-time operations afterward. Learn more here.
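
Because the path passed to Codebase must lie inside a git repository, it can be useful to verify that before starting a long parse. The sketch below walks up the directory tree looking for a .git entry. The find_git_root helper is a hypothetical name and is not part of Graph-sitter.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above path that contains .git, else None."""
    start = Path(path).resolve()
    for candidate in (start, *start.parents):
        # exists() also matches the .git *file* that git worktrees and submodules use.
        if (candidate / ".git").exists():
            return candidate
    return None
```

If find_git_root returns None for a path, that path does not meet the requirement described above.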

Remote Repositories

To fetch and parse a repository directly from GitHub, use the Codebase.from_repo method.

from graph_sitter import Codebase

# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
codebase = Codebase.from_repo('fastapi/fastapi')

# Customize temp directory, clone depth, specific commit, or programming language
codebase = Codebase.from_repo(
    'fastapi/fastapi',
    tmp_dir='/custom/temp/dir',  # Optional: custom temp directory
    commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',  # Optional: specific commit
    shallow=False,  # Optional: full clone instead of shallow
    language="python"  # Optional: override language detection
)

Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by default. The clone is shallow by default for better performance.
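
As a rough illustration of the default location, the sketch below maps an "owner/name" string to a directory under /tmp/codegen. It assumes that {repo_name} is the part after the slash and that a custom tmp_dir is combined with it in the same way; the default_clone_dir helper is hypothetical and not part of Graph-sitter.

```python
from pathlib import PurePosixPath


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> PurePosixPath:
    """Map 'owner/name' to tmp_dir/name (assumed naming scheme, see the text above)."""
    owner, sep, name = repo_full_name.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_full_name!r}")
    return PurePosixPath(tmp_dir) / name
```

Under these assumptions, "fastapi/fastapi" maps to /tmp/codegen/fastapi.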

Configuration Options

You can customize the behavior of your Codebase instance by passing a CodebaseConfig object to toggle specific features and a SecretsConfig object to supply secrets (like API keys):

from graph_sitter import Codebase
from codegen.configs.models.codebase import CodebaseConfig
from codegen.configs.models.secrets import SecretsConfig

codebase = Codebase(
    "path/to/repository",
    config=CodebaseConfig(debug=True),
    secrets=SecretsConfig(openai_api_key="your-openai-key")   # For AI-powered features
)

  • config (CodebaseConfig): toggles specific features such as language engines, dependency management, and graph synchronization
  • secrets (SecretsConfig): holds API keys and other sensitive information needed by the codebase

For a complete list of available feature flags and configuration options, see the source code on GitHub.
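
Rather than writing an API key into source code, one option is to read it from the environment and fail early if it is missing. This is a minimal sketch: the require_env helper is hypothetical, and OPENAI_API_KEY is only an assumed variable name, not something Graph-sitter is documented to read by itself.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Intended use with the constructor shown above (not executed here):
#   secrets=SecretsConfig(openai_api_key=require_env("OPENAI_API_KEY"))
```

Failing before the Codebase is constructed avoids paying for the initial parse only to hit a missing-key error later.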

Advanced Initialization

For more complex scenarios, Graph-sitter supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:

  • Repository configuration
  • Base path and subdirectory filtering
  • Multiple project configurations

Here’s an example:

from graph_sitter import Codebase
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
from codegen.git.schemas.repo_config import BaseRepoConfig
from codegen.sdk.codebase.config import ProjectConfig

codebase = Codebase(
    projects=[
        ProjectConfig(
            repo_operator=LocalRepoOperator(
                repo_path="/tmp/codegen-sdk",
                repo_config=BaseRepoConfig(),
                bot_commit=True
            ),
            language="typescript",
            base_path="src/codegen/sdk/typescript",
            subdirectories=["src/codegen/sdk/typescript"]
        )
    ]
)

For more details on advanced configuration options, see the source code on GitHub.

Supported Languages

Graph-sitter currently supports: