Parsing Codebases
The primary entrypoint to programs leveraging Graph-sitter is the Codebase class.
Local Codebases
Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be within a git repository (i.e., somewhere in the parent directory tree must contain a .git folder).
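The "within a git repository" requirement can be checked before constructing a Codebase. The following is a minimal standard-library sketch, not Graph-sitter's own validation logic; the helper name find_git_root is hypothetical and only illustrates walking up the parent directories in search of a .git entry.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains a .git entry.

    Hypothetical helper for illustration; it is not part of Graph-sitter.
    Returns None when no ancestor contains a .git entry.
    """
    current = Path(path).resolve()
    # Check the path itself first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        # .exists() also accepts a .git *file*, as used by worktrees and submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

If the helper returns None for a path, that path is outside any git repository and would not satisfy the requirement described above.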
By default, Graph-sitter will automatically infer the programming language of the codebase and parse all files in the codebase. You can override this by passing the language parameter with a value from the ProgrammingLanguage enum.
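To give a feel for what language inference can look like, here is a rough sketch that picks the language with the most matching source files. This is an assumption made for illustration only: the source does not describe how Graph-sitter actually infers the language, and the extension table and the function name guess_language are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical extension table; not taken from Graph-sitter.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def guess_language(repo_path: str) -> Optional[str]:
    """Guess the dominant language by counting files per known extension.

    Illustrative heuristic only; returns None if no known extension is found.
    """
    counts = Counter()
    for file in Path(repo_path).rglob("*"):
        if not file.is_file():
            continue
        language = EXTENSION_TO_LANGUAGE.get(file.suffix.lower())
        if language is not None:
            counts[language] += 1
    if not counts:
        return None
    # most_common(1) returns a list with the single highest-count (language, count) pair.
    return counts.most_common(1)[0][0]
```

A heuristic like this can guess wrong on mixed-language repositories, which is one reason an explicit language override is useful.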
The initial parse may take a few minutes for large codebases. This pre-computation enables constant-time operations afterward. Learn more here.
Remote Repositories
To fetch and parse a repository directly from GitHub, use the from_repo function.
Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by default. The clone is shallow by default for better performance.
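The defaults above can be pictured as a plain git invocation. The sketch below is not Graph-sitter's implementation: the helper names, the "owner/repo" input format, and the use of --depth 1 for the shallow clone are assumptions made for illustration. It only builds the target directory and the command list, and does not run git.

```python
from pathlib import PurePosixPath
from typing import List


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> PurePosixPath:
    """Map an assumed 'owner/repo' name to <tmp_dir>/<repo_name>."""
    repo_name = repo_full_name.rstrip("/").split("/")[-1]
    return PurePosixPath(tmp_dir) / repo_name


def clone_command(repo_full_name: str, shallow: bool = True,
                  tmp_dir: str = "/tmp/codegen") -> List[str]:
    """Build (but do not execute) a git clone command for a GitHub repository."""
    url = f"https://github.com/{repo_full_name}.git"
    command = ["git", "clone"]
    if shallow:
        # Assumption: a shallow clone fetches only the most recent commit.
        command += ["--depth", "1"]
    command += [url, str(default_clone_dir(repo_full_name, tmp_dir))]
    return command
```

A shallow clone skips most of the commit history, which keeps the download small but means older commits are not available locally.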
Configuration Options
You can customize the behavior of your Codebase instance by passing a CodebaseConfig
object. This allows you to configure secrets (like API keys) and toggle specific features:
CodebaseConfig and SecretsConfig allow you to configure:
- config: Toggle specific features like language engines, dependency management, and graph synchronization
- secrets: API keys and other sensitive information needed by the codebase
For a complete list of available feature flags and configuration options, see the source code on GitHub.
Advanced Initialization
For more complex scenarios, Graph-sitter supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:
- Repository configuration
- Base path and subdirectory filtering
- Multiple project configurations
Here’s an example:
For more details on advanced configuration options, see the source code on GitHub.
Supported Languages
Graph-sitter currently supports: