API
Public
GHOST.GHOST
— Module
GHOST
This is a module for collecting GitHub data about open source repositories and contributors.
GHOST.GitHubPersonalAccessToken
— Type
GitHubPersonalAccessToken(login::AbstractString,
token::AbstractString,
)::GitHubPersonalAccessToken
A GitHub Personal Access Token.
Fields
login::String
token::String
client::Client
limits::Limits
GHOST.find_queries
— Method
find_queries(spdx::AbstractString)
GHOST.find_repos
— Method
find_repos(batch::AbstractDataFrame)::Nothing
Takes a batch of 10 spdx/createdat pairs and writes the corresponding data to the database.
GHOST.graphql
— Function
graphql(obj::GitHubPersonalAccessToken,
operationName::AbstractString,
vars::Dict{String};
max_retries::Integer = 3)
Return JSON of the GraphQL query.
GHOST.licenses
— Method
licenses(conn::Connection,
pat::GitHubPersonalAccessToken,
schema::AbstractString = "gh_2007_$(Dates.year(floor(now(), Year) - Day(1)))",
)::Nothing
Uploads the licenses table to the database. It includes every OSI-approved license that is machine readable with Licensee.
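The default `schema` argument resolves to the last complete calendar year: flooring the current time to the year gives January 1, and subtracting one day lands in the previous year. A minimal sketch of that expression, using only the `Dates` standard library. The helper name `default_schema` is hypothetical and takes a fixed timestamp in place of `now()` so the result is reproducible:

```julia
using Dates

# Hypothetical helper (not part of GHOST): evaluates the documented default
# schema expression for a given timestamp instead of `now()`.
default_schema(ts::DateTime) = "gh_2007_$(Dates.year(floor(ts, Year) - Day(1)))"

# floor(2021-06-15, Year) is 2021-01-01; minus one day is 2020-12-31.
default_schema(DateTime(2021, 6, 15))  # "gh_2007_2020"
```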
GHOST.queries
— Method
queries(conn::Connection,
spdx::AbstractString,
schema::AbstractString = "gh_2007_$(Dates.year(floor(now(), Year) - Day(1)))")
Uploads the queries to the database with the following columns:
- spdx::text NOT NULL
- created::tsrange NOT NULL
- count::smallint NOT NULL
- asof::time
- done::bool NOT NULL
GHOST.query_commits
— Method
query_commits(branches::AbstractVector{<:AbstractString}, batch_size::Integer)::Nothing
GHOST.query_commits
— Method
query_commits(branch::AbstractString)::Nothing
GHOST.setup
— Method
setup(;host::AbstractString = get(ENV, "PGHOST", "localhost"),
port::AbstractString = get(ENV, "PGPORT", "5432"),
dbname::AbstractString = get(ENV, "PGDATABASE", "postgres"),
user::AbstractString = get(ENV, "PGUSER", "postgres"),
password::AbstractString = get(ENV, "PGPASSWORD", "postgres"),
schema::AbstractString = "gh_2007_$(year(floor(now(utc_tz), Year) - Day(1)))",
pats::Union{Nothing, Vector{GitHubPersonalAccessToken}} = nothing)
Sets up your PostgreSQL database for the project.
Example
julia> setup(pats = [GitHubPersonalAccessToken("MyGH_Login", ENV["GH_PAT"])])
julia> setup()
GHOST.setup_parallel
— Function
setup_parallel(limit::Integer = 0; password::AbstractString = get(ENV, "PGPASSWORD", "postgres"))::Nothing
Sets up the workers.
Private
GHOST.GH_FIRST_REPO_TS
— Constant
GH_FIRST_REPO_TS::DateTime = 2007-10-29T14:37:16
Timestamp when the earliest public GitHub repository was created (id: "MDEwOlJlcG9zaXRvcnkx", nameWithOwner: "mojombo/grit").
GHOST.GITHUB_GRAPHQL_ENDPOINT
— Constant
GITHUB_GRAPHQL_ENDPOINT::String = "https://api.github.com/graphql"
GitHub API v4 GraphQL API endpoint.
GHOST.GITHUB_REST_ENDPOINT
— Constant
GITHUB_REST_ENDPOINT::String = "https://api.github.com"
GitHub API v3 RESTful root endpoint.
GHOST.Limits
— Type
Limits
GitHub API limits.
Records how many queries remain in the current rate-limit period and when that period resets.
Fields
limit::UInt16
remaining::UInt16
reset::DateTime
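As an illustration of how these three fields can drive throttling, here is a minimal sketch. `LimitsSketch` and `wait_time` are hypothetical names that only mirror the documented fields. They are not the package's definitions, and how GHOST itself waits on the limit is not described here:

```julia
using Dates

# Hypothetical stand-in with the same fields as the documented `Limits` type.
mutable struct LimitsSketch
    limit::UInt16      # queries allowed per period
    remaining::UInt16  # queries left in the current period
    reset::DateTime    # when the period resets
end

# Hypothetical helper: how long to wait before the next query is allowed.
function wait_time(l::LimitsSketch, current::DateTime)
    l.remaining > 0 && return Millisecond(0)   # budget left, no wait
    return max(l.reset - current, Millisecond(0))
end

exhausted = LimitsSketch(5000, 0, DateTime(2021, 1, 1, 1))
wait_time(exhausted, DateTime(2021, 1, 1, 0, 30))  # 1800000 milliseconds
```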
GHOST.cleanintervals
— Method
cleanintervals(row)
Returns the input unchanged if the count is 1,000 records or fewer. If there are more than 1,000, it splits the interval based on the ratio of the count.
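The exact splitting rule is not spelled out here, so the following is only a sketch of one plausible reading: divide the interval into `cld(count, 1000)` equal-width pieces, so that each piece would hold at most 1,000 records if repositories were spread evenly. The function name `split_by_count`, the `cap` keyword, and the use of plain `(start, stop)` tuples instead of `Interval` objects are all assumptions made for illustration:

```julia
using Dates

# Hypothetical sketch, not the package's implementation.
function split_by_count(start::DateTime, stop::DateTime, count::Integer; cap::Integer = 1_000)
    count <= cap && return [(start, stop)]          # small enough: keep as is
    pieces = cld(count, cap)                         # number of sub-intervals
    width_ms = Dates.value(stop - start) ÷ pieces    # width of each piece in ms
    edges = [start + Millisecond(width_ms * i) for i in 0:pieces-1]
    push!(edges, stop)                               # last edge is the exact end
    return [(edges[i], edges[i + 1]) for i in 1:pieces]
end

# 2,500 results over four days become three sub-intervals of 32 hours each.
split_by_count(DateTime(2020, 1, 1), DateTime(2020, 1, 5), 2_500)
```

In practice repositories are not evenly distributed over time, so each sub-interval would presumably need its count queried again and be split further if it still exceeds the cap.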
GHOST.fill_missing_intervals
— Method
fill_missing_intervals(spdx::AbstractString, data::AbstractDataFrame)
GHOST.find_repo_count_for_intervals
— Method
find_repo_count_for_intervals(spdx::AbstractString, created::AbstractVector{<:Interval{DateTime}})
GHOST.format_tsrange
— Method
format_tsrange(obj::Interval{ZonedDateTime})
Return the Postgres compatible form.
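For orientation, a PostgreSQL `tsrange` literal is a pair of timestamps wrapped in bound markers. The sketch below builds one from two plain `DateTime` values. The helper name `tsrange_literal`, the half-open `[start,stop)` bounds, and the use of `DateTime` rather than `Interval{ZonedDateTime}` are assumptions for illustration and may differ from what `format_tsrange` actually emits:

```julia
using Dates

# Hypothetical helper: renders a half-open range as a Postgres tsrange literal.
function tsrange_literal(start::DateTime, stop::DateTime)
    fmt = dateformat"yyyy-mm-dd HH:MM:SS"
    return "[" * Dates.format(start, fmt) * "," * Dates.format(stop, fmt) * ")"
end

tsrange_literal(DateTime(2020, 1, 1), DateTime(2020, 2, 1))
# "[2020-01-01 00:00:00,2020-02-01 00:00:00)"
```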
GHOST.parse_author
— Method
parse_author(node)::NamedTuple
Parses the email, name, and ID from the author node.
GHOST.parse_commit
— Method
parse_commit(branch, node)::NamedTuple
Parses a commit node and adds the branch that was queried.
GHOST.parse_repo
— Method
parse_repo(node, spdx::AbstractString)::NamedTuple
Parses a node and returns a suitable NamedTuple for the table.
GHOST.prune
— Method
prune(data)
Prune the intervals based on the created and count values.
GHOST.query_intervals
— Method
query_intervals(created::AbstractVector{<:AbstractVector{Interval{ZonedDateTime}}})::DataFrame
Returns a DataFrame.
GHOST.query_intervals
— Method
query_intervals(spdx::AbstractString, created::AbstractVector{<:Interval{ZonedDateTime}})
Returns the count of search results for the given license in each created interval.