API

Public

GHOST.GHOST - Module
GHOST

This is a module for collecting GitHub data about open source repositories and contributors.

GHOST.GitHubPersonalAccessToken - Type
GitHubPersonalAccessToken(login::AbstractString,
                          token::AbstractString,
                          )::GitHubPersonalAccessToken

A GitHub Personal Access Token (PAT).

Fields

  • login::String
  • token::String
  • client::Client
  • limits::Limits

GHOST.find_repos - Method
find_repos(batch::AbstractDataFrame)::Nothing

Takes a batch of 10 spdx/createdat pairs and writes the resulting data to the database.

GHOST.graphql - Function
graphql(obj::GitHubPersonalAccessToken,
        operationName::AbstractString,
        vars::Dict{String};
        max_retries::Integer = 3)

Returns the JSON response of the GraphQL query.
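
The docstring does not say how `max_retries` is applied. As a rough sketch of one common reading (retry a failing call up to `max_retries` times, then give up), the hypothetical helper `with_retries` below shows the pattern. It is not part of GHOST and makes no network requests.

```julia
# Hypothetical helper, not part of GHOST: call `f` until it succeeds,
# allowing at most `max_retries` retries after the first attempt.
function with_retries(f; max_retries::Integer = 3)
    attempt = 0
    while true
        try
            return f()
        catch
            attempt += 1
            # Give up and propagate the last error once the budget is spent.
            attempt > max_retries && rethrow()
        end
    end
end
```

With the default of 3, a call that keeps failing is attempted four times in total before the error propagates.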

GHOST.licenses - Method
licenses(conn::Connection,
         pat::GitHubPersonalAccessToken,
         schema::AbstractString = "gh_2007_$(Dates.year(floor(now(), Year) - Day(1)))",
         )::Nothing

Uploads the licenses table to the database. It includes every OSI-approved license that is machine readable with Licensee.
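
The default `schema` in the signature resolves to the previous calendar year: `floor(now(), Year)` gives January 1 of the current year, and subtracting `Day(1)` lands on December 31 of the year before. The sketch below evaluates the same expression for a fixed timestamp; `default_schema` is a hypothetical helper introduced only to make the result reproducible.

```julia
using Dates

# Mirrors the default `schema` expression from the signature, but takes the
# timestamp as an argument (hypothetical helper) instead of calling now().
default_schema(ts::DateTime) = "gh_2007_$(Dates.year(floor(ts, Year) - Day(1)))"

default_schema(DateTime(2021, 6, 15))  # "gh_2007_2020"
```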

GHOST.queries - Method
queries(conn::Connection,
        spdx::AbstractString,
        schema::AbstractString = "gh_2007_$(Dates.year(floor(now(), Year) - Day(1)))")

Uploads the queries table to the database with the following columns:

  • spdx::text NOT NULL
  • created::tsrange NOT NULL
  • count::smallint NOT NULL
  • asof::time
  • done::bool NOT NULL

GHOST.setup - Method
setup(;host::AbstractString = get(ENV, "PGHOST", "localhost"),
       port::AbstractString = get(ENV, "PGPORT", "5432"),
       dbname::AbstractString = get(ENV, "PGDATABASE", "postgres"),
       user::AbstractString = get(ENV, "PGUSER", "postgres"),
       password::AbstractString = get(ENV, "PGPASSWORD", "postgres"),
       schema::AbstractString = "gh_2007_$(year(floor(now(utc_tz), Year) - Day(1)))",
       pats::Union{Nothing, Vector{GitHubPersonalAccessToken}} = nothing)

Sets up your PostgreSQL database for the project.

Example

julia> setup(pats = [GitHubPersonalAccessToken("MyGH_Login", ENV["GH_PAT"])])
julia> setup()

GHOST.setup_parallel - Function
setup_parallel(limit::Integer = 0; password::AbstractString = get(ENV, "PGPASSWORD", "postgres"))::Nothing

Sets up the workers.

Private

GHOST.GH_FIRST_REPO_TS - Constant
GH_FIRST_REPO_TS::DateTime = 2007-10-29T14:37:16

Timestamp at which the earliest public GitHub repository was created (id: "MDEwOlJlcG9zaXRvcnkx", nameWithOwner: "mojombo/grit").

GHOST.Limits - Type
Limits

GitHub API limits.

It records how many queries remain in the current time period and when that period resets.

Fields

  • limit::UInt16
  • remaining::UInt16
  • reset::DateTime
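
To illustrate how these three fields can drive throttling, here is a minimal sketch. `LimitsSketch` copies only the documented field names and types, and `wait_time` is a hypothetical helper. GHOST's own definition and logic may differ.

```julia
using Dates

# Hypothetical stand-in with the documented fields; not GHOST's definition.
struct LimitsSketch
    limit::UInt16
    remaining::UInt16
    reset::DateTime
end

# Hypothetical helper: how long to wait before the next query may be sent.
function wait_time(l::LimitsSketch, now_ts::DateTime)
    # Queries left in this period: no need to wait.
    l.remaining > 0 && return Millisecond(0)
    # Otherwise wait until the reset time, never a negative duration.
    return max(l.reset - now_ts, Millisecond(0))
end
```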

GHOST.cleanintervals - Method
cleanintervals(row)

Returns the input unchanged if the count is 1,000 records or fewer. If there are more than 1,000, it splits them based on the ratio of the count.
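
One way to picture the splitting rule is the sketch below. It assumes that records are spread evenly over the interval and that "ratio" means the count divided by the 1,000-record cap, rounded up. `split_interval` and its `cap` keyword are hypothetical names, and this is not necessarily how `cleanintervals` is implemented.

```julia
using Dates

# Hypothetical sketch: split [start, stop] into equal-width pieces so that each
# piece is expected to hold at most `cap` records (assumes uniform density).
function split_interval(start::DateTime, stop::DateTime, count::Integer; cap::Integer = 1_000)
    count <= cap && return [(start, stop)]
    pieces = cld(count, cap)                      # count / cap, rounded up
    width = Dates.value(stop - start) ÷ pieces    # piece width in milliseconds
    edges = [start + Millisecond(width * i) for i in 0:pieces-1]
    push!(edges, stop)                            # last piece absorbs rounding remainder
    return [(edges[i], edges[i + 1]) for i in 1:pieces]
end
```

For example, a four-day interval with 3,500 results would be split into four one-day pieces under these assumptions.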

GHOST.parse_author - Method
parse_author(node)::NamedTuple

This parses the email, name, and ID of the author node.
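
As an illustration of the kind of result this produces, the sketch below extracts the three values from a dictionary-shaped node. The node layout (keys "email", "name", and a nested "user" with an "id") is an assumption, and `parse_author_sketch` is a hypothetical name. The structure GHOST actually receives may differ.

```julia
# Hypothetical node shape; not necessarily what GHOST's parse_author receives.
function parse_author_sketch(node::AbstractDict)
    user = get(node, "user", nothing)
    return (email = get(node, "email", missing),
            name = get(node, "name", missing),
            # Authors without a linked account have no ID.
            id = user === nothing ? missing : get(user, "id", missing))
end

node = Dict("email" => "a@example.com", "name" => "Ada", "user" => Dict("id" => "U_1"))
parse_author_sketch(node)  # (email = "a@example.com", name = "Ada", id = "U_1")
```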

GHOST.parse_commit - Method
parse_commit(branch, node)::NamedTuple

This parses a commit node and adds the branch that was queried.

GHOST.parse_repo - Method
parse_repo(node, spdx::AbstractString)::NamedTuple

Parses a node and returns a suitable NamedTuple for the table.

GHOST.prune - Method
prune(data)

Prune the intervals based on the created and count values.

GHOST.query_intervals - Method
query_intervals(created::AbstractVector{<:AbstractVector{Interval{ZonedDateTime}}})::DataFrame

Returns a DataFrame.

GHOST.query_intervals - Method
query_intervals(spdx::AbstractString, created::AbstractVector{<:Interval{ZonedDateTime}})

Returns the count of search results for the license in each created interval.
