COSC 419: Learning Analytics
A3: Mining GitHub Statistics [30 pts]
Due date: Feb 16, 2020, 11:59pm
In this assignment, you will grab GitHub data to generate a graph that shows
all the collaborations among a specific group of users. Specific user logins
will be provided to you.
What to submit:
Submit the following on Connect:
- A PDF report documenting:
- A unique list of all committers identified in your code
- Nicely formatted output of a list of repo owners and their
collaborators
- An image file containing the graph generated from your adjacency matrix
- If your code doesn't fully work: document which part of the assignment
you were able to finish and which part(s) did not work. In your explanation,
be sure to state precisely which code files you got working.
- All the code you wrote: make sure it is well documented so we can tell
what each file, or each part of a file, does
Specific Instructions
Use any programming language you want. (My solution uses Ruby and GraphViz.)
- Get comfortable with basic cURL commands
(browse through the resources below and try them out)
- Connect to GitHub via its API to get user and repo information. Try getting
info from your own GitHub account first, then from a friend's account.
Use the list of users here for your final solution: user names
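A minimal Ruby sketch of these first steps: running cURL from a script against GitHub's public REST endpoint for a user's repos (`https://api.github.com/users/LOGIN/repos`) and parsing the JSON it returns. The helper names and the `GITHUB_USER`/`GITHUB_PASSWORD` environment variables are assumptions, not part of the assignment. Reading credentials from the environment keeps a real password out of the files you submit (see the note on authentication under Grading Criteria). Unauthenticated requests are limited to 60 per hour, so authenticating helps once you loop over several users; current GitHub accounts expect a personal access token in the password position.

```ruby
require "json"
require "open3"

# Build the argv for a cURL call to the GitHub REST API. GITHUB_USER and
# GITHUB_PASSWORD are assumed variable names; reading them from the
# environment keeps real secrets out of the source file.
def curl_argv(url, env = ENV)
  argv = ["curl", "-s", "-H", "Accept: application/vnd.github.v3+json"]
  user = env["GITHUB_USER"]
  pass = env["GITHUB_PASSWORD"]
  argv += ["-u", "#{user}:#{pass}"] if user && pass
  argv << url
end

# Endpoint listing a user's public repos (first page, up to 100 entries).
def repos_url(login)
  "https://api.github.com/users/#{login}/repos?per_page=100"
end

# Run cURL and parse its output as JSON; raises if cURL itself fails.
def fetch_json(url)
  out, status = Open3.capture2(*curl_argv(url))
  raise "curl failed for #{url}" unless status.success?
  JSON.parse(out)
end

# Pull the repo names out of a parsed /users/LOGIN/repos response.
# API errors come back as a JSON object rather than an array, so return [].
def repo_names(data)
  return [] unless data.is_a?(Array)
  data.map { |repo| repo["name"] }
end
```

For example, `repo_names(fetch_json(repos_url("bohuie")))` would return that user's repo names. A user with more than 100 repos would also need the `page` query parameter to fetch later pages.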
- Understand the API output in JSON format
- Indirectly get collaborator information via commits
For whatever reason, even if you can grab someone's repo info, you cannot grab
that repo's list of collaborators directly if you are not the owner. However,
you can grab the repo's list of commits and, from there, see who made each
commit; that person is a collaborator.
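Since the collaborator list is off limits, the commits endpoint (`https://api.github.com/repos/OWNER/REPO/commits`) does the work. A minimal Ruby sketch, assuming you have already saved one page of that endpoint's JSON output: it collects each commit's top-level `author.login` and skips commits whose `author` is `null`, which happens when the commit email is not linked to a GitHub account. The function names are hypothetical. It reads `author` rather than `committer` because commits made through the GitHub web interface typically list GitHub's own `web-flow` account as the committer.

```ruby
require "json"

# Endpoint listing the commits on a repo (first page, up to 100 commits).
def commits_url(owner, repo)
  "https://api.github.com/repos/#{owner}/#{repo}/commits?per_page=100"
end

# Given the raw JSON text from the commits endpoint, return the unique,
# sorted GitHub logins of the commit authors.
# - "author" is null for commits not linked to a GitHub account; dig then
#   returns nil and compact drops it.
# - Error responses (e.g. for an empty repo) are a JSON object, not an array.
def committer_logins(json_text)
  data = JSON.parse(json_text)
  return [] unless data.is_a?(Array)
  data.map { |commit| commit.dig("author", "login") }.compact.uniq.sort
end
```

Feeding it the output of `curl -s "https://api.github.com/repos/OWNER/REPO/commits?per_page=100"` yields that repo's committers; only the first 100 commits are covered unless you also request later pages.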
- Parse API output to obtain a condensed list of users, list of repos per
user, and list of collaborators per repo
For the above list of users, the condensed form I have is this: listcollabs.txt. Note that this may differ if
the users have changed their repos recently.
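The condensed form can be held in memory as a nested hash. A Ruby sketch assuming the hypothetical layout `{ owner => { repo => [committer logins] } }`; the line format it prints is illustrative and need not match listcollabs.txt. `format_collabs` prints one line per repo with the owner removed from its own collaborator list, and `all_committers` gives the unique list of committers the report asks for.

```ruby
# Hypothetical layout: { owner_login => { repo_name => [committer logins] } }.

# One line per repo, "owner/repo: login1, login2", sorted by owner then repo.
# The owner is dropped from its own collaborator list; "(none)" marks repos
# where nobody else committed.
def format_collabs(collabs)
  lines = []
  collabs.keys.sort.each do |owner|
    collabs[owner].keys.sort.each do |repo|
      others = (collabs[owner][repo] - [owner]).uniq.sort
      names = others.empty? ? "(none)" : others.join(", ")
      lines << "#{owner}/#{repo}: #{names}"
    end
  end
  lines.join("\n")
end

# Unique, sorted list of every login that committed to any repo.
def all_committers(collabs)
  collabs.values.flat_map { |repos| repos.values.flatten }.uniq.sort
end
```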
- Create an adjacency matrix of collaboration based on the parsed info. My
adjacency matrix looks like this.
- Visualize the collaboration matrix. I converted my matrix to .dot
format and then visualized it as a graph. My outputs are the
graphcollabs.dot file and the actual
graph.
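One way to build the adjacency matrix, sketched in Ruby under the same assumed input layout `{ owner => { repo => [committer logins] } }`. Every user seen as an owner or committer becomes a row and a column, and each owner-collaborator pair on a repo adds 1 to both symmetric cells, so a cell counts how many repos link the two users. This weighting is an assumption; counting every pair of committers on a repo would be an equally reasonable choice. The function names are hypothetical.

```ruby
# Hypothetical input layout: { owner_login => { repo_name => [committer logins] } }.
# Returns [users, matrix]: users is the sorted list of every owner and committer,
# and matrix[i][j] counts the repos on which users[i] and users[j] appear as
# owner and collaborator (in either role), so the matrix is symmetric.
def adjacency_matrix(collabs)
  users = (collabs.keys + collabs.values.flat_map { |r| r.values.flatten }).uniq.sort
  index = users.each_with_index.to_h
  matrix = Array.new(users.size) { Array.new(users.size, 0) }
  collabs.each do |owner, repos|
    repos.each_value do |committers|
      (committers.uniq - [owner]).each do |other|
        i, j = index[owner], index[other]
        matrix[i][j] += 1
        matrix[j][i] += 1
      end
    end
  end
  [users, matrix]
end

# Tab-separated text rendering of the matrix, with users as row/column labels.
def matrix_to_s(users, matrix)
  rows = ["\t" + users.join("\t")]
  users.each_with_index { |u, i| rows << ([u] + matrix[i]).join("\t") }
  rows.join("\n")
end
```

`matrix_to_s` gives a plain-text version of the matrix that can be pasted into the report.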
Grading Criteria
- [2 pts] A README file explaining which files to run, in what order, and
what to expect from each file execution (see my example: a3ex-README.txt)
- [4 pts] Program source code to automatically execute cURL commands and
interact with GitHub
Note that if you used authentication and embedded your user name and password
into the files, make sure you replace your password with a dummy variable
like "PASSWORD" before you submit them to us! We will test your code with our
own username and password.
- [4 pts] List of committers extracted for each repo
- [10 pts] Code and output for formatted list of users, repos, and collaborators
- [8 pts] Conversion to an adjacency matrix and graph format
- [2 pts] Graph visualization
For example, I wrote the following code in Ruby:
- a script that gets all the repo data from the list of users
- a script that cleans the repo data produced by the previous script
- a script that parses the cleaned repo data, reads in JSON data, grabs
collaborator info for each repo
- a script that cleans the collaborator data produced by the previous script
- a script that parses the cleaned collaborator data, reads in JSON data,
grabs each committer login for each repo, dumps the output into a nicely
formatted file
- a script that takes the nicely formatted collaborator file and creates an
adjacency matrix and corresponding graph .dot format
- cut/paste .dot file content into www.webgraphviz.com to visualize the collaboration graph
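Converting the matrix to GraphViz's DOT language is mostly string building. A Ruby sketch assuming the matrix is a symmetric array of arrays with a parallel `users` array (hypothetical names): each nonzero cell above the diagonal becomes one undirected `--` edge labelled with its weight, and node names are quoted so logins containing hyphens remain valid DOT identifiers.

```ruby
# Convert a symmetric adjacency matrix into DOT text for an undirected graph.
# Every user is emitted as a node (so isolated users still appear), and each
# nonzero cell above the diagonal becomes one edge labelled with its weight.
def to_dot(users, matrix)
  lines = ["graph collabs {"]
  users.each { |u| lines << "  \"#{u}\";" }
  users.each_index do |i|
    ((i + 1)...users.size).each do |j|
      w = matrix[i][j]
      lines << "  \"#{users[i]}\" -- \"#{users[j]}\" [label=\"#{w}\"];" if w > 0
    end
  end
  lines << "}"
  lines.join("\n")
end
```

The returned text can be pasted into www.webgraphviz.com, or, with GraphViz installed locally, saved to a file and rendered with `dot -Tpng graphcollabs.dot -o graphcollabs.png`.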
My scripts are between 20 and 100 lines each. The sample users I used were
"bohuie" and "mbojey". These two users collaborated on some repos in the past
(but not all of them). After running all the scripts, the resulting list of
collaborations I got from all the JSON files is in a3ex-listcollabs.txt. To
visualize the info as a graph, I converted it to a3ex-graphcollabs.dot.
Resources: