The purpose of this post is to describe my first attempt at Golang programming. I decided it would be neat to use Golang to query the Github API for a list of Magento repositories and the location associated with the owner of each repository. If you search Github for “magento language:php”, at the time of this writing you get more than 3000 repositories. See for yourself.
All Github repositories are owned by a Github user or organization. A Github user or organization can choose to display their geographic location in their Github profile. Since I work daily with Magento I thought it would be neat to get a list of geographic locations associated with the 3000+ Magento repos. Please note: I’m almost certain there is a better way of doing this, but this is my first time writing a Golang program, so who cares?
Let’s get started, why don’t we?
@golang gopher writing some files pic.twitter.com/O0MZMnzPWr
— Tegan Snyder (@tegansnyder) August 28, 2014
If you don’t already have Golang I suggest you download it by following the instructions here: http://golang.org/doc/install
There is one caveat. If you are on OSX you will need to make sure to add the following lines to your ~/.bash_profile file:
vi ~/.bash_profile
# add line below to end of it
export PATH=$PATH:/usr/local/go/bin
Then save the file and reload your bash profile by issuing:
source ~/.bash_profile
Before we get started, let’s create a nice place for all of your Go projects. Create a directory in your home directory. For example, I have a “Dev” folder at “/users/tegan/Dev/”. I just created another folder called “golang” in that folder to hold my Go projects.
When you have the folder created, you next need to set up your “GOPATH”.
vi ~/.bash_profile
# add these lines to end
export GOPATH=$HOME/Dev/golang
export PATH=$GOPATH/bin:$PATH
Now let’s get started by creating a “main.go” file in: /users/tegan/Dev/golang/
File: main.go
package main

import (
	"fmt"
	"github.com/google/go-github/github"
)

func main() {
	// passing nil uses the default HTTP client (unauthenticated requests)
	client := github.NewClient(nil)
	fmt.Println("Repos that contain magento and PHP code.")

	query := "magento+language:php"
	opts := &github.SearchOptions{
		Sort: "stars",
		ListOptions: github.ListOptions{
			PerPage: 100,
		},
	}

	// fetch the first page of search results
	repos, _, err := client.Search.Repositories(query, opts)
	if err != nil {
		fmt.Printf("error: %v\n\n", err)
	} else {
		fmt.Printf("%v\n\n", github.Stringify(repos))
	}

	// show how much of the API rate limit is left
	rate, _, err := client.RateLimit()
	if err != nil {
		fmt.Printf("Error fetching rate limit: %#v\n\n", err)
	} else {
		fmt.Printf("API Rate Limit: %#v\n\n", rate)
	}
}
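Before running it, fetch the go-github library into your GOPATH:
go get github.com/google/go-github/github
Now run this file:
go run main.go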
If it works you should get a list of Github repos written in PHP that match the search term “magento”. The output is a JSON-like dump of each repository, produced by github.Stringify. Note from the import statement (“go-github/github”) that we are including a library Google wrote to make dealing with the Github API in Go really simple.
Let’s say we want to spice it up a little bit and get a little fancier. Taking bits and pieces from some other Golang examples I found browsing Github, I put together this:
package main

import (
	"fmt"
	"github.com/google/go-github/github"
	"log"
	"math"
	"time"
)

const (
	// pause once this many (or fewer) requests remain in the rate limit window
	REMAINING_THRESHOLD = 1
)

func main() {
	client := github.NewClient(nil)
	fmt.Println("Repos that contain magento and PHP code.")

	page := 1
	// placeholder until the first response tells us the real last page
	maxPage := math.MaxInt32
	query := "magento+language:php"
	opts := &github.SearchOptions{
		Sort: "stars",
		ListOptions: github.ListOptions{
			PerPage: 100,
		},
	}

	// walk through the result pages one at a time
	for page <= maxPage {
		opts.Page = page
		result, response, err := client.Search.Repositories(query, opts)
		Wait(response)
		if err != nil {
			log.Fatal("FindRepos:", err)
		}
		maxPage = response.LastPage
		msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
			page, maxPage, len(result.Repositories), *result.Total)
		log.Println(msg)
		for _, repo := range result.Repositories {
			fmt.Println("repo: ", *repo.FullName)
			fmt.Println("owner: ", *repo.Owner.Login)
			time.Sleep(time.Millisecond * 500)
		}
		page++
	}
}

// Wait sleeps until the rate limit window resets when almost no requests remain.
func Wait(response *github.Response) {
	if response != nil && response.Remaining <= REMAINING_THRESHOLD {
		// seconds between now and the reset time reported by the API
		gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
		sleep := gap * time.Second
		if sleep < 0 {
			// reset time is already in the past: sleep for the absolute value
			sleep = -sleep
		}
		time.Sleep(sleep)
	}
}
Now that we have a list of all the Magento related repositories on Github, we can do some interesting stuff. Let’s say we want to get a list of all the Magento repository owners and group them by their geographic location, to get a comprehensive picture of Magento repositories on Github geographically. Here is a way to do that.
Let’s start by pulling in the Github user locations:
for _, repo := range result.Repositories {
	repo_name := *repo.FullName
	username := *repo.Owner.Login
	fmt.Println("repo: ", repo_name)
	fmt.Println("owner: ", username)

	// look up the owner's profile to read their location
	user, response, err := client.Users.Get(username)
	Wait(response)
	if err != nil {
		fmt.Println(err)
	} else {
		if user.Location != nil {
			fmt.Println("location: ", *user.Location)
		} else {
			// no location on the profile, this prints <nil>
			fmt.Println("location: ", user.Location)
		}
	}
	time.Sleep(time.Millisecond * 500)
}
// still inside the outer page loop
page++
That works great, but you run into Github API rate limit issues. To get around that you can create an OAuth app in your application settings page. Note that you can test your rate limit at any time by visiting: https://api.github.com/rate_limit?client_id=CLIENT_ID_HERE&client_secret=CLIENT_SECRET_HERE
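If you would rather read that rate limit response from Go than from a browser, here is a rough sketch of decoding it. It assumes the response has a `resources` object with `core` and `search` entries, each carrying `limit`, `remaining` and `reset` fields. The type names, the `parseRateLimit` helper, and the numbers in `sample` are made up for illustration; a real program would read the body from an HTTP response instead of a constant.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// rateInfo holds the counters for one category of API calls.
type rateInfo struct {
	Limit     int   `json:"limit"`
	Remaining int   `json:"remaining"`
	Reset     int64 `json:"reset"` // Unix timestamp of the next reset
}

// rateLimitResponse models only the parts of the response used here.
type rateLimitResponse struct {
	Resources struct {
		Core   rateInfo `json:"core"`
		Search rateInfo `json:"search"`
	} `json:"resources"`
}

// parseRateLimit decodes the JSON body of a rate limit response.
func parseRateLimit(body []byte) (rateLimitResponse, error) {
	var r rateLimitResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// sample is an illustrative payload, not captured from a real request.
const sample = `{
  "resources": {
    "core":   {"limit": 5000, "remaining": 4990, "reset": 1409227200},
    "search": {"limit": 30,   "remaining": 28,   "reset": 1409223660}
  }
}`

func main() {
	r, err := parseRateLimit([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("core remaining:  ", r.Resources.Core.Remaining)
	fmt.Println("search remaining:", r.Resources.Search.Remaining)
	fmt.Println("core resets at:  ", time.Unix(r.Resources.Core.Reset, 0).UTC())
}
```

The search calls and the user lookups are counted separately, which is why the sketch keeps both entries.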
Here is my example with OAuth authentication. Note I’ve also put in a file writer so we can write everything to “/tmp/repo_locations.csv”.
package main

import (
	"fmt"
	"github.com/google/go-github/github"
	"io"
	"log"
	"math"
	"os"
	"time"
)

const (
	REMAINING_THRESHOLD = 1
)

func main() {
	// send the OAuth app's ID and secret with every request for a higher rate limit
	t := &github.UnauthenticatedRateLimitedTransport{
		ClientID:     "YOUR_CLIENT_ID_GOES_HERE",
		ClientSecret: "YOUR_CLIENT_SECRET_GOES_HERE",
	}
	client := github.NewClient(t.Client())
	fmt.Println("Repos that contain magento and PHP code.")

	page := 1
	maxPage := math.MaxInt32
	query := "magento+language:php"
	opts := &github.SearchOptions{
		Sort: "stars",
		ListOptions: github.ListOptions{
			PerPage: 100,
		},
	}

	filename := "/tmp/repo_locations.csv"
	f, err := os.Create(filename)
	if err != nil {
		// without the output file there is nothing useful to do
		log.Fatal(err)
	}

	for page <= maxPage {
		opts.Page = page
		result, response, err := client.Search.Repositories(query, opts)
		Wait(response)
		if err != nil {
			log.Fatal("FindRepos:", err)
		}
		maxPage = response.LastPage
		msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
			page, maxPage, len(result.Repositories), *result.Total)
		log.Println(msg)
		for _, repo := range result.Repositories {
			repo_name := *repo.FullName
			username := *repo.Owner.Login
			fmt.Println("repo: ", repo_name)
			fmt.Println("owner: ", username)
			user, response, err := client.Users.Get(username)
			Wait(response)
			if err != nil {
				fmt.Println(err)
			} else {
				if user.Location != nil {
					user_location := *user.Location
					fmt.Println("location: ", user_location)
					// one quoted line per repo: username, location, repo name
					n, err := io.WriteString(f, "\""+username+"\",\""+user_location+"\",\""+repo_name+"\"\n")
					if err != nil {
						fmt.Println(n, err)
					}
				}
			}
			time.Sleep(time.Millisecond * 500)
		}
		page++
	}
	f.Close()
}

// Wait sleeps until the rate limit window resets when almost no requests remain.
func Wait(response *github.Response) {
	if response != nil && response.Remaining <= REMAINING_THRESHOLD {
		gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
		sleep := gap * time.Second
		if sleep < 0 {
			sleep = -sleep
		}
		time.Sleep(sleep)
	}
}
If you ran the above program you would find it quitting after producing 1000 records. This is because Github imposes a limit on the results returned by a search API call. The Search API returns only the top 1000 results. You could get around that restriction by slicing your search API query into multiple calls based on the time that the repositories were created.
Here is the final version that gets around the 1000 limit by splitting the query into batches on the created_at times of the repositories:
package main

import (
	"fmt"
	"github.com/google/go-github/github"
	"io"
	"log"
	"math"
	"os"
	"time"
)

const (
	REMAINING_THRESHOLD = 1
)

func main() {
	t := &github.UnauthenticatedRateLimitedTransport{
		ClientID:     "YOUR_CLIENT_ID_GOES_HERE",
		ClientSecret: "YOUR_CLIENT_SECRET_GOES_HERE",
	}
	client := github.NewClient(t.Client())
	fmt.Println("Repos that contain magento and PHP code.")

	// create a file to be used for geocoder
	filename := "/tmp/locations.txt"
	f, err := os.Create(filename)
	if err != nil {
		log.Fatal(err)
	}

	// slice the queries into batches to get around the API limit of 1000
	// (each window starts the day after the previous one ends, so no repo is fetched twice)
	queries := []string{"\"2008-06-01 .. 2012-09-01\"", "\"2012-09-02 .. 2013-04-20\"", "\"2013-04-21 .. 2013-10-20\"", "\"2013-10-21 .. 2014-03-10\"", "\"2014-03-11 .. 2014-07-10\"", "\"2014-07-11 .. 2014-09-30\""}

	for _, q := range queries {
		query := "magento language:PHP created:" + q
		page := 1
		maxPage := math.MaxInt32
		opts := &github.SearchOptions{
			Sort:  "updated",
			Order: "desc",
			ListOptions: github.ListOptions{
				PerPage: 100,
			},
		}
		for page <= maxPage {
			opts.Page = page
			result, response, err := client.Search.Repositories(query, opts)
			Wait(response)
			if err != nil {
				log.Fatal("FindRepos:", err)
			}
			maxPage = response.LastPage
			msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
				page, maxPage, len(result.Repositories), *result.Total)
			log.Println(msg)
			for _, repo := range result.Repositories {
				repo_name := *repo.FullName
				username := *repo.Owner.Login
				created_at := repo.CreatedAt.String()
				fmt.Println("repo: ", repo_name)
				fmt.Println("owner: ", username)
				fmt.Println("created_at: ", created_at)
				user, response, err := client.Users.Get(username)
				Wait(response)
				if err != nil {
					fmt.Println(err)
				} else {
					// fall back to a marker when the profile has no location
					user_location := "not found"
					if user.Location != nil {
						user_location = *user.Location
					}
					n, err := io.WriteString(f, "\""+username+"\",\""+user_location+"\",\""+repo_name+"\",\""+created_at+"\"\n")
					if err != nil {
						fmt.Println(n, err)
					}
				}
				time.Sleep(time.Millisecond * 500)
			}
			page++
		}
	}
	f.Close()
}

// Wait sleeps until the rate limit window resets when almost no requests remain.
func Wait(response *github.Response) {
	if response != nil && response.Remaining <= REMAINING_THRESHOLD {
		gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
		sleep := gap * time.Second
		if sleep < 0 {
			sleep = -sleep
		}
		time.Sleep(sleep)
	}
}
Now we have a nice list of repositories formatted like this:
“username,location,reponame,created_at”
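The program builds each of those lines by gluing strings and quote characters together, which can produce a broken line if a location itself contains a double quote. One alternative, sketched below, is to let Go's standard `encoding/csv` package handle the quoting. This is a swapped-in technique, not what the program in this post does: `encoding/csv` only quotes fields that need it, so the output will not look exactly like the hand-built lines. The `writeRows` helper and the sample row are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRows encodes rows as CSV and returns the result as a string.
// Fields containing commas or quotes are quoted and escaped for us.
func writeRows(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// made-up row in the username, location, reponame, created_at layout
	rows := [][]string{
		{"someuser", "Minneapolis, MN", "someuser/some-repo", "2014-01-02 03:04:05 +0000 UTC"},
	}
	out, err := writeRows(rows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

In the real program you would hand the open file to `csv.NewWriter` instead of a buffer and call `Flush` before closing it.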
Here is a full listing of what the file looks like:
Locations.txt Gist
Map Geocoding the Results with Node.js
Wouldn’t it be nice if we put all the Magento repositories on a world map so we can plot the Github contributions to Magento around the world? Out of the 3650 repos we found, 1193 didn’t have locations listed, so we can use the remaining 2457 and see if we can plot them on a map.
For geocoding the results into a nice map I used a Node.js geocoder from Javier Arce found here: javierarce/node-batch-geocoder.
I will spare you the data massaging that I had to do to get the data in the correct format for Tilebox/Mapbox. Here is the map you’ve been waiting for:
Full source code available on Github here:
https://github.com/tegansnyder/Golang-Magento-Github-Repo-Search
Subtle Golang differences
Since this is my first Golang program I thought I would share some of the syntax and convention differences. This is by no means an exhaustive list, but here are a few that I found:
- Use " double quotes, not ' single quotes, for strings. In Go, single quotes are only for single characters (runes)
- Use the plus + operator to join strings together, not full stops
- No semicolons at the end of lines (Go inserts them for you)
- The compiler doesn’t care about tabbing, although gofmt will indent with tabs for you
- No parentheses around if conditions
- Must use curly brackets on if statements, for loops, etc.
- Every declared variable (and every import) must be used
- Functions can return multiple values
- It’s nil, not null