Contributing to SystemML
There are many ways to become involved with SystemML:
Mailing Lists
Development Mailing List
Perhaps the easiest way to obtain help and contribute to SystemML is to join the SystemML Development mailing list (dev@systemml.apache.org). You can subscribe to this list by sending an email to dev-subscribe@systemml.apache.org. You can unsubscribe from this list by sending an email to dev-unsubscribe@systemml.apache.org. The dev mailing list archive can be found here.
Issues Mailing List
The issues mailing list archive can be found here. To subscribe to the issues list, send an email to issues-subscribe@systemml.apache.org. To unsubscribe from the issues list, send an email to issues-unsubscribe@systemml.apache.org.
Commits Mailing List
The commits mailing list archive can be found here. To subscribe to the issues list, send an email to commits-subscribe@systemml.apache.org. To unsubscribe from the issues list, send an email to commits-unsubscribe@systemml.apache.org.
Issue Tracker
Have you found a bug in SystemML? Have you thought of a way to improve SystemML? Are you interested in working on SystemML itself? If so, the SystemML JIRA Issue Tracker is the place to go.
SystemML on GitHub
Have you found an issue on the SystemML JIRA Issue Tracker that you are interested in working on? If so, add a comment to the issue asking to be assigned the issue. If you don’t hear back in a timely fashion, please contact us on the dev mailing list and we will be happy to help you.
Once you have an issue to work on, how do you go about doing your work? The first thing you need is a GitHub account. Once you have a GitHub account, go to the Apache SystemML GitHub site at https://github.com/apache/systemml and click the Fork button to fork a personal remote copy of the SystemML repository to your GitHub account.
The next step is to clone your SystemML fork to your local machine.
$ git clone https://github.com/YOUR_GITHUB_NAME/systemml.git
Following this, it’s a good idea to set your git user name and email address. In addition, you may want
to set the push.default
property to simple
. You only need to execute these commands once.
$ git config --global user.name "Your Name"
$ git config --global user.email "yourname@yourhost.com"
$ git config --global push.default simple
Next, reference the main SystemML repository as a remote repository. By convention, you can
call this upstream
. You only need to add the remote upstream
repository once.
$ git remote add upstream https://github.com/apache/systemml.git
After this, you should have an origin
repository, which references your personal forked SystemML
repository on GitHub, and the upstream
repository, which references the main SystemML repository
on GitHub.
$ git remote -v
origin https://github.com/YOUR_GITHUB_NAME/systemml.git (fetch)
origin https://github.com/YOUR_GITHUB_NAME/systemml.git (push)
upstream https://github.com/apache/systemml.git (fetch)
upstream https://github.com/apache/systemml.git (push)
The main code branch by convention is the master
branch. You can check out the master
branch
using the checkout
command:
git checkout master
To update this branch with the latest official code, you can pull
from the upstream
master
branch. A pull
essentially does a fetch
(retrieves code) and a merge
(merges latest remote
changes into your local branch):
git pull upstream master
It’s recommended that you create a new, separate branch for your work based on the current master
branch. Give this branch a descriptive name. For example, if you were assigned the issue SYSTEMML-101
,
you could use the checkout -b
command to create a new branch based on the master
branch and
check out this branch:
git checkout -b SYSTEMML-101-my_cool_new_feature
At this point, you are ready to do your work on this branch.
If you updates involve code, you should run the complete test suite to verify that your
updates have not had unexpected side-effects in the project. You can do this via
the Maven verify
command:
mvn clean verify
Your commit messages should follow standard git formatting conventions. If your commit is in regards to a particular JIRA issue, please include a reference to the JIRA issue, such as in the following:
git commit -m "[SYSTEMML-101] My cool new feature"
When ready, push your changes on this branch to your remote GitHub fork:
$ git push
fatal: The current branch SYSTEMML-101-my_cool_new_feature has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin SYSTEMML-101-my_cool_new_feature
$ git push --set-upstream origin SYSTEMML-101-my_cool_new_feature
At this stage, you can go to your GitHub web page and file a Pull Request for the work that you did on this branch. A Pull Request is a request for project committers (who have write access to Apache SystemML) to review your code and integrate your code into the project. Typically, you will see a green button to allow you to file a Pull Request.
Once your Pull Request is opened at SystemML Pull Requests, typically Jenkins will automatically build the project to see if all tests pass when run for your particular branch. These automatic builds can be seen here.
A conversation typically will proceed with regards to your Pull Request. Project committers and potentially others will give you useful feedback and potentially request that some changes be made to your code. In response, you can make the requested updates or explain why you feel that they make sense as they are. If you make additional updates, you can commit the changes and then push the changes to your remote branch. These updates will automatically appear in the pull request.
When your changes are accepted (a committer will write “Looks good to me”, “LGTM”, or something similar), a committer will attempt to incorporate your changes into the SystemML project. Typically this is done by squashing all of your commits into a single commit and then rebasing your changes into the master branch. Rebasing gives a linear commit history to the project.
If the merge in complicated, it is possible that a committer may ask you to resolve any merge conflicts in your pull request. If any difficulties are experienced, a project committer will be more than happy to assist in the integration of your work into the project.
After the Pull Request is closed, a comment can be added to the original JIRA issue referencing the Pull Request, and the issue can be resolved and closed.
Documentation
Documentation is one useful way to become involved with SystemML. SystemML online documentation is generated from markdown using Jekyll. For more information, please see GitHub’s Using Jekyll as a static site generator with GitHub Pages.
After installing Jekyll locally (see Setting up your GitHub Pages site locally with Jekyll),
Jekyll can be run from the docs
folder via:
bundle exec jekyll serve
This allows you to work on the documentation locally at http://127.0.0.1:4000.
You can allow others to preview your documentation updates on GitHub by pushing the
docs
subtree of your branch to your remote gh-pages
branch.
git subtree push --prefix docs origin gh-pages
For instance, if you have filed a Pull Request for a documentation update on a regular branch,
you could additionally push the docs
subtree to the remote gh-pages
branch. In the Pull Request
conversation, you could include a link to the documentation that was automatically generated
when you pushed to gh-pages
. The URL is http://<YOUR_NAME>.github.io/systemml/.
If you experience issues pushing the docs
subtree to the gh-pages
branch because you’ve
previously pushed from a different branch, one simple solution is to delete the remote gh-pages
branch and perform the subtree
command again.
git push origin --delete gh-pages
Code Format
Java Code Format
Java in SystemML should be formatted using a standard format. The “SystemML Format” at
dev/code-style/systemml-style-eclipse.xml
can be imported into Eclipse and
dev/code-style/systemml-style-intellij.xml
can be imported into IntelliJ for this purpose.
No trailing whitespace is preferred. IDEs such as Eclipse and IntelliJ can be configured for this option.
DML Code Format
DML in SystemML should be formatted according to a standard format. Indentation in DML files should be two spaces.