Using Git Submodules for Private Content
My website has been open source for as long as it has existed. Originally, it was a WordPress site, but only the layout was out there for everyone to see since the data was saved in a database. Once I moved to Gatsby, I kept all the images and posts in a `content` directory. This was way better, as my content is all conveniently stored in one easy-to-save folder and the posts are all in beautiful markdown.
However, people often like my layout and want to use it, so they clone and deploy this site. Sometimes they will just leave up all the posts and images and update the name and image. Although I subscribe to the Zenhabits Uncopyright philosophy towards content - my content is out there for the world to see and do what they want with it, and it doesn't bother me - I don't think I should make it quite so easy to just clone everything I've written in a moment. If you're going to plagiarize, you should at least have to do a bit of work.
So I decided to store my content in a private git submodule. If you go to the repo for this site now, you'll see a folder that looks like `content @ <hash>`. If you click on it, you'll be taken to a 404 page. If I click on it, I'll be taken to a separate, private repo that contains all my images and posts.
A lot of people have asked me how to use private git submodules, so I'll go over it here. Note that this is not a deep-dive into submodules, but just the basics of adding, updating, and cloning a repo with submodules.
Git Submodules
Git submodules allow you to keep a git repository as a subdirectory of another git repository.
This could be useful if you have a lot of projects within a project. One example of this is the Dracula code theme repo. Every folder is a git submodule. This allows people to add a new theme for a new program by creating their own repo, and the owner of the parent repository only needs to reference the child repos. You can tell they're all submodules because of the `@ <hash>` after each subdirectory name.
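Before doing anything with submodules, I would recommend running this command to set `submodule.recurse` to true in your global config, which lets commands such as `git pull` and `git checkout` update submodules automatically. Note that `git clone` does not read this setting, so a fresh clone still needs the `--recurse-submodules` flag.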
git config --global submodule.recurse true
Command | Description |
---|---|
`git submodule add <repo>` | Add a submodule within a repository |
`git submodule update` | Update existing submodules within a repository (add `--remote` to pull from a remote location) |
`git submodule init` | Initialize local submodules file (only necessary if repo not cloned with `--recurse-submodules`) |
Adding a submodule
Let's imagine that you want a public blog, located in the `blog` repo, to contain a submodule with all the posts, located in the `posts` repo. So it will look like:
- A public repo at `github.com/you/blog`
- A private repo at `github.com/you/posts`
I'm just using GitHub as an example; it doesn't matter where the repo is hosted. Submodules also work with both private and public repos.
First you can `add` the submodule. From the root of `blog`, you would run this command.
git submodule add https://github.com/you/posts
This would clone the `posts` repo into a folder in `blog`.
Cloning into '/Users/you/blog/posts'...
You will now have two new entries in the `blog` repo: a `.gitmodules` file and the new `posts` subdirectory.
git status
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitmodules
new file: posts
`.gitmodules` will look like this:
[submodule "posts"]
path = posts
url = https://github.com/you/posts
At this point, you have a reference to the `posts` repo as a submodule, so the directory structure will look like this:
.git
.gitmodules
posts/
As a note, if you `cd` into `.git`, you'll see a `modules` directory. This will contain a folder called `posts`, and this is where git is storing references and other data about your submodules.
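If you'd like to try the steps above without touching GitHub, here's a minimal sandbox sketch. It assumes two throwaway local repos standing in for `blog` and `posts`; the temp directory, the placeholder identity, and the `hello.md` file are all made up for illustration. Recent versions of Git refuse to add a submodule from a local path unless the file transport is allowed, which is what the `-c protocol.file.allow=always` part is for. You wouldn't need it with a real HTTPS or SSH URL.

```shell
# Sandbox sketch: two throwaway local repos stand in for
# github.com/you/blog and github.com/you/posts (hypothetical paths).
sandbox="$(mktemp -d)"

# Placeholder identity, used only for the throwaway commit below.
export GIT_AUTHOR_NAME="You" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="You" GIT_COMMITTER_EMAIL="you@example.com"

# The "posts" repo, containing a single post.
git init -q "$sandbox/posts"
echo "# Hello" > "$sandbox/posts/hello.md"
git -C "$sandbox/posts" add hello.md
git -C "$sandbox/posts" commit -q -m "Add first post"

# The "blog" repo that will hold posts as a submodule.
git init -q "$sandbox/blog"
cd "$sandbox/blog"

# Allow the local file transport for this one command only.
git -c protocol.file.allow=always submodule add "$sandbox/posts" posts

cat .gitmodules      # shows the [submodule "posts"] entry
git status --short   # lists .gitmodules and posts as staged additions
```

Everything lives under `$sandbox`, so you can delete that directory when you're done experimenting.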
Updating a submodule
To update submodule content, you'll pull in any changes made to the remote submodule repo with the `update` command. Since you would be updating content from a remote location, you'll add the `--remote` flag. From the root of the `blog` repo, you would run the command:
git submodule update --remote
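If the parent repo has more than one submodule, you can limit the update to a single path and then check what changed. Here's a small sketch of that; `update_submodule` is a hypothetical helper name of mine, not a git command, and it assumes you run it from the root of the parent repo.

```shell
# Sketch: update one submodule from its remote, then show its status.
# update_submodule is a hypothetical helper; run it from the parent repo root.
update_submodule() {
  # $1 = submodule path, e.g. posts
  git submodule update --remote -- "$1" &&
  # A leading "+" in the status line means the checked-out commit differs
  # from the one currently recorded in the parent repo's index.
  git submodule status -- "$1"
}

# Usage, from the root of blog:
#   update_submodule posts
```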
It's important to note that when working with submodules, you shouldn't work on or commit your local version of the submodule repo. If you made any changes locally, your version would now be out-of-sync with the submodule repo.
You just want to treat a submodule as an entirely separate repo, but linked. This is much like code found in `node_modules` for a Node project, where the references to the projects are listed in `package.json` and you know any local changes you make to a dependency in `node_modules` will not be persisted.
Modified content
If you make changes locally inside the submodule directory and run `git status`, you will see `(modified content)` next to the modified submodule.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
(commit or discard the untracked or modified content in submodules)
modified: posts (modified content)
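If you end up in this state by accident, one way out is to throw away the local edits inside the submodule. This is only a sketch, and it is destructive: `discard_submodule_changes` is a hypothetical helper that resets tracked files and deletes untracked ones in the submodule's working directory, so only use something like it when you're sure nothing there is worth keeping.

```shell
# Sketch: discard local, uncommitted changes inside a submodule.
# discard_submodule_changes is a hypothetical helper. It is destructive.
discard_submodule_changes() {
  # $1 = submodule path, e.g. posts
  git -C "$1" reset --hard &&   # drop edits to tracked files
  git -C "$1" clean -fd         # remove untracked files and directories
}

# Usage, from the root of blog:
#   discard_submodule_changes posts
```

After this, `git status` in the parent repo should no longer report `(modified content)` for that submodule.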
New commits
If you commit changes in the submodule repo, pull those commits in with `git submodule update --remote`, and run `git status`, you will see `(new commits)` next to the modified submodule.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: posts (new commits)
If you see `(new commits)`, you can commit those changes on the parent repo. You can also check this by viewing the diff.
git diff
diff --git a/posts b/posts
index abc..def 160000
--- a/posts
+++ b/posts
@@ -1 +1 @@
-Subproject commit abc...
+Subproject commit def...
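Committing those changes on the parent repo works like committing any other path: you stage the submodule directory and commit. Here's a minimal sketch, where `record_submodule` is a hypothetical helper name and the commit message is just a placeholder.

```shell
# Sketch: record the submodule's new commit in the parent repo.
# record_submodule is a hypothetical helper; run it from the parent repo root.
record_submodule() {
  # $1 = submodule path, e.g. posts
  git add "$1" &&                       # stage the new "Subproject commit"
  git commit -m "Update $1 submodule"   # placeholder commit message
}

# Usage, from the root of blog:
#   record_submodule posts
```

The parent repo only stores the commit hash the submodule points to, which is why the diff above is a single line.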
Cloning a repo with submodules
If you clone an existing repository and it has submodules within it, you'll have to `init` and `update` to pull in all the submodule content.
git clone https://github.com/you/blog
cd blog && git submodule init && git submodule update
You can bypass this by using the `--recurse-submodules` flag. (The `submodule.recurse` setting covers later commands such as `git pull`, but `git clone` does not read it.)
git clone --recurse-submodules https://github.com/you/blog
This will clone the directory along with all submodule content.
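If you've already cloned without the flag, the separate `init` and `update` steps can be combined into one command. A small sketch, with `init_submodules` as a hypothetical helper name:

```shell
# Sketch: fetch submodule content for a repo cloned without
# --recurse-submodules. init_submodules is a hypothetical helper.
init_submodules() {
  # $1 = path to the already-cloned parent repo, e.g. blog
  # --init registers submodules that are not initialized yet;
  # --recursive also handles submodules nested inside submodules.
  git -C "$1" submodule update --init --recursive
}

# Usage:
#   init_submodules blog
```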
Deployment
For my site, I've made the `content` submodule private. If you have a Netlify site and want to know what to do to allow Netlify to pull from the private repo, here is an overview of the steps.
- Generate a deploy key from Netlify.
- Add the key as a read-only deploy key on the settings for your private repo (found at `github.com/you/repo/settings/keys`).
- Netlify will now have permissions to fetch the submodules that it reads from your `.gitmodules` file.
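One thing to watch for with the steps above: GitHub deploy keys are SSH keys, so I'd expect the submodule URL in `.gitmodules` to need the SSH form rather than the HTTPS one used earlier in this post. That is an assumption on my part, so check your host's documentation. Here's a sketch of switching the URL; `set_submodule_url` is a hypothetical helper, the URL in the usage line is a placeholder, and it assumes the submodule's name and path are the same, as they are for `posts` here.

```shell
# Sketch: change a submodule's URL, e.g. from HTTPS to SSH.
# set_submodule_url is a hypothetical helper; run it from the parent repo
# root. Assumes the submodule's name matches its path (as with posts).
set_submodule_url() {
  # $1 = submodule name/path, $2 = new URL
  git config -f .gitmodules "submodule.$1.url" "$2" &&
  # Copy the new URL from .gitmodules into the local clone's configuration.
  git submodule sync -- "$1"
}

# Usage, from the root of blog (placeholder URL):
#   set_submodule_url posts git@github.com:you/posts.git
```

You would then commit the changed `.gitmodules` file so the deploy picks up the new URL.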
Summary
Here are the main points from the article:
- Submodules are used when a subdirectory in a repo should consist of all the data from another repo.
- You can add a submodule to a project with `git submodule add <submodule-repo>`.
- You can update submodules within a project with `git submodule update --remote`.
- You should clone a project that has submodules with `--recurse-submodules`, and set `submodule.recurse` in your config so that later commands such as `git pull` update submodules by default.
- You should not work on any submodule files directly within the parent repo. The submodule directory should be treated only as a reference to another existing repo.
My current process for updating the site looks like:
- Make changes to `content` repo.
- Commit changes to `content` and push to private submodule hosted on GitHub: `git commit && git push`.
- Pull new updates into local `taniarascia.com` repo: `git submodule update --remote`.
- Commit the new submodule changes and push to public GitHub repo: `git commit && git push`.
- Netlify deploys the new site.