In this post, I urge Ruby gem maintainers to be aware of the files they distribute with their gems. If your gem includes files which you have not consciously included for a specific reason, please consider removing these files from your gem.
This call to action extends beyond Ruby gems, and even beyond the Ruby language. If you are packaging software of any kind, whether in source code or binary form, please ask yourself:
- What files are included in my software package?
- What is the reason that <file X> is in my software package?
It is easy to fall into the trap of blacklisting files from a package. This, in my opinion, is the wrong approach to packaging software.
I don't want test files in my package, so I will manually exclude them.
Software packages should instead have a whitelist of components, not the other way around.
I do want these specific files in my package, so I will add them to the whitelist.
This model applies to software in any software ecosystem. Provide what you need. No more, no less.
What’s in a Ruby gem?
Ruby gems can live in many places on your system. Figuring out where your gems are located is an exercise left to the reader. For me, my gems are in $GEM_HOME, with packaged gem files in $GEM_HOME/cache and unpacked gem directories in $GEM_HOME/gems.
$ echo $GEM_HOME
/home/tom/.gem/ruby/2.6.0
# Show cached .gem files
$ ls $GEM_HOME/cache
actioncable-5.2.3.gem
actionmailer-5.2.3.gem
actionpack-5.2.3.gem
...
# Show unpacked gem directories
$ ls $GEM_HOME/gems
actioncable-5.2.3
actionmailer-5.2.3
actionpack-5.2.3
...
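If you would rather ask RubyGems itself where things live, a small sketch like the following may help. It relies on Gem.dir and Gem.path, and it assumes the conventional cache and gems subdirectories shown above; your setup may differ.

```ruby
require "rubygems"

# Gem.dir is the directory RubyGems installs into by default
# (it follows GEM_HOME when that variable is set).
gem_dir   = Gem.dir
cache_dir = File.join(gem_dir, "cache") # packaged .gem files
gems_dir  = File.join(gem_dir, "gems")  # unpacked gem directories

puts "Gem home:      #{gem_dir}"
puts "Packaged gems: #{cache_dir}"
puts "Unpacked gems: #{gems_dir}"

# Every directory RubyGems searches for installed gems.
puts "Search path:   #{Gem.path.join(File::PATH_SEPARATOR)}"
```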
Let’s take a look inside one of the unpacked gems to see what files it has.
$ cd $GEM_HOME/gems
$ tree rails-5.2.3/
rails-5.2.3/
└── README.md
0 directories, 1 file
Yep, you read that right. The rails gem includes literally one file: the README.md. Nothing else. Wow, what a lightweight web framework! 😬
…Of course, we know that the purpose of the rails gem is to be a “dependency package” of sorts, merely pointing to the other important gems it uses to get work done (e.g. activerecord, railties).
Here’s a more illustrative example, one where the gem actually contains some useful and interesting files.
$ tree parallel-1.17.0/
parallel-1.17.0/
├── lib
│ ├── parallel
│ │ ├── processor_count.rb
│ │ └── version.rb
│ └── parallel.rb
└── MIT-LICENSE.txt
2 directories, 4 files
We can see that the parallel gem includes code and a license. That’s it.
Both of the above gems are good examples of how to package things appropriately. These gem maintainers chose to include exactly what was required for their gem, and nothing more.
What should be in a Ruby gem?
Warning! Opinion incoming!
Please consider each of these things carefully for your gem. Some gems should include more files than others. Some gems don't include any code at all, as we saw before with the rails gem.
Code
The gem implementation in lib/. This is where work gets done. I don’t think this requires much of an explanation: your implementation should obviously be included with your gem.
README
Including a README with your gem is acceptable, because it helps inform developers of your gem’s purpose. You might even consider including multiple README files in different languages.
LICENSE
I am not a lawyer, but it is probably a good idea to distribute the full text of your software license along with the source code.
Example code (optional)
If you’re feeling generous, offline users will appreciate a few short examples of some common ways to use your gem.
What should not be in a gem?
Gemspec
You don’t need to package your gem’s specification file (.gemspec). The gemspec is automatically included with your gem archive via the metadata.gz file. Have a look inside one of your packaged .gem files to see for yourself. Hint: a .gem is really just a .tar archive with a different extension.
Test / spec files
People who acquire your gem via gem install or bundle install are generally not going to figure out your testing framework and run your tests. Many years ago, test_files and gem --test were a thing, but this is no longer the case.
Other development files
Anything having to do with your development process or CI pipeline probably does not belong in your gem. For example, .travis.yml or .gitignore. People who will make changes to your gem are going to contribute their changes upstream by cloning the gem’s repository. They may hack on your unpacked gem to begin with, but development files are not necessary at that stage.
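To check the hint about .gem files being tar archives, here is a minimal sketch that lists the top-level entries of a packaged gem using the tar reader that ships with RubyGems. The helper name gem_archive_members is made up for this example, and the path in the usage comment is only a placeholder.

```ruby
require "rubygems/package"

# Hypothetical helper: returns the names of the top-level entries
# in a .gem archive. A .gem file is a plain tar archive, so the tar
# reader bundled with RubyGems can walk it directly.
def gem_archive_members(gem_path)
  File.open(gem_path, "rb") do |io|
    reader = Gem::Package::TarReader.new(io)
    reader.map(&:full_name)
  end
end

# Usage (placeholder path, adjust to a gem you actually have cached):
#   puts gem_archive_members(File.join(Gem.dir, "cache", "parallel-1.17.0.gem"))
```

For a real gem you should see entries such as metadata.gz, which holds the specification, and data.tar.gz, which holds the packaged files.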
I didn’t put those files in my gem, how did they get there?
The answer is usually in your .gemspec. Let’s take a look at what’s happening there and provide some recommendations.
The default gem skeleton generated by $ bundle gem <gem_name> looks like this:
spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
This is the line where we specify which files are included in the gem. It is evaluated at gem build time. This default line of code does the following things:
- Lists all files checked into the git repository, separated by NUL bytes
- Splits the list of files into an array
- Removes files from the array which are in a top-level test, spec, or features directory
Because this blacklist-based approach only excludes a few directories, everything else in the repository will make its way into the gem. This is not what we want.
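To see the effect without touching a real repository, the sketch below runs the same split-and-reject chain over a made-up file listing. The file names are hypothetical and simply stand in for the output of git ls-files -z.

```ruby
# Hypothetical output of `git ls-files -z`: paths separated by NUL bytes.
git_output = [
  ".gitignore",
  ".travis.yml",
  "Gemfile",
  "README.md",
  "Rakefile",
  "bin/console",
  "example.gemspec",
  "lib/example.rb",
  "spec/example_spec.rb",
  "test/example_test.rb"
].join("\x0")

# The same split-and-reject chain used by the default gemspec.
files = git_output.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }

puts files
```

Only the two files under spec/ and test/ are dropped. The CI config, the .gitignore, the Gemfile, the Rakefile, bin/console, and the gemspec itself all remain in the list.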
How do I clean up my gem?
There are a few different methods for eliminating unwanted files in your .gemspec. The best solution for your project might be some combination of these methods.
- Use git ls-files with arguments.
spec.files = `git ls-files -z lib *.md *.txt`.split("\x0")
- Use Ruby’s grep to whitelist files instead of reject to blacklist them. Anchor the pattern so that it only matches the paths you intend.
spec.files = `git ls-files -z`.split("\x0").grep(%r{\A(?:lib/|README\.md\z)})
- Use extra_rdoc_files to add documentation files to your gem.
spec.extra_rdoc_files = %w[README.md LICENSE]
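When whitelisting with grep, the shape of the pattern matters. The sketch below compares a loose pattern with an anchored one over a made-up file listing; the file names and the exact patterns are assumptions for illustration, not a recommendation for every gem.

```ruby
# Hypothetical repository listing, standing in for
# `git ls-files -z`.split("\x0").
tracked = %w[
  .travis.yml
  LICENSE.txt
  README.md
  docs/README.md
  lib/example.rb
  lib/example/version.rb
  test/lib_helper.rb
]

# Unanchored: matches "lib" or "README.md" anywhere in the path,
# so test/lib_helper.rb and docs/README.md slip through.
loose = tracked.grep(%r{lib|README\.md})

# Anchored: only paths under lib/ plus two specific top-level files.
strict = tracked.grep(%r{\A(?:lib/|README\.md\z|LICENSE\.txt\z)})

puts "loose:  #{loose.join(', ')}"
puts "strict: #{strict.join(', ')}"
```

The anchored version keeps the license, the top-level README, and everything under lib/, and nothing else.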
How do I ensure that my gem stays junk-free?
I recommend regular inspection of your gem’s contents by adding a step to your CI process. For example:
# Clean and rebuild the gem
rm -rf pkg
bin/rake build
cd pkg
# Unpack the built gem
tar xvf *.gem
# List files contained in the gem
tar tf data.tar.gz
Take a look at an example GitHub Actions configuration that displays the contents of the gem during the final step in the workflow.
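Listing the files is a good start, but a CI step could also fail when something unexpected shows up. Here is a rough sketch of that check. The ALLOWED patterns and the unexpected_files helper are assumptions made for this example; the sample list stands in for the output of tar tf data.tar.gz.

```ruby
# Patterns for files we expect to ship. These are illustrative
# assumptions; adjust them to match your own gem.
ALLOWED = [
  %r{\Alib/},
  %r{\AREADME(?:\.\w+)?\z},
  %r{\ALICENSE(?:\.\w+)?\z}
].freeze

# Hypothetical helper: returns every packaged path that no allowed
# pattern matches.
def unexpected_files(packaged, allowed = ALLOWED)
  packaged.reject { |path| allowed.any? { |pattern| pattern.match?(path) } }
end

# Sample input. In CI this list could come from `tar tf data.tar.gz`,
# one path per line.
packaged = %w[lib/example.rb README.md LICENSE.txt test/example_test.rb .travis.yml]

offenders = unexpected_files(packaged)
if offenders.empty?
  puts "Gem contents look clean."
else
  puts "Unexpected files in gem:"
  offenders.each { |path| puts "  #{path}" }
end
```

In a real pipeline you would exit with a non-zero status when offenders is not empty, so that the build fails before the gem is published.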
The state of a modern Rails project
Let’s inspect a vanilla Rails project and see what kinds of junk it gives us.
Starting from a clean slate, no gems on the system at all, we install Rails 5.2.3 and create a new Rails project. This results in the download of 73 (!) separate gems.
$ gem install rails -v 5.2.3
...
40 gems installed
$ rails new foobar
...
Bundle complete! 14 Gemfile dependencies, 71 gems now installed.
$ ls -1 $GEM_HOME/cache | wc -l
73
This high number of dependencies feels a little bit uncomfortable, but not too bad overall. Things could be a lot worse, say, if we were using some other unnamed language and its package management facility.
…you know the one. 😠
Is there a tool that can help me find these junk files?
There is now! 🎉
https://github.com/t-richards/show-gem-junk
The 5-second explanation goes like this:
# 1. Install gem
gem install show-gem-junk
# 2. Set your GEM_HOME environment variable. For example:
export GEM_HOME=/home/tom/.gem/ruby/2.6.0
# 3. Run the tool!
show-gem-junk
If everything goes well, you should see some output like this:
Gem Name              Version  Size     Junk       % Junk
crass                 1.0.4    668 KB   556 KB     83.26
tzinfo                1.2.5    987 KB   613 KB     62.05
rubyzip               1.2.3    676 KB   393 KB     58.15
xpath                 3.2.0    82.5 KB  36.2 KB    43.93
tilt                  2.0.9    272 KB   113 KB     41.49
addressable           2.6.0    586 KB   243 KB     41.41
chromedriver-helper   2.1.1    151 KB   59.7 KB    39.65
rack                  2.0.7    1.04 MB  422 KB     39.55
minitest              5.11.3   305 KB   102 KB     33.4
rails-html-sanitizer  1.0.4    87.4 KB  25 KB      28.63
regexp_parser         1.5.1    600 KB   163 KB     27.14
erubi                 1.8.0    62.2 KB  16.3 KB    26.26
jbuilder              2.9.1    148 KB   37.1 KB    25.05
loofah                2.2.3    338 KB   73.3 KB    21.67
builder               3.2.3    145 KB   30.8 KB    21.27
archive-zip           0.12.0   550 KB   105 KB     19.11
mini_portile2         2.4.0    151 KB   27.3 KB    18.11
childprocess          1.0.1    182 KB   30.2 KB    16.58
method_source         0.9.2    66 KB    9.8 KB     14.86
rails-dom-testing     2.0.3    96 KB    11.9 KB    12.39
public_suffix         3.1.0    379 KB   45.7 KB    12.03
capybara              3.24.0   1.74 MB  194 KB     10.88
rack-proxy            0.6.5    63.7 KB  6.73 KB    10.57
pg                    1.1.4    1.74 MB  187 KB     10.49
rb-inotify            0.10.0   70.3 KB  6.37 KB    9.07
thread_safe           0.3.6    714 KB   50.5 KB    7.08
msgpack               1.3.0    1.65 MB  102 KB     6.01
nio4r                 2.3.1    826 KB   34 KB      4.12
bindex                0.7.0    149 KB   5.02 KB    3.38
webpacker             4.0.7    948 KB   26.6 KB    2.81
mimemagic             0.3.3    2.41 MB  63.5 KB    2.57
bootsnap              1.4.4    236 KB   1.78 KB    0.75
ruby_dep              1.5.0    46.5 KB  330 Bytes  0.69
rake                  12.3.2   332 KB   1.96 KB    0.59
rb-fsevent            0.10.3   192 KB   1.11 KB    0.58
mini_mime             1.0.1    264 KB   1.48 KB    0.56
thor                  0.20.3   250 KB   794 Bytes  0.31
selenium-webdriver    3.142.3  1.48 MB  2 KB       0.13
ffi                   1.11.1   5.42 MB  6.45 KB    0.12
Grand total size of junk: 3.71 MB
This, by the way, is an analysis of our 73 Rails gems from earlier. If every one of these gems eliminated its junk files, we could trim a vanilla Rails install by 3.7 MB!
Let’s pick on the worst offender for a moment. It looks like the crass gem is 83% junk! What’s in there?
$ show-gem-junk -v
Gem Name: crass
Version: 1.0.4
Path: /home/tom/.gem/ruby/2.6.0/gems/crass-1.0.4
Size: 668 KB
Junk: 556 KB
% Junk: 83.26
Junk File Size
crass-1.0.4/test/css-parsing-tests/color3_hsl.json 199 KB
crass-1.0.4/test/support/serialization/bootstrap.css 117 KB
crass-1.0.4/test/support/serialization/animate.css 71.2 KB
crass-1.0.4/test/support/serialization/pure.css 34.8 KB
crass-1.0.4/test/css-parsing-tests/color3_keywords.json 22.8 KB
crass-1.0.4/test/shared/parse_rules.rb 17.1 KB
crass-1.0.4/test/support/serialization/bootstrap-theme.css 16.4 KB
crass-1.0.4/test/css-parsing-tests/component_value_list.json 14 KB
crass-1.0.4/test/test_parse_properties.rb 12.1 KB
crass-1.0.4/test/css-parsing-tests/README.rst 9.15 KB
crass-1.0.4/test/css-parsing-tests/make_color3_keywords.py 6.73 KB
crass-1.0.4/test/support/serialization/html5-boilerplate.css 5.25 KB
crass-1.0.4/test/css-parsing-tests/stylesheet_bytes.json 4.94 KB
crass-1.0.4/test/css-parsing-tests/color3.json 3.82 KB
crass-1.0.4/test/test_css_parsing_tests.rb 3.62 KB
crass-1.0.4/test/support/common.rb 3.29 KB
crass-1.0.4/test/css-parsing-tests/An+B.json 2.23 KB
crass-1.0.4/test/test_serialization.rb 1.71 KB
crass-1.0.4/test/css-parsing-tests/one_declaration.json 1.52 KB
crass-1.0.4/test/css-parsing-tests/rule_list.json 1.31 KB
crass-1.0.4/test/css-parsing-tests/stylesheet.json 1.31 KB
crass-1.0.4/test/css-parsing-tests/declaration_list.json 1.17 KB
crass-1.0.4/test/css-parsing-tests/one_rule.json 1.01 KB
crass-1.0.4/test/test_crass.rb 864 Bytes
crass-1.0.4/crass.gemspec 815 Bytes
crass-1.0.4/test/css-parsing-tests/one_component_value.json 657 Bytes
crass-1.0.4/test/css-parsing-tests/make_color3_hsl.py 624 Bytes
crass-1.0.4/test/css-parsing-tests/LICENSE 326 Bytes
crass-1.0.4/test/test_parse_stylesheet.rb 313 Bytes
crass-1.0.4/test/test_parse_rules.rb 303 Bytes
crass-1.0.4/test/support/serialization/misc.css 167 Bytes
crass-1.0.4/.travis.yml 84 Bytes
crass-1.0.4/.gitignore 42 Bytes
Holy cow! That is 556 KB of junk, nearly all of it CSS test fixtures.
On the GitHub page for show-gem-junk, you’ll find information on what this tool considers to be a “junk” file, as well as usage details.
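For a rough feel of how a figure like “% Junk” could be derived, here is a small sketch. It is not the tool's actual logic: the JUNK_PATTERNS list, the junk_summary helper, and the sample sizes are all assumptions made for illustration.

```ruby
# Illustrative patterns only. The real rules used by show-gem-junk
# are documented on its GitHub page.
JUNK_PATTERNS = [
  %r{\A(?:test|spec|features)/},
  %r{\.gemspec\z},
  %r{\A\.(?:travis\.yml|gitignore)\z}
].freeze

# Hypothetical helper.
# sizes: Hash of path (relative to the gem root) => size in bytes.
# Returns [junk_bytes, total_bytes, percent_junk].
def junk_summary(sizes, patterns = JUNK_PATTERNS)
  total = sizes.values.sum
  junk  = sizes.sum { |path, bytes| patterns.any? { |p| p.match?(path) } ? bytes : 0 }
  percent = total.zero? ? 0.0 : (junk * 100.0 / total).round(2)
  [junk, total, percent]
end

# Made-up sizes for a small example gem.
sizes = {
  "lib/example.rb"       => 6_000,
  "README.md"            => 2_000,
  "test/example_test.rb" => 1_500,
  "example.gemspec"      => 400,
  ".travis.yml"          => 100
}

junk, total, percent = junk_summary(sizes)
puts "#{junk} of #{total} bytes are junk (#{percent}%)"
```

With these made-up numbers, 2000 of 10000 bytes match a junk pattern, which works out to 20.0%.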
Key Takeaways
- Gem users: Have a look inside your gems every once in a while. Sometimes you can find interesting things in there.
- Open source contributors: Please consider making a pull request to your favorite project and help them clean up the junk!
- Gem maintainers: Please don’t make your users download junk.
TL;DR:
Today's Ruby gems include many unnecessary files. Even with a modest number of dependencies, these files can add up, especially across multiple downloaded versions. Don't make your users download junk. Package only what you use, and omit the rest.