Clean Up Your Gems

by Tom Richards

Sunday, Jun 23, 2019

In this post, I urge Ruby gem maintainers to be aware of the files they distribute with their gems. If your gem includes files which you have not consciously included for a specific reason, please consider removing these files from your gem.

This call to action extends beyond Ruby gems, and even extends beyond the Ruby language. If you are packaging software of any kind, whether in source code or binary form, please ask yourself,

  1. What files are included in my software package?
  2. What is the reason that <file X> is in my software package?

It is easy to fall into the trap of blacklisting files from a package. This, in my opinion, is the wrong approach to packaging software.

“I don't want test files in my package, so I will manually exclude them.”

Software packages should instead be assembled from a whitelist of components, so that nothing ships unless it was explicitly chosen.

“I do want these specific files in my package, so I will add them to the whitelist.”

This model applies to software in any software ecosystem. Provide what you need. No more, no less.

What’s in a Ruby gem?

Ruby gems can live in many places on your system. Figuring out where your gems are located is an exercise left to the reader. Mine are in $GEM_HOME, with packaged gem files in $GEM_HOME/cache and unpacked gem directories in $GEM_HOME/gems.

$ echo $GEM_HOME
/home/tom/.gem/ruby/2.6.0

# Show cached .gem files
$ ls $GEM_HOME/cache
actioncable-5.2.3.gem
actionmailer-5.2.3.gem
actionpack-5.2.3.gem
...

# Show unpacked gem directories
$ ls $GEM_HOME/gems
actioncable-5.2.3
actionmailer-5.2.3
actionpack-5.2.3
...
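
If you would rather ask Ruby than poke around the filesystem, RubyGems can report its install directory itself. This is a minimal sketch, assuming the layout shown above (cache/ and gems/ under the gem home). The gem_locations helper is a hypothetical name used for illustration, not part of RubyGems.

```ruby
require "rubygems"

# Hypothetical helper: derive the cache/ and gems/ directories from a gem home.
# Gem.dir is the directory RubyGems installs into (GEM_HOME when it is set).
def gem_locations(home = Gem.dir)
  {
    home:  home,
    cache: File.join(home, "cache"), # packaged .gem files
    gems:  File.join(home, "gems")   # unpacked gem directories
  }
end

gem_locations.each { |name, dir| puts "#{name}: #{dir}" }
```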

Let’s take a look inside one of the unpacked gems to see what files it has.

$ cd $GEM_HOME/gems
$ tree rails-5.2.3/
rails-5.2.3/
└── README.md

0 directories, 1 file

Yep, you read that right. The rails gem includes literally one file: the README.md. Nothing else. Wow, what a lightweight web framework! 😬

…Of course, we know that the purpose of the rails gem is to be a “dependency package” of sorts, merely pointing to the other important gems it uses to get work done (e.g. activerecord, railties).

Here’s a more illustrative example: one where the gem actually contains some useful and interesting files.

$ tree parallel-1.17.0/
parallel-1.17.0/
├── lib
│   ├── parallel
│   │   ├── processor_count.rb
│   │   └── version.rb
│   └── parallel.rb
└── MIT-LICENSE.txt

2 directories, 4 files

We can see that the parallel gem includes code and a license. That’s it.

Both of the above gems are good examples of how to package things appropriately. These gem maintainers chose to include exactly what was required for their gem, and nothing more.

What should be in a Ruby gem?

Warning! Opinion incoming!

Please consider each of these things carefully for your gem. Some gems should include more files than others. Some gems don't include any code at all, as we saw before with the rails gem.

  1. Code

    The gem implementation in lib/. This is where work gets done. I don’t think this requires much of an explanation - your implementation should obviously be included with your gem.

  2. README

    Including a README with your gem is acceptable, because it helps inform developers of your gem’s purpose. You might even consider including multiple README files in different languages.

  3. LICENSE

    I am not a lawyer, but it is probably a good idea to distribute the full text of your software license along with the source code.

  4. Example code (optional)

    If you’re feeling generous, offline users will appreciate a few short examples of some common ways to use your gem.

What should not be in a gem?

  1. Gemspec

    You don’t need to package your gem’s specification file (.gemspec). The specification is automatically included in your gem archive as the metadata.gz file. Have a look inside one of your packaged .gem files to see for yourself. Hint: a .gem is really just a .tar archive with a different extension.

  2. Test / spec files

    People who acquire your gem via gem install or bundle install are generally not going to figure out your testing framework and run your tests. Many years ago, test_files and gem --test were a thing, but this is no longer the case.

  3. Other development files

    Anything having to do with your development process or CI pipeline probably does not belong in your gem: for example, .travis.yml or .gitignore. People who will make changes to your gem are going to contribute their changes upstream by cloning the gem’s repository. They may hack on your unpacked gem to begin with, but development files are not necessary at that stage.
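
To check the point about the gemspec for yourself without reaching for tar, RubyGems ships the classes needed to read a .gem file. The following is a minimal sketch, not a definitive tool: outer_entries and packaged_files are hypothetical helper names, and the path in the usage comment is only an example.

```ruby
require "rubygems/package"

# Hypothetical helper: list the entries of the outer tar archive,
# which is where metadata.gz and data.tar.gz live.
def outer_entries(gem_path)
  File.open(gem_path, "rb") do |io|
    Gem::Package::TarReader.new(io).map(&:full_name)
  end
end

# Hypothetical helper: read the embedded specification and list the
# files that were actually packaged inside data.tar.gz.
def packaged_files(gem_path)
  package = Gem::Package.new(gem_path)
  { name: package.spec.name, files: package.contents }
end

# Example usage (the path is an assumption; point it at any cached .gem):
#   puts outer_entries("parallel-1.17.0.gem")
#   puts packaged_files("parallel-1.17.0.gem")[:files]
```

Because the specification is read back from the archive itself, a .gemspec file listed among the packaged files is a redundant copy.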

I didn’t put those files in my gem, how did they get there?

Let’s take a look at what’s happening in your .gemspec, and then go over some recommendations.

The default gem skeleton generated by $ bundle gem <gem_name> looks like this:

spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }

This is the line where we specify which files are included in the gem. It is evaluated at gem build time. This default line of code does the following things:

  • Lists all files checked into the git repository, separated by NUL bytes
  • Splits the list of files into an array
  • Removes files from the array which are in a test, spec, or features directory

Because this blacklist-based approach only excludes a few directories, everything else in the repository will make its way into the gem. This is not what we want.
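
To make that concrete, here is the same reject filter applied to a made-up file list. The repository contents are a hypothetical example and stand in for the output of git ls-files.

```ruby
# Hypothetical `git ls-files` output for a small gem repository.
tracked = %w[
  .gitignore
  .travis.yml
  Gemfile
  README.md
  Rakefile
  bin/console
  demo.gemspec
  lib/demo.rb
  lib/demo/version.rb
  spec/demo_spec.rb
  test/fixtures/big.json
]

# The same filter as the default gemspec line.
packaged = tracked.reject { |f| f.match(%r{^(test|spec|features)/}) }

puts packaged
```

Only the spec/ and test/ entries are dropped. The dotfiles, Gemfile, Rakefile, bin/console, and the gemspec itself all survive the filter.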

How do I clean up my gem?

There are a few different methods for eliminating unwanted files in your .gemspec. The best solution for your project might be some combination of these methods.

  1. Use git ls-files with arguments.

    spec.files = `git ls-files -z lib *.md *.txt`.split("\x0")
    
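  2. Use Ruby’s grep to whitelist files instead of reject to blacklist them. Anchor the pattern so that it cannot match in the middle of a path (for example, spec/lib_helper.rb).

    spec.files = `git ls-files -z`.split("\x0").grep(%r{\A(?:lib/|README\.md\z)})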
    
  3. Use extra_rdoc_files to add documentation files to your gem.

    spec.extra_rdoc_files = %w[README.md LICENSE]
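
One caveat with a regex-based whitelist: an unanchored pattern matches anywhere in the path, so a pattern such as lib also matches spec/lib_helper.rb. The sketch below compares a loose and an anchored whitelist against a hypothetical file list. The WHITELIST constant and the file names are illustrative assumptions.

```ruby
# Hypothetical tracked files.
tracked = %w[
  README.md
  LICENSE.txt
  lib/demo.rb
  lib/demo/version.rb
  spec/lib_helper.rb
  docs/README.md.erb
  .gitignore
]

# Anchored whitelist: lib/ at the start of the path, or an exact file name.
WHITELIST = %r{\A(?:lib/|README\.md\z|LICENSE\.txt\z)}

loose  = tracked.grep(%r{lib|README\.md}) # matches anywhere in the path
strict = tracked.grep(WHITELIST)          # matches only what was intended

puts "loose:  #{loose.join(', ')}"
puts "strict: #{strict.join(', ')}"
```

The loose pattern lets spec/lib_helper.rb and docs/README.md.erb through, while the anchored one keeps only the README, the license, and the files under lib/.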
    

How do I ensure that my gem stays junk-free?

I recommend regular inspection of your gem’s contents by adding a step to your CI process. For example:

# Clean and rebuild the gem
rm -rf pkg
bin/rake build
cd pkg

# Unpack the built gem
tar xvf *.gem

# List files contained in the gem
tar tf data.tar.gz
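
Listing the files is a good start, but CI can also fail the build when something unexpected slips in. This is a rough sketch of such a check, assuming a whitelist of lib/, a README, and a license file. The ALLOWED patterns and the unexpected_files helper are hypothetical and should be adapted to your gem.

```ruby
require "rubygems/package"

# Hypothetical whitelist: adjust to whatever your gem intends to ship.
ALLOWED = [
  %r{\Alib/},
  /\AREADME(\.\w+)*\z/,
  /\A(?:MIT-)?LICENSE(\.\w+)?\z/
].freeze

# Return every packaged path that no whitelist pattern accounts for.
def unexpected_files(files, allowed = ALLOWED)
  files.reject { |path| allowed.any? { |pattern| pattern.match?(path) } }
end

# In CI, pass the path of the built gem as the first argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  extras = unexpected_files(Gem::Package.new(ARGV[0]).contents)
  abort("Unexpected files in gem:\n  #{extras.join("\n  ")}") unless extras.empty?
  puts "Gem contents look clean."
end
```

A non-zero exit from abort is enough to fail most CI steps, so the check needs no extra tooling.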

Take a look at an example GitHub Actions configuration that displays the contents of the gem during the final step in the workflow.

The state of a modern Rails project

Let’s inspect a vanilla Rails project and see what kinds of junk it gives us.

Starting from a clean slate, no gems on the system at all, we install Rails 5.2.3 and create a new Rails project. This results in the download of 73 (!) separate gems.

$ gem install rails -v 5.2.3
...
40 gems installed

$ rails new foobar
...
Bundle complete! 14 Gemfile dependencies, 71 gems now installed.

$ ls -1 $GEM_HOME/cache | wc -l
73

This high number of dependencies feels a little bit uncomfortable, but not too bad overall. Things could be a lot worse, say, if we were using some other unnamed language and its package management facility.

…you know the one. 😠

Is there a tool that can help me find these junk files?

There is now! 🎉

https://github.com/t-richards/show-gem-junk

The 5-second explanation goes like this:

# 1. Install gem
gem install show-gem-junk

# 2. Set your GEM_HOME environment variable. For example:
export GEM_HOME=/home/tom/.gem/ruby/2.6.0

# 3. Run the tool!
show-gem-junk

If everything goes well, you should see some output like this:

Gem Name                Version       Size       Junk     % Junk
crass                     1.0.4     668 KB     556 KB      83.26
tzinfo                    1.2.5     987 KB     613 KB      62.05
rubyzip                   1.2.3     676 KB     393 KB      58.15
xpath                     3.2.0    82.5 KB    36.2 KB      43.93
tilt                      2.0.9     272 KB     113 KB      41.49
addressable               2.6.0     586 KB     243 KB      41.41
chromedriver-helper       2.1.1     151 KB    59.7 KB      39.65
rack                      2.0.7    1.04 MB     422 KB      39.55
minitest                 5.11.3     305 KB     102 KB       33.4
rails-html-sanitizer      1.0.4    87.4 KB      25 KB      28.63
regexp_parser             1.5.1     600 KB     163 KB      27.14
erubi                     1.8.0    62.2 KB    16.3 KB      26.26
jbuilder                  2.9.1     148 KB    37.1 KB      25.05
loofah                    2.2.3     338 KB    73.3 KB      21.67
builder                   3.2.3     145 KB    30.8 KB      21.27
archive-zip              0.12.0     550 KB     105 KB      19.11
mini_portile2             2.4.0     151 KB    27.3 KB      18.11
childprocess              1.0.1     182 KB    30.2 KB      16.58
method_source             0.9.2      66 KB     9.8 KB      14.86
rails-dom-testing         2.0.3      96 KB    11.9 KB      12.39
public_suffix             3.1.0     379 KB    45.7 KB      12.03
capybara                 3.24.0    1.74 MB     194 KB      10.88
rack-proxy                0.6.5    63.7 KB    6.73 KB      10.57
pg                        1.1.4    1.74 MB     187 KB      10.49
rb-inotify               0.10.0    70.3 KB    6.37 KB       9.07
thread_safe               0.3.6     714 KB    50.5 KB       7.08
msgpack                   1.3.0    1.65 MB     102 KB       6.01
nio4r                     2.3.1     826 KB      34 KB       4.12
bindex                    0.7.0     149 KB    5.02 KB       3.38
webpacker                 4.0.7     948 KB    26.6 KB       2.81
mimemagic                 0.3.3    2.41 MB    63.5 KB       2.57
bootsnap                  1.4.4     236 KB    1.78 KB       0.75
ruby_dep                  1.5.0    46.5 KB  330 Bytes       0.69
rake                     12.3.2     332 KB    1.96 KB       0.59
rb-fsevent               0.10.3     192 KB    1.11 KB       0.58
mini_mime                 1.0.1     264 KB    1.48 KB       0.56
thor                     0.20.3     250 KB  794 Bytes       0.31
selenium-webdriver      3.142.3    1.48 MB       2 KB       0.13
ffi                      1.11.1    5.42 MB    6.45 KB       0.12

Grand total size of junk: 3.71 MB

This, by the way, is an analysis of our 73 Rails gems from earlier. If every one of these gems were to eliminate its junk files, we could trim down a vanilla Rails install by 3.7 MB!

Let’s pick on the worst offender for a moment. It looks like the crass gem is 83% junk! What’s in there?

$ show-gem-junk -v

Gem Name: crass
Version:  1.0.4
Path:     /home/tom/.gem/ruby/2.6.0/gems/crass-1.0.4
Size:     668 KB
Junk:     556 KB
% Junk:   83.26

Junk File                                                          Size
crass-1.0.4/test/css-parsing-tests/color3_hsl.json               199 KB
crass-1.0.4/test/support/serialization/bootstrap.css             117 KB
crass-1.0.4/test/support/serialization/animate.css              71.2 KB
crass-1.0.4/test/support/serialization/pure.css                 34.8 KB
crass-1.0.4/test/css-parsing-tests/color3_keywords.json         22.8 KB
crass-1.0.4/test/shared/parse_rules.rb                          17.1 KB
crass-1.0.4/test/support/serialization/bootstrap-theme.css      16.4 KB
crass-1.0.4/test/css-parsing-tests/component_value_list.json      14 KB
crass-1.0.4/test/test_parse_properties.rb                       12.1 KB
crass-1.0.4/test/css-parsing-tests/README.rst                   9.15 KB
crass-1.0.4/test/css-parsing-tests/make_color3_keywords.py      6.73 KB
crass-1.0.4/test/support/serialization/html5-boilerplate.css    5.25 KB
crass-1.0.4/test/css-parsing-tests/stylesheet_bytes.json        4.94 KB
crass-1.0.4/test/css-parsing-tests/color3.json                  3.82 KB
crass-1.0.4/test/test_css_parsing_tests.rb                      3.62 KB
crass-1.0.4/test/support/common.rb                              3.29 KB
crass-1.0.4/test/css-parsing-tests/An+B.json                    2.23 KB
crass-1.0.4/test/test_serialization.rb                          1.71 KB
crass-1.0.4/test/css-parsing-tests/one_declaration.json         1.52 KB
crass-1.0.4/test/css-parsing-tests/rule_list.json               1.31 KB
crass-1.0.4/test/css-parsing-tests/stylesheet.json              1.31 KB
crass-1.0.4/test/css-parsing-tests/declaration_list.json        1.17 KB
crass-1.0.4/test/css-parsing-tests/one_rule.json                1.01 KB
crass-1.0.4/test/test_crass.rb                                864 Bytes
crass-1.0.4/crass.gemspec                                     815 Bytes
crass-1.0.4/test/css-parsing-tests/one_component_value.json   657 Bytes
crass-1.0.4/test/css-parsing-tests/make_color3_hsl.py         624 Bytes
crass-1.0.4/test/css-parsing-tests/LICENSE                    326 Bytes
crass-1.0.4/test/test_parse_stylesheet.rb                     313 Bytes
crass-1.0.4/test/test_parse_rules.rb                          303 Bytes
crass-1.0.4/test/support/serialization/misc.css               167 Bytes
crass-1.0.4/.travis.yml                                        84 Bytes
crass-1.0.4/.gitignore                                         42 Bytes

Holy cow! That's 556 KB of junk, nearly all of it test files and CSS parsing fixtures.

On the GitHub page for show-gem-junk, you’ll find information on what this tool considers to be a “junk” file, as well as usage details.
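
The "% Junk" column is simply junk bytes divided by total bytes. The sketch below shows that arithmetic with a guessed set of junk rules based on the listing above (test directories, the gemspec, and a couple of dotfiles). These patterns are my assumption for illustration, not the tool's actual rules, and the file sizes are made up.

```ruby
# Assumed junk rules, for illustration only.
JUNK_PATTERNS = [
  %r{\A(?:test|spec|features)/},      # test suites and fixtures
  /\.gemspec\z/,                      # already embedded in metadata.gz
  /\A\.(?:travis\.yml|gitignore)\z/   # development dotfiles
].freeze

def junk?(path)
  JUNK_PATTERNS.any? { |pattern| pattern.match?(path) }
end

# sizes maps a path relative to the gem root to its size in bytes.
def junk_summary(sizes)
  total = sizes.values.sum
  junk  = sizes.sum { |path, bytes| junk?(path) ? bytes : 0 }
  percent = total.zero? ? 0.0 : (100.0 * junk / total).round(2)
  { total: total, junk: junk, percent: percent }
end

# Hypothetical gem: 300 bytes of code, 700 bytes of junk.
sizes = { "lib/demo.rb" => 300, "test/demo_test.rb" => 600, ".gitignore" => 100 }
p junk_summary(sizes)
```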

Key Takeaways

  • Gem users: Have a look inside your gems every once in a while. Sometimes you can find interesting things in there.
  • Open source contributors: Please consider making a pull request to your favorite project and help them clean up the junk!
  • Gem maintainers: Please don’t make your users download junk.

TL;DR:

Many of today's Ruby gems include unnecessary files. Even with a modest number of dependencies, these files can add up, especially across multiple downloaded versions. Don't make your users download junk. Package only what you use, and omit the rest.