How to self-publish a book: customizing Bookdown

tl;dr: This post is related to How to self-publish a book: A handy list of resources. It's centered around Bookdown and some non-standard customizations I found useful to create the Data Science Live Book.

  • The first steps into Bookdown
  • Amazon Kindle format
  • Building the book
  • Be mindful of the line width
  • At the beginning of every '.Rmd'
  • Some parameters in index.Rmd
  • Images (position, caption, index, and size)
  • Book size

If you want to check the GitHub repository of the book: https://github.com/pablo14/data-science-live-book

This may not seem relevant if you are not writing a book, a manual, or any other kind of document; however, if you do start to write, Google will probably bring you here for the answer.

The first steps into Bookdown

This is not an introductory post. For the very first steps with Bookdown:

Amazon Kindle format

The extension for Amazon books is ".mobi", but the one delivered by Bookdown is ".epub".
You have to download the converter from: https://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211

If the epub version is in the same directory as the 'kindlegen' program, then convert the book by typing:
./kindlegen my-amazing-book.epub
This will create the new file: my-amazing-book.mobi.

I'm aware that Calibre can convert from epub to mobi, but just to be sure, I recommend Amazon's kindlegen. Calibre is free and useful for checking the ebook.

Building the book

Get the script that I use to create the pdf and the html from here.

To create the epub, I use the RStudio build button (with the bookdown::epub_book option).
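If you prefer to script the build, the steps above could look roughly like this. It is a minimal sketch, not the author's script: it assumes the bookdown package is installed and that the book's entry file is index.Rmd, and the helper name build_book is hypothetical.

```r
# Hypothetical helper (not the author's script): renders the book once
# per output format. Assumes the bookdown package is installed and that
# the working directory contains index.Rmd.
build_book <- function(formats = c("bookdown::gitbook",
                                   "bookdown::pdf_book"),
                       input = "index.Rmd") {
  for (fmt in formats) {
    message("Rendering ", fmt)
    bookdown::render_book(input, output_format = fmt)
  }
  invisible(formats)
}

# Usage, from the book's root directory:
# build_book()                       # html (gitbook) and pdf
# build_book("bookdown::epub_book")  # same format as the RStudio option
```

Nothing is rendered until build_book() is called, so the definition can safely live in a script that you source.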

_bookdown.yml

In this file, you can define what files to process and the order in which to display the pages.

The default is to run all the pages; however, if the book is big, then it will run a lot of unnecessary code. As you can see, there are two comments that I use when debugging.

Please note the new_session: yes setting: each chapter is rendered in a new R session, so you always look at the result of the last run.

Be mindful of the line width

We rarely think about line width, that is, how many characters fit in a single line. However, Amazon will not publish a book if a single character falls outside the margins of the page format you define.

formatR to the rescue! Yet another package from Yihui... what does it do? It formats R code automatically.

And yes, it allows rearranging the code in order to "fit" (Note 1) the line width. Hurray!

Live application: https://yihui.shinyapps.io/formatR/

Note 1: However, under certain conditions, it does not work as we want (but as expected). It uses the "deparse" R base function.
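To see the behavior described in Note 1, here is a minimal sketch in base R. The expression, function name, and argument names are made up for illustration. The formatR part only runs if that package is installed.

```r
# Illustrative expression; the function and argument names are made up.
expr <- quote(
  result <- some_function(first_argument = 1, second_argument = 2,
                          third_argument = 3, fourth_argument = 4)
)

# deparse() only starts looking for a line break once width.cutoff has
# been reached, so individual lines can end up longer than the cutoff.
deparsed <- deparse(expr, width.cutoff = 56)
print(nchar(deparsed))

# The same idea through formatR, only if the package is installed.
if (requireNamespace("formatR", quietly = TRUE)) {
  tidied <- formatR::tidy_source(
    text = deparsed, width.cutoff = 56, indent = 2, output = FALSE
  )
  tidied_lines <- unlist(strsplit(tidied$text.tidy, "\n"))
  print(nchar(tidied_lines))
}
```

Any width above 56 in the printed vectors is a line that would overflow the page. Recent formatR releases also document width.cutoff = I(56) as a way to request an upper bound instead of a lower one; check whether your installed version supports it before relying on it.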

Skipping the technical details, I had to do a "dirty-while(1)-hack" to the formatR package because the width argument of the deparse function doesn't work as I needed.

More technical info: https://github.com/yihui/formatR/pull/71
I forked formatR to add this functionality. I cannot guarantee it works in all scenarios: https://github.com/pablo14/formatR

I spent around 30 hours fixing this; I hope it only takes you 1 minute. 🍻

Important: Check the parameters width.cutoff=56 and options("width"=56) from the next section.

At the beginning of every '.Rmd'

This is what I have:

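```{r include = FALSE}
# Apply these settings only to non-HTML output (e.g. the PDF)
if (!knitr::is_html_output()) {
  # Maximum width of the printed output
  options("width" = 56)
  # Let formatR tidy the code, aiming at 56 characters per line
  knitr::opts_chunk$set(
    tidy = TRUE,
    tidy.opts = list(width.cutoff = 56, indent = 2)
  )
  knitr::opts_chunk$set(fig.pos = 'H')
}
```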

That will run when the output is not HTML, as in the case of PDF output. It sets the global options for all the chunks in the file, but we can override each chunk's behavior.

Now, I don't remember whether options("width"=56) is redundant with width.cutoff=56, but my maximum line width is 56 characters according to the book size.
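As far as base R and formatR document them, the two settings act on different things: the width option wraps printed output, while width.cutoff tells formatR where to break the source code. A minimal base R sketch of the first half:

```r
# Remember the current setting so it can be restored afterwards.
old <- options(width = 56)

# Printed output is wrapped according to the 'width' option...
printed <- capture.output(print(seq(1000, 1040)))
print(nchar(printed))

# ...but a long line of source code is not touched by it. Only the tidy
# options (width.cutoff) ask formatR to reflow the code itself.

options(old)
```

Since both the code and its output end up on the printed page, setting both values to the same number is a reasonable precaution.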

indent = 2: two spaces of indentation.

tidy=TRUE puts formatR to work to produce a beautiful and standardized code layout. Keep in mind that I used my own forked version of formatR.

When the width adjustment doesn't work, you have to change it by hand in the Rmd (not the md!). In this case, you have to set tidy=FALSE in all of the chunks that will be adjusted by hand:

```{r, tidy=FALSE}
print("bla bla bla * 1000")
```

For example, a .Rmd from the book:

See the tidy=FALSE and how each line has a max width of 56. Amazon and I can assure that 😁.
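Checking every chunk by eye gets tedious, so a small base R helper can flag the offending lines. This is a sketch under stated assumptions: the function name find_long_chunk_lines is hypothetical, it treats any line starting with three backticks as a chunk delimiter, and it only checks the source code, not the output the code prints.

```r
# Hypothetical helper: returns the line numbers (and widths) of code
# lines inside fenced chunks that are wider than max_width.
find_long_chunk_lines <- function(lines, max_width = 56) {
  is_fence <- grepl("^\\s*```", lines)
  # A line is inside a chunk when an odd number of fences precedes it
  # and it is not a fence itself.
  in_chunk <- (cumsum(is_fence) %% 2 == 1) & !is_fence
  too_long <- in_chunk & nchar(lines) > max_width
  data.frame(
    line = which(too_long),
    width = nchar(lines[too_long])
  )
}

# Made-up sample: the prose line is long but ignored, the second code
# line is too wide.
sample_rmd <- c(
  "Some prose that can be as long as it wants because LaTeX wraps it.",
  "```{r, tidy=FALSE}",
  "x <- 1",
  "y <- paste('this line of code is deliberately longer than fifty-six characters')",
  "```"
)
print(find_long_chunk_lines(sample_rmd))
```

For a real chapter, pass the file content instead, for example find_long_chunk_lines(readLines("my-chapter.Rmd")), where the file name is a placeholder.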

Here's the web-live version of this section: https://livebook.datascienceheroes.com/exploratory-data-analysis.html#selecting_best_vars_mic

And here is the pdf version:

The work of Bookdown and its capabilities are stunning!

Some parameters in index.Rmd

links-as-notes: true
Use this option if you want to have a printable version of your book. It will convert all the hyperlinks to footnotes (because you cannot click the paper).

A nice-to-have feature that is missing: when we reference a chapter inside the book, the page number does not appear in the footnote or beside the reference. For example: "see more in the data preparation chapter (Page 54)".

linestretch: 1.15
Use this parameter when the lines of text in the PDF are too close together. It will give some air to your paragraphs. ;)

Images

Image position

knitr::opts_chunk$set(fig.pos = 'H') -- Remember this line was at the beginning of the Rmd.

The parameter fig.pos='H' (plus another one) forces all the images to be in the place we create them; otherwise, they will be at the bottom of every page.

The other one is out.extra='' (two single quotes), which you have to define in each chunk.

Taking the same example as before:

```{r importance-variable-ranking, fig.width=6, fig.height=4.5, tidy=FALSE, fig.cap="Correlation using information theory", out.extra=''}
```

More info about image position: https://stackoverflow.com/questions/42486617/knitr-ignoring-fig-pos

This is not necessary when you add images like this:

knitr::include_graphics("exploratory_data_analysis/mic_mutual_info.png")

Image caption and index

fig.cap: Add the caption and an autoincrement number (1.18 in this case) to the figure as follows:

Image size

From the last example: fig.width=6 and fig.height=4.5 define the size in the same way as if you were on RStudio: if you decrease them, the numbers and labels in the plot will look larger.
The right size will depend on your plot.

There are other ways to define the size, e.g., out.width and out.height, with which you can set the size as a percentage of the screen or in pixels. Please refer to the official documentation: https://bookdown.org/yihui/bookdown/figures.html

Wrapping long urls

When citing a bibliography, a long url will be out of margin. Fix it this way:

In the template.tex add: \Urlmuskip=0mu plus 1mu\relax

Regarding long urls in the book's content, I couldn't find a way to wrap them. My ugly hack was to use a url shortener. I used Google's, for example "https://goo.gl/2TrDgN", but that service is now deprecated, so please choose another one, like tinyurl.

Book size

I started with an A4 format because it is the default. When the book was almost ready I realized that it would weigh as much as a 7-year-old cat:

According to the Google oracle, the "standard size" for technical books is 6 x 9 inches, or 15.24 x 22.86 cm.

Again, this is defined in the template.tex:

\usepackage[paperwidth=6in, paperheight=9in]{geometry}

You will have to check all line widths as before.

Final words

I really hope that I have encouraged you to write your own stories in a book. If you want a second opinion on some material you plan to publish, let me know.

Don't be shy, start with a minimum html site with a few pages, or with a post in Rpubs or Github Pages. Let it grow and enjoy what you do 🌱📚.


Thanks for reading! 🚀
