I work with many RMarkdown files structured within a hierarchy of directories. I needed to render these files to PDF and then use Ghostscript to merge them. This article explains the two small shell scripts I used for the task.
The Problem
For a client project, I need to produce several PDF documents consisting of:
- A cover letter
- A report
For reasons irrelevant to this article, the cover letter and report use different templates for rendering, so they cannot be combined before rendering. Both documents are written in RMarkdown (.Rmd files), which renders directly to PDF.
Each project involves hundreds of cover–report pairs, making manual rendering impractical. The directory structure follows this pattern:
📁--client-root
📁--project-1
| 📁--report
| | |--cover.Rmd
| | |--report_project-1.Rmd
| 📁--data
📁--project-2
| 📁--report
| | |--cover.Rmd
| | |--report_project-2.Rmd
| 📁--data
Of course, in reality, my directories aren’t named “project-n”; they have real, meaningful names.
I never, ever, use spaces or non-ASCII characters in any directory or file names.
The Solution
I used a one-liner to render all Rmd files to PDF:
find client-root -type f -name "*.Rmd" | xargs -I{} Rscript -e 'rmarkdown::render("{}")'
How It Works
find client-root -type f -name "*.Rmd" is a standard find command that:
- Searches within client-root
- Looks for files (-type f)
- Matches filenames ending in .Rmd (-name "*.Rmd")
The output is a list of file paths, e.g., client-root/project-1/report/cover.Rmd.
The | (pipe) sends this list to the next command.
xargs -I{} Rscript -e 'rmarkdown::render("{}")' processes each file:
- xargs constructs and executes a command for each file found.
- -I{} tells xargs to replace {} with each filename.
- Rscript -e runs an R expression (-e denotes inline execution).
- rmarkdown::render("{}") calls the R function that renders each file, with {} replaced by the actual filename.
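To see what xargs substitutes before any rendering happens, the pipeline can be previewed as a dry run. The sketch below is not part of my original workflow: it builds a throwaway tree with mktemp (the demo_root and preview names are made up for illustration) and puts echo in front of Rscript, so each command is printed instead of executed.

```shell
#!/usr/bin/env bash
# Throwaway tree mimicking the layout above (scratch files, not real reports).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/client-root/project-1/report"
touch "$demo_root/client-root/project-1/report/cover.Rmd" \
      "$demo_root/client-root/project-1/report/report_project-1.Rmd"

# Same pipeline as above, but echo stands in front of Rscript, so xargs only
# prints the command it would run for each file. sort keeps the order stable.
preview="$(cd "$demo_root" && find client-root -type f -name "*.Rmd" | sort |
  xargs -I{} echo Rscript -e 'rmarkdown::render("{}")')"
printf '%s\n' "$preview"
```

Each printed line shows one Rscript call with the file path already substituted for {}. The single quotes around the R expression do not appear in the preview because the shell removes them before xargs sees the arguments.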
After running this, the directory structure now contains corresponding PDFs:
📁--client-root
📁--project-1
| 📁--report
| | |--cover.Rmd
| | |--cover.pdf
| | |--report_project-1.Rmd
| | L--report_project-1.pdf
| 📁--data
📁--project-2
| 📁--report
| | |--cover.Rmd
| | |--cover.pdf
| | |--report_project-2.Rmd
| | L--report_project-2.pdf
| 📁--data
A New Problem
Now I needed to merge the cover and report PDFs for each project.
For a single project, I could do this manually using Ghostscript (gs):
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
-sOutputFile=merged_report_project-1.pdf \
cover.pdf report_project-1.pdf
But since I had multiple projects, I needed to automate the process using Bash.
Additionally, I had to follow a naming convention:
The merged file should start with "merged_", followed by the report's filename, e.g.:
merged_report_project-1.pdf
Merging the PDFs
To merge the PDFs, my approach was:
- Locate all report directories across projects.
- Extract file paths for the cover and report PDFs.
- Construct the merged filename dynamically.
- Use Ghostscript to merge the files.
Here's the script:
find client-root -type d -name "report" | \
while read -r dir; do
  cover_pdf="$dir/cover.pdf"
  report_pdf=("$dir/report_"*.pdf)
  output_pdf="$dir/merged_$(basename "${report_pdf[0]}")"
  gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
    -sOutputFile="$output_pdf" \
    "$cover_pdf" \
    "${report_pdf[0]}"
done
Understanding the Script
Finding directories
find client-root -type d -name "report"
- Searches under client-root.
- Finds only directories (-type d) named "report".
- The results are piped to the next command.
Processing each directory
while read -r dir; do ... done
- Iterates over each directory found.
- read -r dir assigns each directory path to dir.
- The -r flag ensures the path is read literally, so backslashes are not treated as escape characters.
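The effect of -r is easiest to see on a line that contains a backslash. The input below is a made-up string, not a real path from the project, and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# A sample line containing a backslash (hypothetical input).
sample='dir\name'

# Without -r, read treats the backslash as an escape character and drops it.
read plain <<< "$sample"
# With -r, the line is stored exactly as written.
read -r raw <<< "$sample"

printf 'without -r: %s\n' "$plain"   # prints: without -r: dirname
printf 'with -r:    %s\n' "$raw"     # prints: with -r:    dir\name
```

Since my paths never contain backslashes, -r makes no difference in practice here, but it is a cheap habit that keeps the loop from silently altering its input.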
Defining the file paths
cover_pdf="$dir/cover.pdf"
- Constructs the path for the cover PDF.
- Quotes ensure correct handling if spaces exist (even though I avoid them).
report_pdf=("$dir/report_"*.pdf)
- Uses a wildcard (report_*.pdf) to match the report file.
- The parentheses create an array, allowing for multiple matches (though only one is expected).
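One detail worth knowing about the wildcard: with bash's default settings (no nullglob or failglob), a pattern that matches nothing is left unexpanded, so the array then holds the literal pattern text rather than being empty. The sketch below demonstrates this in a scratch directory; the first_try variable is only there for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for a report directory (hypothetical).
dir="$(mktemp -d)"

# With no matching file, bash leaves the pattern unexpanded by default,
# so the array holds one element: the literal pattern text.
report_pdf=("$dir/report_"*.pdf)
if [[ -e "${report_pdf[0]}" ]]; then
  first_try="found"
else
  first_try="missing"
fi

# After creating a matching file, the same glob expands to the real path.
touch "$dir/report_project-1.pdf"
report_pdf=("$dir/report_"*.pdf)

echo "first attempt: $first_try, matches now: ${#report_pdf[@]}"
echo "first match: $(basename "${report_pdf[0]}")"
```

Checking that the first element exists with -e (or -f) is therefore a simple way to detect a directory where the report has not been rendered yet.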
Constructing the merged filename
output_pdf="$dir/merged_$(basename "${report_pdf[0]}")"
- ${report_pdf[0]} selects the first (and expected only) match.
- basename strips the directory path, keeping only the filename.
- $( ... ) performs command substitution, inserting the result dynamically.
- "merged_" is prepended to create the final merged filename.
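The filename step can be tried in isolation. The sketch below uses an example path that follows the layout above and compares basename with bash's ${var##*/} parameter expansion, an alternative I did not use in the script but which gives the same result for paths like these.

```shell
#!/usr/bin/env bash
# Example report path following the layout above.
report_path="client-root/project-1/report/report_project-1.pdf"

# Variant 1: basename, as in the script.
name_basename="merged_$(basename "$report_path")"

# Variant 2: ${var##*/} removes the longest prefix ending in "/",
# which gives the same filename here without starting another process.
name_expansion="merged_${report_path##*/}"

echo "$name_basename"    # prints: merged_report_project-1.pdf
echo "$name_expansion"   # prints: merged_report_project-1.pdf
```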
Merging with Ghostscript
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
-sOutputFile="$output_pdf" \
"$cover_pdf" \
"${report_pdf[0]}"
- Combines the cover and report PDFs, saving as merged_report_project-N.pdf.
- If you’re curious about the gs flags, check them out using man gs.
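For reuse, the loop could be wrapped in a function that skips directories where a PDF is missing. This is a sketch of a more defensive variant, not the script I actually ran: merge_reports is a hypothetical name, and GS_BIN is an assumed override (defaulting to gs) that makes a dry run possible.

```shell
#!/usr/bin/env bash
# merge_reports is a hypothetical wrapper around the loop shown earlier.
# GS_BIN is an assumed override (defaults to gs) so the loop can be dry-run,
# e.g. GS_BIN=echo merge_reports client-root
merge_reports() {
  local root="$1" dir cover_pdf output_pdf
  local -a report_pdf
  while IFS= read -r dir; do
    cover_pdf="$dir/cover.pdf"
    report_pdf=("$dir/report_"*.pdf)
    # Skip directories where either input PDF is missing.
    if [[ ! -f "$cover_pdf" || ! -f "${report_pdf[0]}" ]]; then
      echo "skipping $dir: cover or report PDF not found" >&2
      continue
    fi
    output_pdf="$dir/merged_$(basename "${report_pdf[0]}")"
    # </dev/null keeps the merge command from consuming the loop's input.
    "${GS_BIN:-gs}" -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
      -sOutputFile="$output_pdf" \
      "$cover_pdf" "${report_pdf[0]}" </dev/null
  done < <(find "$root" -type d -name "report")
}
```

Running GS_BIN=echo merge_reports client-root prints the Ghostscript arguments for each project without writing any files, which is a quick way to check the generated names before the real run with merge_reports client-root.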
And that's it! Now all my merged_report_project-x.pdf files are generated automatically.
Bash saved me a lot of time, which I then used to write this post. Now back to work!