Monday 30 November 2015

Piping Doxygen output to Github Pages for automatic navigation

Github Pages is a wonderful system for building a website. It works using Git. You edit your files locally, push them to Github, and the site automatically updates. docs.simul.co uses it for Simul's documentation. One great thing about gh-pages is that because you edit it locally, you can use something like Sublime Text to edit the files, which still, in 2015, is so much more responsive than editing online with all the latency that entails.

Gh-pages uses a system called Jekyll, which is also great. The site can take straight-up html files and reproduce them as they are, but it also understands Markdown syntax. Although html is a text format, it's hard to write by hand. You often want a specialized editor, but these end up filling your file with unnecessary cruft: repeated format tags and so on. Markdown is much simpler - it's plain text with a few conventions, for instance:

Heading
--------

when compiled to html by Jekyll, becomes a second-level heading (an <h2> element):

Heading

So if you put a file in your repo called test.md, that becomes a page on your website called test.html. It will only be used if it has Jekyll Front Matter at the top, like so:

---
title: Introduction
layout: reference
url: Introduction
---

Jekyll will format it, and apply a theme if you have one. The sites it generates are generally good-looking.
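
To make the front matter rule concrete, here is a minimal Python sketch of the check: a file counts as having front matter if its first line is "---" and a later line closes the block with another "---". The function name split_front_matter is mine, not part of Jekyll, and Jekyll's own parsing is more thorough than this.

```python
def split_front_matter(text):
    """Return (front_matter, body), or (None, text) if the file has none."""
    lines = text.split("\n")
    # The very first line has to be the opening marker.
    if not lines or lines[0].strip() != "---":
        return None, text
    # Look for the closing marker on any later line.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text  # opening marker but no closing one


page = "---\ntitle: Introduction\nlayout: reference\n---\nHeading\n--------\n"
front, body = split_front_matter(page)
print(front)
```

A file that starts with anything else, such as an ordinary html page, comes back with None for the front matter.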

So that's all great, but most of my interesting stuff comes from the Simul source code - I use Doxygen to generate documentation from source. Based on formatted comments, Doxygen creates html docs (and various other formats if desired).

So what I wanted was to pipe the doxygen output into my Jekyll site, but keep the Jekyll formatting, and generate a consistent nav bar that looked the same whether you were in the Doxygen-created automatic docs, or the static markdown-based site.

Now Doxygen can't generate markdown. It can use markdown as an input, and create html from it just as Jekyll does. Fortunately, Jekyll doesn't insist on having markdown as an input. If you put a plain html file in a Jekyll site, it will be copied across as it is, with no layout applied, but if you add some Jekyll front matter to the top of your html as plain text, Jekyll will process it like any other page. The front matter makes it an invalid html file, so it will only work properly within Jekyll.

So the first step to get the Doxygen content into the Pages site was to create a special jekyll_header.html that Doxygen would use, which has the front matter at the top, like so:

---
title: $title
layout: reference
---
<!-- HTML header for doxygen 1.8.9.1-->
... more typical doxygen stuff here...

Doxygen replaces $title with the page title. We specify "reference" as the layout so that we can create a special Jekyll layout if needed for the docs.

So now Doxygen will build html files which, if copied into the Pages repository, will appear in the website with Jekyll styling.
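
As a rough illustration of how such a header could be produced, the Python sketch below prepends the front matter to a default header. Doxygen can write its default header with "doxygen -w html header.html footer.html stylesheet.css". The function name make_jekyll_header and the sample header text are assumptions for the example, not something Doxygen or Jekyll provides.

```python
# The front matter from the post; $title is filled in later by Doxygen.
FRONT_MATTER = "---\ntitle: $title\nlayout: reference\n---\n"


def make_jekyll_header(default_header):
    """Prepend the front matter unless the header already starts with it."""
    if default_header.startswith("---"):
        return default_header
    return FRONT_MATTER + default_header


# Stand-in for the contents of a header file written by doxygen -w.
header = make_jekyll_header("<!-- HTML header for doxygen 1.8.9.1-->\n")
print(header)
```

Running it twice on the same text leaves the header unchanged, so it is safe to repeat in a build script.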

The whole point of this is automation. We use Jenkins, with a project set up to run doxygen on the source code. It's arranged so that the Doxygen configuration file has these settings, which read environment variables:

HTML_OUTPUT            =$(HTML_OUTPUT)
HTML_HEADER            =$(HTML_HEADER)
HTML_FOOTER            =$(HTML_FOOTER)

Then we call doxygen twice with output to two different folders - once to generate the standard compiled html (chm) help file, and once with the jekyll header to prepare the docs for Pages.
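
The two runs could be scripted along these lines. This is only a sketch in Python, not our actual Jenkins setup: the folder names "html_chm" and "html_pages" and the file name "jekyll_footer.html" are placeholders, and leaving HTML_HEADER empty relies on Doxygen falling back to its standard header.

```python
import os
import subprocess


def doxygen_env(html_output, html_header="", html_footer="", base_env=None):
    """Copy the environment and set the three variables the Doxyfile reads."""
    env = dict(os.environ if base_env is None else base_env)
    env["HTML_OUTPUT"] = html_output
    env["HTML_HEADER"] = html_header
    env["HTML_FOOTER"] = html_footer
    return env


def run_doxygen(doxyfile, env):
    """Run doxygen once; raises CalledProcessError if it fails."""
    subprocess.run(["doxygen", doxyfile], env=env, check=True)


# Hypothetical folder and file names, one environment per run.
chm_env = doxygen_env("html_chm")
pages_env = doxygen_env("html_pages", "jekyll_header.html", "jekyll_footer.html")
```

Calling run_doxygen("Doxyfile", chm_env) and then run_doxygen("Doxyfile", pages_env) would give the two builds, with "Doxyfile" standing in for whatever your configuration file is called.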

A separate Jenkins project then copies the output into the "reference" directory in the Pages repo. Before this, we delete the entire contents of this directory, because any files that were not regenerated by Doxygen should be removed from the folder - otherwise they'll still show up on the site. Then we git commit, git push, and the site should update.
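
The delete-then-copy step can be sketched in Python as below. The function name replace_reference is made up for the example, and the git commands are left out; this only shows why the directory is wiped first.

```python
import shutil
from pathlib import Path


def replace_reference(doxygen_out, reference_dir):
    """Wipe the reference directory, then copy the fresh Doxygen output in."""
    out = Path(doxygen_out)
    ref = Path(reference_dir)
    # Remove everything first so pages Doxygen no longer generates disappear.
    if ref.exists():
        shutil.rmtree(ref)
    shutil.copytree(out, ref)
    # Return the file names now present, for logging or checking.
    return sorted(p.name for p in ref.iterdir())
```

After this, a "git add -A", "git commit" and "git push" in the Pages repo would publish the result.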

This much will get the generated docs into Jekyll, and the Jekyll styling will be applied, if you've created the "reference" layout (any good Jekyll tutorial can help you with this).

Doxygen is probably the best docs-generator of its kind, but there's a lack of control. The html generated can have a navigation tree on the side, but the generated tree uses frames, a relic of the 90's which doesn't play well with modern site design, much less with the mobile web. Jekyll on the other hand, can render very nicely on mobile, and with a bit of work, can display a nice modern html navigation bar.

So the next step was to try to fix up the left-hand-side navigation tree for the Doxygen content.

Our existing method relies on the directory structure of the website: each folder is a level of the tree, with the filename being the bottom level. That doesn't work for Doxygen content because Doxygen doesn't generate a directory tree structure, but a single flat directory, full of files with names like "classsimul_1_1base_1_1EnvironmentVariables.html".

In the Jekyll _includes directory, create an html file called something like "ref_sidebar.html", and fill it with:
<div class="sidebar">
  <div class="container sidebar-sticky">
    <div class="sidebar-about">
      <h1>
        <a href="{{ site.baseurl }}">
          {{ site.title }}
        </a>
      </h1>
      <p class="lead">{{ site.description }}</p>
    </div>
    <nav class="sidebar-nav">
    </nav>
  </div>
</div>


Now inside the "sidebar-nav", we will put the Jekyll/liquid code.
{% assign sorted_pages = site.pages | sort: "url" %}

This is the first step - we create a list of all the pages in the site, sorted by url. It's pretty fast. I don't know if it's possible to create an array of structs in Liquid, so we create some blank arrays that we'll fill up with properties of the various pages. Liquid has no array literal, so site.array here needs to be an empty list defined in _config.yml (array: []).

{% assign this_path = "" %}

{% assign pages = site.array %}
{% assign paths = site.array %}
{% assign titles = site.array %}
{% assign path_sizes = site.array %}

Then we're going to go through all the pages (again, not slow), and from the urls we will create a "path" for each page with "/" as a separator:
{% for node in sorted_pages %} {% assign path = node.url | remove: "/index.html" %}
So the path is formed initially from the url, knocking off the trailing "/index.html" that Jekyll may have generated. This is so that the main page of any tree branch will look like the root of the branch. Then we discard some pages that we don't want:

{% if path contains "/dir_" or path contains "/struct" or path contains "/functions" or path contains "/namespacemembers" or path contains "graph_legend" or path contains "google" %} {% continue %} {% endif %}
If a page refers to a source file, doxygen gives it a filename ending in _source.html, for example, "ChunkInputOutput_8h_source.html" would be the page generated for the source file "ChunkInputOutput.h" . In this case, if it's in the "reference" folder, we make it a child of "/source/", so all the source pages get grouped together.

{% if path contains "_source.html" %} {% assign path = path | replace: "reference/","reference/source/" %} {% endif %}

Similarly, the page generated for the class simul::base::BaseProfilingInterface would be called "classsimul_1_1base_1_1BaseProfilingInterface.html". We want this to be listed under the page "classes.html", which is doxygen's class list. But we also want it to be in a tree structure, so its full path should read "reference/classes/simul/base/BaseProfilingInterface". The following code achieves this. It replaces "reference/class" with "reference/classes/", so that the class definitions will be children of "classes", and it replaces "_1_1" (which represents "::") with "/". But the first replacement would also turn the actual "classes.html" into "classes/es.html", so we fix that too. We remove any ".html" extension so that tree parents are properly recognized. Various other substitutions are performed: "_8" goes back to "." and "_01" goes back to a space.
Note also that we replace "_ooo" with "/". I've chosen "_ooo" as a separator in doxygen source filenames (not source code, just text documents that doxygen should interpret). This is so we can get proper parent-child relationships in such documents:
{% assign path = path | replace: "reference/class","reference/classes/" | replace: "reference/classes/es.html","reference/classes.html" | remove: ".html" | replace: "_1_1","/" | replace: "_01"," " | replace: "_8","." | remove_first: "/" | replace: "_ooo","/" %}
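
Liquid is awkward to experiment with, so here is a Python rendering of the same chain of replacements, handy for checking what path a given Doxygen filename ends up with. It is a sketch, not the code the site runs. The function name nav_path is made up, the filter that skips unwanted pages is left out, and the class-list page is handled by matching the string that the first replacement actually produces ("reference/classes/es.html"), which is my reading of the intent.

```python
def nav_path(url):
    """Turn a page url into the slash-separated path used by the nav tree."""
    path = url.replace("/index.html", "")
    # Group source-file pages under reference/source/.
    if "_source.html" in path:
        path = path.replace("reference/", "reference/source/")
    # Make class pages children of the class list page.
    path = path.replace("reference/class", "reference/classes/")
    # Undo the damage that replacement does to classes.html itself (assumed intent).
    path = path.replace("reference/classes/es.html", "reference/classes.html")
    path = path.replace(".html", "")
    # Doxygen escapes: _1_1 is "::", _01 is a space, _8 is ".".
    path = path.replace("_1_1", "/").replace("_01", " ").replace("_8", ".")
    path = path.replace("/", "", 1)  # drop the leading slash only
    path = path.replace("_ooo", "/")
    return path


print(nav_path("/reference/classsimul_1_1base_1_1BaseProfilingInterface.html"))
print(nav_path("/reference/ChunkInputOutput_8h_source.html"))
```

The first call prints "reference/classes/simul/base/BaseProfilingInterface" and the second prints "reference/source/ChunkInputOutput.h_source".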

Now we'll store some useful information: how many parts (separated by "/") are in the path?

{% assign psz = path | size %} {% assign path_parts = path | split: '/' %} {% assign path_parts_size = path_parts | size %}
And we're ready to push our path onto the list.
{% assign paths = paths | push: path %} {% assign pages = pages | push: node %} {% assign path_sizes = path_sizes | push: path_parts_size %}
We fix up the title to look friendlier. We're going to use this in the nav tree.
{% assign title = node.title | remove: " Source File" | remove: " Class Reference" %} {% assign titles = titles | push: title %}
And finally we check if this is the current page.
{% if node.url == page.url %} {% assign this_path = path %} {% assign this_node = node %} {% assign this_index = forloop.index0 %} {% endif %} {% endfor %}

Having prepared all that, we can now construct the nav tree using nested <ul> and <li> tags:
<ul>
{% assign menu_level = 0 %}

We iterate through the list "pages" that we've just created. The forloop index is the same for "pages", "paths", "titles" and so on, because the arrays were filled in step. Remember to use forloop.index0: plain "index" starts from 1 and would give you the wrong page.

{% for node in pages %} {% assign node_path = paths[forloop.index0] %} {% assign node_title = titles[forloop.index0] %} {% assign node_path_parts = node_path | split: '/' %} {% assign node_path_parts_size = path_sizes[forloop.index0] %} {% if node_title == "404" %} {% continue %} {% endif %}

At the first level of the nav tree, we only want single-part paths:

{% unless node_path_parts_size == 1 %} {% continue %} {% endunless %} <li> <a class="sidebar-nav-item{% if page.url == node.url %} active{% endif %} sidebar-1" href="{{ node.url }}">{{ node_title }}</a></li>

Now, if the path of the page you're viewing contains the path of the page in the list, we'll expand its children!
{% if this_path contains node_path %}

We'll build new lists containing only the children of this parent:
{% assign pages2 = site.array %} {% assign titles2 = site.array %} {% for node2 in pages %}

and only two-level paths at this stage:

{% unless path_sizes[forloop.index0] == 2 %} {% continue %} {% endunless %} {% assign path2 = paths[forloop.index0] %} {% if path2 contains node_path and node2.url != node.url %} {% assign pages2 = pages2 | push: node2 %} {% assign titles2 = titles2 | push: titles[forloop.index0] %} {% endif %} {% endfor %}
{% if pages2.size > 0 %} {% assign menu_level = 1 %} <ul> {% for node2 in pages2 %} <li><a class="sidebar-nav-item{% if page.url == node2.url %} active{% endif %} sidebar-2" href='{{node2.url}}'>{{ titles2[forloop.index0] }}</a></li> {% endfor %}

Now this is the point where you might want to add more levels to the tree. Otherwise, close out the <ul> and the rest is just tidy-up.

</ul> {% endif %}
{% endif %}
{% endfor %} <!-- node in pages -->
{% for num in (1..menu_level) %} </ul> {% endfor %} </ul>
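
To see the selection logic without the markup, here is a small Python sketch of the same two-level tree. It takes the list of paths and the path of the current page, and returns each top-level entry with the children that would be expanded under it. The function name nav_tree and the sample paths are illustrative only.

```python
def nav_tree(paths, this_path):
    """Top-level entries, each paired with its expanded second-level children."""
    tree = []
    for path in paths:
        # Only single-part paths appear at the first level.
        if len(path.split("/")) != 1:
            continue
        children = []
        # Same substring test as Liquid's "contains".
        if path in this_path:
            children = [p for p in paths
                        if len(p.split("/")) == 2 and path in p]
        tree.append((path, children))
    return tree


sample = ["about", "reference", "reference/classes",
          "reference/classes/simul", "reference/source"]
print(nav_tree(sample, "reference/classes"))
```

Because the test is a plain substring match, a top-level page whose name happens to appear inside another path would also be expanded, so distinct top-level names are worth keeping.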