Silent Puma | Documentation

Imports and the module system: don't reinvent the wheel

Lopez Crawl Directives support the notion of modules and the broader notion of namespaces. These concepts have proven hugely successful in many programming languages as a key way of preserving programmer sanity.

In Lopez Crawl Directives, your configuration may span multiple files. This has three key advantages:

1. Compartmentalization: if your config ends up getting too big, you can organize it better by putting related items in the same files and then importing them all in a “main” file.
2. Code reuse: you can bring along that useful small file of analyses that you used in that other project.
3. Not reinventing the wheel: you don’t need to rewrite everything from scratch. Lopez comes with lopez-std, a set of useful and common analyses and configurations that you can use out of the box.

The import directive

The import directive is deceptively simple: just pass it a string containing a path, like so:

import "my-module.my-submodule";

The gist of it is this: Lopez will look for a file corresponding to that path (more on that later) and will include it in the crawl configuration. This means that all items declared in that sub-module (boundary rules, set-variables, selections, etc…) will become active and be considered for any crawl using your module as a configuration.

Paths and files

Let’s consider the simplest import example where a single module is imported:

import "my-module";

Suppose this import was given inside a file called main.lcd. Lopez will look for some options while resolving my-module:

1. First, it will look for a file called my-module.lcd in the same folder as main.lcd.
2. If this fails, it will look for a folder called my-module, with a file called module.lcd inside it (the name of the file is always module.lcd, no matter the module name).
3. If even this fails, it will look for the same two files in the “import path”, a special folder in your system where there are a bunch of modules. If you installed Lopez with the entalator, you don’t ever need to worry about this place. Just accept that there are modules there that can be imported “like magic”.
4. If everything fails, you got yourself an import error and need to revise your configuration.
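The lookup order above can be sketched in Python. This is only an illustration of the order described here, not Lopez's actual resolver: the function name `resolve_module` and its `import_path` parameter are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_module(name: str, importing_file: Path, import_path: Path) -> Optional[Path]:
    """Return the file that `import "<name>";` would load, or None on failure.

    Hypothetical helper: it mirrors the lookup order described in the text,
    not Lopez's real implementation.
    """
    # Search the folder of the importing file first, then the import path.
    for folder in (importing_file.parent, import_path):
        candidates = (
            folder / f"{name}.lcd",        # a file named after the module
            folder / name / "module.lcd",  # a folder holding module.lcd
        )
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    # Nothing matched: this is where an import error would be reported.
    return None
```

Returning `None` stands in for the import error of the last step; a real tool would more likely report which locations were tried.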

Resolving sub-modules, like "my-module.my-submodule", is not that different: the lookup order is the same. The only difference is that the search is done not in the folder of main.lcd, but in the folder my-module, which, in turn, should be in the folder of main.lcd. In other words: the module hierarchy follows the filesystem hierarchy; just substitute . with the path separator (for example, / on Linux and macOS).
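As a rough illustration of how a dotted module path maps onto the filesystem, the sketch below turns an import string into the two relative files it may correspond to. The helper name `candidate_files` is hypothetical, and the sketch assumes / as the path separator.

```python
from pathlib import PurePosixPath
from typing import List


def candidate_files(import_path: str) -> List[PurePosixPath]:
    """Map a dotted module path to the two relative files it may refer to.

    Hypothetical helper for illustration; the results are relative to the
    folder of the importing file.
    """
    segments = import_path.split(".")
    folder = PurePosixPath(*segments[:-1])  # parent modules become folders
    leaf = segments[-1]
    return [
        folder / f"{leaf}.lcd",        # the sub-module as a single file
        folder / leaf / "module.lcd",  # the sub-module as a folder
    ]
```

For "my-module.my-submodule" this yields my-module/my-submodule.lcd followed by my-module/my-submodule/module.lcd, in the same order as the lookup described above.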

Special paths root and super

There are two special module names that serve special purposes in the import system. The simplest of them is super. Suppose you have structured your code in the following way:

main.lcd
some-module.lcd
a-module/
    module.lcd
    a-submodule.lcd

Now suppose that, for whatever reason, you want a-module.a-submodule to import a-module. If you write import "a-module";, Lopez will look for a file a-module.lcd in the same folder as a-submodule.lcd, which does not exist. So, how can you import a module which is above in the hierarchy? Using super! The special path super means “the module above”. Therefore, the way to make the import work is using

import "super.a-module";

The super module can also be chained. Therefore, to import the file some-module.lcd in the example above, you may do:

import "super.super.some-module";

However, you can get a cleaner solution using the root special module. This module means “the root of the whole thing”. In our case, this corresponds to the folder containing main.lcd. Therefore, to import some-module from a-module.a-submodule you can write

import "root.some-module";
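The effect of root can be sketched the same way. The Python snippet below is a sketch under stated assumptions, not Lopez's implementation: the helper name `root_candidates` and its `root_dir` parameter are hypothetical, and super is deliberately left out.

```python
from pathlib import PurePosixPath
from typing import List


def root_candidates(import_path: str, root_dir: str) -> List[PurePosixPath]:
    """List the files a `root.`-prefixed import may refer to.

    Hypothetical helper: `root_dir` stands for the folder containing the
    main file (main.lcd in the example above).
    """
    segments = import_path.split(".")
    if segments[0] != "root" or len(segments) < 2:
        raise ValueError("expected a path of the form root.<module>")
    # Everything after `root` is resolved from the root folder, no matter
    # which file the import statement appears in.
    *parents, leaf = segments[1:]
    folder = PurePosixPath(root_dir, *parents)
    return [folder / f"{leaf}.lcd", folder / leaf / "module.lcd"]
```

Because the result depends only on `root_dir`, the same import string works unchanged from any depth of the module hierarchy.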

The Standard Library

If you have installed Lopez with the entalator, you have also installed lopez-std, a collection of useful pre-built analyses and configurations to get you productive fast with crawling. This collection of modules is available to every project through the import path, even if the files are not in your project folder. For example, suppose you intend to crawl a site which is full of links to .pdf files. By default, Lopez will also try to interpret these files as HTML, syntax be damned. In fact, you can also scrape these files, even though things sometimes are a bit broken. However, suppose you want to skip these files (and other common file extensions such as .docx). Well, lopez-std has the exact module for you: bad-extensions, a module which contains boundary rules that stop Lopez from crawling such files. And you can use it out of the box:

import "bad-extensions";

This works even though you don’t have any corresponding file in your folder.
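To make the idea of an extension-based boundary rule concrete, here is a rough Python sketch of the kind of filtering involved. It only illustrates the concept: the extension list and the function name `is_crawlable` are assumptions, and the rules actually shipped in bad-extensions may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list: the real bad-extensions module may cover other extensions.
SKIPPED_EXTENSIONS = {".pdf", ".docx"}


def is_crawlable(url: str) -> bool:
    """Return False for URLs whose path ends in a skipped extension."""
    path = urlparse(url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() not in SKIPPED_EXTENSIONS
```

Checking the parsed URL path, rather than the raw string, keeps a query string such as ?download=1 from hiding the extension.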