Build a Search Index
To start using Stork, you need to build a search index: a file that Stork can load to respond to search queries. To build one, you give Stork a configuration file that lists the documents you want indexed, along with some metadata about how to index them.
The Configuration File
A Stork configuration file describes all the documents that should be indexed. It can be written in either JSON or TOML.
Stork can read document contents from the web, from the filesystem, or inline from the configuration file itself. The following TOML example supplies the contents of three documents inline:
[[input.files]]
title = "1: General Introduction"
contents = "After an unequivocal experience of the inefficiency of the subsisting federal government, you are called upon to deliberate on a new Constitution for the United States of America..."
url = "https://federalist.stork-search.net/1.html"
[[input.files]]
title = "2-5: Concerning Dangers from Foreign Force and Influence"
contents = "When the people of America reflect that they are now called upon to decide a question, which, in its consequences, must prove one of the most important that ever engaged their attention..."
url = "https://federalist.stork-search.net/2-5.html"
[[input.files]]
title = "6-7: Concerning Dangers from Dissentions Between the States"
contents = "The three last numbers of this paper have been dedicated to an enumeration of the dangers to which we should be exposed, in a state of disunion, from the arms and arts of foreign nations..."
url = "https://federalist.stork-search.net/6-7.html"
To build a search index from this configuration file, save the file to disk (here, as basic.toml) and pass it to the Stork command-line tool:
$ stork build --input basic.toml --output federalist.st
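The configuration above supplies every document inline. As a minimal sketch of the other two sources mentioned earlier, the fragment below points one entry at a file on disk and another at a web page. The `path` key appears later on this page; `src_url` is an assumed key name for web sources, and `federalist-1.txt` is a hypothetical file name, so check both against the Configuration File Reference before relying on them.

```toml
[[input.files]]
# Read the document body from the filesystem.
# "federalist-1.txt" is a hypothetical file used only for illustration.
path = "federalist-1.txt"
title = "1: General Introduction"
url = "https://federalist.stork-search.net/1.html"

[[input.files]]
# Fetch the document body from the web at build time.
# ASSUMPTION: `src_url` is the key Stork uses for web sources.
src_url = "https://federalist.stork-search.net/2-5.html"
title = "2-5: Concerning Dangers from Foreign Force and Influence"
url = "https://federalist.stork-search.net/2-5.html"
```

In both entries, `url` is still the link a search result points to; only the place the indexed text comes from differs.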
Testing your index
After writing a config file, you might want to test how well the search interface works before loading it onto your web site. Stork offers a test mode, where it will build your search index and load it into a simplified web interface so you can play with the search functionality while iterating on your configuration.
To test out your index, run:
$ stork test --index my-index.st
and open http://localhost:1612, the web page served by Stork.
File Formats
Today, Stork can automatically recognize and extract text from four types of files:
- Plain text files,
- SRT subtitle files,
- HTML files, and
- Markdown files.
Stork will automatically detect the file format by inspecting its file extension; however, if your file extension is non-standard (such as .mdx for a Markdown file), you can specify the format of any file in the configuration:
[[input.files]]
path = "federalist-1.mdx"
title = "1: General Introduction"
url = "https://federalist.stork-search.net/1.html"
filetype = "Markdown"
Additional Options
You can visit the Configuration File Reference to see the full list of acceptable configuration key-value pairs.
The Stork configuration file lets you control many aspects of how Stork indexes your content and how the search interface behaves, such as:
- How frontmatter in a document should be parsed
- Which HTML selectors in an HTML document should be indexed and which should be ignored
- Which language the stemmer uses to stem each word in your corpus
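As a rough illustration of the three options listed above, the fragment below sets them at the top of the `input` table. The key names and values are assumptions based on the Configuration File Reference and may differ between Stork versions; the CSS selectors are hypothetical placeholders for your own page structure.

```toml
[input]
# ASSUMPTION: controls how frontmatter at the top of a document is treated.
frontmatter_handling = "Parse"
# ASSUMPTION: index only the text inside elements matching this selector.
html_selector = "article"
# ASSUMPTION: skip text inside elements matching this selector.
exclude_html_selector = ".footnotes"
# ASSUMPTION: language the stemmer uses for words in the corpus.
stemming = "English"

[[input.files]]
path = "federalist-1.mdx"
title = "1: General Introduction"
url = "https://federalist.stork-search.net/1.html"
filetype = "Markdown"
```

Settings placed under `[input]` would apply to every file in the index, which is why they sit above the `[[input.files]]` entries rather than inside one.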