Sending Summaries to a Non-default Indexer in Splunk

Summary Indexing to Somewhere Else

You are a Splunk administrator. Your environment has more than one indexer cluster or non-clustered indexer. Your clients use summary indexing.

A user wants to send events to a summary index that is not on the default indexer or cluster. (This may be necessary, for example, because another indexer cluster has a longer retention period for indexes than the one the search head normally sends its events to.) There is no way for your client to do this because no commands allow a user to specify a tcpout group for the data.

This post describes a way to allow clients to write searches that send summaries to specific indexers or indexer clusters.

Assumptions

You are using a recent version of Splunk, e.g. 7 or 8. You know how a distributed Splunk architecture works. You know how to edit Splunk configuration files.

How Do Summary Events Get to a Summary Index?

Splunk summary indexes normally contain events with a source type of stash. The si- commands (sichart, sirare, sistats, sitimechart, and sitop) put events into a summary index when you save the search as a report, schedule the report, and enable summary indexing for it. The collect command can send events into a summary index directly (without configuring a summary-enabled report) and allows you to specify a target summary index. It does not, however, have any facility to send events to a specific indexer or cluster. It only sends events to a summary index on the server or cluster specified as the default in outputs.conf. There is, therefore, no built-in SPL provision to send data to a non-default indexer.
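For illustration, here is a minimal collect usage as it works out of the box (the index name and search terms are placeholders, and the index must already exist on the indexers):

tag=web earliest=-1h@h latest=@h
| sistats count BY status
| collect index="websummary"

This sends the summarized results to websummary on whatever indexers the default tcpout group points to.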

Understanding how Splunk processes summary events is helpful now. When you run an si- command or collect, the results of your query are written to a file on that search head that gets processed as a batch input. There is a stanza for this purpose in etc/system/default/inputs.conf on every Splunk instance:

[batch://$SPLUNK_HOME/var/spool/splunk/...stash_new]
queue = stashparsing
sourcetype = stash_new
move_policy = sinkhole
crcSalt = <SOURCE>

This input picks up a file with a .stash_new extension in the var/spool/splunk directory, sends the events to the special stashparsing queue (which does not count against your license), and then deletes the file. Not coincidentally, this path is where the Splunk si- commands and collect drop their results.

Given this insight, we can make our own input that will force the events to go wherever we want.

A Diversion into the Source Type for Summaries

Summary indexes contain events with the source type stash, no matter what the source types of your data originally were. (There is an ‘orig_sourcetype’ field available to retrieve that.) However, when Splunk ingests the results in var/spool/splunk, it assigns the events a source type of stash_new first, then renames the source type to stash before it passes events to the indexer.
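For example, a hypothetical search against a summary index (index name assumed) can tally summary events by their original source type:

index=websummary sourcetype=stash
| stats count BY orig_sourcetype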

At some point in Splunk's history, summary indexing moved from its original mechanism to the 'new' way, which evidently involves a special line-breaking scheme (hence the stash_new source type). The takeaway for us is that whatever our process is, it must handle events the same way stash_new does, and the source type must still end up as stash before we are done.

What Are We Working With?

Let us assume that we have a search head cluster and two indexer clusters. The indexer clusters are called cluster_a and cluster_b in our configuration. These have cluster masters named cma and cmb, respectively. Our search head deployer, therefore, pushes out the following in an outputs.conf to the search head cluster:

[tcpout]
defaultGroup = cluster_a
indexAndForward = false

[tcpout:cluster_a]
indexerDiscovery = cma

[indexer_discovery:cma]
master_uri = https://cma.mydomain.net:8089
pass4SymmKey = ichbineinGeheimnis

[tcpout:cluster_b]
indexerDiscovery = cmb

[indexer_discovery:cmb]
master_uri = https://cmb.mydomain.net:8089
pass4SymmKey = jestemtajemnicą

Note that the default group is cluster_a, so any events generated by the search head will go to cluster_a unless we intervene. Both indexer clusters use discovery through their cluster masters.

How Do We Make This Work?

We want some of our summary index events to go to one or more summary indexes on cluster_b. To do this, we must exploit the options available in the collect command.

The collect command contains two optional arguments of particular interest to our conundrum. First, there is the index option that allows selection of the target summary index. Second, there is the file option that allows the search to override the name of the file produced in var/spool/splunk. Since we can control the file name, we can provide a special inputs.conf stanza to handle file names with a special pattern. In our special inputs.conf stanza, we can specify a special source type. And with a special source type, we can write a props.conf stanza that overrides the destination tcpout group (specified in outputs.conf) that determines which indexers receive our events.

Thus, with the collect command’s index and file options, we can control both the indexer cluster and the index in which our summary events end up.

Let us look at our special inputs.conf stanza:

[batch:///opt/splunk/var/spool/splunk/tcp_routing_cluster_b_*.stash_new]
crcSalt = <SOURCE>
move_policy = sinkhole
queue = stashparsing
sourcetype = stash_tcp_routing_cluster_b

To keep things self-documenting, we will look for file names that start with ‘tcp_routing_cluster_b_’. Note that this stanza is otherwise identical to the stanza provided by Splunk for normal summary events, but that we change the source type to stash_tcp_routing_cluster_b. Now, let us handle this new source type in a props.conf stanza:

[stash_tcp_routing_cluster_b]
TRUNCATE = 0
HEADER_MODE = firstline
MAX_DAYS_HENCE = 2
MAX_DAYS_AGO = 10000
MAX_DIFF_SECS_AGO = 155520000
MAX_DIFF_SECS_HENCE = 155520000
MAX_TIMESTAMP_LOOKAHEAD = 64
LEARN_MODEL = false
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = false
LINE_BREAKER = (\r?\n==#### 1E8N3D4E6V5E7N2T9 ####==\r?\n)
TRANSFORMS-special = tcp_routing_cluster_b, set_sourcetype_to_stash

This stanza is identical to the one provided by Splunk in etc/system/default/props.conf, except that the TRANSFORMS key is changed to include a routing change that we implement in our own transforms.conf:

[tcp_routing_cluster_b]
DEST_KEY = _TCP_ROUTING
REGEX = .
FORMAT = cluster_b

After our rerouting with [tcp_routing_cluster_b], [set_sourcetype_to_stash] (already provided by Splunk) is called to set the source type to the final stash.
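For reference, the Splunk-supplied transform is defined in etc/system/default/transforms.conf along these lines (check your own version's defaults rather than copying this):

[set_sourcetype_to_stash]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::stash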

Once this configuration is pushed to the search heads, we can write a search to send summary events to a summary index on cluster_b. Let us assume we have an index there called ‘mysummary’.

tag=web user=* earliest=-15min@min latest=@min
| sistats count BY user
| collect index="mysummary" file="tcp_routing_cluster_b_$random$.stash_new"

The collect command will put the results of sistats into a file like var/spool/splunk/tcp_routing_cluster_b_0f1e2d3c4a596876.stash_new. (Using the $random$ field avoids potential file name collisions.) That file will be picked up by our special batch input. Its events will be given our special source type. That source type processing will route the data to cluster_b and switch to the stash source type. At cluster_b, the events will be processed and placed into the index we specified in our search.
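To verify that events arrived, run a search against cluster_b (assuming a search head that has cluster_b's peers configured as search peers):

index=mysummary sourcetype=stash earliest=-1h
| stats count BY orig_sourcetype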

There is no way to specify the file name using the si- commands alone, so reports that use them must pipe their results to collect for this technique to work. Meanwhile, summary indexing should probably be disabled for any such report in Settings->Searches, Reports, and Alerts. The collect command obviates the need for the report to have summary indexing enabled, since the results will always be sent to a summary index.
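For example, a scheduled report that previously relied on report-level summary indexing with sitimechart can append collect explicitly (search terms are illustrative):

tag=web earliest=-15min@min latest=@min
| sitimechart span=1min count BY host
| collect index="mysummary" file="tcp_routing_cluster_b_$random$.stash_new"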

Adaptability

All this configuration happens at the search tier. It can be applied to a stand-alone search head just as well as to a cluster. If you have non-clustered indexers, outputs.conf obviously does not need indexer discovery for that output. The routing override works either way, as long as the search head's outputs are configured correctly in general.
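For a non-clustered indexer, the target tcpout group simply lists the receiving servers directly instead of using indexer discovery (host name and port are placeholders); the _TCP_ROUTING transform then references that group name:

[tcpout:standalone_b]
server = idxb.mydomain.net:9997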
