Troubleshooting Splunk Firehose Nozzle for PCF

This topic describes how to troubleshoot Splunk Firehose Nozzle for Pivotal Cloud Foundry (PCF).

To view the nozzle’s logs running on PCF do the following:

  1. Login via the cli using the admin user.
  2. Target the org created by the tile
    • cf target -o splunk-nozzle-org
  3. View the recent app Splunk Nozzle logs (the version number installed by the tile will vary)
    • cf logs --recent splunk-firehose-nozzle-1.0.0
  4. Or stream the app logs as they’re emitted
    • cf logs splunk-firehose-nozzle-1.0.0

Troubleshooting Walkthrough

The following assumes you have access to Splunk to run basic searches against index _internal and the user-specified index for Firehose events.

1. Ensure Splunk indexer(s) are listening to data on configured port:

Search internal logs of indexer(s), specifically the TCP input processor, as follows:

  index=_internal sourcetype=splunkd component=TcpInputProc 

A correct setup shows a following event at startup time:

10-13-2016 17:31:16.226 +0000 INFO TcpInputProc - Creating fwd data Acceptor for IPv4 port 9997 with SSL

2. Ensure Splunk Nozzle is connected to Splunk indexer(s):

Search internal logs of nozzle, specifically the TCP output processor, to confirm that the Splunk Nozzle is correctly forwarding:

  index=_internal host="splunk-nozzle_*" sourcetype=splunkd component=TcpOutputProc

A correct setup shows an event as follows:

11-04-2016 20:24:13.702 +0000 INFO TcpOutputProc - Connected to idx=:9997 using ACK.

If no such event is found, then no data is being forwarded from the Nozzle. This can be due to misconfigured indexer(s) firewall, or misconfigured SSL settings of either Nozzle (forwarder) or indexer(s). Search internal logs of Splunk indexer(s) for any errors:

  index=_internal sourcetype=splunkd component=TcpInputProc ERROR

An example error would be: 11-03-2016 22:23:00.046 +0000 ERROR TcpInputProc - Error encountered for connection from src=:55713. error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol

Whether you are using self-signed certificates or certificates signed by a third party, follow these instructions to provide your signed certificates details in a correct format when configuring the tile. For more details on troubleshooting and validating your forwarder/receiver connection, see:

3. Ensure Splunk Nozzle is forwarding events from the Firehose:

Search app logs of the Nozzle to confirm correct behavior:

  sourcetype="cf:splunknozzle"

A correct setup logs a start message with configuration parameters of the Nozzle logged as a JSON object, for example:

  data: {
     add-app-info: true
     api-endpoint: https://api.endpoint.com
     app-cache-ttl: 0
     app-limits: 0
     batch-size: 1000
     boltdb-path: cache.db
     branch: null
     buildos: null
     commit: null
     debug:  false
     extra-fields:
     flush-interval: 5000000000
     hec-workers: 8
     ignore-missing-apps: true
     job-host:
     job-index: -1
     job-name: splunk-nozzle
     keep-alive: 25000000000
     missing-app-cache-ttl:  0
     queue-size: 10000
     retries: 2
     skip-ssl: true
     splunk-host: http://localhost:8088
     splunk-index: atomic
     splunk-version: 6.6
     subscription-id: splunk-firehose
     trace-logging: true
     version:
     wanted-events: ValueMetric,CounterEvent,Error,LogMessage,HttpStartStop,ContainerMetric
  }
  ip: 10.0.0.0
  log_level: 1
  logger_source: splunk-nozzle-logger
  message: splunk-nozzle-logger.Running splunk-firehose-nozzle with following configuration variables
  origin: splunk_nozzle

Search app logs of the Nozzle for any errors:

  sourcetype="cf:splunknozzle" data.error=*

Errors are logged with corresponding message and stacktrace.

4. Check for dropped events due to HTTP Event Collector availability:

As the Splunk Firehose Nozzle is sending data to Splunk via HTTPS using the HTTP Event Collector it is also susceptible to any network issues across the network path from point to point. Run the following search to determine if Splunk has indexed any events indicating issues with the HEC Endpoint.

  sourcetype="cf:splunknozzle" "dropping events"

5. Check for data loss inside the Splunk Firehose Nozzle:

If “Event Tracing” is enabled, extra metadata will be attached to events allowing searches to calculate the percentage of data loss inside the Splunk Firehose Nozzle, if any.

Each instance of the Splunk Firehose Nozzle will run with a randomly generated UUID. The below query will display the message success rate for each UUID.

  index=main | stats count as total_events , max(nozzle-event-counter) as max_number by uuid | eval success_percentage=(total_events/max_number) * 100 | table success_percentage
Create a pull request or raise an issue on the source for this page in GitHub