Concord supports injecting faults into your system using the chaos engineering
tool Gremlin with the gremlin task as part of any flow.
Users planning to use the gremlin plugin in their Concord flows should take care of the following items:
- Pricing. Refer to the Gremlin documentation about pricing.
- Installing Gremlin agents on their respective endpoints. Refer to the Gremlin documentation about installation.
The gremlin plugin offers the following categories of attacks to inject faults into your system:
- Resource
- State
- Network
To be able to use the task in a Concord flow, it must be added as a dependency:
configuration:
  dependencies:
  - mvn://com.walmartlabs.concord.plugins:gremlin-task:1.32.3
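As a minimal sketch of how the dependency and a task call might sit together in a single concord.yml: the flow name default follows the usual Concord convention, and every value below is a placeholder rather than a recommended setting.

```yaml
# concord.yml - illustrative sketch only; all values are placeholders
configuration:
  dependencies:
  - mvn://com.walmartlabs.concord.plugins:gremlin-task:1.32.3

flows:
  default:
  # run a short CPU attack against one explicitly named client
  - task: gremlin
    in:
      action: cpu
      apiKey: myApiKey
      cores: 1
      length: 15
      targetType: Exact
      targetList: ["client1"]
```

The individual input parameters used here are described in the sections that follow.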
The attacks under the Resource category starve your application of critical
resources like CPU, memory, IO, or disk and reveal how your service behaves.
The attack configurations under this category use a number of input
parameters that are common for all operations:
action
: Required - The name of the operation to perform.
apiKey
: Required - Gremlin API key.
length
: Required - The length of the attack (seconds).
useProxy
: When set to true, uses the proxy host and port set in default vars. By default set to false.
endPointType
: Type of endpoints. Accepted values are hosts and containers. By default set to hosts.

When endPointType is hosts:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
targetList
: Required when targetType is Exact. Input is a list, e.g. ["client1", "client2"].
targetTags
: Required when targetType is Random. Input is a key/value pair, e.g. { "myTagKey": "myTagValue" }. More information about client tags can be found in the documentation for advanced gremlin configuration.

When endPointType is containers:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
containerIds
: Required when targetType is Exact. Input is a list, e.g. ["containerId1", "containerId2"].
containerLabels
: Required when targetType is Random. Input is a key/value pair, e.g. { "myContainerLabelKey": "myContainerLabelValue" }.
containerCount
: Integer value - Number of Docker containers with a specific label that you want to attack. Can be used when the containerLabels parameter is set. Defaults to 1.

The cpu
action of the gremlin task can be used to generate high load for one
or more CPU cores. The following parameter is needed in addition to the general
parameters:
cores
: Required - The number of CPU cores to hog.

- task: gremlin
  in:
    action: cpu
    apiKey: myApiKey
    cores: 1
    length: 15
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }
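The useProxy parameter from the common parameters can be combined with any action. The following is a sketch of the same cpu attack routed through the proxy; it assumes the proxy host and port are already defined in default vars, which are not shown here.

```yaml
- task: gremlin
  in:
    action: cpu
    apiKey: myApiKey
    cores: 2
    length: 30
    # assumption: proxy host and port are supplied by default vars
    useProxy: true
    targetType: Exact
    targetList: ["client1", "client2"]
```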
The memory action of the gremlin task can be used to allocate a specific
amount of RAM to be consumed. The following parameters are needed in addition to
the general parameters:
unitOption
: Required - Allowed values are GB, MB, PERCENT.
memoryUnits
: Required when unitOption is GB or MB. The number of MB or GB to allocate.
memoryPercent
: Required when unitOption is PERCENT. The percentage of memory to allocate, e.g. 10 is 10%.

Example with absolute value usage:

- task: gremlin
  in:
    action: memory
    apiKey: myApiKey
    unitOption: MB
    memoryUnits: 100
    length: 15
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }

Example with percent usage:

- task: gremlin
  in:
    action: memory
    apiKey: myApiKey
    unitOption: PERCENT
    memoryPercent: 10
    length: 15
    targetType: Exact
    targetList: ["client1", "client2"]
The disk action of the gremlin task can be used to write files to disk to
fill it to a specific percentage. The following parameters are needed in
addition to the general parameters:
dir
: Required - The root directory to run the disk attack.
workers
: The number of disk-write workers to execute. Default is 1.
blockSize
: Number of kilobytes (KB) that are read/written at a time. Default is 5.
percent
: Required - Percent of volume to fill (0-100).

- task: gremlin
  in:
    action: disk
    apiKey: myApiKey
    length: 15
    dir: myDir
    workers: 3
    blockSize: 5
    percent: 25
    targetType: Exact
    targetList: ["client1", "client2"]
The io action of the gremlin task can be used to put read/write pressure on
I/O devices such as hard disks. The following parameters are needed in addition
to the general parameters:
dir
: Required - The root directory to run the io attack.
mode
: Required - The io mode to execute. Allowed values are r, w, rw.
workers
: The number of io workers to execute. Default is 1.
blockSize
: Number of kilobytes (KB) that are read/written at a time. Default is 5.
blockCount
: The number of blocks read/written by workers. Default is 5.

- task: gremlin
  in:
    action: io
    apiKey: myApiKey
    length: 10
    dir: myDir
    workers: 3
    mode: rw
    blockSize: 6
    blockCount: 4
    targetType: Exact
    targetList: ["client1", "client2"]
Example: Using endPointType as containers

- task: gremlin
  in:
    action: memory
    apiKey: myApiKey
    unitOption: PERCENT
    memoryPercent: 10
    length: 15
    endPointType: containers
    targetType: Exact
    containerIds:
    - "myContainerId1"
    - "myContainerId2"

- task: gremlin
  in:
    action: memory
    apiKey: myApiKey
    unitOption: PERCENT
    memoryPercent: 10
    length: 15
    endPointType: containers
    targetType: Random
    containerCount: 2
    containerLabels: { "myLabelKey": "myLabelValue" }
The attacks under the State category introduce chaos into your infrastructure,
so that you can observe how well your service handles it or fails.
The attack configurations under this category use a number of input
parameters that are common for all operations:
action
: Required - The name of the operation to perform.
apiKey
: Required - Gremlin API key.
length
: Required - The length of the attack (seconds).
useProxy
: When set to true, uses the proxy host and port set in default vars. By default set to false.
endPointType
: Type of endpoints. Accepted values are hosts and containers. By default set to hosts.

When endPointType is hosts:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
targetList
: Required when targetType is Exact. Input is a list, e.g. ["client1", "client2"].
targetTags
: Required when targetType is Random. Input is a key/value pair, e.g. { "myTagKey": "myTagValue" }. More information about client tags can be found in the documentation for advanced gremlin configuration.

When endPointType is containers:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
containerIds
: Required when targetType is Exact. Input is a list, e.g. ["containerId1", "containerId2"].
containerLabels
: Required when targetType is Random. Input is a key/value pair, e.g. { "myContainerLabelKey": "myContainerLabelValue" }.
containerCount
: Integer value - Number of Docker containers with a specific label that you want to attack. Can be used when the containerLabels parameter is set. Defaults to 1.

The shutdown
action of the gremlin task can be used to reboot or halt the
host operating system to test how your system behaves when losing one or more
cluster machines. The following parameters are needed in addition to the general
parameters:
delay
: The number of minutes to delay before shutting down. Default is 1 minute.
reboot
: Indicates the host should reboot after shutting down. Default is true.

- task: gremlin
  in:
    action: shutdown
    apiKey: myApiKey
    delay: 1
    reboot: true
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }
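The example above reboots the host. As a sketch of the other case, the following variant sets reboot to false, which, going by the parameter description, should leave the targeted hosts shut down rather than restarting them. The delay of 5 minutes and the client names are placeholders.

```yaml
- task: gremlin
  in:
    action: shutdown
    apiKey: myApiKey
    delay: 5          # minutes to wait before shutting down
    reboot: false     # assumption: host stays down instead of rebooting
    targetType: Exact
    targetList: ["client1", "client2"]
```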
The timeTravel action of the gremlin task can be used to change the host’s
system time. This can be used to simulate adjusting to daylight saving time and
other time-related events. The following parameters are needed in addition to
the general parameters:
offset
: The offset (+/-) to the current time (seconds). Default is +5 seconds.
ntp
: Disable NTP from correcting the system time. Default value is set to false.

- task: gremlin
  in:
    action: timeTravel
    apiKey: myApiKey
    length: 15
    offset: -100
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }
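For the daylight saving scenario mentioned above, a sketch might shift the clock forward by one hour (3600 seconds) and set ntp to true so that, per the parameter description, NTP does not correct the time during the attack. The length and client names are placeholder values.

```yaml
- task: gremlin
  in:
    action: timeTravel
    apiKey: myApiKey
    length: 60
    offset: 3600      # seconds; one hour forward
    ntp: true         # keep NTP from correcting the shifted time
    targetType: Exact
    targetList: ["client1", "client2"]
```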
The processKiller action of the gremlin task can be used to kill a specified
process. This can be used to simulate application or dependency crashes. The
following parameters are needed in addition to the general parameters:
interval
: The number of seconds to delay before kills. Default is 5 seconds.
process
: Required - The process name to match (allows regex) or the process ID.
group
: The group name or ID to match against (name matches only).
user
: The user name or ID to match against (name matches only).
newest
: If set, the newest matching process will be killed (name matches only, cannot be used together with oldest). Default is false.
oldest
: If set, the oldest matching process will be killed (name matches only, cannot be used together with newest). Default is false.
exact
: If set, the match must be exact and not just a substring match (name matches only). Default is false.
killChildren
: If set, the process's children will also be killed. Default is false.
fullMatch
: If set, the process name match will occur against the full command line string that the process was launched with. Default is false.

- task: gremlin
  in:
    action: processKiller
    apiKey: myApiKey
    length: 15
    interval: 10
    process: myProcess
    newest: true
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }
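The matching options can be combined. As an illustrative sketch, the following call requires an exact name match, restricts the match to one user, and also kills child processes; myProcess and myUser are placeholders.

```yaml
- task: gremlin
  in:
    action: processKiller
    apiKey: myApiKey
    length: 15
    interval: 10
    process: myProcess
    user: myUser          # only match processes owned by this user
    exact: true           # no substring matches
    killChildren: true    # also kill children of the matched process
    targetType: Exact
    targetList: ["client1", "client2"]
```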
Example: Using endPointType as containers

- task: gremlin
  in:
    action: processKiller
    apiKey: myApiKey
    length: 15
    interval: 10
    process: myProcess
    newest: true
    endPointType: containers
    targetType: Exact
    containerIds:
    - "myContainerId1"
    - "myContainerId2"

- task: gremlin
  in:
    action: timeTravel
    apiKey: myApiKey
    length: 15
    offset: -100
    endPointType: containers
    targetType: Random
    containerCount: 2
    containerLabels: { "myLabelKey": "myLabelValue" }
The attacks under the Network category allow you to see the impact of lost or
delayed traffic to your application. You can test how your service behaves when
you are unable to reach one of your dependencies, internal or external. You can
limit the impact to only the traffic you want to test by specifying ports,
hostnames, and IP addresses.
The attack configurations under this category use a number of input
parameters that are common for all operations:
action
: Required - The name of the operation to perform.
apiKey
: Required - Gremlin API key.
length
: Required - The length of the attack (seconds).
useProxy
: When set to true, uses the proxy host and port set in default vars. By default set to false.
endPointType
: Type of endpoints. Accepted values are hosts and containers. By default set to hosts.

When endPointType is hosts:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
targetList
: Required when targetType is Exact. Input is a list, e.g. ["client1", "client2"].
targetTags
: Required when targetType is Random. Input is a key/value pair, e.g. { "myTagKey": "myTagValue" }. More information about client tags can be found in the documentation for advanced gremlin configuration.

When endPointType is containers:
targetType
: Type of clients that should be targeted by the attack. Allowed values are Random and Exact. Default is set to Exact.
containerIds
: Required when targetType is Exact. Input is a list, e.g. ["containerId1", "containerId2"].
containerLabels
: Required when targetType is Random. Input is a key/value pair, e.g. { "myContainerLabelKey": "myContainerLabelValue" }.
containerCount
: Integer value - Number of Docker containers with a specific label that you want to attack. Can be used when the containerLabels parameter is set. Defaults to 1.

The blackhole
action of the gremlin task can be used to drop all matching
network traffic. The following parameters are needed in addition to the general
parameters:
ipAddresses
: Required - Impact traffic to these IP addresses.
device
: Impact traffic over this network interface.
hostnames
: Only impact traffic to these hostnames. Whitelist a host with a leading ^.
egressPorts
: Only impact egress traffic to these destination ports. Ranges work too: 8080-8085.
ingressPorts
: Only impact ingress traffic on these incoming ports. Ranges work too: 8080-8085.
protocol
: Only impact traffic using this IP protocol. Allowed values are TCP, UDP, ICMP. Defaults to all protocols.

- task: gremlin
  in:
    action: blackhole
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    hostnames: "host1.com, host2.com"
    egressPorts: "egPort1, egPort2"
    ingressPorts: "ingPort1, ingPort2"
    protocol: UDP
    targetType: Exact
    targetList: ["client1", "client2"]
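The port range and whitelist notations from the parameter descriptions can be sketched with concrete-looking values. The IP address below comes from a range reserved for documentation, the hostname is hypothetical, and the comma-separated string format simply mirrors the example above.

```yaml
- task: gremlin
  in:
    action: blackhole
    apiKey: myApiKey
    length: 15
    ipAddresses: "192.0.2.10"
    # leading ^ marks the host as whitelisted, per the hostnames description
    hostnames: "^monitoring.example.com"
    # a port range instead of individual ports
    egressPorts: "8080-8085"
    protocol: TCP
    targetType: Exact
    targetList: ["client1", "client2"]
```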
The latency action of the gremlin task can be used to inject latency into
all matching egress network traffic. The following parameters are needed in
addition to the general parameters:
ipAddresses
: Required - Only impact egress traffic to these IP addresses.
device
: Impact traffic over this network interface.
hostnames
: Only impact traffic to these hostnames. Whitelist a host with a leading ^.
egressPorts
: Only impact egress traffic to these destination ports. Ranges are supported, e.g. 8080-8085.
sourcePorts
: Only impact egress traffic from these source ports. Ranges are supported, e.g. 8080-8085.
delay
: How long to delay egress packets (milliseconds).
protocol
: Only impact traffic using this IP protocol. Allowed values are TCP, UDP, ICMP. Defaults to all protocols.

- task: gremlin
  in:
    action: latency
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    hostnames: "host1.com, host2.com"
    egressPorts: "egPort1, egPort2"
    sourcePorts: "sPort1, sPort2"
    delay: 100
    protocol: ICMP
    targetType: Exact
    targetList: ["client1", "client2"]
The packetLoss action of the gremlin task can be used to induce packet loss
into all matching egress network traffic. The following parameters are needed in
addition to the general parameters:
ipAddresses
: Required - Only impact traffic to these IP addresses.
device
: Impact traffic over this network interface.
hostnames
: Only impact traffic to these hostnames. Whitelist a host with a leading ^.
egressPorts
: Only impact egress traffic to these destination ports. Ranges work too: 8080-8085.
sourcePorts
: Only impact egress traffic from these source ports. Ranges work too: 8080-8085.
percent
: Percentage of packets to drop (10 is 10%). Default is set to 1.
corrupt
: Corrupt packets instead of simply dropping them. Default is set to false.
protocol
: Only impact traffic using this IP protocol. Allowed values are TCP, UDP, ICMP. Defaults to all protocols.

- task: gremlin
  in:
    action: packetLoss
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    hostnames: "host1.com, host2.com"
    egressPorts: "egPort1, egPort2"
    sourcePorts: "sPort1, sPort2"
    percent: 5
    corrupt: true
    protocol: ICMP
    targetType: Exact
    targetList: ["client1", "client2"]
The dns action of the gremlin task can be used to block access to DNS
servers. The following parameters are needed in addition to the general
parameters:
ipAddresses
: Required - Impact traffic to these IP addresses.
device
: Impact traffic over this network interface.
protocol
: Only impact traffic using this IP protocol. Allowed values are TCP, UDP. Defaults to all protocols.

- task: gremlin
  in:
    action: dns
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    protocol: UDP
    targetType: Random
    targetTags: { "myTagKey": "myTagValue" }
Example: Using endPointType as containers

- task: gremlin
  in:
    action: packetLoss
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    hostnames: "host1.com, host2.com"
    egressPorts: "egPort1, egPort2"
    sourcePorts: "sPort1, sPort2"
    percent: 5
    corrupt: true
    protocol: ICMP
    endPointType: containers
    targetType: Exact
    containerIds:
    - "myContainerId1"
    - "myContainerId2"

- task: gremlin
  in:
    action: dns
    apiKey: myApiKey
    length: 15
    ipAddresses: "ipAddress1, ipAddress2"
    device: "myDevice"
    protocol: UDP
    endPointType: containers
    targetType: Random
    containerCount: 2
    containerLabels: { "myLabelKey": "myLabelValue" }
The halt action of the gremlin task can be used to idempotently halt the
specified active attack.
action
: Required - The name of the operation to perform. Must be halt.
apiKey
: Required - Gremlin API key.
attackGuid
: Required - GUID of the attack.

- task: gremlin
  in:
    action: halt
    apiKey: myApiKey
    attackGuid: attackGuid

The halt action performed by the task is equivalent to manually pressing the
halt button in the Gremlin app for the specified active attack.
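In practice the GUID is unlikely to be a literal. The sketch below passes it through a flow variable using a Concord expression; myAttackGuid is a hypothetical variable name, and how the GUID of a running attack is obtained is not covered here.

```yaml
- task: gremlin
  in:
    action: halt
    apiKey: myApiKey
    # hypothetical flow variable holding the GUID of the active attack
    attackGuid: "${myAttackGuid}"
```

Because the operation is described as idempotent, repeating this step for the same GUID should be harmless.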