Tripwire should be back online and updated to 1.13 - please refresh your browsers. A hard refresh might be required but shouldn’t be.
I re-enabled the ship mass tracking feature for now. If it causes problems again, I might have to disable it.
This worked great for a few days after the reset, but the last 48-72 hours have been hit or miss, mostly miss - seeing a lot of 502/504 errors.
Something is going on with the PHP servers. I have not yet solved the issue as it always seems to happen when I am away/working.
I am monitoring and trying to resolve the issues.
Thank you for your time & effort.
Think I finally got the 502s figured out. A table suddenly needed an index.
It was going pretty well for about 24h and then just up and died.
Maybe the mass tracking?
Making an official public cry for help…
Tripwire 502s are being caused by database issues. The signature/wormhole tables (InnoDB) are becoming locked for long periods of time.
I’ve blindly tried adding indexes and other things to try and fix the issue and have even run out of ideas to try. Tripwire needs a DBA (Database Administrator) who can look at the structure and make professional recommendations on a proper setup to prevent table locks - if anyone is one or knows one, PLEASE HELP!
You can find the entire database structure in the source code here: https://bitbucket.org/daimian/tripwire
Look in /.docker/mysql/tripwire.sql
I cannot and will not provide data to populate a database - this would destroy Tripwire security, so don’t bother asking.
In my opinion there is no way to be immune to table locks by tinkering with the DB itself. You can make it better, but never eliminate it.
Most such problems are coding issues, like starting a read/write transaction and doing a lot of other work before committing it, or demanding exclusive read/write access when you only need to read.
Also, you can probably run queries to check for queued transactions to help track down which transactions are blocking. I only have experience with Postgres, so I can’t tell you what those commands would be in your case.
As a professional programmer with years of experience, I normally have a routine in the admin control panel specifically for recreating indexes.
When I want a new index, I never create it in the database by hand; I run my module, which checks whether an index named IX_TABLENAME_FIELDNAME exists and creates it if it doesn’t. I can rerun or rebuild it if I mess something up.
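A minimal sketch of that check-then-create routine in MySQL - the table, column, and index names are just the Persons example from further down, not Tripwire’s:
-- Hedged sketch: create the index only if it does not already exist.
SELECT COUNT(*) INTO @idx_exists
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'Persons'
  AND INDEX_NAME = 'IX_Persons_LastName';
SET @sql = IF(@idx_exists = 0,
              'CREATE INDEX IX_Persons_LastName ON Persons (LastName)',
              'SELECT 1');  -- no-op when the index is already there
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;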
Anyway, you need to go through your source code, look at what SELECTs you are running, and collect them in a notepad/editor file.
In small to medium datasets (less than 20 GB) I normally get the best results using simple indexes, not composite indexes.
Good but confusing:
CREATE INDEX IX_Persons_LastNameMIX ON Persons (LastName, FirstName);
More readable for me:
CREATE INDEX IX_Persons_LastName ON Persons (LastName);
CREATE INDEX IX_Persons_FirstName ON Persons (FirstName);
And I assume you don’t have CHAR fields, but VARCHAR instead.
InnoDB doesn’t do table locks; it does row locks.
from your db structure: https://bitbucket.org/daimian/tripwire/src/9c04fdbc0fc95db602b973fd26999f5ae94fb07a/.docker/mysql/tripwire.sql#lines-66
MyISAM.
SELECT TABLE_NAME,
ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'database';
You are going to need to do an ALTER TABLE and fix all these tables so they appropriately use the InnoDB engine rather than MyISAM.
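A conversion would look something like this (hypothetical table name; on a live database you’d want a backup and a quiet window, since the table gets rebuilt):
-- Hedged example: convert one MyISAM table to InnoDB in place.
ALTER TABLE some_myisam_table ENGINE = InnoDB;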
My recommendation would be that you start by looking into the query planner explain command. I use the TSQL equivalent a lot when doing these kinds of optimizations to the eve DB. Unfortunately, I know very little about the details of innoDB, but as an example, consider the following query:
UPDATE signatures SET modifiedByID = 0, modifiedByName = "Tripwire" WHERE lifeLeft < NOW() AND lifeLength <> '0' AND type <> 'wormhole';
When executing that query, you’re hinging on the index being created by this key in the signatures table to avoid a whole table scan:
KEY `lifeLeft, lifeLength` (`lifeLeft`,`lifeLength`),
However, it may be buying you less than you expect. You’re about to wander into the deep magic of implementation-specific db engine index construction techniques.
Here are some good links on the topic:
https://www.sql-server-performance.com/composite-indexes/
https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
It’s hard to say exactly what this is doing without looking at the query planner results. However, my guess is that because you also have the “type” in the where clause, the composite index is less useful than you might think. The composite index can be used to rapidly assemble a bunch of rows, but it’s possible that they now need to be merged and scanned to see if they are of type wormhole. It’s possible the innoDB engine is ignoring the index entirely and just doing a full table scan, because it considers the merge-and-filter to be more expensive than the table scan.
My recommendations would be:
- Throw some of these things through the EXPLAIN statement and see if it tells you where the bottlenecks are.
- Try making a triple composite index that goes (type, lifeLength, lifeLeft) and see if it makes the updates and deletes magically faster (a sketch of both recommendations follows below).
Note however that if #2 does magically work you probably want to consider the situation a bit more and not just blindly leave that there. A triple index in that order might make your inserts much slower. Use the query planner!
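To make those two suggestions concrete, here is a minimal sketch - the index name is my own invention, and EXPLAIN on an UPDATE assumes MySQL 5.6.3 or newer:
-- Recommendation 1: ask the planner what it is actually doing with the expiry UPDATE.
EXPLAIN UPDATE signatures SET modifiedByID = 0, modifiedByName = "Tripwire"
WHERE lifeLeft < NOW() AND lifeLength <> '0' AND type <> 'wormhole';
-- Recommendation 2: a composite index covering the WHERE clause columns (hypothetical name).
CREATE INDEX IX_signatures_type_lifeLength_lifeLeft ON signatures (type, lifeLength, lifeLeft);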
Do you have any more details of the query plan and/or what scenario(s) the performance issue occurs?
inserting sigs? editing? etc
I can assure you that the tables causing the issues (the signatures & wormholes tables) are in fact InnoDB and are experiencing table locks, which I thought was not possible because InnoDB is a row-locking engine.
Other tables in Tripwire’s DB are actually MyISAM because they are 99% reads, and I don’t have any issues with table locks on them.
I don’t mind blind suggestions (though yours isn’t completely blind), but I am already making these kinds of blind changes myself. I am really looking for someone who can take the time to deeply look into the DB and figure out what is going on.
Wow, that’s some awesome info - thanks. You actually hinted at exactly something I recently tried and thought had fixed the issue, only to have the issue come back (#2).
The issues are with the signatures & wormholes tables becoming locked. According to the DB logs, the cause seems to be two scheduled events called wormholeClean and wormholeCritical: one removes expired wormholes and their signatures, the other marks wormholes as critical. Here is a sample of the logs:
2020-07-15T17:59:08.370317Z 263 [ERROR] Event Scheduler: [root@localhost][tripwire.wormholeClean] Lock wait timeout exceeded; try restarting transaction
2020-07-15T17:59:08.370367Z 263 [Note] Event Scheduler: [root@localhost].[tripwire.wormholeClean] event execution failed.
2020-07-15T17:59:08.425392Z 422 [ERROR] Event Scheduler: [root@localhost][tripwire.wormholeCritical] Deadlock found when trying to get lock; try restarting transaction
2020-07-15T17:59:08.425421Z 422 [ERROR] Event Scheduler: [root@localhost][tripwire.wormholeCritical] Unknown error
2020-07-15T17:59:08.425427Z 422 [Note] Event Scheduler: [root@localhost].[tripwire.wormholeCritical] event execution failed.
Honestly, I’d much rather have a working Tripwire than have wormholes removed/made crit automatically.
Actually, on that topic, my corp and I quite often notice that wormholes are made crit or even deleted too soon.
Hrmm, I am mad suspicious about those update+innerjoin queries in the wormholeCritical event.
Again, not an expert in innodb and hard to say without the query planner, but according to this gentleman, you might be locking way more of the table rows than you expect: https://stackoverflow.com/questions/6664667/mysql-innodb-locks-on-joined-rows/12251122#12251122
You could try using a WITH clause; it might help hint the query planner to reduce the number of rows locked by divorcing the table scans from the update. Again, the EXPLAIN command will help you figure out whether the change has really made a difference - you should see radically different plans.
Consider this query:
UPDATE signatures s
INNER JOIN wormholes w ON (s.id = initialID OR s.id = secondaryID) AND life = 'stable'
SET modifiedByID = 0, modifiedByName = 'Tripwire', modifiedTime = NOW()
WHERE s.type = 'wormhole' AND lifeLength <> '0' AND DATE_SUB(lifeLeft, INTERVAL 4 HOUR) < NOW();
If we re-write it (I have done this by hand without testing, it may not work without tweaking) so that “wormholes w” is encapsulated in a WITH clause:
WITH w AS (
    SELECT initialID, secondaryID
    FROM wormholes
    WHERE life = 'stable'
)
UPDATE signatures s
SET s.modifiedByID = 0, s.modifiedByName = 'Tripwire', s.modifiedTime = NOW()
WHERE s.type = 'wormhole' AND s.lifeLength <> '0' AND DATE_SUB(s.lifeLeft, INTERVAL 4 HOUR) < NOW()
AND (s.id IN (SELECT initialID FROM w) OR s.id IN (SELECT secondaryID FROM w));
That might speed it up just by allowing the query planner to unlock a bunch of rows that maybe it wasn’t sure about before. However, this re-write helps show something else as well. Notice the condition used to generate the result set w. Its only criterion is that the wormhole must be “stable”. How much of your wormhole table is lifetime “stable”?
Provided you’re culling dead wormholes from the table, I’m guessing a lot of it. Which means you’re having to compare every row in the signatures table against not one but two columns of most of the wormholes you know about in order to tell if it needs to be updated.
The database engine might be smart enough to invert the query and make it a “not in” if that reduces the data more, but that’s implementation dependent and I don’t know how innoDB profiles these things. You could try inverting the condition on w and on that last AND clause to hint to the query planner that there are way more stable wormholes than not, and see if excluding rather than including is faster in the average case.
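A hedged sketch of that excluding form, written with NOT EXISTS to dodge the usual NOT IN/NULL trap. It assumes every wormhole-type signature has a matching wormholes row; if that doesn’t hold, this variant would also touch orphaned signatures, so treat it as a direction to test rather than a drop-in replacement:
UPDATE signatures s
SET s.modifiedByID = 0, s.modifiedByName = 'Tripwire', s.modifiedTime = NOW()
WHERE s.type = 'wormhole'
  AND s.lifeLength <> '0'
  AND DATE_SUB(s.lifeLeft, INTERVAL 4 HOUR) < NOW()
  -- inverted: exclude signatures attached to any non-stable wormhole
  AND NOT EXISTS (
    SELECT 1 FROM wormholes w
    WHERE (w.initialID = s.id OR w.secondaryID = s.id)
      AND w.life <> 'stable'
  );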
With the INNER JOIN strategy, I wonder if perhaps the database can’t “abort early” for rows which don’t meet some conditions of the WHERE clause, and it’s being forced to lock a tonne of rows in both tables that it needn’t, because it has to join first before it can confidently execute the WHERE clause and start excluding signatures that don’t meet the type condition.
Now, you do have indexes on both initialID and secondaryID to help with these issues. But again, I’m not certain they’re doing what you expect. It seems to me that the query planner would have to tunnel all the way down to the leaf node level for every row in the signatures table to be sure it’s not in the result, and it has to do that twice, once for the initial and once for the secondary. At the risk of sounding like a cracked record, throw it through the EXPLAIN statement and see what it’s really doing.
I also noticed the inner join updates yesterday, which I found curious as joins are not quite the norm for updates, and I will have a look into them sometime today as well.
There are two general situations we can look at:
a) known naturally long lived processes, assess if they are taking more locks than absolutely necessary
b) performance problems where the level of locking is appropriate but the performance is detected as sub-optimal
@Daimian_Mercer the two transactions you mentioned are the processes affected by locks; while they may also be the cause, they could just as well be processes affected by ‘something’ else taking up the locks they need.
The mention of deadlocks is interesting, so please try what CCP Bartender is suggesting first to decouple the query, because if we do suspect the same query is causing itself problems, then we just need to go gung-ho on performance tuning and reduce the number of locks held by those statements as much as possible.
The below will help assess whether the same SQL is causing another instance of itself to lock, or something else entirely. If you are able to catch the situation when it occurs, or replicate it as needed, you could try to analyse the root of the lock by using the MySQL equivalent of sp_who:
SHOW FULL PROCESSLIST
It shows all active transactions and their state/status - specifically whether they are waiting for locks to be granted or are the transaction holding the locks. It can also be used to check for transactions lasting longer than appropriate, irrespective of whether they hold locks or not.
Whilst unfortunately it doesn’t tell you explicitly which transaction is blocking another, if there are only a few long-lived active transactions at the time, that information can be derived, along with which transaction is the offender.
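If the InnoDB information_schema tables are available (MySQL 5.7-era; in 8.0 this moved to performance_schema.data_locks), a join across them can name the blocker and the waiter explicitly - something along these lines:
-- Hedged sketch: pair each waiting transaction with the transaction blocking it.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS lw
JOIN information_schema.INNODB_TRX b ON b.trx_id = lw.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = lw.requesting_trx_id;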
Then you could run EXPLAIN on the statement that we think is causing the locks.
As a reference, however (not sure if this has already been done), the following command shows in-depth detail about deadlocks if they aren’t currently being logged in detail:
SHOW ENGINE INNODB STATUS
Most relevant is the ‘LATEST DETECTED DEADLOCK’
If you want this continuously dumped, then the following needs to be enabled:
set global innodb_print_all_deadlocks=1;
External References:
https://www.xaprb.com/blog/2006/07/31/how-to-analyze-innodb-mysql-locks/
Change this to
lifeLeft < DATE_ADD(NOW(), INTERVAL 4 HOUR)
NOW() is evaluated once at the start of the query, and the execution planner should resolve the right-hand side to a constant. This avoids having to calculate DATE_SUB for every row in the table, and lets the query actually use the index on lifeLeft if it has one.
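Putting that together, the INNER JOIN update from earlier might end up looking something like this (an untested sketch - the point is simply that the indexed column is left bare on the left-hand side of the comparison):
UPDATE signatures s
INNER JOIN wormholes w ON (s.id = w.initialID OR s.id = w.secondaryID) AND w.life = 'stable'
SET s.modifiedByID = 0, s.modifiedByName = 'Tripwire', s.modifiedTime = NOW()
WHERE s.type = 'wormhole'
  AND s.lifeLength <> '0'
  AND s.lifeLeft < DATE_ADD(NOW(), INTERVAL 4 HOUR);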