A few questions for the Project Discovery scientists re: gating parameters

As the subject reveals, I have a few questions for the scientists who are working with our flow cytometry data, now that I’ve been chewing through slides for a week or so. My questions are about how to produce the most usable data, rather than how to score high for rewards, since the two may not always be the same (e.g. always matching the parameters of a scoring template would never produce novel results).

  1. Some capsuleers have discovered that you can score 99% on a gold standard test slide if you just draw huge boxes that cover the entire slide including empty areas, as long as those boxes encapsulate the two clusters. This is sufficient to match the scoring algorithm, but does this satisfy your intent, or is it preferable to keep the gates as tight to the clusters as possible?

  2. What should we do with scattered cells, outliers, and cells that are uniformly distributed between clusters? Do you want every cell in a slide gated, or only those that seem to legitimately belong to a cluster? Some cells are in the vicinity of a cluster, but distant and diffuse enough that they don’t seem part of it: should they be included in the gate? Often two clusters will have a diffuse halo of satellites that blend into each other: should we restrict the gates to what seems to be the un-shared boundary of each cluster, or just divide the diffuse smear of cells evenly between the two? Some slides have obvious clusters that sit amidst a uniform distribution of background cells that don’t seem to really be part of any cluster. Should those cells be included inside a gate anyway, or should we just circle the clear clusters? The underlying question behind most of this is: are you more interested in sensitivity or specificity?

  3. There are a lot of different patterns in these slides, but all of the Gold Standard test slides seem to be variants of the same thing: one dense cluster at the top of the screen, and a smaller diffuse cluster at the bottom of the screen. Given that the gold standards aren’t just a test for accuracy, but a way to provide learning feedback, why is every gold standard slide basically the same?

  4. Speaking of Gold Standard slides… some of us have come across gold standard slides that are clearly, unambiguously wrong. You’ll have a dense cluster at top and a smaller, more diffuse cluster at bottom with a wide gap of completely empty space in between the two; the boundaries are unambiguous. But during the scoring stage you’ll find the gold standard gate encompasses the top cluster, reaches all the way across the large empty void, and snags half of the cells from the bottom cluster, basically cutting the bottom cluster in half. Then a second gate captures the lower half of the bottom cluster. The gold standard is clearly wrong in these examples, by any definition of the term “cluster”. What is up with that?

  5. I know clusters aren’t supposed to overlap, by definition. However, I’ve encountered examples of two stretched-out clusters that seem to intersect and pass through each other at right angles, like comet tails forming a cross. What is the most useful way to gate this?

I know how to get a high accuracy score (draw huge boxes that cover the entire field and include every single cell, including wide empty regions). But I’d like any additional general feedback you can provide on gating parameters that will be most useful for your research, in particular on the issue of sensitivity vs. specificity and whether to be as inclusive as possible of all cells into a gate, or to be more exclusionary of outliers and diffusely spread cells.
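
To make the sensitivity vs. specificity trade-off concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the slide is synthetic (one Gaussian cluster plus uniform background on a 100 x 100 field), the "true" labels are known only because the data is simulated, and the names `make_slide`, `in_gate`, and `sensitivity_specificity` are hypothetical. It is not the Project Discovery scoring algorithm, which is not described anywhere in this thread.

```python
import random


def in_gate(cell, gate):
    """Return True if the (x, y) cell lies inside a box gate (xmin, ymin, xmax, ymax)."""
    x, y = cell
    xmin, ymin, xmax, ymax = gate
    return xmin <= x <= xmax and ymin <= y <= ymax


def sensitivity_specificity(cells, labels, gate):
    """Score one gate against known labels (True = cell really belongs to the cluster)."""
    tp = fn = tn = fp = 0
    for cell, is_member in zip(cells, labels):
        gated = in_gate(cell, gate)
        if is_member and gated:
            tp += 1  # cluster cell captured by the gate
        elif is_member:
            fn += 1  # cluster cell missed by the gate
        elif gated:
            fp += 1  # background cell wrongly captured
        else:
            tn += 1  # background cell correctly left out
    return tp / (tp + fn), tn / (tn + fp)


def make_slide(seed=0, n_cluster=500, n_background=500):
    """Build a synthetic slide: one tight cluster plus uniform background (assumed shapes)."""
    rng = random.Random(seed)
    cluster = [(rng.gauss(50, 3), rng.gauss(80, 3)) for _ in range(n_cluster)]
    background = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_background)]
    return cluster + background, [True] * n_cluster + [False] * n_background


cells, labels = make_slide()

tight_gate = (41, 71, 59, 89)  # roughly 3 standard deviations around the cluster centre
huge_gate = (0, 0, 100, 100)   # the "big box" that covers the whole field

tight_sens, tight_spec = sensitivity_specificity(cells, labels, tight_gate)
huge_sens, huge_spec = sensitivity_specificity(cells, labels, huge_gate)

print(f"tight gate: sensitivity={tight_sens:.3f} specificity={tight_spec:.3f}")
print(f"huge gate:  sensitivity={huge_sens:.3f} specificity={huge_spec:.3f}")
```

In this toy setup the huge box catches every cluster cell but also every background cell, so its specificity collapses to zero, while the tight gate gives up a few cluster cells at the edges in exchange for rejecting most of the background. Which of those is more useful for the research is exactly what I am asking.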


Good questions!
I am tired of taking time and logically placing boundaries where the evidence points, only to be told that the standard indicates something significantly different (and sometimes wrong). Perhaps I should join the big box crowd and raise my percentages at the mere cost of producing nearly useless data. I received my Marshal from the last PD, but this one frustrates me with its Rorschach test-like mechanics.

More science, less impressions would be fine with me.


I’d like to agree with the comments above. I’d like some better worked examples of non-binary results, in order to provide a best match.


I eventually stopped doing PD because it doesn’t feel like it’s being supported. None of the questions people posted have received any answers that I’m aware of.

Is there a benefit to having a 99.9% score?

My wife loves Project Discovery and was searching for answers to the same questions you have. She discovered this podcast transcript from the scientists and collaborators who are working on the COVID-19 Project Discovery data. (I have provided the podcast transcript link below.)

In this transcript you can see that there really isn’t a specific set of rules for data interpretation for us data analyzers. In fact, they want us to use our best judgment to find a pattern, or something else entirely, that they haven’t seen before, and also to line up our findings with the most common interpretations so that some results come out of all this data.

I guess with that said, use your best judgment and do not get too hung up on what “they” are looking for. Even “they” do not know what they are looking for yet.
