It's been a long time since I analyzed light curves, but one thing I do remember is that some transits are easy and show up as obvious dips, while most hidden transits have to be 'teased' out through a combination of long-term observations sampled at short intervals, good observing technique, and good data acquisition. All of that information can be, and is, used to analyze the curves.
In this necessarily simplified version, most of that information is unknown to us; I would suspect we are seeing something close to raw observations. Our instruction is simply to locate and mark dips in the patterns. An amateur or professional astronomer analyzing light curves may use multiple methods, either alone, in combination, or as a sequence of steps (applying a different method of analysis after each step). Eyeballing the curve for dips is just one method, and it cannot find any but the more obvious transits unless you spend a relatively long time looking at each segment of the signal individually. That is something farmers and non-science players resist.
I too fail many of the analysis tests for the same reasons as everyone else, most commonly patterns that look like noise. It's my opinion that these control curve analyses were done using the methods mentioned above and NOT just with the MK1 eyeball and a simple folding overlay, as we must do in game. They were found in data sets that may have been longer than what was presented to us (more data to look at), and the analysis may have used frequency analysis, sensor bias corrections, etc. My point is that while these are actually confirmed transits, the test samples may come from analyses done with methods beyond what we can do in the game, and in some cases the data needed to be confident in the transit result may be missing from the portion given to us.
A simple example of this is a test result that shows two defined dips and then a third dip midway between them that looks like random noise.
Many people have complained about failed results like this (I've had many myself). There is no way a human would get this in a few moments' time. My suspicion is that when MMOS made the test result, they used a longer sample (we are only given a segment of the longer observation) with a more obvious pattern of dips, so the 'seemingly random' dip actually fits the pattern, but in our segment that transit is obscured by one of the many things that can cause that. I would urge CCP to make sure this is not the case.
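A minimal phase-fold sketch makes that example concrete. It assumes a made-up 30-day segment with two clear dips 20 days apart and a much weaker one midway between them; the helper name `phase_fold` and all the numbers are hypothetical, not what the game or MMOS actually use. Folding at the 20-day spacing of the clear dips stacks only those two and pushes the weak one out to the edge of the plot (phase ±0.5), while folding at half that spacing lines all three up at phase 0.

```python
import numpy as np

def phase_fold(time, flux, period, t0):
    """Fold a light curve on a trial period.

    Phase runs from -0.5 to 0.5, with the assumed transit centre t0 at phase 0.
    Returns phase and flux sorted by phase, ready to overlay or plot.
    """
    phase = ((time - t0) / period + 0.5) % 1.0 - 0.5
    order = np.argsort(phase)
    return phase[order], flux[order]

# Hypothetical 30-day segment sampled every 0.02 days (made-up numbers).
rng = np.random.default_rng(0)
time = np.arange(0.0, 30.0, 0.02)
flux = 1.0 + rng.normal(0.0, 0.002, time.size)

half_width = 0.11  # assumed transit half-duration in days
# Two clear 1% dips and a weak 0.3% dip midway between them.
for centre, depth in [(5.0, 0.010), (15.0, 0.003), (25.0, 0.010)]:
    flux[np.abs(time - centre) < half_width] -= depth

stacked = {}
for period in (20.0, 10.0):
    phase, folded = phase_fold(time, flux, period, t0=5.0)
    near_zero = np.abs(phase) < half_width / period
    stacked[period] = int(near_zero.sum())
    print(f"P = {period:4.1f} d: {stacked[period]} points stacked at phase 0, "
          f"mean flux there {folded[near_zero].mean():.4f}")
```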
If we are to pick out transits by eye, using only a folding technique, then the test samples should also be quality-checked under the same restrictions, without relying on other analysis methods. And if badly chosen samples are not the cause, then I would be at a loss to explain some of the failed test samples.
Light curve analysis is not a 'glance at it and get it right' activity: dips in brightness can be caused by factors other than a transit, and many known factors can obscure an otherwise obvious transit. I understand we are getting unanalyzed data that no one has looked at (that is the reason for citizen science here, after all), so the data can't be graded much beforehand. The control samples, on the other hand, can be, and it is there that I would ask this question of each test sample:
1) Was the control sample tested using the exact same tools we have in game, or was it tested by other means?
Other than the above, a couple more issues crop up (these are not criticisms of CCP) that people should keep in mind or that could be improved:
1) Before the project hit Sisi, some introductory info on transits and how the game would work might have been helpful. Links to real-world light curve articles would be very instructive.
2) The tutorial used such obvious examples that it doesn't match what's actually presented in the real samples.
Transits and Curves Explained
While terms like 'V' or 'Y' dips describe an ideal transit, they miss some important details. The shape is determined by many factors:
In general, as the planet enters and leaves transit (at the edge of the sun), the light changes gradually at the shoulders of the dip, and the dip is deepest when the planet is near the center of the sun. Ideally we are presented with many data points during this time.
Sample Rate. How close together the data points are recorded is a huge factor in the shape of the dip. Many samples during the transit give a clear result, while a single data point in the dip may be indistinguishable from noise even though it is a real transit.
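A rough sketch of how the sample rate changes what we see, assuming a simple trapezoid-shaped transit (real shoulders are curved) and made-up numbers: a 0.2-day transit 0.2% deep, with per-point noise of 0.2%. The signal-to-noise estimate uses the common rule of thumb S/N ≈ (depth / noise) × √N, where N is the number of points in transit. `trapezoid_transit` is a hypothetical helper, not a game function.

```python
import numpy as np

def trapezoid_transit(t, t_mid, depth, duration, ingress):
    """Trapezoid approximation of a transit (an assumed, simplified shape).

    Flux is 1 out of transit, ramps down linearly over `ingress` days at each
    shoulder, and sits at 1 - depth across the flat bottom.
    """
    x = np.abs(t - t_mid)
    half = duration / 2.0
    flux = np.ones_like(t, dtype=float)
    bottom = x <= half - ingress
    shoulder = (x > half - ingress) & (x < half)
    flux[bottom] -= depth
    flux[shoulder] -= depth * (half - x[shoulder]) / ingress
    return flux

# Illustrative numbers (assumptions): 0.2% deep transit, 0.2% noise per point.
depth, noise = 0.002, 0.002
results = {}
for cadence in (0.02, 0.5):  # days between samples
    t = np.arange(0.0, 10.0, cadence)
    model = trapezoid_transit(t, t_mid=5.01, depth=depth, duration=0.2, ingress=0.03)
    n_in = int((model < 1.0).sum())
    # Rule of thumb: S/N grows with the square root of the in-transit points.
    snr = depth / noise * np.sqrt(n_in)
    results[cadence] = (n_in, snr)
    print(f"cadence {cadence:4.2f} d: {n_in:2d} points in transit, S/N ~ {snr:.1f}")
```

With one point in the dip, the estimated S/N is about 1, which is exactly the "indistinguishable from noise" case.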
Relative Sizes of the Sun and Planet. If the planet covers a sizable portion of the sun's disk, there is a large drop in light, and the drop gets smaller as the planet covers less of the sun. At a small enough ratio, the light can actually get brighter in the middle of the transit due to a washing-out effect of atmosphere, dust, etc., since the sun's intensity is greatest at its center and lower at its edges.
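In the simplest approximation (a uniformly bright star, with the planet fully on the disk), the fractional drop in light is just the ratio of the two disk areas, depth ≈ (Rp/Rs)². The sketch below applies that to approximate Jupiter and Earth radii against the Sun's. It deliberately ignores the center-to-edge brightness change, so it only shows how quickly the dip shrinks with the planet's size; `transit_depth` is a hypothetical helper name.

```python
def transit_depth(planet_radius, star_radius):
    """Fractional light drop while the planet is fully on a uniformly bright disk.

    Uniform-disk approximation: depth = (Rp / Rs) ** 2; any consistent units.
    """
    return (planet_radius / star_radius) ** 2

SUN_RADIUS_KM = 696_000  # approximate
for name, radius_km in [("Jupiter-size", 69_900), ("Earth-size", 6_371)]:
    depth = transit_depth(radius_km, SUN_RADIUS_KM)
    print(f"{name} planet: depth {depth:.6f} ({depth * 100:.4f}% of the light)")
```

A Jupiter-size planet dims a Sun-like star by roughly 1%, an Earth-size one by less than 0.01%, which is well inside the point-to-point noise of many curves.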
Angle of Transit. Many transits are not the ideal one that passes through the center of the sun as seen by us. The orbit can easily be tilted so that the planet crosses only near the edge of the sun, never getting close to the center.
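The tilt is usually described by the impact parameter b: the closest approach of the planet's center to the star's center, in units of the star's radius. With k = Rp/Rs, a transit is full when b ≤ 1 - k, grazing when 1 - k < b < 1 + k (the planet never sits entirely on the disk, so the dip is shallower and more V-shaped), and absent when b ≥ 1 + k. For a given orbital speed, the duration scales with the chord length 2√((1 + k)² - b²). A minimal sketch with a made-up k = 0.1; `transit_geometry` is a hypothetical helper.

```python
import math

def transit_geometry(impact, radius_ratio):
    """Classify a transit from impact parameter b and radius ratio k = Rp / Rs.

    Returns (label, chord), where chord is the distance (in stellar radii) the
    planet's centre travels while any part of the planet overlaps the star.
    """
    b, k = impact, radius_ratio
    if b >= 1 + k:
        return "no transit", 0.0
    chord = 2 * math.sqrt((1 + k) ** 2 - b ** 2)
    label = "grazing" if b > 1 - k else "full"
    return label, chord

for b in (0.0, 0.5, 0.95, 1.2):
    label, chord = transit_geometry(b, radius_ratio=0.1)
    print(f"b = {b:4.2f}: {label:10s} chord = {chord:.2f} stellar radii")
```

The grazing case has roughly half the chord of the central one, so the same planet produces a much shorter (and shallower) dip.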
Sun Activity. Suns don't shine at a steady intensity except in the grossest sense, and many are very variable, or even violently variable, at regular and/or unpredictable times.
This activity is what causes some graphs to be wave-shaped (as explained in the sidebar section). It hides some dips, because you should not just be looking for a dip (technically there are hundreds of them in each sample, as noise) but for one that stands out as lower than the current waveform.
Example: multiple transits in a sample that goes up and down in a repeating pattern may not be obvious, because some of them may fall near the peaks of the pattern, so the lowest point of the dip is still brighter than other parts of the waveform. Such a dip can be distinguished by noticing that it occurs while the trend of the sun's output is going up, not down. Conversely, when a transit occurs at the lowest part of the waveform, it is clearly obvious.
Looking at many light curves, you can discern a repeating pattern in the sun's output that is sometimes not too obvious, and it's not always sine-shaped. The detrend tool tries to remove this variation (I believe it does this by sampling the data over the interval set by the time slider next to it).
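A minimal sketch of the idea, assuming a running median as the detrending method (I don't know what the in-game tool actually does, and `running_median_detrend` is a hypothetical name). It builds a made-up curve with a sine-shaped 1% variation and a 0.5% transit placed on one of the peaks: the lowest raw point sits in a trough of the wave, not in the transit, but after dividing by the running median the transit points are the lowest. The window (51 points, about a day here) has to be longer than the transit, or the median follows the dip and erases it, yet short compared with the star's variation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy 1.20+

def running_median_detrend(flux, window):
    """Divide flux by its running median over `window` points (use an odd number).

    Assumed stand-in for a detrend tool; the ends are padded by repeating the
    first/last value so the output has the same length as the input.
    """
    half = window // 2
    padded = np.pad(flux, half, mode="edge")
    baseline = np.median(sliding_window_view(padded, window), axis=1)
    return flux / baseline

# Made-up curve: 20 days at 0.02-day cadence, a 1% sine-shaped stellar
# variation with a 5-day period, and a 0.5% transit placed on a peak.
rng = np.random.default_rng(1)
t = np.arange(0.0, 20.0, 0.02)
flux = 1.0 + 0.01 * np.sin(2 * np.pi * t / 5.0)
in_transit = np.abs(t - 6.25) < 0.1
flux[in_transit] -= 0.005
flux += rng.normal(0.0, 0.0005, t.size)

raw_low = int(np.argmin(flux))
detrended = running_median_detrend(flux, window=51)
flat_low = int(np.argmin(detrended))
print("lowest raw point inside the transit?      ", bool(in_transit[raw_low]))
print("lowest detrended point inside the transit?", bool(in_transit[flat_low]))
```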
Sensor Errors. All this data is recorded by light detectors. They are calibrated, and many factors in their construction and design influence data quality; these can vary over time in known and sometimes unpredictable ways. Their main characteristic is sensitivity to light. This sensitivity is measured and tracked, and as it varies, a correction is applied to the data so that the current reading is on the same scale as previous readings. If this is done incorrectly, or a change in sensitivity is missed, the data will be wrong and can lead to analysis errors.
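A toy illustration, with hypothetical numbers and a hypothetical `calibrate` helper: a perfectly steady star, and a detector whose sensitivity drops by 1% for three readings. If the sensitivity change is tracked, the calibrated curve stays flat; if it is missed, the result is a clean, box-shaped 1% "dip" that has nothing to do with a planet.

```python
import numpy as np

def calibrate(raw_counts, sensitivity):
    """Hypothetical calibration: divide each raw reading by the detector's
    sensitivity at that time so all readings share one brightness scale."""
    return raw_counts / sensitivity

steady_star = np.ones(10)  # true brightness never changes
sensitivity = np.array([1.0, 1.0, 1.0, 0.99, 0.99, 0.99, 1.0, 1.0, 1.0, 1.0])
raw = steady_star * sensitivity  # what the detector records

tracked = calibrate(raw, sensitivity)  # sensitivity change applied correctly
missed = calibrate(raw, np.ones(10))   # sensitivity change missed
print("tracked:", tracked)
print("missed: ", missed)
```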
Noise is inherent in any electronics and causes all the point-to-point variation you see in the graphs. The more the signal jumps up and down from point to point, the closer the actual received signal is to the sensor's noise level. It is like listening to AM radio for weak stations, where the static is almost as loud as, or louder than, the station. If we average out the static, we will likely average out some of the signal as well (at the least, we change it from its actual values).
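The radio analogy can be put in numbers with a moving average, a simple stand-in for "averaging out the static" (the numbers and the `boxcar_smooth` helper are hypothetical). Averaging w points of independent noise cuts the scatter by about √w, but a dip shorter than the window gets spread out and made shallower at the same time, so once the window is wider than the dip, smoothing starts costing more signal than it saves.

```python
import numpy as np

def boxcar_smooth(flux, width):
    """Moving average over `width` points; 'valid' mode keeps only full windows."""
    kernel = np.ones(width) / width
    return np.convolve(flux, kernel, mode="valid")

rng = np.random.default_rng(2)
n = 2000
static = 1.0 + rng.normal(0.0, 0.002, n)  # noise only, 0.2% per point
signal = np.ones(n)
signal[1000:1003] -= 0.004                # a short 3-point, 0.4% dip, no noise

results = {}
for width in (1, 3, 9, 25):
    noise = boxcar_smooth(static, width).std()
    depth = 1.0 - boxcar_smooth(signal, width).min()
    results[width] = (noise, depth)
    print(f"width {width:2d}: noise {noise:.5f}  dip depth {depth:.5f}  "
          f"ratio {depth / noise:.1f}")
```

The ratio is best when the window roughly matches the dip length (3 points here) and falls off for wider windows.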
If you're still reading this, then you are stronger than most, lol.
The point of this Mountain of Text is to help everyone who is not familiar with where these curves come from understand some (and this is just a small list) of the things that go into collecting the data, why it can be contradictory in some cases, and how impossible it is to provide simple, known data when you are asking people to classify inherently unknown data samples that no one has seen before. After all, if they had looked at the data beforehand to make sure they gave you good samples, there would be no point in giving them to you, since doing so would mean analyzing it themselves.
What's the Point of All This?
Consensus. It doesn't matter if a few of us get the results wrong, or even if some of the light curve data set is corrupt. Overall, and over time as more results come in, a majority of people will agree that certain curves have transits and others don't. These results will be used to pick the curves for more intensive analysis; basically, that is the goal here. What complicates all this (even though it's a separate thing) is that most of us don't do anything unless we get paid for it. (Yes, you are all bad people.)
To be honest, I would play Project Discovery for no rewards at all. But perks are great too. Personally, since consensus is the actual goal here, I think we should be rewarded based on that.
Alternative reward plan: after 6 months (or whatever), they run the consensus results, compare them to see who agreed most with the consensus and who obviously just marked transits randomly, and then give the top group very cool, meaningful rewards. Then they start a new data set (or mix old data, with no consensus attached, with new data). That way we get rewards to be proud of and avoid the Give Me Free Stuff NOW crowd.
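A minimal sketch of how such a scheme could work, assuming each vote is just "transit" or "no transit" per curve (the real classifications are richer, and this is not how CCP or MMOS actually compute consensus). A curve's consensus is its strict-majority answer once it has enough votes; each player's score is the fraction of their answers on decided curves that match it. All names and data are hypothetical.

```python
from collections import Counter, defaultdict

def consensus(votes, min_votes=3):
    """Strict-majority answer per curve; curves with too few votes or a tie are left out.

    votes: iterable of (player, curve_id, saw_transit) tuples.
    """
    by_curve = defaultdict(list)
    for _player, curve, saw_transit in votes:
        by_curve[curve].append(saw_transit)
    result = {}
    for curve, answers in by_curve.items():
        if len(answers) < min_votes:
            continue
        label, count = Counter(answers).most_common(1)[0]
        if 2 * count > len(answers):
            result[curve] = label
    return result

def agreement(votes, agreed):
    """Fraction of each player's answers on decided curves that match the consensus."""
    hits, total = defaultdict(int), defaultdict(int)
    for player, curve, saw_transit in votes:
        if curve in agreed:
            total[player] += 1
            hits[player] += int(saw_transit == agreed[curve])
    return {player: hits[player] / total[player] for player in total}

votes = [
    ("alice", "c1", True), ("bob", "c1", True), ("eve", "c1", False),
    ("alice", "c2", False), ("bob", "c2", False), ("eve", "c2", True),
    ("alice", "c3", True), ("eve", "c3", True),  # only two votes: undecided
]
agreed = consensus(votes)
print(agreed)
print(agreement(votes, agreed))
```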
(Like, how did I see TWO contracts in Jita for every single new PD skin reward offered BEFORE the first day of the update was over?)
If it's designed right, it will be accurate, with all the 'random clicks' and 'Pay Me! Pay Me Now!' analyses discarded. From the forums, I see that a lot of players don't seem to care about anything but the quickest, effortless way to billions in their wallet, and they say they won't play Project Discovery. Good; it might make the results better.
Overall, I think this is a very good effort by MMOS and CCP. It's my feeling that the control samples are the issue here.