Devblog: Exoplanets: The Next Phase Of Project Discovery

It's been a long time since I analyzed light curves, but one thing I do remember is that some transits are easy, found by obvious dips, while most hidden transits can be 'teased' out through combinations of long-term observations made at short intervals, good observation technique, and good data acquisition. All of that information can be, and is, used to analyze the curves.

In this necessarily simplified version most of that information is unknown to us; I suspect we are given near-raw observations, and our instruction is to locate and mark dips in the patterns. An amateur or professional astronomer analyzing light curves may use multiple methods, either alone, in combination, or in a sequence of steps (applying different methods of analysis after each step). Eyeballing the curve for dips is just one method, and it cannot find anything but the more obvious transits unless you spend a relatively long time looking individually at each segment of signal along the curve - something farmers and non-science players resist doing.

I too fail many of the analysis tests due to the same factors as others, the most common being patterns that look like noise. It's my opinion that these control-curve analyses were done using the methods mentioned above and NOT just the Mk1 eyeball and a simple folding overlay, as we must do in game. They were likely found in data sets longer than what is presented to us (giving more data to look at), and may have used frequency analysis, had sensor bias corrections applied, and so on. My point being that while they really are confirmed transits, the test samples may come from analyses arrived at by methods beyond our abilities in the game, and in some cases the data needed to be confident in the transit result may be missing from the portions given to us.
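
For anyone wondering what a 'folding overlay' actually buys you, here is a minimal phase-folding sketch in Python. The period, transit depth, and sampling are invented for illustration; this shows the general technique, not the in-game tool or the MMOS pipeline.

```python
import numpy as np

# Minimal phase-folding sketch (illustrative only, not the in-game tool):
# a transit repeating every `period` days lines up when the time axis is
# folded modulo that period, so many faint dips stack into one deeper dip.

rng = np.random.default_rng(42)
t = np.arange(0.0, 80.0, 0.02)                 # 80 days sampled every ~29 min
flux = 1.0 + rng.normal(0.0, 0.002, t.size)    # flat star plus sensor noise

period, duration, depth = 5.0, 0.2, 0.003      # invented transit parameters
flux[(t % period) < duration] -= depth         # crude box-shaped transits

phase = (t % period) / period                  # fold onto a single orbit
bins = np.linspace(0.0, 1.0, 51)               # 50 phase bins
idx = np.clip(np.digitize(phase, bins) - 1, 0, 49)
binned = [flux[idx == i].mean() for i in range(50)]
print("deepest folded bin:", min(binned))      # noticeably below 1.0
```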

A simple example of this is a test result that shows two well-defined dips and then a third dip, midway between them, that looks like random noise.

Many people have complained of failed results like this (I've seen many myself). There is no way a human would get this in a few moments' time. My suspicion is that when MMOS created the test result they used a longer sample (we are given only a segment of the longer observation) in which there is a more obvious pattern of dips; the 'seemingly random portion' actually fits that data, but the transit is obscured for any of the many reasons that cause such things. I would urge CCP to make sure this is not the case.

If we are to pick the transits out by eye and a fold technique only, then the test samples provided should be quality-checked under the same restrictions, without relying on other analysis methods. And if ill-chosen samples are not the explanation, then I would be at a loss to explain some of the failed test samples.

Light curve analysis is not a 'glance at it and get it right' activity: light dips can be caused by factors other than a transit, and many known factors can obscure an otherwise obvious transit. I understand we are getting un-analyzed data not looked at by anyone (this is the reason for citizen science here, after all), and that data can't be graded much beforehand. The control samples, on the other hand, can be - and it is there that I would ask this question of each test sample:

  1. Was the control sample tested using the exact same tools we have in game, or was it tested by other means?

Beyond the above issue, a couple more crop up (these are not criticisms of CCP) that people should keep in mind or strive to improve.

  1. Before the project hit Sisi, some introductory info on transits and how the game would work might have been helpful. Links to real-world light curve articles would be very instructive.

  2. The tutorial used such obvious examples that it doesn't match what is actually presented in the real samples.

Transits, Curves Explained

While using terms like V or Y dips is descriptive of an ideal transit, it misses the mark somewhat on important details. The shape is determined by many factors:

In general, as the planet enters and leaves transit (at the edge of the sun's disk), the light varies slowly near the shoulders of the dip, then deepens as the planet nears the middle of the dip (the center of the sun's disk). Ideally we are presented with many data points during this time.

  • The sample rate - how closely together the data is recorded - is a huge factor in the shape of the dip. Many samples give a clear result, while a single data point may produce a dip indistinguishable from noise even though it is a real transit.

  • The relative sizes of the sun and planet. If the planet covers a sizable portion of the sun's disk there is a large light drop, and the drop gets smaller as the planet covers less of the disk. At a small enough ratio the light can actually get brighter in the middle of the transit, due to a washing-out effect from atmosphere, dust, etc., since the sun's intensity is at its maximum in the center and lower at the edges.

  • Angle of transit. Many transits are not the ideal one that passes through the center of the sun as seen by us; the orbit can easily be tilted so that the object passes only along the edge of the sun's disk, never getting near the center.

  • Sun activity. Suns just don't shine at a steady intensity except in the grossest sense, and many are very variable - even violently variable - at regular and/or unpredictable times.

This activity is what causes some graphs to be wave-shaped (as explained in the sidebar section). It hides some dips, because you should not just be looking for a dip (technically there are hundreds of them in each sample as noise) but for one that stands out as lower than the current waveform.

Example: multiple transits in a sample that goes up and down in a repeating pattern may not be obvious, because some of them land near the pattern's peak - so the lowest part of the dip is actually brighter than other parts of the waveform. You can still pick it out by noticing that the dip occurs while the trend of the sun's output is going up, not down. Conversely, when a dip occurs at the lowest part of the waveform it is clearly obvious.

Looking at many light curves, you can discern that there is a repeating pattern to the sun's output that is sometimes not very obvious, and it is not always a sine wave in shape. The detrend tool tries to remove this variation (I believe it samples the data over the interval set in the time slider next to it); I've put a rough sketch of the idea after this list.

  • Sensor errors. All of this data is recorded by light detectors. They are calibrated, and many factors in their construction and design influence data quality; these can vary over time in known and sometimes unpredictable ways. Their sensitivity to light is the main characteristic - it is measured and tracked, and as it varies a correction is applied to the data so that the current reading is on the same scale as previous readings. If this is done incorrectly, or a change in sensitivity is missed, the data becomes incorrect and leads to analysis errors.

  • Noise. Noise is inherent in any electronics and is the cause of all the point-by-point variation you see in the graphs. The more a signal jumps up and down from point to point, the closer the received signal is to the sensor's noise floor. It is like listening to AM radio for weak signals, where the static is almost as loud as, or louder than, the station. If we averaged out the static we would likely average out the signal as well (at the very least we would change it from its actual values).

  • More reasons
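
As an aside, here is a rough sketch of what a detrend step could be doing under the hood. The running-median approach is my assumption for illustration, not the actual in-game or MMOS implementation.

```python
import numpy as np

# Rough detrending sketch (my assumption, not the in-game tool): estimate the
# slow stellar variation with a running median over a chosen window (analogous
# to the time-slider interval) and divide it out, so that short dips stand out
# against a flat baseline.

def detrend(flux, window):
    """Divide out a running-median trend; `window` is an odd number of samples."""
    half = window // 2
    padded = np.pad(flux, half, mode="edge")
    trend = np.array([np.median(padded[i:i + window]) for i in range(flux.size)])
    return flux / trend

# Synthetic example: a slow sine-wave "stellar activity" plus one small dip.
t = np.linspace(0.0, 10.0, 1000)
flux = 1.0 + 0.05 * np.sin(2 * np.pi * t / 7.0)   # slow trend
flux[480:500] -= 0.01                             # transit-like dip
flat = detrend(flux, window=101)
print("dip depth after detrending:", 1.0 - flat[480:500].min())
```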


If you're still reading this, then you are stronger than most, lol.

The point of this mountain of text is to help everyone who is not familiar with where these curves come from understand some of the things (and this is just a small list) that go into collecting the data, why it can be contradictory in places, and how impossible it is to provide simple, known data when you are asking people to classify inherently unknown samples that no one has seen before. After all, if they had looked at it beforehand - to make sure to give you good samples - there would be no point in giving it to you, since they would already have analyzed it themselves.

What's the Point of All This?

Consensus. It doesn't matter if a few of us get the results wrong, or even if some of the light curve data set is corrupt. Overall, and over time as more results come in, a majority of people will agree that certain curves have transits and others don't. Those results will be used to pick the curves that get more intense analysis; basically, that is the goal here. What complicates all of this (even though it's a separate issue) is that most of us don't do anything unless we get paid for it. (Yes, you are all bad people.)
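
To make the consensus idea concrete, here is a minimal sketch of how a tally across many players' classifications might work. The data layout and the two-thirds agreement threshold are my own assumptions; CCP and MMOS have not described their method.

```python
from collections import Counter

# Hypothetical consensus tally: each player reports whether they marked a
# transit in a given sample; the sample is promoted for deeper analysis once
# enough independent reports agree. Threshold and layout are assumptions.

reports = {
    "sample_001": ["transit", "transit", "no_transit", "transit"],
    "sample_002": ["no_transit", "no_transit", "transit", "no_transit"],
}

THRESHOLD = 2 / 3  # assumed agreement level

for sample, votes in reports.items():
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    flagged = label == "transit" and agreement >= THRESHOLD
    print(f"{sample}: {label} ({agreement:.0%} agreement)"
          f"{' -> send to deeper analysis' if flagged else ''}")
```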

I would play Project Discovery for no rewards at all, to be honest. But perks are great too. Personally, since consensus is the actual goal here, we should be rewarded on that.

Alternative reward plan: after six months (or whatever), run the consensus results, see who agreed most with the consensus and who obviously just marked transits randomly, then reward the former with very cool, meaningful items. Then start a new data set (or mix old data - with no consensus attached - with new data). That way we get rewards to be proud of and avoid the Give Me Free Stuff NOW crowd.

(Like, how did I see TWO contracts in Jita for every single new PD skin reward offered before the first day of the update was even over?)

If it's designed right, it will be accurate, with all the 'random clicks' and the 'Pay Me! Pay Me Now!' analyses discarded. From the forums I see that a lot of players don't seem to care about anything but the quickest way to billions in their wallet with no effort at all, and say they won't play Project Discovery - good, it might make the results better.

Overall I think this is a very good effort by MMOS and CCP. It's my feeling that the control samples are the issue here.


Crossposting Scott Manley's feedback from YouTube:

After a failed analysis, we should be able to continue analyzing the data to see what we did wrong before clicking Finish.

I spotted:

It’s possible to get stuck in tutorial 2. Mark the first transit, then mark a nonadjacent transit, then click the fold button. I can’t find any way to zoom out and set the correct period once the folding tutorial is activated.

What I do not understand is: why do you need people to look at these samples at all? What is there here that an algorithm can't do a thousand times faster and more efficiently?


When I click Discard from fold view, the zoom level should return to whatever it was before I clicked Fold.

Edit: Other approaches would work too, like double-clicking to toggle between 100% and 20%. When looking at the non-obvious data sets I want to stay zoomed in the whole time to spot subtle transits, but I have to precisely mouse over and drag the window edges back. I can use the scroll wheel, but I have to scroll ten times to go from 100% to 20%, which is even less practical than the drag handles.

Not sure if this has been brought up.

I'm sure I speak for most people when I ask for a bigger window, or the possibility to make it bigger than the current maximum. Even with the zoom feature you are adding (hoping this comes fast), our brains would have an easier time detecting patterns if the samples were bigger. I'm already working on a 27-inch monitor, and I feel we are only using half the screen. Accuracy is what we're after, so why not give us better working conditions?

It would be nice to have a second line that counts hours, not just days, as well as a markup tool that shows in hours how long the current interval is. I feel the current layout is very uninformative. This is something we use a lot in music production to help with repetitive parts.

Adding arrow keys to push the zoom area back and forth would be a nice addition. :slight_smile:


Again, in sample 200194218 only every second dip is accepted as correct. There is no visual difference between the dips. CCP … FIX YOUR STUFF!

Here are some of the failed control curves that I had.

Bang on! Full screen AND zoom would make our work much easier.

And while I'm here… it would be helpful to be able to analyze our errors, or get more detailed feedback about what we've missed. I'm not sure if the current feedback was made vague in order to keep treasure hunters from gaming the system, but for my part I find it immensely frustrating to miss a subtle transit and be none the wiser for the absence of constructive, instructive feedback.


Reached the “intermediate analyst” rating at level 25 … lol. The only way to keep accuracy reasonably high is to learn the control samples: skip if you encounter a wrong/impossible one, or give the “right” answer if you remember it.

Because most samples with transits are control samples, if you spot something like an obscure transit you had better skip it unless you know the “right” solution.

It was a nice ride to get here, but the rewards are not worth the further grind.

I feel the interest around this is huge, and the potential is great. But for some reason it seems that our eagerness to help out is being somewhat limited. Whether this was chosen for reasons of scientific method, or for some obscure reason to keep it mystified and game-like, I don't know. Is CCP worried that if we get better tools, we will exploit the feature?

EDIT:
A temporary solution to increase the window size for PD is to increase the UI scaling; 125% is what I found works best. :slight_smile:

I have been toying with Project Discovery a bit, and I have to say that while it is fun, it sometimes gives me weird (unjustified?) results.

Take this one for example:
[image]

It says that I failed, when my markers are right on the spot.

Or in this one:
[image]
It marks the 'correct' answers where there are no sharp changes or patterns.

So I am a little curious about the validity of the results we are compared against.
Some explanation of those results would be welcome, to avoid further frustration and ensure the success of Project Discovery.

Thanks.


Maybe it's just me, but this whole thing confused the hell out of me even after the tutorial.


Literally exactly this. I was going to make this post; thanks for making it for me.

As a super-new player I was enjoying the cell-staining thing. While this looks cool, I failed most of the tutorial samples. I was really enjoying the cell project - is there going to be a way to bounce between the two? I'd like to do both, FOR SCIENCE! But picking dips out of a wave of dips makes my brain hurt trying to figure it out. It's new; I'm sure I'll get the hang of it. But ouch, new brain cell connections at 1 a.m. after working all evening…

I submitted this earlier. I have no idea what's wrong with my attempts. It is very, very confusing.

The variation looks too big to be a planet.

These are impossible to spot. I've failed the last 12 evaluation samples and dropped 10% accuracy because I keep getting this garbage that's impossible to spot. I'm fast losing the will to bother with Discovery.

You used detrend, which reshaped the solar activity this way. If you had not detrended, you'd see it's actually not a drop.

I wish there was a button to divide the orbital period by an integer and see if the pattern is still there in the folded view. It would help a lot with samples like
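
For what it's worth, here is a rough sketch of what such a period-division check could look like, assuming the marked period is available as a number. The function names and synthetic data are invented for illustration; this is not how the in-game fold tool works.

```python
import numpy as np

# Hypothetical sketch of the "divide the period by an integer" idea above:
# fold the curve at P, P/2, P/3, ... and report how deep the stacked dip is
# for each divisor. Divisors where the dip stays deep are candidate true periods.

def folded_dip_depth(t, flux, period, n_bins=50):
    """Fold at `period` and return the depth of the lowest phase bin vs. the median."""
    phase = (t % period) / period
    idx = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=flux, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore"):
        means = sums / counts                     # empty bins become NaN
    return np.nanmedian(means) - np.nanmin(means)

def check_divisors(t, flux, marked_period, max_divisor=4):
    """Depth of the folded dip at P/1, P/2, ... P/max_divisor."""
    return {d: folded_dip_depth(t, flux, marked_period / d)
            for d in range(1, max_divisor + 1)}

# Synthetic demo: the true period (2.5 d) is half the marked one (5 d), so the
# dip should stay deep at divisors 1 and 2 and wash out at divisor 3.
rng = np.random.default_rng(7)
t = np.arange(0.0, 60.0, 0.02)
flux = 1.0 + rng.normal(0.0, 0.002, t.size)
flux[(t % 2.5) < 0.1] -= 0.004
print(check_divisors(t, flux, marked_period=5.0))
```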

The sample shows us that one star is a bit bigger than the other, which shows up as a greater dip amplitude (1.5-2 times). It would be correct to mark them with different epoch values and identical periods. For the most part, binary stars don't have planets.