Difference: Real-BogusClassifications (1 vs. 15)

Revision 15 2020-10-23 - AshishMahabal

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"
Added:
>
>
This page is mostly about the RF work led by Umaa Rebbapragada (included in https://arxiv.org/abs/1902.01936).

The currently deployed model is braai, which uses deep learning and is described in https://arxiv.org/abs/1907.11259; the associated GitHub repository includes notebooks.

 

Overview

The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.

Line: 47 to 51
 
    • Classifier error analysis: What do low RB reals look like, what do high RB bogus look like?
    • Report feature importance by correlated feature groups
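The classifier error analysis item above can be sketched as follows. This is only an illustration, not the pipeline's code: it assumes each vetted example is a dict with hypothetical keys `candid`, `label` ("real" or "bogus"), and `rb` (the score in [0, 1]), and the 0.5 threshold is likewise an assumption.

```python
from typing import Dict, List, Tuple

Example = Dict[str, object]


def error_analysis(
    examples: List[Example], threshold: float = 0.5
) -> Tuple[List[Example], List[Example]]:
    """Return (low-RB reals, high-RB boguses) for manual inspection.

    The field names and the threshold are assumptions for this sketch.
    """
    low_rb_reals = [
        e for e in examples if e["label"] == "real" and e["rb"] < threshold
    ]
    high_rb_bogus = [
        e for e in examples if e["label"] == "bogus" and e["rb"] >= threshold
    ]
    # Put the most confident mistakes first in each list.
    low_rb_reals.sort(key=lambda e: e["rb"])
    high_rb_bogus.sort(key=lambda e: e["rb"], reverse=True)
    return low_rb_reals, high_rb_bogus
```

Sorting by score puts the worst misclassifications at the top, which is usually where postage stamps are worth looking at first.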

Data Collection

Changed:
<
<
  • Automated Data Contamination [ Charlotte ] * Find contamination using clustering.
  • Active Learning to Improve Training Data Selection [ Sara ] * Use active learning to discover potential batches of boguses (and reals, alternatively)
>
>
  • Automated Data Contamination [ Charlotte ]
    • Find contamination using clustering.
  • Active Learning to Improve Training Data Selection [ Sara ]
    • Use active learning to discover potential batches of boguses (and reals, alternatively)
 
  • Other Data Collection Sources (preferably automated)
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
    • Automate cross matches from relevant catalogs (e.g., TNS)
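A minimal sketch of the "find variables" idea above, assuming alerts are available as dicts with an `objectId` and a boolean `isdiffpos` (the real alert packets may encode this flag differently). The function name and the default `n_obs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_variable_candidates(
    alerts: Iterable[dict], n_obs: int = 3
) -> Dict[str, List[str]]:
    """Flag objects labeled bogus that behave like variable sources.

    An object is flagged if it has at least n_obs alert packets, or if its
    alerts include both positive and negative subtractions.
    """
    signs_by_object = defaultdict(list)
    for alert in alerts:
        signs_by_object[alert["objectId"]].append(bool(alert["isdiffpos"]))

    flagged = {}
    for object_id, signs in signs_by_object.items():
        reasons = []
        if len(signs) >= n_obs:
            reasons.append("n_obs")
        if any(signs) and not all(signs):
            reasons.append("mixed_sign")
        if reasons:
            flagged[object_id] = reasons
    return flagged
```

Flagged objects would still need vetting before being relabeled; the sketch only narrows down where to look.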

Revision 14 2019-02-20 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 36 to 36
 
  • Flag boguses with light curve observations greater than 2.
  • Catch variable stars with isdiffpos=True and false in the light curve
  • Implement proper grid search over RF parameter space
Added:
>
>
  • finish the work with the alternate test set, and getting stats on that.
  • make sure imputation is happening correctly
  • get Tomas's other plots integrated
  • KL divergence code, should be updated.
 
  • Pipeline Analysis to automate:
    • Score improvement on known false positive, negatives (Ragnhild's List)
    • KL divergence between training set, test set features (to find major divergence)
    • Plot feature distributions on reals vs. boguses (Tiara's code)
Changed:
<
<
    • What do low RB reals look like, what do high RB bogus look like?
    • Unlabeled Data
    • Score bias per features
>
>
    • Classifier error analysis: What do low RB reals look like, what do high RB bogus look like?
 
    • Report feature importance by correlated feature groups
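The KL divergence check between training-set and test-set features could look roughly like the sketch below. It is not the existing KL divergence code mentioned above; it assumes each feature is a plain list of numbers, uses shared equal-width histogram bins, and adds a small smoothing constant `eps` so empty bins do not produce infinities. All names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _histogram(values: Sequence[float], edges: List[float]) -> List[int]:
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1] or (i == last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(
    train: Sequence[float], test: Sequence[float], n_bins: int = 10,
    eps: float = 1e-9,
) -> float:
    """Approximate KL(train || test) for one feature via smoothed histograms."""
    lo = min(min(train), min(test))
    hi = max(max(train), max(test))
    if hi == lo:
        return 0.0  # both samples are the same constant
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    p = _histogram(train, edges)
    q = _histogram(test, edges)
    p_total = sum(p) + eps * n_bins
    q_total = sum(q) + eps * n_bins
    kl = 0.0
    for pc, qc in zip(p, q):
        pi = (pc + eps) / p_total
        qi = (qc + eps) / q_total
        kl += pi * math.log(pi / qi)
    return kl


def rank_features_by_divergence(
    train: Dict[str, Sequence[float]], test: Dict[str, Sequence[float]],
    n_bins: int = 10,
) -> List[Tuple[str, float]]:
    """Sort shared features from most to least divergent."""
    scores = {
        name: kl_divergence(train[name], test[name], n_bins)
        for name in train if name in test
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Features at the top of the ranking are the ones whose training and test distributions differ most and are worth plotting first.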

Data Collection

  • Automated Data Contamination [ Charlotte ]

Revision 13 2019-02-19 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 25 to 25
 
t7_f4_c3 08 May 2018
t8_f5_c3 29 May 2018
t12_f5_c3 07 Aug 2018
Added:
>
>
t15_f5_c3 10 Jan 2019
 

To Do List

Deleted:
<
<

Issues

  • Why are candids from IPAC DB not found in alerts db?
  • Why is cross validation performance decreasing steadily since t8
 

Pipeline Development

Changed:
<
<
  • Separate Galactic vs. Extra-Galactic
  • Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
  • Get an override label module (to override the majority vote)
>
>
  • Get light curve observations from reals into training vetted
 
  • Add cross matches from catalogs as a source (Nadia's from TNS)
  • Flag boguses with light curve observations greater than 2.
  • Catch variable stars with isdiffpos=True and false in the light curve
Deleted:
<
<
  • Kill very old objects (from before Feb 5, 2018)
 
  • Implement proper grid search over RF parameter space
Changed:
<
<
  • Get the testing framework into a Jupiter notebook
  • Get features from Kowalski vs. IPAC? Do same query against Kowalski and see if we recoup candidateIds
  • Pipeline Analysis:
    • Labeled Data Test Set * RB score improvements
>
>
  • Pipeline Analysis to automate:
    • Score improvement on known false positive, negatives (Ragnhild's List)
    • KL divergence between training set, test set features (to find major divergence)
    • Plot feature distributions on reals vs. boguses (Tiara's code)
Changed:
<
<
* what do low RB reals look like, what do high RB bogus look like?
>
>
    • What do low RB reals look like, what do high RB bogus look like?
    • Unlabeled Data
    • Score bias per features
    • Report feature importance by correlated feature groups
Line: 62 to 50
 
  • Active Learning to Improve Training Data Selection [ Sara ] * Use active learning to discover potential batches of boguses (and reals, alternatively)
  • Other Data Collection Sources (preferably automated)
Changed:
<
<
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
>
>
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
 
    • Automate cross matches from relevant catalogs (e.g., TNS)
  • Improving Quality of GROWTH Marshall Feed
    • ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved. Is my list spectroscopically-confirmed? Email to Ashot and Mani
Line: 71 to 59
 

Open Issues / Experiments

  • Kowalski / IPAC candidate discrepancies
Added:
>
>
  • Separate Galactic vs. Extra-Galactic classifiers
 
  • Pixel Clump issues on certain x,y positions (see Ashish's email to Umaa on 6/28/18)
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
  • Deep Learning
  • Known boguses: ZTF18aabtvch, ZTF18aaiafnn, ZTF18aaizvmy
Deleted:
<
<

Papers:

  • ML Overview (response to Pub Board comments)
  • RB Paper
 

DONE

  • 2018-08 Correlation between nbad and boguses Report Here
Changed:
<
<
>
>
  • 2018-10 combine_labels.py has an override switch, to override the majority vote for examples that have been revetted
  • 2018-11 Filter out old sources (from before Feb 5, 2018)
  • 2018-11 Get features from Kowalski vs. IPAC? Kowalski packets are NOT a superset of IPAC db feats! An experiment limited to the intersection of Kowalski and IPAC db feats showed classifier performance decreased! But found workaround for DB performance issues (Frank gave me a way to get nid, rcid from a candid)
  • 2018-12 Cross validation performance decreasing steadily since t8 due to persistent contamination within the GROWTH marshall feed
  • 2019-01 Get the testing framework into a Jupyter notebook
  -- UmaaRebbapragada - 08 Aug 2018
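The override switch noted in the 2018-10 item above can be illustrated with a short sketch. This is not the contents of combine_labels.py; it assumes votes arrive as a mapping from candid to a list of labels and revetted examples as a mapping from candid to a single label. The function signature and the tie handling are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def combine_labels(
    votes: Dict[int, List[str]],
    revetted: Optional[Dict[int, str]] = None,
    use_override: bool = True,
) -> Dict[int, Optional[str]]:
    """Combine per-candid votes by majority, optionally overridden."""
    revetted = revetted or {}
    combined: Dict[int, Optional[str]] = {}
    for candid, labels in votes.items():
        # A revetted label wins over the vote when the switch is on.
        if use_override and candid in revetted:
            combined[candid] = revetted[candid]
            continue
        if not labels:
            combined[candid] = None
            continue
        counts = Counter(labels).most_common()
        # Leave ties unresolved (None) rather than guessing a label.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            combined[candid] = None
        else:
            combined[candid] = counts[0][0]
    return combined
```

Keeping the override as a switch makes it easy to compare classifier performance with and without the revetted labels.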
Line: 93 to 80
 
META FILEATTACHMENT attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"
META FILEATTACHMENT attachment="2018-08-30-t13_f5_c3.pptx" attr="" comment="Analysis of RB version: t13_f5_c3" date="1535748841" name="2018-08-30-t13_f5_c3.pptx" path="2018-08-30-t13_f5_c3.pptx" size="243456" stream="2018-08-30-t13_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
Added:
>
>
META FILEATTACHMENT attachment="2019-01-t15_f5_c3.pptx" attr="" comment="Analysis of RB version: t15_f5_c3" date="1550599262" name="2019-01-t15_f5_c3.pptx" path="2019-01-t15_f5_c3.pptx" size="269747" user="UmaaRebbapragada" version="1"

Revision 12 2018-09-13 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 35 to 35
 

Pipeline Development

Added:
>
>
  • Separate Galactic vs. Extra-Galactic
 
  • Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
  • Get an override label module (to override the majority vote)
  • Add cross matches from catalogs as a source (Nadia's from TNS)
Added:
>
>
  • Flag boguses with light curve observations greater than 2.
  • Catch variable stars with isdiffpos=True and false in the light curve
 
  • Kill very old objects (from before Feb 5, 2018)
  • Implement proper grid search over RF parameter space
  • Get the testing framework into a Jupiter notebook
Line: 47 to 50
  * RB score improvements * Score improvement on known false positive, negatives (Ragnhild's List) * KL divergence between training set, test set features (to find major divergence)
Changed:
<
<
* Plot feature distributions on reals vs. boguses
>
>
* Plot feature distributions on reals vs. boguses (Tiara's code) * what do low RB reals look like, what do high RB bogus look like?
  * Unlabeled Data * Score bias per features
Added:
>
>
* Report feature importance by correlated feature groups
 

Data Collection

  • Automated Data Contamination [ Charlotte ]
Line: 65 to 70
 

Open Issues / Experiments

Changed:
<
<
  • Separate classifiers:
    • Galactic vs. Extra-Galactic
>
>
  • Kowalski / IPAC candidate discrepancies
  • Pixel Clump issues on certain x,y positions (see Ashish's email to Umaa on 6/28/18)
 
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
  • Deep Learning
Added:
>
>
  • Known boguses: ZTF18aabtvch, ZTF18aaiafnn, ZTF18aaizvmy

 

Papers:

  • ML Overview (response to Pub Board comments)

Revision 11 2018-08-31 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 28 to 28
 

To Do List

Added:
>
>

Issues

  • Why are candids from IPAC DB not found in alerts db?
  • Why is cross validation performance decreasing steadily since t8
 

Pipeline Development

Changed:
<
<
  • Get light curve observations from reals into training
>
>
  • Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
 
  • Get an override label module (to override the majority vote)
  • Add cross matches from catalogs as a source (Nadia's from TNS)
  • Kill very old objects (from before Feb 5, 2018)
Line: 43 to 48
  * Score improvement on known false positive, negatives (Ragnhild's List) * KL divergence between training set, test set features (to find major divergence) * Plot feature distributions on reals vs. boguses
Changed:
<
<
* Unlabeled Data (
>
>
* Unlabeled Data
  * Score bias per features

Data Collection

Line: 60 to 65
 

Open Issues / Experiments

Deleted:
<
<
  • Correlation between nbad and boguses
 
  • Separate classifiers:
    • Galactic vs. Extra-Galactic
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
Line: 70 to 74
 
  • ML Overview (response to Pub Board comments)
  • RB Paper
Added:
>
>

DONE

  • 2018-08 Correlation between nbad and boguses Report Here
  -- UmaaRebbapragada - 08 Aug 2018
Line: 78 to 85
 
META FILEATTACHMENT attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"
Added:
>
>
META FILEATTACHMENT attachment="2018-08-30-t13_f5_c3.pptx" attr="" comment="Analysis of RB version: t13_f5_c3" date="1535748841" name="2018-08-30-t13_f5_c3.pptx" path="2018-08-30-t13_f5_c3.pptx" size="243456" stream="2018-08-30-t13_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"

Revision 10 2018-08-14 - CharlotteWard

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 77 to 77
 
META FILEATTACHMENT attachment="2018-07-19-t11_f5_c3.pptx" attr="" comment="Analysis of RB Version: t11_f5_c3" date="1533732505" name="2018-07-19-t11_f5_c3.pptx" path="2018-07-19-t11_f5_c3.pptx" size="633196" stream="2018-07-19-t11_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"

Revision 9 2018-08-08 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Changed:
<
<
The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline, that are available in real-time. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.
>
>
The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.
  The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.
Line: 15 to 15
  The version tag that Real-Bogus uses is tX_fY_cZ where:
Changed:
<
<
  • X is training set version. This is the mix of examples that compromises the training set
>
>
  • X is training set version. This is the mix of examples that comprises the training set
 
  • Y is feature version. This is the set of features selected for each example.
  • Z is classifier software version. This is the version of the software deployed into the data pipeline. This is not to be confused with the software that generates the training data.
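As an illustration of the tX_fY_cZ scheme described above, a tag can be split into its three components with a small parser. The class and function names below are hypothetical and not part of the Real-Bogus software.

```python
import re
from typing import NamedTuple


class RBVersion(NamedTuple):
    training_set: int  # X: training set version
    features: int      # Y: feature version
    classifier: int    # Z: classifier software version


_TAG_PATTERN = re.compile(r"^t(\d+)_f(\d+)_c(\d+)$")


def parse_rb_version(tag: str) -> RBVersion:
    """Parse a tag such as 't12_f5_c3' into its numeric components."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a tX_fY_cZ version tag: {tag!r}")
    return RBVersion(*(int(group) for group in match.groups()))
```

Because the components are integers, `sorted(tags, key=parse_rb_version)` orders t8 before t12, which a plain string sort would not.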

Revision 8 2018-08-08 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 26 to 26
 
t8_f5_c3 29 May 2018
t12_f5_c3 07 Aug 2018
Deleted:
<
<

 

To Do List

Pipeline Development

Line: 49 to 47
  * Score bias per features

Data Collection

Changed:
<
<
  • Find variables
    • bogus objects that have multiple alert packets
    • objects >= n_obs
    • objects with both positive and negative subtractions?
>
>
  • Automated Data Contamination [ Charlotte ] * Find contamination using clustering.
  • Active Learning to Improve Training Data Selection [ Sara ] * Use active learning to discover potential batches of boguses (and reals, alternatively)
  • Other Data Collection Sources (preferably automated)
    • Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
 
  • Automate cross matches from relevant catalogs (e.g., TNS)
Changed:
<
<
  • Data contamination
    • ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved.
>
>
  • Improving Quality of GROWTH Marshall Feed
    • ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved. Is my list spectroscopically-confirmed? Email to Ashot and Mani
 
    • Some in the science programs are classifying stars as "Bogus"
Changed:
<
<

Open Issues

  • Correlation between nbad and boguses?
  • Discrete features analysis
>
>

Open Issues / Experiments

  • Correlation between nbad and boguses
 
  • Separate classifiers:
    • Galactic vs. Extra-Galactic
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)

Revision 7 2018-08-08 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 26 to 26
 
t8_f5_c3 29 May 2018
t12_f5_c3 07 Aug 2018
Added:
>
>

 

To Do List

Pipeline Development

Revision 6 2018-08-08 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Overview

Line: 6 to 6
  The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.
Added:
>
>
The primary avenues of data collection are:

  • Transient Marshal: A weekly feed of reals and boguses are emailed. The boguses are listed by candid with a classification. The reals are listed by ztf object id.
  • Zooniverse: . The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).
 

Version History

The version tag that Real-Bogus uses is tX_fY_cZ where:

Line: 21 to 26
 
t8_f5_c3 29 May 2018
t12_f5_c3 07 Aug 2018
Deleted:
<
<

Training Data Collection

  • Transient Marshal has been our primary data collection avenue. A weekly feed of reals and boguses are emailed. The boguses are listed by candid with a classification. The reals are listed by ztf object id.
  • Zooniverse campaigns have been our primary data collection avenue for the Galactic Plane. The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).
 

To Do List

Added:
>
>

Pipeline Development

  • Get light curve observations from reals into training
  • Get an override label module (to override the majority vote)
  • Add cross matches from catalogs as a source (Nadia's from TNS)
  • Kill very old objects (from before Feb 5, 2018)
  • Implement proper grid search over RF parameter space
  • Get the testing framework into a Jupiter notebook
  • Get features from Kowalski vs. IPAC? Do same query against Kowalski and see if we recoup candidateIds
  • Pipeline Analysis:
    • Labeled Data Test Set * RB score improvements * Score improvement on known false positive, negatives (Ragnhild's List) * KL divergence between training set, test set features (to find major divergence) * Plot feature distributions on reals vs. boguses * Unlabeled Data ( * Score bias per features

Data Collection

  • Find variables
    • bogus objects that have multiple alert packets
    • objects >= n_obs
    • objects with both positive and negative subtractions?
  • Automate cross matches from relevant catalogs (e.g., TNS)
  • Data contamination
    • ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved.
    • Some in the science programs are classifying stars as "Bogus"

Open Issues

  • Correlation between nbad and boguses?
  • Discrete features analysis
  • Separate classifiers:
    • Galactic vs. Extra-Galactic
  • Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
  • Deep Learning

Papers:

  • ML Overview (response to Pub Board comments)
  • RB Paper
 
Changed:
<
<
-- AshishMahabal - 19 Mar 2018
>
>
-- UmaaRebbapragada - 08 Aug 2018
 
META FILEATTACHMENT attachment="RB_presentation_Ward.pptx" attr="" comment="RB presentation for ZTF ML meeting 20180322" date="1522086648" name="RB_presentation_Ward.pptx" path="RB_presentation_Ward.pptx" size="1470794" stream="RB_presentation_Ward.pptx" user="Main.TiaraHung" version="1"
Added:
>
>
META FILEATTACHMENT attachment="2018-07-19-t11_f5_c3.pptx" attr="" comment="Analysis of RB Version: t11_f5_c3" date="1533732505" name="2018-07-19-t11_f5_c3.pptx" path="2018-07-19-t11_f5_c3.pptx" size="633196" stream="2018-07-19-t11_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"

Revision 5 2018-08-08 - UmaaRebbapragada

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"
Changed:
<
<

Data Collection

  • Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
  • Transient Marshal is our other avenue. SWG members mark events as real or bogus and that provides useful input especially for bogus events, providing feedback to improve the RB scores.
  • Archive Marshal (Volunteers needed): input from this stream is not active yet.

Methods

  • Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
  • Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.

Tests

  • Charlotte Ward's experiments
  • Tiara Hung's experiments

Output

  • Each object gets a score between 0 and 1.
  • Smaller RB scores indicates objects that are more likely to be bogus
  • Higher scores indicate more real objects
  • Models being improved regularly
>
>

Overview

The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline, that are available in real-time. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.

The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.

Version History

The version tag that Real-Bogus uses is tX_fY_cZ where:

  • X is training set version. This is the mix of examples that compromises the training set
  • Y is feature version. This is the set of features selected for each example.
  • Z is classifier software version. This is the version of the software deployed into the data pipeline. This is not to be confused with the software that generates the training data.

Tag Deployed
t8_f5_c3 29 May 2018
t1_f1_c1 12 Jan 2018
t7_f4_c3 08 May 2018
t12_f5_c3 07 Aug 2018
t6_f4_c3 04 April 2018

Training Data Collection

  • Transient Marshal has been our primary data collection avenue. A weekly feed of reals and boguses are emailed. The boguses are listed by candid with a classification. The reals are listed by ztf object id.
  • Zooniverse campaigns have been our primary data collection avenue for the Galactic Plane. The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).

To Do List

 

-- AshishMahabal - 19 Mar 2018

Revision 4 2018-03-29 - AshishMahabal

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Data Collection

  • Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
Line: 9 to 9
 
  • Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
  • Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.
Added:
>
>

Tests

  • Charlotte Ward's experiments
  • Tiara Hung's experiments
 

Output

  • Each object gets a score between 0 and 1.
  • Smaller RB scores indicates objects that are more likely to be bogus

Revision 3 2018-03-26 - TiaraHung

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Data Collection

  • Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
Line: 17 to 17
 

-- AshishMahabal - 19 Mar 2018

Added:
>
>
META FILEATTACHMENT attachment="RB_presentation_Ward.pptx" attr="" comment="RB presentation for ZTF ML meeting 20180322" date="1522086648" name="RB_presentation_Ward.pptx" path="RB_presentation_Ward.pptx" size="1470794" stream="RB_presentation_Ward.pptx" user="Main.TiaraHung" version="1"

Revision 2 2018-03-21 - AshishMahabal

Line: 1 to 1
 
META TOPICPARENT name="MLRoad-map"

Data Collection

Changed:
<
<
  • Zooniverse
  • Transient Marshal
  • Archive Marshal (Volunteers needed)
>
>
  • Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
  • Transient Marshal is our other avenue. SWG members mark events as real or bogus and that provides useful input especially for bogus events, providing feedback to improve the RB scores.
  • Archive Marshal (Volunteers needed): input from this stream is not active yet.
 

Methods

Changed:
<
<
  • Random Forests
  • Deep Learning
>
>
  • Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
  • Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.
 
Added:
>
>

Output

  • Each object gets a score between 0 and 1.
  • Smaller RB scores indicates objects that are more likely to be bogus
  • Higher scores indicate more real objects
  • Models being improved regularly
 

-- AshishMahabal - 19 Mar 2018

Revision 1 2018-03-19 - AshishMahabal

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="MLRoad-map"

Data Collection

  • Zooniverse
  • Transient Marshal
  • Archive Marshal (Volunteers needed)

Methods

  • Random Forests
  • Deep Learning

-- AshishMahabal - 19 Mar 2018

 