Difference: Real-BogusClassifications (1 vs. 15)

Revision 15 - 2020-10-23 - AshishMahabal

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Added:

>
>

This page is mostly about the RF work led by Umaa Rebbapragada (included in https://arxiv.org/abs/1902.01936).

The currently deployed model is braai, which uses deep learning and is published at https://arxiv.org/abs/1907.11259; the associated GitHub repository includes notebooks.

Overview

The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.

Line: 47 to 51

- Classifier error analysis: What do low RB reals look like, what do high RB bogus look like?
- Report feature importance by correlated feature groups
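
The error-analysis question above (what do low RB reals and high RB boguses look like?) can be explored with a small helper. This is only a minimal sketch, not the pipeline's code: the record fields `label` and `rb` and the 0.3 / 0.7 cutoffs are assumptions chosen for illustration.

```python
def split_errors(examples, low=0.3, high=0.7):
    """Return (low-RB reals, high-RB boguses) from labeled, scored examples.

    Each example is assumed to be a dict with a 'label' key ('real' or 'bogus')
    and an 'rb' key (score between 0 and 1). Cutoffs are illustrative only.
    """
    low_rb_reals = [e for e in examples if e["label"] == "real" and e["rb"] < low]
    high_rb_bogus = [e for e in examples if e["label"] == "bogus" and e["rb"] > high]
    # Worst offenders first: lowest-scored reals, highest-scored boguses.
    low_rb_reals.sort(key=lambda e: e["rb"])
    high_rb_bogus.sort(key=lambda e: e["rb"], reverse=True)
    return low_rb_reals, high_rb_bogus
```

The two returned lists are the examples whose features and cutouts would be worth inspecting by eye.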

Data Collection

Changed:

<
<

Automated Data Contamination [ Charlotte ]
* Find contamination using clustering.
Active Learning to Improve Training Data Selection [ Sara ]
* Use active learning to discover potential batches of boguses (and reals, alternatively)

>
>

Automated Data Contamination [ Charlotte ]
* Find contamination using clustering.
Active Learning to Improve Training Data Selection [ Sara ]
* Use active learning to discover potential batches of boguses (and reals, alternatively)

Other Data Collection Sources (preferably automated)
- Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)
- Automate cross matches from relevant catalogs (e.g., TNS)
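
The "find variables" criteria above (objects at or above some number of observations, and objects with both positive and negative subtractions) can be sketched as below. This is an illustration under assumptions: alerts are taken to be dicts with a hypothetical `objectId` key and an `isdiffpos` value already converted to a Python bool, and `n_obs` is a free parameter.

```python
from collections import defaultdict

def find_variable_candidates(alerts, n_obs=2):
    """Flag objects that look like variables rather than transients or artifacts.

    alerts: iterable of dicts with 'objectId' (hypothetical key name) and
    'isdiffpos' (assumed here to be a bool). Returns {objectId: [reasons]}.
    """
    signs = defaultdict(set)   # subtraction signs seen per object
    counts = defaultdict(int)  # number of alert packets per object
    for alert in alerts:
        signs[alert["objectId"]].add(bool(alert["isdiffpos"]))
        counts[alert["objectId"]] += 1

    flagged = {}
    for obj, n in counts.items():
        reasons = []
        if n >= n_obs:
            reasons.append("n_obs")
        if signs[obj] == {True, False}:
            reasons.append("mixed_sign")
        if reasons:
            flagged[obj] = reasons
    return flagged
```

Objects flagged with `mixed_sign` have both positive and negative subtractions in their light curve, which is the signature the list above asks about.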

Revision 14 - 2019-02-20 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 36 to 36

Flag boguses with light curve observations greater than 2.
Catch variable stars with isdiffpos=True and false in the light curve
Implement proper grid search over RF parameter space

Added:

>
>

Finish the work with the alternate test set and get stats on it.
Make sure imputation is happening correctly.
Get Tomas's other plots integrated.
Update the KL divergence code.

Pipeline Analysis to automate:
- Score improvement on known false positive, negatives (Ragnhild's List)
- KL divergence between training set, test set features (to find major divergence)
- Plot feature distributions on reals vs. boguses (Tiara's code)
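
The KL-divergence check between training-set and test-set features can be sketched per feature as follows. This is not the project's KL divergence code, only a hedged illustration: it assumes each feature is a list of floats, bins both samples on a shared equal-width histogram, and uses additive smoothing (`eps`) so empty bins do not produce infinities. The bin count and `eps` are arbitrary choices.

```python
import math

def binned_counts(values, lo, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

def kl_divergence(train, test, n_bins=10, eps=1e-9):
    """Approximate KL(train || test) for one feature from shared histograms."""
    lo = min(min(train), min(test))
    hi = max(max(train), max(test))
    if hi == lo:
        return 0.0  # both samples are a single constant value
    width = (hi - lo) / n_bins
    p = binned_counts(train, lo, width, n_bins)
    q = binned_counts(test, lo, width, n_bins)
    # Additive smoothing keeps empty bins from producing log(0) or division by zero.
    p_total = sum(p) + eps * n_bins
    q_total = sum(q) + eps * n_bins
    kl = 0.0
    for p_i, q_i in zip(p, q):
        p_prob = (p_i + eps) / p_total
        q_prob = (q_i + eps) / q_total
        kl += p_prob * math.log(p_prob / q_prob)
    return kl
```

Running this over every feature and sorting by the result would surface the features whose training and test distributions diverge most.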

Changed:

<
<

- What do low RB reals look like, what do high RB bogus look like?
- Unlabeled Data
- Score bias per features

>
>

- Classifier error analysis: What do low RB reals look like, what do high RB bogus look like?

- Report feature importance by correlated feature groups

Data Collection

Automated Data Contamination [ Charlotte ]

Revision 13 - 2019-02-19 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 25 to 25

t7_f4_c3	08 May 2018
t8_f5_c3	29 May 2018
t12_f5_c3	07 Aug 2018

Added:

>
>

t15_f5_c3	10 Jan 2019

To Do List

Deleted:

<
<

Issues

Why are candids from IPAC DB not found in alerts db?
Why is cross validation performance decreasing steadily since t8

Pipeline Development

Changed:

<
<

Separate Galactic vs. Extra-Galactic
Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
Get an override label module (to override the majority vote)

>
>

Get light curve observations from reals into training vetted

Add cross matches from catalogs as a source (Nadia's from TNS)
Flag boguses with light curve observations greater than 2.
Catch variable stars with isdiffpos=True and false in the light curve

Deleted:

<
<

Kill very old objects (from before Feb 5, 2018)

Implement proper grid search over RF parameter space

Changed:

<
<

Get the testing framework into a Jupiter notebook
Get features from Kowalski vs. IPAC? Do same query against Kowalski and see if we recoup candidateIds
Pipeline Analysis:
- Labeled Data Test Set
* RB score improvements

>
>

Pipeline Analysis to automate:

* Score improvement on known false positive, negatives (Ragnhild's List)
* KL divergence between training set, test set features (to find major divergence)
* Plot feature distributions on reals vs. boguses (Tiara's code)

Changed:

<
<

* what do low RB reals look like, what do high RB bogus look like?

>
>

- What do low RB reals look like, what do high RB bogus look like?

* Unlabeled Data
* Score bias per features
* Report feature importance by correlated feature groups

Line: 62 to 50

Active Learning to Improve Training Data Selection [ Sara ]
* Use active learning to discover potential batches of boguses (and reals, alternatively)
Other Data Collection Sources (preferably automated)

Changed:

<
<

- Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)

>
>

- Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)

- Automate cross matches from relevant catalogs (e.g., TNS)
Improving Quality of GROWTH Marshall Feed
- ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved. Is my list spectroscopically-confirmed? Email to Ashot and Mani

Line: 71 to 59

Open Issues / Experiments

Kowalski / IPAC candidate discrepancies

Added:

>
>

Separate Galactic vs. Extra-Galactic classifiers

Pixel Clump issues on certain x,y positions (see Ashish's email to Umaa on 6/28/18)
Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
Deep Learning
Known boguses: ZTF18aabtvch, ZTF18aaiafnn, ZTF18aaizvmy

Deleted:

<
<

Papers:

ML Overview (response to Pub Board comments)
RB Paper

DONE

2018-08 Correlation between nbad and boguses Report Here

Changed:

<
<

>
>

2018-10 combine_labels.py has an override switch, to override the majority vote for examples that have been revetted
2018-11 Filter out old sources (from before Feb 5, 2018)
2018-11 Get features from Kowalski vs. IPAC? Kowalski packets are NOT a superset of IPAC db feats! An experiment limited to the intersection of Kowalski and IPAC db feats showed classifier performance decreased! But found workaround for DB performance issues (Frank gave me a way to get nid, rcid from a candid)
2018-12 Cross validation performance decreasing steadily since t8 due to persistent contamination within the GROWTH marshall feed
2019-01 Get the testing framework into a Jupyter notebook
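
The override behaviour noted in the 2018-10 entry (a revetted label taking precedence over the majority vote) can be sketched as below. This is not the contents of combine_labels.py, which is not shown on this page; the function name, the data shapes, and the choice to leave ties unlabeled are all assumptions for illustration.

```python
from collections import Counter

def combine_with_override(votes, overrides=None):
    """Combine per-candidate votes by majority, letting overrides win.

    votes: dict mapping a candidate id to a list of labels ('real' / 'bogus').
    overrides: optional dict mapping a candidate id to a revetted label.
    Candidates with no votes or a tied vote (and no override) are left out.
    """
    overrides = overrides or {}
    combined = {}
    for candid, labels in votes.items():
        if candid in overrides:
            combined[candid] = overrides[candid]  # revetted label wins
            continue
        if not labels:
            continue
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie: no majority, so no label (an assumed policy)
        combined[candid] = ranked[0][0]
    return combined
```

Keeping the override as a separate mapping means the original votes stay untouched, so the majority result can still be recomputed if an override is later withdrawn.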

-- UmaaRebbapragada - 08 Aug 2018

Line: 93 to 80

META FILEATTACHMENT	attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"
META FILEATTACHMENT	attachment="2018-08-30-t13_f5_c3.pptx" attr="" comment="Analysis of RB version: t13_f5_c3" date="1535748841" name="2018-08-30-t13_f5_c3.pptx" path="2018-08-30-t13_f5_c3.pptx" size="243456" stream="2018-08-30-t13_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="2019-01-t15_f5_c3.pptx" attr="" comment="Analysis of RB version: t15_f5_c3" date="1550599262" name="2019-01-t15_f5_c3.pptx" path="2019-01-t15_f5_c3.pptx" size="269747" user="UmaaRebbapragada" version="1"

Revision 12 - 2018-09-13 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 35 to 35

Pipeline Development

Added:

>
>

Separate Galactic vs. Extra-Galactic

Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]
Get an override label module (to override the majority vote)
Add cross matches from catalogs as a source (Nadia's from TNS)

Added:

>
>

Flag boguses with light curve observations greater than 2.
Catch variable stars with isdiffpos=True and false in the light curve

Kill very old objects (from before Feb 5, 2018)
Implement proper grid search over RF parameter space
Get the testing framework into a Jupiter notebook

Line: 47 to 50

* RB score improvements
* Score improvement on known false positive, negatives (Ragnhild's List)
* KL divergence between training set, test set features (to find major divergence)

Changed:

<
<

* Plot feature distributions on reals vs. boguses

>
>

* Plot feature distributions on reals vs. boguses (Tiara's code)
* what do low RB reals look like, what do high RB bogus look like?

* Unlabeled Data
* Score bias per features

Added:

>
>

* Report feature importance by correlated feature groups

Data Collection

Automated Data Contamination [ Charlotte ]

Line: 65 to 70

Open Issues / Experiments

Changed:

<
<

Separate classifiers:
- Galactic vs. Extra-Galactic

>
>

Kowalski / IPAC candidate discrepancies
Pixel Clump issues on certain x,y positions (see Ashish's email to Umaa on 6/28/18)

Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
Deep Learning

Added:

>
>

Known boguses: ZTF18aabtvch, ZTF18aaiafnn, ZTF18aaizvmy

Papers:

ML Overview (response to Pub Board comments)

Revision 11 - 2018-08-31 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 28 to 28

To Do List

Added:

>
>

Issues

Why are candids from IPAC DB not found in alerts db?
Why is cross validation performance decreasing steadily since t8

Pipeline Development

Changed:

<
<

Get light curve observations from reals into training

>
>

Get light curve observations from reals into training [ POSTPONED - no way to get all confirmed spectroscopic reals ]

Get an override label module (to override the majority vote)
Add cross matches from catalogs as a source (Nadia's from TNS)
Kill very old objects (from before Feb 5, 2018)

Line: 43 to 48

Changed:

<
<

* Unlabeled Data (

>
>

* Unlabeled Data

* Score bias per features

Data Collection

Line: 60 to 65

Open Issues / Experiments

Deleted:

<
<

Correlation between nbad and boguses

Separate classifiers:
- Galactic vs. Extra-Galactic
Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)

Line: 70 to 74

ML Overview (response to Pub Board comments)
RB Paper

Added:

>
>

DONE

2018-08 Correlation between nbad and boguses Report Here

-- UmaaRebbapragada - 08 Aug 2018

Line: 78 to 85

META FILEATTACHMENT	attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="2018-08-30-t13_f5_c3.pptx" attr="" comment="Analysis of RB version: t13_f5_c3" date="1535748841" name="2018-08-30-t13_f5_c3.pptx" path="2018-08-30-t13_f5_c3.pptx" size="243456" stream="2018-08-30-t13_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"

Revision 10 - 2018-08-14 - CharlotteWard

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 77 to 77

META FILEATTACHMENT	attachment="2018-07-19-t11_f5_c3.pptx" attr="" comment="Analysis of RB Version: t11_f5_c3" date="1533732505" name="2018-07-19-t11_f5_c3.pptx" path="2018-07-19-t11_f5_c3.pptx" size="633196" stream="2018-07-19-t11_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="Nbad_analysis.pdf" attr="" comment="Analysis of nbad feature" date="1534290349" name="Nbad_analysis.pdf" path="Nbad analysis.pdf" size="704934" stream="Nbad analysis.pdf" user="Main.CharlotteWard" version="1"

Revision 9 - 2018-08-08 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Changed:

<
<

The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline, that are available in real-time. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.

>
>

The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.

Line: 15 to 15

The version tag that Real-Bogus uses is tX_fY_cZ where:

Changed:

<
<

X is training set version. This is the mix of examples that compromises the training set

>
>

X is training set version. This is the mix of examples that comprises the training set

Y is feature version. This is the set of features selected for each example.
Z is classifier software version. This is the version of the software deployed into the data pipeline. This is not to be confused with the software that generates the training data.
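
The tX_fY_cZ tag scheme described above can be parsed mechanically. The following is a minimal sketch, assuming X, Y, and Z are always non-negative integers as in the tags listed on this page; the `RBVersion` type and `parse_rb_tag` function are hypothetical names, not part of the Real-Bogus software.

```python
import re
from typing import NamedTuple

class RBVersion(NamedTuple):
    training_set: int  # X: training set version
    features: int      # Y: feature version
    classifier: int    # Z: classifier software version

_TAG_RE = re.compile(r"t(\d+)_f(\d+)_c(\d+)")

def parse_rb_tag(tag: str) -> RBVersion:
    """Split a tag such as 't12_f5_c3' into its three version numbers."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a tX_fY_cZ tag: {tag!r}")
    return RBVersion(*(int(group) for group in match.groups()))
```

For example, `parse_rb_tag("t12_f5_c3")` yields training set 12, feature set 5, classifier software 3, which makes it easy to sort or compare deployed versions by any one component.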

Revision 8 - 2018-08-08 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 26 to 26

t8_f5_c3	29 May 2018
t12_f5_c3	07 Aug 2018

Deleted:

<
<

To Do List

Pipeline Development

Line: 49 to 47

* Score bias per features

Data Collection

Changed:

<
<

Find variables
- bogus objects that have multiple alert packets
- objects >= n_obs
- objects with both positive and negative subtractions?

>
>

Automated Data Contamination [ Charlotte ]
* Find contamination using clustering.
Active Learning to Improve Training Data Selection [ Sara ]
* Use active learning to discover potential batches of boguses (and reals, alternatively)
Other Data Collection Sources (preferably automated)
- Find variables (bogus objects that have multiple alert packets, objects >= n_obs, objects with both positive and negative subtractions?)

Automate cross matches from relevant catalogs (e.g., TNS)

Changed:

<
<

Data contamination
- ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved.

>
>

Improving Quality of GROWTH Marshall Feed
- ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved. Is my list spectroscopically-confirmed? Email to Ashot and Mani

- Some in the science programs are classifying stars as "Bogus"

Changed:

<
<

Open Issues

Correlation between nbad and boguses?
Discrete features analysis

>
>

Open Issues / Experiments

Correlation between nbad and boguses

Separate classifiers:
- Galactic vs. Extra-Galactic
Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)

Revision 7 - 2018-08-08 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 26 to 26

t8_f5_c3	29 May 2018
t12_f5_c3	07 Aug 2018

Added:

>
>

To Do List

Pipeline Development

Revision 6 - 2018-08-08 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Overview

Line: 6 to 6

The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.

Added:

>
>

The primary avenues of data collection are:

Transient Marshal: A weekly feed of reals and boguses is emailed. The boguses are listed by candid with a classification. The reals are listed by ZTF object id.
Zooniverse: . The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).

Version History

The version tag that Real-Bogus uses is tX_fY_cZ where:

Line: 21 to 26

t8_f5_c3	29 May 2018
t12_f5_c3	07 Aug 2018

Deleted:

<
<

Training Data Collection

Transient Marshal has been our primary data collection avenue. A weekly feed of reals and boguses are emailed. The boguses are listed by candid with a classification. The reals are listed by ztf object id.
Zooniverse campaigns have been our primary data collection avenue for the Galactic Plane. The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).

To Do List

Added:

>
>

Pipeline Development

Get light curve observations from reals into training
Get an override label module (to override the majority vote)
Add cross matches from catalogs as a source (Nadia's from TNS)
Kill very old objects (from before Feb 5, 2018)
Implement proper grid search over RF parameter space
Get the testing framework into a Jupiter notebook
Get features from Kowalski vs. IPAC? Do same query against Kowalski and see if we recoup candidateIds
Pipeline Analysis:
- Labeled Data Test Set
* RB score improvements
* Score improvement on known false positive, negatives (Ragnhild's List)
* KL divergence between training set, test set features (to find major divergence)
* Plot feature distributions on reals vs. boguses
* Unlabeled Data (
* Score bias per features

Data Collection

Find variables
- bogus objects that have multiple alert packets
- objects >= n_obs
- objects with both positive and negative subtractions?
Automate cross matches from relevant catalogs (e.g., TNS)
Data contamination
- ZTF objects provided in the GROWTH marshall are not necessarily spectroscopically-confirmed, they are saved.
- Some in the science programs are classifying stars as "Bogus"

Open Issues

Correlation between nbad and boguses?
Discrete features analysis
Separate classifiers:
- Galactic vs. Extra-Galactic
Use alert data for Real-Bogus in real-time (which means access to 150 features and postage stamps)
Deep Learning

Papers:

ML Overview (response to Pub Board comments)
RB Paper

Changed:

<
<

-- AshishMahabal - 19 Mar 2018

>
>

-- UmaaRebbapragada - 08 Aug 2018

META FILEATTACHMENT	attachment="RB_presentation_Ward.pptx" attr="" comment="RB presentation for ZTF ML meeting 20180322" date="1522086648" name="RB_presentation_Ward.pptx" path="RB_presentation_Ward.pptx" size="1470794" stream="RB_presentation_Ward.pptx" user="Main.TiaraHung" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="2018-07-19-t11_f5_c3.pptx" attr="" comment="Analysis of RB Version: t11_f5_c3" date="1533732505" name="2018-07-19-t11_f5_c3.pptx" path="2018-07-19-t11_f5_c3.pptx" size="633196" stream="2018-07-19-t11_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="2018-08-04-t12_f5_c3.pptx" attr="" comment="Analysis of RB Version: t12_f5_c3" date="1533732554" name="2018-08-04-t12_f5_c3.pptx" path="2018-08-04-t12_f5_c3.pptx" size="620436" stream="2018-08-04-t12_f5_c3.pptx" user="Main.UmaaRebbapragada" version="1"
META FILEATTACHMENT	attachment="2018-08-07-ZTF-Team-Meeting-RB.pptx" attr="" comment="Umaa Rebbapragada's Presentation on RB at the ZTF Team Meeting, Stockholm" date="1533732627" name="2018-08-07-ZTF-Team-Meeting-RB.pptx" path="2018-08-07-ZTF-Team-Meeting-RB.pptx" size="641834" stream="2018-08-07-ZTF-Team-Meeting-RB.pptx" user="Main.UmaaRebbapragada" version="1"

Revision 5 - 2018-08-08 - UmaaRebbapragada

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Changed:

<
<

Data Collection

Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
Transient Marshal is our other avenue. SWG members mark events as real or bogus and that provides useful input especially for bogus events, providing feedback to improve the RB scores.
Archive Marshal (Volunteers needed): input from this stream is not active yet.

Methods

Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.

Tests

Charlotte Ward's experiments
Tiara Hung's experiments

Output

Each object gets a score between 0 and 1.
Smaller RB scores indicates objects that are more likely to be bogus
Higher scores indicate more real objects
Models being improved regularly

>
>

Overview

The Real-Bogus classifier scores sources on a scale of 0 (bogus) to 1 (real). It is currently a Random Forest classifier that is built upon 'features', which are a collection of statistics and outputs of the real-time data pipeline, that are available in real-time. The classifier is trained on a set of labeled data. Labels are provided via two data collection venues: 1) Zooniverse and 2) GROWTH marshall.

The bogus label should be attributed to artifacts of the telescope optics or the data pipelines. The real label should be attributed to astronomical objects.

Version History

The version tag that Real-Bogus uses is tX_fY_cZ where:

X is training set version. This is the mix of examples that compromises the training set
Y is feature version. This is the set of features selected for each example.
Z is classifier software version. This is the version of the software deployed into the data pipeline. This is not to be confused with the software that generates the training data.

Tag	Deployed
t8_f5_c3	29 May 2018
t1_f1_c1	12 Jan 2018
t7_f4_c3	08 May 2018
t12_f5_c3	07 Aug 2018
t6_f4_c3	04 April 2018

Training Data Collection

Transient Marshal has been our primary data collection avenue. A weekly feed of reals and boguses are emailed. The boguses are listed by candid with a classification. The reals are listed by ztf object id.
Zooniverse campaigns have been our primary data collection avenue for the Galactic Plane. The Galactic Plane was prioritized as an area of improvement by the Science Steering Committee (SSC).

To Do List

-- AshishMahabal - 19 Mar 2018

Revision 4 - 2018-03-29 - AshishMahabal

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Data Collection

Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.

Line: 9 to 9

Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.

Added:

>
>

Tests

Charlotte Ward's experiments
Tiara Hung's experiments

Output

Each object gets a score between 0 and 1.
Smaller RB scores indicates objects that are more likely to be bogus

Revision 3 - 2018-03-26 - TiaraHung

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Data Collection

Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.

Line: 17 to 17

-- AshishMahabal - 19 Mar 2018

Added:

>
>

META FILEATTACHMENT	attachment="RB_presentation_Ward.pptx" attr="" comment="RB presentation for ZTF ML meeting 20180322" date="1522086648" name="RB_presentation_Ward.pptx" path="RB_presentation_Ward.pptx" size="1470794" stream="RB_presentation_Ward.pptx" user="Main.TiaraHung" version="1"

Revision 2 - 2018-03-21 - AshishMahabal

Line: 1 to 1

META TOPICPARENT	name="MLRoad-map"

Data Collection

Changed:

<
<

Zooniverse
Transient Marshal
Archive Marshal (Volunteers needed)

>
>

Zooniverse campaigns have been our primary data collection avenue for reals and bogus events. Better subtraction by ZOGY (compared to what was in [i]PTF), combined with stricter (higher) SNR cutoff, means we are receiving fewer bogus events.
Transient Marshal is our other avenue. SWG members mark events as real or bogus and that provides useful input especially for bogus events, providing feedback to improve the RB scores.
Archive Marshal (Volunteers needed): input from this stream is not active yet.

Methods

Changed:

<
<

Random Forests
Deep Learning

>
>

Random Forests: we start on the lines of the [i]PTF model, but use the 68 features output by ZOGY subtractions.
Deep Learning: Start with triplets (Sci, Ref, Diff), and move on to just (Sci and Ref). Use CNNs.

Added:

>
>

Output

Each object gets a score between 0 and 1.
Smaller RB scores indicates objects that are more likely to be bogus
Higher scores indicate more real objects
Models being improved regularly

-- AshishMahabal - 19 Mar 2018

Revision 1 - 2018-03-19 - AshishMahabal

Line: 1 to 1

Added:

>
>

META TOPICPARENT	name="MLRoad-map"

Data Collection

Zooniverse
Transient Marshal
Archive Marshal (Volunteers needed)

Methods

Random Forests
Deep Learning

-- AshishMahabal - 19 Mar 2018
