News

July 11th:	The participants' papers are available in the proceedings of ICGI'23: the paper by the winning team (EdiMuskardin), the paper by the team that finished second (NeuralChecker), and the paper by the Redouane Hakimi team.
June 26th:	This paper, which will appear in the proceedings of the 16th International Conference on Grammatical Inference, describes the competition. It provides information on the data used, the trained models, etc. In accordance with our Creative Commons Licence, please cite it if you use the TAYSIR models in your work.
June 23rd:	The trained Neural Nets and the datasets of the competition are now available in this archive. We also added several transformers that we trained on our datasets but did not use for the competition. We hope that this TAYSIR Benchmark will be used from now on to compare approaches to the distillation of Neural Networks trained on sequential symbolic data.
May 1st:	The Competition is over! The team from Edi Muskardin won the competition. Congrats! Thanks everyone for your participation.
March 22nd:	The second (and last) Track (Language Modeling/Density Estimation) can be found here. Its manual, where main elements are described, was updated. Have fun!
March 6th:	The Competition is on! The first Track (Binary Classification) can be found here. Its manual, where main elements are described, is here.
February 22nd:	The competition is in beta-testing: you will find the link to access it in our Manual.
February 18th:	We created a place on Discord for discussions about the competition. Feel free to join us there.
February 15th:	We faced some difficulties (tackled!) and we are running late on our schedule. We plan to open the competition on February 22nd.
February 1st:	Our first call for participation is now available. Check it out!!

What it is

The Transformers+RNN: Algorithms to Yield Simple and Interpretable Representations (TAYSIR, the Arabic word for 'simple') competition was an online challenge on extracting simpler models from already trained neural networks. These neural nets are trained on sequential categorical (i.e., symbolic) data. Some of these data are artificial; others come from real-world problems (NLP, Bioinformatics, Software Engineering, etc.).

The quality of the extracted models is evaluated along two dimensions:
► Approximation of the original model
► Simplicity of the extracted model

Two tracks are proposed:
► Neural nets trained for binary classification, that is, a language in the Formal Language Theory sense
► Neural nets trained for regression, that is, a function assigning a real number to any finite sequence of symbols (e.g., the density estimation of a language-modelling RNN)

Each track consists of roughly 15 trained models.
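To make the two kinds of target functions concrete, here is a minimal toy sketch in Python. Neither function comes from the competition: `even_number_of_ones` and `toy_density` are hypothetical names, and encoding symbols as integers is an assumption made for illustration only.

```python
from typing import Sequence


def even_number_of_ones(seq: Sequence[int]) -> int:
    """Toy Track 1 style model: a language, seen as a function into {0, 1}.

    Returns 1 if the symbol 1 occurs an even number of times, else 0.
    This is a two-state automaton, written as a loop.
    """
    state = 0  # 0 = even number of 1s seen so far, 1 = odd
    for symbol in seq:
        if symbol == 1:
            state = 1 - state
    return 1 if state == 0 else 0


def toy_density(seq: Sequence[int], stop_prob: float = 0.5,
                alphabet_size: int = 2) -> float:
    """Toy Track 2 style model: a real number for every finite sequence.

    At each step the process stops with probability stop_prob, otherwise
    it emits a symbol uniformly at random. The values sum to 1 over all
    finite sequences, so this is a probability distribution on sequences.
    """
    p_symbol = (1.0 - stop_prob) / alphabet_size
    return stop_prob * p_symbol ** len(seq)


if __name__ == "__main__":
    print(even_number_of_ones([1, 0, 1]))  # membership decision
    print(toy_density([0, 1]))             # probability of the sequence
```

An extracted model in either track plays the same role as these functions: it takes a sequence of symbols and returns either a class or a real value.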

The trained models are in PyTorch but are available in an MLflow format for compatibility with other frameworks.

When

The Competition took place during Spring 2023, between mid-February and April.

Half a day will be dedicated to the competition results during the 16th International Conference on Grammatical Inference to be held in Morocco in July 2023.

Participants in TAYSIR will be encouraged to attend ICGI 2023 and to submit an extended abstract presenting their work (2 to 4 pages, including appendices), which will be appended to the proceedings of ICGI (Publisher: PMLR) in a track dedicated to the competition. The deadline for this abstract is May 15th, 2 weeks after the end of the competition. Note that it differs from the ICGI paper submission deadline, which is at the beginning of March. These abstracts will be peer-reviewed primarily for clarity of presentation.

Timeline:

February 2023:	Beginning of the competition
April 30th 2023:	End of the competition
May 15th 2023:	Submission deadline for the extended abstracts presenting your work
July 10th-13th 2023:	ICGI 2023, including a dedicated session about TAYSIR with presentations from some participants

How to get the data of the TAYSIR Benchmark

Everything, including the trained Neural Nets, datasets, evaluation script, manual, scientific article, and bonus Transformer models, can be found here.

The first Track (Binary Classification) was here. The second Track (Language Modeling/Density Estimation) was here. Registration was required there in order to download the already trained models. The manual, where the main elements are described, is a good place to start.

After running their extraction algorithm, participants were expected to upload their extracted model to our website as an archive containing an MLflow pyfunc. We provide a toolkit that can save any Python function in the desired format. The details of its (simple) usage can be found in the starter kit provided with the models on the submission website.

Evaluation

We used 2 types of metrics: one to evaluate the quality of the approximation, the other to evaluate the simplicity of the submitted model.

Quality

For Track 1 (Neural Nets trained on binary classification tasks), the quality is evaluated using the accuracy, that is, the proportion of the sequences of the test set (unknown to participants) on which your model and the neural net agree.
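As an illustration of this agreement-based accuracy, here is a minimal Python sketch. It is not the official evaluation script: `agreement_accuracy`, the two toy models, and the tiny test set are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Iterable, Sequence

Model = Callable[[Sequence[int]], int]


def agreement_accuracy(extracted: Model, reference: Model,
                       test_set: Iterable[Sequence[int]]) -> float:
    """Proportion of test sequences on which both models give the same label."""
    sequences = list(test_set)
    if not sequences:
        raise ValueError("test_set must contain at least one sequence")
    agreements = sum(1 for seq in sequences if extracted(seq) == reference(seq))
    return agreements / len(sequences)


# Hypothetical stand-in for the trained neural net: accepts sequences
# whose symbols sum to an even number.
def reference_model(seq: Sequence[int]) -> int:
    return int(sum(seq) % 2 == 0)


# Hypothetical extracted model: accepts sequences of even length.
def extracted_model(seq: Sequence[int]) -> int:
    return int(len(seq) % 2 == 0)


toy_test_set = [[], [1], [1, 1], [0], [0, 1]]
score = agreement_accuracy(extracted_model, reference_model, toy_test_set)
print(score)  # the two models agree on 3 of the 5 sequences
```

Note that the score measures agreement with the neural net, not with any ground-truth labels: an extracted model that reproduces the network's mistakes is scored as correct on those sequences.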

For Track 2 (Neural Nets trained on language modelling tasks), the quality is evaluated using the mean squared error. Models are seen as density estimators, that is, they assign a probability to each sequence of the test set. We compute the square of the difference between the value given by your model and the value given by the original neural net on each sequence, then take the mean over the whole test set.
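The computation described above can be sketched as follows. This is a hedged illustration rather than the official scoring code: `mean_squared_error` is a hypothetical helper, and the probability tables stand in for the neural net and the extracted model.

```python
from typing import Callable, Iterable, Sequence

Density = Callable[[Sequence[int]], float]


def mean_squared_error(extracted: Density, reference: Density,
                       test_set: Iterable[Sequence[int]]) -> float:
    """Mean over the test set of (extracted(seq) - reference(seq)) ** 2."""
    sequences = list(test_set)
    if not sequences:
        raise ValueError("test_set must contain at least one sequence")
    total = sum((extracted(seq) - reference(seq)) ** 2 for seq in sequences)
    return total / len(sequences)


# Hypothetical probability tables, keyed by the sequence as a tuple.
REFERENCE_PROBS = {(): 0.5, (0,): 0.25, (1,): 0.25}
EXTRACTED_PROBS = {(): 0.4, (0,): 0.25, (1,): 0.35}


def reference_density(seq: Sequence[int]) -> float:
    return REFERENCE_PROBS[tuple(seq)]


def extracted_density(seq: Sequence[int]) -> float:
    return EXTRACTED_PROBS[tuple(seq)]


toy_test_set = [[], [0], [1]]
mse = mean_squared_error(extracted_density, reference_density, toy_test_set)
print(mse)  # squared errors are 0.01, 0 and 0.01, so the mean is 0.02 / 3
```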

Simplicity

We use two metrics to evaluate the simplicity of your submitted models:
► Memory usage: the memory footprint of your model during a prediction, in megabytes.
► CPU time: the CPU time your model spends during a prediction, in milliseconds.
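To get a rough local estimate of these two quantities before submitting, one option is the Python standard library, as in the sketch below. This is an assumption-laden approximation, not the procedure used on the competition servers: `profile_prediction` is a hypothetical helper, `tracemalloc` only tracks memory allocated through Python (not native buffers such as tensors), and `time.process_time` has limited resolution for very fast predictions.

```python
import time
import tracemalloc
from typing import Any, Callable, Sequence, Tuple


def profile_prediction(predict: Callable[[Sequence[int]], Any],
                       sequence: Sequence[int]) -> Tuple[Any, float, float]:
    """Run one prediction; return (result, peak memory in MB, CPU time in ms)."""
    tracemalloc.start()
    cpu_start = time.process_time()
    result = predict(sequence)
    cpu_ms = (time.process_time() - cpu_start) * 1000.0
    # get_traced_memory returns (current, peak) in bytes since start()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak_bytes / 1e6, cpu_ms


# Hypothetical extracted model used only to exercise the profiler.
def parity_model(seq: Sequence[int]) -> int:
    return sum(seq) % 2


result, peak_mb, cpu_ms = profile_prediction(parity_model, [1, 0, 1])
print(f"result={result} peak_memory={peak_mb:.6f} MB cpu_time={cpu_ms:.3f} ms")
```

Averaging over many sequences gives a steadier estimate than a single call, since one prediction is often faster than the timer resolution.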

Note that when you first submit to the competition, we create a virtual environment.

Thanks

This competition is financially supported by the ANR TAUDoS and the firm EURA NOVA.

Main organizers

Chihiro Shibata

Hosei University, Japan

Dakotah Lambert

Université Jean Monnet, Saint-Etienne, France

Jeffrey Heinz

Stony Brook University, New York, USA

Rémi Eyraud

Université Jean Monnet, Saint-Etienne, France

Data Scientists

Mathias Cabanne

EURA NOVA

Nicolas Buton

DiLySS - IRISA

Aidar Gaffarov

MLDM Master program

Badr Tahri Joutei

MLDM Master program

Scientific Committee

Ariadna Quattoni

Universitat Politècnica de Catalunya

Bob Frank

Yale University, USA

Borja Balle

DeepMind

François Coste

INRIA Rennes

Jean-Christophe Janodet

Université Paris-Saclay

Matthias Gallé

Naver Lab

Sicco Verwer

TU Delft
