ALLSTAT Archives

allstat@JISCMAIL.AC.UK


Subject: FinTOC 2023 Call For Participation
From: FinTOC SharedTask <[log in to unmask]>
Reply-To: FinTOC SharedTask <[log in to unmask]>
Date: Thu, 15 Jun 2023 11:33:29 +0200


Please find the FinTOC 2023 Shared Task Call for Participation below.


Apologies for cross-posting.


With best wishes,

FinTOC 2023 Shared Task organizing committee


---

Call for participation:


FNP-2023 Shared Task: FinTOC - Financial Document Structure Extraction

Practical Information:

To be held as part of the 5th Financial Narrative Processing Workshop (FNP
2023) <https://wp.lancs.ac.uk/cfie/fnp2023/> during the 2023 IEEE
International Conference on Big Data (IEEE BigData 2023)
<http://bigdataieee.org/BigData2023/>, Sorrento, Italy, from 15th December
to 18th December, 2023. The workshop is a one-day event; the exact date is
to be announced.

===================

Shared Task URL: http://wp.lancs.ac.uk/cfie/fintoc2023/

Workshop URL: https://wp.lancs.ac.uk/cfie/fnp2023/

Participation Form:
https://docs.google.com/forms/d/e/1FAIpQLSdqUKy3YGho0Cw2GF__VHilHZZbR75UDG3JRBC4k0Yxw4acWg/viewform?usp=pp_url

___________________________________________________________


Shared Task Description:

A vast and continuously growing volume of financial documents is being
created and published in machine-readable formats, predominantly PDF.
These documents play a crucial role in enabling firms to report their
activities, financial situation, and investment plans to shareholders,
investors, and the financial markets. They serve, for example, as
corporate annual reports, offering detailed financial and operational
information. Unfortunately, they often lack comprehensive structural
information, which makes efficient analysis and interpretation difficult.

In certain countries like the United States and France, regulators such as
the SEC (Securities and Exchange Commission) and the AMF (Financial Markets
Authority) have implemented requirements for firms to adhere to specific
reporting templates. These regulations aim to promote standardization and
consistency across firms' disclosures. However, in various European
countries, management typically possesses more flexibility in determining
what, where, and how to report financial information, resulting in a lack
of standardization among financial documents published within the same
market.

Although some research has been conducted on recognizing the table of
contents (TOC) of books and documents, most existing work has focused
on small-scale, application-dependent and domain-specific datasets. This
limited scope poses challenges when dealing with a vast collection of
heterogeneous documents and books, where TOCs from different domains
exhibit significant variations in visual layout and style. Consequently,
recognizing and extracting TOCs becomes an intricate problem. Indeed, in
comparison to regular books that are typically provided in a full-text
format with limited structural information such as pages and paragraphs,
financial documents possess a more complex structure. They consist of
various elements, including parts, sections, sub-sections, and even
sub-sub-sections, incorporating both textual and non-textual content.
Moreover, TOC pages are not always present to help readers navigate the
document, and when they are, they often only provide access to the main
sections.

In this shared task, our objective is to analyze various types of
financial documents: KIID (Key Investor Information Document), Prospectus
(official PDF documents in which investment funds describe their
characteristics and investment modalities in detail), Règlement, and
Financial Annual Reports/Financial Statements (which provide a detailed
overview of a company's financial performance and operations over the
course of a fiscal year). These documents play a vital role in
providing crucial information to investors, stakeholders, and regulatory
bodies. While the content they must contain is often prescribed and
regulated, their format lacks standardization, leading to a significant
degree of variability. The presentation styles range from plain text format
to more visually rich and data-driven graphical and tabular
representations. Notably, the majority of those documents are published
without a table of contents. A TOC is typically essential for readers as
it enables easy navigation within the document by providing a clear outline
of headers and corresponding page numbers. Additionally, TOCs serve as a
valuable resource for legal teams, facilitating the verification of the
inclusion of all the required contents. Consequently, the automated
analysis of these documents to extract their structure is becoming
increasingly useful for numerous firms worldwide.

Our primary focus for this edition is to expand the extraction of table of
contents to a wider variety of financial documents, and the task will
involve developing highly efficient algorithms and methodologies to address
the challenges associated with such a dataset. Our aim is to achieve a
level of generalization ensuring that the developed system can be applied
to different types of financial documents. This broader scope allows us to
explore the applicability of our methodologies across a range of financial
document categories, such as KIID, Prospectus, Règlement and Financial
Annual Reports/Financial Statements. This way, we want to demonstrate the
versatility and effectiveness of the ML algorithms used in TOC extraction,
enabling a streamlined and consistent approach across various financial
document types.

In addition, for this edition, we are excited to introduce a dataset that
goes beyond textual annotations. Our proposed dataset will include visual
(spatial) annotations that capture the coordinates of the titles and
hierarchical structure of the documents. This comprehensive approach
enables a more holistic analysis and understanding of financial documents.

By incorporating visual annotations, we can capture the visual cues and
design elements that contribute to the overall structure and organization
of the documents. This allows us to delve deeper into the visual
representation of the table of contents and extract valuable insights from
the visual hierarchy present in these financial documents. The combination
of textual and visual annotations provides a richer and more nuanced
dataset, making it possible to increase the accuracy and effectiveness of
the machine learning algorithms and methodologies employed in TOC
extraction.
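
As an illustration of how textual and spatial annotations might fit
together, here is a minimal Python sketch. It is not the official FinTOC
annotation schema, which this call does not specify: the TitleAnnotation
record, its field names, the 1-based page numbers, and the top-left page
origin are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TitleAnnotation:
    """Hypothetical record for one annotated title (not the official schema)."""
    text: str                                # title text as it appears in the PDF
    page: int                                # 1-based page number (assumed)
    depth: int                               # 1 = top level, 2 = sub-section, ...
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), top-left origin (assumed)


def render_toc(titles: List[TitleAnnotation]) -> str:
    """Render titles in reading order as an indented plain-text TOC."""
    # Reading order: by page, then top-to-bottom, then left-to-right.
    ordered = sorted(titles, key=lambda t: (t.page, t.bbox[1], t.bbox[0]))
    lines = []
    for t in ordered:
        indent = "  " * (t.depth - 1)
        lines.append(f"{indent}{t.text} ... {t.page}")
    return "\n".join(lines)


# Invented example data, for illustration only.
example = [
    TitleAnnotation("Risk factors", 3, 2, (72.0, 300.0, 200.0, 314.0)),
    TitleAnnotation("Investment policy", 2, 1, (72.0, 90.0, 260.0, 108.0)),
    TitleAnnotation("Objectives", 2, 2, (72.0, 140.0, 180.0, 154.0)),
]
print(render_toc(example))
```

Keeping the hierarchy level and the bounding box in the same record lets
one list of titles serve both title detection and TOC generation.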


Thanks to the contribution of the Autonomous University of Madrid (UAM,
Spain), the fifth edition of the FinTOC Shared Task welcomes a specific
track for Spanish documents, continuing from the previous edition.

In this edition, systems will be scored based on their performance in both
Title detection and TOC generation using more precise evaluation metrics
based on visual annotations.
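
To make the idea of scoring against visual annotations concrete, the
following Python sketch computes precision, recall and F1 for title
detection by matching predicted and gold bounding boxes. It is only one
plausible reading, not the official FinTOC scorer: the box format, the
greedy matching, and the 0.5 overlap threshold are assumptions.

```python
from typing import List, Tuple

# Hypothetical box format: (page, x0, y0, x1, y1)
Box = Tuple[int, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if on different pages."""
    if a[0] != b[0]:
        return 0.0
    ix0, iy0 = max(a[1], b[1]), max(a[2], b[2])
    ix1, iy1 = min(a[3], b[3]), min(a[4], b[4])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[3] - a[1]) * (a[4] - a[2])
    area_b = (b[3] - b[1]) * (b[4] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def title_detection_scores(gold: List[Box], pred: List[Box],
                           threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    unmatched = list(gold)
    true_positives = 0
    for p in pred:
        # Pick the best-overlapping gold box that has not been matched yet.
        best = max(unmatched, key=lambda g: iou(g, p), default=None)
        if best is not None and iou(best, p) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: one correct title, one predicted on the wrong page.
gold = [(1, 0.0, 0.0, 10.0, 10.0), (1, 0.0, 20.0, 10.0, 30.0)]
pred = [(1, 0.0, 0.0, 10.0, 10.0), (2, 0.0, 0.0, 10.0, 10.0)]
print(title_detection_scores(gold, pred))
```

A score for TOC generation would additionally need to compare the
predicted hierarchy level of each matched title, which this sketch
leaves out.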

Participants are required to register for the Shared Task. Once registered,
all participating teams will receive a common training dataset consisting
of PDF documents along with the associated TOC annotations.


To participate, please use the registration form below to add details about
your team:
https://docs.google.com/forms/d/e/1FAIpQLSdqUKy3YGho0Cw2GF__VHilHZZbR75UDG3JRBC4k0Yxw4acWg/viewform?usp=pp_url
(now open as of 06/01/2023)


_____________________________________________


Important Dates:

   - 1st Call for papers & shared task participants: June 12, 2023
   - 2nd Call for papers & shared task participants: July 17, 2023
   - Final Call for papers & shared task participants: August 17, 2023
   - Training set release: August 21, 2023
   - Blind test set release: September 21, 2023
   - Systems submission: October 03, 2023
   - Release of results: October 09, 2023
   - Paper submission deadline: October 18, 2023 (anywhere in the world)
   - Notification of paper acceptance to authors: November 01, 2023
   - Camera-ready of accepted papers: November 15, 2023
   - Workshop date (1-day event): December 15-18, 2023 (exact date to be
     announced)

_____________________________________________

Contact:

For any questions on the shared task, please contact us at:

[log in to unmask]

_____________________________________________

Shared Task Organizers:

- Abderrahim Ait Azzi, 3DS Outscale (ex Fortia), France
- Sandra Bellato, 3DS Outscale (ex Fortia), France
- Blanca Carbajo Coronado, Universidad Autónoma de Madrid
- Dr Ismail El Maarouf, Imprevicible
- Dr Juyeon Kang, 3DS Outscale (ex Fortia), France
- Prof. Ana Gisbert, Universidad Autónoma de Madrid
- Prof. Antonio Moreno Sandoval, Universidad Autónoma de Madrid
