
LIS-PUB-LIBS Archives

LIS-PUB-LIBS@JISCMAIL.AC.UK


LIS-PUB-LIBS  October 2006

Subject: DIGITAL TEXTS IN EDITABLE FORMAT conference at jadavpur
From: Simon Tanner <[log in to unmask]>
Reply-To: Simon Tanner <[log in to unmask]>
Date: Mon, 9 Oct 2006 09:06:30 +0100
Content-Type: text/plain


>The School of Cultural Texts and Records, Jadavpur University
>
>in collaboration with
>
>The Association for Literary and Linguistic Computing, UK
>
>presents
>
>
>
>DIGITAL TEXTS IN EDITABLE FORMAT
>
>with special reference to Indic languages
>
>CONFERENCE, 7-8 FEBRUARY 2007  |  WORKSHOP, 9-10 FEBRUARY 2007
>
>Directors: Sukanta Chaudhuri, Subha Chakraborty Dasgupta, Samar Bhattacharya,
>
>Anirban Ray Chaudhuri
>
>
>
>Both the conference and the workshop will cover the full range of issues
>relating to the digitising of texts and the creation of digital archives. The
>major objectives will be
>
>   - to address the technical problems of digitising Indic scripts in editable
>     format through Optical Character Recognition
>   - to address the textual and archival aspects of digitising texts and
>     documents in Indic scripts: locating, compiling and editing the resources
>
>However, not all presentations need address texts in Indic languages. General
>papers on digital text technology are welcome, as are reports and analyses of
>representative projects in digital archiving in any language. While a good many
>presentations will be on Bengali texts, material on other Indic languages is
>particularly welcome. All presentations must be in English.
>
>
>
>THE CONFERENCE on 7-8 February 2007 will be for a general audience comprising
>all persons interested in archival, editorial and textual study, or in digital
>technology and literary and linguistic computing. It should be of interest to
>students of literature, history, linguistics, or any other discipline involving
>the study of texts and documents.
>
>THE WORKSHOP on 9-10 February 2007 is intended for a more specialised group of
>participants, with direct experience or strong interests in digital technology
>(especially font generation and OCR), electronic archiving and electronic
>editing. Please inform us separately if you wish to take part in the workshop.
>
>
>
>Most papers will be by invitation. For more information please contact:
>
>
>
>Sukanta Chaudhuri at [log in to unmask]
>
>or Subha Chakraborty Dasgupta at [log in to unmask]
>
>or enquire by post from
>
>Sukanta Chaudhuri, Director, School of Cultural Texts and Records, Jadavpur
>University, Kolkata 700 032, India
>
>________________________________________________________________________________________________ 
>
>PROPOSED WORKSHOP ON
>
>DIGITISING INDIC TEXTS IN EDITABLE FORMAT
>
>TO BE JOINTLY ORGANISED BY
>
>THE ASSOCIATION FOR LITERARY AND LINGUISTIC COMPUTING, UK
>
>AND
>
>JADAVPUR UNIVERSITY, KOLKATA, INDIA
>
>
>
>CONFERENCE: 7-8 February 2007          WORKSHOP: 9-10 February 2007
>
>Event Directors: Sukanta Chaudhuri, Subha Chakraborty Dasgupta, Samar
>
>Bhattacharya, Anirban Ray Chaudhuri
>
>
>
>Aims and Scope:
>
>Both the Conference and the Workshop will address the digitization of texts in
>editable format with special reference to Indic languages, particularly Bengali.
>The Conference will be for a bigger and less specialized group of participants:
>scholars of literature, history and all other disciplines involving textual
>study and archiving, as well as computer scientists and other technologists with
>an interest in literary and linguistic computing. The Workshop will be for
>fewer, technically specialized participants who have actually worked in this
>field. Others may attend as auditors.
>
>
>
>We propose Bengali as the language to focus on, as (a) the language of the
>region where Jadavpur University is located; and (b) a language where a good
>deal of work has already been done in the above respects, ensuring an informed
>and interactive milieu. Today, efficient word-processing programmes exist for
>all major Indian languages. Some work has been done on OCR programmes in
>Devanagari script (used for Hindi, the country's official language) and in
>other Brahmi-based scripts including Bengali. But other advanced functions,
>essential for textual study and editorial processing, have hardly been pursued
>anywhere in India. In Bengali, at least a foundation has been laid which can be
>consolidated in the Workshop.
>
>Of course we shall invite experts in other languages, as most of the issues
>are germane to their work as well. The experts from abroad may not have
>knowledge of Bengali or other Indic scripts. Rather, their contribution will be
>valuable precisely by virtue of drawing on a broader field of research and
>experience.
>
>
>
>The subject of the workshop and conference is of importance from two angles:
>
>   - technical: to solve the special problems of digitizing Indic scripts.
>   - scholarly, archival and bibliographical: to create literary archives and
>     foster an editorial culture.
>
>(a) Technical:
>
>The special problems of many Indic alphabets, including Bengali, are as follows:
>
>   - Only consonants are written in full, with the accompanying vowel sounds
>     indicated by tagged-on vowel markers. Although these vowel sounds
>     phonetically follow the consonant, they are sometimes written before it,
>     and at other times above or below. This makes font creation and screen
>     visualization more difficult. It also makes certain functions of text
>     analysis – e.g., phonetic analysis, or collation of texts with variant
>     spellings – specially problematic, as the visual sequence presented on
>     screen does not match the phonetic sequence followed during keying-in and
>     hence registered in the processing unit.
>
>   - The Bengali alphabet has about 50 letters, without majuscule/minuscule
>     variation. (The 'about' is significant: there is some debate as to what
>     constitutes a letter.) There are also a huge number of conjunct letters
>     (2, 3 or even 4 conjunct consonants plus a vowel), besides a range of
>     vowel tags (in several forms for each vowel, depending on the consonant it
>     is attached to) and some 'half-letters' (consonants without vowels). All
>     this vastly increases the number of glyphs, to a total of 450-500 items in
>     the average 'typecase'. Moreover, the forms of these conjuncts vary from
>     font to font in print. Despite a trend towards simplification, most of
>     these conjunct letters will be with us for a long time to come. And
>     needless to say, they will always have to be processed in the case of
>     extant texts.
>
>   - This makes it a great challenge both
>
>       - to generate these conjuncts in fonts for electronic use: though many
>         Bengali fonts have been generated, they nearly always have a measure
>         of glitches or instability; and
>       - to develop an OCR programme that can 'read' these conjuncts in extant
>         print fonts. Again, much work has been done, but with an accuracy of
>         95%+ only in certain text situations. It is often no more than 85%.
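The mismatch between stored and displayed order, and the way a conjunct is built from several stored characters, can be illustrated with a minimal Python sketch. It assumes the text is encoded in Unicode; the sample syllables and the helper name `describe` are illustrative choices, not part of the proposal.

```python
import unicodedata

# The syllable "ki": consonant KA followed by VOWEL SIGN I.
# Unicode stores the vowel sign after the consonant (phonetic order),
# even though a font draws it to the left of the consonant.
ki = "\u0995\u09BF"

# The conjunct "kta": KA + VIRAMA (hasanta) + TA.
# Three stored characters, normally drawn by the font as one conjunct glyph.
kta = "\u0995\u09CD\u09A4"


def describe(text):
    """Return the Unicode character names of `text` in stored order."""
    return [unicodedata.name(ch) for ch in text]


print(describe(ki))   # ['BENGALI LETTER KA', 'BENGALI VOWEL SIGN I']
print(describe(kta))  # ['BENGALI LETTER KA', 'BENGALI SIGN VIRAMA', 'BENGALI LETTER TA']
```

Any search, collation or phonetic routine working on such text sees the stored sequence, not the shapes on screen, which is why the number of glyphs a font or OCR programme must handle is so much larger than the number of encoded characters.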
>
>The proposed workshop would offer a rare chance for specialists from abroad to
>learn of these problems and the ways Indian experts are tackling them, and in
>turn to suggest new approaches and solutions based on their experience with
>Roman or other alphabets.
>
>The issues to be taken up could be:
>
>   - improving and stabilising Unicode fonts in Bengali and related Indian
>     languages;
>   - improving and extending the OCR programmes developed so far, to create
>     texts suitable for all kinds of textual and phonetic analysis;
>   - further mark-up of these texts to produce fully editable, collatable and
>     searchable versions;
>   - improving an experimental collation programme already developed.
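As a rough illustration of what collation means here, the following Python sketch compares two versions of a text word by word and lists the variant readings. It is not the experimental programme mentioned above: the function name `collate`, the use of the standard `difflib` module and the choice to normalise to NFC are all assumptions made for the example.

```python
import difflib
import unicodedata


def collate(witness_a, witness_b):
    """Return the word-level differences between two versions of a text.

    Each variant is a tuple (operation, words_in_a, words_in_b), where
    operation is 'replace', 'delete' or 'insert'.
    """
    # Normalise both witnesses so that equivalent Unicode spellings of the
    # same character sequence compare as equal.
    a = unicodedata.normalize("NFC", witness_a).split()
    b = unicodedata.normalize("NFC", witness_b).split()

    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, a[i1:i2], b[j1:j2]))
    return variants


# Placeholder witnesses in Roman script; Unicode Bengali text works the same way.
print(collate("the quick brown fox", "the quick red fox"))
# [('replace', ['brown'], ['red'])]
```

A real collation tool for Indic texts would also need rules for variant spellings and conjunct forms, which plain word comparison like this does not attempt.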
>
>All this would help us work towards the ultimate goal of ensuring that texts in
>Indic languages can, one day, be processed in every way considered standard for
>the Roman alphabet: search/concordance, collation, OCR, phonetic analysis etc.
>
>
>
>(b) Scholarly, archival and bibliographical
>
>Bengali has an extensive literature, whose 'modern' phase began in the
>early 19th century. It was the first Asian language to come into extensive
>contact with Western literature and thought. From the 19th century, it produced
>a great range of educational, social, religious and even
>scientific/technological works, as well as the first body of modern creative
>literature in any Indian language. This corpus forms the textual basis of what
>is often called the Bengal Renaissance, reaching its climax in the work of
>Rabindranath Tagore (1861-1941). Despite dramatic changes over the last
>half-century, Bengali literature and culture can still be said to live in the
>aftermath of the Bengal Renaissance.
>
>Bengali was also (except for a brief, soon closed chapter in western
>India) the first Asian language to achieve print. There is a huge body of
>printed texts from the late 18th century onwards. Quantity apart, the earlier
>material is seminal for Indian – indeed, world – printing history. It shows a
>specially interesting amalgam of Western techniques developed over 350-400 years
>with innovations specific to the local script and conditions of production.
>
>Bengal is famous for its vibrant literary culture, with a rich body of
>creative works and their scholarly interpretation. But so far as textual
>scholarship and editorial attention are concerned, this creative and critical
>activity is taking place in a near-vacuum. Hardly a score of Bengali texts are
>available in critical editions as international scholarship understands the
>term. Original 19th-century works are often hard to come by, surviving only in
>one or two copies, often badly preserved and deteriorating in the hot and humid
>climate.
>
>Combining the technical and scholarly imperatives, the Workshop would help
>us work towards the goal of ensuring that texts in Indic languages can be
>processed in every way considered standard for the Roman alphabet –
>search/concordance, collation, OCR, phonetic analysis etc. – and hence made
>accessible for all kinds of editorial and scholarly activity. Thus our two major
>needs would be served:
>
>   - to ensure the sheer physical record of this rich body of works in digital
>     form.
>   - to generate an editorial culture by producing electronic texts in editable
>     format.
>
>
>
>LOCAL PARTICIPANTS
>
>As stated above, the Conference on 7-8 February will attract a more
>general audience. The Workshop may have intensive participation by a core group
>of approx. 15 local members. Another 15-20 persons – a few senior members, but
>chiefly young project and research staff – may attend to absorb the culture of
>electronic texts. These 'auditors' will be welcome to take active part, but they
>are unlikely to do so often. We hope, nonetheless, that they will feel
>encouraged to interact with the experts outside the workshop and in the future.
>In particular, young staff working on a single limited aspect of electronic
>texts will benefit greatly from this broader experience.
>
>Among the established scholars and workers in the field who, we hope, will
>attend the Workshop are the following. This list is neither confirmed nor
>complete.
>
>   a) Professor Kalyan Kumar Datta and Professor Samar Bhattacharya, School
>      of Education Technology, Jadavpur University: members of the 'Vidyasagar'
>      group that developed the first Bengali electronic fonts.
>
>   b) Professor Mita Nasipuri, Dr Anirban Ray Chaudhuri, and other members of
>      the Department of Computer Science and Engineering, Jadavpur University,
>      associated with CMATER, an OCR development centre attached to their
>      Department.
>
>   c) Professor Bidyut Baran Chaudhuri, Indian Statistical Institute,
>      Kolkata, who developed the first viable OCR programme in Bengali.
>
>   d) Professor Ashok Mukhopadhyay, sometime Professor of Printing
>      Engineering, Jadavpur University and CEO of the University Press attached
>      to Visva-Bharati, the university founded by Rabindranath Tagore and till
>      recently custodian of his works.
>
>   e) Professor Gautam Sengupta, Professor of Linguistics, University of
>      Hyderabad: a noted applied linguist with much work on Bengali fonts and
>      electronic texts.
>
>   f) Professor Palash Baran Pal and Professor Somendra Mohan Bhattacharya,
>      Saha Institute of Nuclear Physics, Kolkata: physicists who have also
>      worked extensively on Bengali fonts, word-processing programmes and
>      online text databases.
>
>   g) Members of local software groups – professional, semi-professional and
>      amateur – such as the 'Ankur' group, who are working with Bengali
>      electronic fonts and texts.
>
>   h) Members of the School of Cultural Texts and Records, Jadavpur
>      University: literary and humanistic scholars with expertise in electronic
>      texts, e.g., Professor Subha Chakraborty Dasgupta, Professor Amlan Das
>      Gupta, Dr Moinak Biswas, Dr Amitava Das, Dr Samantak Das, Dr Abhijit
>      Gupta, Dr Rimi B. Chatterjee.
>
>Among younger delegates and 'auditors', we would specially welcome the
>young project staff and ancillary workers attached to the School of Cultural
>Texts and Records, Jadavpur University, and various relevant units of the
>Faculty of Engineering and Technology.
>
>Note on Jadavpur University:
>
>Jadavpur University began as a technological institution. But it is unique among
>Indian universities in that, over the last 20-30 years, it has built up one of
>India's most successful Arts Faculties, including four departments of language
>and literature: Bengali, Comparative Literature, English and Sanskrit. It is
>arguably the most appropriate venue in India for literary and linguistic
>computing.
>
>It has already made many contributions in the field. The first electronic
>Bengali fonts were developed here. The people who developed them are still
>around (chiefly attached to the School of Education Technology), and will take
>part in the Workshop. Notable work on Bengali and Devanagari OCR is currently
>going on in the CMATER Centre of the Department of Computer Science and
>Engineering: they have developed 'Anulikhan', the first OCR system in any Indic
>script with editable format and original layout retention.
>
>The School of Cultural Texts and Records (comprising technologists as well
>as literary scholars and historians) conducts a range of textual projects using
>electronic resources. These include experimental collation software – the
>first in any Indic language. The School already has a major digital archive of
>Bengali literary manuscripts in non-editable form, various bibliographical
>databases including the first Short-Title Catalogue in any Indian language, and
>a large music archive in digitized form. There is interaction between members of
>the Arts Faculty and the Faculty of Engineering and Technology in matters of
>textual computing. We hope the Workshop will enhance this.
>
>
>
>ALLC PARTICIPANTS: To be provided by ALLC. The team will be led by Prof. Laszlo
>Hunyadi, Chairman, Department of General and Applied Linguistics and Director,
>Centre for Digital Humanities, University of Debrecen, Debrecen, Hungary.
>
>
>
>PROGRAMME SCHEDULE (to be finalised after consultation with the ALLC):
>
>The programme of papers for the Conference will be finalized later. For
>the Workshop, we propose two sessions each day, divided as follows:
>
>   1.  Improvement of Unicode font generation in Bengali, for manual keying-in
>       of texts in editable format.
>
>   2.  Degraded document processing and core OCR engine.
>
>   3.  Advanced OCR system with enhanced graphical user interface for document
>       digitization in editable format retaining original layout.
>
>   4.  Further development of the collation programme for Indic scripts already
>       available with the School of Cultural Texts and Records.
>
>
>
>
>
>WORKSHOP OUTCOMES AND DISSEMINATION:
>
>As remarked above, despite the rich textual heritage in Bengali, there is
>relatively little textual and editorial awareness. Today, electronic resources
>can enable us to achieve this awareness, and thus leap-frog into an advanced
>editorial culture, in a relatively short span of time. Once we have a 'bank' of
>digital texts in editable format, we can proceed to electronic editing and other
>advanced processing of texts.
>
>This calls for the close interaction of textual and documentary scholars
>in Indic languages with experts in electronic texts and literary computing. As
>yet, there is little such contact. We hope the Workshop will help to bridge the
>gap. It will bring technological and moral support to the people actually
>working on electronic texts in Bengali and other Indic languages, and help them
>find their place in an international context. At the same time, it will foster a
>more informed level of operational skill among general textual and literary
>scholars. Delegates from both these categories will form the first group of
>beneficiaries.
>
>There will be a 'spread effect' extending to other Indian languages, where
>the problems are often the same. Experts in electronic texts in these language
>areas would constitute a second tier of beneficiaries.
>
>There would also be a 'spread effect' of another sort: raising textual
>awareness, and exercising that awareness through electronic texts, among all
>students and archivists of Indic languages, literature and history – in fact,
>any discipline requiring textual documentation. This raising of consciousness
>would confer an unquantifiable benefit at a third level.
>
>

++++++++++++++++++++++++++++++++++++++++
Simon Tanner
Director,  King's Digital Consultancy Services
King's College London
Kay House, 7 Arundel Street, London WC2R 3DX
tel: +44 (0)20 7848 1678 or +44 (0)7887 691716
email: [log in to unmask]	
www.digitalconsultancy.net

Connecting Culture and Commerce Conference: January 2007, National Gallery
http://www.digitalconsultancy.net/mcg2007/
