Hi Max
Thanks for your thoughts, all of which are very pertinent to our aims.
- The issue of what to do once the data points start accumulating is
an important one. This was all done in a relatively short project (1
year) and on a fairly small budget. We're looking to get follow-on
funding in order to approach precisely such issues and we will
certainly be taking the very impressive work you showed us in Beijing
on board.
- The visualisation app is built directly on top of an API which is,
I believe, open but currently unable to support much load. This in
turn runs from a Neo4j graph database into which we have imported
sets of RDF triples provided by the Pelagios consortium partners.
Rainer can give you a more complete picture but there are two
important consequences:
1. Our model is not to directly aggregate all the data in some
centralised repository but to let each partner host their own data in
whatever way is appropriate to them as long as it is RDF (RDF/XML,
SPARQL, RDFa, whatever) and follows a very basic OAC annotation
pattern. The raw data itself is (or will be) directly available from
each of the partners under an open license. For the time being you'll
probably need to go to the blog site to find out where to get it, but
in a future iteration we're planning an indexing service.
2. We don't centralise the data precisely because we believe that good
people such as yourself are likely to do far more imaginative things
with it than we are ;-) Likewise this visualization is not the be-all
and end-all of what we can achieve. What we ultimately want to see are
use-cases where different combinations of data are brought together
for targeted use. This will go some way (although obviously not the
whole way) to helping with the first problem. Likewise, each of the
individual partners may want to develop specific browsers or widgets
that give direct access to other resource sets within the consortium
(or all of them).
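To make the annotation pattern in point 1 concrete, here is a minimal sketch of what an OAC-style geoannotation could look like, with triples modelled as plain tuples. The partner URIs (`example.org/...`) are invented for illustration; the OAC namespace and the Pleiades URI for Athens are my best understanding, not taken from the Pelagios documentation, so treat them as assumptions.

```python
# Sketch of the basic OAC annotation pattern: each partner publishes
# annotations whose body is a Pleiades place URI and whose target is one
# of the partner's own resources. All example.org URIs are made up.

OAC = "http://www.openannotation.org/ns/"

def make_geoannotation(annotation_uri, target_uri, pleiades_uri):
    """Return the triples of a minimal OAC annotation as (s, p, o) tuples."""
    return [
        (annotation_uri, "rdf:type", OAC + "Annotation"),
        (annotation_uri, OAC + "hasTarget", target_uri),  # the partner's resource
        (annotation_uri, OAC + "hasBody", pleiades_uri),  # the normalized place
    ]

def places_referenced(triples):
    """Collect the set of Pleiades URIs used as annotation bodies."""
    return {o for (s, p, o) in triples if p == OAC + "hasBody"}

# Two hypothetical partners annotating their own, separately hosted resources:
partner_a = make_geoannotation(
    "http://example.org/a/ann1",
    "http://example.org/a/text#p7",
    "http://pleiades.stoa.org/places/579885",  # Athens
)
partner_b = make_geoannotation(
    "http://example.org/b/ann1",
    "http://example.org/b/object/42",
    "http://pleiades.stoa.org/places/579885",  # Athens again
)

# Because both bodies use the same Pleiades URI, the two datasets can be
# joined on place without any centralised repository:
merged = partner_a + partner_b
print(places_referenced(merged))  # one shared place URI
```

The point of the sketch is the design choice in point 2: since every partner's annotations share the same URI space for places, aggregation is a join anyone can perform, not a service only we can run.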
- The visualization is currently working over a subset so the numbers
are meaningless (arguably they'll be meaningless anyway, but you know
what I mean). It's a) only a handful of the Pelagios partners, and b)
only a subset of their data for testing purposes. Full datasets should
be available by the completion of the project (end of October).
Obviously I can't speak for ARACHNE personally but I suspect that
numbers of references will be much higher once all the data is
available.
- As regards normalization - yes, all geoannotations are normalized.
All references are to a Pleiades URI, so they are semantically
disambiguated regardless of the toponym used in the local resource set
(there's only one dataset which is an exception to this rule, but we
hope that will also come with time). Currently search needs to use the
main Pleiades toponym ('Aegyptus', for example), but we're planning to
use Pleiades+ (i.e. Pleiades merged with GeoNames) as a data
dictionary so that we should be able to handle multiple and
multilingual toponyms at that end as well.
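The Pleiades+ idea can be sketched in a few lines: a data dictionary that maps variant and multilingual toponyms onto a single Pleiades URI, so any of them resolves to the same disambiguated place. The entries here are hand-seeded for illustration (in Pleiades+ they would come from merging Pleiades with GeoNames), and the Athens URI is my assumption of the relevant Pleiades identifier.

```python
# Sketch of a toponym 'data dictionary': several name variants, including
# multilingual ones, all resolve to one Pleiades URI.

ATHENS = "http://pleiades.stoa.org/places/579885"  # assumed Pleiades URI

TOPONYM_INDEX = {
    "athens": ATHENS,    # English
    "athenai": ATHENS,   # transliterated Greek
    "athen": ATHENS,     # German
    "athenae": ATHENS,   # Latin
}

def resolve_toponym(name):
    """Look up a toponym case-insensitively; returns None if unknown."""
    return TOPONYM_INDEX.get(name.strip().lower())

print(resolve_toponym("Athenai") == resolve_toponym("Athens"))  # True
```

With such an index in front of the search box, a query for 'Athen' or 'Athenai' lands on the same place record as 'Athens', which is exactly the multiple-toponym handling described above.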
So, just to re-emphasise, we really see this as one more step along
the way. Many of our partners (and others) have been laying the
groundwork for this kind of thing for years. Equally, the big picture is to
increasingly start thinking of all these (and other) amazing resources
as an interconnected ecosystem, not walled gardens. There's no need to
stop at places - we need to start moving on to other areas of overlap
as well. At Pelagios we're also moving away from the 'consortium' word
(which implies we're a closed gang) towards more of a 'collective'. If
anyone has data they think could be usefully contextualised by the
data that's coming through now we want to give them tools and
documentation to do it - no 'membership' required. That way they gain
from what's already there and we all gain from them.
Hope that answers your questions. Rainer may wish to add some
technical details but feel free to ping us back if you want to know
more.
Best
L.
On Fri, Aug 5, 2011 at 3:53 PM, Maximilian Schich
<[log in to unmask]> wrote:
> Hi Leif and all,
>
> Nice extension to browse the database!
>
> In terms of the big picture - say, where you have to display more than 500
> locations and sources - though, I am pretty sure you will run into the same
> scaling issue Michele and I have resolved in part with our
> CAA-Beijing/KDD-MLG paper:
> http://www.cs.purdue.edu/mlg2011/papers/paper_22.pdf
> It is easy to translate the issue, as your "data(sub)sets plus locations"
> basically correspond 1:1 to our "publications and classification criteria".
>
> In this light - meaning to enable large scale overview visualizations from
> different perspectives - it would be great if you could harness all our
> curiosity and provide an API or, even better, periodic data dumps - plus a way
> to write back via the API, as is possible in Freebase (also owned by Google ;)
> ).
>
> Minor issue: According to your video, the occurrence of "Athens" in
> "Arachne" is only "88". Is it really that low?
> "Athens" or "Athenai" should still be the second most frequent location in
> literature on Classical Archeology - and we are talking "fat tail" here,
> meaning if Athens has only 88, most other locations (except for Rome,
> Pompeji, Thebai and a few others) will occur maximally once. Of course this
> may be different for the set of ancient sources you have in GAP, but I am
> nevertheless pretty sure "Arachne" has more than 88 objects from Athens. In
> other words, let me ask the question: Is the low occurrence of "Athens" due
> to the fact that "Athenai", "Athens", and "Athen" are not normalized yet, or
> due to the fact that you use a subset of data for test purpose? Can you do
> the same video with "Athenai", "Athen" or say "Roma", or will that provoke
> the scaling problem, with the app having a hard time dynamically displaying
> thousands of nodes and links?
>
> This is not a devastating critique. In fact I think your work is great and
> pretty fundamental.
> But if I am right, there is a very clear and hard mission to pursue:
> Namely to understand and represent the structure of our data as we put it in
> a huge pile.
> Again I guess, more is different.
>
> Best, Max
>
> Dr. Maximilian Schich
> http://www.schich.info
>
>
> Am 05.08.11 05:23, schrieb Leif Isaksen:
>>
>> Hi all
>>
>> Thanks to some sterling work by Rainer Simon at AIT, we're just
>> starting to see the first fruits of the Pelagios consortium's Linked
>> Geoannotations and, well, to be honest I just wanted to show them off
>> a bit. :-)
>>
>>
>> http://pelagios-project.blogspot.com/2011/08/pelagios-graph-explorer-first-look.html
>>
>> This is still very early days (with a very limited test dataset) and
>> all feedback is extremely welcome.
>>
>> Best
>>
>> Leif
>>
>> PS We're still looking for a name for the tool. We'd be delighted to
>> hear suggestions.
>> PPS the Pelagios consortium keeps growing so if you have an (open)
>> dataset that you think would be suitable, please get in touch.
>>
>