DefiningImageAccess/Tool/Eprints
| DefiningImageAccess/Tool/Eprints | |
|---|---|
| Link | http://www.eprints.org/ |
| Status | Active |
| JISCTool | False |
| Focus | Metadata Repository |
EPrints
OAI repository software from Southampton University.
Implemented in Perl; runs behind the Apache server on Linux, and possibly on other platforms. Relatively easy to set up and use.
Adaptation to special requirements is accomplished mainly by adding or modifying Perl code modules.
Installation
This installation was tested on Scientific Linux release 4.4 and follows the installation guide for Red Hat Enterprise Linux 4 on the EPrints wiki.
The updates can be run using the "up2date" or "yum install" commands.
Install Prerequisites
OS updates
up2date --nox -u httpd wget gzip xpdf lynx unzip
up2date --nox -u mod_perl perl-DBI perl-DBD-MySQL perl-XML-Parser perl-XML-LibXML
- The update to lynx failed because the package is signed with an unknown GPG key. Come back to updating lynx later if it is needed.
up2date --nox -u mysql-server
- The update to MySQL was not done, in case it had negative effects on the existing MySQL data repository. The existing MySQL is Ver 14.7 Distrib 4.1.20; the recommended version is MySQL 5.0.
If the LaTeX functionality is needed, the TeX system and ImageMagick can be installed.
up2date --nox -u tetex-latex ImageMagick
Update the glib-devel and libxml2-devel packages for GDOME
up2date --nox -u glib-devel libxml2-devel-2.6.16-6
Note that on the new Linux system running on andros, we need to update glib2-devel and libxml2-devel-2.6.26-1 instead:
yum install glib2-devel
wget ftp://xmlsoft.org/libxml2/libxml2-devel-2.6.26-1.i386.rpm
rpm -Uvh libxml2-devel-2.6.26-1.i386.rpm
Install Mod_Perl
This step is very important because most of the EPrints code is implemented in Perl. The module had to be updated in order to run EPrints properly. (This step is not necessary on CentOS Linux, which has an updated mod_perl module by default.)
- Uninstall the old version of mod_perl
rpm -e mod_perl-1.99_16-4
- Download the source from http://perl.apache.org/dist/mod_perl-2.0-current.tar.gz
- To build mod_perl, you must use the same compiler that Perl was built with. To find out which compiler that was, run the following command, which prints the configuration of the currently installed Perl, and look at the Compiler: section.
perl -V
- Dynamic mod_perl
cd modperl-2.x.x
perl Makefile.PL MP_APXS=/usr/sbin/apxs MP_APR_CONFIG=/usr/bin/apr-config
- Add the following line to /etc/httpd/conf/httpd.conf
LoadModule perl_module modules/mod_perl.so
Note: make sure that this points to where mod_perl.so is actually located. Run
/usr/sbin/apxs -q LIBEXECDIR
to check where existing modules are installed (/usr/lib/httpd/modules in our case) and adjust according to your installation.
- Compile mod_perl
make
- Test mod_perl (not as root)
When mod_perl has been built, it's very important to test that everything works on your machine:
make test
If something goes wrong in the test phase and you want to find out how to run individual tests and pass various options to the test suite, see the corresponding sections of the bug reporting guidelines or the Apache::Test Framework tutorial.
- Install it (as root)
make install
Other Perl Modules
Use CPAN to update some Perl modules. CPAN needs to be set up first if this is the first time it has been launched.
cpan # (or: perl -MCPAN -e shell?)
install Data::ShowTable
install MIME::Base64
install Unicode::String
install Term::ReadKey
install Readonly ******NEW*******
install MIME::Lite
install XML::LibXML
install CGI (This is an update as I found RHEL dist has a bug for something else)
- We originally noted: "the second line didn't work for me. So I have run each install manually". (I think this was because cpan and perl -MCPAN -e shell are alternative ways to start the CPAN command shell, so I've changed one of them to a comment in the above.)
- The standard Perl distribution in RHEL 4 (SL4) comes with a version of CGI.pm that is incompatible with mod_perl version 2 for Apache 2. The install CGI command in the sequence above updates the installed version of CGI. Unfortunately, automatic updates of the Perl package (via Yum) can cause the installed version of CGI to revert, leading to errors of the form "Can't locate Apache/Response.pm in @INC" when starting the httpd service. To correct this, re-install CGI thus:
cpan install CGI
- and restart httpd.
- Make sure Apache and MySQL start when you reboot:
/sbin/chkconfig mysqld on
/sbin/chkconfig httpd on
Installing GDOME and the perl GDOME interface
GDOME is the Gnome DOM Engine.
cd /root/
wget http://gdome2.cs.unibo.it/rpm/gdome2-0.8.1-1.i386.rpm
wget http://gdome2.cs.unibo.it/rpm/gdome2-devel-0.8.1-1.i386.rpm
rpm -Uvh gdome2-0.8.1-1.i386.rpm gdome2-devel-0.8.1-1.i386.rpm
Fix the bug in gdome-config
To see if the bug is a problem run:
gdome-config --libs
If you get something like:
/usr/bin/gdome-config: line 86: --libs: command not found
/usr/bin/gdome-config: line 87: --cflags: command not found
then you need to fix the bug. To do this, as root edit "/usr/bin/gdome-config"
vi /usr/bin/gdome-config
Around line 88 find these two lines:
the_libs="$the_libs -L${exec_prefix}/lib -lgdome ` --libs` `xml2-config --libs`"
the_flags="$the_flags -I${prefix}/include -I${prefix}/include/libgdome ` --cflags` `xml2-config --cflags`"
And change them to this:
the_libs="$the_libs -L${exec_prefix}/lib -lgdome `/usr/bin/glib-config --libs` `xml2-config --libs`"
the_flags="$the_flags -I${prefix}/include -I${prefix}/include/libgdome `/usr/bin/glib-config --cflags` `xml2-config --cflags`"
Install XML::GDOME
Do this from source, not via CPAN as one of the tests is broken.
cd /usr/local/build
wget http://cpan.uwinnipeg.ca/cpan/authors/id/T/TJ/TJMATHER/XML-GDOME-0.86.tar.gz
tar xzvf XML-GDOME-0.86.tar.gz
cd XML-GDOME-0.86
perl Makefile.PL
make
make install
Install Eprints Version 3
Download the latest version from the EPrints web site to /usr/local/build.
tar xzvf eprints-3.0-rc-1.tar.gz
cd eprints-3.0-rc-1
./configure --with-smtp-server=smtp.ox.ac.uk --prefix=/var/lib/eprints3 --with-user=apache --with-group=apache
./install.pl
Note that the user and group given to configure are those set in the Apache configuration, /etc/httpd/conf/httpd.conf:
User apache
Group apache
Getting started
Basic setting up
Configure Apache
Edit the apache config:
vi /etc/httpd/conf/httpd.conf
add this to the end:
Include /var/lib/eprints3/cfg/apache.conf
This EPrints directory holds the Apache configuration for all the repositories. If you want repository-specific configuration, you need to edit the conf files in the directory "/var/lib/eprints3/archives/ARCHIVEID/cfg/". Normally, however, you should not edit these files, because they were generated automatically during the installation of EPrints.
An alternative way to handle the Apache configuration, which I (GK) prefer, is to place the configuration file in /etc/httpd/conf.d/eprints.conf. There is a wildcard include in the standard httpd.conf file that will pick up this file and include it in the server configuration without changing anything in the standard Apache installation.
Configure SELinux
SELinux security is quite difficult to get just right, so for the time being we are running with it disabled (or running in "audit" mode rather than "enforce", which means that SELinux access rule violations are logged but not enforced.)
Check if SELinux is either Permissive or Disabled:
root> /usr/bin/getenforce
In our case it returned Permissive. If getenforce returns Enforcing, switch SELinux to permissive mode:
root> /usr/bin/setenforce 0
Set up the indexer as a service
You don't need to do this until you have built the repository. To make the indexer into a service that starts and stops on reboot, as httpd and mysqld do, run the following (as root):
ln -s /var/lib/eprints3/bin/epindexer /etc/init.d/epindexer
/sbin/chkconfig --add epindexer
/sbin/chkconfig epindexer on
Start the indexer:
root> /sbin/service epindexer start
Diagnose indexer problems
Check the status of the indexer service:
service epindexer status
Find the pid of the indexer service:
ps ax | grep indexer
or
less /var/lib/eprints3/var/indexer.pid
If the indexer is stuck, one effective way to recover is to kill the indexer process and then remove /var/lib/eprints3/var/indexer.pid.
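The check for a leftover pid file can be scripted. The following is a minimal sketch in Python, not part of EPrints: the function names are hypothetical, and the pid file path is simply the one used on this page.

```python
import os

def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def process_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def indexer_state(pid_file="/var/lib/eprints3/var/indexer.pid"):
    """Classify the indexer as 'running', 'stale pid file' or 'no pid file'."""
    pid = read_pid(pid_file)
    if pid is None:
        return "no pid file"
    return "running" if process_alive(pid) else "stale pid file"
```

A result of "stale pid file" corresponds to the case where the pid file should be removed before the indexer is started again.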
Configuring Eprints
All commands should be run as the Eprints user ('apache' in our case)
su -s /bin/sh apache
Start the configuration script :
cd /var/lib/eprints3
bin/epadmin create
The script runs through a number of configuration options; change the settings to suit your site configuration. The chosen settings are kept in a settings file on the local disk.
The script lets us set the repository ID, the URI of the repository, etc.
Branding
Edit the following files:
- Update the default.xml in the $ARCHIVE_HOME/cfg/lang/en/template directory
- Update the *.xpage in the $ARCHIVE_HOME/cfg/lang/en/static directory
- Update the zzz_local.css in the $ARCHIVE_HOME/static/style/auto/ directory
- Update images in the $ARCHIVE_HOME/static/images/ directory
Import Subjects
The subject hierarchy provides a tree of subject codes that can be used to "tag" EPrints records. For special purpose repository content, special subject trees can be provided to reflect the domain of interest, and to help researchers find relevant content.
See also: http://wiki.eprints.org/w/EPrints_3_Organisation_Hierarchy
1. Edit the following subject file to match your subjects :
vi /var/lib/eprints3/archives/yourarchivename/cfg/subjects
2. Subjects are expressed in the format of:
KEY:TITLE:PARENTS_KEY(S):CANADD?
Note that the key of the root element of a subject tree has to be named subjects in order to be picked up by EPrints:
subjects:TITLE:PARENTS_KEY(S):CANADD?
We can create several subject trees under the root of subjects, e.g. the Animal Behaviour Ontology, the Animal Welfare Ontology, etc.
3. After updating the subjects file, run the following command as the apache user:
/var/lib/eprints3/bin/import_subjects yourarchivename
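When a subject tree is large, the lines of the subjects file can be generated and checked by a script. The sketch below is an illustration in Python under stated assumptions: multiple parent keys are joined with commas, CANADD? is written as 1 or 0, and the example keys (abo, abo_feeding) are hypothetical. The root line itself is not generated, because its parent value is not given on this page.

```python
def subject_line(key, title, parents, can_add):
    """Format one subject as KEY:TITLE:PARENTS_KEY(S):CANADD?."""
    for field in (key, title):
        if ":" in field:
            raise ValueError("':' is the field separator and cannot be used in %r" % field)
    # Assumption: several parents are comma-separated, the flag is 1 or 0.
    return ":".join([key, title, ",".join(parents), "1" if can_add else "0"])

def parse_subject_line(line):
    """Split one subjects-file line back into its four fields."""
    key, title, parents, can_add = line.strip().split(":")
    return {"key": key, "title": title,
            "parents": parents.split(","), "can_add": can_add == "1"}

def undefined_parents(lines, root_key="subjects"):
    """Return (key, parent) pairs whose parent is neither the root nor a defined key."""
    subjects = [parse_subject_line(line) for line in lines]
    known = {root_key} | {s["key"] for s in subjects}
    return [(s["key"], p) for s in subjects for p in s["parents"] if p not in known]

lines = [
    subject_line("abo", "Animal Behaviour Ontology", ["subjects"], False),
    subject_line("abo_feeding", "Feeding", ["abo"], True),
]
```

Running undefined_parents(lines) before import_subjects gives an early warning about subjects that would not be attached to the tree.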
Running a Live Archive
This step schedules the tasks that should run automatically and periodically, such as backups and updates of the repository's views.
The best way to do this is to use "cron", the UNIX job scheduler. To set up cron, run (as the user "apache"):
crontab -e
Note: See also /etc/cron.d, which allows a separate config file for each application's schedule; note that the file entry format there is slightly different.
Setting the run of generate_views
In the editor, add the line:
23 0 * * 7 /var/lib/eprints/bin/generate_views coco
This runs at 00:23 every Sunday (cron accepts 7 as well as 0 for Sunday). If you have more than one archive, don't make them all start rebuilding at the same time; stagger them. Otherwise everything will slow down whenever the jobs compete to run several intensive scripts at once.
See the crontab man page for more information on using cron.
man 5 crontab
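To make the field order of a crontab entry and the staggering advice concrete, here is a small Python sketch. It is only an illustration: the helper names, the second archive ID ("flyted" is used purely as an example) and the 20-minute gap are assumptions, and the command path is copied from the entry above.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_cron_entry(entry):
    """Split a crontab line into its five time fields and the command."""
    parts = entry.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five time fields followed by a command")
    return dict(zip(FIELDS, parts[:5])), parts[5]

def staggered_entries(archive_ids,
                      command="/var/lib/eprints/bin/generate_views",
                      start_minute=23, gap=20):
    """Build one weekly (Sunday) entry per archive, each starting `gap` minutes later."""
    entries = []
    for i, archive in enumerate(archive_ids):
        hour, minute = divmod(start_minute + i * gap, 60)
        if hour > 23:
            raise ValueError("too many archives to fit into one day")
        entries.append("%d %d * * 7 %s %s" % (minute, hour, command, archive))
    return entries

schedule, cmd = split_cron_entry("23 0 * * 7 /var/lib/eprints/bin/generate_views coco")
```

For two archives, staggered_entries(["coco", "flyted"]) yields entries starting at 00:23 and 00:43, which can be pasted into the crontab.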
Setting up Alerts
Alerts provide a way in which users of your system can receive regular updates, via email, when new items are added which match a search they specified.
To automate sending out these alerts you must add some entries in the crontab (as for views). You need one set of these per archive.
For example:
# 00:15 every morning
15 0 * * * /var/lib/eprints/bin/send_alerts coco daily
# 00:30 every sunday morning
30 0 * * 0 /var/lib/eprints/bin/send_alerts coco weekly
# 00:45 every first of the month
45 0 1 * * /var/lib/eprints/bin/send_alerts coco monthly
Setting up OAI
The settings for OAI are held in the oai.pl file, in the eprints3/archives/<archive id>/cfg/cfg.d/ directory. At the command prompt, back up the file and then open it:
cd /var/lib/eprints3/archives/coco/cfg/cfg.d/
cp oai.pl oai.pl.backup
vi oai.pl
The following has been changed:
- The archive ID. This needs to be unique, so check that it doesn't already exist at http://www.openarchives.org/.
Find the following line in oai.pl:
$oai->{v2}->{archive_id} = "generic.eprints.org";
and change it into:
$oai->{v2}->{archive_id} = "chios.zoo.ox.ac.uk";
- Content Description. What does your repository contain? Write a description, then find the following lines and replace the placeholder text with it:
$oai->{content}->{"text"} = latin1( <<END );
OAI Site description has not been configured.
END
Next you need to define a number of policies that state how your repository may be used. It may be helpful to visit http://www.opendoar.org/tools/en/policies.php, which has a step-by-step process for creating these policies. It will even output EPrints 3 configuration code, which you can then copy and paste into the oai.pl file. These policies are:
* Metadata Policy
* Data Policy
* Submission Policy
These are updated in exactly the same way as the Content Description section. Just look for the following lines:
* $oai->{metadata_policy}->{"text"} = latin1( <<END );
* $oai->{data_policy}->{"text"} = latin1( <<END );
* $oai->{submission_policy}->{"text"} = latin1( <<END );
- Register the repository at http://www.openarchives.org/Register/ValidateSite using the endpoint:
http://chios.zoo.ox.ac.uk/cgi/oai2 (This has not been done yet; it will wait until the repository is ready to go public.)
Note: When we go public, the name will be www.fly-ted.org
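Before registering, it can help to request the endpoint by hand and check the responses. The Python sketch below only builds the request URLs; the helper name is hypothetical, while the verbs (Identify, ListMetadataFormats, ListRecords) and the metadataPrefix argument come from the OAI-PMH protocol and the endpoint is the one given above.

```python
from urllib.parse import urlencode

def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)

BASE = "http://chios.zoo.ox.ac.uk/cgi/oai2"

identify_url = oai_request_url(BASE, "Identify")
formats_url = oai_request_url(BASE, "ListMetadataFormats")
records_url = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc")
```

Opening identify_url in a browser should return an XML description of the repository that reflects the archive ID and content description configured in oai.pl.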
Domain metadata schema
This section is based on the "HowTo Create Export Plugin" page in the EPrints 3 manual. An export plugin enables us to export data in our own format from the repository. To list the plugins that EPrints provides by default:
ls /var/lib/eprints3/perl_lib/EPrints/Plugin/Export
Then take the following steps:
1. Create the plugin file *.pm in the directory /var/lib/eprints3/perl_lib/EPrints/Plugin/Export
2. Modify the repository-specific configuration file, e.g. the file located at "/var/lib/eprints3/archives/coco/cfg/cfg.d/oai.pl".
- Find the lines:
# The output plugins must be loaded for the archive and have
# the methods xml_dataobj and properties for xmlns and schemaLocation.
#
# The keys of this hash are the OAI metadataPrefix to use, and the values
# are the ID of the output plugin to use for that prefix.
$oai->{v2}->{output_plugins} = {
"oai_dc" => "OAI_DC",
"didl" => "DIDL",
"uketd_dc" =>"OAI_UKETD_DC",
"context_object" => "ContextObject",
"mets" => "METS" };
- We tried to add a new domain-specific metadata schema:
$oai->{v2}->{output_plugins} = {
"oai_dc" => "OAI_DC",
"didl" => "DIDL",
"anim_bh" => "animbh",
"uketd_dc" =>"OAI_UKETD_DC",
"context_object" => "ContextObject",
"mets" => "METS" };
3. restart apache server in order to make the modification take effect
/sbin/service httpd restart
Note that creating the plugin .pm file requires knowledge of the EPrints APIs. Useful information can be found in the following places:
- /var/lib/eprints3/perl_lib/EPrints, which contains many of the Perl modules used by EPrints; these show what methods can be used to manipulate an EPrint object, a Document object, etc.
- the documentation on the EPrints wiki: http://wiki.eprints.org/w/Data_Object
Todo
- provide a schematic view of the relationship between these objects
Notes for Developers
Obtain documentation for Perl scripts in the EPrints software
perldoc *.pl
Useful resources for developers
http://www.eprints.org/tech.php/index-80
Batch uploading images and their metadata files
- create a Perl script to access the $session, $repository, and $datasets variables.
- see http://www.eprints.org/tech.php/6942.html
- see also: http://imageweb.zoo.ox.ac.uk/wiki/index.php/DefiningImageAccess/SoftwareEvaluation#batch_uploading_images_and_their_metadata_files
Increase repository size
Edit the file document_upload.pl in the archives/cfg/cfg.d directory
Extend name sets
This can be used to extend the types of documents (the MIME types) that can be deposited in the EPrints repository and, more generally, the properties that can be associated with an EPrint object or a user. It can also be used to set up the security settings for a repository. We use the example of adding the MIME type application/xml to the configuration to explain the steps needed:
- edit the files in /var/lib/eprints/archives/ARCHIVEID/cfg/namedsets, according to the goal of the task.
- add a phrase to the phrase file located in /var/lib/eprints/archives/ARCHIVEID/cfg/lang/en/phrases/. In our example, we need to edit the document_formats.xml file
- reload the repository
bin/epadmin reload ARCHIVEID
Eprints Directory Structure notes
This is based on the documentation from Eprints wiki site (http://wiki.eprints.org/w/EPrints_Directory_Structure)
- archives/ - Contains configuration and data for each repository. This is where most of the configuration was tweaked in order to upload the Drosophila images.
  - ARCHIVEID
    - cfg/ - The repository configuration files.
      - apache.conf - Repository-specific Apache config options which appear OUTSIDE the virtual host block.
      - apachevhost.conf - Repository-specific Apache config options which appear INSIDE the virtual host block. NB: this is in addition to all the directives created automatically.
      - autocomplete/ - Location of autocomplete data files.
      - cfg.d/ - General configuration files. The core place for starting customisation.
      - citations/ - Citation files (these describe how to show objects in search results and so forth).
      - lang/ - Language-specific files for this repository (phrases, static pages and images, site template). Edit these if you want to extend the phrases recognised by EPrints.
      - namedsets/ - Files which contain lists of values for named set fields. Edit these if you want to extend the document or eprint types, the security settings for a document, the types of users, etc.
      - static/ - Source for pages and images in the repository which are not language dependent. Edit these if you want to brand the look of the EPrints web interface, e.g. logo or layout.
      - subjects - Text file containing the initial subject tree that will be imported with eprints3/bin/import_subjects
      - workflows/ - Workflow files describing the pages which edit EPrint and User records.
    - documents/ - Uploaded documents. Also contains per-record files such as full-text caches, thumbnails and revision history. No need to edit the files in this directory by hand.
    - html/ - The website for this repository. Do not edit by hand.
    - var/ - Files generated by EPrints specific to this repository. Do not edit by hand.
- bin/ - Contains command line tools. The following are frequently used:
  - epadmin - Perform admin tasks including creating repositories.
  - generate_abstracts - Update all the Metadata-Summary pages. (The page for a single EPrint which links to the documents)
  - generate_static - Update all the static web pages.
  - generate_views - Update all the /view/ pages.
  - import_subjects - Import the subject tree from a text file or XML file.
  - lift_embargos - This script should be run once a day. It removes security on documents whose embargo date has passed.
  - the others are described on the page http://wiki.eprints.org/w/EPrints_Directory_Structure/eprints3/bin
- perl_lib/ - Contains the Perl libraries used by EPrints. The .pm files in this directory are a useful reference if you want to do some coding or write an extension to EPrints. Our batch upload script should be placed in this directory in order to use the EPrints.pm module.
- cfg/ - Contains site-wide configuration files.
- cgi/ - Contains CGI files (dynamic web pages)
- lib/ - Contains the read-only data used by EPrints (do not edit!)
- testdata/ - Contains a tool + data to import example data into your repository.
- var/ - Used to store files output by EPrints including logs and process ID files.
Moving the data files to another drive
EPrints keeps its raw data files for all repositories (i.e. other than the database used for the metadata index), including images, in a directory whose default location is something like `/var/lib/eprints3/archives/`.
To store these files on another drive, try something like this:
- Shut down httpd, hence eprints
- Mount the new drive at a suitable mountpoint, say, /eprints
- Copy files to the new drive: `cp -ax /var/lib/eprints3/archives /eprints/`
- Rename the original directory: `mv /var/lib/eprints3/archives archives.yyyymmdd.old`
- Create a symlink to the new location: `ln -s /eprints/archives /var/lib/eprints3/archives`
- Restart httpd
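The copy, rename and symlink steps above can also be expressed as a short script. This is a sketch in Python under stated assumptions: the function name is hypothetical, httpd is assumed to be stopped and the new drive already mounted, and unlike cp -ax, shutil.copytree does not preserve file ownership, so ownership would need to be restored (for example with chown) afterwards.

```python
import os
import shutil

def relocate_archives(archives_dir, new_root, stamp):
    """Copy archives_dir under new_root, keep the original as a dated
    backup, and leave a symlink at the original location."""
    target = os.path.join(new_root, os.path.basename(archives_dir))
    # Copy the tree, keeping symlinks as symlinks (ownership is NOT preserved).
    shutil.copytree(archives_dir, target, symlinks=True)
    # Rename the original directory, e.g. archives.20080101.old
    backup = "%s.%s.old" % (archives_dir, stamp)
    os.rename(archives_dir, backup)
    # Point the old path at the new location.
    os.symlink(target, archives_dir)
    return target, backup
```

For the layout on this page the call would be relocate_archives("/var/lib/eprints3/archives", "/eprints", "yyyymmdd"), with the date stamp filled in.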
Adapting Eprints for the FlyTed project
FlyTed is intended to be a publicly accessible database of Drosophila in situ images of gene expression in spermatogenesis, which we eventually aim to cross-link with other public Drosophila databases (e.g. Flybase, BDGP, FlyMine).
A related project, FlyData, will be looking at using similar data as part of a laboratory data management and decision support system for Drosophila post-genomic researchers.
Questions
Scenario: we wish to import several thousand microscopy images and associated metadata given the following: (a) images as TIFF files (b) a number of spreadsheets with one row per image, with columns containing the image file name, metadata field values and free text description.
Our plan is to convert the metadata to a suitable file format (e.g. RDF/XML) and create an eprints record with two data streams: (1) the image data, and (2) the metadata. (But we are open to alternative suggestions for this).
Q1. is there an existing ePrints bulk upload mechanism that can support this? We have seen references to eprints bulk upload, but have not been able to locate the actual mechanism.
- According to http://wiki.eprints.org/w/XML_Export_Format, the ePrints XML format can be exported and imported. So the question becomes how to import the XML. The bulk import procedure is described as part of the v2 to v3 migration procedure: http://wiki.eprints.org/w/Migration.
Q2. is there a web programmatic interface that can be used to create entries of the form proposed? Again, we have seen references to one (cf. http://www.ukoln.ac.uk/repositories/digirep/index/Deposit_API) but have not yet found documentation for the interface.
Q3. having imported our images and metadata, we would like to be able to display them in a style similar to that provided by the SERPENT project repository. We guess this is done using an export plug-in and XSLT style sheets; is this correct? If not, how is it being achieved?
In all cases, pointers to sample data and/or code would be of great help to us.
Separately from the above, we would also like to locate an RDF parser so that we can parse and display metadata. There are a number of options listed in CPAN that might be usable: http://search.cpan.org/search?query=RDF&mode=all.
The workflow for depositing drosophila images
- convert the xls files into csv files: this is done manually using Microsoft Excel
- parse the image metadata contained in the csv files and create a metadata file for each image: this is done by a Python script. This metadata file can be represented in plain text (for the convenience of uploading metadata to EPrints) or RDF (for the convenience of semantic retrieval of images).
- batch upload each image together with its metadata file: this is done by a Perl script based on the code from the EPrints team
- disseminate the images and their metadata:
- export the image metadata using the Drosophila export plugin: this is done by a Perl script based on the code from the EPrints team and Jon Bell from UWA
- export the image metadata through the EPrints web interface for the scientists: this is done by editing the eprint_render.pl file located in /var/lib/eprints/archives/ARCHIVEID/cfg/cfg.d/
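The second step of the workflow (one metadata file per image, generated from a csv file) could look roughly like the Python sketch below. It is not the project's actual script: the column name "filename", the "column: value" output layout and the .txt extension are assumptions made for illustration.

```python
import csv
import os

def write_metadata_files(csv_file, out_dir, filename_column="filename"):
    """Write one plain-text metadata file per csv row; return the paths written."""
    written = []
    for row in csv.DictReader(csv_file):
        image_name = (row.get(filename_column) or "").strip()
        if not image_name:
            continue  # skip rows that do not name an image
        base, _ext = os.path.splitext(image_name)
        path = os.path.join(out_dir, base + ".txt")
        with open(path, "w", encoding="utf-8") as out:
            for column, value in row.items():
                if column is None:
                    continue  # ignore cells beyond the header row
                out.write("%s: %s\n" % (column, (value or "").strip()))
        written.append(path)
    return written
```

The function takes an open file object, so it can be called as write_metadata_files(open("images.csv", newline=""), "metadata/"), where both names are placeholders.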
Customizing EPrints repository
Customising the Underlying Database
Adding extra domain-specific fields for an EPrint object
- see also http://wiki.eprints.org/w/Adding_a_Field_to_a_Live_Repository
- see also http://wiki.eprints.org/w/HOW_TO:_Add_a_New_Field
This description is applicable to a live Eprints 3.0 repository.
The following steps are needed:
- shut down the Apache server
- add the new fields to archives/ARCHIVEID/cfg/cfg.d/eprint_fields.pl
- add the new field names to the archives/ARCHIVEID/cfg/lang/en/phrases/eprint_fields.xml file
- alter the MySQL tables eprint and eprint__ordervalues_en
- restart Apache
In more detail, in the fourth step, the following commands were executed on the MySQL tables:
- if the value is not multiple
ALTER TABLE eprint ADD geneid VARCHAR(255) default NULL AFTER column_name;
- if the value is multiple
create table eprint_expression (eprintid INT NOT NULL, pos INT, expression varchar(255) default NULL, KEY eprintid (eprintid), KEY pos (pos) );
- for the eprint__ordervalues_en table
ALTER TABLE eprint__ordervalues_en ADD geneid TEXT AFTER column_name;
In order to display these new fields in the summary page for each eprint record, the following needs to be done:
- edit the archives/ARCHIVEID/cfg/cfg.d/eprint_render.pl script to set which extra fields to display
- edit the archives/coco/cfg/lang/en/phrases/render.xml file to add the new field names
- rebuild the abstract page by running bin/generate_abstracts archivename [recordid]
Relevant references from the EPrints' wiki
Customise the views
- add new views generated using domain-specific metadata fields to cfg/cfg.d/views.pl
- add new view names to cfg/lang/en/phrases/views.xml
- add new citation views to cfg/citations/eprint/*.xml
Customizing the browsing interface
This customization process takes three steps: 1) the first step displays all the domain metadata associated with each Drosophila image file; 2) the second step displays each image as a thumbnail; 3) the last step customizes the style of the summary page.
Branding
See DefiningImageAccess/Tool/Eprints#Branding above.
Customize metadata information displayed in each summary (abstract) page
This is done by editing the following files:
1. Edit eprint_render.pl in the archives/coco/cfg/cfg.d directory. This file is the starting point for customising the summary page. The script produces an XHTML object for an eprint object. To display extra metadata, one can add lines such as the following to the script file:
$table->appendChild( $session->render_row(
$session->html_phrase("page:probe"),
$session->make_text("CG8564")));
This code displays the probe ID of an image in a HTML table.
2. Note that, the field name used in the "html_phrase" needs to be added to the "render.xml" file in the archives/coco/cfg/lang/en/phrases/ directory. The following shows the example line added in the render.xml file:
<epp:phrase id="page:probe">Probe ID</epp:phrase>
3. Finally, run the following command as the apache user in order to make the changes take effect.
bin/generate_abstracts archivename [recordid]
Acknowledgement: thanks to the EPrints tech mailing list for the advice given.
Display thumbnails
Thumbnails should come by default with an EPrints 3 installation. However, ImageMagick was not installed on the server when EPrints was installed. This was fixed by installing ImageMagick and then doing the following:
- in the perl_lib/EPrints/SystemSettings.pm file, set the correct path of the "convert" tool:
'executables' => {
'convert' => '/usr/bin/convert',
...
- generate the thumbnails
bin/epadmin redo_thumbnails repository_id
- generate the abstract pages
bin/generate_abstracts repository_id
- restart the apache server
Acknowledgement: thanks are due to Tim Miles-Board from the EPrints developer group for the advice given to achieve this.
Configure the size of thumbnails
Edit the file perl_lib/EPrints/Plugin/Convert/ImageMagick/ThumbnailImages.pm.
For our repository, we have changed the original line in the above file:
my $geom = { small=>"66x50", medium=>"200x150",preview=>"400x300" }->{$1};
into
my $geom = { small=>"96x72", medium=>"128x96",preview=>"384x288" }->{$1};
and then regenerated the thumbnails.
Also, the following line in the file /var/lib/eprints3/perl_lib/EPrints/DataObj/Document.pm needs to be changed accordingly so that it takes the size parameter. This is potentially a bug in EPrints.
$a->appendChild( $self->{session}->make_element(
"img",
class=>"ep_doc_icon",
alt=>"[img]",
src=>$self->icon_url( public=>$opts{public}, size=>$opts{size} ),
border=>0 ));
This is an outcome of the EPrints surgery meeting.
- change the thumbnails from png to jpg format to improve the performance of image loading
This leads to the following modifications:
- in perl_lib/EPrints/Plugin/Convert/ImageMagick/ThumbnailImages.pm, change png to jpg
- in perl_lib/EPrints/DataObj/Document.pm, change png to jpg
- restart the Apache server
- regenerate the abstract pages and the view pages
The result shows that performance is not significantly improved when the thumbnail format is changed from .png to .jpg, although it does show some improvement when the same change is made on the Linux server (andros).
Display thumbnails for a browsing view
For example, to display thumbnails rather than a list of records on the page "Browse by year->2007", the following modifications are needed:
- edit the archives/ARCHIVEID/cfg/cfg.d/views.pl file to define the citation style for a view,
- define new_citation_style.xml in the directory cfg/citations/eprint/ in order to display thumbnails,
- update the style sheet archives/ARCHIVEID/cfg/static/style/auto/zzz_local.css.
For step one, we can edit views.pl by adding the following citation line:
$c->{browse_views} = [
{
id => "year",
citation => "new_citation_style",
...
},
For step two, the file result.xml, which is used to display search results, is helpful for making the changes. This file helped us to create the new_citation_style.xml that contains the following content:
<cite:citation xmlns="http://www.w3.org/1999/xhtml"
xmlns:cite="http://eprints.org/ep3/citation"
xmlns:epc="http://eprints.org/ep3/control"
type="table_row">
<table>
<!--caption of each figure-->
<!--calling the citation/eprint/brief.xml script-->
<caption><epc:print expr="$item.citation('brief')" /></caption>
<!--put each figure in a cell-->
<tr><td>
<thumbitem>
<!--calling the render_fileinfo function of the DataObj:EPrint class-->
<epc:print expr="fileinfo"/>
</thumbitem>
</td></tr></table>
</cite:citation>
For step three, the zzz_local.css is edited with the following lines:
div.ep_view_page h2
{
display:none;
}
div.ep_view_timestamp
{
display:none;
}
div.ep_view_page table
{
float: left;
margin: 20px 10px 5px 8px;
}
div.ep_view_page table { border-collapse: collapse; width: 140px; height: 90px;}
div.ep_view_page table td { padding: 0; }
div.ep_view_page table caption {
caption-side:bottom;
text-align:center;
width: 90px;
height:5em;
}
thumbitem img {
align: center;
border:1px solid black;
width: 100px;
height: 80px;
}
thumbitem img[src$="text_plain.png"]
{
display:none;
}
The first step is not used to configure the views of "geneid" and "strain" and "subject".
Paginate the thumbnail view
Pagination uses the paginate_list function in perl_lib/EPrints/Paginate.pm, for example:
sub render_results {
$page->appendChild(
EPrints::Paginate->paginate_list(
$self->{session},
"search",
$self->{processor}->{results},
%opts ) );
}
Customizing the searching interface
After adding extra fields to the eprint objects, we can customize the search interface to enable scientists to use domain-specific metadata fields to search for images. The following steps are needed for this customization:
- editing the archives/coco/cfg/cfg.d/search.pl file
- updating the eprint_order.xml file for ordering by new metadata fields
- updating the citation view for the search results cfg/citations/eprint/result_alt.xml
- updating the CSS style for the search result page lib/static/style/auto/search.css
- rebuild the static view of the repository by running bin/generate_static archiveid
- restart apache server
Change the size of the keyword that can be used for searching
By default, the shortest keyword that can be used to search in EPrints has 3 characters. To make searching work with two-character keywords, two changes are needed:
- In the perl_lib/EPrints/Index.pm file, set the parameter FREETEXT_MIN_WORD_SIZE to 2.
- In archives/ARCHIVE_NAMES/cfg/cfg.d/index.pl, set $c->{indexing}->{freetext_min_word_size} = 2;
Then reindex the repository as the apache user:
./bin/epadmin reindex flyted eprint --verbose
The above command only queues the records for indexing. To check how many entries remain in the queue, query the SQL database:
select count(*) from index_queue;
The actual indexing can take a couple of hours, after which the search should work.
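To illustrate why a two-character keyword finds nothing until the minimum word size is lowered and the repository is reindexed, here is a minimal Python sketch of a free-text indexer that drops short words. It is not the EPrints implementation: the function name index_terms, the tokenizing rule and the sample title are assumptions made for illustration only.

```python
import re

def index_terms(text, min_word_size=3):
    """Return the lower-cased words of `text` that are long enough to index.

    Hypothetical helper: it only mimics the effect of a minimum word size
    setting such as freetext_min_word_size, not the real EPrints tokenizer.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_word_size]

# Made-up sample title for demonstration.
title = "wt testis, gene CG1234, AG staining"
print(index_terms(title))                   # default: "wt" and "ag" are dropped
print(index_terms(title, min_word_size=2))  # "wt" and "ag" are now indexed
```

Because the short words are discarded when the index is built, changing the setting has no effect on existing records until they are reindexed.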
Customizing the image uploading workflow (TODO)
Further Customisation for FlyTED Release V1.0 (2008_01)
Delete the RSS icons in the search result page
Edit the local search.css file to make the RSS icons invisible.
Customize the Advanced Search page
- Show the "Don't panic! etc" text in red.
- Edit the lib/lang/en/phrases/system.xml file and create a div class for this phrase
- set the font colour for this class to red in the CSS file (.red p{color: red;})
- Change the help headings in the search form page
- Edit perl_lib/EPrints/Session.pm and search for "inline_help":
- delete the help toggle bar
- change the heading of the help text to show different headings for different search fields
- Edit the CSS file so that .ep_no_js is always displayed
- Change the layout of the search page:
- Edit perl_lib/EPrints/Plugin/Screen/AbstractSearch.pm: in the "render_search_form" method, change the order, controls and fields that are displayed.
- Also in this script, in the "render_controls" method, add the "render_order_field_bar" function so that the order menu is displayed at the top of the search form page.
Customize the view pages
- change the caption so that users can click on it to reveal the metadata page
edit the archives/***/cfg/citation/eprint/caption.xml file
- hover anywhere on the image to display a pop-up window, as it does now, but without the i icon.
edit the archives/***/cfg/citation/eprint/new_thumbnail_style.xml
- click on images to get large image in a new window/tab
This has not been achieved, because I do not know how to replace the "cite:linkhere" keyword used by EPrints to generate an HTML hyperlink tag.
- change the description at the top of each view to tell users how to browse that view: edit lib/lang/en/phrases/system.xml and change the value of the bin/generate_views:blurb property.
Customize the expression view page
Restructure the expression tree to keep it more consistent with the FlyAnatomy ontology, moving everything that does not fit into that ontology into the "other" branch.
FlyTED Release V1.1 (2008_02)
The big challenge for this release is to deal with the new collection of images that Liz scanned and kept in very varied sizes and dimensions.
Eprints Cheatsheet
- index page
- lang/en/static/index.xpage
- lang/en/template/default.xml
- search page
- cfg.d/search.pl
- lang/en/phrases/eprint_fields.xml
- static/style/auto/search.css
- the abstract page
- eprint_render.pl for setting the content
- cfg/lang/en/phrases/render.xml, for adding new phrases
- browsing view page
- cfg.d/views.pl
- cfg/citations/eprint/*.xml for defining the style
- cfg/static/style/auto/zzz_local.css
Minutes
Meeting with Helen White-Cooper on 29/06/2007
For the browsing interface
- what Helen wants is quite similar to the interface of BDGP (http://www.fruitfly.org/cgi-bin/ex/bquery.pl?qtype=report&est_id=LP09189):
- she wants to have an overview page listing all the images by the gene ids, in order to assist users who have no knowledge about what the repository has, similar to the view on http://www.fruitfly.org/cgi-bin/ex/bquery.pl?qpage=entry&qtype=summary
- she wants users to browse the images by gene ids. After choosing a gene id, the interface will present a collection of images of this gene id, grouped and ordered by strain. The images of each strain should be grouped together. This is similar to the view on the page http://www.fruitfly.org/cgi-bin/ex/bquery.pl?qtype=report&est_id=LP09189.
- ideally there should be some hyperlinks to FlyBase through the gene ids, or something else.
- IMPORTANTLY: there are images from certain strains that she would like not to be published at this stage.
For the searching interface
- Helen wants a "search by gene id" interface. This should be the most basic searching facility.
- For the advanced search, Helen expects something similar to http://www.fruitfly.org/cgi-bin/ex/basic.pl
For the controlled vocabulary
- Helen has provided the list of strains whose images could be published.
- Helen has provided a clarification of the "stage description" and "comments" columns.
- These two columns contain a mix of information about the stages and the description of the strength of the signals.
- No controlled vocabulary is used for the description, although the team agrees with the terms.
- It might be expected that Liz could provide a quick clean-up of the records.
Strain pattern (not complete)
- No signal
- spermatocytes to elongation
- early spermatocytes
- early-mid spermatocytes
- mid spermatocytes
- mid-late spermatocytes
- late spermatocytes
- all spermatocytes
- somatic
- AG
- SV
- terminal epithelium
- other
- comets
- cups
- really
Actions
- The outcome of this meeting drives the first customisation of the browsing interface to the standard of BDGP:
- browse information by the gene ids
- group images of the same gene id by the strain name.
- collect more feedback from the scientists at the meeting on 09/07/2007.
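The grouping requested for the browsing interface (by gene id first, then by strain) can be sketched in Python as follows. This is only an illustration of the intended ordering, not the EPrints view configuration: the function name group_images, the record keys "geneid", "strain" and "file", and the sample records are all made up.

```python
from collections import defaultdict

def group_images(records):
    """Group image records by gene id, then by strain.

    `records` is an iterable of dicts with the hypothetical keys
    "geneid", "strain" and "file".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["geneid"]][rec["strain"]].append(rec["file"])
    # Convert to plain dicts with sorted keys so the browse order is stable.
    return {
        gene: {strain: sorted(files) for strain, files in sorted(strains.items())}
        for gene, strains in sorted(grouped.items())
    }

# Made-up sample records for demonstration.
records = [
    {"geneid": "CG0002", "strain": "wt", "file": "b1.jpg"},
    {"geneid": "CG0001", "strain": "mutantA", "file": "a2.jpg"},
    {"geneid": "CG0001", "strain": "wt", "file": "a1.jpg"},
    {"geneid": "CG0001", "strain": "wt", "file": "a0.jpg"},
]

for gene, strains in group_images(records).items():
    print(gene)
    for strain, files in strains.items():
        print("  ", strain, files)
```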
Meeting with Helen's Group on 09/07/2007
The following feedback items are the priority:
- customize the front page
- make the abstract page more condensed, so that everything is shown on one page
- customize the search page leaving only those fields that matter to the scientists
- give a title for each image in the tiled view
- in the browse-by-strain view, order the images by their gene ids
- make the resolution of the tiled images better
- put wt at the top of the list
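As a small illustration of the last request, a sort key can place wt first and keep the remaining strains in alphabetical order. The function name strain_sort_key and the strain names other than wt are hypothetical.

```python
def strain_sort_key(strain):
    """Sort key that places "wt" first and other strains alphabetically."""
    # False sorts before True, so "wt" always comes first.
    return (strain.lower() != "wt", strain.lower())

strains = ["mutantB", "wt", "mutantA"]  # made-up strain names
print(sorted(strains, key=strain_sort_key))  # ['wt', 'mutantA', 'mutantB']
```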
The following may impact on future directions of the repository:
- from Helen, browse from an image to the gene information
- from Helen, allow scientists to click on related images in order to group and compare them.
- from David, allow third parties to comment on the images. Helen is willing to have her group monitor the comments to avoid spam.
Meeting with Liz on 16/07/2007
Based on our (Liz, Elin and me) short discussion on 10/07/2007, Liz cleaned the spreadsheet and created a subject tree so that all the images can be uploaded into the repository. This one week of work has produced the following:
- an updated spreadsheet which contains:
- a new column to describe the signal of Atypical spermatogonia (in this phase, the images are not sufficiently detailed to distinguish germ-line and somatic cell expression),
- a revised column to describe the germ-line gene expression information, generally in the form "This gene is expressed in POSITION_NAME gets DESCRIPTIVE_SIGNAL to ANOTHER_POSITION_NAME (, persists to YET_ANOTHER_POSITION_NAME)"
- a new column to describe somatic cell gene expression information that may be reflected by the image.
- a strain tree (what does this mean? - #g) to describe the images
The plan for the next 1-2 weeks:
- build a parser to analyze and parse the new spreadsheet
- experiment with searching for images using the new subject tree
- finish uploading all the images to the EPrints repository
- give an internal demo to Graham and David by the end of the work
- give a demo to Helen's group.
EPrints Surgery in Southampton on 25/07/2007
Exchange of experience with Serpent
- practice in the FlyTED project
- image deposit
- describing images with domain specific metadata, which is kept in the Excel spreadsheet created by the scientists when they conduct their experiments
- uploading on a per-image basis, allowing scientists to provide a few key domain-specific descriptions of each image, e.g. its gene expression, strain, etc.
- bulk-uploading the images and their metadata created during the experiments in one week or one month, with the metadata captured in their routine Excel spreadsheets
- image retrieval
- search for images by only a few domain-specific metadata fields, such as gene id, etc.
- practice in the Serpent project
- image deposit
- interesting images are emailed by the project partners to the central Serpent administration team
- the Serpent team then analyzes the images and uploads them into the repository
- prior to setting up the Serpent EPrints, the project had built an Access database, which was migrated to EPrints and is no longer used
- now, whenever an image needs to be deposited, it is deposited directly into Serpent EPrints through the repository interface
- image search
- both DC metadata and domain-specific metadata are used for retrieving images
Discussions of technical problems
Customizing the image upload process
- to support uploading images together with their CSV files
- write an import plug-in to parse the csv file.
- upload a zip file, use Perl script to unzip and read the csv files (see DataObj/Document.pm)
- do a test run to check whether there is any problem with the file.
- these images are then put into the user area and they need to be deposited, which can be done
- either by the users,
- or by administrators, through the import command provided in the bin/
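The zip-plus-CSV idea above can be sketched as follows. The discussion refers to a Perl import plug-in; this Python version only illustrates the unzip, parse and test-run logic and is not EPrints code. The function name read_upload, the "filename" column and the sample archive contents are assumptions.

```python
import csv
import io
import zipfile

def read_upload(zip_source):
    """Pair each row of the CSV files in a zip archive with its image.

    Assumes (for illustration only) that every CSV has a "filename" column
    naming an image stored in the same archive. Returns (rows, problems):
    the rows whose image is present, and a message for each row whose
    image is missing, as a test run would report.
    """
    rows, problems = [], []
    with zipfile.ZipFile(zip_source) as archive:
        names = set(archive.namelist())
        for name in sorted(names):
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    if row.get("filename") in names:
                        rows.append(row)
                    else:
                        problems.append(
                            f"{name}: missing image {row.get('filename')!r}")
    return rows, problems

# Build a small in-memory archive with made-up contents to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "batch.csv",
        "filename,geneid,strain\nimg1.jpg,CG0001,wt\nimg2.jpg,CG0002,wt\n")
    archive.writestr("img1.jpg", b"not a real image")
buffer.seek(0)

rows, problems = read_upload(buffer)
print(rows)
print(problems)
```

Collecting the problems instead of stopping at the first one matches the idea of a test run: the whole file is checked before anything is placed in the user area.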
Documentation about EPrints
Please see:
Other problems discussed
- thumbnail problems
- Plugin/Convert/ImageMagick/ThumbnailImages.pm: setting the convert parameters
- Eprint.pm, icon_url(size=>"medium")
- visibility control of records, in order to control the set of records that Helen's group is ready to publish at the moment:
- it is feasible to control the visibility of images, but
- it is hard to control the visibility of metadata. Although we can hide the metadata in the browsing interface, EPrints does not support hiding the metadata in the searching interface. The EPrints team is working on it at the moment.
- conclusion: it is safer to set up two repositories, one public and one private to solve the requirement at the moment.
- search for images using wildcards
- this is not supported in EPrints, but
- a similar functionality can be achieved by configuring MetaField::Text, see manual documents above.
- See also: http://www.eprints.org/tech.php/6749.html
FlyTED prototype user feedback on 17/08/2007
Front page
- David: layout of the header images, give a full repository name
- David: all the species names should be italic
- David & Helen: copyright description at the footer
- Helen: more explicit searching tool bar (similar to Google) on the front page
- David/Helen: create a separate page to describe the methods used to create the data and the credibility of the images (contact Helen)
- David/Helen: get a logo from the FlyAtlas project
Images and the Abstract page
- David:
- change the thumbnails: medium as 128*96, preview as 512*384
- re-layout the abstract page
- put the preview image on the abstract pages
- Helen:
- each gene expression description should be linked with a help and a related link page. We need a set of images for each expression pattern and some description texts of these images (contact Liz)
- she was puzzled by the "all of" and "any of" options
The future of the image migration
Helen:
- it is going to be batch uploading, ~20 genes
- allow Liz and then Helen to flag whether they want the images to be public or not
Test framework for image deposition scripts
Python test
Test the parsing of expression pattern keywords, using the germline column of the spreadsheet.
After the test succeeds, update the Python parsing code for any changes to the patterns.
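A minimal sketch of such a parser and its checks is given below. It is not the project's parsing code. It assumes the germline column follows the form "This gene is expressed in POSITION_NAME gets DESCRIPTIVE_SIGNAL to ANOTHER_POSITION_NAME (, persists to YET_ANOTHER_POSITION_NAME)"; the function name parse_expression, the group names and the sample sentences (including the signal word "stronger") are made up.

```python
import re

# One regular expression for the assumed sentence form; the "persists" part is optional.
PATTERN = re.compile(
    r"^This gene is expressed in (?P<start>.+?)"
    r" gets (?P<signal>.+?) to (?P<end>.+?)"
    r"(?:, persists to (?P<persists>.+?))?\.?$",
    re.IGNORECASE,
)

def parse_expression(text):
    """Split a germ-line description into its parts, or return None."""
    match = PATTERN.match(text.strip())
    return match.groupdict() if match else None

# Made-up sample cells for demonstration.
samples = [
    "This gene is expressed in early spermatocytes gets stronger to "
    "late spermatocytes, persists to elongation",
    "This gene is expressed in mid spermatocytes gets stronger to late spermatocytes.",
    "No signal",
]

for cell in samples:
    print(parse_expression(cell))
```

Cells that do not follow the form return None, so a test over the whole column can list them for manual clean-up before the parsing code is changed.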
Perl test
- test that the Perl script can deal with any new expression keyword patterns (testExpression.pl)
- test that the Perl script can deal with any new strain name patterns (processStrainName.pl)
- test that the Perl script can deal with any new somatic expression patterns (processSomaticExpression.pl)
Update the batch_upload.pl code wherever needed.

