I already responded directly to Frank, but then thought of some
things I'd like to add, and figured others might have some
discussion points. In particular, I'm interested in thoughts about
protecting emails, etc. from crawlers.
I will say that I didn't do anything to protect instructor emails
on CE's course listings until today, primarily because the
information is publicly available in the first place. Instructor
emails are also available in the course catalogue and on the
websites of several colleges. However, that doesn't mean there's
nothing I can do to prevent crawlers from grabbing contact
information.
What I did was output the base64 of the mailto url instead of the
mailto url itself into a link's href attribute. Additionally, I
used CSS to hide the span containing the email address. It then gets
Another alternative would be using something like person (at)
example (dot) com, but if I were to design a crawler with the
intent of making it grab email addresses, I'd have already started
looking for that.
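As a rough illustration of the base64 idea described above, here is a minimal server-side sketch in Python. The function names, the "contact" class, and the sample address are hypothetical, not what the course listings actually use. The step that turns the encoded value back into a working link in the browser is not shown, since the message doesn't describe it in full; in a browser that would presumably be a small script (for example one built on atob). Note that base64 is obfuscation, not protection: a crawler that decodes href values would still recover the address.

```python
import base64
from html import escape


def encode_mailto(address: str) -> str:
    """Return the base64 text of the full mailto: URL for an address."""
    url = f"mailto:{address}"
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def decode_mailto(encoded: str) -> str:
    """Reverse encode_mailto; shown only to demonstrate the round trip."""
    return base64.b64decode(encoded).decode("utf-8")


def render_link(address: str, label: str) -> str:
    """Build an anchor whose href holds the encoded URL, not the address.

    The "contact" class is a hypothetical hook for a client-side script.
    """
    href = escape(encode_mailto(address), quote=True)
    return f'<a class="contact" href="{href}">{escape(label)}</a>'


if __name__ == "__main__":
    html = render_link("person@example.com", "Email the instructor")
    print(html)
    print(decode_mailto(encode_mailto("person@example.com")))
```

The page source then contains neither "@" nor "mailto:", which defeats crawlers that only pattern-match on raw HTML.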
What about instructors' phone and fax numbers, as well as office
addresses? These are also listed on many pages throughout
different parts of UVM's website. What should my concerns be?
On 08/18/2011 03:09 PM, Francis Swasey wrote:
10:42 AM, Jacob Beauregard wrote:
labels aren't always intuitive. E.g. EduCause Affiliations,
what is EduCause?
... "EduCause is a nonprofit association whose mission is to advance
higher education by promoting the intelligent use of information
technology." As a group, they
define a list of Affiliations with defined meanings that we use
to enable our Federated Logins
to provide services to broad classes of affiliates. If you want
a much finer grained list of
UVM specific affiliations, check out the uvmEduAffiliation
This is interesting to know. I wasn't really asking what EduCause
is, but using it as an example of something someone wouldn't be
able to intuit. I'm not particularly well-versed in LDAP schemas,
and particularly the process of registering them. I remember when
I was first reading up on LDAP, I hit a wall when it came to how
OIDs get assigned.
(and I believe still stated) purpose of the UVM Directory was to
be an online
white pages. It has replaced the UVM paper phone book that was
published every year. It also
replaced the older CSO and ph commands (well, actually, we
implemented a ph command replacement
on the zoo.uvm.edu systems that performs its searches against
ldap.uvm.edu). If you don't know
what CSO and ph are, check out http://www.faqs.org/faqs/ph-faq/
It serves its purpose in that sense. It would be convenient if it
could be linked to for general contact information; more like a
simple profile page than a phone book. It would also better
centralize efforts of addressing security concerns like the ones
you mention below.
reasoning, I've decided I should probably query data from LDAP
and present it on my
Doing your own LDAP searches is probably best if you have a
specific format you want to display
own. However, I'm wondering what the preferred way to go about
that is. Since I can't pull
everyone at once, I'm pretty much limited to pulling by netid.
If multiple query overhead is
a concern, I could probably combine multiple netids into a
things in. I do hope you are thinking about things like the
security necessary to not put
people's names, addresses (physical and email), and phone
numbers within easy reach of spiders
crawling the web so that they become spammer fodder.
Speaking of keeping data away from crawlers, what approach do you
think would work best? I would think to encode the data in some
I'm leaning toward option 3, but wouldn't mind input/feedback
on my thought
Whether you cache locally or go against ldap.uvm.edu each time
you need to is going to boil
process (hence my posting to the it-discuss list).
down to a performance measurement. Which is faster - asking
ldap.uvm.edu or looking in your
cache? How you structure the LDAP search filter (you could
find an attribute that is not
indexed) will affect how fast ldap.uvm.edu will respond to your
Aside from performance, there's the value of immediately
reflecting new/updated data.
that make up ldap.uvm.edu each process between 2 and 3 million
operations per day,
and we know they are capable of at least double that load
without falling over.
I was just thinking, hey, so it must take between 2 and 3 MIPS
CPUs to keep it running! I also just got to imagine a sentient
server rack knocking itself over. Thanks for the imagery! I think
if I can hammer it with a hundred or so queries in a batch without
it raising any problems, I might as well.
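
For what it's worth, combining several netids into one search, as floated earlier in the thread, could look something like the sketch below. It only builds the filter strings and does not talk to ldap.uvm.edu. The attribute name "uid" and the batch size are assumptions on my part, so substitute whatever attribute actually holds the netid. Values are escaped following RFC 4515.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def netid_filter(netids, attribute="uid"):
    """Build one OR filter matching any of the given netids.

    "uid" is an assumed attribute name, not confirmed for ldap.uvm.edu.
    """
    if not netids:
        raise ValueError("at least one netid is required")
    clauses = "".join(
        f"({attribute}={escape_filter_value(n)})" for n in netids
    )
    return f"(|{clauses})"


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    netids = ["jdoe", "asmith", "bjones"]  # hypothetical netids
    for group in batched(netids, 2):
        print(netid_filter(group))
```

One filter per batch means one search operation per batch rather than one per person, which keeps the query count down if that turns out to matter.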