I already responded directly to Frank, but then thought of some things I'd like to add, and figured others might have some discussion points. In particular, I'm interested in thoughts about protecting emails, etc. from crawlers. I will say that I didn't do anything to protect instructor emails on CE's course listings until today, primarily because the information is publicly available in the first place. Instructor emails are also available in the course catalogue and on the websites of several colleges. However, that doesn't mean there's nothing I can do to prevent crawlers from grabbing contact information.

What I did was output the base64 encoding of the mailto URL, instead of the mailto URL itself, into a link's href attribute. Additionally, I used CSS to hide the span that contains the email address. The href then gets decoded and the span unhidden using JavaScript on the client side. If you don't have JavaScript enabled, you don't see an email address.
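To make the idea concrete, here is a rough sketch of the server-side half in Python. It is not the code running on the course listings; the function names, the "contact-email" class, and the inline style are all made up for illustration. The decode function only mirrors what the browser-side script would do (atob in JavaScript) so the round trip can be checked.

```python
import base64
import html


def encode_mailto(email: str) -> str:
    """Return the base64 text of the mailto: URL for an address."""
    return base64.b64encode(f"mailto:{email}".encode("utf-8")).decode("ascii")


def render_contact(email: str) -> str:
    """Render hidden markup whose href holds the encoded mailto: URL."""
    encoded = encode_mailto(email)
    # "contact-email" is a hypothetical class name; the span starts hidden
    # and a client-side script is expected to decode the href and unhide it.
    return (
        '<span class="contact-email" style="display:none">'
        f'<a href="{html.escape(encoded, quote=True)}">email</a></span>'
    )


def decode_href(encoded: str) -> str:
    """Mirror of the client-side step: base64 text back to a mailto: URL."""
    return base64.b64decode(encoded).decode("utf-8")


markup = render_contact("person@example.com")
print(markup)
print(decode_href(encode_mailto("person@example.com")))
```

The page source never contains the literal address, so a crawler has to both recognize the base64 text and decode it before it gets anything useful.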

An alternative would be using something like person (at) example (dot) com, but if I were designing a crawler to grab email addresses, I'd already be looking for that pattern.
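To show how little effort that would take, here is a small Python sketch of the kind of pattern a harvester could apply. The regular expressions and the function name are my own illustration, not taken from any real crawler, and they only cover the parenthesis and square-bracket spellings.

```python
import re

# Matches spellings such as "person (at) example (dot) com"
# or "person [at] example [dot] com".
OBFUSCATED = re.compile(
    r"([\w.+-]+)\s*[\(\[]\s*at\s*[\)\]]\s*"
    r"([\w-]+(?:\s*[\(\[]\s*dot\s*[\)\]]\s*[\w-]+)+)",
    re.IGNORECASE,
)
DOT = re.compile(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", re.IGNORECASE)


def harvest(text: str) -> list[str]:
    """Return addresses recovered from the (at)/(dot) spelling."""
    return [
        f"{m.group(1)}@{DOT.sub('.', m.group(2))}"
        for m in OBFUSCATED.finditer(text)
    ]


print(harvest("Contact person (at) example (dot) com for details."))
```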

What about instructors' phone and fax numbers, as well as office addresses? These are also listed on many pages throughout different parts of UVM's website. What should my concerns be?

On 08/18/2011 03:09 PM, Francis Swasey wrote:
On 8/18/11 10:42 AM, Jacob Beauregard wrote:
* The labels aren't always intuitive. E.g. EduCause Affiliations, what is EduCause?
http://www.educause.edu ... "EduCause is a nonprofit association whose mission is to advance
higher education by promoting the intelligent use of information technology."  As a group, they
define a list of Affiliations with defined meanings that we use to enable our Federated Logins
to provide services to broad classes of affiliates.  If you want a much finer grained list of
UVM specific affiliations, check out the uvmEduAffiliation attribute.
This is interesting to know. I wasn't really asking what EduCause is, but using it as an example of something someone wouldn't be able to intuit. I'm not particularly well-versed in LDAP schemas, especially the process of registering them. I remember that when I was first reading up on LDAP, I hit a wall when it came to how OIDs get assigned.
The original (and I believe still stated) purpose of the UVM Directory was to be an online
white pages.  It has replaced the UVM paper phone book that was published every year.  It also
replaced the older CSO and ph commands (well, actually, we implemented a ph command replacement
on the zoo.uvm.edu systems that performs its searches against ldap.uvm.edu).  If you don't know
what CSO and ph are, check out http://www.faqs.org/faqs/ph-faq/ .
It serves its purpose in that sense. It would be convenient if it could be linked to for general contact information, more like a simple profile page than a phone book. That would also help centralize efforts to address security concerns like the ones you mention below.
From this reasoning, I've decided I should probably query data from LDAP and present it on my
own. However, I'm wondering what the preferred way to go about that is. Since I can't pull
everyone at once, I'm pretty much limited to pulling by netid. If multiple query overhead is
a concern, I could probably combine multiple netids into a single query.
Doing your own LDAP searches is probably best if you have a specific format you want to display
things in.  I do hope you are thinking about things like the security necessary to not put
people's names, addresses (physical and email), and phone numbers within easy reach of spiders
crawling the web so that they become spammer fodder.
Speaking of keeping data away from crawlers, what approach do you think would work best? My thinking is to encode the data in some way during output and use JavaScript to decode it when the page loads.
Personally, I'm leaning toward option 3, but wouldn't mind input/feedback on my thought
process (hence my posting to the it-discuss list).
Whether you cache locally or go against ldap.uvm.edu each time you need to is going to boil
down to a performance measurement.  Which is faster - asking ldap.uvm.edu or looking in your
cache?   How you structure the LDAP search filter (you could find an attribute that is not
indexed) will affect how fast ldap.uvm.edu will respond to your search.
Aside from performance, there's the value of immediately reflecting new/updated data.
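One way to trade between the two is a cache with a time limit, so stale entries get refreshed after a fixed interval. This is only a sketch under my own assumptions: the class, the fetch callback (standing in for whatever function actually queries ldap.uvm.edu for one netid), and the 60-second figure in the usage lines are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep directory entries for at most ttl seconds, then re-fetch."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical: looks up one netid in LDAP
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, netid: str) -> dict:
        now = self._clock()
        hit = self._entries.get(netid)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, so no directory query is made
        entry = self._fetch(netid)
        self._entries[netid] = (now, entry)
        return entry


# Usage with a stand-in fetch function instead of a real LDAP search.
cache = TTLCache(lambda netid: {"uid": netid}, ttl=60)
print(cache.get("jdoe"))
```

A short limit keeps updates visible quickly at the cost of more queries; a long one does the opposite.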
The servers that make up ldap.uvm.edu each process between 2 and 3 million operations per day,
and we know they are capable of at least double that load without falling over.

I was just thinking, hey, so it must take between 2 and 3 MIPS CPUs to keep it running! I also just got to imagine a sentient server rack knocking itself over. Thanks for the imagery! I think if I can hammer it with a hundred or so queries in a batch without it raising any problems, I might as well.
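For the batching idea, combining several netids into one search filter could look roughly like the Python below. I'm assuming the netid lives in the uid attribute (worth checking, along with whether it is indexed), and the helper names and example netids are made up. The escaping follows the special characters listed in RFC 4515; the resulting string would be handed to whatever LDAP client library ends up being used.

```python
from typing import Iterable, Iterator, List


def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 treats as special in filter values."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def netid_filter(netids: Iterable[str], attr: str = "uid") -> str:
    """Build one OR filter, e.g. (|(uid=a)(uid=b)), for several netids."""
    clauses = [f"({attr}={escape_filter_value(n)})" for n in netids]
    if not clauses:
        raise ValueError("at least one netid is required")
    return "(|" + "".join(clauses) + ")"


def batches(netids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split a long list so each search carries at most `size` netids."""
    for start in range(0, len(netids), size):
        yield netids[start:start + size]


# Hypothetical netids, two per search to keep the output short.
for group in batches(["jdoe", "asmith", "bwong"], size=2):
    print(netid_filter(group))
```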